Part 1

Configuring and managing HA

Understanding high availability at different layers

Application layer

Operating system layer

Virtualization layer

Physical layer

High availability technologies at each layer

1. Examples of high availability at the application layer include Oracle Real Application Clusters (RAC). (Application-layer clustering)

2. At the OS layer, solutions include OS clustering functionality, such as Windows Failover Clustering (WFC) for Windows Server. (OS layer)

3. The virtualization layer offers a number of features for high availability, including vSphere High Availability (HA) and vSphere Fault Tolerance (FT). (vSphere HA and FT)

4. High availability at the physical layer is achieved through redundant hardware: multiple network interface cards (NICs) or host bus adapters (HBAs), multiple storage area network (SAN) switches and fabrics, redundant power supplies, and so forth. (Physical layer)

Introduction to vSphere HA

The vSphere HA feature is designed to provide an automatic restart of the VMs that were running on an ESXi host at the time it became unavailable. For example, if an ESXi host loses power, its VMs are restarted on another ESXi host, so the VMs do experience some downtime.

Enhancements in vSphere 5 HA

vSphere HA uses a new VMware-developed tool known as Fault Domain Manager (FDM), which offers several significant improvements over the HA implementation in earlier vSphere versions:

1. FDM uses a master/slave architecture that does not rely on primary/secondary host designations.

2. FDM uses both the management network and storage devices for communication.

3. FDM introduces support for IPv6.

4. FDM addresses the issues of both network partition and network isolation.

Role of the master

When vSphere HA is enabled, the vSphere HA agents participate in an election to pick a vSphere HA master. The vSphere HA master is responsible for a number of key tasks within a vSphere HA-enabled cluster:

1. The vSphere HA master monitors slave hosts and will restart VMs in the event of a slave host failure.

2. The vSphere HA master monitors the power state of all protected VMs. If a protected VM fails, it will restart the VM.

3. The vSphere HA master manages the list of hosts that are members of the cluster and manages the process of adding and removing hosts from the cluster.

4. The vSphere HA master manages the list of protected VMs. It updates this list after each user-initiated power-on or power-off operation. These updates are made at the request of vCenter Server, which asks the master to protect or unprotect VMs.

5. The vSphere HA master caches the cluster configuration. The master notifies slave hosts of changes in the cluster configuration.

6. The vSphere HA master host sends heartbeat messages to the slave hosts so that the slave hosts know the master is alive.

7. The vSphere HA master reports state information to vCenter Server. vCenter Server typically communicates only with the master.

Role of the slaves

1. A slave host watches the runtime state of the VMs running locally on that host. Significant changes in the runtime state of these VMs are forwarded to the vSphere HA master.

2. vSphere HA slaves monitor the health of the master. If the master fails, slaves will participate in a new master election.

3. vSphere HA slave hosts implement vSphere HA features that don't require central coordination by the master. This includes VM health monitoring.

Viewing master and slave status

Two kinds of network problems

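vSphere HA communicates over both the management network and storage devices. When the master cannot reach a slave over the management network, it can check the heartbeat datastores to determine whether the slave is still alive. This capability helps vSphere HA distinguish between network partition and network isolation.

Network partition: one or more slaves cannot reach the master over the network even though their own network connectivity is working. In this case, vSphere HA can use the heartbeat datastores to detect whether the partitioned hosts (the slaves above) are alive and whether the VMs on them need to be protected.

Network isolation: one or more slaves have lost all management network connectivity, so such a slave can reach neither the master nor any other ESXi host. In this case, the slave uses the heartbeat datastores to notify the master that it is isolated. Specifically, the slave signals this with a special binary file, host-X-poweron, so that the vSphere HA master can take appropriate action to keep the VMs protected.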
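The distinction between the two network problems can be sketched as a small decision function from the master's point of view. This is only an illustrative model of the description above, not FDM's actual logic; the names HostState, classify_slave and the three boolean inputs are hypothetical.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    PARTITIONED = "partitioned"
    ISOLATED = "isolated"
    FAILED = "failed"


def classify_slave(network_heartbeat: bool,
                   datastore_heartbeat: bool,
                   poweron_file_present: bool) -> HostState:
    """Classify a slave as the master might see it (simplified assumption)."""
    if network_heartbeat:
        # The slave is reachable over the management network.
        return HostState.LIVE
    if not datastore_heartbeat:
        # No network heartbeat and no datastore heartbeat: treat as failed.
        return HostState.FAILED
    if poweron_file_present:
        # The slave wrote host-X-poweron to declare itself isolated.
        return HostState.ISOLATED
    # Alive on storage, silent on the network, no isolation flag.
    return HostState.PARTITIONED


if __name__ == "__main__":
    print(classify_slave(False, True, False))
    print(classify_slave(False, True, True))
```

In this model the heartbeat datastore is what lets the master tell a dead host apart from one that is merely unreachable over the network.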

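How vSphere HA protects VMs

The detailed process by which vSphere HA protects VMs:

When a slave detects that it is network isolated, it creates a special binary file, host-X-poweron, on the heartbeat datastores. When the master sees this flag, it knows the slave is isolated, and it then has vSphere HA lock other files on the datastores. When the slave sees that these files have been locked, it knows the master is taking responsibility for restarting the VMs.

Only then does the slave carry out its configured isolation response (for example, shut down or power off).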
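The ordering of this handshake can be modelled in a few lines. The sketch below is a toy simulation under stated assumptions: HeartbeatDatastore and the three functions are hypothetical names, and the real on-disk files and locking mechanism are not represented.

```python
from typing import Optional


class HeartbeatDatastore:
    """Toy stand-in for the shared state on a heartbeat datastore."""

    def __init__(self) -> None:
        self.poweron_file = False      # models the host-X-poweron flag
        self.vm_files_locked = False   # models the master's file locks


def slave_declare_isolation(ds: HeartbeatDatastore) -> None:
    # Step 1: the isolated slave writes its flag file.
    ds.poweron_file = True


def master_poll(ds: HeartbeatDatastore) -> bool:
    # Step 2: the master sees the flag and locks the other files.
    if ds.poweron_file and not ds.vm_files_locked:
        ds.vm_files_locked = True
        return True
    return False


def slave_try_isolation_response(ds: HeartbeatDatastore,
                                 response: str) -> Optional[str]:
    # Step 3: the slave acts only after it sees the master's locks.
    if ds.poweron_file and ds.vm_files_locked:
        return response
    return None


if __name__ == "__main__":
    ds = HeartbeatDatastore()
    slave_declare_isolation(ds)
    print(slave_try_isolation_response(ds, "shut down"))  # not yet allowed
    master_poll(ds)
    print(slave_try_isolation_response(ds, "shut down"))
```

The point of the model is the ordering constraint: the slave does not run its isolation response until the master has signalled that it will restart the VMs.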

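vSphere HA has certain limitations:

1. vSphere HA provides failover only for VMs; it does not provide failover for individual services or applications.

2. vSphere HA cannot provide fast failover, because the time a VM takes to boot is not known in advance.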

Requirements for enabling HA

1. All hosts in a vSphere HA-enabled cluster must have access to the same shared storage locations used by all VMs on the cluster. This includes any Fibre Channel, FCoE, iSCSI, and NFS datastores used by VMs.

2. All hosts in a vSphere HA cluster should have an identical virtual networking configuration. If a new switch is added to one host, the same new switch should be added to all hosts in the cluster. If you are using a vSphere Distributed Switch (vDS), all hosts should be participating in the same vDS.

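A redundant management network is recommended. Without one, vSphere HA reports a warning; to suppress it, add das.ignoreRedundantNetWarning=true to the HA advanced options. (Note: be careful with this missing-redundancy warning when using the Chinese-language vSphere Client.)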

Reconfiguring HA

Introduction to Host Monitoring

Deselecting Enable Host Monitoring when performing network maintenance will prevent vSphere HA from unnecessarily triggering network isolation or network partition responses.

With Host Monitoring disabled, no heartbeats are sent. Disable this option only during network maintenance, to prevent unnecessary network isolation and network partition responses.

Introduction to Admission Control

1. Enable: disallow VM power-on operations that violate availability constraints.

2. Disable: allow VM power-on operations that violate availability constraints.

Introduction to Admission Control Policy

1. The first option, Host Failures The Cluster Tolerates, allows you to specify how many host failures the cluster should be configured to withstand.

2. The second option, Percentage Of Cluster Resources Reserved As Failover Spare Capacity, allows you to specify a percentage of the cluster's total resources that should be used for spare capacity in the event of a failure. You can specify different percentages for CPU and memory.

3. The third option, Specify Failover Hosts, allows you to specify one or more ESXi hosts as failover hosts. These hosts are used as spare capacity, and in the event of a failure, vSphere HA will use these hosts to restart VMs.
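To make the percentage-based policy concrete, here is a rough sketch of the kind of check admission control performs before a power-on. It is a deliberately simplified model and not VMware's actual calculation (which also accounts for details such as overhead); Capacity, spare_percent and allow_power_on are hypothetical names, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    cpu_mhz: float
    memory_mb: float


def spare_percent(total: float, committed: float) -> float:
    """Percentage of a resource still uncommitted."""
    return 100.0 * (total - committed) / total


def allow_power_on(total: Capacity, committed: Capacity, vm: Capacity,
                   cpu_reserve_pct: float, mem_reserve_pct: float,
                   admission_control_enabled: bool = True) -> bool:
    # With admission control disabled, power-on is always allowed.
    if not admission_control_enabled:
        return True
    cpu_spare = spare_percent(total.cpu_mhz, committed.cpu_mhz + vm.cpu_mhz)
    mem_spare = spare_percent(total.memory_mb,
                              committed.memory_mb + vm.memory_mb)
    # Both CPU and memory must keep at least the configured spare share.
    return cpu_spare >= cpu_reserve_pct and mem_spare >= mem_reserve_pct


if __name__ == "__main__":
    total = Capacity(cpu_mhz=10000, memory_mb=64000)
    committed = Capacity(cpu_mhz=6000, memory_mb=30000)
    print(allow_power_on(total, committed, Capacity(1000, 4000), 25, 25))
    print(allow_power_on(total, committed, Capacity(2000, 4000), 25, 25))
```

Because CPU and memory are checked separately, different percentages can be reserved for each, as the second option allows.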

Introduction to VM Restart Priority

The VM restart priority options for VMs in a vSphere HA-enabled cluster are Low, Medium, High, and Disabled. For VMs that should be brought up first, set the restart priority to High. For VMs that should be brought up if resources are available, the restart priority can be set to Medium or Low. For VMs that will not be missed for a period of time and should not be brought online while resource availability is reduced, set the restart priority to Disabled. You can define a default restart priority for the entire cluster as well as a per-VM restart priority.
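The effect of these settings can be illustrated with a short ordering function. This is a simplified sketch rather than vSphere HA's real scheduler; restart_order, the VM names and the tie-break by name are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Lower rank restarts earlier; "disabled" VMs are not restarted at all.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_order(per_vm_priority: Dict[str, Optional[str]],
                  cluster_default: str = "medium") -> List[str]:
    """Return VM names in the order they would be restarted."""
    effective = {}
    for name, priority in per_vm_priority.items():
        # A VM without its own setting inherits the cluster default.
        chosen = priority if priority is not None else cluster_default
        if chosen == "disabled":
            continue
        effective[name] = PRIORITY_RANK[chosen]
    # Sort by rank, then by name so the result is deterministic.
    return sorted(effective, key=lambda name: (effective[name], name))


if __name__ == "__main__":
    vms = {"db01": "high", "web01": None, "batch01": "low", "test01": "disabled"}
    print(restart_order(vms))  # ['db01', 'web01', 'batch01']
```

Here web01 has no per-VM setting and inherits the cluster default of medium, while test01 is left powered off.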

When network isolation occurs, the action taken for a VM can be Leave Powered On, Power Off, or Shut Down.

Introduction to VM Monitoring

VMware Tools provides a series of heartbeats from the guest OS up to the ESXi host on which that VM is running. By monitoring these heartbeats in conjunction with disk and network I/O activity, vSphere HA can attempt to determine if the guest OS has failed. If there are no VMware Tools heartbeats, no disk I/O, and no network I/O for a period of time, then vSphere HA (if VM monitoring is enabled) will restart the VM under the assumption that the guest OS has failed.
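The decision described above amounts to requiring that all three signals be silent before acting. A minimal sketch follows, assuming a single hypothetical threshold (failure_interval_s, set to 30 seconds purely for illustration) for all three signals; the function name and parameters are not part of any VMware API.

```python
def guest_os_presumed_failed(seconds_since_heartbeat: float,
                             seconds_since_disk_io: float,
                             seconds_since_network_io: float,
                             vm_monitoring_enabled: bool = True,
                             failure_interval_s: float = 30.0) -> bool:
    """Return True if the VM would be restarted under this simplified model."""
    if not vm_monitoring_enabled:
        return False
    silences = (seconds_since_heartbeat,
                seconds_since_disk_io,
                seconds_since_network_io)
    # Any recent activity on any signal is taken as a sign of life.
    return all(s >= failure_interval_s for s in silences)


if __name__ == "__main__":
    print(guest_os_presumed_failed(45, 60, 50))  # all three silent
    print(guest_os_presumed_failed(45, 2, 50))   # disk I/O still active
```

Checking I/O as well as heartbeats avoids restarting a VM whose VMware Tools service has stopped while the guest is still doing work.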

vSphere HA also has application monitoring. This functionality requires third-party software to take advantage of APIs built into VMware Tools to provide application-specific heartbeats to vSphere HA. By leveraging these APIs, third-party software developers can further extend the functionality of vSphere HA to protect against the failure of specific applications.