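Case: The primary WLC in a WLC HA pair enters maintenance mode
Scenario:
As shown in the figure, 7609-1 and 7609-2 are the core devices in the network and run HSRP. 7609-1 connects to WLC-1 and 7609-2 connects to WLC-2. The RP ports of WLC-1 and WLC-2 are connected to each other.
The WLC management address is 192.168.53.1/24, and the RMI addresses are 192.168.53.3 and 192.168.53.4.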
Key concepts: RMI and RP
Redundancy Management Interface
The IP address on this interface should be configured in the same subnet as the management interface. This interface will check the health of the Active WLC via network infrastructure once the Active WLC does not respond to Keepalive messages on the Redundant Port. This provides an additional health check of the network and Active WLC, and confirms if switchover should or should not be executed. Also, the Standby WLC uses this interface in order to source ICMP ping packets to check gateway reachability. This interface is also used in order to send notifications from the Active WLC to the Standby WLC in the event of Box failure or Manual Reset. The Standby WLC will use this interface in order to communicate to Syslog, the NTP server, and the TFTP server for any configuration upload.
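The same-subnet requirement can be checked against the addresses used in this case. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `in_management_subnet` is made up for illustration and is not part of any WLC tooling.

```python
import ipaddress


def in_management_subnet(mgmt_cidr: str, rmi_ip: str) -> bool:
    """Return True if rmi_ip lies in the subnet of the management interface."""
    mgmt = ipaddress.ip_interface(mgmt_cidr)
    return ipaddress.ip_address(rmi_ip) in mgmt.network


# Addresses taken from the scenario above.
MGMT = "192.168.53.1/24"
for rmi in ("192.168.53.3", "192.168.53.4"):
    print(rmi, in_management_subnet(MGMT, rmi))
```

Both RMI addresses in this case fall inside 192.168.53.0/24, so the requirement is met.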
Redundancy Port
This interface has a very important role in the new HA architecture. Bulk configuration during boot up and incremental configuration are synced from the Active WLC to the Standby WLC using the Redundant Port. WLCs in a HA setup will use this port to perform HA role negotiation. The Redundancy Port is also used in order to check peer reachability sending UDP keep-alive messages every 100 msec (default timer) from the Standby WLC to the Active WLC. Also, in the event of a box failure, the Active WLC will send notification to the Standby WLC via the Redundant Port. If the NTP server is not configured, a manual time sync is performed from the Active WLC to the Standby WLC on the Redundant Port. This port in case of standalone controller and redundancy VLAN in case of WISM-2 will be assigned an auto generated IP Address where last 2 octets are picked from the last 2 octets of Redundancy Management Interface (the first 2 octets are always 169.254).
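The addressing rule at the end of the paragraph above (first two octets fixed at 169.254, last two octets copied from the RMI address) can be worked through for this case. This is a small sketch of that rule only; the function name `derive_rp_address` is hypothetical.

```python
import ipaddress


def derive_rp_address(rmi_ip: str) -> str:
    """Build the auto-generated RP address from an RMI address.

    Rule from the text: the first two octets are always 169.254 and the
    last two octets are taken from the RMI address.
    """
    # IPv4Address() validates the input before we split it into octets.
    octets = str(ipaddress.IPv4Address(rmi_ip)).split(".")
    return ".".join(["169", "254", octets[2], octets[3]])


# RMI addresses from the scenario above.
for rmi in ("192.168.53.3", "192.168.53.4"):
    print(rmi, "->", derive_rp_address(rmi))
```

With the RMI addresses in this case, the two RP ports would be expected to use 169.254.53.3 and 169.254.53.4.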
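Failure:
A resource on the 7609 (for example, CPU) reached 100% utilization, and wireless service traffic was affected to the point of being unusable. Management traffic was probably affected as well, which may have disrupted RMI communication between the WLCs and left the HA state between the active and standby units abnormal. An SSO switchover may have occurred.
To relieve the problem on the core, the 7609 was rebooted. After the reboot, both line cards connecting to the WLCs were down. The primary WLC therefore found all of its uplink interfaces down, could not reach the gateway or the standby unit's RMI interface, and put itself into maintenance mode with all interfaces administratively down, so the management IP was unreachable. As a temporary measure, the standby unit carried the wireless network until the 7609 line cards were replaced.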
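Recovery:
With the 7609 line cards replaced, the next step was to try to restore WLC HA.
During the 7609 replacement work, the primary WLC was rebooted once with no cables connected. It still ended up in maintenance mode, because all of its ports were down.
Recovery steps: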
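1. Check whether the primary WLC can boot normally as a standalone unit. Disable SSO (config redundancy mode disable), then reboot the device (reset system). After the reboot, the WLC came up normally and the configuration was intact.
2. Try to restore WLC HA. Power off the standby unit and restore its original cabling to 7609-2.
3. On the primary WLC, enable the ports (config port adminmode all enable), then enable SSO (config redundancy mode sso).
4. Immediately reconnect the primary WLC's cable to 7609-1, then immediately power on the standby WLC.
5. Watch the boot and role negotiation process on both WLCs.
6. Check the WLC HA status, AP join status, client services, and so on.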
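The order of the steps above matters, so it can help to keep them as a small checklist. The sketch below only records the sequence and the four CLI commands quoted in the steps; the `Step` type and helper names are invented for illustration, and nothing here connects to a controller.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    target: str                    # which device the step applies to
    action: str                    # what the operator does
    command: Optional[str] = None  # WLC CLI command quoted in the steps, if any


RECOVERY_STEPS = [
    Step("primary WLC", "disable SSO", "config redundancy mode disable"),
    Step("primary WLC", "reboot and confirm standalone boot with config intact",
         "reset system"),
    Step("standby WLC", "power off and restore cabling to 7609-2"),
    Step("primary WLC", "enable all ports", "config port adminmode all enable"),
    Step("primary WLC", "enable SSO", "config redundancy mode sso"),
    Step("primary WLC", "reconnect the cable to 7609-1 right away"),
    Step("standby WLC", "power on right away"),
    Step("both WLCs", "watch boot and role negotiation"),
    Step("both WLCs", "check HA status, AP joins, and client service"),
]


def cli_commands(steps: List[Step]) -> List[str]:
    """Return only the CLI commands, in the order they are entered."""
    return [s.command for s in steps if s.command]


def render(steps: List[Step]) -> str:
    """Format the checklist as numbered lines for printing."""
    lines = []
    for i, s in enumerate(steps, 1):
        suffix = f"  [{s.command}]" if s.command else ""
        lines.append(f"{i}. {s.target}: {s.action}{suffix}")
    return "\n".join(lines)


print(render(RECOVERY_STEPS))
```

Keeping the physical actions (power, cabling) in the same list as the commands makes it harder to skip the timing-sensitive parts of step 4.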
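Other: If the standby unit is healthy, you can simply reconnect the primary WLC and reboot it. The primary WLC should then come up in standby-hot mode, and you can afterwards run redundancy force-switchover on the Active unit to switch over manually.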