[原]工欲善其事,必先利其器,记一次处理Oracle Listener挂掉的处理过程

      国庆节凌晨0:36收到邮件报网通Tomcat服务器挂掉,Tomcat总是那么脆弱,重启是家常便饭了,但这次和往常不一样,WEB服务始终没有起来,马上重启整台服务器再启动Tomcat,故障依旧。

      马上查看Tomcat日志:

2010-10-1 1:10:42 com.fsm.util.SearchDispatchFilter init
信息: SearchDispatchFilter.init() done
***********************创建数据源********************
createDataSource:DriverClassName=oracle.jdbc.driver.OracleDriver
createDataSource:Url=jdbc:oracle:thin:@db_server:1521/ora8i
***************************************************

      过了3分钟,才出现久违“Server startup”:

[FSM-ERROR]:2010-10-01 01:13:55,394-org.hibernate.util.JDBCExceptionReporter.logExceptions(LINE:78)
Cannot create PoolableConnectionFactory (Io 异常: The Network Adapter could not establish the connection)
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter INFO: loaded (conf ok)
2010-10-1 1:13:58 org.apache.coyote.http11.Http11BaseProtocol start
信息: Starting Coyote HTTP/1.1 on http-80
2010-10-1 1:13:58 org.apache.jk.common.ChannelSocket init
信息: JK: ajp13 listening on /0.0.0.0:8009
2010-10-1 1:13:58 org.apache.jk.server.JkMain start
信息: Jk running ID=0 time=0/84  config=null
2010-10-1 1:13:58 org.apache.catalina.storeconfig.StoreLoader load
信息: Find registry server-registry.xml at classpath resource
2010-10-1 1:13:58 org.apache.catalina.startup.Catalina start
信息: Server startup in 205603 ms

      通常Tomcat也就是启动20来秒,这次却启动了200多秒,不太正常,再往上看出现了“JDBC”和“The Network Adapter could not establish the connection”,99%就是与数据库建立连接时候出现了问题,马上telnet数据库的1521端口,可连接是通的:

[root@primary_node bin]# telnet db_server 1521
Trying 192.168.4.20...
Connected to db_server (192.168.4.20).
Escape character is '^]'.
aaaaa
aaa
aaaaa
aaaaaa
aaaaa

      上面那堆'”a”是我输入的,检查连接是否真的通了。

      于是将矛头指向配置文件和运行所需的jar包上,也没有发现最近被修改过的痕迹,将另一台服务器的配置文件和jar包copy过来,重启Tomcat,故障依旧。此时已是2:11分,此时该台网通Tomcat服务器的VIP早就被电信服务器抢过去了,服务没有受到影响,此时精神上有点吃不消了,赶紧睡觉去,获取周公能给我灵感。

 

      早上7:25分醒来,第一件事睁开眼睛,第二件事登录到服务器上继续调试问题,首先将数据库IP指向测试数据库,启动Tomcat,WEB服务跑起来了,这个现象再次让我把矛头指向Oracle服务器,由于服务器上没有安装Oracle的客户端只能通过telnet这种原始的方式进行测试,telnet测试的Oracle服务器:

[root@primary_node bin]# telnet test_db_server 1521
Trying 192.168.4.74...
Connected to db_server (192.168.4.74).
Escape character is '^]'.
aaaaa
aaa
Connection closed by foreign host.

      奇怪了,测试用的Oracle服务器会主动close连接,而生产用的Oracle服务器不会主动close连接,终于找到突破口了。 登录到Oracle服务器,查看以下listener的状态:

[oracle@wz_oracle1 adump]$ lsnrctl status 

LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:08:09

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))

       Hang住,取消,直接  lsnrctl stop ,还是Hang住,出必杀 killall –9 tnslsnr,再启动:

[oracle@wz_oracle1 adump]$ lsnrctl start 

LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:10:17

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Starting /u01/app/oracle/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 10.2.0.2.0 - Production
System parameter file is /u01/app/oracle/network/admin/listener.ora
Log messages written to /u01/app/oracle/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.2.0 - Production
Start Date                01-OCT-2010 09:10:17
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "ora8i" has 1 instance(s).
  Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
Service "shdb" has 1 instance(s).
  Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

      搞定,重启Tomcat,一切恢复正常。

 

      这次故障处理时间长达3小时,最开始以为是一个普通的故障,重启一下Tomcat就能完事,但没那么简单,然后被 telnet 的现象所迷惑,走入了一条长长的弯路,“工欲善其事,必先利其器”如果当时找一台服务器用sqlplus 远程连一下数据库就不用折腾那么久了。

posted @ 2010-10-01 14:58  killkill  阅读(1971)  评论(0编辑  收藏  举报