Login Delays Caused by DNS Settings
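On a Linux system running 11.1.0.7, the operating system administrators recently adjusted the network configuration file /etc/nsswitch.conf on the server. Before the change, the host name resolution entry was "hosts:files dns"; afterwards it read "hosts:files [NOTFOUND=continue] dns". From then on, every attempt by the application team to log in to the database remotely from that host with "sqlplus username/password@connect_string" was delayed by as much as five minutes. The lsnrctl status command showed that the LISTENER was working normally, so the initial suspicion was that DNS resolution was causing the long delay.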
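To investigate, the first step was to enable Oracle network tracing on the client side to find the specific cause of the delay, by adding the following entries to the $ORACLE_HOME/network/admin/sqlnet.ora configuration file:
TRACE_LEVEL_CLIENT = 16
TRACE_FILE_CLIENT = client
TRACE_DIRECTORY_CLIENT = [any valid directory path]
TRACE_TIMESTAMP_CLIENT = ON
DIAG_ADR_ENABLED=off
The next login attempt then produces an Oracle Network Trace file in the directory given by TRACE_DIRECTORY_CLIENT (if DIAG_ADR_ENABLED is not set to off, TRACE_DIRECTORY_CLIENT has no effect in 11g and the trace is written under the 11g-specific diag directory instead). The trace file from the login test recorded: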
[02-SEP-2010 07:36:57:719] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=m218279apss2012-vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=MOTOIDP.MOT.COM)(INSTANCE_NAME=MOTOIDP1)(CID=(PROGRAM=sqlplus)(HOST=m218279apss2012.mot.com)(USER=oraoid))))
[02-SEP-2010 07:36:57:719] nttbnd2addr: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: getaddrinfo() failed with error -2
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit
[02-SEP-2010 07:36:57:719] nttbnd2addr: looking up IP addr for host: m218279apss2012-vip
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit
[02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: exit
[02-SEP-2010 07:36:57:719] nttbnd2addr: exit
..................
[02-SEP-2010 07:36:57:722] nspsend: packet dump
[02-SEP-2010 07:36:57:722] nspsend: 01 1E 00 00 01 00 00 00 |........|
[02-SEP-2010 07:36:57:722] nspsend: 01 3A 01 2C 0C 41 20 00 |.:.,.A..|
[02-SEP-2010 07:36:57:722] nspsend: 7F FF 7F 08 00 00 01 00 |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 E4 00 3A 00 00 02 00 |...:....|
[02-SEP-2010 07:36:57:722] nspsend: 41 41 00 00 00 00 00 00 |AA......|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00 |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00 |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 28 44 45 53 43 52 |..(DESCR|
[02-SEP-2010 07:36:57:722] nspsend: 49 50 54 49 4F 4E 3D 28 |IPTION=(|
[02-SEP-2010 07:36:57:722] nspsend: 41 44 44 52 45 53 53 3D |ADDRESS=|
[02-SEP-2010 07:36:57:722] nspsend: 28 50 52 4F 54 4F 43 4F |(PROTOCO|
[02-SEP-2010 07:36:57:722] nspsend: 4C 3D 54 43 50 29 28 48 |L=TCP)(H|
[02-SEP-2010 07:36:57:722] nspsend: 4F 53 54 3D 6D 32 31 38 |OST=m218|
[02-SEP-2010 07:36:57:722] nspsend: 32 37 39 61 70 73 73 32 |279apss2|
[02-SEP-2010 07:36:57:722] nspsend: 30 31 32 2D 76 69 70 29 |012-vip)|
[02-SEP-2010 07:36:57:722] nspsend: 28 50 4F 52 54 3D 31 35 |(PORT=15|
[02-SEP-2010 07:36:57:722] nspsend: 32 31 29 29 28 43 4F 4E |21))(CON|
[02-SEP-2010 07:36:57:722] nspsend: 4E 45 43 54 5F 44 41 54 |NECT_DAT|
[02-SEP-2010 07:36:57:722] nspsend: 41 3D 28 53 45 52 56 45 |A=(SERVE|
[02-SEP-2010 07:36:57:722] nspsend: 52 3D 44 45 44 49 43 41 |R=DEDICA|
[02-SEP-2010 07:36:57:722] nspsend: 54 45 44 29 28 53 45 52 |TED)(SER|
[02-SEP-2010 07:36:57:722] nspsend: 56 49 43 45 5F 4E 41 4D |VICE_NAM|
[02-SEP-2010 07:36:57:722] nspsend: 45 3D 4D 4F 54 4F 49 44 |E=MOTOID|
[02-SEP-2010 07:36:57:722] nspsend: 50 2E 4D 4F 54 2E 43 4F |P.MOT.CO|
[02-SEP-2010 07:36:57:722] nspsend: 4D 29 28 49 4E 53 54 41 |M)(INSTA|
[02-SEP-2010 07:36:57:722] nspsend: 4E 43 45 5F 4E 41 4D 45 |NCE_NAME|
[02-SEP-2010 07:36:57:722] nspsend: 3D 4D 4F 54 4F 49 44 50 |=MOTOIDP|
[02-SEP-2010 07:36:57:722] nspsend: 31 29 28 43 49 44 3D 28 |1)(CID=(|
[02-SEP-2010 07:36:57:722] nspsend: 50 52 4F 47 52 41 4D 3D |PROGRAM=|
[02-SEP-2010 07:36:57:722] nspsend: 73 71 6C 70 6C 75 73 29 |sqlplus)|
[02-SEP-2010 07:36:57:722] nspsend: 28 48 4F 53 54 3D 6D 32 |(HOST=m2|
[02-SEP-2010 07:36:57:722] nspsend: 31 38 32 37 39 61 70 73 |18279aps|
[02-SEP-2010 07:36:57:722] nspsend: 73 32 30 31 32 2E 6D 6F |s2012.mo|
[02-SEP-2010 07:36:57:722] nspsend: 74 2E 63 6F 6D 29 28 55 |t.com)(U|
[02-SEP-2010 07:36:57:722] nspsend: 53 45 52 3D 6F 72 61 6F |SER=orao|
[02-SEP-2010 07:36:57:722] nspsend: 69 64 29 29 29 29 |id)))) |
[02-SEP-2010 07:36:57:722] nspsend: 286 bytes to transport
[02-SEP-2010 07:36:57:722] nspsend: normal exit
[02-SEP-2010 07:36:57:722] nscon: exit (0)
[02-SEP-2010 07:36:57:722] nsdo: nsctxrnk=0
[02-SEP-2010 07:36:57:722] nsdo: normal exit
[02-SEP-2010 07:36:57:722] nsdo: entry
[02-SEP-2010 07:36:57:722] nsdo: cid=0, opcode=68, *bl=512, *what=9, uflgs=0x0, cflgs=0x3
[02-SEP-2010 07:36:57:722] nsdo: rank=64, nsctxrnk=0
[02-SEP-2010 07:36:57:722] nsdo: nsctx: state=2, flg=0x4005, mvd=0
[02-SEP-2010 07:36:57:722] nsdo: gtn=10, gtc=10, ptn=10, ptc=8155
[02-SEP-2010 07:36:57:722] nscon: entry
[02-SEP-2010 07:36:57:722] nscon: recving a packet
[02-SEP-2010 07:36:57:722] nsprecv: entry
[02-SEP-2010 07:36:57:722] nsprecv: reading from transport...
[02-SEP-2010 07:36:57:722] nttrd: entry
[02-SEP-2010 07:41:57:741] nttrd: socket 9 had bytes read=8
[02-SEP-2010 07:41:57:741] nttrd: exit
[02-SEP-2010 07:41:57:741] nsprecv: 8 bytes from transport
[02-SEP-2010 07:41:57:741] nsprecv: tlen=8, plen=8, type=11
[02-SEP-2010 07:41:57:741] nsprecv: packet dump
[02-SEP-2010 07:41:57:741] nsprecv: 00 08 00 00 0B 00 00 00 |........|
[02-SEP-2010 07:41:57:741] nsprecv: normal exit
The trace contains the entry "snlinGetAddrInfo: getaddrinfo() failed with error -2". It also shows where the five minutes go: the client enters nttrd at 07:36:57:722 and reads nothing from the socket until 07:41:57:741. Searching MOS for the keyword "snlinGetAddrInfo" turns up the following Note:
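Remote Connections Take Very Long to Establish
This document states that the snlinGetAddrInfo function maps host names to IP addresses, and recommends setting the resolution order in the /etc/nsswitch.conf network configuration file to "hosts: files [NOTFOUND=continue] dns", which is exactly what we had already done! Another document [ID 803838.1] states that on other UNIX platforms "ipnodes: files [NOTFOUND=continue] dns" can be used to avoid the delay, but Linux has no ipnodes entry.
Host name resolution in 11g really has become somewhat hysterical. This case was finally resolved by removing the dns option from the hosts entry in nsswitch.conf. For Oracle, the simplest approach of file-based resolution still seems to be the safest, no matter how new the version.
Applies to: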
Oracle Net Services - Version: 11.1.0.6 to 11.1.0.7 - Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms
When you connect remotely, it takes very long until the connection is established. Once connected, everything works fine. Local bequeath connections work in a timely manner. Resolution of hostnames is done via a centralized DNS server.
Cause
Listener traces show the listener is wasting time while calling the "snlinGetAddrInfo" function:
2009-07-08 13:58:35.135311 : nttcnp:exit
2009-07-08 13:58:35.135327 : snlinGetAddrInfo:entry
2009-07-08 13:58:55.135643 : snlinGetAddrInfo:getaddrinfo() failed with error -5
2009-07-08 13:58:55.135703 : snlinGetAddrInfo:exit
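The stall in a listener trace like the one above can be located mechanically by looking for the widest gap between consecutive timestamps. The following is a minimal Python sketch, not part of the MOS note; the names STAMP, largest_gap and SAMPLE are made up for illustration, and the timestamp pattern is assumed from the four trace lines quoted in the Cause section.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the listener trace excerpt,
# e.g. "2009-07-08 13:58:35.135311".
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}")


def largest_gap(trace_text):
    """Return (gap_seconds, earlier, later) for the widest gap
    between two consecutive timestamps found in trace_text."""
    stamps = [
        datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S.%f")
        for m in STAMP.finditer(trace_text)
    ]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    # Pair each timestamp with its successor and keep the widest pair.
    earlier, later = max(zip(stamps, stamps[1:]), key=lambda p: p[1] - p[0])
    return (later - earlier).total_seconds(), earlier, later


# The four trace lines quoted in the Cause section.
SAMPLE = """\
2009-07-08 13:58:35.135311 : nttcnp:exit
2009-07-08 13:58:35.135327 : snlinGetAddrInfo:entry
2009-07-08 13:58:55.135643 : snlinGetAddrInfo:getaddrinfo() failed with error -5
2009-07-08 13:58:55.135703 : snlinGetAddrInfo:exit
"""

if __name__ == "__main__":
    gap, earlier, later = largest_gap(SAMPLE)
    print(f"{gap:.6f} s between {earlier.time()} and {later.time()}")
```

On the quoted excerpt the widest gap falls between snlinGetAddrInfo:entry and the getaddrinfo() failure, roughly 20 seconds. Client traces written with TRACE_TIMESTAMP_CLIENT use a different timestamp layout, so the pattern would need adjusting for them.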
Solution
We need to make sure that the DNS server(s) configured are reachable. The "snlinGetAddrInfo" function is a TCP layer function which Oracle uses (since version 11g) for hostname-to-IP mappings. When the DNS server is unreachable, the listener will wait for some time until it times out and fails over to the next method configured for resolution of hostnames (usually the local "hosts" file).
Note: On UNIX systems, the order of methods used for resolution of hostnames can be specified via the /etc/nsswitch.conf and /etc/host.conf files:
on Linux systems, you can specify the order by setting "hosts" to "files" and / or "dns"
Example for /etc/nsswitch.conf:
hosts: files [NOTFOUND=continue] dns
on HP-UX and Solaris systems, you need to specify the order using the "ipnodes" keyword
Example for /etc/nsswitch.conf:
ipnodes: files [NOTFOUND=continue] dns
Note: Since 11.2, this behaviour has changed and naming lookup is no longer performed at this stage (the change was addressed through unpublished Bug 9593134).
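One way to check whether name resolution itself is the slow step is to time a getaddrinfo() lookup outside Oracle. The following is a minimal Python sketch, not taken from the MOS note: the function name time_lookup is made up for illustration, "localhost" is a placeholder host, and it assumes Python 3 on a system where socket.getaddrinfo goes through the C library resolver and so follows the order set in /etc/nsswitch.conf.

```python
import socket
import time


def time_lookup(host, port=1521, resolver=socket.getaddrinfo):
    """Time one name lookup.

    Returns (elapsed_seconds, outcome), where outcome is the resolver's
    result list on success or the socket.gaierror raised on failure.
    The resolver argument exists only so the lookup can be stubbed out.
    """
    start = time.monotonic()
    try:
        outcome = resolver(host, port)
    except socket.gaierror as exc:
        # Resolution failed; keep the error so the caller can see its code.
        outcome = exc
    elapsed = time.monotonic() - start
    return elapsed, outcome


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the HOST value from the
    # connect descriptor you are diagnosing.
    elapsed, outcome = time_lookup("localhost")
    status = "failed" if isinstance(outcome, socket.gaierror) else "ok"
    print(f"lookup {status} after {elapsed:.3f} s")
```

In the case described in this post, the name to test would be the HOST value from the connect descriptor, m218279apss2012-vip. A lookup that takes many seconds before failing or succeeding points at an unreachable DNS server rather than at the listener.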
Posted on 2010-09-06 12:09, Oracle和MySQL