Why the IPv6 default gateway goes missing

Problem Symptom


In a native IPv6 network, every server receives an IPv6 prefix from the IPv6 router, generates an IPv6 address from it, and adds a default IPv6 gateway pointing to that router.

The problem here is that the default gateway goes missing, whether it was created by hand or generated automatically.

 

Troubleshooting


1. Capture an in-depth TCPIP trace,

netsh trace start tracefile=c:\ipv6trace.etl provider="Microsoft-Windows-TCPIP" keywords=0xffffffffffffffff level=0xff report=yes maxsize=4096 

And decode it, 

netsh trace convert c:\ipv6trace.etl c:\ipv6trace.txt

Log Sample, (Route Add/Delete logging)

[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: Received router advertisement  on Interface = 14 from SourceIpAddress = fe80::d493:91b0:c6a2:7b68 for TargetIpAddress = ff02::1. 
[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 created on interface 14. Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  fe80::d493:91b0:c6a2:7b68. 
[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: Received router advertisement  on Interface = 14 from SourceIpAddress = fe80::d493:91b0:c6a2:7b68 for TargetIpAddress = ff02::1. 
[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 deleted on interface 14, Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  fe80::d493:91b0:c6a2:7b68. 
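
To spot default route churn in a large converted trace, the route events can be filtered with a small script. The following is a minimal sketch in Python, assuming the text output keeps the same "Route ... created/deleted on interface N" wording as the sample above; the function name parse_route_events and the regular expression are my own and are not part of netsh.

```python
import re

# Assumption: every route event line follows the wording of the sample log above.
ROUTE_EVENT = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"Route (?P<handle>0x[0-9A-Fa-f]+) (?P<action>created|deleted) on interface (?P<ifindex>\d+)"
    r".*?IPv6 address =\s*(?P<prefix>\S+)\s*/(?P<plen>\d+)"
    r".*?IPv6 address =\s*(?P<nexthop>[0-9A-Fa-f:]+)"
)


def parse_route_events(lines):
    """Return one dict per route created/deleted line; other lines are skipped."""
    events = []
    for line in lines:
        match = ROUTE_EVENT.search(line)
        if match:
            events.append(match.groupdict())
    return events


# Two route events copied from the log sample above.
SAMPLE = """\
[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 created on interface 14. Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  fe80::d493:91b0:c6a2:7b68.
[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 deleted on interface 14, Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address =  fe80::d493:91b0:c6a2:7b68.
"""

for event in parse_route_events(SAMPLE.splitlines()):
    # Only the default route (::/0) matters for the missing gateway symptom.
    if event["prefix"] == "::" and event["plen"] == "0":
        print(event["time"], event["action"], "default route via",
              event["nexthop"], "on interface", event["ifindex"])
```

Any iterable of lines works as input, so the same function can be fed an open handle on the converted trace file.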

2. Get a memory dump when the gateway is missing,

From the dump analysis, we can find some deleted routes in the dump, and they appear to have been removed by the function DeleteUnicastRoute. Next, we need to trace the route deletion behavior.

0: kd> dt -r1 ffffe00048c26000+0x2d8-0x88 _IPV6_UNICAST_ROUTE
+0x040 SitePrefixLength : 0 ''
+0x064 ValidLifetime : 0 <== deleted
+0x068 PreferredLifetime : 0

3. Create a private TCPIP.sys driver that triggers a BSOD when a route entry gets deleted,

logman create trace "minio_netio" -p "Microsoft-Windows-TCPIP" 0x0000000000000020 0x5 -nb 400 400 -bs 1024 -mode BufferOnly -max 4096 -ets

 

Root Cause Analysis


Cause 1. A network device sent out an RA with Router Lifetime 0, and that triggered the missing gateway,

1: kd> kL
# Child-SP          RetAddr           Call Site
00 ffffd001`a2396698 fffff801`58100e1b nt!KeBugCheck
01 ffffd001`a23966a0 fffff801`58101913 tcpip!IppLogRouteChangeEvents+0x2a3
02 ffffd001`a2396830 fffff801`58101df4 tcpip!IppDeleteUnicastRoute+0x2f
03 ffffd001`a2396860 fffff801`58108b76 tcpip!IppDereferenceRouteForUser+0x80
04 ffffd001`a23968b0 fffff801`58108c74 tcpip!IppCommitSetAllRouteParameters+0x2b6
05 ffffd001`a2396910 fffff801`58108d65 tcpip!IppUpdateUnicastRouteUnderLock+0xa4
06 ffffd001`a2396990 fffff801`58124d71 tcpip!IppUpdateUnicastRoute+0xd5
07 ffffd001`a2396a40 fffff801`5814e270 tcpip!IppUpdateAutoConfiguredRoute+0xb5
08 ffffd001`a2396ac0 fffff801`581597dc tcpip!Ipv6pHandleRouterAdvertisement+0x68c
09 ffffd001`a2396ca0 fffff801`5811db65 tcpip!Icmpv6ReceiveDatagrams+0x3b4
0a ffffd001`a2396d30 fffff801`5811e73b tcpip!IppDeliverListToProtocol+0x39
0b ffffd001`a2396de0 fffff801`5811ed59 tcpip!IppProcessDeliverList+0x6f
0c ffffd001`a2396e40 fffff801`580fbecf tcpip!IppReceiveHeaderBatch+0x2d9
0d ffffd001`a2396ef0 fffff801`5817c644 tcpip!IppFlcReceivePacketsCore+0x1527
0e ffffd001`a2397170 fffff801`5817bc06 tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x8c0
0f ffffd001`a2397260 fffff803`aaaa69f3 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x27e
10 ffffd001`a2397300 fffff801`5817bcbc nt!KeExpandKernelStackAndCalloutInternal+0xf3
11 (Inline Function) --------`-------- tcpip!NetioExpandKernelStackAndCallout+0x47
12 ffffd001`a23973f0 fffff801`578e1a53 tcpip!FlReceiveNetBufferListChain+0xa4
13 ffffd001`a2397470 fffff801`578e1e7f NDIS!ndisMIndicateNetBufferListsToOpen+0x123
14 (Inline Function) --------`-------- NDIS!ndisIndicateSortedNetBufferLists+0x41
15 (Inline Function) --------`-------- NDIS!ndisMDispatchReceiveNetBufferListsInternal+0x1e4
16 ffffd001`a2397530 fffff801`578e2094 NDIS!ndisMTopReceiveNetBufferLists+0x22f
17 (Inline Function) --------`-------- NDIS!ndisInvokeNextReceiveHandler+0x2f
18 (Inline Function) --------`-------- NDIS!ndisMIndicateReceiveNetBufferListsInternal+0x84
19 ffffd001`a23975c0 fffff801`58867387 NDIS!NdisMIndicateReceiveNetBufferLists+0x114
1a ffffd001`a23977b0 fffff801`58868b2d vmxnet3n61x64+0xc387
1b ffffd001`a23978b0 fffff801`578e3e12 vmxnet3n61x64+0xdb2d
1c (Inline Function) --------`-------- NDIS!ndisMiniportDpc+0x110
1d ffffd001`a23978f0 fffff803`aaaea400 NDIS!ndisInterruptDpc+0x1a3
1e ffffd001`a23979d0 fffff803`aaae9747 nt!KiExecuteAllDpcs+0x1b0
1f ffffd001`a2397b20 fffff803`aabc98ea nt!KiRetireDpcList+0xd7
20 ffffd001`a2397da0 00000000`00000000 nt!KiIdleLoop+0x5a

And the route being deleted is exactly the default route,

1: kd> !netioext.routes
Route            Comp IfIndex Metric PathCount State Destination Prefix                       NextHop                   
---------------- ---- ------- ------ --------- ----- ---------------------------------------- -------------------------- 
ffffe000bf1a1250    1      12    256        59 Alive ::/0                                     fe80::xxxx:xxxx:xxxx:xxRT                     
ffffe000be7da040    1       1    256         1 Alive ::1/128                                  Local address
1: kd> dv
Route = 0xffffe000`bf1a1250

The request came from the default gateway. Here are the packet details, which I parsed manually below,

1: kd> db 0xffffdff03f39b000 L0x76
ffffdff0`3f39b000  33 33 00 00 00 01 00 00-xx RT xM AC 86 dd 6e 00  33......s.....n.
                   ---DEST-MAC------ ----SRC-MAC------ -V6-- -Ver-
ffffdff0`3f39b010  00 00 00 40 3a ff fe 80-00 00 00 00 00 00 00 xx  ...@:...........
                         -LEN- -ICMP ------SRC IPV6 ADDRESS-------
ffffdff0`3f39b020  xx xx xx xx xx RT ff 02-00 00 00 00 00 00 00 00  s...............
                   ----------------- -----DEST IPV6 Address-------
ffffdff0`3f39b030  00 00 00 00 00 01 86 00-f6 00 40 00 00 00 00 0d  ..........@.....
                   -----------------                   -Router Life Time
ffffdff0`3f39b040  bb a0 00 00 00 00 01 01-00 00 xx RT xM AC 03 04  ..........s.....
                                     -SRC Link-Layer Addr--- -Prefix Information
ffffdff0`3f39b050  40 80 00 27 8d 00 00 09-3a 80 00 00 00 00 20 01  @..'....:..... .
                   --    ----------- -----------             --Prefix
ffffdff0`3f39b060  xx xx xx xx xx FX 00 00-00 00 00 00 00 00 05 01  ..`.............
                   ----------------------------------------- 
ffffdff0`3f39b070  00 00 00 00 05 dc                                ......
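
The manual parse above can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: the frame is Ethernet + IPv6 with no extension headers, the checksum is not verified, and since the real addresses are masked in the dump, the sample frame uses placeholder addresses (fe80::1 and a made-up MAC). The function name parse_router_advertisement is hypothetical.

```python
import ipaddress
import struct

ETH_LEN = 14    # destination MAC (6) + source MAC (6) + EtherType (2)
IPV6_LEN = 40   # fixed IPv6 header


def parse_router_advertisement(frame: bytes) -> dict:
    """Decode the fixed part of an ICMPv6 Router Advertisement (type 134)."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x86DD:
        raise ValueError("not an IPv6 frame")
    if frame[ETH_LEN + 6] != 58:
        raise ValueError("next header is not ICMPv6 (extension headers not handled)")
    source = ipaddress.IPv6Address(frame[ETH_LEN + 8:ETH_LEN + 24])
    # type, code, checksum, cur hop limit, flags, router lifetime, reachable time, retrans timer
    icmp_type, _code, _cksum, hop_limit, _flags, lifetime, reachable, _retrans = \
        struct.unpack_from("!BBHBBHII", frame, ETH_LEN + IPV6_LEN)
    if icmp_type != 134:
        raise ValueError("not a Router Advertisement")
    return {
        "source": str(source),
        "cur_hop_limit": hop_limit,
        "router_lifetime": lifetime,            # seconds
        "reachable_time_ms": reachable,
        "removes_default_router": lifetime == 0,
    }


# Placeholder frame: same ICMPv6 bytes as the dump (86 00 f6 00 40 00 00 00 00 0d bb a0 ...),
# but with made-up MAC and link-local addresses and no options.
sample = (
    bytes.fromhex("333300000001") + bytes.fromhex("020000000001") + b"\x86\xdd"
    + struct.pack("!IHBB", 0x6E000000, 16, 58, 255)
    + ipaddress.IPv6Address("fe80::1").packed
    + ipaddress.IPv6Address("ff02::1").packed
    + bytes.fromhex("8600f600" "4000" "0000" "000dbba0" "00000000")
)

ra = parse_router_advertisement(sample)
print(ra)
```

With the bytes from the dump, the Router Lifetime field decodes to 0 and the Reachable Time to 0x000dbba0 milliseconds, which matches the manual annotation.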

 

Cause 2. A network device sent out an NA with the IsRouter flag false (conflicting with its role as a Router). The NA source address matches the default gateway's IPv6 address,

The default gateway was deleted because the following NA was received,

1: kd> db ffffdff03f34a800 L56
ffffdff0`3f34a800  33 33 00 00 00 01 00 00-xx RT xM AC 86 dd 60 00  33......^.....`.
                   ---DEST-MAC------ ----SRC-MAC------ -V6-- -Ver-
ffffdff0`3f34a810  00 00 00 20 3a ff fe 80-00 00 00 00 00 00 xx xx  ... :......... .
                         -LEN- -ICMP ------SRC IPV6 ADDRESS-------
ffffdff0`3f34a820  xx xx xx xx xx RT ff 02-00 00 00 00 00 00 00 00  ................
                   ----------------- -----DEST IPV6 Address-------
ffffdff0`3f34a830  00 00 00 00 00 01 88 00-18 78 20 00 00 00 fe 80  .........x .....
                   ----------------- NA          -NA-FLAG--- -----
ffffdff0`3f34a840  00 00 00 00 00 00 xx xx-xx xx xx xx xx RT 02 01  ...... .........
                   -------TARGET--IPV6--ADDRESS-------------------
ffffdff0`3f34a850  00 00 xx xx xx xx                                ..^...
                   ---MAC-ADDRESS---

As parsed above, the NA flag field is 0x20000000: only the Override bit is set and the Router bit (0x80000000) is clear, so the device is proactively announcing that it is not a Router. This invalidates the default gateway setting, because the gateway must be a Router.
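
The flag word can be decoded mechanically. Below is a minimal Python sketch, assuming the RFC 4861 layout in which the top three bits of the 32-bit word after the checksum are R (Router), S (Solicited) and O (Override); the helper name decode_na_flags is my own.

```python
def decode_na_flags(word: int) -> dict:
    """Split the 32-bit flags/reserved word of a Neighbor Advertisement into R, S and O."""
    return {
        "router": bool(word & 0x80000000),     # R: sender is a router
        "solicited": bool(word & 0x40000000),  # S: sent in response to a Neighbor Solicitation
        "override": bool(word & 0x20000000),   # O: override the cached link-layer address
    }


# Flag bytes "20 00 00 00" taken from the NA dump above.
flags = decode_na_flags(int.from_bytes(bytes.fromhex("20000000"), "big"))
print(flags)
```

Here S is clear as well, so the NA is unsolicited, and R is clear, which is the part that conflicts with the earlier RA from the same address.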

Routing info,

1: kd> !netioext.routes
Route            Comp IfIndex Metric PathCount State Destination Prefix                       NextHop                   
---------------- ---- ------- ------ --------- ----- ---------------------------------------- -------------------------- 
ffffe000bd20b510    1      12    256        68 Alive ::/0                                     fe80::xxxx:xxxx:xxxx:xxRT

The theory is that fe80::xxxx:xxxx:xxxx:xxRT was a Router as per the RA, but we then received an unsolicited NA indicating that the same neighbor is not a Router, which triggers the OS to purge the default gateway route.

1: kd> kL
# Child-SP          RetAddr           Call Site
00 ffffd001`7b99a7f8 fffff801`f3d2de1b nt!KeBugCheck
01 ffffd001`7b99a800 fffff801`f3d2e913 tcpip!IppLogRouteChangeEvents+0x2a3
02 ffffd001`7b99a990 fffff801`f3d3d0b9 tcpip!IppDeleteUnicastRoute+0x2f
03 ffffd001`7b99a9c0 fffff801`f3d3d582 tcpip!IppInvalidateRouter+0xbd
04 ffffd001`7b99aa50 fffff801`f3d84b68 tcpip!IppHandleNeighborAdvertisement+0x49e
05 ffffd001`7b99abb0 fffff801`f3d867c2 tcpip!Ipv6pHandleNeighborAdvertisement+0x26c
06 ffffd001`7b99aca0 fffff801`f3d4ab65 tcpip!Icmpv6ReceiveDatagrams+0x39a
07 ffffd001`7b99ad30 fffff801`f3d4b73b tcpip!IppDeliverListToProtocol+0x39
08 ffffd001`7b99ade0 fffff801`f3d4bd59 tcpip!IppProcessDeliverList+0x6f
09 ffffd001`7b99ae40 fffff801`f3d28ecf tcpip!IppReceiveHeaderBatch+0x2d9
0a ffffd001`7b99aef0 fffff801`f3da9644 tcpip!IppFlcReceivePacketsCore+0x1527
0b ffffd001`7b99b170 fffff801`f3da8c06 tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x8c0
0c ffffd001`7b99b260 fffff803`fe72a813 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x27e
0d ffffd001`7b99b300 fffff801`f3da8cbc nt!KeExpandKernelStackAndCalloutInternal+0xf3
0e (Inline Function) --------`-------- tcpip!NetioExpandKernelStackAndCallout+0x47
0f ffffd001`7b99b3f0 fffff801`f36d7a53 tcpip!FlReceiveNetBufferListChain+0xa4
10 ffffd001`7b99b470 fffff801`f36d7e7f NDIS!ndisMIndicateNetBufferListsToOpen+0x123
11 (Inline Function) --------`-------- NDIS!ndisIndicateSortedNetBufferLists+0x41
12 (Inline Function) --------`-------- NDIS!ndisMDispatchReceiveNetBufferListsInternal+0x1e4
13 ffffd001`7b99b530 fffff801`f36d86b2 NDIS!ndisMTopReceiveNetBufferLists+0x22f
*** ERROR: Module load completed but symbols could not be loaded for vmxnet3n61x64.sys
14 (Inline Function) --------`-------- NDIS!ndisIterativeDPInvokeHandlerOnTracker+0x2d3
15 (Inline Function) --------`-------- NDIS!ndisInvokeNextReceiveHandler+0x64d
16 (Inline Function) --------`-------- NDIS!ndisMIndicateReceiveNetBufferListsInternal+0x6a2
17 ffffd001`7b99b5c0 fffff801`f34cc387 NDIS!NdisMIndicateReceiveNetBufferLists+0x732
18 ffffd001`7b99b7b0 fffff801`f34cdb2d vmxnet3n61x64+0xc387
19 ffffd001`7b99b8b0 fffff801`f36d9e12 vmxnet3n61x64+0xdb2d
1a (Inline Function) --------`-------- NDIS!ndisMiniportDpc+0x110
1b ffffd001`7b99b8f0 fffff803`fe6af6f0 NDIS!ndisInterruptDpc+0x1a3
1c ffffd001`7b99b9d0 fffff803`fe6aea37 nt!KiExecuteAllDpcs+0x1b0
1d ffffd001`7b99bb20 fffff803`fe7d0dea nt!KiRetireDpcList+0xd7
1e ffffd001`7b99bda0 00000000`00000000 nt!KiIdleLoop+0x5a

As per RFC 4861, pages 63-65, section 7.2.5, Receipt of Neighbor Advertisements, the OS should delete the route entry when it receives such an NA.
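
Both causes come down to the same host-side rule: an RA with Router Lifetime 0 (RFC 4861 section 6.3.4) or an NA that flips IsRouter from TRUE to FALSE (section 7.2.5) removes the sender from the Default Router List. The Python below is a deliberately simplified toy model of that rule, not the tcpip.sys implementation; the class DefaultRouterList, its method names and the address fe80::1 are all hypothetical, and the Neighbor Cache state checks from the RFC are left out.

```python
class DefaultRouterList:
    """Toy model of a host's Default Router List (simplified from RFC 4861)."""

    def __init__(self):
        self.routers = {}  # router link-local address -> advertised lifetime in seconds

    def on_router_advertisement(self, source, router_lifetime):
        if router_lifetime == 0:
            # Cause 1: a zero Router Lifetime times the entry out immediately.
            self.routers.pop(source, None)
        else:
            self.routers[source] = router_lifetime

    def on_neighbor_advertisement(self, target, router_flag):
        # Cause 2: the neighbor now says it is not a router, so it can no
        # longer serve as a default gateway.
        if not router_flag:
            self.routers.pop(target, None)

    def has_default_gateway(self):
        return bool(self.routers)


gateway = "fe80::1"  # placeholder for the masked gateway address

cause1 = DefaultRouterList()
cause1.on_router_advertisement(gateway, 1800)
cause1.on_router_advertisement(gateway, 0)              # RA with Router Lifetime 0

cause2 = DefaultRouterList()
cause2.on_router_advertisement(gateway, 1800)
cause2.on_neighbor_advertisement(gateway, router_flag=False)  # NA with R bit clear

print(cause1.has_default_gateway(), cause2.has_default_gateway())
```

In both runs the list ends up empty, which corresponds to the ::/0 route disappearing from the routing table.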

 

Extra


When a server has received more than 100 auto-created route entries (including deleted ones), new routes will not be accepted until one of the following is done,

a. Unplug/plug the network cable.
b. Disable/enable the NIC.
c. Promote the server to a Router.

 

posted @ 2017-10-24 17:22  小鼹鼠的玩具