
[Repost] Troubleshooting MSDTC issues with the DTCPing tool

Posted on 2011-11-03 15:29 by nzperfect

http://blogs.msdn.com/b/distributedservices/archive/2008/11/12/troubleshooting-msdtc-issues-with-the-dtcping-tool.aspx

Every day, the Distributed Services support team at Microsoft helps customers troubleshoot some of the most common Distributed Transaction errors, which are a direct result of MSRPC (Microsoft Remote Procedure Call) communication failing on a network because of security or firewall settings. At the application layer (for example, SQL Server), these are the common error messages that get bubbled up.

  • Server: Msg 7391, Level 16, State 1, Line 2 The operation could not be performed because the OLE DB provider 'SQLOLEDB' was unable to begin a distributed transaction. OLE/DB provider returned message: New transaction cannot enlist in the specified transaction coordinator.
  • New transaction cannot enlist in the specified transaction coordinator (0x8004d00a)
  • The transaction has already been implicitly or explicitly committed or aborted (0x8004d00e)

If you encounter one of the above error messages while using Distributed Transactions from your application, feel free to use the DTCPING tool to find out where the problem lies. This blog explains how to use the DTCPing tool to narrow down the source of the problem and how to fix it.

Distributed Transactions (specifically OleTx transactions) use the MSRPC protocol to talk to MSDTC on the other machine. To make sure that the two machines can communicate with each other over MSRPC, you can run the DTCPING tool on both machines to test whether normal RPC communication is working. Before talking about the various errors reported by this tool, it is important to understand how to run it properly so that its output is meaningful.

  1. Distributed Transactions come into the picture whenever more than one server participates in a transaction. If only a single server is involved and you are still getting errors while running distributed transactions, this is not the right article for you. Once you have identified the servers that participate in the distributed transaction, launch DTCPING.EXE on both machines. DTCPING.EXE must be running on both machines at the same time before you enter the server name and click the PING button.
  2. In the Remote Server Name field of the DTCPING tool, enter only the NETBIOS name of the server with which you are trying to run distributed transactions. Any test done with the IP address or the FQDN of the server is an invalid test. You MUST provide the NETBIOS name because MSDTC uses MSRPC as its underlying mechanism and MSRPC works on NETBIOS name resolution only.
  3. In a cluster: for a clustered MSDTC, always put the name of the NETWORK NAME resource on which the MSDTC resource depends in the Remote Server Name field. To find the right network name, open Cluster Administrator and go to the group that contains the MSDTC resource. That group should have one network name resource on which the DTC resource depends. Open the properties of that network name resource and go to the Parameters tab; the name shown there is the network name for this MSDTC resource. For example, say you have a two-node cluster whose nodes have the NETBIOS names DBSERVER01 and DBSERVER02, and you are trying to run distributed transactions from a third server, APPSERVER. Start DTCPING.EXE on APPSERVER and on the active node of the cluster (the node on which the DTC resource is online). Then, in the DTCPing window on APPSERVER, enter the network name of the clustered MSDTC resource in the Remote Server Name field and click PING.
  4. Run only one instance of DTCPING.EXE on a server while you are testing, and for each subsequent test close the DTCPING tool and open it again.
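
Point 2 above can be turned into a quick sanity check before a name is typed into the Remote Server Name field. The following Python sketch is only an illustration and is not part of the DTCPing tool; the function name `classify_server_name` is hypothetical, and the 15-character limit is the usual NetBIOS computer name length.

```python
import ipaddress


def classify_server_name(name: str) -> str:
    """Classify a candidate name as 'empty', 'ip', 'fqdn', 'too_long' or 'netbios'."""
    name = name.strip()
    if not name:
        return "empty"
    try:
        # Anything that parses as an IPv4 or IPv6 address is not a NetBIOS name.
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass
    if "." in name:
        # A dotted name is treated as an FQDN in this sketch.
        return "fqdn"
    if len(name) > 15:
        # NetBIOS computer names are limited to 15 characters.
        return "too_long"
    return "netbios"


for candidate in ("DBSERVER01", "10.0.0.5", "dbserver01.contoso.com"):
    print(candidate, "->", classify_server_name(candidate))
```

Only a result of `netbios` corresponds to a name that the article considers a valid input for the test.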

Once you have read the points above, run the DTCPING tool, type the right server name in the Remote Server Name field, and click PING. If everything works, you should see the following output from the tool. (Here DTCPing was run from the machine with NETBIOS name SOURCE against the machine with NETBIOS name DESTINATION.)


++++++++++++++++++++++++++++++++++++++++++++++
DTCping 1.9 Report for SOURCE
++++++++++++++++++++++++++++++++++++++++++++++
RPC server is ready
++++++++++++Validating Remote Computer Name++++++++++++
11-21, 04:31:01.455-->Start DTC connection test
Name Resolution:
DESTINATION-->65.52.22.254-->DESTINATION.contoso.com
11-21, 04:31:01.470-->Start RPC test (SOURCE-->DESTINATION)
RPC test is successful
Partner's CID:084B708C-F0C5-4E65-95F2-8E2DEF73FFF3
++++++++++++RPC test completed+++++++++++++++
++++++++++++Start DTC Binding Test +++++++++++++
Trying Bind to DESTINATION
11-21, 04:31:01.830-->SOURCE Initiating DTC Binding Test....
Test Guid:B5544E05-D64B-40AC-B283-71947914DED3
Received reverse bind call from DESTINATION
Network Name: SOURCE
Source Port: 1116
Hosting Machine:SOURCE
Binding success: SOURCE-->DESTINATION
++++++++++++DTC Binding Test END+++++++++++++

If you see the above message, go to the second server, put the name of the source server in the Remote Server Name field, click PING, and make sure that you see the same results as above. If both servers report success but distributed transactions are still not working, see Part II of this article, which covers how to fix distributed transaction issues when DTCPING works fine between the two machines.

If the result of the tool is not success, figure out what error you are getting and follow the steps mentioned in the sections below to fix the error message.
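
As a rough aid for that step, the Python sketch below scans DTCPing output for the error codes covered in the sections that follow and returns a one-line hint. It is an illustration only: the `triage` function and the `HINTS` table are hypothetical names, the hints merely paraphrase this article, and the parsing assumes the log lines look like the samples quoted here.

```python
import re

# Hints paraphrased from the error sections of this article (not an official mapping).
HINTS = {
    1722: "RPC server is unavailable: port 135 or a DCOM port is probably blocked (ERROR MESSAGE 2)",
    1726: "Remote procedure call failed: a firewall is probably closing the TCP connection (ERROR MESSAGE 3)",
    1753: "No more endpoints: usually blocked ports rather than port exhaustion (ERROR MESSAGE 4)",
    5: "Access is denied: check RestrictRemoteClients on the destination (ERROR MESSAGE 5)",
    1721: "Not enough resources: the DCOM port range is too small or used up (ERROR MESSAGE 6)",
}

# Matches the numeric code in fragments such as "-->1722(The RPC server is unavailable.)".
CODE_PATTERN = re.compile(r"-->(\d+)\(")


def triage(log_text: str) -> str:
    """Return a short hint for the first recognizable outcome in DTCPing output."""
    if "RPC test is successful" in log_text and "Binding success" in log_text:
        return "success"
    if "gethostbyname failure" in log_text:
        return "Name resolution failed: check the remote server name (ERROR MESSAGE 1)"
    match = CODE_PATTERN.search(log_text)
    if match and int(match.group(1)) in HINTS:
        return HINTS[int(match.group(1))]
    return "unrecognized output"


print(triage("-->RPC pinging exception -->1722(The RPC server is unavailable.)"))
```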

ERROR MESSAGE 1 - gethostbyname failure

DTCPing log file: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES8618
RPC server is ready
Please Start Partner DTCPing before pinging
++++++++++++Validating Remote Computer Name++++++++++++
Please refer to following log file for details: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES861840.log
Error(0xB7) at nameping.cpp @43
-->gethostbyname failure -->183(Cannot create a file when that file already exists.)
Can not resolve abc Invalid remote host name:abc

This error is largely self-explanatory. You will get it if the name that you entered in the Remote Server Name field is not a valid host name. Use the ping command to check that the remote server name you specified in the DTCPING tool resolves to a valid IP address. If the host name does not resolve to any IP address, you can add it to the hosts file and run the tool again.
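
The same name resolution check can be scripted. This is a minimal Python sketch, assuming the operating system's resolver is what you want to test; `resolve_name` is a hypothetical helper, and its `resolver` parameter exists only so that the function can be exercised without a network.

```python
import socket
from typing import Callable, Optional


def resolve_name(name: str,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the IPv4 address that the name resolves to, or None if resolution fails."""
    try:
        return resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return None


# "localhost" is used here only so that the example does not depend on your network.
print(resolve_name("localhost"))
```

A result of `None` for the remote server name corresponds to the situation in which DTCPing reports the gethostbyname failure.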

ERROR MESSAGE 2 - The RPC server is unavailable

DTCping log file: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES8618
RPC server is ready Please Start Partner DTCping before pinging
++++++++++++Validating Remote Computer Name++++++++++++
Please refer to following log file for details: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES861896.log
Invoking RPC method on turtle86
Problem:fail to invoke remote RPC method Error(0x6BA) at dtcping.cpp @303
-->RPC pinging exception -->1722(The RPC server is unavailable.)
RPC test failed

This indicates that either port 135 or one of the ports in the DCOM port range is blocked by the firewall. To confirm this, say you ran DTCPING from SERVER01 to SERVER02 and got this error. Open a command prompt on SERVER01 and type telnet SERVER02 135. (Before running this test, make sure that the Telnet client is available on SERVER01. On Windows Server 2008 it is not installed by default and has to be added through Server Manager.) If you see a blank window with a blinking cursor, the port is NOT blocked. If the telnet command fails with an error, port 135 is blocked and you should ask your network team to open port 135 bi-directionally on the firewall. If telnet to port 135 works, run NETSTAT -anob on SERVER02 and find the port on which DTCPING.EXE is listening. Then go back to SERVER01 and run telnet SERVER02 <PORT_NUMBER>. Given that DTCPING reported this error while port 135 is reachable, this second test is expected to fail, which tells you that the dynamic port is the one being blocked.

MSDTC uses the MSRPC protocol to talk to MSDTC on the remote machine. As part of the normal working of MSRPC, MSDTC is free to use any dynamic port in the range 1024-65535. So how should the firewall be configured? Opening the entire range would defeat the purpose of having a firewall. You do not have to do that: you can restrict RPC to a specific range of ports that will be used by any DCOM program. Note that this range affects ALL programs that use MSRPC, not just MSDTC. You can configure the range in the registry or in the DCOMCNFG UI.
The section "RESTRICTING THE DCOM PORT RANGE" below explains how to restrict DCOM to a specific port range.
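
If telnet is not available, the same reachability test can be approximated with a few lines of Python. This is a sketch, not a replacement for checking the firewall rules themselves; `port_is_reachable` is a hypothetical helper, and a `False` result only says that a TCP connection could not be established within the timeout.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (SERVER02 is the placeholder name used in the text above):
#   port_is_reachable("SERVER02", 135)
#   port_is_reachable("SERVER02", <port on which DTCPING.EXE is listening>)
```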

ERROR MESSAGE 3 - The remote procedure call failed

RPC server is ready
++++++++++++Validating Remote Computer Name++++++++++++
Problem:fail to invoke remote RPC method
Error(0x6BE) at dtcping.cpp @303
-->RPC pinging exception -->1726(The remote procedure call failed.)
RPC test failed

This error is the result of a firewall disconnecting the TCP connection between the two machines. Get in touch with your firewall administrators to figure out why the firewall is closing the connection. To troubleshoot this error, you can install the Network Monitor tool on both machines and re-run the test; you should see a TCP RESET packet sent by the network device that is closing the connection.

ERROR MESSAGE 4 - There are no more endpoints from the endpoint mapper

DTCping log file: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES8626
RPC server is ready Please Start Partner DTCping before pinging
++++++++++++Validating Remote Computer Name++++++++++++
Please refer to following log file for details: C:\Documents and Settings\username\Desktop\DTC_PING\TURTLES86268.log
Invoking RPC method on turtle86
Problem:fail to invoke remote RPC method Error(0x6D9) at dtcping.cpp @303
-->RPC pinging exception -->1753(There are no more endpoints available from the endpoint mapper.)
RPC test failed

This error makes it appear that RPC is running out of DCOM ports, but you should not immediately read it as port exhaustion. If DTCPING.EXE starts on both machines without complaining about running out of ports, then this error is just the result of a firewall blocking the ports, and the troubleshooting is exactly the same as for "The RPC server is unavailable" described above. Here is why. If the machine really were running out of DCOM ports (which typically happens when you have specified a port range that is too small, say fewer than 30 ports), you would see an error the moment you start DTCPING.EXE, because that is when DTCPing.exe contacts the End Point Mapper service (RpcSS) and asks for a dynamic port. If DTCPing.exe starts up fine, it was allocated a port, so the Endpoint Mapper cannot be out of DCOM ports.
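
To judge whether a configured range is in the "too small" territory mentioned above, the ports can simply be counted. The Python sketch below assumes ranges written as "5000-5100" or as single ports; `count_ports` and `range_looks_too_small` are hypothetical helpers, and the threshold of 30 is just the rule of thumb from the paragraph above.

```python
from typing import Iterable


def count_ports(ranges: Iterable[str]) -> int:
    """Count the ports in entries such as "5000-5100" or "6000".

    Overlapping entries are counted twice; this sketch does not merge them.
    """
    total = 0
    for entry in ranges:
        low_text, _, high_text = entry.strip().partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
        if not (0 < low <= high <= 65535):
            raise ValueError(f"invalid port range: {entry!r}")
        total += high - low + 1
    return total


def range_looks_too_small(ranges: Iterable[str], minimum: int = 30) -> bool:
    """Return True if the configured ranges contain fewer than `minimum` ports."""
    return count_ports(ranges) < minimum


print(count_ports(["5000-5100"]))            # 101 ports
print(range_looks_too_small(["5000-5010"]))  # True: only 11 ports
```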

ERROR MESSAGE 5 - Access is Denied

Invoking RPC method on TURTLE86
Problem:fail to invoke remote RPC method
Error(0x5) at dtcping.cpp @303
-->RPC pinging exception
-->5(Access is denied.)

This error will only occur if the destination machine is a Windows XP machine or a Windows Vista machine. It is caused by an additional security restriction in the RPC layer that is configured on the client operating systems. More details on this security aspect are described in the article "RPC Interface Restriction" on TechNet.

To get rid of this error just follow these steps to configure the registry key and REBOOT the machine.

1. Click Start, click Run, type Regedit, and then click OK.
2. Locate and then click the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT
3. On the Edit menu, point to New, and then click Key.
Note If the RPC registry key already exists, go to step 5.
4. Type RPC, and then press ENTER.
5. Click RPC.
6. On the Edit menu, point to New, and then click DWORD Value.
7. Type RestrictRemoteClients, and then press ENTER.
8. Click RestrictRemoteClients.
9. On the Edit menu, click Modify.
10. In the Value data box, type 0, and then click OK.
Note To enable the RestrictRemoteClients setting, type 1.
11. Close Registry Editor and restart the computer.
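
For reference, the registry change made by the steps above can also be written down as a .reg file. The Python sketch below only builds the text of such a file; it does not touch the registry. The function name is hypothetical, and the file should be reviewed before it is imported on a real machine.

```python
def build_restrict_remote_clients_reg(value: int = 0) -> str:
    """Build .reg file text that sets RestrictRemoteClients to the given value.

    The steps above mention only 0 (disabled) and 1 (enabled), so only those
    two values are accepted in this sketch.
    """
    if value not in (0, 1):
        raise ValueError("expected 0 or 1")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\RPC]",
        f'"RestrictRemoteClients"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(build_restrict_remote_clients_reg(0))
```

As with the manual steps, the machine still has to be restarted after the value is changed.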

ERROR MESSAGE 6 - Not enough resources are available to complete this operation

DTCping log file: Z:\Tools\DTC_PING\TURTLE865072.log
Error(0x6B9) at rpcUtil.cpp @133
-->I_RpcServerAllocateIpPort
-->1721(Not enough resources are available to complete this operation.)
Error(0x6B9) at rpcUtil.cpp @54
-->1721(Not enough resources are available to complete this operation.)

You will get this error message immediately after starting the DTCPING window. It means that RPC is running out of ports on the machine, either because the DCOM port range that you defined is too small or because many other RPC applications are using DCOM ports. (Typically a DCOM or RPC program uses just one DCOM port, but an application can acquire more than one by calling the RPC APIs directly.) To fix this error, increase the port range by following these steps.

RESTRICTING THE DCOM PORT RANGE

1. Go to Start -> Run. Type in DCOMCNFG.
2. Go to the properties of the My Computer node under the Computers folder underneath Component Services.
3. Under the My Computer Properties look under the Default Protocols tab.
4. Over there make sure that Connection-oriented TCP/IP is selected and then click on Properties.
5. You will see a window like this:
[Screenshot: EmptyPortRange]

If the window shows no range, as in the screenshot above, the DCOM port range is not configured on the machine.
Click Add in that window, type the range (for example, 5000-5100), and click OK. Make sure the window then looks like this, with both radio buttons set to Internet Range:

[Screenshot: PortRange5000]
You have to configure this range on both machines and then reboot both servers for the range to take effect. After that, open the same range bi-directionally on your firewall.
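
The article notes earlier that the range can also be configured in the registry. The Python sketch below prints the `reg add` commands that would do so. The key and value names used here (`HKLM\Software\Microsoft\Rpc\Internet` with `Ports`, `PortsInternetAvailable` and `UseInternetPorts`) are not given in this article; they are assumed from Microsoft's documentation on RPC dynamic port allocation and should be verified before use. The helper name is hypothetical, and the commands are only generated, not executed.

```python
from typing import List

# Assumption: registry location taken from Microsoft's RPC port allocation
# documentation, not from this article.
RPC_INTERNET_KEY = r"HKLM\Software\Microsoft\Rpc\Internet"


def build_port_range_commands(port_range: str = "5000-5100") -> List[str]:
    """Return reg.exe command lines that would restrict RPC to the given range."""
    low_text, _, high_text = port_range.partition("-")
    if not (low_text.isdigit() and high_text.isdigit() and int(low_text) <= int(high_text)):
        raise ValueError(f"expected a range such as 5000-5100, got {port_range!r}")
    return [
        f'reg add "{RPC_INTERNET_KEY}" /v Ports /t REG_MULTI_SZ /d {port_range} /f',
        f'reg add "{RPC_INTERNET_KEY}" /v PortsInternetAvailable /t REG_SZ /d Y /f',
        f'reg add "{RPC_INTERNET_KEY}" /v UseInternetPorts /t REG_SZ /d Y /f',
    ]


for command in build_port_range_commands("5000-5100"):
    print(command)
```

Whichever way the range is set, it has to match on both machines and on the firewall, and a reboot is still needed.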

Suppose you have already opened enough ports but still get this error message. In that case, run NETSTAT -anob on the machine that returns the error and find out which program is using all the ports. Look for the ports that you defined in the RPC port range and note every EXE that is listening in that range.
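
Scanning long NETSTAT output by eye is tedious, so the filtering can be scripted. The Python sketch below assumes the usual layout of `netstat -anob` output, where each TCP line ends with the state and PID and the executable name follows in square brackets on a later line. The sample text in the example is made up for illustration, and `listeners_in_range` is a hypothetical helper.

```python
import re
from typing import Dict, List, Optional

LISTEN_LINE = re.compile(r"^\s*TCP\s+\S+:(\d+)\s+\S+\s+LISTENING\s+(\d+)\s*$")
EXE_LINE = re.compile(r"^\s*\[(.+)\]\s*$")


def listeners_in_range(netstat_text: str, low: int, high: int) -> List[Dict]:
    """Return the listening TCP ports within [low, high] with PID and EXE name."""
    results: List[Dict] = []
    pending: Optional[Dict] = None
    for line in netstat_text.splitlines():
        if line.strip().startswith(("TCP", "UDP")):
            # A new connection line ends the block that belonged to the previous one.
            pending = None
            match = LISTEN_LINE.match(line)
            if match and low <= int(match.group(1)) <= high:
                pending = {"port": int(match.group(1)), "pid": int(match.group(2)), "exe": None}
                results.append(pending)
            continue
        exe = EXE_LINE.match(line)
        if exe and pending is not None:
            pending["exe"] = exe.group(1)
            pending = None
    return results


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  RpcSs
 [svchost.exe]
  TCP    0.0.0.0:5000           0.0.0.0:0              LISTENING       1234
 [dtcping.exe]
  TCP    0.0.0.0:5001           0.0.0.0:0              LISTENING       2345
 [msdtc.exe]
"""

for entry in listeners_in_range(SAMPLE, 5000, 5100):
    print(entry)
```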

We hope this article gives you enough insight into troubleshooting MSDTC issues with the DTCPing tool, and that going forward you will be able to diagnose MSDTC issues on your own. If you still need our help with any of the DTCPING errors discussed above, please collect the DTCPING log files from both machines (where DTCPING was run) and open a Support Incident with Microsoft. We (the MSDTC support team) will be more than happy to provide a timely resolution once we have the right data to look at.

If you are interested in digging deeper into DCOM and firewall concepts, feel free to explore the following articles, which talk about DCOM issues in general.