Modifying IL at runtime

If you remember the Omniscient Debugger, it was a Java debugger that instrumented the bytecode at runtime to trace calls and monitor variables. It did so by using a custom ClassLoader.
Unfortunately the .NET classes that seemed somewhat equivalent to the Java ClassLoader are sealed, so they can't be extended. So, for a while I thought runtime instrumentation of the code wasn't possible in .NET...

A couple of weeks later, I stumbled onto the NProf project (an open-source .NET profiler) and wondered how it did its magic. It turns out that it uses the CLR Profiling APIs, which are COM-based and let you hook into various events and get information about the runtime. It was while digging further into these that I first found a mention of the intriguing ICorProfilerInfo::SetILFunctionBody method.

Although I still think that it is not very well documented (no MSDN reference and very few hits in Google), I have since found bits and pieces of information about this method and wrote a little program that demos its potential.

In this article, we'll go through the steps to build this simple runtime IL transformation program, to give you a better feel of what Get/SetILFunctionBody allows you to do.

Update: Follow-up articles are available (step II, step II+, step III).

Background information on the Profiling APIs
First you should have a little background on the Profiling APIs of the framework.

The SDK comes with a Tool Developers Guide. It's a directory with various documents, including the precious Profiling.doc file. Since I don't have Word on all my machines, I converted it to pdf and copied it over: CLR Profiling (Tool Developers Guide).

Two "Under the Hood" articles on MSDNMag, about the Profiling APIs:
The .NET Profiling API and the DNProfiler Tool and .NET CLR Profiling Services: Track Your Managed Components to Boost Application Performance.


DNProfiler
I used the DNProfiler tool by Matt Pietrek as the foundation for the experiment. You can grab it on the MSDNMag page mentioned above.

You should be able to build DNProfiler with VS.net and run it easily. Try it on a couple of simple .NET programs and look at the generated DNProfiler.out file, which contains the output of all the ProfilerPrintf calls. You'll see the flood of events that even the simplest program can generate.

It turns out that the main event that we'll need is JITCompilationStarted, so you can empty most of the other event methods (leave Initialize as it is, though).

Also, you don't need to receive notifications from the CLR for every event, so you can modify the profiling_on.bat batch file to contain "set DN_PROFILER_MASK=0x20", where 0x20 means COR_PRF_MONITOR_JIT_COMPILATION. This tells the CLR to call all the JIT-related hook functions in our profiler.
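To make the mask less magic, here is a minimal sketch of how a mask string such as "0x20" could be parsed and tested for the JIT bit. ParseProfilerMask and WantsJitEvents are hypothetical helpers written for this illustration; how DNProfiler itself reads DN_PROFILER_MASK may well differ. Only the value 0x20 for COR_PRF_MONITOR_JIT_COMPILATION comes from the text above.

```cpp
#include <cstdint>
#include <cstdlib>

// Value of COR_PRF_MONITOR_JIT_COMPILATION, as given in the text above.
const std::uint32_t kMonitorJitCompilation = 0x20;

// Hypothetical helper (not DNProfiler code): parse a mask string such as
// "0x20". Returns 0 when the text is missing or is not a hex number.
std::uint32_t ParseProfilerMask(const char* text)
{
    if (text == nullptr) {
        return 0;
    }
    char* end = nullptr;
    // Base 16 accepts an optional "0x" prefix.
    unsigned long value = std::strtoul(text, &end, 16);
    if (end == text) {
        return 0;  // nothing could be converted
    }
    return static_cast<std::uint32_t>(value);
}

// True when the mask asks for the JIT-related callbacks.
bool WantsJitEvents(std::uint32_t mask)
{
    return (mask & kMonitorJitCompilation) != 0;
}
```

A profiler could call ParseProfilerMask(std::getenv("DN_PROFILER_MASK")) at startup and pass the result on to the CLR when it registers for events.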


GetILFunctionBody
When the foundation is laid and we have a running profiler with a JITCompilationStarted method that gets called, we can start looking at the live IL as it gets JITed.

The ICorProfilerInfo::GetILFunctionBody method lets you do that.

Here is the code I used:

HRESULT CProfilerCallback::JITCompilationStarted(UINT functionId,
      BOOL fIsSafeToBlock)
{
  wchar_t wszClass[512];
  wchar_t wszMethod[512];

  // Uncomment the next line to set a breakpoint
  // __asm int 3

  HRESULT hr = S_OK;

  ClassID classId = 0;
  ModuleID moduleId = 0;
  mdToken tkMethod = 0;
  LPCBYTE pMethodHeader = NULL;
  ULONG iMethodSize = 0;


  //
  // Get the name of the method that is going to get JITed
  //
  if (GetMethodNameFromFunctionId(functionId, wszClass, wszMethod))
  {
   ProfilerPrintf("JITCompilationStarted: %ls::%ls\n",wszClass,wszMethod);
  } else {
   ProfilerPrintf("JITCompilationStarted\n");
  }


  //
  // Get the IL
  //
  hr = m_pICorProfilerInfo->GetFunctionInfo(functionId, &classId, &moduleId, &tkMethod );
  if (FAILED(hr))
   { goto exit; }

  hr = m_pICorProfilerInfo->GetILFunctionBody(moduleId, tkMethod, &pMethodHeader, &iMethodSize);
  if (FAILED(hr))
   { goto exit; }


  //
  // Look at the IL and print it out
  //
  IMAGE_COR_ILMETHOD* pMethod = (IMAGE_COR_ILMETHOD*)pMethodHeader;
  COR_ILMETHOD_FAT* fatImage = (COR_ILMETHOD_FAT*)&pMethod->Fat;

  if(!fatImage->IsFat()) {
   COR_ILMETHOD_TINY* tinyImage = (COR_ILMETHOD_TINY*)&pMethod->Tiny;
   //Handle Tiny method
  } else {
   //Handle Fat method
   ProfilerPrintf("Flags: %X\n", fatImage->Flags);
   ProfilerPrintf("Size: %X\n", fatImage->Size);
   ProfilerPrintf("MaxStack: %X\n", fatImage->MaxStack);
   ProfilerPrintf ("CodeSize: %X\n", fatImage->CodeSize);
   ProfilerPrintf("LocalVarSigTok: %X\n", fatImage->LocalVarSigTok);

   byte* codeBytes = fatImage->GetCode();
   ULONG codeSize = fatImage->CodeSize;

   for(ULONG i = 0; i < codeSize; i++) {
    if(codeBytes[i] > 0x0F) {
     ProfilerPrintf("codeBytes[%u] = 0x%X;\n", i, codeBytes[i]);
    } else {
     ProfilerPrintf("codeBytes[%u] = 0x0%X;\n", i, codeBytes[i]);
    }
   }
  }

exit:
  return hr;
}

This code is based on the original DNProfiler method and borrows pieces from two entries on Jimski's blog.

You'll need to #include "corhlpr.h" to get access to the type definitions like COR_ILMETHOD_FAT.
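The listing leaves the Tiny case as a comment. As a rough sketch of what both header formats look like at the byte level, the helper below decodes just enough of a header to find the IL. It follows the tiny and fat layouts described in ECMA-335 (Partition II) rather than using the corhlpr.h types. ReadMethodHeader and MethodHeaderInfo are names invented for this example, and the caller is assumed to pass a buffer long enough for the format it contains.

```cpp
#include <cstddef>
#include <cstdint>

// Format codes stored in the low two bits of the first header byte.
const std::uint8_t kTinyFormat = 0x2;
const std::uint8_t kFatFormat  = 0x3;

struct MethodHeaderInfo {
    bool          isFat;
    std::size_t   headerSize;  // bytes before the first IL opcode
    std::uint32_t codeSize;    // number of IL bytes
};

// Hypothetical helper, not part of DNProfiler: decode enough of the
// header to locate the IL. Returns false if the format is unrecognized.
bool ReadMethodHeader(const std::uint8_t* p, MethodHeaderInfo* out)
{
    const std::uint8_t format = p[0] & 0x3;
    if (format == kTinyFormat) {
        // Tiny: a single byte, with the code size in the upper six bits.
        out->isFat = false;
        out->headerSize = 1;
        out->codeSize = static_cast<std::uint32_t>(p[0] >> 2);
        return true;
    }
    if (format == kFatFormat) {
        // Fat: 12 bits of flags, then 4 bits of header size in DWORDs.
        out->isFat = true;
        out->headerSize = static_cast<std::size_t>(p[1] >> 4) * 4;
        // CodeSize is a little-endian 32-bit value at offset 4.
        out->codeSize = static_cast<std::uint32_t>(p[4])
                      | (static_cast<std::uint32_t>(p[5]) << 8)
                      | (static_cast<std::uint32_t>(p[6]) << 16)
                      | (static_cast<std::uint32_t>(p[7]) << 24);
        return true;
    }
    return false;
}
```

For a fat header with Flags 0x13 and Size 3, the first two bytes are 0x13 0x30, and the sketch reports a 12-byte header.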

The sample Hello.cs file (compiled with "csc Hello.cs"):

using System;

public class Hello
{
  public static void Main(string[] prms)
  {
   Console.WriteLine("hello world!");
   Console.WriteLine("test!");
  }
}


This produces the following DNProfiler.out file:

Initialize
JITCompilationStarted: Hello::Main
Flags: 13
Size: 3
MaxStack: 1
CodeSize: 15
LocalVarSigTok: 0
codeBytes[0] = 0x72;
codeBytes[1] = 0x01;
codeBytes[2] = 0x00;
codeBytes[3] = 0x00;
codeBytes[4] = 0x70;
codeBytes[5] = 0x28;
codeBytes[6] = 0x02;
codeBytes[7] = 0x00;
codeBytes[8] = 0x00;
codeBytes[9] = 0x0A;
codeBytes[10] = 0x72;
codeBytes[11] = 0x1B;
codeBytes[12] = 0x00;
codeBytes[13] = 0x00;
codeBytes[14] = 0x70;
codeBytes[15] = 0x28;
codeBytes[16] = 0x02;
codeBytes[17] = 0x00;
codeBytes[18] = 0x00;
codeBytes[19] = 0x0A;
codeBytes[20] = 0x2A;
Shutdown


If you run "ildasm /bytes Hello.exe", you'll see the matching bytes in the disassembled version of the Main method. ILdasm gives you more insight into what the bytes actually mean and how they are grouped.
Comparing the output with the ILdasm disassembly suggests that swapping codeBytes[1] and codeBytes[11], the low bytes of the two string tokens, would print the strings in the reverse order. That's what we'll try to do :-)
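To see why those two bytes matter, here is a small sketch that decodes the 21 bytes from the dump. It only knows the three opcodes that appear in this method: ldstr (0x72) and call (0x28), each followed by a 4-byte little-endian metadata token, and ret (0x2A). Disassemble and ReadToken are illustrative helpers rather than profiler APIs, and the token values are specific to this build of Hello.exe.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

const std::uint8_t kLdstr = 0x72;
const std::uint8_t kCall  = 0x28;
const std::uint8_t kRet   = 0x2A;

// The IL bytes of Hello::Main, copied from the DNProfiler.out dump.
const std::uint8_t kHelloMain[] = {
    0x72, 0x01, 0x00, 0x00, 0x70,   // ldstr, first string token
    0x28, 0x02, 0x00, 0x00, 0x0A,   // call
    0x72, 0x1B, 0x00, 0x00, 0x70,   // ldstr, second string token
    0x28, 0x02, 0x00, 0x00, 0x0A,   // call
    0x2A                            // ret
};

// Read a 4-byte little-endian metadata token.
std::uint32_t ReadToken(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Illustrative disassembler for ldstr/call/ret only; it prints "?" and
// stops at anything else.
std::string Disassemble(const std::uint8_t* code, std::size_t size)
{
    std::string out;
    std::size_t i = 0;
    while (i < size) {
        const std::uint8_t op = code[i];
        if ((op == kLdstr || op == kCall) && i + 5 <= size) {
            char line[32];
            std::snprintf(line, sizeof(line), "%s 0x%08X\n",
                          op == kLdstr ? "ldstr" : "call",
                          static_cast<unsigned>(ReadToken(code + i + 1)));
            out += line;
            i += 5;
        } else if (op == kRet) {
            out += "ret\n";
            i += 1;
        } else {
            out += "?\n";
            break;
        }
    }
    return out;
}
```

The two ldstr tokens decode to 0x70000001 and 0x7000001B. They differ only in their lowest byte, which sits at offsets 1 and 11, so exchanging those two bytes exchanges which string each ldstr loads.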


SetILFunctionBody
Here is the code I used to switch the two string prints in Hello.exe:

HRESULT CProfilerCallback::JITCompilationStarted(UINT functionId,
      BOOL fIsSafeToBlock)
{
  wchar_t wszClass[512];
  wchar_t wszMethod[512];

  //__asm int 3
  HRESULT hr = S_OK;

  ClassID classId = 0;
  ModuleID moduleId = 0;
  mdToken tkMethod = 0;
  LPCBYTE pMethodHeader = NULL;
  ULONG iMethodSize = 0;

  if ( GetMethodNameFromFunctionId( functionId, wszClass, wszMethod ) )
  {
   ProfilerPrintf("JITCompilationStarted: %ls::%ls\n",wszClass,wszMethod);
  } else {
   ProfilerPrintf( "JITCompilationStarted\n" );
   goto exit;
  }
  if (wcscmp(wszClass, L"Hello") != 0 || wcscmp(wszMethod, L"Main") != 0) {
   goto exit;
  }


  //
  // Get the existing IL
  //
  hr = m_pICorProfilerInfo->GetFunctionInfo(functionId, &classId, &moduleId, &tkMethod );
  if (FAILED(hr))
   { goto exit; }

  hr = m_pICorProfilerInfo->GetILFunctionBody(moduleId, tkMethod, &pMethodHeader, &iMethodSize);
  if (FAILED(hr))
   { goto exit; }


  //
  // Print the existing IL
  //
  IMAGE_COR_ILMETHOD* pMethod = (IMAGE_COR_ILMETHOD*)pMethodHeader;
  COR_ILMETHOD_FAT* fatImage = (COR_ILMETHOD_FAT*)&pMethod->Fat;

  if(!fatImage->IsFat()) {
   COR_ILMETHOD_TINY* tinyImage = (COR_ILMETHOD_TINY*)&pMethod->Tiny;
   //Handle Tiny method
  } else {
   //Handle Fat method
   ProfilerPrintf("Flags: %X\n", fatImage->Flags);
   ProfilerPrintf("Size: %X\n", fatImage->Size);
   ProfilerPrintf("MaxStack: %X\n", fatImage->MaxStack);
   ProfilerPrintf ("CodeSize: %X\n", fatImage->CodeSize);
   ProfilerPrintf("LocalVarSigTok: %X\n", fatImage->LocalVarSigTok);

   byte* codeBytes = fatImage->GetCode();
   ULONG codeSize = fatImage->CodeSize;

   for(ULONG i = 0; i < codeSize; i++) {
    if(codeBytes[i] > 0x0F) {
     ProfilerPrintf("codeBytes[%u] = 0x%X;\n", i, codeBytes[i]);
    } else {
     ProfilerPrintf("codeBytes[%u] = 0x0%X;\n", i, codeBytes[i]);
    }
   }
  }


  //
  // Get the IL Allocator
  //
  IMethodMalloc* pIMethodMalloc = NULL;
  IMAGE_COR_ILMETHOD* pNewMethod = NULL;
  hr = m_pICorProfilerInfo->GetILFunctionBodyAllocator(moduleId, &pIMethodMalloc);
  if (FAILED(hr))
   { goto exit; }


  //
  // Allocate IL space and copy the IL in it
  //
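  pNewMethod = (IMAGE_COR_ILMETHOD*) pIMethodMalloc->Alloc(iMethodSize);
  if (pNewMethod == NULL)
  {
   // Report the failure and release the allocator before bailing out
   hr = E_OUTOFMEMORY;
   pIMethodMalloc->Release();
   goto exit;
  }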

  memcpy((void*)pNewMethod, (void*)pMethod, iMethodSize);


  //
  // Print IL copy, modify it and print it again
  //
  COR_ILMETHOD_FAT* newFatImage = (COR_ILMETHOD_FAT*)&pNewMethod->Fat;
  if(!newFatImage->IsFat()) {
   COR_ILMETHOD_TINY* newTinyImage = (COR_ILMETHOD_TINY*)&pNewMethod->Tiny;
   //Handle Tiny method
  } else {
   //Handle Fat method
   ProfilerPrintf("New Flags: %X\n", newFatImage->Flags);
   ProfilerPrintf("New Size: %X\n", newFatImage->Size);
   ProfilerPrintf("New MaxStack: %X\n", newFatImage->MaxStack);
   ProfilerPrintf ("New CodeSize: %X\n", newFatImage->CodeSize);
   ProfilerPrintf("New LocalVarSigTok: %X\n", newFatImage->LocalVarSigTok);

   byte* codeBytes = newFatImage->GetCode();
   ULONG codeSize = newFatImage->CodeSize;

   for(ULONG i = 0; i < codeSize; i++) {
    if(codeBytes[i] > 0x0F) {
     ProfilerPrintf("codeBytes[%u] = 0x%X;\n", i, codeBytes[i]);
    } else {
     ProfilerPrintf("codeBytes[%u] = 0x0%X;\n", i, codeBytes[i]);
    }
   }


   //
   // Tweak the IL (switch the bytes)
   //
   BYTE temp;
   temp = codeBytes[1];
   codeBytes[1] = codeBytes[11];
   codeBytes[11] = temp;


   //
   // Print the modified IL
   //
   for(ULONG i = 0; i < codeSize; i++) {
    if(codeBytes[i] > 0x0F) {
     ProfilerPrintf("codeBytes[%u] = 0x%X;\n", i, codeBytes[i]);
    } else {
     ProfilerPrintf("codeBytes[%u] = 0x0%X;\n", i, codeBytes[i]);
    }
   }
  }


  hr = m_pICorProfilerInfo->SetILFunctionBody(moduleId, tkMethod, (LPCBYTE) pNewMethod);

  // Release the allocator whether or not SetILFunctionBody succeeded;
  // hr is returned as-is below
  pIMethodMalloc->Release();

exit:
  return hr;
}

If you run Hello.exe with this profiler, you'll get "test!" then "hello world!", which confirms that the IL was modified. Success!
You'll notice that it is completely hardcoded for the current Hello.exe example, so you shouldn't try it on other assemblies.
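One way to make the swap a little less fragile is to check that both offsets really hold an ldstr before touching anything. The sketch below is a hypothetical helper (SwapLdstrTokens is not part of the profiler or of the listing above) that works on a plain byte buffer and swaps the whole 4-byte tokens instead of single bytes. Seeing the opcode byte at an offset does not prove that the offset is an instruction boundary, so this is a sanity check rather than a real IL parser.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

const std::uint8_t kLdstrOpcode = 0x72;

// Hypothetical helper: swap the 4-byte string tokens of the ldstr
// instructions at offsets `first` and `second`. Returns false and leaves
// the buffer untouched if either offset is out of range, the two
// instructions overlap, or either offset does not hold an ldstr opcode.
bool SwapLdstrTokens(std::uint8_t* code, std::size_t size,
                     std::size_t first, std::size_t second)
{
    // Each instruction needs 1 opcode byte plus 4 operand bytes.
    if (size < 5 || first > size - 5 || second > size - 5) {
        return false;
    }
    const std::size_t distance = first > second ? first - second
                                                : second - first;
    if (distance < 5) {
        return false;
    }
    if (code[first] != kLdstrOpcode || code[second] != kLdstrOpcode) {
        return false;
    }
    std::swap_ranges(code + first + 1, code + first + 5, code + second + 1);
    return true;
}
```

For the Hello.exe bytes, SwapLdstrTokens(codeBytes, codeSize, 0, 10) would have the same effect as the hardcoded swap of codeBytes[1] and codeBytes[11], and it would refuse to run on a method that does not start with two ldstr/call pairs.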

If you try to tweak the IL that you got out of GetILFunctionBody, you'll get an access violation because it is read-only. This is why we first make a copy of it, then tweak it and finally set it back in with SetILFunctionBody.


To infinity and beyond
So far, only a very simple assembly got runtime-modified.
But I hope this helped you realize how powerful this technique can be. For example, John Lam appears to be implementing an AOP extension to the CLR using these Profiling APIs.
A better understanding of the IL is needed in order to do more powerful modifications: calling other methods (in the current class or on a reference), adding runtime safety checks (for contract programming), monitoring methods and properties that bear a certain attribute.

Let me know of your experiments in the area,
Have fun,
Dumky

Other references
COR_ILMETHOD and other declarations from the Rotor source code.
Another Get/SetILFunctionBody sample.
A set of slides on CLAW and the Profiler APIs.
Source: http://blog.monstuff.com/archives/000058.html

posted @ 2004-05-19 13:25  dudu