C# Module Initializer Injection

Creating a module initializer in .NET

This article will cover the process, techniques and code required to automatically resolve an embedded library in C# and inject a module initializer into the compiled assembly using IL weaving.

A technical blog post indeed, but the terms and concepts will be explained in depth so by the end of this read, you'll be weaving assemblies like a spider.

What's the officer, problem?

Say you have a utilities library that contains custom helper classes, extensions, native methods and general utility classes to support your application. That library might require functionality from other, third-party libraries to perform its tricks. For example, your library may contain a custom compression wrapper that references a third-party library for its compression algorithm (like SharpZipLib or Protobuf-Net). Another example is a class for manipulating firewall rules that requires a .dll that may not always be present on the Windows versions you need to deploy to (yes, I'm looking at you, NetFwTypeLib.dll!).

One way to tackle this is to deploy these third-party dependencies along with your application. However, this is not always an option, and it would require a change in the deployment process for every change in your library's dependencies. Moreover, you may end up dragging along a Christmas tree of third-party libraries. An alternative would be to embed the third-party library file in the compiled utilities assembly and unpack this embedded resource when it's needed at runtime. This sounds like a great idea!

So how would this unpack mechanism work? Would each utilities class with a third-party dependency require an 'Initialize' method to unpack it? That would work, but it is unfriendly to callers, who would need an extra method call before they can do what they want. Adding a static constructor to each class to unpack dependencies before any calls are made might work as well, but it would be better and more generic to let the .NET runtime figure it all out with a little help.

From C# to native machine code

In order to fully understand the basic components involved in what we're trying to achieve, let's go through the process of converting C# code into native machine code.
The image below depicts, at a high level, how the contents of a C# project are compiled and handled by the .NET Framework at runtime.

[Figure: how a C# project is compiled into a managed assembly and executed by the CLR]

The compiler grabs all C# source files, resources and references and compiles them into Microsoft Intermediate Language (MSIL or IL). MSIL is code that is independent of hardware and the operating system. This compilation process results in a managed assembly (.exe or .dll file).

An assembly is called 'managed' because it is managed by the Common Language Runtime (CLR or .NET runtime) when it is executed. The CLR uses the .NET Framework Class Libraries (FCL) and the Just-in-Time (JIT) compiler to convert it to machine code that is specific, or native, to the operating system it's running on. Of course, much more is involved, but this is the process in a nutshell just big enough to contain this nut.

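The following chapters go into the details of each step required to create a module initializer that resolves embedded assemblies:

  • Embedding a resource into an assembly
  • Resolving an assembly
  • Resolving an embedded assembly
  • Adding a module initializer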

Note that all code and the full proof of concept can be downloaded from my GitHub project.

 

1. Embedding a resource into an assembly

Preparing a resource for embedding is done in the Visual C# project, while the actual embedding is performed by the compiler. After a resource has been embedded into the managed assembly, the .dll or .exe will be a little bigger. In this step, we embed an assembly so that we can resolve it at runtime later.

To add a .dll to your project as an embedded resource, follow the steps below:

  • In Visual Studio, right-click the project name, click 'Add', and then click 'Add Existing Item'.
  • Browse to the location of the .dll to add to your project, select it and click 'Add'.
  • Right-click the newly added file and select 'Properties'.
  • In the Properties dialog box, locate the 'Build Action' property. By default, this property is set to 'Content'. Click the property and change it to 'Embedded Resource'.

For my 'Coen.Utilities' project, I added the 'SharpZipLib.dll' and placed it inside a folder called 'Embedded':
[Figure: the 'Embedded' project folder containing SharpZipLib.dll]

Compile the project and notice that the output assembly's file size has increased, as it now contains the embedded file as well. You can also use ildasm.exe (MSIL Disassembler, shipped with Visual Studio) to check the manifest of your assembly.
My 'Coen.Utilities' assembly grew by 200,704 bytes after embedding SharpZipLib.dll. The manifest below shows that it is indeed part of my assembly and has a length of 0x00031000, which is 200,704 in decimal.

[Figure: ildasm manifest of Coen.Utilities showing the embedded resource with length 0x00031000]

Note that the project folder in which the file is stored is appended to the assembly's default namespace. The result is the namespace under which the embedded resource can be found at runtime. In my 'Coen.Utilities' assembly, which has that same default namespace, an embedded resource in a project folder called 'Embedded' can be found in the namespace 'Coen.Utilities.Embedded'. The manifest screenshot above shows this as well. It's easy enough, but important when we build the embedded resource resolver later on.
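To check under which name a resource actually ended up, the manifest resource names can be listed at runtime. The following is a minimal sketch: the ResourceNames class and its two methods are made up for illustration, and the namespace, folder and file name in the usage note are simply the ones used in this article.

```csharp
using System;
using System.Reflection;

public static class ResourceNames
{
    // Hypothetical helper: mirrors the naming rule described above
    // (default namespace + project folder + file name, joined by dots).
    public static string BuildResourceName(string defaultNamespace, string folder, string fileName)
    {
        return $"{defaultNamespace}.{folder}.{fileName}";
    }

    // Prints the name of every embedded resource in the given assembly,
    // which is handy for checking what the compiler really produced.
    public static void PrintResourceNames(Assembly assembly)
    {
        foreach (var name in assembly.GetManifestResourceNames())
        {
            Console.WriteLine(name);
        }
    }
}
```

Calling ResourceNames.BuildResourceName("Coen.Utilities", "Embedded", "SharpZipLib.dll") gives "Coen.Utilities.Embedded.SharpZipLib.dll", and ResourceNames.PrintResourceNames(Assembly.GetExecutingAssembly()) can be used to compare that against the names that are really in the manifest.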

 

2. Resolving an assembly

Assemblies are resolved by the CLR at runtime; resolving is nothing more than finding the required assemblies when they're needed. Whenever the CLR cannot find an assembly in its usual probing paths, it raises the AssemblyResolve event. By handling this event, the application can help the CLR locate and load an assembly, or a specific version of it, from exotic places like network locations or ... from an embedded resource.

Before we get into resolving embedded assemblies, let's see how straightforward runtime assembly resolving is done in .NET via the System.AppDomain.AssemblyResolve event.

Consider the following code, which is simplified for the purpose of explaining the mechanism:

static void Main()
{
	System.AppDomain.CurrentDomain.AssemblyResolve += CurrentDomainAssemblyResolve;
	
	var myClass = new MyClass();
	myClass.DoSomething();
}

private static Assembly CurrentDomainAssemblyResolve(object sender, ResolveEventArgs args)
{
	var fileName = new AssemblyName(args.Name).Name + ".dll";
	var filePath = Path.Combine(AssemblyVault, fileName);
	
	if (File.Exists(filePath))
		return Assembly.LoadFile(filePath);

	return null;		
}
  • line 3 subscribes to the AssemblyResolve event with 'CurrentDomainAssemblyResolve' as a handler. This handler will be called whenever the AssemblyResolve event is invoked.
  • lines 5 and 6 just suggest myClass does something to trigger the resolve event.
  • line 11 derives the file name of the requested assembly from the ResolveEventArgs parameter.
  • line 12 combines the file name with a custom assembly location (the AssemblyVault constant) to make up the full file path of the assembly that's being resolved.
  • line 14 checks whether this assembly file actually exists and line 15 attempts to load it before returning it.
  • line 17 returns null if the resolve handler was not able to locate the assembly.
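To see the event in action without any extra files, one can ask the runtime for an assembly that does not exist. The sketch below is only an illustration: the ResolveDemo class and the assembly name in the usage note are made up, and the handler deliberately returns null so the load still fails after the handler has been consulted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class ResolveDemo
{
    // Simple names of the assemblies the runtime asked us to resolve.
    public static readonly List<string> Requested = new List<string>();

    // Tries to load the given (non-existent) assembly and reports
    // whether the AssemblyResolve handler was consulted for it.
    public static bool HandlerIsCalledFor(string simpleName)
    {
        ResolveEventHandler handler = (sender, args) =>
        {
            // args.Name may be a full display name; keep only the simple name.
            Requested.Add(new AssemblyName(args.Name).Name);
            return null; // we cannot help, so the load will fail
        };

        AppDomain.CurrentDomain.AssemblyResolve += handler;
        try
        {
            Assembly.Load(simpleName);
        }
        catch (IOException)
        {
            // Expected: FileNotFoundException (an IOException) because
            // neither normal probing nor our handler found the assembly.
        }
        finally
        {
            AppDomain.CurrentDomain.AssemblyResolve -= handler;
        }

        return Requested.Contains(simpleName);
    }
}
```

For example, ResolveDemo.HandlerIsCalledFor("Hypothetical.Missing.Assembly") should show that the handler is only reached after the usual probing has failed.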

So far so good, now the embedded part.

 

3. Resolving an embedded assembly

As described in the previous chapter, in order to get the resolving of embedded assemblies working, we need to subscribe to the AssemblyResolve event that is invoked by the appdomain of the current thread (System.AppDomain.CurrentDomain).

The mechanism for resolving an embedded resource differs only in the way the Assembly is loaded. Consider the extended version of the CurrentDomainAssemblyResolve method from the previous chapter:

private static Assembly CurrentDomainAssemblyResolve(object sender, ResolveEventArgs args)
{
	var assembly = System.AppDomain.CurrentDomain.GetAssemblies().FirstOrDefault(a => a.FullName == args.Name);

	if (assembly != null)
		return assembly;

	using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(GetResourceName(args.Name)))
	{
		if (stream != null)
		{
			// Read the raw assembly from the resource
			var buffer = new byte[stream.Length];
			stream.Read(buffer, 0, buffer.Length);

			// Load the assembly
			return Assembly.Load(buffer);
		}
	}
	return null;
}

private static string GetResourceName(string assemblyName)
{
	var name = new AssemblyName(assemblyName).Name;
	return $"{EmbeddedResourceNamespace}.{name}.dll";
}
  • lines 3-6 prevent the assembly from being loaded twice.
  • line 8 creates the stream of the resource of which the name is calculated in the method starting in line 23. In other words, this is the stream containing our embedded resource.
  • lines 13-14 read the entire resource stream into a buffer.
  • line 17 loads the assembly from the buffer and returns it.
  • lines 25-26 construct the name of the resource, which consists of the 'root' namespace for embedded resources (like 'Coen.Utilities.Embedded'), the name of the resource (like 'ICSharpCode.SharpZipLib') and the .dll extension.

Note again that this code snippet is simplified for explanation purposes. If you make your own AssemblyResolve handler, make sure to do proper exception handling.
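As an illustration of what a slightly more defensive version could look like, here is a sketch under a few assumptions: the EmbeddedAssemblyLoader class and its method names are hypothetical, and returning null on failure is just one possible policy (it lets the runtime report the load failure in its usual way). It also copies the stream with CopyTo, because a single Stream.Read call is not guaranteed to fill the buffer.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedAssemblyLoader
{
    // Copies the whole stream into memory. CopyTo keeps reading
    // until the end of the stream has been reached.
    public static byte[] ReadAllBytes(Stream stream)
    {
        using (var memory = new MemoryStream())
        {
            stream.CopyTo(memory);
            return memory.ToArray();
        }
    }

    // Returns the loaded assembly, or null when the resource is missing,
    // unreadable, or not a valid managed assembly.
    public static Assembly TryLoad(Assembly container, string resourceName)
    {
        try
        {
            using (var stream = container.GetManifestResourceStream(resourceName))
            {
                // GetManifestResourceStream returns null for unknown names.
                return stream == null ? null : Assembly.Load(ReadAllBytes(stream));
            }
        }
        catch (BadImageFormatException)
        {
            return null; // the resource exists but is not a managed assembly
        }
        catch (IOException)
        {
            return null; // the resource could not be read
        }
    }
}
```

Inside an AssemblyResolve handler, this could be used as EmbeddedAssemblyLoader.TryLoad(Assembly.GetExecutingAssembly(), GetResourceName(args.Name)), where GetResourceName is the method from the snippet above.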

The best time to subscribe to this event is as soon as possible, before any other calls are made. Forcing users to call an 'Initialize' method before they can use the assembly or a specific class is not very user-friendly. It is better to subscribe to the AssemblyResolve event as soon as the assembly containing the embedded resource is loaded, using the module initializer described in the next chapter.

 

4. Adding a module initializer

A module initializer can be seen as a constructor for an assembly (technically it is a constructor for a module; each .NET assembly is comprised of one or more modules, typically just one). It is run when a module is loaded for the first time, and is guaranteed to run before any other code in the module runs, before any type initializers, static constructors or any other initialization code.
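When testing, it can be useful to trigger a module initializer explicitly rather than wait for the runtime to do it. A minimal sketch, assuming a hypothetical ModuleInitializerRunner helper; RuntimeHelpers.RunModuleConstructor is part of the .NET class library, and as far as I know it does nothing if the initializer has already run or the module has none.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ModuleInitializerRunner
{
    // Asks the runtime to run the module initializer of the module
    // that defines the given type.
    // Assumption: this is a no-op when the initializer already ran
    // or when the module does not define one.
    public static void EnsureInitialized(Type anyTypeInModule)
    {
        Module module = anyTypeInModule.Module;
        RuntimeHelpers.RunModuleConstructor(module.ModuleHandle);
    }
}
```

A test application could call ModuleInitializerRunner.EnsureInitialized(typeof(SomeTypeFromTheLibrary)), where SomeTypeFromTheLibrary stands for any type in the woven assembly, before asserting that the AssemblyResolve subscription is in place.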

Why can't I just use C# to do this?

Module initializers are a feature of the CLR that C# did not expose at the time of writing (C# 9 later added the [ModuleInitializer] attribute for this). C# puts everything in a class or struct, whereas module initializers need to be defined globally. This is why we will inject the module initializer into the MSIL after the C# code is compiled, a technique called IL weaving.

Which library to use?

So, there are a few ways to create a module initializer. One way is to use Fody, the IL weaving library by Simon Cropp. This is a great library and definitely worth checking out.
However, for fun and to learn more about this technique, we're going a little deeper and doing it ourselves. For this we use the Mono.Cecil library, which is actually used by Fody as well.

Mono.Cecil

Cecil is an awesome project written by Jb Evain to generate and inspect programs and libraries in the ECMA CIL format. With Cecil, it is possible to load existing managed assemblies, browse all the contained types, modify them on the fly and save the modified assembly back to the disk ... Which is exactly what we need.

 

The trick

For the assembly containing the embedded resource, we are going to create an internal class called ModuleInitializer, which will contain a public static method called Initialize(). This Initialize method subscribes to the AssemblyResolve event:

internal class ModuleInitializer
{
    public static void Initialize()
    {
        System.AppDomain.CurrentDomain.AssemblyResolve += CurrentDomainAssemblyResolve;
    }

    private static Assembly CurrentDomainAssemblyResolve(object sender, ResolveEventArgs args)
    {
        // Embedded resolve magic here (see the previous chapter)
        return null;
    }
} 

Nothing fancy here. Note that in this code snippet, the ModuleInitializer class is internal, so it cannot be called from outside the assembly. We don't want any other calls made to this class other than our module initializer. Another important thing to note is that the public Initialize()-method is static and has no parameters. This is a requirement for using this technique and will be explained further on.

The trick consists of a few steps:

  • Read the compiled assembly.
  • Find the Initialize() method in the compiled assembly.
  • Create an assembly constructor using Mono.Cecil and make it call the Initialize() method.
  • Inject the constructor into the assembly.
  • Save the assembly and rewrite the program database (.pdb) to match the new assembly structure.

Notes:
We need to take into account that we may want the assembly to be strong named, which is why the final save-step will also take into account the key to sign the assembly.

Since injecting the module initializer into the assembly must be done in the MSIL, obviously this process needs to be a post-build step. I created a console application for this, so I can easily add it as a post-build event for any assembly I want to use this technique for.

 

The implementation

Let's check out the code that makes this all happen. It consists of a public main Inject()-method calling other private methods that will be described further on.

When explaining the code, I will leave out any boiler-plate code required for this console application to work. If you want to check it out in its full glory, check out my GitHub project.

Inject()

Consider the following code of the method that will be the main entry for the console application after command line arguments have been parsed:

private AssemblyDefinition InjectionTargetAssembly { get; set; }

public void Inject(string injectionTargetAssemblyPath, string keyfile = null)
{
	// Validate the preconditions
	// - Does the injectionTargetAssemblyPath exist?
	// - If the keyfile is provided, does the file exist?

	try
	{
		// Read the injectionTarget
		ReadInjectionTargetAssembly(injectionTargetAssemblyPath);

		// Get a reference to the initializerMethod 
		var initializerMethod = GetModuleInitializerMethod();

		// Inject the Initializermethod into the assembly as a constructormethod
		InjectInitializer(initializerMethod);

		// Rewrite the assembly
		WriteAssembly(injectionTargetAssemblyPath, keyfile);
	}
	catch (Exception ex)
	{
		throw new InjectionException(ex.Message, ex);
	}
}       
  • line 1 defines a private property in which the AssemblyDefinition is stored after the assembly has been read from file. AssemblyDefinition is a type defined in the Cecil library.
  • ReadInjectionTargetAssembly() in line 12 reads the AssemblyDefinition from disk and stores it in the InjectionTargetAssembly property.
  • GetModuleInitializerMethod() in line 15 locates the Initialize() method in the assembly and returns it.
  • line 18 calls the InjectInitializer() method, which creates the constructor and makes it call the Initialize() method.
  • WriteAssembly() in line 21 rewrites the assembly.

This is the main structure of the Inject method. In the following paragraphs, each of the important calls will be further explained:

 

ReadInjectionTargetAssembly()

The ReadInjectionTargetAssembly-method reads the assembly in which the constructor should be injected. It also reads the Program Database (.pdb) file, if it could be located, in order to restructure its contents with respect to the changes made to the assembly after injecting the constructor.

private void ReadInjectionTargetAssembly(string assemblyFile)
{
	var readParams = new ReaderParameters(ReadingMode.Immediate);

	if (GetPdbFilePath(assemblyFile) != null)
	{
		readParams.ReadSymbols = true;
		readParams.SymbolReaderProvider = new PdbReaderProvider();
	}

	InjectionTargetAssembly = AssemblyDefinition.ReadAssembly(assemblyFile, readParams);
}
  • line 3 defines the parameters for the read action. We don't want to defer reading to a later time, so we set the ReadingMode to Immediate.
  • line 5 determines whether a .pdb file is present.
  • lines 7-8 configure the reader parameters to enable reading the symbols.
  • line 11 reads the assembly using the configured reader parameters and stores the AssemblyDefinition into a private property 'InjectionTargetAssembly' so it can be accessed in later stages.

 

GetModuleInitializerMethod()

After the target assembly was read, the GetModuleInitializerMethod is called, which locates the 'Initialize' method that should be called by the injected constructor. After it has been located, some validation is done to ensure the call to this method can actually be made.

Note that in the following snippet the className and methodName are provided as parameters. In my proof of concept, I retrieved these via command line parameters of the injector program. They correspond with the classname/methodname of the target assembly's module initializer class and method as defined in the previous subchapter The Trick.

private MethodReference GetModuleInitializerMethod(string className, string methodName)
{
	if (InjectionTargetAssembly == null)
	{
		throw new InjectionException("Unable to determine ModuleInitializer: InjectionTargetAssembly is null");
	}

	// Retrieve the ModuleInitializer Class
	var moduleInitializerClass = InjectionTargetAssembly.MainModule.Types.FirstOrDefault(t => t.Name == className);
	if (moduleInitializerClass == null)
	{
		throw new InjectionException($"No type found named '{className}'");
	}

	// Retrieve the ModuleInitializer method 
	var resultMethod = moduleInitializerClass.Methods.FirstOrDefault(m => m.Name == methodName);
	if (resultMethod == null)
	{
		throw new InjectionException($"No method named '{methodName}' exists in the type '{moduleInitializerClass.FullName}'");
	}

	// Validate the found method
	if (resultMethod.Parameters.Count > 0)
	{
		throw new InjectionException("Module initializer method must not have any parameters");
	}
        
	// Initialize method cannot be private or protected
	if (resultMethod.IsPrivate || resultMethod.IsFamily)
	{
		throw new InjectionException("Module initializer method may not be private or protected, use public or internal instead");
	}

	// Return type must be void
	if (!resultMethod.ReturnType.FullName.Equals(typeof(void).FullName))
	{
		throw new InjectionException("Module initializer method must have 'void' as return type");
	}

	// Method must be static
	if (!resultMethod.IsStatic)
	{
		throw new InjectionException("Module initializer method must be static");
	}

	return resultMethod;
}
  • line 1 is the method definition. Note that it returns a MethodReference type which references the details of the Module Initialize method inside the assembly.
  • line 9 attempts to find the initializer class in the assembly's main module.
  • line 16 attempts to find the initialize method in the initializer class found in line 9.
  • lines 23 and further make sure the initialize method has the required features:
      • It must be parameterless.
      • It must be public or internal (preferably internal).
      • Its return type must be void.
      • It must be static.
  • Special note for line 35: the void-type comparison is made using the full names because we don't want to compare the types themselves. The void type in the current CLR may differ from the one for which the target assembly was built, which would result in a false negative.

Now that the assembly has been read and the method to be called by the constructor has been located, we're ready to modify the assembly.

 

InjectInitializer()

This method is where the magic happens.

private void InjectInitializer(MethodReference initializer)
{
	if (initializer == null)
	{
		throw new ArgumentNullException(nameof(initializer));
	}

	const MethodAttributes Attributes = MethodAttributes.Static | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName;

	var initializerReturnType = InjectionTargetAssembly.MainModule.Import(initializer.ReturnType);

	// Create a new method .cctor (a static constructor) inside the Assembly  
	var cctor = new MethodDefinition(".cctor", Attributes, initializerReturnType);
	var il = cctor.Body.GetILProcessor();
	il.Append(il.Create(OpCodes.Call, initializer));
	il.Append(il.Create(OpCodes.Ret));

	var moduleClass = InjectionTargetAssembly.MainModule.Types.FirstOrDefault(t => t.Name == "<Module>");

	if (moduleClass == null)
	{
		throw new InjectionException("No module class found");
	}

	moduleClass.Methods.Add(cctor);
}
  • line 8 defines a constant which will be important later. This constant defines the attributes of the static constructor method that we inject into the assembly. Module initializers should have the following attributes:
      • Static - Indicates that the method is defined on the type; otherwise, it is defined per instance, which is something we don't want in this case.
      • SpecialName - Indicates that the method is special. The name describes how this method is special.
      • RTSpecialName - Indicates that the common language runtime checks the name encoding.
  • line 10 determines the return type of the static constructor by borrowing it from the initializer method's return type. This will always be void (we validated it before). The reason we don't use typeof(void) here is the same as the reason we didn't compare void types before (line 35 in the GetModuleInitializerMethod() method): the void type of the target assembly may be different from the one in the current CLR.
  • line 13 creates the method definition for the static constructor (or .cctor) with the given attributes and proper return type.
  • line 14 gets an ILProcessor object which can be used to further modify the method body and inject whatever IL code we want to add.
  • line 15 appends a call to the initializer method to the constructor's body via the ILProcessor.
  • line 16 appends a 'ret' instruction, which exits the static constructor and returns control to the caller. Since the constructor returns void, no return value is pushed onto the caller's evaluation stack.
  • line 18 looks up the special <Module> type in the main module. This is the type to which we want to add our new static constructor method.
  • line 25 adds the new static constructor to the module.

All of the above is still done in memory; nothing has been changed in the assembly file yet. This is what the last method in this sequence is for.

 

WriteAssembly()

The WriteAssembly method saves the changes to the assembly and modifies the .pdb file with respect to these changes. It is quite similar to the ReadInjectionTargetAssembly method we defined before.

private void WriteAssembly(string assemblyFile, string keyfile)
{
	if (InjectionTargetAssembly == null)
	{
		throw new InjectionException("Unable to write the Injection TargetAssembly: InjectionTargetAssembly is null");
	}

	var writeParams = new WriterParameters();

	if (GetPdbFilePath(assemblyFile) != null)
	{
		writeParams.WriteSymbols = true;
		writeParams.SymbolWriterProvider = new PdbWriterProvider();
	}

	if (keyfile != null)
	{
		writeParams.StrongNameKeyPair = new StrongNameKeyPair(File.ReadAllBytes(keyfile));
	}

	InjectionTargetAssembly.Write(assemblyFile, writeParams);
}
  • line 8 creates a new instance of the class we use to configure the write process.
  • line 10 determines whether a .pdb file is present. If so, lines 12 and 13 configure the writer to output to the .pdb file as well.
  • If a keyfile was provided, lines 16-19 will read its contents and use it to generate a strong named assembly.
  • line 21 writes the changes to the assembly using the configured parameters.

 

Source code

All of the code in this article can be found in my GitHub project. It contains 2 solutions with 3 projects:

  • [Solution] Coen.Utilities
      • Coen.Utilities - This is the project in which a third-party assembly (SharpZipLib) is embedded and resolved. The library itself contains a class that makes a simple call to SharpZipLib, forcing the Coen.Utilities library to resolve it at runtime.
      • TestApplication - This project calls the Coen.Utilities library to test whether the embedded assembly could be resolved.
  • [Solution] Injector
      • Injector - This is the code for the tool that modifies the IL to create a module initializer for an assembly.

Closing thoughts

This may not be a solution to an everyday problem. However, I found it very useful and, once in place, a very elegant way to handle this problem. Credits to Einar Egilsson, on whose work this code was built. Moreover, IL weaving is an interesting technique and Mono.Cecil is a powerful library that can be used to do much more than just create module initializers.

Posted on 2019-05-10 11:02 by 空明流光
