Microsoft’s .NET


Matthew Conover, May 2002

Introduction to .NET


The name “.NET” is misleading. It likely comes from Microsoft’s original idea of web services. However, .NET is much more than that. It is a framework that includes an execution engine (i.e., a virtual machine) that allows applications to be platform-independent, as Java does. Unlike Java, it is also language-independent: programs can be written in any .NET-supported language.

Major Components


.NET has two major components: the Base Class Library (BCL) and Common Language Runtime (CLR).

The BCL is similar to Java’s standard class library and has classes to support things such as I/O, threading, database access, text, graphics, socket programming, security, cryptography, COM, etc.



The CLR is made up of three subcomponents: the Common Type System (CTS), the Common Language Specification (CLS), and the Execution Engine.

The CTS specifies the types a language has to support to be hosted by the CLR. It defines rules for classes, structs, enums, interfaces, delegates, etc. Everything is actually implemented as an object (including ints, chars, etc.). This is an important concept because it prevents buffer overflows in managed code. For example, assume you have an int[] buffer that holds 10 elements. While it may appear that you have an array of 10 integers laid out sequentially in memory, it is actually a System.Array. Attempting to write past the end simply results in an exception being thrown.

The CLS is a set of guidelines that an application or .NET implementation must follow to be CLS-compliant.

The Execution Engine handles object references and layouts, garbage collection, access checks, and safe vs. unsafe methods, and it is the location of the JIT (just-in-time) compiler that converts Microsoft Intermediate Language (MSIL) into native code. MSIL is the bytecode .NET uses. This is what allows an application to be platform-independent.

Assembly


An assembly is just like a typical Windows application (it uses the Portable Executable, or PE, format). The difference is that it fully describes the functions, types, and classes used within the assembly. In other words, it is self-describing. Each assembly is made up of a manifest, metadata, MSIL code, and resources. Assemblies can be spread across several files, and it is the manifest’s responsibility to provide the information on where each of these files is located. The MSIL code is stored before the metadata in the .text section. Resources contain the icons, images, and other data. The metadata is the central piece of a .NET assembly.
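
Because an assembly is an ordinary PE file, one way to tell it apart from a native executable is to look for the CLI header entry in the PE data directories. The following is a minimal sketch, not taken from the original article: the function name is hypothetical, and it assumes the standard PE layout (e_lfanew at offset 0x3C, a 20-byte COFF header, and the CLI header in data-directory slot 14).

```python
import struct

CLI_HEADER_DIRECTORY_INDEX = 14  # data-directory slot that describes the CLI header


def cli_header_directory(pe_bytes):
    """Return (rva, size) of the CLI header directory, or None if it is absent."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # signature + 20-byte COFF header
    (magic,) = struct.unpack_from("<H", pe_bytes, optional_header)
    # Data directories start 96 bytes into a PE32 optional header, 112 for PE32+.
    if magic == 0x10B:
        directories = optional_header + 96
    elif magic == 0x20B:
        directories = optional_header + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes is the 4-byte field just before the directories.
    (count,) = struct.unpack_from("<I", pe_bytes, directories - 4)
    if count <= CLI_HEADER_DIRECTORY_INDEX:
        return None
    rva, size = struct.unpack_from(
        "<II", pe_bytes, directories + 8 * CLI_HEADER_DIRECTORY_INDEX)
    return (rva, size) if rva and size else None
```

A non-None result suggests the file is a .NET assembly; the returned RVA still has to be translated to a file offset before the CLI header itself can be read.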

Metadata


All of the classes, types, constants, etc. used by a .NET application are stored in the metadata. The metadata is divided into several streams, or heaps. On Microsoft .NET, there are five heaps: #US, #Strings, #Blob, #GUID, and #~. The #US heap is a user string heap where all strings used in user code are located. For example, if a program does Print(“hello”), the “hello” will be kept in the #US heap. The #Strings heap contains things like method names, file names, etc. The #Blob heap contains binary data referenced by the assembly, such as method signatures. The #GUID heap contains GUIDs, such as the one that identifies the module. The #~ heap contains a set of important tables. These tables are in a predefined order and define the important contents of the .NET assembly, such as the AssemblyRef, MethodRef, MethodDef, and Param tables. The AssemblyRef table lists the external assemblies this assembly depends on. The MethodRef table lists the external methods this assembly uses. The MethodDef table contains all the methods defined in the assembly. The Param table contains all the parameters to all of the methods in the MethodDef table. There are a few dozen different tables in total.
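
To make the stream layout concrete, here is a minimal sketch of how the stream headers could be listed from the start of the metadata. It is not from the original article: the function name is hypothetical, and it assumes the standard metadata root layout (a "BSJB" signature, a length-prefixed version string, then a count of stream headers whose names are null-terminated and padded to four bytes).

```python
import struct


def read_stream_headers(metadata):
    """List (name, offset, size) for each stream declared in a metadata root."""
    signature, _major, _minor, _reserved, version_len = struct.unpack_from(
        "<IHHII", metadata, 0)
    if signature != 0x424A5342:  # the bytes "BSJB"
        raise ValueError("bad metadata signature")
    pos = 16 + version_len  # skip the version string (its length is already padded)
    _flags, stream_count = struct.unpack_from("<HH", metadata, pos)
    pos += 4
    streams = []
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", metadata, pos)
        pos += 8
        end = metadata.index(b"\0", pos)
        name = metadata[pos:end].decode("ascii")
        # Step over the terminator, then round up to a 4-byte boundary.
        pos = (end + 1 + 3) & ~3
        streams.append((name, offset, size))
    return streams
```

Each returned offset is relative to the start of the metadata root, so a caller would add it to the root's own file offset to find a heap such as #US or #~.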

Method Table

Each row in the method table contains the RVA (relative virtual address) of the method, implementation flags, method flags, the method name, an offset in the #Blob heap to the method’s signature, and an index into the Param table that points to the first parameter of the method. The RVA gives the relative address of the method body (which contains the IL) in the .text section. The flags specify whether the method is native or managed, whether it is a special method (constructor/property/event) or a normal method, etc. The method signature specifies the calling convention, return type, and parameter types of the method.
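
The row layout described above can be sketched as a fixed-size record. This is an illustration rather than the article's code: the names are hypothetical, and it assumes every heap and table index is two bytes wide, which holds only for small assemblies (larger ones use four-byte indexes).

```python
import struct
from collections import namedtuple

MethodRow = namedtuple(
    "MethodRow",
    "rva impl_flags flags name_index signature_index param_index")

# Assumption: 2-byte heap and table indexes, giving a 14-byte row.
ROW_FORMAT = "<IHHHHH"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def read_method_row(table_bytes, row_number):
    """Decode row `row_number` (1-based, as metadata tokens count) of the table."""
    offset = (row_number - 1) * ROW_SIZE
    return MethodRow(*struct.unpack_from(ROW_FORMAT, table_bytes, offset))
```

The name_index and signature_index fields are offsets into the #Strings and #Blob heaps respectively, and param_index is a row number in the Param table.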


MSIL

MSIL is a pseudo-assembly language. It has instructions such as nop, break, call, ret, etc. Rather than using an address offset as an x86 call does, a call in MSIL refers to its target by a method token. The JIT compiler will look up the method token and replace it with the target address when it converts the IL into native code. MSIL uses a stack-based model. Calling Print(1, 2, 3) would result in the following:

    ldc.i4.1
    ldc.i4.2
    ldc.i4.3
    call Print

This pushes 1, 2, and 3 onto the stack as the arguments to Print. Arguments to methods as well as the return values from methods are always placed on the stack. There are opcodes to handle all of these cases: ldarg loads an argument onto the stack, ldloc loads the value of a local variable onto the stack, ldlen loads the length of an array onto the stack, ldc loads a constant value onto the stack, etc. Likewise, there are st* instructions, which store values from the stack to various locations. There are many other supporting instructions, and the instruction set is very well defined.
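
The stack discipline can be illustrated with a toy evaluator. This is a sketch of the idea only, not how the CLR executes IL: it handles a tiny MSIL-like subset (ldc.i4.N, add, and call), and the `methods` table that maps a name to a Python function and its argument count is an assumption made for the example.

```python
def run(instructions, methods):
    """Evaluate a tiny MSIL-like subset and return the final stack."""
    stack = []
    for instruction in instructions:
        opcode, _, operand = instruction.partition(" ")
        if opcode.startswith("ldc.i4."):
            # Push the constant encoded in the opcode name (toy: any integer).
            stack.append(int(opcode.rsplit(".", 1)[1]))
        elif opcode == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "call":
            function, arg_count = methods[operand]
            first_arg = len(stack) - arg_count
            args = stack[first_arg:]      # arguments in the order they were pushed
            del stack[first_arg:]
            result = function(*args)
            if result is not None:
                stack.append(result)      # a return value goes back on the stack
        else:
            raise ValueError("unsupported opcode: " + opcode)
    return stack
```

With a `Print` entry that takes three arguments, running the four instructions shown earlier pops 1, 2, and 3 and leaves the stack empty, since Print returns nothing.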

Microsoft’s .NET Implementation


The Microsoft .NET Framework is stored in %SystemRoot%\Microsoft.NET, where SystemRoot is usually \WinNT or \Windows. This is where the executables, DLLs, and configuration files that make up the .NET Framework are located. In addition, there is something known as the Global Assembly Cache (GAC), which is stored in %SystemRoot%\Assembly. It is a quick and easy way to move assemblies from one machine to another. Once an assembly is put into the GAC, all other assemblies can reference it. This is where the system libraries of the BCL are kept.

There is also a tool (Ngen.exe, the Native Image Generator) that converts the IL in an assembly into native code. Since the JIT isn’t needed, the code runs much faster. For this reason, the BCL is stored in the GAC in native form. If .NET notices that there is a native version of an assembly available, it will use that instead for reasons of speed.

There are six important libraries used in loading a .NET application:

  • Mscoree.dll (the Execution Engine)

  • Mscorwks.dll (where most of the work happens)

  • Mscorjit.dll (the JIT)

  • Mscorsn.dll (handles strong name verification)

  • Mscorlib.dll (the BCL)

  • Fusion.dll (assembly binding)

A .NET application has only a single instruction at its entry point: a jump through the _CorExeMain entry of the Import Address Table. This references _CorExeMain of mscoree.dll, which begins the process of loading the .NET application. Mscoree.dll is the only DLL imported by a .NET application. It calls _CorExeMain of mscorwks.dll. Mscorwks.dll is a large DLL where most of the loading happens. It loads the BCL and then calls the .NET application’s Main() function. Because Main() hasn’t been compiled yet, the code of Main() jumps back into mscorwks.dll to be compiled. This calls JITFunction, which loads the JIT from mscorjit.dll. Once the IL code has been compiled into native code, control returns to Main() and it begins executing.
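
The compile-on-first-call behavior can be modeled in a few lines. This is a conceptual toy, not a description of how mscorwks.dll or mscorjit.dll are implemented: the class name and the `compiler` callable that stands in for the JIT are both hypothetical.

```python
class LazyMethod:
    """Toy model of a method whose IL is compiled the first time it is called."""

    def __init__(self, name, il, compiler):
        self.name = name
        self.il = il
        self._compiler = compiler
        self._native = None  # no native code exists until the first call

    def __call__(self, *args):
        if self._native is None:
            # First call: hand the IL to the stand-in JIT and cache the result.
            self._native = self._compiler(self.il)
        # Every later call goes straight to the cached native code.
        return self._native(*args)
```

The point of the model is that the compiler runs once per method, no matter how many times the method is called afterwards.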


.NET Hook Library

The .NET Hook Library (dotNetHookLibrary) is a library that allows inserting “hook code” into any method defined in an assembly. An API is provided to register a hook function, which then has the opportunity to inject (or not inject) the hook code into each particular method. The library exports Load, Hook, and Save. Load loads the assembly, Hook specifies a callback hook function that is called for each method, and Save saves the changes to the assembly.

The contents of the hook code are up to the user-specified callback function. This allows a user of dotNetHookLibrary to decide whether and how to hook a method based on its name, parameters, type, etc. When a user decides to hook a method, dotNetHookLibrary creates a copy of the method header, method body, and exception handling data. It inserts the hooking code at the beginning of the method body. For the purposes of demonstration, it simply inserts code that calls a Print() method (which must exist) and prints out the name of the method called. The hooking code could do anything, however. It could also check the parameter types and print out their values, which would allow prying into the behavior of a .NET application. Because the callback runs once per method, .NET Hook has the opportunity to generate hook code dynamically.
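
The callback-driven flow can be sketched on a toy representation in which each method body is a list of IL mnemonics. Everything here is hypothetical and is not dotNetHookLibrary's actual API or signatures: the function names, the dictionary of methods, and the callback contract (return hook instructions, or None to leave the method alone) are assumptions made for illustration.

```python
def hook_assembly(methods, callback):
    """Return a copy of `methods` with hook code prepended where requested.

    `methods` maps a method name to its list of IL instructions. The callback
    receives the name and body and returns hook instructions, or None to skip.
    """
    hooked = {}
    for name, body in methods.items():
        hook_code = callback(name, body)
        if hook_code:
            hooked[name] = list(hook_code) + list(body)  # hook code runs first
        else:
            hooked[name] = list(body)                    # left undisturbed
    return hooked


def print_name_hook(name, body):
    """Demonstration callback: print the method's name on entry."""
    if name == "Print":
        return None  # never hook Print itself, or it would call itself forever
    return ['ldstr "' + name + '"', "call Print"]
```

For example, hooking a method named Main with `print_name_hook` places a string load and a call to Print ahead of its original instructions, while Print is returned unchanged.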

First, the hooked methods are stored in full at the end of the current .text section. This must be done because the .text section of a .NET assembly is fairly compact and there usually won’t be enough room to store all the hooked functions in place. One may ask why I don’t simply change the RVA of the method to point to the hooking code and then have the hooking code call the original function. The reason, as I described earlier, is that calls in IL use method tokens rather than offsets. So if I tried to reference the code I hooked, I would need to do so by its method token. Since I changed the method’s RVA to point to the hook code, the call would reach the hook code again rather than the original method. The only way to call the original code of the method would be to treat it as another method. However, there are two reasons that approach isn’t feasible: (1) it would require adding a new entry to the method table, and (2) there is no space available to add a new entry.

Second, the RVA of the method body in the MethodDef table is updated to point to the location of the new hooked method.

Third, the size of the .text section is increased to accommodate the new code. There are both virtual and raw sizes that need to be accounted for. The virtual size is the actual size of the section, and the raw size is that size rounded up to the next file alignment. Virtual addresses and sizes describe how an executable file will be loaded in memory. If, for example, the .text section has a virtual address of 0x1000, then the .text section is located at offset 0x1000 from the image base in a running process’s memory. However, the raw address of the .text section may be 0x200, which means that the .text section in the actual file is located at offset 0x200.
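
The size and address arithmetic in this step can be written out directly. The helpers below are a minimal sketch with hypothetical names; they assume the alignment is a power of two, as PE file and section alignments are.

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def rva_to_file_offset(rva, section_rva, section_raw_offset):
    """Translate an RVA inside a section to its offset in the file on disk."""
    return rva - section_rva + section_raw_offset


def grow_section(virtual_size, added_bytes, file_alignment):
    """Return the new (virtual_size, raw_size) after appending `added_bytes`."""
    new_virtual = virtual_size + added_bytes
    return new_virtual, align_up(new_virtual, file_alignment)
```

With the numbers from the text, a section at virtual address 0x1000 and raw address 0x200 places RVA 0x1050 at file offset 0x250.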

Fourth, the subsequent sections, such as the data and relocation sections, are shifted as needed. This must be done when the end of the new .text section extends past the beginning of the next section. When this happens, the next section has to be moved to the next section-aligned virtual address and the next file-aligned raw offset.
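
The shifting rule can be sketched as a single pass over the section list. This is an illustration under stated assumptions, not the library's code: the function name and the dictionary keys are hypothetical, and the sketch only computes new positions. It does not fix up anything that refers to the moved sections by RVA (data directories, relocations, and so on), which a real tool would also have to do.

```python
def shift_following_sections(sections, section_alignment, file_alignment):
    """Re-place each section after the first so none overlaps its predecessor.

    `sections` is an ordered list of dicts with the keys rva, virtual_size,
    raw_offset, and raw_size; the first entry is the already-enlarged .text.
    """
    def align_up(value, alignment):
        return (value + alignment - 1) & ~(alignment - 1)

    placed = [dict(sections[0])]
    for section in sections[1:]:
        previous = placed[-1]
        moved = dict(section)
        # Earliest positions that do not overlap the previous section.
        min_rva = align_up(
            previous["rva"] + previous["virtual_size"], section_alignment)
        min_raw = align_up(
            previous["raw_offset"] + previous["raw_size"], file_alignment)
        # Sections only ever move forward; untouched ones keep their place.
        moved["rva"] = max(section["rva"], min_rva)
        moved["raw_offset"] = max(section["raw_offset"], min_raw)
        placed.append(moved)
    return placed
```

Because each section is placed relative to the one before it, moving one section can cascade and push every later section forward as well.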

Finally, the PE headers are updated. The hooking code is now directly embedded into the assembly. The non-hooked methods have been left undisturbed.


Conclusion

For more information, see http://dotnethook.sourceforge.net. Thanks to Entercept’s Ricochet team (http://www.entercept.com/ricochet) and w00w00 (http://www.w00w00.org) for their feedback.