Calling Code Dynamically
I've always thought that one of the coolest things about the CLR is the ability to generate and hook in code at runtime. Sure, in C++, you could use LoadLibrary(), but you had to write some complicated code, and it never felt native to the language. And if you wanted to generate code at runtime, you needed the C++ compiler to do it.
A few weeks ago, I was in a discussion with a friend on the CLR team, and he mentioned that they were improving the performance of calling through delegates for Whidbey. I had seen a number of customer questions about writing add-in architectures recently, so I decided to explore the performance aspects of various methods of calling code dynamically.
To do so, I had to decide what scenario I wanted to measure. When doing benchmarking, this is often a contentious issue, as differing choices of scenarios can have huge impacts on the numbers you get out. I decided to measure the speed at which I could call a simple method that incremented the value that was passed in:
Code:
public class Processor
{
    public int Process(int value)
    {
        return value + 1;
    }
}
I chose a method like this because it takes a parameter and returns a value that depends on that parameter. This means that it can't get optimized away, which is a common failing during benchmarking. It is on the small side as functions go, which means that it will show differences between methods more than larger functions. That also means that the results are more representative of theoretical speeds, and less representative of real-world numbers.
I came up with several different ways to call this function.
Direct Call
This one is pretty obvious. Given an instance of the Processor class, write the following code:
Code:
int value = processor.Process(i);
Using Type.InvokeMember()
The Type class provides a generalized way to call any method on an instance of a class, based on the string name of the method. This is very convenient if you don't know the name of the method at compile time. Here's the code:
Code:
Type t = typeof(Processor);
int value =
    (int) t.InvokeMember(
        "Process",
        BindingFlags.Instance | BindingFlags.Public |
        BindingFlags.InvokeMethod,
        null, processor, new object[] {i});
This is obviously more complicated than the direct call to the function. It requires the creation of a temporary object to pass the parameter, and boxing of both the parameter and the return value. If you were guessing that this is slower than the direct call, you'd be correct. To find out how much slower, you'll need to wait for the results.
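To make the hidden costs concrete, here is a minimal, self-contained sketch that spells out each step of the reflective call. The names InvokeMemberDemo, CallByName, and CallCached are hypothetical and do not come from the benchmark. CallCached is a variation that is not one of the methods measured here: it looks up the MethodInfo once and reuses it, which avoids the name lookup on every call but still boxes the argument and the result.

```csharp
using System;
using System.Reflection;

public class Processor
{
    public int Process(int value) { return value + 1; }
}

public static class InvokeMemberDemo
{
    // Calls a public instance method by its string name.
    public static int CallByName(object target, string methodName, int argument)
    {
        // Temporary array allocation; the int argument is boxed into it.
        object[] args = new object[] { argument };

        object boxedResult = target.GetType().InvokeMember(
            methodName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.InvokeMethod,
            null, target, args);

        // The return value comes back boxed and is unboxed by the cast.
        return (int) boxedResult;
    }

    // Variation (an assumption, not part of the benchmark): resolve the
    // method once and reuse the MethodInfo for every call.
    public static int CallCached(MethodInfo method, object target, int argument)
    {
        return (int) method.Invoke(target, new object[] { argument });
    }

    public static void Main()
    {
        Processor processor = new Processor();
        Console.WriteLine(CallByName(processor, "Process", 41));

        MethodInfo method = typeof(Processor).GetMethod("Process");
        Console.WriteLine(CallCached(method, processor, 41));
    }
}
```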
Calling Through an Interface
An interface is a nice way to specify the contract a class should meet without specifying the name of the class:
Code:
public interface IProcessor
{
int Process(int value);
}
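Calling through an interface is very clean, akin to calling directly. In fact, the code looks like the direct call:
Code: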
int value = processor.Process(i);
The only difference is that processor here is an IProcessor variable rather than a Processor variable. Since the interface provides a degree of indirection, we'd expect it to be slower than the direct case.
Calling Through a Delegate
Another, more loosely coupled way of calling is through a delegate. Here's the code to do that:
Code:
// Declared at namespace or class level:
public delegate int ProcessCaller(int value);
// At the call site:
ProcessCaller processCaller = new ProcessCaller(processor.Process);
int value = processCaller(i);
Like the interface case, there's some indirection here.
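As a complete program, the delegate case might look like the sketch below. DelegateDemo and CallThrough are hypothetical names added for illustration. The point is that the caller only needs the delegate type, not the Processor type, so the method it is bound to can be swapped without touching the calling code.

```csharp
using System;

// The delegate type describes the signature: takes an int, returns an int.
public delegate int ProcessCaller(int value);

public class Processor
{
    public int Process(int value) { return value + 1; }
}

public static class DelegateDemo
{
    // Knows nothing about Processor; it only invokes the delegate it is given.
    public static int CallThrough(ProcessCaller caller, int argument)
    {
        return caller(argument);
    }

    public static void Main()
    {
        Processor processor = new Processor();

        // Bind the delegate to an instance method of a specific object.
        ProcessCaller processCaller = new ProcessCaller(processor.Process);

        Console.WriteLine(CallThrough(processCaller, 41));
    }
}
```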
Creating a Custom Delegate
In the previous example, I created the delegate instance directly, which required me to hard-code the method I wanted to call, and for me to have a delegate type that matches the method I want to call.
In some scenarios, you don't know the signature of the method you want to call until runtime. Since I knew that using Type.InvokeMember() was going to be slow, I wanted a better way of calling a member.
A look at the Delegate class shows that it has a CreateDelegate() method that takes an instance and a method name. Unfortunately, it also requires that you pass in the Type object of the delegate that you want to create. That's fine if you know ahead of time what kind of method you want to call, but not if you're trying to do things dynamically.
Therefore, I decided to use reflection to look at the method I was trying to call and then create a custom delegate type on the fly using the classes in the Reflection.Emit namespace. I could then use Delegate.CreateDelegate() to create a delegate of this type, and then call it using DynamicInvoke():
Code:
// methodInfo describes Processor.Process, obtained through reflection
Type delegateType = CreateCustomDelegate(methodInfo);
Delegate p = Delegate.CreateDelegate(delegateType,
    processor, "Process");
int value = (int) p.DynamicInvoke(new object[] {i});
As with the call to Type.InvokeMember(), there is boxing of both the parameter and the return type. There is also the overhead of creating the delegate type on the fly, which is non-trivial.
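The body of CreateCustomDelegate() is not shown above. The following is only a sketch of how such a helper could be written, not the code used in the benchmark. It assumes a runtime that provides the static AssemblyBuilder.DefineDynamicAssembly() method (on older versions of the framework the equivalent call goes through AppDomain.CurrentDomain), and the names CustomDelegateDemo, CallDynamically, and "DynamicDelegates" are made up for the example. A delegate type is a sealed class derived from MulticastDelegate whose constructor and Invoke() method have no IL body, because the runtime supplies their implementation.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Processor
{
    public int Process(int value) { return value + 1; }
}

public static class CustomDelegateDemo
{
    // Hypothetical stand-in for CreateCustomDelegate(): builds a delegate
    // type whose Invoke() signature matches the given method.
    public static Type CreateCustomDelegate(MethodInfo targetMethod)
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicDelegates"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("DynamicDelegates");

        TypeBuilder type = module.DefineType(
            "Delegate_" + targetMethod.Name,
            TypeAttributes.Class | TypeAttributes.Public | TypeAttributes.Sealed |
            TypeAttributes.AnsiClass | TypeAttributes.AutoClass,
            typeof(MulticastDelegate));

        // Delegate constructors take (object target, IntPtr method) and are
        // implemented by the runtime, so no IL is emitted for them.
        ConstructorBuilder constructor = type.DefineConstructor(
            MethodAttributes.RTSpecialName | MethodAttributes.HideBySig |
            MethodAttributes.Public,
            CallingConventions.Standard,
            new Type[] { typeof(object), typeof(IntPtr) });
        constructor.SetImplementationFlags(
            MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

        // Copy the parameter types of the method we want to call.
        ParameterInfo[] parameters = targetMethod.GetParameters();
        Type[] parameterTypes = new Type[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            parameterTypes[i] = parameters[i].ParameterType;
        }

        // Invoke() mirrors the target signature and is also runtime-implemented.
        MethodBuilder invoke = type.DefineMethod(
            "Invoke",
            MethodAttributes.Public | MethodAttributes.HideBySig |
            MethodAttributes.NewSlot | MethodAttributes.Virtual,
            targetMethod.ReturnType, parameterTypes);
        invoke.SetImplementationFlags(
            MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

        return type.CreateType();
    }

    // Builds the delegate type, binds it to the target, and calls it.
    public static int CallDynamically(object target, string methodName, int argument)
    {
        MethodInfo methodInfo = target.GetType().GetMethod(methodName);
        Type delegateType = CreateCustomDelegate(methodInfo);
        Delegate p = Delegate.CreateDelegate(delegateType, target, methodName);
        return (int) p.DynamicInvoke(new object[] { argument });
    }

    public static void Main()
    {
        Console.WriteLine(CallDynamically(new Processor(), "Process", 41));
    }
}
```

Note that this sketch creates a new dynamic assembly on every call to keep the example short. Code that builds many delegate types would presumably create the assembly and module once and reuse them.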
Creating a Custom Class
Creating a delegate type on the fly is great if you need to support many different types of methods, but in this example, there's only one type that I care about. That means that I can also do what I want by creating a custom class that implements the IProcessor interface, and having that custom class call through to the real class. In this case, it will be the Processor class, which implements IProcessor itself, but generally, that wouldn't be the case.
The custom class takes an instance of the wrapped class as a constructor parameter, saves it away, and then uses it to call the appropriate function when the wrapper's Process() function is called.
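The article does not show the wrapper or how it is created at runtime. The sketch below is only a hand-written equivalent of the shape described: a class that implements IProcessor, stores the wrapped instance, and forwards the call. ProcessorWrapper and WrapperDemo are hypothetical names, and Processor deliberately does not implement IProcessor here, to show the general case.

```csharp
using System;

public interface IProcessor
{
    int Process(int value);
}

// The real class; in the general case it knows nothing about IProcessor.
public class Processor
{
    public int Process(int value) { return value + 1; }
}

// Hand-written equivalent of the custom class (hypothetical name).
public class ProcessorWrapper : IProcessor
{
    private readonly Processor wrapped;

    // Takes the wrapped instance as a constructor parameter and saves it.
    public ProcessorWrapper(Processor wrapped)
    {
        this.wrapped = wrapped;
    }

    // Forwards to the real method with an ordinary, strongly typed call.
    public int Process(int value)
    {
        return wrapped.Process(value);
    }
}

public static class WrapperDemo
{
    public static void Main()
    {
        IProcessor processor = new ProcessorWrapper(new Processor());
        Console.WriteLine(processor.Process(41));
    }
}
```

Once the wrapper exists, every call is a plain interface call with no boxing, which is why only the creation cost separates this approach from the interface case.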
Welcome to Function Call Raceway
Now that I had all of the different versions written, I needed a way to compare the results. To do that, you can hand-time your programs, which is okay to get a general sense of speed, but not very rigorous. So, I needed a way to do some automatic timing.
You can use DateTime.Now to time execution, but it doesn't have good resolution, so it only gives good data if your timings are several seconds long. I elected to use interop to talk to the NT performance counters, which are good at timing sub-second operations.
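The interop code is not reproduced here. As a stand-in, the sketch below uses the System.Diagnostics.Stopwatch class, which is available in later versions of the framework and uses a high-resolution counter where the platform provides one. This is a substitution, not the timing code behind the numbers in this article, and TimingDemo and TimeDirectCalls are hypothetical names. Each result is fed back in as the next argument so that the loop cannot be discarded as dead code.

```csharp
using System;
using System.Diagnostics;

public class Processor
{
    public int Process(int value) { return value + 1; }
}

public static class TimingDemo
{
    // Times 'iterations' direct calls and returns the elapsed milliseconds.
    // The final value is returned through 'result' so the work is observable.
    public static double TimeDirectCalls(Processor processor, int iterations, out int result)
    {
        int value = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Each call depends on the previous result.
            value = processor.Process(value);
        }
        stopwatch.Stop();

        result = value;
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        int result;
        double elapsed = TimeDirectCalls(new Processor(), 100000, out result);
        Console.WriteLine("High resolution: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Result: " + result + ", elapsed ms: " + elapsed);
    }
}
```

The same loop can be repeated with each of the other calling methods in place of the direct call to compare them under identical conditions.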
It's fairly simple to collect the data on each method and then write it out to a text file, import it into Excel, and then look at the results, but it's not very exciting. I therefore wrote a small Windows Forms application that collects the data for all the methods, and then runs a virtual race animation to show who wins and loses.
That also allowed me to run the race in two ways: one where I ignore the overhead of creating the custom delegate or class, and one where I count that time.
Results
Before we get to the results, I'd like to talk a bit about what they mean.
Because the method we're calling does a trivial amount of work, it will accentuate the differences that you see between different methods. The more work the method does, the less difference there will be between the different methods of calling, and therefore big differences here may be inconsequential in actual code.
Figure 1 is a pretty picture of the results for Visual Studio 7.1, when calling a method 100,000 times:
Figure 1. Screenshot of results
It isn't surprising that the direct call is the fastest. The call that is being made can be easily put inline by the JIT, which means that there is no call overhead. Interfaces suffer compared to direct calls because they can't be put inline (and this is true of virtual calls as well). Delegates are a bit slower than interfaces.
InvokeMember() is very slow, with interface calls being some 200 times faster than InvokeMember() calls. InvokeMember() does a tremendous amount of work to make sure it is called safely and that it is calling the right method, and this shows. Similarly for the custom delegate case, calling DynamicInvoke() does a lot of work to make sure the call is okay.
The CustomClass version works okay, but it suffers from a lot of overhead in creating the custom class in the first place, which means it's only about 5 percent as fast as a direct call for 100,000 calls. As the number of calls goes up, the overhead will matter less, and it will be roughly the speed of the interface call.
What About Whidbey?
In Whidbey, the results are much the same, with the exception of the delegate case, which has been optimized, and now provides similar performance to interfaces. So, if you were unhappy with the performance of delegates, you should be happier with what you get in Whidbey.
Originally created by Eric Gunnerson at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncscol/html/csharp02172004.asp