Benchmarking small code samples in C#: can this implementation be improved? (profiling)

Accepted answer
Score: 102

Here is the modified function, as recommended by the community. Feel free to amend it; it's a community wiki.

static double Profile(string description, int iterations, Action func) {
    //Run at highest priority to minimize fluctuations caused by other processes/threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    return watch.Elapsed.TotalMilliseconds;
}

Make sure you compile in Release with optimizations enabled, and run the tests outside of Visual Studio. This last part is important because the JIT stints its optimizations with a debugger attached, even in Release mode.

Score: 23

Finalisation won't necessarily be completed before GC.Collect returns. Finalisation is queued and then run on a separate thread. This thread could still be active during your tests, affecting the results.

If you want to ensure that finalisation has completed before starting your tests, then you might want to call GC.WaitForPendingFinalizers, which will block until the finalisation queue is cleared:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Score: 16

If you want to take GC interactions out of the equation, you may want to run your 'warm up' call after the GC.Collect call, not before. That way you know .NET will already have enough memory allocated from the OS for the working set of your function.

Keep in mind that you're making a non-inlined method call for each iteration, so make sure you compare the things you're testing to an empty body. You'll also have to accept that you can only reliably time things that are several times longer than a method call.
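To make the empty-body comparison concrete, here is a hedged sketch (the helper names `TimeIterations` and `NetMilliseconds` are illustrative, not from the original post): time an empty delegate with the same loop, then subtract it from the measured time to approximate the cost of the work alone.

```csharp
using System;
using System.Diagnostics;

static class Baseline
{
    // Time `iterations` calls of `func` through the delegate, with one warm-up call.
    public static double TimeIterations(int iterations, Action func)
    {
        func(); // warm up: JIT-compile func before timing
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    // Approximate the cost of the work alone by subtracting the empty-body time.
    // The result is noisy for very cheap bodies, per the caveat above.
    public static double NetMilliseconds(int iterations, Action func)
    {
        double empty = TimeIterations(iterations, () => { });
        double work = TimeIterations(iterations, func);
        return work - empty;
    }
}
```

Note that for bodies that are not several times longer than the delegate call itself, the subtraction is dominated by noise, which is exactly the limitation described above.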

Also, depending on what kind of stuff you're profiling, you may want to base your timing on running for a certain amount of time rather than for a certain number of iterations. That tends to lead to more easily comparable numbers, without having a very short run for the best implementation and/or a very long one for the worst.
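A duration-based harness could be sketched like this (hypothetical helper, not from the original post): run the body repeatedly until a target wall-clock time has elapsed and report throughput instead of total time.

```csharp
using System;
using System.Diagnostics;

static class DurationProfiler
{
    // Run func repeatedly until at least targetMs of wall-clock time has elapsed,
    // then report iterations per second instead of total elapsed time.
    public static double IterationsPerSecond(int targetMs, Action func)
    {
        func(); // warm up
        var watch = Stopwatch.StartNew();
        long count = 0;
        while (watch.ElapsedMilliseconds < targetMs)
        {
            func();
            count++;
        }
        watch.Stop();
        return count / watch.Elapsed.TotalSeconds;
    }
}
```

Throughput numbers from fast and slow implementations are then directly comparable, since every run takes roughly the same wall-clock time.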

Score: 7

I think the most difficult problem to overcome with benchmarking methods like this is accounting for edge cases and the unexpected. For example: "How do the two code snippets work under high CPU load/network usage/disk thrashing/etc.?" They're great for basic logic checks to see if a particular algorithm works significantly faster than another. But to properly test most code performance you'd have to create a test that measures the specific bottlenecks of that particular code.

I'd still say that testing small blocks of code often has little return on investment and can encourage using overly complex code instead of simple maintainable code. Writing clear code that other developers, or myself 6 months down the line, can understand quickly will have more performance benefits than highly optimized code.

Score: 6

I'd avoid passing the delegate at all:

  1. A delegate call is comparable to a virtual method call. Not cheap: ~25% of the smallest memory allocation in .NET. If you're interested in details, see e.g. this link.
  2. Anonymous delegates may lead to usage of closures, which you won't even notice. Again, accessing closure fields is noticeably slower than e.g. accessing a variable on the stack.

An example of code leading to closure usage:

public void Test()
{
  int someNumber = 1;
  Profiler.Profile("Closure access", 1000000, 
    () => someNumber + someNumber);
}

If you're not aware of how closures work, take a look at this method in .NET Reflector.

Score: 5

I'd call func() several times for the warm-up, not just once.

Score: 4

Suggestions for improvement

  1. Detecting if the execution environment is good for benchmarking (such as detecting if a debugger is attached or if JIT optimization is disabled, which would result in incorrect measurements).

  2. Measuring parts of the code independently (to see exactly where the bottleneck is).

  3. Comparing different versions/components/chunks of code (In your first sentence you say '... benchmarking small chunks of code to see which implementation is fastest.').

Regarding #1:

  • To detect if a debugger is attached, read the property System.Diagnostics.Debugger.IsAttached (remember to also handle the case where the debugger is initially not attached, but is attached after some time).

  • To detect if JIT optimization is disabled, read the property DebuggableAttribute.IsJITOptimizerDisabled of the relevant assemblies:

    private bool IsJitOptimizerDisabled(Assembly assembly)
    {
        return assembly.GetCustomAttributes(typeof (DebuggableAttribute), false)
            .Select(customAttribute => (DebuggableAttribute) customAttribute)
            .Any(attribute => attribute.IsJITOptimizerDisabled);
    }
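The two checks from point #1 could be combined into a single guard; a sketch, assuming the check should run against the assembly under test (the class and method names here are illustrative):

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Reflection;

static class EnvironmentCheck
{
    // Returns true when the current process looks suitable for benchmarking:
    // no debugger attached, and JIT optimization enabled for the given assembly.
    public static bool IsSuitableForBenchmarking(Assembly assembly)
    {
        if (Debugger.IsAttached)
            return false;

        bool jitOptimizerDisabled = assembly
            .GetCustomAttributes(typeof(DebuggableAttribute), false)
            .Cast<DebuggableAttribute>()
            .Any(attribute => attribute.IsJITOptimizerDisabled);

        return !jitOptimizerDisabled;
    }
}
```

Note this only catches a debugger attached at startup; per the caveat above, a debugger attached later would need a re-check before each measurement.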
    

Regarding #2:

This can be done in many ways. One way is to allow several delegates to be supplied and then measure those delegates individually.
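One possible shape for that (an illustrative sketch, not code from any of the answers): accept named parts and run the same timing loop over each one, so the per-part bottleneck becomes visible.

```csharp
using System;
using System.Diagnostics;

static class PartProfiler
{
    // Profile several named parts independently, so it is visible which part
    // of the code is the bottleneck.
    public static void ProfileParts(int iterations, params (string Name, Action Body)[] parts)
    {
        foreach (var (name, body) in parts)
        {
            body(); // warm up each part separately
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                body();
            }
            watch.Stop();
            Console.WriteLine("{0}: {1} ms", name, watch.Elapsed.TotalMilliseconds);
        }
    }
}
```

Usage would look like `PartProfiler.ProfileParts(1000000, ("parse", Parse), ("format", Format));` with whatever parts the caller wants to separate.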

Regarding #3:

This could also be done in many ways, and different use cases would demand very different solutions. If the benchmark is invoked manually, then writing to the console might be fine. However, if the benchmark is performed automatically by the build system, then writing to the console is probably not so fine.

One way to do this is to return the benchmark result as a strongly typed object that can easily be consumed in different contexts.
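A minimal version of such a result type might look like this (a hypothetical sketch; a real benchmarking component would carry far more detail, such as per-run samples and environment information):

```csharp
using System;
using System.Diagnostics;

// A benchmark result as a plain object, so callers decide how to report it:
// console output, build-system log, or an assertion in an automated test.
public sealed class BenchmarkResult
{
    public string Description { get; }
    public int Iterations { get; }
    public TimeSpan Elapsed { get; }
    public double MillisecondsPerIteration => Elapsed.TotalMilliseconds / Iterations;

    public BenchmarkResult(string description, int iterations, TimeSpan elapsed)
    {
        Description = description;
        Iterations = iterations;
        Elapsed = elapsed;
    }
}

static class Profiler
{
    public static BenchmarkResult Profile(string description, int iterations, Action func)
    {
        func(); // warm up
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        watch.Stop();
        return new BenchmarkResult(description, iterations, watch.Elapsed);
    }
}
```

A manual caller can still `Console.WriteLine` the result, while a build system can compare `MillisecondsPerIteration` against a threshold instead.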


Etimo.Benchmarks

Another approach is to use an existing component to perform the benchmarks. Actually, at my company we decided to release our benchmark tool to the public domain. At its core, it manages the garbage collector, jitter, warmups etc., just like some of the other answers here suggest. It also has the three features I suggested above. It manages several of the issues discussed in Eric Lippert's blog.

This is an example output where two components are compared and the results are written to the console. In this case the two components compared are called 'KeyedCollection' and 'MultiplyIndexedKeyedCollection':

Etimo.Benchmarks - Sample Console Output

There is a NuGet package, a sample NuGet package, and the source code is available at GitHub. There is also a blog post.

If you're in a hurry, I suggest you get the sample package and simply modify the sample delegates as needed. If you're not in a hurry, it might be a good idea to read the blog post to understand the details.

Score: 1

You must also run a "warm up" pass prior to actual measurement to exclude the time the JIT compiler spends on jitting your code.

Score: 1

Depending on the code you are benchmarking and the platform it runs on, you may need to account for how code alignment affects performance. To do so would probably require an outer wrapper that ran the test multiple times (in separate app domains or processes?), some of the times first calling "padding code" to force it to be JIT compiled, so as to cause the code being benchmarked to be aligned differently. A complete test result would give the best-case and worst-case timings for the various code alignments.

Score: 1

If you're trying to eliminate Garbage Collection impact from the benchmark completely, is it worth setting GCSettings.LatencyMode?

If not, and you want the impact of garbage created in func to be part of the benchmark, then shouldn't you also force collection at the end of the test (inside the timer)?
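The first suggestion could look like this (a sketch; GCSettings.LatencyMode with GCLatencyMode.SustainedLowLatency requires .NET 4.5 or later, and only suppresses blocking collections rather than eliminating GC entirely):

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

static class GcControlledProfile
{
    // Suppress most blocking GC activity while timing, then restore the previous mode.
    public static double Profile(int iterations, Action func)
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            func(); // warm up
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                func();
            }
            watch.Stop();
            return watch.Elapsed.TotalMilliseconds;
        }
        finally
        {
            // Always restore, even if func throws.
            GCSettings.LatencyMode = oldMode;
        }
    }
}
```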

Score: 0

The basic problem with your question is the assumption that a single measurement can answer all your questions. You need to measure multiple times to get an effective picture of the situation, especially in a garbage-collected language like C#.

Another answer gives an okay way of measuring the basic performance.

static void Profile(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

However, this single measurement does not account for garbage collection. A proper profile additionally accounts for the worst case performance of garbage collection spread out over many calls (this number is sort of useless as the VM can terminate without ever collecting left over garbage, but it is still useful for comparing two different implementations of func).

static void ProfileGarbageMany(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

And one might also want to measure the worst case performance of garbage collection for a method that is only called once.

static void ProfileGarbage(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

But more important than recommending any specific possible additional measurements to profile is the idea that one should measure multiple different statistics, and not just one kind of statistic.
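The "measure multiple times" advice can be made concrete with a sketch like the following (the helper name and the choice of min/median/max are illustrative): repeat the whole measurement several times and summarize the distribution, rather than trusting a single run.

```csharp
using System;
using System.Diagnostics;

static class MultiRunProfiler
{
    // Repeat the whole measurement `runs` times and summarize the distribution,
    // instead of reporting a single, possibly unlucky, number.
    public static (double Min, double Median, double Max) ProfileRuns(
        int runs, int iterations, Action func)
    {
        func(); // warm up
        var samples = new double[runs];
        for (int r = 0; r < runs; r++)
        {
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                func();
            }
            watch.Stop();
            samples[r] = watch.Elapsed.TotalMilliseconds;
        }
        Array.Sort(samples);
        return (samples[0], samples[runs / 2], samples[runs - 1]);
    }
}
```

The gap between min and max is itself informative: a large spread suggests GC pauses or interference from other processes, exactly the effects the answers above try to control.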
