Tim! Posted October 17, 2009 Share Posted October 17, 2009 A common rendering inefficiency amongst many code samples is caused by the repetitive accessing of properties of the Rectangle class in for-loops in the Render method. Consider the following code sample. The second style is faster because it uses local variables to access the rectangle data. The first style makes repeated subroutine calls to access the rectangle data: Hidden Content: using System.Diagnostics; using System.Drawing; using System.Windows.Forms; namespace WindowsFormsApplication1 { public partial class Form1 : Form { public Form1() { InitializeComponent(); } private void buttonLoops_Click(object sender, System.EventArgs e) { Cursor.Current = Cursors.WaitCursor; Stopwatch sw = new Stopwatch(); Rectangle rect = new Rectangle(0, 0, 10000, 10000); string str = ""; sw.Start(); for (int y = rect.Top; y < rect.Bottom; y++) { for (int x = rect.Left; x < rect.Right; x++) { int result = 0; } } sw.Stop(); str += "Loop 1: " + sw.ElapsedTicks + "\r\n"; sw.Reset(); sw.Start(); int right = rect.Right; int left = rect.Left; int top = rect.Top; int bottom = rect.Bottom; for (int y = top; y < bottom; y++) { for (int x = left; x < right; x++) { int result = 0; } } sw.Stop(); str += "Loop 2: " + sw.ElapsedTicks + "\r\n"; str += "Ticks per second: " + Stopwatch.Frequency; MessageBox.Show(str); Cursor.Current = Cursors.Default; } } } Another common problem is unecessary recalculation of for-loop limits every time the loop is executed. Limits should be calculated before the for-loops. In Example 2 below, for an image of 800x600px, it saves 480598 computations (600 + 800 x 600 - 2) in comparison to Example 1: Hidden Content: // // Example 1 // int a = ComputeA(); int b = ComputeB(); int left = rect.Left; int right = rect.Right; int top = rect.Top; int bottom = rect.Bottom; for (int y = top; y < bottom - a; y++) { for (int x = left; x < right - b; x++) { int result = 0; } } // // Example 2 // int a = ComputeA(); int b = ComputeB(); int left = rect.Left; int right = rect.Right - b; int top = rect.Top; int bottom = rect.Bottom - a; for (int y = top; y < bottom; y++) { for (int x = left; x < right; x++) { int result = 0; } } Quote Link to comment Share on other sites More sharing options...
Illnab1024 Posted October 17, 2009 Share Posted October 17, 2009 Just use csc with the optimize code option... or optimize code in visual studio project properties. It does all this. Also, I believe the Effects API in Paint.NET does some refactoring. Quote ~~ Link to comment Share on other sites More sharing options...
Tim! Posted October 17, 2009 Author Share Posted October 17, 2009 Just use csc with the optimize code option... or optimize code in visual studio project properties. It does all this.Also, I believe the Effects API in Paint.NET does some refactoring. I checked optimise code in project properties (Release). Loop 2 runs 3.64 times faster than Loop 1. Here's my ILDASM where Loop 1 uses call instance and Loop 2 uses ldloc.s: Hidden Content: .method private hidebysig instance void buttonLoops_Click(object sender, class [mscorlib]System.EventArgs e) cil managed { .maxstack 5 .locals init ( [0] class [system]System.Diagnostics.Stopwatch sw, [1] valuetype [system.Drawing]System.Drawing.Rectangle rect, [2] string str, [3] int32 y, [4] int32 x, [5] int32 left, [6] int32 right, [7] int32 top, [8] int32 bottom, [9] int32 V_9, [10] int32 V_10, [11] object CS$0$0000, [12] object[] CS$0$0001, [13] object CS$0$0002, [14] object[] CS$0$0003) L_0000: call class [system.Windows.Forms]System.Windows.Forms.Cursor [system.Windows.Forms]System.Windows.Forms.Cursors::get_WaitCursor() L_0005: call void [system.Windows.Forms]System.Windows.Forms.Cursor::set_Current(class [system.Windows.Forms]System.Windows.Forms.Cursor) L_000a: newobj instance void [system]System.Diagnostics.Stopwatch::.ctor() L_000f: stloc.0 L_0010: ldloca.s rect L_0012: ldc.i4.0 L_0013: ldc.i4.0 L_0014: ldc.i4 0x2710 L_0019: ldc.i4 0x2710 L_001e: call instance void [system.Drawing]System.Drawing.Rectangle::.ctor(int32, int32, int32, int32) L_0023: ldstr "" L_0028: stloc.2 L_0029: ldloc.0 L_002a: callvirt instance void [system]System.Diagnostics.Stopwatch::Start() L_002f: ldloca.s rect L_0031: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Top() L_0036: stloc.3 L_0037: br.s L_0059 L_0039: ldloca.s rect L_003b: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Left() L_0040: stloc.s x L_0042: br.s L_004a L_0044: ldloc.s x L_0046: ldc.i4.1 L_0047: add.ovf L_0048: stloc.s x L_004a: ldloc.s x L_004c: ldloca.s rect L_004e: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Right() L_0053: blt.s L_0044 L_0055: ldloc.3 L_0056: ldc.i4.1 L_0057: add.ovf L_0058: stloc.3 L_0059: ldloc.3 L_005a: ldloca.s rect L_005c: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Bottom() L_0061: blt.s L_0039 L_0063: ldloc.0 L_0064: callvirt instance void [system]System.Diagnostics.Stopwatch::Stop() L_0069: ldloc.2 L_006a: stloc.s CS$0$0000 L_006c: ldc.i4.4 L_006d: newarr object L_0072: stloc.s CS$0$0001 L_0074: ldloc.s CS$0$0001 L_0076: ldc.i4.0 L_0077: ldloc.s CS$0$0000 L_0079: stelem.ref L_007a: ldloc.s CS$0$0001 L_007c: ldc.i4.1 L_007d: ldstr "Loop 1: " L_0082: stelem.ref L_0083: ldloc.s CS$0$0001 L_0085: ldc.i4.2 L_0086: ldloc.0 L_0087: callvirt instance int64 [system]System.Diagnostics.Stopwatch::get_ElapsedTicks() L_008c: box int64 L_0091: stelem.ref L_0092: ldloc.s CS$0$0001 L_0094: ldc.i4.3 L_0095: ldstr "\r\n" L_009a: stelem.ref L_009b: ldloc.s CS$0$0001 L_009d: call string [mscorlib]System.String::Concat(object[]) L_00a2: stloc.2 L_00a3: ldloc.0 L_00a4: callvirt instance void [system]System.Diagnostics.Stopwatch::Reset() L_00a9: ldloc.0 L_00aa: callvirt instance void [system]System.Diagnostics.Stopwatch::Start() L_00af: ldloca.s rect L_00b1: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Left() L_00b6: stloc.s left L_00b8: ldloca.s rect L_00ba: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Right() L_00bf: stloc.s right L_00c1: ldloca.s rect L_00c3: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Top() L_00c8: stloc.s top L_00ca: ldloca.s rect L_00cc: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Bottom() L_00d1: stloc.s bottom L_00d3: ldloc.s top L_00d5: stloc.s V_9 L_00d7: br.s L_00f1 L_00d9: ldloc.s left L_00db: stloc.s V_10 L_00dd: br.s L_00e5 L_00df: ldloc.s V_10 L_00e1: ldc.i4.1 L_00e2: add.ovf L_00e3: stloc.s V_10 L_00e5: ldloc.s V_10 L_00e7: ldloc.s right L_00e9: blt.s L_00df L_00eb: ldloc.s V_9 L_00ed: ldc.i4.1 L_00ee: add.ovf L_00ef: stloc.s V_9 L_00f1: ldloc.s V_9 L_00f3: ldloc.s bottom L_00f5: blt.s L_00d9 L_00f7: ldloc.0 L_00f8: callvirt instance void [system]System.Diagnostics.Stopwatch::Stop() L_00fd: ldloc.2 L_00fe: stloc.s CS$0$0002 L_0100: ldc.i4.4 L_0101: newarr object L_0106: stloc.s CS$0$0003 L_0108: ldloc.s CS$0$0003 L_010a: ldc.i4.0 L_010b: ldloc.s CS$0$0002 L_010d: stelem.ref L_010e: ldloc.s CS$0$0003 L_0110: ldc.i4.1 L_0111: ldstr "Loop 2: " L_0116: stelem.ref L_0117: ldloc.s CS$0$0003 L_0119: ldc.i4.2 L_011a: ldloc.0 L_011b: callvirt instance int64 [system]System.Diagnostics.Stopwatch::get_ElapsedTicks() L_0120: box int64 L_0125: stelem.ref L_0126: ldloc.s CS$0$0003 L_0128: ldc.i4.3 L_0129: ldstr "\r\n" L_012e: stelem.ref L_012f: ldloc.s CS$0$0003 L_0131: call string [mscorlib]System.String::Concat(object[]) L_0136: stloc.2 L_0137: ldloc.2 L_0138: ldstr "Ticks per second: " L_013d: ldsfld int64 [system]System.Diagnostics.Stopwatch::Frequency L_0142: box int64 L_0147: call string [mscorlib]System.String::Concat(object, object, object) L_014c: stloc.2 L_014d: ldloc.2 L_014e: call valuetype [system.Windows.Forms]System.Windows.Forms.DialogResult [system.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) L_0153: pop L_0154: call class [system.Windows.Forms]System.Windows.Forms.Cursor [system.Windows.Forms]System.Windows.Forms.Cursors::get_Default() L_0159: call void [system.Windows.Forms]System.Windows.Forms.Cursor::set_Current(class [system.Windows.Forms]System.Windows.Forms.Cursor) L_015e: ret } Quote Link to comment Share on other sites More sharing options...
Tim! Posted October 17, 2009 Author Share Posted October 17, 2009 And when I switch off overflow/underflow checks, Loop 2 runs about six times faster than Loop 1. Quote Link to comment Share on other sites More sharing options...
Julian Beaumont Posted October 17, 2009 Share Posted October 17, 2009 If your running the tests from within visual studio then you need to go Debug -> Start without Debugging (Ctrl + F5), otherwise visual studio will suppress JIT optimization, even in Release mode. (or just run it from outside visual studio). Quote Link to comment Share on other sites More sharing options...
Tim! Posted October 17, 2009 Author Share Posted October 17, 2009 I'm running it by double-clicking on the exe in Windows Explorer. Quote Link to comment Share on other sites More sharing options...
Julian Beaumont Posted October 17, 2009 Share Posted October 17, 2009 Micro-benchmarks like this are mostly pointless IMO. Do you have an actual problem your trying to optimize? If so, have you profiled to see what the bottlenecks are? The C# compiler can't eliminate the loop or the calls to Left, Right, Top, or Bottom in the IL because it doesn't have access to the code in the assembly containing them, and hence, can't guarantee that they are side effect free. If they aren't hoisting the calls out of the loop changes the semantics. The JIT compiler doesn't have this limitation, it knows about all relevant assemblies and hence should be able to determine they are side effect free. Once it's done this and performed the loop-invariant code motion to lift the calls to Left, Right, etc out of the loop it's just left with an empty loop, and should eliminate it entirely. To avoid the loop being removed you need to perform some non-trivial computation in it and use/output the result. Finally, what absolute running times are you getting? If it's not at least a few seconds then it's too small to discount things like fluctuations in machine load as well. Quote Link to comment Share on other sites More sharing options...
Tim! Posted October 17, 2009 Author Share Posted October 17, 2009 I've tried the following plug-in experiment. It gives inconsistent results, but Loop 2 is usually a bit faster than Loop 1 (the JIT compiler has done its job well). The Windows Forms experiment gave a consistent and marked speed difference. So why the difference? Hidden Content: using System.Diagnostics; using System.Drawing; using System.Windows.Forms; using PaintDotNet; using PaintDotNet.Effects; namespace EffectsPluginTemplate1 { public class EffectPlugin : PaintDotNet.Effects.Effect { public static string StaticName { get { return "Effect Plugin"; } } public static Bitmap StaticIcon { get { return new Bitmap(typeof(EffectPlugin), "EffectPluginIcon.png"); } } public static string StaticSubMenuName { get { //return null; // Use for no submenu #if DEBUG return "_test"; // Use for custom submenu #else return SubmenuNames.Render; #endif } } public EffectPlugin() : base(EffectPlugin.StaticName, EffectPlugin.StaticIcon, EffectPlugin.StaticSubMenuName, EffectFlags.Configurable | EffectFlags.SingleThreaded) { show_msgbox = false; } ~EffectPlugin() { if (show_msgbox) { string str = ""; str += "Loop 1: " + loop1_ticks + "\r\n"; str += "Loop 2: " + loop2_ticks + "\r\n"; str += "Loop 1 / Loop 2: " + loop1_ticks / loop2_ticks + "\r\n"; str += "Ticks per second: " + Stopwatch.Frequency; MessageBox.Show(str); } } public override EffectConfigDialog CreateConfigDialog() { return new EffectPluginConfigDialog(); } protected override void OnSetRenderInfo(EffectConfigToken parameters, RenderArgs dstArgs, RenderArgs srcArgs) { loop1_ticks = 0; loop2_ticks = 0; show_msgbox = true; base.OnSetRenderInfo(parameters, dstArgs, srcArgs); } private Stopwatch sw = new Stopwatch(); private float loop1_ticks; private float loop2_ticks; private bool show_msgbox; public override void Render(EffectConfigToken parameters, RenderArgs dstArgs, RenderArgs srcArgs, Rectangle[] rois, int startIndex, int length) { PdnRegion selectionRegion = EnvironmentParameters.GetSelection(srcArgs.Bounds); sw.Reset(); sw.Start(); // // Loop 1 // for (int i = startIndex; i < startIndex + length; ++i) { Rectangle rect = rois[i]; for (int y = rect.Top; y < rect.Bottom; ++y) { for (int x = rect.Left; x < rect.Right; ++x) { // Render Code Here dstArgs.Surface[x, y] = srcArgs.Surface[rect.Right - x - 1, y]; } } } sw.Stop(); loop1_ticks += sw.ElapsedTicks; sw.Reset(); sw.Start(); // // Loop 2 // int r = startIndex + length; for (int i = startIndex; i < r; i++) { Rectangle rect = rois[i]; int left = rect.Left; int right = rect.Right; int top = rect.Top; int bottom = rect.Bottom; for (int y = top; y < bottom; y++) { for (int x = left; x < right; x++) { dstArgs.Surface[x, y] = srcArgs.Surface[right - x - 1, y]; } } } sw.Stop(); loop2_ticks += sw.ElapsedTicks; } } } Quote Link to comment Share on other sites More sharing options...
Rick Brewster Posted October 17, 2009 Share Posted October 17, 2009 The property accesses on Rectangle are the least of your troubles. All that Rectangle.get_Left() does, for instance, is "return this.X;". That will be inlined. Also, I believe the Effects API in Paint.NET does some refactoring. I'm not sure what you mean by this. It'd be news to me if it were transforming code. Quote The Paint.NET Blog: https://blog.getpaint.net/ Donations are always appreciated! https://www.getpaint.net/donate.html Link to comment Share on other sites More sharing options...
pyrochild Posted October 17, 2009 Share Posted October 17, 2009 Also, I believe the Effects API in Paint.NET does some refactoring. ...wut...? Quote ambigram signature by Kemaru [i write plugins and stuff] If you like a post, upvote it! Link to comment Share on other sites More sharing options...
Illnab1024 Posted October 18, 2009 Share Posted October 18, 2009 Sorry, I was tired -- better word choice would have been "The Effects API isn't entirely stupid." Quote ~~ Link to comment Share on other sites More sharing options...
Neil Cassidy Posted October 18, 2009 Share Posted October 18, 2009 Say I put a method call in a loop test. I know that the method is referentially transparent (for all intents and purposes). So intuition tells me that the jitter will extensively analyze all relevant assemblies, determine that the method is side-effect free, and lift the call out of the loop *. But the bytecode tells me that my method will be called a few million times. To resolve the ambiguity, I must either: 1. Accurately predict how the jitter will transform the loop on this machine, or 2. Conduct careful performance tests, or 2. Disassemble and read the machine code. So Tim's hypothetical optimization may or may not be an optimization, but hopefully we've learned something from this. Namely that if we set the upper bound by calling our referentially transparent method before entering the loop, as Tim suggests, the performance characteristics of the resulting program are substantially less surprising to someone who lacks up-to-date and detailed knowledge of the machine-dependent optimizations performed by the jitter. I must admit that it does introduce a problem: the loop's upper bound is now accessible outside of the loop's scope. Frankly, I'm more confident in my ability to read C# code than in my ability to mentally debug the jitter, so I will use Tim's solution from now on. Hidden Content: [reason]*[/reason]Actually... the get is side-effect free, but definitely not referentially transparent. Another thread could intervene during the execution of the loop and change the property's value. Even if the jitter does do enough analysis to determine whether it is safe to lift the get, then whether it is willing to do so will depend upon the precise circumstances under which every possible set might occur, in every related assembly. This would make it extremely hard for a programmer to reason about the optimizations performed by the jitter without being familiar with the entire code base. So intuition tells me that it will not lift. But I don't know much about the jitter, so please take this with a grain of salt. Quote Segment Image : Average Color (HSL) : Eigen Blur : []Cool, new forum! Link to comment Share on other sites More sharing options...
Julian Beaumont Posted October 19, 2009 Share Posted October 19, 2009 I do agree, predictability is a problem. I usually take the opposite approach and assume that if an optimization is possible and only leads to a constant factor change in performance it will be performed, regardless of how difficult the analysis required is. If an optimization may change the algorithmic performance of an algorithm then I'll perform it by hand. Most of the time if the optimization is performed or not doesn't end up actually mattering. If get proven wrong and it does matter then I proceed to change things as necessary. In this particular case, I believe the property is accessing a const member variable, and hence, multithreaded access isn't a problem as it can't be mutated, however, as Neil said knowing this requires knowledge of the entire code base, which often isn't practical. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.