Jump to content

Rendering Efficiency


Tim!

Recommended Posts

A common rendering inefficiency amongst many code samples is caused by the repetitive accessing of properties of the Rectangle class in for-loops in the Render method.

Consider the following code sample. The second style is faster because it uses local variables to access the rectangle data. The first style makes repeated subroutine calls to access the rectangle data:

Hidden Content:
using System.Diagnostics;
using System.Drawing;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
   public partial class Form1 : Form
   {
       public Form1()
       {
           InitializeComponent();
       }

       private void buttonLoops_Click(object sender, System.EventArgs e)
       {
           Cursor.Current = Cursors.WaitCursor;

           Stopwatch sw = new Stopwatch();

           Rectangle rect = new Rectangle(0, 0, 10000, 10000);
           string str = "";

           sw.Start();

           for (int y = rect.Top; y < rect.Bottom; y++)
           {
               for (int x = rect.Left; x < rect.Right; x++)
               {
                   int result = 0;
               }
           }

           sw.Stop();
           str += "Loop 1: " + sw.ElapsedTicks + "\r\n";

           sw.Reset();
           sw.Start();

           int right = rect.Right;
           int left = rect.Left;
           int top = rect.Top;
           int bottom = rect.Bottom;
           for (int y = top; y < bottom; y++)
           {
               for (int x = left; x < right; x++)
               {
                   int result = 0;
               }
           }

           sw.Stop();
           str += "Loop 2: " + sw.ElapsedTicks + "\r\n";

           str += "Ticks per second: " + Stopwatch.Frequency;

           MessageBox.Show(str);

           Cursor.Current = Cursors.Default;
       }
   }
}

Another common problem is unecessary recalculation of for-loop limits every time the loop is executed. Limits should be calculated before the for-loops. In Example 2 below, for an image of 800x600px, it saves 480598 computations (600 + 800 x 600 - 2) in comparison to Example 1:

Hidden Content:
           //
           // Example 1
           //
           int a = ComputeA();
           int b = ComputeB();
           int left = rect.Left;
           int right = rect.Right;
           int top = rect.Top;
           int bottom = rect.Bottom;
           for (int y = top; y < bottom - a; y++)
           {
               for (int x = left; x < right - b; x++)
               {
                   int result = 0;
               }
           }

           //
           // Example 2
           //
           int a = ComputeA();
           int b = ComputeB();
           int left = rect.Left;
           int right = rect.Right - b;
           int top = rect.Top;
           int bottom = rect.Bottom - a;
           for (int y = top; y < bottom; y++)
           {
               for (int x = left; x < right; x++)
               {
                   int result = 0;
               }
           }

Link to comment
Share on other sites

Just use csc with the optimize code option... or optimize code in visual studio project properties. It does all this.

Also, I believe the Effects API in Paint.NET does some refactoring.

~~

Link to comment
Share on other sites

Just use csc with the optimize code option... or optimize code in visual studio project properties. It does all this.

Also, I believe the Effects API in Paint.NET does some refactoring.

I checked optimise code in project properties (Release). Loop 2 runs 3.64 times faster than Loop 1. Here's my ILDASM where Loop 1 uses call instance and Loop 2 uses ldloc.s:

Hidden Content:
.method private hidebysig instance void buttonLoops_Click(object sender, class [mscorlib]System.EventArgs e) cil managed
{
   .maxstack 5
   .locals init (
       [0] class [system]System.Diagnostics.Stopwatch sw,
       [1] valuetype [system.Drawing]System.Drawing.Rectangle rect,
       [2] string str,
       [3] int32 y,
       [4] int32 x,
       [5] int32 left,
       [6] int32 right,
       [7] int32 top,
       [8] int32 bottom,
       [9] int32 V_9,
       [10] int32 V_10,
       [11] object CS$0$0000,
       [12] object[] CS$0$0001,
       [13] object CS$0$0002,
       [14] object[] CS$0$0003)
   L_0000: call class [system.Windows.Forms]System.Windows.Forms.Cursor [system.Windows.Forms]System.Windows.Forms.Cursors::get_WaitCursor()
   L_0005: call void [system.Windows.Forms]System.Windows.Forms.Cursor::set_Current(class [system.Windows.Forms]System.Windows.Forms.Cursor)
   L_000a: newobj instance void [system]System.Diagnostics.Stopwatch::.ctor()
   L_000f: stloc.0 
   L_0010: ldloca.s rect
   L_0012: ldc.i4.0 
   L_0013: ldc.i4.0 
   L_0014: ldc.i4 0x2710
   L_0019: ldc.i4 0x2710
   L_001e: call instance void [system.Drawing]System.Drawing.Rectangle::.ctor(int32, int32, int32, int32)
   L_0023: ldstr ""
   L_0028: stloc.2 
   L_0029: ldloc.0 
   L_002a: callvirt instance void [system]System.Diagnostics.Stopwatch::Start()
   L_002f: ldloca.s rect
   L_0031: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Top()
   L_0036: stloc.3 
   L_0037: br.s L_0059
   L_0039: ldloca.s rect
   L_003b: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Left()
   L_0040: stloc.s x
   L_0042: br.s L_004a
   L_0044: ldloc.s x
   L_0046: ldc.i4.1 
   L_0047: add.ovf 
   L_0048: stloc.s x
   L_004a: ldloc.s x
   L_004c: ldloca.s rect
   L_004e: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Right()
   L_0053: blt.s L_0044
   L_0055: ldloc.3 
   L_0056: ldc.i4.1 
   L_0057: add.ovf 
   L_0058: stloc.3 
   L_0059: ldloc.3 
   L_005a: ldloca.s rect
   L_005c: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Bottom()
   L_0061: blt.s L_0039
   L_0063: ldloc.0 
   L_0064: callvirt instance void [system]System.Diagnostics.Stopwatch::Stop()
   L_0069: ldloc.2 
   L_006a: stloc.s CS$0$0000
   L_006c: ldc.i4.4 
   L_006d: newarr object
   L_0072: stloc.s CS$0$0001
   L_0074: ldloc.s CS$0$0001
   L_0076: ldc.i4.0 
   L_0077: ldloc.s CS$0$0000
   L_0079: stelem.ref 
   L_007a: ldloc.s CS$0$0001
   L_007c: ldc.i4.1 
   L_007d: ldstr "Loop 1: "
   L_0082: stelem.ref 
   L_0083: ldloc.s CS$0$0001
   L_0085: ldc.i4.2 
   L_0086: ldloc.0 
   L_0087: callvirt instance int64 [system]System.Diagnostics.Stopwatch::get_ElapsedTicks()
   L_008c: box int64
   L_0091: stelem.ref 
   L_0092: ldloc.s CS$0$0001
   L_0094: ldc.i4.3 
   L_0095: ldstr "\r\n"
   L_009a: stelem.ref 
   L_009b: ldloc.s CS$0$0001
   L_009d: call string [mscorlib]System.String::Concat(object[])
   L_00a2: stloc.2 
   L_00a3: ldloc.0 
   L_00a4: callvirt instance void [system]System.Diagnostics.Stopwatch::Reset()
   L_00a9: ldloc.0 
   L_00aa: callvirt instance void [system]System.Diagnostics.Stopwatch::Start()
   L_00af: ldloca.s rect
   L_00b1: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Left()
   L_00b6: stloc.s left
   L_00b8: ldloca.s rect
   L_00ba: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Right()
   L_00bf: stloc.s right
   L_00c1: ldloca.s rect
   L_00c3: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Top()
   L_00c8: stloc.s top
   L_00ca: ldloca.s rect
   L_00cc: call instance int32 [system.Drawing]System.Drawing.Rectangle::get_Bottom()
   L_00d1: stloc.s bottom
   L_00d3: ldloc.s top
   L_00d5: stloc.s V_9
   L_00d7: br.s L_00f1
   L_00d9: ldloc.s left
   L_00db: stloc.s V_10
   L_00dd: br.s L_00e5
   L_00df: ldloc.s V_10
   L_00e1: ldc.i4.1 
   L_00e2: add.ovf 
   L_00e3: stloc.s V_10
   L_00e5: ldloc.s V_10
   L_00e7: ldloc.s right
   L_00e9: blt.s L_00df
   L_00eb: ldloc.s V_9
   L_00ed: ldc.i4.1 
   L_00ee: add.ovf 
   L_00ef: stloc.s V_9
   L_00f1: ldloc.s V_9
   L_00f3: ldloc.s bottom
   L_00f5: blt.s L_00d9
   L_00f7: ldloc.0 
   L_00f8: callvirt instance void [system]System.Diagnostics.Stopwatch::Stop()
   L_00fd: ldloc.2 
   L_00fe: stloc.s CS$0$0002
   L_0100: ldc.i4.4 
   L_0101: newarr object
   L_0106: stloc.s CS$0$0003
   L_0108: ldloc.s CS$0$0003
   L_010a: ldc.i4.0 
   L_010b: ldloc.s CS$0$0002
   L_010d: stelem.ref 
   L_010e: ldloc.s CS$0$0003
   L_0110: ldc.i4.1 
   L_0111: ldstr "Loop 2: "
   L_0116: stelem.ref 
   L_0117: ldloc.s CS$0$0003
   L_0119: ldc.i4.2 
   L_011a: ldloc.0 
   L_011b: callvirt instance int64 [system]System.Diagnostics.Stopwatch::get_ElapsedTicks()
   L_0120: box int64
   L_0125: stelem.ref 
   L_0126: ldloc.s CS$0$0003
   L_0128: ldc.i4.3 
   L_0129: ldstr "\r\n"
   L_012e: stelem.ref 
   L_012f: ldloc.s CS$0$0003
   L_0131: call string [mscorlib]System.String::Concat(object[])
   L_0136: stloc.2 
   L_0137: ldloc.2 
   L_0138: ldstr "Ticks per second: "
   L_013d: ldsfld int64 [system]System.Diagnostics.Stopwatch::Frequency
   L_0142: box int64
   L_0147: call string [mscorlib]System.String::Concat(object, object, object)
   L_014c: stloc.2 
   L_014d: ldloc.2 
   L_014e: call valuetype [system.Windows.Forms]System.Windows.Forms.DialogResult [system.Windows.Forms]System.Windows.Forms.MessageBox::Show(string)
   L_0153: pop 
   L_0154: call class [system.Windows.Forms]System.Windows.Forms.Cursor [system.Windows.Forms]System.Windows.Forms.Cursors::get_Default()
   L_0159: call void [system.Windows.Forms]System.Windows.Forms.Cursor::set_Current(class [system.Windows.Forms]System.Windows.Forms.Cursor)
   L_015e: ret 
}




Link to comment
Share on other sites

If your running the tests from within visual studio then you need to go Debug -> Start without Debugging (Ctrl + F5), otherwise visual studio will suppress JIT optimization, even in Release mode. (or just run it from outside visual studio).

Link to comment
Share on other sites

Micro-benchmarks like this are mostly pointless IMO. Do you have an actual problem your trying to optimize? If so, have you profiled to see what the bottlenecks are? The C# compiler can't eliminate the loop or the calls to Left, Right, Top, or Bottom in the IL because it doesn't have access to the code in the assembly containing them, and hence, can't guarantee that they are side effect free. If they aren't hoisting the calls out of the loop changes the semantics.

The JIT compiler doesn't have this limitation, it knows about all relevant assemblies and hence should be able to determine they are side effect free. Once it's done this and performed the loop-invariant code motion to lift the calls to Left, Right, etc out of the loop it's just left with an empty loop, and should eliminate it entirely. To avoid the loop being removed you need to perform some non-trivial computation in it and use/output the result.

Finally, what absolute running times are you getting? If it's not at least a few seconds then it's too small to discount things like fluctuations in machine load as well.

Link to comment
Share on other sites

I've tried the following plug-in experiment. It gives inconsistent results, but Loop 2 is usually a bit faster than Loop 1 (the JIT compiler has done its job well). The Windows Forms experiment gave a consistent and marked speed difference. So why the difference?

Hidden Content:
using System.Diagnostics;
using System.Drawing;
using System.Windows.Forms;
using PaintDotNet;
using PaintDotNet.Effects;

namespace EffectsPluginTemplate1
{
   public class EffectPlugin
       : PaintDotNet.Effects.Effect
   {
       public static string StaticName
       {
           get
           {
               return "Effect Plugin";
           }
       }

       public static Bitmap StaticIcon
       {
           get
           {
               return new Bitmap(typeof(EffectPlugin), "EffectPluginIcon.png");
           }
       }

       public static string StaticSubMenuName
       {
           get
           {
               //return null; // Use for no submenu
#if DEBUG
               return "_test"; // Use for custom submenu
#else
               return SubmenuNames.Render;
#endif
           }
       }

       public EffectPlugin()
           : base(EffectPlugin.StaticName, EffectPlugin.StaticIcon, EffectPlugin.StaticSubMenuName, EffectFlags.Configurable | EffectFlags.SingleThreaded)
       {
           show_msgbox = false;
       }

       ~EffectPlugin()
       {
           if (show_msgbox)
           {
               string str = "";
               str += "Loop 1: " + loop1_ticks + "\r\n";
               str += "Loop 2: " + loop2_ticks + "\r\n";
               str += "Loop 1 / Loop 2: " + loop1_ticks / loop2_ticks + "\r\n";
               str += "Ticks per second: " + Stopwatch.Frequency;
               MessageBox.Show(str);
           }
       }

       public override EffectConfigDialog CreateConfigDialog()
       {
           return new EffectPluginConfigDialog();
       }

       protected override void OnSetRenderInfo(EffectConfigToken parameters, RenderArgs dstArgs, RenderArgs srcArgs)
       {
           loop1_ticks = 0;
           loop2_ticks = 0;
           show_msgbox = true;
           base.OnSetRenderInfo(parameters, dstArgs, srcArgs);
       }

       private Stopwatch sw = new Stopwatch();
       private float loop1_ticks;
       private float loop2_ticks;
       private bool show_msgbox;

       public override void Render(EffectConfigToken parameters, RenderArgs dstArgs, RenderArgs srcArgs, Rectangle[] rois, int startIndex, int length)
       {
           PdnRegion selectionRegion = EnvironmentParameters.GetSelection(srcArgs.Bounds);

           sw.Reset();
           sw.Start();

           //
           // Loop 1
           //
           for (int i = startIndex; i < startIndex + length; ++i)
           {
               Rectangle rect = rois[i];

               for (int y = rect.Top; y < rect.Bottom; ++y)
               {
                   for (int x = rect.Left; x < rect.Right; ++x)
                   {
                       // Render Code Here
                       dstArgs.Surface[x, y] = srcArgs.Surface[rect.Right - x - 1, y];
                   }
               }
           }

           sw.Stop();
           loop1_ticks += sw.ElapsedTicks;

           sw.Reset();
           sw.Start();

           //
           // Loop 2
           //
           int r = startIndex + length;
           for (int i = startIndex; i < r; i++)
           {
               Rectangle rect = rois[i];
               int left = rect.Left;
               int right = rect.Right;
               int top = rect.Top;
               int bottom = rect.Bottom;
               for (int y = top; y < bottom; y++)
               {
                   for (int x = left; x < right; x++)
                   {
                       dstArgs.Surface[x, y] = srcArgs.Surface[right - x - 1, y];
                   }
               }
           }

           sw.Stop();
           loop2_ticks += sw.ElapsedTicks;
       }
   }
}

Link to comment
Share on other sites

The property accesses on Rectangle are the least of your troubles. All that Rectangle.get_Left() does, for instance, is "return this.X;". That will be inlined.

Also, I believe the Effects API in Paint.NET does some refactoring.

I'm not sure what you mean by this. It'd be news to me if it were transforming code.

The Paint.NET Blog: https://blog.getpaint.net/

Donations are always appreciated! https://www.getpaint.net/donate.html

forumSig_bmwE60.jpg

Link to comment
Share on other sites

Also, I believe the Effects API in Paint.NET does some refactoring.

...wut...?

xZYt6wl.png

ambigram signature by Kemaru

[i write plugins and stuff]

If you like a post, upvote it!

Link to comment
Share on other sites

Say I put a method call in a loop test. I know that the method is referentially transparent (for all intents and purposes). So intuition tells me that the jitter will extensively analyze all relevant assemblies, determine that the method is side-effect free, and lift the call out of the loop *. But the bytecode tells me that my method will be called a few million times.

To resolve the ambiguity, I must either:

1. Accurately predict how the jitter will transform the loop on this machine, or

2. Conduct careful performance tests, or

2. Disassemble and read the machine code.

So Tim's hypothetical optimization may or may not be an optimization, but hopefully we've learned something from this. Namely that if we set the upper bound by calling our referentially transparent method before entering the loop, as Tim suggests, the performance characteristics of the resulting program are substantially less surprising to someone who lacks up-to-date and detailed knowledge of the machine-dependent optimizations performed by the jitter.

I must admit that it does introduce a problem: the loop's upper bound is now accessible outside of the loop's scope. Frankly, I'm more confident in my ability to read C# code than in my ability to mentally debug the jitter, so I will use Tim's solution from now on.

Hidden Content:
[reason]*[/reason]Actually... the get is side-effect free, but definitely not referentially transparent. Another thread could intervene during the execution of the loop and change the property's value. Even if the jitter does do enough analysis to determine whether it is safe to lift the get, then whether it is willing to do so will depend upon the precise circumstances under which every possible set might occur, in every related assembly. This would make it extremely hard for a programmer to reason about the optimizations performed by the jitter without being familiar with the entire code base. So intuition tells me that it will not lift. But I don't know much about the jitter, so please take this with a grain of salt.

Segment Image : Average Color (HSL) : Eigen Blur : []

Cool, new forum!

Link to comment
Share on other sites

I do agree, predictability is a problem. I usually take the opposite approach and assume that if an optimization is possible and only leads to a constant factor change in performance it will be performed, regardless of how difficult the analysis required is. If an optimization may change the algorithmic performance of an algorithm then I'll perform it by hand. Most of the time if the optimization is performed or not doesn't end up actually mattering. If get proven wrong and it does matter then I proceed to change things as necessary.

In this particular case, I believe the property is accessing a const member variable, and hence, multithreaded access isn't a problem as it can't be mutated, however, as Neil said knowing this requires knowledge of the entire code base, which often isn't practical.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...