Hmm, I've tried the static cache, but then it isn't 'clean' when the Render method is called. The image still renders in a recognizable way, but it gets polluted with noise from previous renderings. Cleaning the cache by resetting all its elements at the start of the Render method pretty much cancels out the static cache gain.
However, when I change the ColorError class to a struct, it gets much faster, even when I create new caches every time the Render method is called. Structs are much cheaper to create than class instances, and every element of a struct array already exists, zero-initialized, as soon as the array is allocated. Since that default value happens to be exactly what I need, I don't have to loop through the array afterwards. According to my profiler, the Floyd-Steinberg calculations are also twice as fast.
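To illustrate the difference, here's a minimal sketch. The two types and their R/G/B fields are made-up stand-ins, not the real ColorError, and the array size is arbitrary:

```csharp
using System;

// Hypothetical stand-ins for ColorError; the field names are assumptions.
struct ColorErrorStruct { public int R, G, B; }
class ColorErrorClass { public int R, G, B; }

static class CacheDemo
{
    static void Main()
    {
        const int size = 4; // arbitrary size, for illustration only

        // Struct array: one allocation, every element is already a zeroed value.
        var structCache = new ColorErrorStruct[size];
        Console.WriteLine(structCache[0].R); // prints 0
        structCache[1].R = 5;                // usable right away, modified in place

        // Class array: one allocation for the references, all of them null...
        var classCache = new ColorErrorClass[size];
        Console.WriteLine(classCache[0] == null); // prints True

        // ...so every element needs its own heap allocation before use.
        for (int i = 0; i < classCache.Length; i++)
        {
            classCache[i] = new ColorErrorClass();
        }
        classCache[1].G = 5;
    }
}
```

With the struct version the fill loop disappears completely, which is where most of the allocation time went in my case.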
I did some performance tests with arrays of structs and arrays of classes, and the structs turned out to be much faster than the classes, with half the number of GC collections as well. An array with just one struct isn't much faster than an array with one class, but the gap grows with the array size: with 10000 elements it's 4 times faster, and with 500000 elements it's 80 times faster.
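A comparison along these lines can be sketched as follows. This is not my actual test code, just a rough outline; the type names, the round count, and the use of generation 0 collection counts are assumptions, and the numbers will differ per machine and runtime:

```csharp
using System;
using System.Diagnostics;

// Hypothetical element types; the field names are assumptions.
struct ErrorStruct { public int R, G, B; }
class ErrorClass { public int R, G, B; }

static class ArrayBenchmark
{
    const int Length = 500000; // largest size mentioned above
    const int Rounds = 20;     // arbitrary repeat count

    static void Main()
    {
        // Arrays of structs: a single allocation per round.
        int gcStart = GC.CollectionCount(0);
        var watch = Stopwatch.StartNew();
        for (int round = 0; round < Rounds; round++)
        {
            var items = new ErrorStruct[Length];
            items[0].R = round; // touch the array so it is actually used
        }
        watch.Stop();
        Console.WriteLine("structs: {0} ms, {1} gen0 collections",
            watch.ElapsedMilliseconds, GC.CollectionCount(0) - gcStart);

        // Arrays of classes: one allocation per round plus one per element.
        gcStart = GC.CollectionCount(0);
        watch = Stopwatch.StartNew();
        for (int round = 0; round < Rounds; round++)
        {
            var items = new ErrorClass[Length];
            for (int i = 0; i < items.Length; i++)
            {
                items[i] = new ErrorClass();
            }
            items[0].R = round;
        }
        watch.Stop();
        Console.WriteLine("classes: {0} ms, {1} gen0 collections",
            watch.ElapsedMilliseconds, GC.CollectionCount(0) - gcStart);
    }
}
```

It's a crude micro-benchmark (no warm-up, release vs. debug builds matter a lot), so treat the output as an indication rather than a precise measurement.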
So, that sounds like a gain. I'll upload a new version later today.