Everything posted by Rick Brewster

  1. Already answered by @Tactilis above. Upgrade your PC so it has more memory (RAM). This may require buying a new PC, since it looks like you're using a laptop and those can't always have their RAM upgraded. I should probably fix PDN itself so it reports this to the user (you) as Out Of Memory instead of File Is Corrupted.
  2. I got it down to 9.5s by calculating 3 px at a time 😎
  3. Looks like a GPU driver or GPU configuration (control panel) problem, or you have some other software that’s interfering or causing the glitch. Like recording/broadcasting or overclocking software.
  4. (Correction to data above: I've been using a 12K x 8K image for performance testing, not 18K x 12K)
  5. I was able to convert this to a compute shader that calculates 2 pixels at a time: 10.5 seconds 😁 Increasing that to 4 pixels reduced performance, likely because the extra register pressure reduces shader occupancy.
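     To illustrate the idea, here is a rough CPU-side sketch in C# (not the actual compute shader; it uses a simple box average as a stand-in for the real per-pixel work): one "thread" computes two horizontally adjacent output pixels, so input samples in the overlapping part of the two neighborhoods only need to be fetched once.

     static class PairedKernelSketch
     {
         // Hypothetical sketch: one work item produces two adjacent output pixels.
         // The two (2*radius+1)-wide windows overlap almost completely, so each
         // shared input value is read only once instead of twice.
         // Assumes (x, y) is far enough from the image edge that no clamping is needed.
         public static void ProcessPixelPair(float[,] src, float[,] dst, int x, int y, int radius)
         {
             float sumLeft = 0, sumRight = 0;
             int countLeft = 0, countRight = 0;

             // Single pass over the union of both neighborhoods.
             for (int dy = -radius; dy <= radius; dy++)
             {
                 for (int dx = -radius; dx <= radius + 1; dx++)
                 {
                     float v = src[y + dy, x + dx]; // fetched once, used by up to two outputs

                     if (dx <= radius) { sumLeft += v; countLeft++; }        // window of pixel (x, y)
                     if (dx >= -radius + 1) { sumRight += v; countRight++; } // window of pixel (x+1, y)
                 }
             }

             dst[y, x] = sumLeft / countLeft;
             dst[y, x + 1] = sumRight / countRight;
         }
     }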
  6. If you're using the MSI then you need to supply the properties (DESKTOPSHORTCUT=0 included) using the syntax that msiexec uses for specifying properties.
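     For example, properties are passed as NAME=VALUE pairs directly on the msiexec command line, so a quiet install would look something like this (the MSI path here is just a placeholder): msiexec /i "C:\path\to\paint.net.msi" DESKTOPSHORTCUT=0 /qn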
  7. Did you read the instructions? https://getpaint.net/doc/latest/UnattendedInstallation.html And the MSI isn't extractable because it's provided separately now. https://github.com/paintdotnet/release/releases In general, unless you're doing a deployment across a large network of PCs and know exactly what you're doing, it's not a good idea to use the MSI.
  8. The number of iterations is currently fixed, but that's an interesting idea
  9. It may also be worth having PDN use smaller tiles in this case. I'm not sure whether that should be an option specified in OnInitializeRenderInfo(), or whether PDN should somehow auto-detect that the effect is running "too slow" and automatically adjust downwards -- I think both should be used here. Either technique on its own (multiple rendering passes, or smaller tiles) will help a lot, but lower-end hardware will really need both. The tile size is chosen based on the total image size.
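     As a rough sketch only -- the names and thresholds below are invented for illustration and are not paint.net's actual logic -- a size-dependent tile calculation could look like this:

     static class TileSizeSketch
     {
         // Hypothetical: shrink the tile edge as the image grows, so that a single
         // tile never represents too large a slice of the total rendering work
         // (smaller tiles mean more frequent progress updates and cancellation checks).
         public static int ChooseTileEdge(int imageWidth, int imageHeight)
         {
             long totalPixels = (long)imageWidth * imageHeight;

             if (totalPixels <= 4_000_000) return 1024;  // small images: large tiles, low per-tile overhead
             if (totalPixels <= 64_000_000) return 512;  // medium images
             return 256;                                 // very large images: keep each tile cheap
         }
     }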
  10. I experimented with converting to/from linear space (e.g. WorkingSpaceLinear) -- and the results were substantially worse than with WorkingSpace. This is definitely an algorithm that should execute "within" the original color space.
  11. I've been able to optimize this further vs. the original shader (at full sampling) in this post, cutting the execution time by about 42% -- without using a compute shader (although that's my next step!) and while also improving quality. Here's how I did it:

      Instead of starting at a pivot point of c=0.5, I use the output of shaders that calculate the min and max for the neighborhood square kernel area. This then establishes the traditional lo, hi, and pivot values for the binary search. This is less precise than taking the min/max of the circle kernel area, but can execute substantially faster (linear instead of quadratic) because these are separable kernels. This has two side effects: 1) it increases precision in areas that have a smaller dynamic range, and 2) it supports any dynamic range of values, not just those in [0,1].

      Binary search provides 1 bit per iteration. I also implemented 4-ary, 8-ary, and 16-ary. I kept only 4-ary enabled because it has the best mix of performance and can reach 8 bits of output in 4 iterations (instead of 8 iterations w/ binary search). The 8-ary can hit 9 bits in 3 iterations, which is more than we need. The 16-ary can hit 8 bits in 2 iterations, but because it's using so many registers it actually runs slower due to reduced shader occupancy.

      The search now produces the wrong result when percentile=0 because it can only output the value from the localized min shader, which is often providing the min value for a pixel outside of the circular kernel. This means you get "squares" instead of "circles" in the output. I special-case this to use a different shader that finds the minimum value within the circular kernel. It's possible to incorporate this logic into the regular n-ary shader methods, but it significantly reduces performance.

      For my performance testing, I used a 12K x 8K image. I set radius to 100, percentile to 75, and then used either "Full" sampling (w/ your original shader) or the default iteration count (for my shaders). Your original shader took 30.7 seconds, while my 4-ary implementation takes 17.8 seconds (with higher quality!).

      The next steps for optimization would seem to be using a compute shader, which could calculate multiple output pixels at once. This should be able to bring that 17.8 down even further, meaning this might even be shippable as a built-in PDN effect! And a quality slider that chooses full vs. half vs. etc. sampling would also enable faster performance (like your shader does).

      I'd also like to separate each iteration of the algorithm into its own rendering pass. This would definitely require a compute shader, as it would need to write out 2 additional float4s in order to provide the hi/lo markers (so the output image would be 3x the width of the input image, and then a final shader would discard those 2 extra values). This would enable the effect to run without monopolizing the GPU as much and would help to avoid causing major UI lag. I don't think it would improve performance, but I need to see how it goes.

      Here's the code I've got so far. It's using some PDN internal stuff (like PixelShaderEffect<T>), but you can still translate it to not use the internal stuff.
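      As a simplified illustration of the search described above (a single-channel, CPU-side C# reference, not the shader code; it uses brute-force neighborhood scans where the real implementation uses separable min/max passes and an n-ary search):

      static class PercentileSketch
      {
          // Bracket the answer with the neighborhood min/max, then binary-search:
          // each iteration counts how many samples fall at or below the pivot and
          // tightens the bracket toward the requested percentile (1 bit per iteration).
          // Assumes (cx, cy) is far enough from the image edge that no clamping is needed.
          public static float PercentileAt(float[,] src, int cx, int cy, int radius, float percentile, int iterations)
          {
              float lo = float.MaxValue, hi = float.MinValue;
              int total = 0;

              // 1) Establish lo/hi from the neighborhood (a direct scan of the circular
              //    kernel here; the post uses fast separable square-kernel min/max shaders).
              for (int dy = -radius; dy <= radius; dy++)
              {
                  for (int dx = -radius; dx <= radius; dx++)
                  {
                      if (dx * dx + dy * dy > radius * radius) continue;
                      float v = src[cy + dy, cx + dx];
                      if (v < lo) lo = v;
                      if (v > hi) hi = v;
                      total++;
                  }
              }

              // 2) Binary search between lo and hi.
              float target = percentile / 100f;
              for (int i = 0; i < iterations; i++)
              {
                  float pivot = 0.5f * (lo + hi);
                  int atOrBelow = 0;

                  for (int dy = -radius; dy <= radius; dy++)
                  {
                      for (int dx = -radius; dx <= radius; dx++)
                      {
                          if (dx * dx + dy * dy > radius * radius) continue;
                          if (src[cy + dy, cx + dx] <= pivot) atOrBelow++;
                      }
                  }

                  if ((float)atOrBelow / total < target) lo = pivot; else hi = pivot;
              }

              return 0.5f * (lo + hi);
          }
      }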
  12. Paint.NET isn't Photoshop ¯\_(ツ)_/¯ Unless I'm misunderstanding your initial problem description
  13. The canvas size can only be an integer (non-fractional) number of pixels, because images are stored using pixels, not inches. So even if you type in an "exact" number of inches, it has to be rounded to the nearest whole pixel.
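      For example, at 96 pixels per inch, a requested width of 3.37 inches works out to 3.37 × 96 = 323.52 pixels, which has to be stored as either 323 or 324 whole pixels.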
  14. That's there because you have a brush tool active. The circle shows the area that the brush will affect. You can switch to a different tool to remove it.
  15. This should be fixed for good in the upcoming 5.0.13 update. Definitely let us know if it happens again after 5.0.13.
  16. This information isn't available in the plugin interfaces. I would simply add an option in the UI for low vs. full precision. From what I could understand from the algorithm, each call to HiLo() essentially calculates 1 bit of precision starting from the most-significant bit. When working with linearized pixels (that is, WorkingSpaceLinear instead of WorkingSpace), you need up to 12 bits because the values are spread out differently. The increase from 8 to 12 is pretty dramatic with some images, but I could only see very minute differences after that. Even 11 to 12 was very small, but still noticeable upon close inspection. Going forward it may be necessary to run up to 16 times, but it's easy to make it configurable for when that comes up.
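      (For scale: 8 iterations resolve 2^8 = 256 distinct output levels, 12 iterations resolve 2^12 = 4,096, and 16 would resolve 65,536.)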
  17. btw this won't necessarily be true in future releases of Paint.NET.

      First, the upcoming v5.1 will have color management -- so an effect will either receive pixels in the image's "working space" (which is currently de facto sRGB), which is still the unmodified BGRA32 values, or the pixels will be converted to the linearized version of the image's actual color profile (WorkingSpaceLinear is the default). As a backup, in case the color profile can't be linearized, the image will be converted to scRGB (linear sRGB). The effect's output will then be automatically converted back to the storage format of the image.

      Second, for future releases I am planning on adding higher-precision pixel formats like RGBA64 (4 x uint16), RGBA64Half (4 x float16), and even RGBA128Float (4 x float32).

      In other words, I would not rely on PrecisionEffect(UInt8Normalized) as a way to maintain the original precision -- because that won't be true in the future. I designed the new effect systems with future-proofing in mind!
  18. Another thing to note is that Paint.NET always runs effects at the highest precision (32-bit float per component / 128 bits per pixel). The SourceImage is still stored on the GPU as 32-bit BGRA, but is then premultiplied and/or color converted using 128-bpp to ensure the best quality. By using PrecisionEffect you are manually reducing the precision, which as you've seen can improve performance. However, it will of course reduce precision and color accuracy. IMO it's not worth it, unless you're using caching (set effect.Properties.Cached to true) and you set the precision to Float16. This (caching) is almost never necessary, however, and should only be used very carefully and sparingly.
  19. You need to tell us what the error is. Sadly we're not psychic.
  20. This compute shader's performance advantage seems to be that it greatly reduces the number of texture sampling instructions. It does not reduce the computational requirements -- each output pixel still needs to do the same amount of work. But there's up to an 87.5% reduction in texture sampling instructions because a sample that is used to compute multiple output pixels is only retrieved once. It likely doesn't reduce VRAM bandwidth because the GPU would be using an internal cache (e.g. L2) anyway, but it will reduce the bandwidth pressure on that internal cache.
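      (For scale: an 87.5% reduction corresponds to a shared sample being fetched once instead of eight times, since 1 − 1/8 = 0.875.)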
  21. PrecisionEffect is a pass-through effect that uses a pixel shader to read the input image. This ensures Direct2D can't optimize it away. So yes, it is essentially forcing an intermediate buffer so that the next effect in the chain will consume the source at the given precision: Source -> Precision -> NextEffect

      This contrasts with PassthroughEffect, which is a proper "passthrough" effect -- it uses ID2D1TransformGraph::SetPassthroughGraph(), so it essentially "washes away" at render time as if it didn't even exist in the first place. It's not really useful for an effect graph, but it does have uses in some niche cases for architectural purposes. DynamicImage (e.g. PdnDentsEffect) uses this so that it can hand you the PassthroughEffect, which you can plug into an effect graph, but then it can change which image/effect is plugged into that PassthroughEffect. This means you don't have to keep retrieving the DynamicImage's "output" when you change its properties (DynamicImage is not actually an ID2D1Image/ID2D1Effect).

      It's very beneficial to use PrecisionEffect instead of a CompatibleDeviceContext.Bitmap because 1) it lets Direct2D manage the rendering process and memory management, and 2) it permits Paint.NET to manage rendering with tiles, along with progress reporting and cancellation support. Otherwise you're forcing everything to render during OnCreateOutput(), during which there is no progress reporting or cancellation support.
  22. You'll have to try this out and let us know how well it works for your specific scenario.