_koh_

Everything posted by _koh_

  1. I enabled HDR mode and tested this. When I Alt+Tab, the canvas occasionally disappears for about one frame, but it's not that jarring. I'm not sure I'm going to use ACM, though. It sounds like a good thing for images and videos, but at this point I'm too used to UI and text being displayed in P3 color.
  2. I see. Thanks. I thought maybe I could treat the ExifColorSpace.AdobeRgb profile separately, but once it's assigned to the image, the image already has different bytes, so I gave up 😅

      edit: Ah, since I learned how to convert RGB to XYZ, I can extract the RGB primaries from a ColorContext. It's a bit hacky, but I can test any profile.
  3. Ah, maybe. But the if (ColorContext == DisplayP3) check is the more difficult part, I guess. PDN is already doing this for sRGB, right? If there is a simple way to do this, I can also treat Adobe RGB separately. As I reported, D2D CME's handling of Adobe RGB is a bit iffy.
  4. My bad. It seems this is a PDN 5.0 to 5.1 difference. When I ported my plugins to 5.1, I found that sRGB <-> Linear is more accurate. I thought this had to do with going from DeviceColorSpace.ScRgb to LinearizedColorContext, but... Yeah, this is the reason. Now it's doing the actual math instead of a table lookup. That's the same reason I was using my own shader for sRGB <-> Linear in 5.0. Can you also give Display P3 special treatment? It uses the same TRC as sRGB.
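For reference, the "actual math" for the sRGB transfer curve (which Display P3 shares) can be sketched as below. This is a minimal sketch using the standard piecewise sRGB formulas; the class and method names are made up for illustration and are not PDN or Direct2D APIs.

```csharp
using System;

// Hypothetical helper: the standard sRGB transfer function, computed directly.
public static class SrgbTrc
{
    // Encoded (non-linear) value in [0, 1] -> linear light.
    public static double ToLinear(double encoded) =>
        encoded <= 0.04045
            ? encoded / 12.92
            : Math.Pow((encoded + 0.055) / 1.055, 2.4);

    // Linear light in [0, 1] -> encoded (non-linear) value.
    public static double FromLinear(double linear) =>
        linear <= 0.0031308
            ? linear * 12.92
            : 1.055 * Math.Pow(linear, 1.0 / 2.4) - 0.055;
}
```

Computed this way, an 8-bit value survives the encoded -> linear -> encoded round trip to well within half a code value, which a coarse lookup table may not guarantee.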
  5. Thanks. I'll look into that. One thing I know is that these two round trips give me slightly different results, and the latter is more accurate than the former:

          sRGB -> DeviceColorSpace.ScRgb -> sRGB
          sRGB -> LinearizedColorContext -> sRGB
  6. Oh, I didn't know that. I had been thinking that when I do sRGB <-> Display P3 etc., changing the rendering intent should change the outcome. But yeah, this is the first time I actually tried to tweak the intent. So I tested

          Display P3 -> scRGB  -> luminance = dot(RGB, new(0.212639, 0.715169, 0.072192))
          Display P3 -> Linear -> luminance = dot(RGB, new(0.228975, 0.691739, 0.079287))

      and those two give me the same result apart from a handful of rounding errors. That makes a color-management-compatible luminance effect as simple as the code below. Thanks!

      edit: CME just outputs untouched values outside the 0-1 range when I do P3 -> scRGB, so going through scRGB is more useful than I thought. I was assuming CME only outputs the 0-1 range and that I needed to pick a normalization method like scale (perceptual) or clamp (absolute), but this wasn't the case.

          protected override void OnInitializeRenderInfo(IGpuImageEffectRenderInfo renderInfo)
          {
              // Render in scRGB so the weights below apply to linear values.
              renderInfo.ColorContext = GpuEffectColorContext.ScRgb;
              base.OnInitializeRenderInfo(renderInfo);
          }

          protected override IDeviceImage OnCreateOutput(PaintDotNet.Direct2D1.IDeviceContext DC)
          {
              // Writes the weighted sum R*r + G*g + B*b to all three color channels; alpha is kept.
              var gray = (float R, float G, float B) => new Matrix5x4Float(R,R,R,0, G,G,G,0, B,B,B,0, 0,0,0,1, 0,0,0,0);
              return new ColorMatrixEffect(DC, Environment.SourceImage, gray(0.2126f, 0.7152f, 0.0722f));
          }
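The two sets of luminance weights in this post can be derived from the primaries and white point of each color space. The sketch below is one way to do that, assuming the standard sRGB and Display P3 chromaticities with a D65 white point of (0.3127, 0.3290). The class and method names are hypothetical and the code is independent of the PDN API.

```csharp
using System;

// Hypothetical helper: derives the Y (luminance) weight of each RGB primary
// from xy chromaticities, by solving for the scale factors that make
// R = G = B = 1 reproduce the white point.
public static class LuminanceWeights
{
    public static (double R, double G, double B) FromChromaticities(
        (double x, double y) red, (double x, double y) green,
        (double x, double y) blue, (double x, double y) white)
    {
        // X and Z of each primary and of the white point, all scaled so that Y = 1.
        var (xr, zr) = (red.x / red.y, (1 - red.x - red.y) / red.y);
        var (xg, zg) = (green.x / green.y, (1 - green.x - green.y) / green.y);
        var (xb, zb) = (blue.x / blue.y, (1 - blue.x - blue.y) / blue.y);
        var (xw, zw) = (white.x / white.y, (1 - white.x - white.y) / white.y);

        // Determinant of a 3x3 matrix given row by row.
        static double Det(double a, double b, double c,
                          double d, double e, double f,
                          double g, double h, double i) =>
            a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);

        // Solve [Xr Xg Xb; 1 1 1; Zr Zg Zb] * S = [Xw; 1; Zw] with Cramer's rule.
        // Because every primary has Y = 1, S is also the Y row of the RGB -> XYZ matrix.
        double det = Det(xr, xg, xb, 1, 1, 1, zr, zg, zb);
        double sr = Det(xw, xg, xb, 1, 1, 1, zw, zg, zb) / det;
        double sg = Det(xr, xw, xb, 1, 1, 1, zr, zw, zb) / det;
        double sb = Det(xr, xg, xw, 1, 1, 1, zr, zg, zw) / det;
        return (sr, sg, sb);
    }
}
```

With the sRGB primaries (0.64, 0.33), (0.30, 0.60), (0.15, 0.06) this reproduces the scRGB weights quoted in the post, and with the Display P3 primaries (0.680, 0.320), (0.265, 0.690), (0.150, 0.060) it reproduces the P3 weights, to about five decimal places.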
  7. Ah, thanks for this. I only have a rough mental picture, something like the perceptual intent fitting one cube into another. My initial thought was that if I create a D65 XYZ profile and use it as the destination, ColorMatrixEffect will do RGB -> D50 XYZ -> D65 XYZ, but I need to use the absolute intent to get the actual XYZ values.
  8. Now I've realized I can do some RGB -> scRGB (perceptual), then scRGB -> XYZ with ColorMatrix. I'm not sure this is the same as some RGB -> XYZ (absolute), though. I remember that at least those two worked at one point.
  9. I don't understand why ColorManagementEffect takes 2 intent parameters either. It goes against my understanding. It feels like the absolute intent is just clamp(invert(dstMatrix) * srcMatrix * RGB), but I might be wrong. What I'm trying to do here is create an XYZ D65 image and use its Y as the luminance. If I had the RGB -> XYZ D50 matrix from the ICC profile, or an XYZ D50 PCS image, I would only need ColorMatrixEffect to do that, but I have neither, so I need to use SimpleColorProfile + ColorManagementEffect instead.
  10. My final code will be something like this, with the "rotate" step being the 3rd case. And if AbsoluteColorimetric is respected throughout the process, then yeah, I think it will be fine. I'm just using ColorManagementEffect as a ColorMatrixEffect without knowing the matrix in the ICC profile, so I need to bypass the rendering intent in this case. BTW, while I'm not going to use them, the DxgiColorSpace values have the same problem.

          protected override IDeviceImage OnCreateOutput(PaintDotNet.Direct2D1.IDeviceContext DC)
          {
              var abs = ColorManagementRenderingIntent.AbsoluteColorimetric;
              using var enc = Environment.Document.ColorContext.CreateRef();
              using var dec = DC.CreateLinearizedColorContextOrScRgb(enc);
              using var xyz = DC.CreateColorContext(new SimpleColorProfile(new(1,0), new(0,1), new(0,0), new(0.95045f, 1.08905f), SimpleColorProfileGamma.G10));

              using var source = Environment.SourceImage.CreateRef();
              using var decode = new ColorManagementEffect(DC, source, enc, abs, dec, abs);
              using var rotate = new ColorManagementEffect(DC, decode, dec, abs, xyz, abs);
              // Copy the G channel (Y after the XYZ conversion) to R, G, and B.
              using var lumino = new PixelSwizzleEffect(DC, rotate, PixelSwizzle.GGGA);
              using var encode = new ColorManagementEffect(DC, lumino, dec, abs, enc, abs);
              return encode.CreateRef();
          }
  11. It looks like SimpleColorProfile has a problem. This grayscale script works in 5.0 but not in 5.1.

          System.Runtime.InteropServices.COMException (0x800707DC): IsColorProfileValid (2012, ERROR_TAG_NOT_FOUND)

      edit: DC.CreateEffect(DeviceEffectIDs.ColorManagement) works, so the problem is likely somewhere in the wrapper.

          protected override void OnInitializeRenderInfo(IGpuImageEffectRenderInfo renderInfo)
          {
              renderInfo.ColorContext = GpuEffectColorContext.WorkingSpace;
              base.OnInitializeRenderInfo(renderInfo);
          }

          protected override IDeviceImage OnCreateOutput(IDeviceContext DC)
          {
              var enc = DC.CreateColorContext(DeviceColorSpace.Srgb);
              var xyz = DC.CreateColorContext(new SimpleColorProfile(new(1,0), new(0,1), new(0,0), new(1,1), SimpleColorProfileGamma.G10));

              var source = Environment.SourceImage;
              var xyzimg = new ColorManagementEffect(DC, source, enc, xyz);
              // Copy the G channel (Y) to R, G, and B; alpha is kept.
              var lumimg = new ColorMatrixEffect(DC, xyzimg, new(0,0,0,0, 1,1,1,0, 0,0,0,0, 0,0,0,1, 0,0,0,0));
              var rgbimg = new ColorManagementEffect(DC, lumimg, xyz, enc);
              return rgbimg;
          }
  12. Additionally, your laptop can choose either the AMD or the NVIDIA GPU for UI rendering, so enabling hardware acceleration and switching UI rendering to NVIDIA could be a better workaround. You can change this through the Windows settings. PDN 5.1 removed the button that opens the Windows settings, but I think it's better to have it.
  13. I'm not sure you'll be happy with this result, but this CodeLab script makes a pixel more transparent the closer its color is to the primary color.

          #region UICode
          DoubleSliderControl threshold = 0.32; // [0,1] Threshold
          #endregion

          protected override void OnInitializeRenderInfo(IGpuImageEffectRenderInfo renderInfo)
          {
              renderInfo.ColorContext = GpuEffectColorContext.WorkingSpace;
              base.OnInitializeRenderInfo(renderInfo);
          }

          protected override IDeviceImage OnCreateOutput(IDeviceContext DC)
          {
              using var source = Environment.SourceImage.CreateRef();
              using var color = new FloodEffect2(DC, (ColorRgba128Float)Environment.PrimaryColor);
              // Distance between each pixel and the primary color.
              using var vector = new HlslBinaryOperatorEffect(DC, source, HlslBinaryOperator.Subtract, color);
              using var length = new HlslUnaryFunctionEffect(DC, HlslUnaryFunction.Length, vector);
              // Scale so that a distance equal to the threshold maps to 1, then clamp to [0,1].
              using var alpha = new HlslBinaryOperatorEffect(DC, length, HlslBinaryOperator.Multiply, new Vector4Float((float)(1 / threshold)));
              using var clamp = new HlslUnaryFunctionEffect(DC, HlslUnaryFunction.Saturate, alpha);
              // Multiply the source by the resulting factor.
              using var output = new HlslBinaryOperatorEffect(DC, source, HlslBinaryOperator.Multiply, clamp);
              return output.CreateRef();
          }
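Per pixel, the effect chain in this script boils down to a distance, a scale, a clamp, and a multiply. A minimal CPU sketch of that arithmetic, using System.Numerics.Vector4 in place of the GPU effects (the class and method names are hypothetical):

```csharp
using System;
using System.Numerics;

// Hypothetical CPU equivalent of the per-pixel math in the script above.
public static class ColorKeySketch
{
    // threshold is assumed to be greater than 0.
    public static Vector4 Apply(Vector4 pixel, Vector4 keyColor, float threshold)
    {
        // Euclidean distance over all four channels (RGBA).
        float distance = (pixel - keyColor).Length();
        // 0 when the colors match, 1 once the distance reaches the threshold.
        float factor = Math.Clamp(distance / threshold, 0f, 1f);
        // Scales every channel, alpha included, so close colors fade out.
        return pixel * factor;
    }
}
```

A pixel identical to the key color becomes fully transparent, while a pixel farther away than the threshold is returned unchanged.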
  14. Another choice is to just keep <TargetFramework>net7.0-windows</TargetFramework> in your csproj. The .NET 8 SDK then builds a .NET 7 dll, and both PDN 5.0 and 5.1 can load it. Once you bump this to net8.0-windows, only PDN 5.1 and later can load it.
  15. Oh, so there is a command to force GC. Keys.Oemtilde = keycode 192, which is "@" on my keyboard, and if I press Ctrl+Alt+Shift+@, the image flickers and I can open more than 60 images.

      edit: Shift+@ is backquote on my keyboard too. It's 0%.
  16. Same result, but the image doesn't flicker, so it's likely due to the different keyboard locale. I tried "~" and the key next to "1", but they don't seem to do anything.
  17. Maybe it's better, but now 60 images freeze PDN instead of crashing it.
  18. Opening 60 images crashes PDN. They don't need to be opened all at once, and 59 is OK. pdncrash.1.log
  19. I'm talking about the canvas, so it's due to your configuration. Ever since PDN supported it, I've been looking for a simple image viewer with gamma-correct zoom. IrfanView is the only one I could find, but its quality is not as good as PDN's. An image viewer needs to do interpolated upscaling, so maybe they use one uniform method for both upscaling and downscaling.
  20. This scaler is really high quality. Not only is it gamma correct, it has almost no moiré pattern at any zoom level. I found that the latest IrfanView does gamma-correct scaling, but I still see some moiré, so its sampling is likely not as good, or a bit more aggressive.
  21. Technically, text zoomed to 50% should look 50% thinner, not 50% smaller with the same thickness. Not many image viewers handle this properly, and PDN is the most accurate one I know of. You can spot the noise because non-gamma-correct downscaling enhances it. So if you want to use downscaling for this purpose, hit Ctrl+R and uncheck gamma correction, and you can create a noise-enhanced image.
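To illustrate what "gamma correct" means here, the sketch below averages two sRGB-encoded pixel values both naively and in linear light. It assumes the standard sRGB transfer function and is not PDN's actual scaler; the names are hypothetical.

```csharp
using System;

// Hypothetical illustration: averaging two sRGB-encoded values in [0, 1].
public static class GammaCorrectAverage
{
    static double ToLinear(double e) =>
        e <= 0.04045 ? e / 12.92 : Math.Pow((e + 0.055) / 1.055, 2.4);

    static double FromLinear(double l) =>
        l <= 0.0031308 ? l * 12.92 : 1.055 * Math.Pow(l, 1.0 / 2.4) - 0.055;

    // Averages the encoded values directly (not gamma correct).
    public static double Naive(double a, double b) => (a + b) / 2;

    // Decodes to linear light, averages, then re-encodes (gamma correct).
    public static double Linear(double a, double b) =>
        FromLinear((ToLinear(a) + ToLinear(b)) / 2);
}
```

Averaging a black pixel (0) and a white pixel (1) gives 0.5 with the naive method but roughly 0.735 with the linear one, so the two methods weight dark and light detail quite differently when an image is scaled down.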
  22. This MS histogram is a bit cumbersome to use, so that's good to hear. In the latest version, I'm mapping the input histogram to the output histogram like my CPU version does, and I need at least 4096 bins to do that. The MS histogram only supports up to 1024 bins, so I'm scanning the image 4 times and then back-calculating the result. So if the new histogram supports a higher bin count, that's even better.

          private Vector4[] Histogram(IDeviceImage image, int prec)
          {
              using var idc = DC.CreateCompatibleDeviceContext(null, new(1024, 1024), DevicePixelFormats.Prgba128Float);
              using var odc = DC.CreateCompatibleDeviceContext(null, new(), DevicePixelFormats.Prgba128Float);

              var (bins, span) = (Math.Min(prec, 4) * 256, Math.Max(prec, 4) / 4);
              var data = new Vector4[span * bins];
              var d = new RectInt32(0, 0, Environment.Document.Size);
              var t = new RectInt32(0, 0, idc.PixelSize);

              // Process the document in tiles the size of the intermediate device context.
              for (t.Offset(0, -t.Y); t.Y < d.Height; t.Offset(0, t.Height))
              for (t.Offset(-t.X, 0); t.X < d.Width; t.Offset(t.Width, 0))
              {
                  var r = RectInt32.Intersect(d, t);
                  using (idc.UseBeginDraw()) idc.DrawImage(image, null, r, compositeMode: CompositeMode.SourceCopy);

                  // One histogram per channel (n) and per sub-bin offset (j).
                  foreach (var n in new[] {0, 1, 2})
                  for (var j = 0; j < span; j++)
                  {
                      var (l, o) = (255f / 256, 0.5f / 256 - (j - span + 1f) / (span * bins));
                      using var size = new CropEffect(DC, idc.Bitmap, new(0, 0, r.Size));
                      using var tran = new ColorMatrixEffect(DC, size, new(l,0,0,0, 0,l,0,0, 0,0,l,0, 0,0,0,1, o,o,o,0));
                      using var hist = new HistogramEffect(DC);
                      hist.Properties.Input.Set(tran);
                      hist.Properties.Bins.SetValue(bins);
                      hist.Properties.ChannelSelect.SetValue((ChannelSelector)n);
                      using (odc.UseBeginDraw()) odc.DrawImage(hist);
                      var (v, s) = (hist.Properties.HistogramOutput.GetValue(), (float)r.Area / d.Area);
                      for (var i = 0; i < bins; i++) data[span * i + j][n] += v[i] * s;
                  }
              }

              // Back-calculate: subtract the already resolved neighbors from each entry.
              var sum = (Span<Vector4> a) => {var x = Vector4.Zero; foreach (var v in a) x += v; return x;};
              for (var i = 0; i < data.Length; i++) data[i] -= sum(data.AsSpan(Math.Max(i - span + 1, 0)..i));
              return data;
          }

      full source code LightBalanceGPU.zip
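The back-calculation step at the end of that method can be isolated into a small CPU sketch. This is my reading of the approach rather than the plugin's code: it assumes each coarse, offset histogram entry ends up holding the sum of the `span` fine bins ending at that index, and it recovers the fine bins by subtracting the neighbors that are already resolved. All names are hypothetical.

```csharp
using System;

// Hypothetical illustration of recovering fine histogram bins
// from overlapping window sums.
public static class HistogramRefine
{
    // Simulates the measurement: windows[k] is the sum of the `span`
    // fine bins ending at k (fewer at the low edge).
    public static double[] WindowSums(double[] fine, int span)
    {
        var windows = new double[fine.Length];
        for (int k = 0; k < fine.Length; k++)
            for (int m = Math.Max(k - span + 1, 0); m <= k; m++)
                windows[k] += fine[m];
        return windows;
    }

    // Recovers the fine bins in increasing order: each entry minus the
    // previous span - 1 entries, which have already been converted back.
    public static double[] Recover(double[] windows, int span)
    {
        var fine = (double[])windows.Clone();
        for (int k = 0; k < fine.Length; k++)
            for (int m = Math.Max(k - span + 1, 0); m < k; m++)
                fine[k] -= fine[m];
        return fine;
    }
}
```

With span = 4, four 1024-bin passes give 4096 window sums, and Recover turns them back into 4096 individual bins under the assumption above.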
  23. Posting the latest version with a 2-bit binary search, or more like a quarter search.

      source code + dll MedianFilterGPU.zip

      Additionally, I made versions which compute 2, 3, and 4 pixel colors at once, then ran them on 1/2, 1/3, and 1/4 sized images to estimate compute shader performance.

      8K image, radius 100, sampling rate 1/4, RTX 3060 laptop

          No optimization - 18.2s
          INT8 sampling - 8.6s
          2bit binary search - 10.2s
          pseudo 2,3,4 pixel output - 10.2s, 8.2s, 7.8s
          INT8 sampling + 2bit binary search - 7.2s
          INT8 sampling + pseudo 2,3,4 pixel output - 6.9s

      It looks like 2.6x the original version is the performance ceiling on my GPU, and this latest version is at 2.5x. The 2-bit binary search needs to test 3 thresholds inside the loop in order to halve the number of loop iterations. Maybe that's why it runs slightly slower.

      2bit binary search shader

          private readonly partial struct Render : ID2D1PixelShader
          {
              private readonly float r, p;
              private readonly float3 d;

              private float4 HiLo(float4 c, float v)
              {
                  // n counts the samples at or below each of the 3 thresholds; m counts all samples.
                  float3x4 n = 0;
                  float m = 0;
                  float y = r % d.Y - r;
                  for (; y <= r; y += d.Y)
                  {
                      float w = Hlsl.Trunc(Hlsl.Sqrt(r * r - y * y));
                      float x = (w + r * d.X + y / d.Y * d.Z) % d.X - w;
                      for (; x <= w; x += d.X)
                      {
                          float4 s = D2D.SampleInputAtOffset(0, new(x, y));
                          n += Hlsl.Step(new float3x4(s, s, s), new(c - v, c, c + v));
                          m += 1;
                      }
                  }
                  return (float3)1 * (1 - 2 * Hlsl.Step(Hlsl.Max(m * p, 1), n * 100));
              }

              public float4 Execute()
              {
                  float4 c = 0.5f;
                  float v = 0.5f;
                  // 4 passes, each one resolving 2 bits.
                  c += HiLo(c, v *= 0.5f) * (v *= 0.5f);
                  c += HiLo(c, v *= 0.5f) * (v *= 0.5f);
                  c += HiLo(c, v *= 0.5f) * (v *= 0.5f);
                  c += HiLo(c, v *= 0.5f) * (v *= 0.5f);
                  return c;
              }
          }

      pseudo 4 pixel output shader

          private readonly partial struct Render : ID2D1PixelShader
          {
              private readonly float r, p;
              private readonly float3 d;

              private float4x4 HiLo(float4x4 c, float2 o)
              {
                  float4x4 n = 0;
                  float m = 0;
                  float y = r % d.Y - r;
                  for (; y <= r; y += d.Y)
                  {
                      float w = Hlsl.Trunc(Hlsl.Sqrt(r * r - y * y));
                      float x = (w + r * d.X + y / d.Y * d.Z) % d.X - w;
                      for (; x - d.X * 3 <= w; x += d.X)
                      {
                          // float4 s = input[(int2)(o + new float2(x, y))];
                          float4 s = D2D.SampleInputAtPosition(0, o + new float2(x, y));
                          float4 a = Hlsl.Step(Hlsl.Abs(x - d.X * new float4(0, 1, 2, 3)), w);
                          n += new float4x4(a.X, a.Y, a.Z, a.W) * Hlsl.Step(new(s, s, s, s), c);
                          m += a.X;
                      }
                  }
                  return 1 - 2 * Hlsl.Step(Hlsl.Max(m * p, 1), n * 100);
              }

              public float4 Execute()
              {
                  // float2 o = new(ThreadIds.X * 4 - ThreadIds.X % d.X * 3, ThreadIds.Y);
                  float2 o = D2D.GetScenePosition().XY;
                  float4x4 c = 0.5f;
                  float v = 0.5f;
                  // 8 passes, each one resolving 1 bit.
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  c += HiLo(c, o) * (v *= 0.5f);
                  // output[(int2)(o + new float2(d.X * 0, 0))] = c[0];
                  // output[(int2)(o + new float2(d.X * 1, 0))] = c[1];
                  // output[(int2)(o + new float2(d.X * 2, 0))] = c[2];
                  // output[(int2)(o + new float2(d.X * 3, 0))] = c[3];
                  return (float4)1 / 4 * c;
              }
          }
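The idea both shaders share, finding a percentile by bisecting on a threshold and counting samples instead of sorting them, can be sketched on the CPU for a single channel. This is the plain one-threshold-per-pass variant with 8 passes, not the 2-bit version, and the names are hypothetical.

```csharp
using System;

// Hypothetical single-channel illustration of percentile search by bisection.
public static class PercentileSearch
{
    // samples are expected in [0, 1]; percent is 0-100 (50 = median).
    public static double Find(double[] samples, double percent, int steps = 8)
    {
        double c = 0.5, v = 0.5;
        for (int i = 0; i < steps; i++)
        {
            // Count the samples at or below the current threshold.
            int n = 0;
            foreach (double s in samples)
                if (s <= c) n++;

            // Halve the step, then move down if enough samples are already
            // at or below the threshold, otherwise move up.
            v *= 0.5;
            bool enough = n * 100.0 >= Math.Max(samples.Length * percent, 1.0);
            c += enough ? -v : v;
        }
        return c;
    }
}
```

After 8 passes the result is within 1/512 of the target value, which is enough for 8-bit output. Testing 3 thresholds per pass, as the 2-bit shader does, reaches the same precision in 4 passes.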
  24. My last post sounds like a mess, even for my English. haha

      What I was trying to say is that the original shader seems to be idling 75% of the time because of bandwidth or latency, so doing 4x the computing per sample and cutting the loop iterations to 1/4 might be the sweet spot. I expect that cutting any part of the for(y) for(x) for(i) loop to 1/4 has the same effect, but a 1/4 i loop requires 8x the computing per sample, and a pixel shader can't configure the xy loop. I tested a 1/2 i loop configuration and got a 15% boost with it. My latest version is doing INT8 sampling, so apparently there is not much room left.

      edit: I found that the performance ceiling on my GPU is 3x rather than 4x, so it's likely GPU clock dependent. When a 2GHz GPU is idling 75% of the time, a 1.5GHz GPU is idling 66% of the time, and so on.
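The clock-dependence estimate in the edit is consistent with a simple model. Assume (this is my assumption, not something measured in the post) that the memory wait per sample $t_m$ is fixed, that the compute time per sample $t_c$ scales inversely with the GPU clock $f$, and that compute is fully hidden behind the memory wait:

```latex
\begin{aligned}
\text{idle}(f) &= 1 - \frac{t_c(f)}{t_m}, \qquad
\text{ceiling}(f) = \frac{t_m}{t_c(f)} = \frac{1}{1 - \text{idle}(f)} \\[4pt]
f = 2\,\text{GHz}: \quad & \text{idle} = 0.75 \;\Rightarrow\; \frac{t_c}{t_m} = \frac{1}{4}, \quad \text{ceiling} = 4\times \\[4pt]
f = 1.5\,\text{GHz}: \quad & \frac{t_c}{t_m} = \frac{1}{4}\cdot\frac{2}{1.5} = \frac{1}{3}
\;\Rightarrow\; \text{idle} = \frac{2}{3} \approx 66\%, \quad \text{ceiling} = 3\times
\end{aligned}
```

Under these assumptions, the amount of extra compute per fetch that comes "for free" is the ceiling value, which shrinks as the clock gets slower.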
  25. It seems like you can have at least 4x the compute / fetch of the original shader for free. What if you change to arity=2 and keep the 4-pixel output? That's another 4x compute / fetch setup, I believe.