Comments (3)
Hello! Thanks for doing the tests and experimenting. I am aware of the linear space operations on images; the correct functions are used in iNNfer:
https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/colors.py#L29
https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/colors.py#L49
Where a "linear_resize" is also implemented:
https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/utils.py#L267
However, the conversions add considerable latency to the training process (every image in every batch has to be converted back and forth between sRGB and linear), and not all operations on images need to be applied in linear space.
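For anyone following along, these conversions boil down to the standard IEC 61966-2-1 sRGB transfer function. Here is a minimal numpy-only sketch of the idea (this is not the iNNfer code linked above; linear_resize_2x_box is a simplified illustration that only handles a fixed 2x box filter):

```python
import numpy as np

def srgb2linear(img: np.ndarray) -> np.ndarray:
    """sRGB values in [0, 1] -> linear light (IEC 61966-2-1)."""
    return np.where(img <= 0.04045, img / 12.92, ((img + 0.055) / 1.055) ** 2.4)

def linear2srgb(img: np.ndarray) -> np.ndarray:
    """Linear-light values in [0, 1] -> sRGB."""
    return np.where(img <= 0.0031308, img * 12.92, 1.055 * img ** (1 / 2.4) - 0.055)

def linear_resize_2x_box(img: np.ndarray) -> np.ndarray:
    """2x box downscale performed in linear light: sRGB -> linear -> average -> sRGB."""
    lin = srgb2linear(img)
    h, w = lin.shape[:2]
    lin = lin[: h - h % 2, : w - w % 2]  # trim odd edges so 2x2 blocks tile exactly
    quads = lin[0::2, 0::2] + lin[0::2, 1::2] + lin[1::2, 0::2] + lin[1::2, 1::2]
    return linear2srgb(quads / 4.0)
```

For real images you would first scale uint8 data to floats in [0, 1] (img.astype(np.float32) / 255.0) and use a proper resize kernel between the two conversions rather than a box filter.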
Additionally, the logic may be better implemented as a wrapper in https://github.com/victorca25/augmennt, but I haven't had time to evaluate the different implementation options and compare results between the current case and doing the linear conversions. Considering that no current SOTA project does SRGB to linear conversions before doing the images operations and results are not impacted, the priority to implement it is relatively low compared to other WIP elements. However, if during your testing you find that results improve with the conversions, the priority can change.
from trainner.
Looking at linear2srgb, it appears to be missing a rounding step before the conversion to uint8, which would avoid truncation (np.around(srgb).astype(np.uint8)).
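The difference is easy to see: astype truncates toward zero, while rounding first removes up to a full quantisation level of downward bias. A quick illustration (the input values here are arbitrary):

```python
import numpy as np

srgb = np.array([0.0, 127.9, 254.6])  # float sRGB values on a [0, 255] scale

truncated = srgb.astype(np.uint8)           # astype truncates toward zero
rounded = np.around(srgb).astype(np.uint8)  # round to nearest first

# truncated.tolist() == [0, 127, 254]
# rounded.tolist()   == [0, 128, 255]
```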
I did a little experiment. It's not much, but in the interest of time I ran 500 iterations on DIV2K (using the pre-trained 4xPSNR model as a starting point) with both sRGB and linear RGB downscaling (modifying MLResize in augmentations.py). With the default sRGB downscaling my first validation was:

21-11-22 03:56:04.056 - INFO: <epoch: 4, iter: 500> PSNR: 24.702, SSIM: 0.68838, LPIPS: 0.12116

but with linear RGB downscaling I got:

21-11-22 04:36:38.677 - INFO: <epoch: 4, iter: 500> PSNR: 26.169, SSIM: 0.7637, LPIPS: 0.10971

I repeated the run and got similar results. I used the basic train_sr.yml with no substantial changes.
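For anyone wanting to reproduce this, the change amounts to wrapping whatever resize call the augmentation makes in the two colour-space conversions. This is a generic sketch, not the actual MLResize/augmennt code; resize_in_linear and resize_fn are made-up names (the transfer functions are repeated for completeness):

```python
import numpy as np

def srgb2linear(img):
    return np.where(img <= 0.04045, img / 12.92, ((img + 0.055) / 1.055) ** 2.4)

def linear2srgb(img):
    return np.where(img <= 0.0031308, img * 12.92, 1.055 * img ** (1 / 2.4) - 0.055)

def resize_in_linear(img_uint8, resize_fn):
    """Run an arbitrary resize callable in linear light.

    img_uint8: HxWxC uint8 sRGB image.
    resize_fn: callable taking and returning a float array in [0, 1].
    """
    lin = srgb2linear(img_uint8.astype(np.float64) / 255.0)
    out = linear2srgb(resize_fn(lin))
    # round before casting back to uint8, to avoid truncation bias
    return np.around(np.clip(out, 0.0, 1.0) * 255.0).astype(np.uint8)
```

With OpenCV, resize_fn would be something like lambda a: cv2.resize(a, (w, h), interpolation=cv2.INTER_AREA).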
Using these two models I took this original image and produced these two output images (which I've downscaled back to the original size to demonstrate the colour differences).
With no modifications to trainner (sRGB downscaling):
With augment.py switched to use linear RGB downscaling:
While this was a very small test, I do think it demonstrates the impact of the downscaling colour space on training. Neither model was perfect within 500 iterations, but the model trained with linear RGB downscaling was much closer in colour. The point isn't that either of these two models is actually any good, but that the downscaling colour space can make a difference. This should be repeatable.
Even compared to one of the more well-regarded models (yandere_neo_xl), the results are roughly on par in terms of colour accuracy despite only 500 iterations (I'd argue subjectively better, but marginally worse by PSNR/MAE).
> no current SOTA project does SRGB to linear conversions before doing the images operations and results are not impacted
I honestly believe this is incorrect. The results may not be obviously, visibly incorrect most of the time, but I do believe they are impacted. The damage done to an image downscaled in the sRGB colour space can be fairly subtle (shifted hues), and the models are trained to guess how to reverse this process, which will have unpredictable effects on output colours.
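The effect is easy to demonstrate numerically: averaging two pixels directly in sRGB undershoots the physically correct mix, because the average is taken on gamma-encoded values. A small sketch, assuming the standard sRGB transfer function:

```python
import numpy as np

def to_linear(c):  # IEC 61966-2-1 sRGB decoding
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def to_srgb(c):  # encoding back to sRGB
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

red = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])

naive = (red + green) / 2                                   # averaged in sRGB
correct = to_srgb((to_linear(red) + to_linear(green)) / 2)  # averaged in linear

# naive   -> roughly [128, 128, 0] on an 8-bit scale: a visibly darker yellow
# correct -> roughly [188, 188, 0]: the physically correct mix
```

This is exactly the bias a downscaling filter bakes into every mixed-colour edge when it operates on gamma-encoded values.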
I've had extremely positive results in terms of colour accuracy from training some non-trivial models locally. While I wasn't entirely happy with this one for other reasons and killed it at 183k iterations, this 2x model shows no discernible colour distortion at all.
Here's the same image round-tripped through my model and then downscaled to the original size to show colour accuracy. It's not just this image, either: I can get noticeable colour distortion passing other images through all the other ESRGAN models I've tried (not all of them from the wiki, but a decent selection), but my model maintains colours just about perfectly.
I do believe downscaling properly makes a difference, and I think at least some of the colour inaccuracies plaguing ESRGAN models can be attributed to downscaling in the sRGB colour space. Without correcting for gamma, downscaling will cause hue shifts that the model has to learn to reverse. I may repeat the experiment for the same duration without linear RGB downscaling, but because it takes so long I'll leave that for later. I've already proven the results to my own satisfaction, enough to keep using this edit locally.