Does JDEC support high resolution JPEGs? #7
Well, I tried to
And this was the result:
Does this mean the network cannot process JPEG files bigger than 1120x1120? Or do I need to make more code adjustments? Any ideas? It's starting to seem like every input JPG must be exactly 1120x1120, which would severely restrict the usefulness of this network. I hope that's not true. If it is, the network is more interesting as a research concept than as a practical tool for removing artifacts from real JPG files.
Well, I see that the paper says:
So it seems the network can use different input sizes, but the JPEG file's dimensions must be multiples of 112? I hope you can clarify whether this network can:
If it's not possible, would it be possible to adapt the network so it automatically pads the input JPGs to multiples of 112, so that JDEC can support all dimensions? Or maybe automatically crops them, if that's the only solution?
At least then it would process all of the 112x112 input slices, if that's what the network needs. That's better than not working at all for arbitrary input sizes.
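To make the pad-to-multiple-of-112 suggestion concrete, here is a minimal sketch in NumPy. `pad_to_multiple` is a hypothetical helper, not part of the JDEC repo, and it works in pixel space; since JDEC ingests DCT coefficients, the padded image would still need to be re-encoded (or padded at the coefficient level) before being fed to the model.

```python
import numpy as np

def pad_to_multiple(img: np.ndarray, multiple: int = 112) -> np.ndarray:
    """Mirror-pad an HxWxC image so H and W become multiples of `multiple`."""
    h, w = img.shape[:2]
    pad_h = (-h) % multiple  # rows needed to reach the next multiple
    pad_w = (-w) % multiple  # columns needed to reach the next multiple
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")

img = np.zeros((1000, 1500, 3), dtype=np.uint8)
padded = pad_to_multiple(img)
print(padded.shape)  # (1008, 1568, 3)
```

Mirror padding (`mode="reflect"`) is used here because, as discussed later in the thread, it tends to introduce fewer spectral discontinuities than zero padding.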
In conclusion, it's possible, but the result is not what you'd expect it to look like. As I mentioned, we need to process the input so that its dimensions are a multiple of the LCM (112) of the Swin window and the JDEC block. If you don't keep that size, you'll often see unexpected artifacts.
Handling arbitrary (ARB) inputs will be quite interesting. As far as I know, if padding is needed, a JPEG is also required. So even if the actual size is arbitrary, if you import the spectrum through a module like dct-manip, it already arrives with padding. The problem is how to make the process work without a Swin window. In a way, it could be solved simply by replacing it with a module like a ResBlock, but that can't guarantee performance without a transformer. Also, networks that receive spectra are very sensitive (probably because each point carries a different energy). It's a question of what to do with the residual energy for arbitrary input.
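For reference, the 112 figure discussed above is just the least common multiple of the two block sizes mentioned in the thread (Swin window 7, JDEC block 16); a one-liner confirms it (Python 3.9+ for `math.lcm`):

```python
import math

swin_window = 7   # Swin attention window size, per the thread
jdec_block = 16   # JDEC embedding block size, per the thread
print(math.lcm(swin_window, jdec_block))  # 112
```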
@WooKyoungHan Thank you for explaining the details! I see that the dimensions need to be multiples of 112, since that's the least common multiple of the JDEC block (16x16) and the Swin window (7x7). My goal is to load real-world JPG files into JDEC to clean them up, which is why I am trying to load images of arbitrary size. I would be happy with either of these two solutions:
For solution 2, you mention quality loss. I am guessing there's some loss of quality in the padded blocks because the "padded" data is not real JPEG data, so JDEC cannot guess what the original pre-compression input was in those blocks; the padding messes up the DCT spectrum. If that is the case, can JDEC be tweaked so it only gives "attention" to the original pixels in a padded JDEC block? So if a block was 15x80 and is padded to 112x112, JDEC would only attend to the DCT spectrum inside the 15x80 area of the 112x112 padded block. The algorithm could know this by giving JDEC the original JPEG width/height dimensions (so it knows what the size was before padding).
Edit: I guess my last idea is impossible, since the neural network only learns that "112x112 is my input and I need to generate a 112x112 output" and has no understanding of padding. :)
Okay, I tried to look at https://github.com/JeongsooP/RGB-no-more/blob/main/dct_manip/dct_manip.cpp to find out how to pad the data as it is read. Here's my ideal solution:
And I understand that the "fake 112x112 blocks" at the right/bottom edges will not be good quality, but I still think this would be the best solution for now.
PS: I am talking about using JPEG files, not PNG files. This is for processing real-world JPEGs to de-block them; I don't have the original PNGs for them. If my idea here is not possible with JPEG input, then perhaps we can at least process all of the 112x112 parts of the JPEG file and crop/ignore the rest of the image (ignore the blocks that are smaller than 112x112)? (That was one of my suggestions above.)
I agree with a lot of this. There has been a lot of discussion about how to use padding properly. As you said, zero padding is memory-efficient, but mirror/periodic padding can give better quality. Thank you for suggesting a good approach. JDEC is limited by the traditional JPEG legacy in some ways, and image size is one of them; for now this is unavoidable if we want to digest JPEG files. I'm hoping someone (I'd love it if it were me) will overcome this with a clever idea. I acknowledge that learning-based JPEG encoding is also being studied these days. It would be nice if JDEC could be used in combination with that work.
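The zero vs. mirror/periodic padding trade-off mentioned above is easy to see on a toy 1-D signal using NumPy's pad modes (this is purely illustrative, not the repo's code): zero padding creates a hard edge, while mirror and periodic padding continue the signal smoothly.

```python
import numpy as np

row = np.array([1, 2, 3, 4])
print(np.pad(row, (0, 3), mode="constant"))  # zero:     [1 2 3 4 0 0 0]
print(np.pad(row, (0, 3), mode="reflect"))   # mirror:   [1 2 3 4 3 2 1]
print(np.pad(row, (0, 3), mode="wrap"))      # periodic: [1 2 3 4 1 2 3]
```

The hard step down to zero in the first case injects high-frequency energy into the DCT spectrum, which is presumably why the mirror/periodic variants tend to produce fewer artifacts in a spectrum-fed network.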
Yeah, in ... but in ... So if I have understood correctly, our only solution for "real JPG input with arbitrary size" is to process all 112x112 blocks and drop/skip all the smaller ones? So an input JPG of 3219x3287 => JDEC => 3136x3248 PNG output (skipping every partial, non-112x112 block), i.e. the output cropped to only the full 112x112 squares. That's the only solution, right?
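The crop arithmetic in that example checks out: each dimension is floored to the nearest multiple of 112. A tiny sketch (`crop_to_multiple` is a hypothetical helper, not part of the repo):

```python
def crop_to_multiple(h: int, w: int, multiple: int = 112) -> tuple[int, int]:
    """Floor each dimension to the nearest multiple of `multiple`."""
    return (h // multiple) * multiple, (w // multiple) * multiple

print(crop_to_multiple(3219, 3287))  # (3136, 3248)
```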
Now I understand. Thinking it over, I realize that, as you mentioned, JDEC indeed has some ambiguous aspects when handling arbitrary inputs. Additionally, the 112 padding, while essential, is quite large. I don’t immediately have a better idea than what you proposed. Thank you for bringing this up. I believe a better approach could be developed by properly utilizing the transformation formula. I'll update you on this once it’s more thoroughly organized.
@WooKyoungHan Thank you so much for answering. I am sorry, I did not see your message until today! Indeed, the 112x112 size is pretty large, and that's very unfortunate, because JDEC is the best model I have ever seen for JPEG artifact removal (much better than Swin2SR, FBCNN, etc.). But even though I would lose some pixel chunks (the partial, non-112x112 blocks), I would love to use JDEC for personal image restoration. In many situations an image can be safely cropped at the right/bottom edges without losing anything important. I wonder if you could improve the JDEC repo's file-input code to automatically skip JPEG blocks that are not 112x112? Currently the repo cannot be used for arbitrary-resolution inputs; it errors instead. It would be great if JDEC's code processed all 112x112 blocks and auto-skipped the smaller ones, automatically creating a cropped output (a multiple of 112). That would be very useful. :)
I am curious about this. Do you mean that there may be a solution to process the smaller (non-112x112) blocks? Or that the ...?
I also read your supplementary. JDEC is very impressive. By the way, since you have looked at the state-of-the-art pixel-based JPEG artifact removers, I wonder if you have any opinions about which pixel-based model is the best right now? I plan to use JDEC for most of my images, and a pixel-based model only if I need to keep the non-112x112 edges of an image. I suspect that one of these is the best pixel-based model:
From that list of the state-of-the-art models I have heard about, it seems like Swin2SR is the best (but slow), and FBCNN is the 2nd best. Your paper update only compared JDEC vs those 2 models specifically, which also indicates to me that they are the state-of-the-art pixel solutions? PS: To me, JDEC looks better than all of them. I hope one day that it can support arbitrary resolutions. :) Edit: It seems like Swin2SR is the best of the publicly available pixel-based solutions. I have created and shared a ChaiNNer workflow to use that artifact removal method here for batch processing with realistic results: chaiNNer-org/chaiNNer#3055 (comment) |
So far I have been experimenting with test.py. I see that it sets size = 112*10 and converts the input image to 1120x1120 (with some mirroring to make the image square if it's not square). But that might just be a test-script decision?
Is it possible to process higher-size JPEGs with this network? Like a 6000x6000 .jpg file?
Or would it require some rewrite to use Tiled Processing, with 1120x1120 tiles?
Edit: Actually, I don't even know how tiled processing would work here, since this network doesn't operate on pixels. :D I hope you have some advice on how to use high-resolution JPEG input, if it's even possible.
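For what it's worth, the tiled-processing idea above can be sketched in pixel space. This is only the plumbing, with a placeholder `model` callable; real tiling for JDEC would have to split at the DCT-coefficient level, since the network does not consume pixels directly, and `process_in_tiles` is a hypothetical helper rather than anything in the repo.

```python
import numpy as np

TILE = 1120  # 10 x 112, the size the repo's test script uses

def process_in_tiles(img: np.ndarray, model) -> np.ndarray:
    """Run `model` on TILE x TILE crops and reassemble the result.

    Assumes the image dimensions are already multiples of TILE and that
    `model` maps one HxWxC tile to a restored tile of the same shape.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            out[y:y + TILE, x:x + TILE] = model(img[y:y + TILE, x:x + TILE])
    return out

# Identity "model" just to demonstrate the plumbing.
restored = process_in_tiles(np.ones((2240, 2240, 3), np.uint8), lambda t: t)
print(restored.shape)  # (2240, 2240, 3)
```

A real implementation would also want overlapping tiles (or at least seam blending), since independently restored tiles can leave visible boundaries.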