Making continuous masking work right #1
I tried to have a look and identify the problem:

- hokusai.jpg is not in the repository, and the masks for hoover tower need to be copied manually
- the `-style_seg` option also needs to be set?
- `-color_codes black` needs to be set

I figured this out myself and used `python neural_style.py -content_image examples/inputs/hoovertowernight.jpg -style_image examples/inputs/starry_night.jpg,examples/inputs/cubist.jpg -style_seg examples/segments/starry_night.png,examples/segments/cubist.png -content_seg examples/segments/hoovertowernight2a.png,examples/segments/hoovertowernight2b.png -color_codes black -backend cudnn`, but then I get another error: two file names are given, but `preprocess` takes a single one:

`File "neural_style.py", line 91, in main`
`content_seg_caffe = preprocess(params.content_seg, params.image_size, to_normalize=False).type(dtype)`

It would be great if you could fix these minor issues, so one can focus on understanding the main blending-mask problem.
@nikjetchev - looks like you checked out the master branch instead of the continuous-masking one - I made the same mistake.
@nikjetchev @Quasimondo sorry for burying the note about using a different branch; I should have been clearer about that! Yes, you should check out the "continuous-masking" branch. I am keeping master with the old features to stay in sync with the original upstream. However, you're right that I forgot to upload the hokusai image. I just added that. Let me know if it works for you.
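If it helps, switching over is just (assuming your remote is named origin):

```
git fetch origin
git checkout continuous-masking
```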
Quick summary of random experiments I posted in the Twitter thread:
In short, I don't think an approach based on Gram matrices will work easily. Fast Style methods have a linear latent space rather than a matrix, and those work well. There are a few other ideas in the Twitter discussion, but it's too early to say.
The "continuous-masking" branch of this repository overhauls the way masks are specified. In the master branch, you input a single
content_seg
segmentation image which is color coded according tocolor_codes
and matched to the colors associated with eachstyle_seg
image. In this branch, we instead get rid of the color codes, and input multiple grayscalecontent_seg
images, one for each style image, where the brightness corresponds positively to how much of that style is let through onto thecontent_image
. Thestyle_seg
parameter lets you also create style masks for each style image which let you extract style just from the bright region in the mask, but this is optional, defaulting to white (extract style from the entire style image). The purpose of this change is to allow for arbitrary style mixture, rather than just limited to discrete non-overlapping regions like in the master branch.For example, the following command:
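(reconstructing it roughly; the paths assume the repo's examples folders and may differ on your machine)

```
# paths are assumed here; adjust to wherever the example images and masks live
python neural_style.py \
  -content_image examples/inputs/hoovertowernight.jpg \
  -style_image examples/inputs/starry_night.jpg,examples/inputs/hokusai.jpg \
  -content_seg examples/segments/hoovertowernight1a.png,examples/segments/hoovertowernight1b.png \
  -backend cudnn
```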
produces the following output:
Notice that the `content_seg` images (hoovertowernight1a.png and hoovertowernight1b.png) are discrete: black on one side and white on the other. This associates one half of the image fully with `starry_night.jpg` and the other half with `hokusai.jpg`.

This works fine, but we'd like to be able to use continuous masks that blend/transition between the two style images. For example, a `content_seg` using hoovertowernight2a.png and hoovertowernight2b.png should interpolate between the two styles along the horizontal length of the output image, starting with hokusai on the left and ending with starry_night on the right. But if we try to run it (the same command as above, with hoovertowernight2a.png and hoovertowernight2b.png as the `content_seg` images), we get the following result: at both extremes the style is transferred well, but in the middle, where both style images contribute roughly equal influence, there is little effect, and what we get instead is mostly a reconstruction of the content image.
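For context, the masks enter the loss by being multiplied into the feature activation maps before the Gram matrix is computed, along these lines (a simplified sketch, not the branch's exact code; the resizing step and the names are illustrative):

```python
import torch
import torch.nn.functional as F

def masked_gram(features, mask):
    # features: 1 x C x H x W activations from one VGG layer
    # mask: one grayscale content_seg for one style, values in [0, 1]
    _, C, H, W = features.shape
    # bring the mask to this layer's spatial resolution
    m = F.interpolate(mask[None, None], size=(H, W), mode='bilinear', align_corners=False)
    # gate the activations by the mask, then take the usual Gram statistic
    x_flat = (features * m).view(C, H * W)
    return torch.mm(x_flat, x_flat.t())
```

Note that because the statistic is quadratic in the masked features, a region where the mask is around 0.5 contributes only about a quarter of its full style energy, which may be part of why the blended middle comes out so weak.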
This effect is especially visible if we run the same command as above but set `-content_weight 0` to do a pure texture synthesis with no content reconstruction. The middle region appears muddy, with a poor transition between the two styles.

One way to fix this problem is to use a covariance matrix instead of a Gram matrix for the style statistic: add the line `x_flat = x_flat - x_flat.mean(1).unsqueeze(1)` just before the return statement `return torch.mm(x_flat, x_flat.t())` in `GramMatrix`, and then run the same command as above with `-content_weight 0`.
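Spelled out, the modified `GramMatrix` looks roughly like this (the surrounding lines are only a sketch of the existing module for context; the single new line is the mean subtraction):

```python
import torch
import torch.nn as nn

class GramMatrix(nn.Module):
    def forward(self, input):
        B, C, H, W = input.size()
        x_flat = input.view(B * C, H * W)
        # new line: subtract the per-channel mean so the statistic below
        # becomes a covariance matrix rather than a raw Gram matrix
        x_flat = x_flat - x_flat.mean(1).unsqueeze(1)
        return torch.mm(x_flat, x_flat.t())
```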
With this change we get the following result, where the styles appear to transition horizontally as expected; however, the quality of the style reconstruction appears to be somewhat worse.

It would be desirable to find a way to get good style transitions without compromising the quality of the style reconstruction, in the same way that it's possible to transition between different class optimizations in deepdream. One possible strategy is to use a different style statistic, for example a style feature histogram loss instead of Gram or covariance matrices. Another might be to use the masks in some way other than masking the feature activation maps.
Any insights into how to possibly improve continuous style masking are greatly appreciated...