memo / webcam-pix2pix-tensorflow

Source code and pretrained model for running pix2pix in realtime on a webcam feed.

Home Page: https://www.memo.tv/works/learning-to-see/

License: MIT License

Python 100.00%
ai-art aiart artificial-intelligence artificial-neural-networks deep-learning deep-neural-networks pix2pix pix2pix-tensorflow tensorflow

webcam-pix2pix-tensorflow's Introduction

This is the source code and pretrained model for the webcam pix2pix demo I posted recently on Twitter and Vimeo. It uses deep learning, or to throw in a few buzzwords: a deep convolutional conditional generative adversarial network autoencoder.

video 1

video 2

Overview

The code in this particular repo actually has nothing to do with pix2pix, GANs or even deep learning. It just loads any pre-trained tensorflow model (as long as it complies with a few constraints), feeds it a processed webcam input, and displays the output of the model. It just so happens that the model I trained and used is pix2pix (details below).

I.e. the steps can be summarised as:

  1. Collect data: scrape the web for a ton of images, preprocess and prepare training data
  2. Train and export a model
  3. Preprocessing and prediction: load pretrained model, feed it live preprocessed webcam input, display the results.

1. Data

I scraped art collections from around the world from the Google Art Project on wikimedia. A lot of the images are classical portraits of rich white dudes, so I only used about 150 collections, trying to keep the data as geographically and culturally diverse as possible (full list I used is here). But the data is still very euro-centric, as there might be hundreds or thousands of scans from a single European museum, but only 8 scans from an Arab museum.

I downloaded the 300px versions of the images, and ran a batch process to:

  • Rescale them to 256x256 (without preserving aspect ratio)
  • Run a simple edge detection filter (OpenCV Canny)

I also ran a batch process to take multiple crops from the images (instead of a non-uniform resizing) but I haven't trained on that yet. Instead of canny edge detection, I also started looking into the much better 'Holistically-Nested Edge Detection' (aka HED) by Xie and Tu (as used by the original pix2pix paper), but haven't trained on that yet either.

This is done by the preprocess.py script (sorry, no command-line arguments; edit the script to change paths and settings, it should be quite self-explanatory).
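For illustration, here's a minimal sketch of that kind of batch step. The paths, Canny thresholds and the side-by-side pairing are my assumptions; the actual preprocess.py may differ.

# Minimal sketch of the preprocessing idea (assumed paths and thresholds;
# not the actual preprocess.py).
import os
import cv2
import numpy as np

SRC_DIR = 'images_raw'      # downloaded 300px images (placeholder path)
DST_DIR = 'images_prepped'  # output folder for edge/original pairs (placeholder path)

if not os.path.isdir(DST_DIR):
    os.makedirs(DST_DIR)

for fname in os.listdir(SRC_DIR):
    img = cv2.imread(os.path.join(SRC_DIR, fname))
    if img is None:
        continue                                   # skip unreadable files
    img = cv2.resize(img, (256, 256))              # no aspect-ratio preservation
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)              # simple edge detection; tune thresholds per dataset
    edges = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
    pair = np.hstack([edges, img])                 # edge map and original side by side, pix2pix style
    cv2.imwrite(os.path.join(DST_DIR, fname), pair)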

A small sample of the training data - including predictions of the trained model - can be seen here. The right-most column is the original image, the left-most column is the preprocessed version. These two images are fed into the pix2pix network as a 'pair' to be trained on. The middle column is what the model learns to produce given only the left-most column. (The number on the left of each row is the training iteration, going from 20,000 to 58,000, so the output gradually gets better the further down the page you go.)

training_data

I also trained an unconditional GAN (i.e. a normal DCGAN) on this same training data. An example of its output can be seen below. (It generates 'completely random' images that resemble the training data.)

dcgan

2. Training

The training and architecture is straight up 'Image-to-Image Translation with Conditional Adversarial Nets' by Isola et al (aka pix2pix). I trained with the tensorflow port by @affinelayer (Christopher Hesse), which is also what's powering the 'sketch-to-cat' demo that went viral recently. He also wrote a nice tutorial on how pix2pix works. Infinite thanks to the authors (and everyone they built on) for making their code open-source!

I only made one infinitesimally tiny change to the tensorflow-pix2pix training code, and that is to add tf.identity to the generator inputs and outputs with a human-readable name, so that I can feed and fetch the tensors with ease. So if you wanted to use your own models with this application, you'd need to do the same. (Or make a note of the input/output tensor names, and modify the JSON accordingly; more on this below.)
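As a rough illustration, the kind of change meant here looks something like the sketch below. create_generator_core is a stand-in for the pix2pix generator builder, and the names simply match the JSON example further down; the exact placement inside pix2pix.py differs.

# Sketch only: wrap the generator's input and output in tf.identity so the
# tensors get human-readable names that can be fed and fetched at inference time.
import tensorflow as tf

def create_named_generator(create_generator_core, raw_input):
    with tf.variable_scope('generator'):
        inputs = tf.identity(raw_input, name='generator_inputs')    # -> 'generator/generator_inputs:0'
        outputs = create_generator_core(inputs)                     # stand-in for the pix2pix generator
        outputs = tf.identity(outputs, name='generator_outputs')    # -> 'generator/generator_outputs:0'
    return inputs, outputs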

You can download my pretrained model from the Releases tab.

pix2pix_diff

3. Preprocessing and prediction

What this particular application does is load the pretrained model, do live preprocessing of a webcam input, and feed it to the model. I do the preprocessing with old-fashioned basic computer vision, using OpenCV. It's really very minimal and basic. You can see the GUI below (the GUI uses pyqtgraph).

[GUI screenshot]

Different scenes require different settings.

E.g. for 'live action' I found canny to provide better (IMHO) results, and it's what I used in the first video at the top. The thresholds (canny_t1, canny_t2) depend on the scene, amount of detail, and the desired look.

If you have a lot of noise in your image you may want to add a tiny bit of pre_blur or pre_median. Or play with them for 'artistic effect'. E.g. In the first video, at around 1:05-1:40, I add a ton of median (values around 30-50).

For drawing scenes (e.g. second video) I found adaptive threshold to give more interesting results than canny (i.e. disable canny and enable adaptive threshold), though you may disagree.

For a completely static input (i.e. if you freeze the capture, disabling the camera update) the output is likely to flicker a very small amount as the model makes different predictions for the same input - though this is usually quite subtle. However for a live camera feed, the noise in the input is likely to create lots of flickering in the output, especially due to the high susceptibility of canny or adaptive threshold to noise, so some temporal blurring can help.

accum_w1 and accum_w2 are for temporal blurring of the input, before going into the model: new_image = old_image * w1 + new_image * w2 (so ideally they should add up to one - or close to).

Prediction.pre_time_lerp and post_time_lerp also do temporal smoothing: new_image = old_image * xxx_lerp + new_image * (1 - xxx_lerp). pre_time_lerp is applied before going into the model, and post_time_lerp after coming out of the model.

Zero for any of the temporal blurs disables them. Values for these depend on your taste. For both of the videos above I had all of the pre-model blurs (i.e. accum_w1, accum_w2 and pre_time_lerp) set to zero, and played with different post_time_lerp settings ranging from 0.0 (very flickery and flashing) to 0.9 (very slow and fadey and 'dreamy'). Usually around 0.5-0.8 is my favourite range.
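A tiny sketch of the two smoothing formulas described above (the function and variable names are just illustrative, not the actual GUI parameter wiring):

# Sketch of the temporal smoothing formulas; inputs are float image arrays.
def accum_blend(old_img, new_img, accum_w1, accum_w2):
    # input accumulation before the model; ideally accum_w1 + accum_w2 ~ 1
    return old_img * accum_w1 + new_img * accum_w2

def time_lerp(old_img, new_img, lerp):
    # pre_time_lerp / post_time_lerp style smoothing; lerp == 0 disables it
    return old_img * lerp + new_img * (1.0 - lerp)

# e.g. smoothing the model output over time with post_time_lerp = 0.7:
# smoothed_output = time_lerp(prev_output, model_output, 0.7)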

Using other models

If you'd like to use a different model, you need to set up a JSON file similar to the one below. The motivation here is that I actually have a bunch of JSONs in my app/models folder which I can dynamically scan and reload; the model data is stored elsewhere on other disks, and the app can load and swap between models at runtime, scale inputs/outputs etc. automatically.

{
	"name" : "gart_canny_256", # name of the model (for GUI)
	"ckpt_path" : "./models/gart_canny_256", # path to saved model (meta + checkpoints). Loads latest if points to a folder, otherwise loads specific checkpoint
	"input" : { # info for input tensor
		"shape" : [256, 256, 3],  # expected shape (height, width, channels) EXCLUDING batch (assumes additional axis==0 will contain batch)
		"range" : [-1.0, 1.0], # expected range of values 
		"opname" : "generator/generator_inputs" # name of tensor (':0' is appended in code)
	},
	"output" : { # info for output tensor
		"shape" : [256, 256, 3], # shape that is output (height, width, channels) EXCLUDING batch (assumes additional axis==0 will contain batch)
		"range" : [-1.0, 1.0], # value range that is output
		"opname" : "generator/generator_outputs" # name of tensor (':0' is appended in code)
	}
}
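As a rough sketch of how such a JSON drives prediction, something like the following would load the checkpoint, look up the named tensors, and scale values to and from the declared ranges. This is illustrative only; the real loading lives in the app's predictor code, and the '#' comments in the example above would have to be stripped for a strict JSON parser.

# Illustrative only: load a model description JSON, restore the checkpoint,
# and run a prediction, scaling values to/from the declared ranges.
import json
import numpy as np
import tensorflow as tf

with open('app/models/gart_canny_256.json') as f:      # assumed path
    info = json.load(f)

sess = tf.Session()
ckpt = tf.train.latest_checkpoint(info['ckpt_path'])   # folder -> latest checkpoint
saver = tf.train.import_meta_graph(ckpt + '.meta')
saver.restore(sess, ckpt)

input_t = sess.graph.get_tensor_by_name(info['input']['opname'] + ':0')
output_t = sess.graph.get_tensor_by_name(info['output']['opname'] + ':0')

def predict(img_uint8):
    lo, hi = info['input']['range']
    x = img_uint8.astype(np.float32) / 255.0 * (hi - lo) + lo        # scale into expected range
    y = sess.run(output_t, {input_t: x[np.newaxis]})[0]              # add/strip batch axis
    lo_o, hi_o = info['output']['range']
    return ((y - lo_o) / (hi_o - lo_o) * 255.0).astype(np.uint8)     # back to a displayable image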

Requirements

  • python 2.7 (likely to work with 3.x as well)
  • tensorflow 1.0+
  • opencv 3+ (probably works with 2.4+ as well)
  • pyqtgraph (only tested with 0.10)

Tested only on Ubuntu 16.04, but should work on other platforms.

I use the Anaconda python distribution which comes with almost everything you need, then it's (hopefully) as simple as:

  1. Download and install anaconda from https://www.continuum.io/downloads

  2. Install tensorflow https://www.tensorflow.org/install/ (which - if you have anaconda - is often quite straightforward since most dependencies are included)

  3. Install opencv and pyqtgraph

    conda install -c menpo opencv3
    conda install pyqtgraph

Acknowledgements

Infinite thanks once again to

  • Isola et al for pix2pix and @affinelayer (Christopher Hesse) for the tensorflow port
  • Radford et al for DCGAN and @carpedm20 (Taehoon Kim) for the tensorflow port
  • The tensorflow team
  • Countless others who have contributed to the above, either directly or indirectly, or opensourced their own research making the above possible
  • My wife for putting up with me working on a bank holiday to clean up my code and upload this repo.

webcam-pix2pix-tensorflow's People

Contributors: memo

webcam-pix2pix-tensorflow's Issues

This code is outdated compared with the PDF article "Learning to See: You Are What You See"

@memo, this is a great project! Thank you for making it real and for sharing the code! I love the original video "Learning to See" (as well as Diamanda Galas's voice!), and I'm trying to use the same video instrument in my artwork with different datasets (themes).

I've read the code carefully and found that it has no "downscale" and "normalise" features.
Even more unfortunate is that you have not released the code for pix2pix training! I mean the part with random variation of parameters during the image batch creation routine. Without that code, this doesn't give results at a quality level close to your masterpiece "Learning to See".

I'd appreciate it if you released the training code as well, to make this publicly available.
Thanks a lot!

Replace webcam with video capture

Hi Memo,
I've been playing with the code in the repo and have to say it's a great piece of software to start learning from.
I'm new to Python, OpenCV and TensorFlow, but I'm starting to get comfortable with them :)
I no longer have access to a webcam, so I wonder if you can help me a bit... I imagine it's a very simple thing.

In capturer.py I've changed line #32 to this:
self.cvcap = cv2.VideoCapture('/path/to/video.mov')

Then, when I execute webcam-pix2pix.py everything works, but the video seems to be frozen, always stuck on the first frame.
Any clue why this is happening? I'm sure it's more related to OpenCV, but I can't get it working.

And another silly question (pardon me): how difficult would it be to work with larger image sizes, like 1024x1024?
I guess it needs more computational power, but anything else? Do you think it's possible?

Thanks for your time and for sharing some of your work :)

Not compatible with current pix2pix and the models it exports

If you try your own trained models, you'll get the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype string and shape [1]

Current pix2pix (mode=export) uses a base64-encoded PNG as the input value. It is easy to fix webcam-pix2pix-tensorflow in predictor.py; the patch starts from line 77:

-        img = np.expand_dims(img, 0) if len(img.shape) < 4 else img
-        model_out = self.sess.run( [self.output_op], { self.input_op: img })[0]

+        pil_img = Image.fromarray(img, mode="RGB")
+        buff = BytesIO()
+        pil_img.save(buff, format="PNG")
+        img_base64 = base64.urlsafe_b64encode(buff.getvalue())
+        model_out = self.sess.run( [self.output_op], { "Placeholder:0": [img_base64] })[0]
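(For this patch to run, predictor.py would also need `from PIL import Image`, `from io import BytesIO`, and `import base64` at the top - those are the only extra imports the snippet assumes.)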

Custom models - predictions don't seem to be influenced by camera input

Hello :)

I'm encountering an issue running my own models generated through pix2pix-tensorflow. Check out the following video:

Untitled.mp4

In the video I'm switching between two pre-trained TF models: the model you provided (from Google Art), and one generated from pictures of animals. Notice that when I switch to the animal model, the prediction window looks like a fast slideshow of existing animal pictures... it doesn't seem to be affected much by the input from the camera. When I switch to the GArt model, it looks as expected.

I have tried creating models from animals, flowers, and even other artworks. Image sets have ranged in size from 100 to 8,000 pictures. I have tried different numbers of epochs (ranging from 10 to 500), as well as different batch sizes. When I put any of these through webcam-pix2pix, I get the same "slideshow" effect. I just can't seem to make the predictions look like the input from the webcam. Tweaking the capture/processing parameters on the options panel doesn't do anything, either.

When I train the models, I follow the instructions listed here and on the pix2pix page, with one caveat: I am not doing the "export" step because doing so causes webcam-pix2pix to throw an error. Instead, I am using the models as they are exported from "train". This is my first suspect for why my output is looking weird, but I'm not certain. The error I get when running an exported model is this:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input' with dtype float and shape [256,256,3]

My commands for training/exporting models vary but are generally of the form:
python pix2pix.py --mode train --input_dir ../meta-mirror/image-data/animals_canny --output_dir ../meta-mirror/models/animals-temp --which_direction BtoA
python pix2pix.py --mode export --input_dir ../meta-mirror/image-data/animals_canny --output_dir ../meta-mirror/models/animals --checkpoint ../meta-mirror/models/animals-temp

I have tested the models as well and they all look as expected:
image

Would love some advice on what I could be doing differently. I am using Tensorflow v.1.13.1 for both training and prediction. Thank you.

What is opname? Getting the error below

memo, I've been following your work for a long time and am really amazed by it.
Thank you so much for giving us a tool like this that can inspire people to create things.
You rock! :)

And yeah... I've created this issue because I've trained a custom model on some 3000 keyboard images and am now trying to plug the model in as you suggested, but I've been getting the error below since yesterday and can't figure out what's wrong.

Capturer.init with 0 (480, 640) 30 [286, 286, 3]
   Initialized at 640x480 at 30.0fps
Capturer.run
Traceback (most recent call last):
  File "/Users/kadia/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 942, in _run
    allow_operation=False)
  File "/Users/kadia/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2584, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/Users/kadia/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2626, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'model-36200:0' refers to a Tensor which does not exist. The operation, 'model-36200', does not exist in the graph."

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "webcam-pix2pix.py", line 112, in <module>
    img_predicted = predictor.predict(img_in)[0]
  File "/Users/kadia/Downloads/webcam-pix2pix-tensorflow-master/msa/predictor.py", line 76, in predict
    model_out = self.sess.run( [self.output_op], { self.input_op: img })[0]
  File "/Users/kadia/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/Users/kadia/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 945, in _run
    + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: The name 'model-36200:0' refers to a Tensor which does not exist. The operation, 'model-36200', does not exist in the graph.

Problem in results after training

Hello Memo!!!

I trained my model with 20,000 photos from space to simulate a video you published zooming into your eye, etc.
The result has nothing to do with your results. I would really appreciate your help.
The result from a video of a Scarlett Johansson speech is the following. What am I doing wrong?

image

I ran it with 500 epochs.
I suspect something went wrong when running preprocess.py - am I right? Can you please help?

Thank you in advance!!

Installation successful on Windows 10. Python 3.6.8.

After many conflicts installing opencv3, I wanted to share this configuration working on Windows 10:
CUDA 10.0.132
NVidia 417.35
cudnn64_7.dll
opencv 4.1.0
tensorflow-gpu 1.13.1
numpy 1.16.3
python 3.6.8

OpenCV 3 is no longer available from the menpo channel:
conda install -c menpo opencv3

For my configuration, the following OpenCV installation worked:
pip install opencv-python
pip install opencv-contrib-python

Custom model

Hello,

First, I'd like to thank you a lot for sharing your code as open source; it's an amazing project!

I have spent 3 days trying to create my own model with no success.

I am on OSX, using TensorFlow 1.2 (I also tried with TensorFlow 1.4). Your model works well, but when I switch to my JSON I get plenty of errors, most of which I have been able to solve. This one, though, leaves me clueless.
I trained my model with https://github.com/affinelayer/pix2pix-tensorflow as you describe in your readme.

Capturer.init with 0 (480, 640) 30 [256, 256, 3]
   Initialized at 640x480 at 30.00003fps
Capturer.run
Traceback (most recent call last):
  File "webcam-pix2pix.py", line 112, in <module>
    img_predicted = predictor.predict(img_in)[0]
  File "/Users/jocelynlecamus/tensorflow/webcam/msa/predictor.py", line 76, in predict
    model_out = self.sess.run( [self.output_op], { self.input_op: img })[0]
  File "/Users/jocelynlecamus/anaconda3/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/Users/jocelynlecamus/anaconda3/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 975, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 256, 256, 3) for Tensor u'load_images/input_producer/Const:0', which has shape '(40,)'

40 is the number of images I trained on (I know it's a small number, but I am trying to make things work before taking the time to generate a real, complete model).

Do you have any idea where the problem might come from?
Thank you so much for your help, and sorry for my awful English skills.

Best

AttributeError: 'module' object has no attribute 'CAP_PROP_FRAME_WIDTH'

Hello,

First of all thanks for this amazing piece of code.

I got this problem though:

Traceback (most recent call last):
File "webcam-pix2pix.py", line 74, in
capture = init_capture(capture, output_shape=predictor.input_shape)
File "webcam-pix2pix.py", line 63, in init_capture
output_shape = output_shape
File "/home/corepan/Desktop/webcam-pix2pix-tensorflow-master/msa/capturer.py", line 31, in init
self.cvcap.set(cv2.CAP_PROP_FRAME_WIDTH, capture_shape[1])
AttributeError: 'module' object has no attribute 'CAP_PROP_FRAME_WIDTH'

Would you have an idea of what this could be?

Thanks so much.
