shaoanlu / faceswap-gan Goto Github PK

View Code? Open in Web Editor NEW

3.4K 177.0 845.0 2.21 MB

A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

Jupyter Notebook 72.61% Python 27.39%

face-swap generative-adversarial-network gan gans image-manipulation

faceswap-gan's Introduction

faceswap-GAN

Adding Adversarial loss and perceptual loss (VGGface) to deepfakes'(reddit user) auto-encoder architecture.

Updates

Date	Update
2018-08-27	Colab support: A colab notebook for faceswap-GAN v2.2 is provided.
2018-07-25	Data preparation: Add a new notebook for video pre-processing in which MTCNN is used for face detection as well as face alignment.
2018-06-29	Model architecture: faceswap-GAN v2.2 now supports different output resolutions: 64x64, 128x128, and 256x256. Default `RESOLUTION = 64` can be changed in the config cell of v2.2 notebook.
2018-06-25	New version: faceswap-GAN v2.2 has been released. The main improvements of v2.2 model are its capability of generating realistic and consistent eye movements (results are shown below, or Ctrl+F for eyes), as well as higher video quality with face alignment.
2018-06-06	Model architecture: Add a self-attention mechanism proposed in SAGAN into V2 GAN model. (Note: There is still no official code release for SAGAN, the implementation in this repo. could be wrong. We'll keep an eye on it.)

Google Colab support

Here is a playground notebook for faceswap-GAN v2.2 on Google Colab. Users can train their own model in the browser.

[Update 2019/10/04] There seems to be import errors in the latest Colab environment due to inconsistent version of packages. Please make sure that the Keras and TensorFlow follow the version number shown in the requirement section below.

Descriptions

faceswap-GAN v2.2

FaceSwap_GAN_v2.2_train_test.ipynb
- Notebook for model training of faceswap-GAN model version 2.2.
- This notebook also provides code for still image transformation at the bottom.
- Require additional training images generated through prep_binary_masks.ipynb.
FaceSwap_GAN_v2.2_video_conversion.ipynb
- Notebook for video conversion of faceswap-GAN model version 2.2.
- Face alignment using 5-points landmarks is introduced to video conversion.
prep_binary_masks.ipynb
- Notebook for training data preprocessing. Output binary masks are save in ./binary_masks/faceA_eyes and ./binary_masks/faceB_eyes folders.
- Require face_alignment package. (An alternative method for generating binary masks (not requiring face_alignment and dlib packages) can be found in MTCNN_video_face_detection_alignment.ipynb.)
MTCNN_video_face_detection_alignment.ipynb
- This notebook performs face detection/alignment on the input video.
- Detected faces are saved in ./faces/raw_faces and ./faces/aligned_faces for non-aligned/aligned results respectively.
- Crude eyes binary masks are also generated and saved in ./faces/binary_masks_eyes. These binary masks can serve as a suboptimal alternative to masks generated through prep_binary_masks.ipynb.

Usage

Run MTCNN_video_face_detection_alignment.ipynb to extract faces from videos. Manually move/rename the aligned face images into ./faceA/ or ./faceB/ folders.
Run prep_binary_masks.ipynb to generate binary masks of training images.
- You can skip this pre-processing step by (1) setting use_bm_eyes=False in the config cell of the train_test notebook, or (2) use low-quality binary masks generated in step 1.
Run FaceSwap_GAN_v2.2_train_test.ipynb to train models.
Run FaceSwap_GAN_v2.2_video_conversion.ipynb to create videos using the trained models in step 3.

Miscellaneous

faceswap-GAN_colab_demo.ipynb
- An all-in-one notebook for demostration purpose that can be run on Google colab.

Training data format

Face images are supposed to be in ./faceA/ or ./faceB/ folder for each taeget respectively.
Images will be resized to 256x256 during training.

Generative adversarial networks for face swapping

1. Architecture

2. Results

Improved output quality: Adversarial loss improves reconstruction quality of generated images.
Additional results: This image shows 160 random results generated by v2 GAN with self-attention mechanism (image format: source -> mask -> transformed).
Evaluations: Evaluations of the output quality on Trump/Cage dataset can be found here.

The Trump/Cage images are obtained from the reddit user deepfakes' project on pastebin.com.

3. Features

VGGFace perceptual loss: Perceptual loss improves direction of eyeballs to be more realistic and consistent with input face. It also smoothes out artifacts in the segmentation mask, resulting higher output quality.
Attention mask: Model predicts an attention mask that helps on handling occlusion, eliminating artifacts, and producing natrual skin tone.
Configurable input/output resolution (v2.2): The model supports 64x64, 128x128, and 256x256 outupt resolutions.
Face tracking/alignment using MTCNN and Kalman filter in video conversion:
- MTCNN is introduced for more stable detections and reliable face alignment (FA).
- Kalman filter smoothen the bounding box positions over frames and eliminate jitter on the swapped face.
Eyes-aware training: Introduce high reconstruction loss and edge loss in eyes area, which guides the model to generate realistic eyes.

Frequently asked questions and troubleshooting

1. How does it work?

The following illustration shows a very high-level and abstract (but not exactly the same) flowchart of the denoising autoencoder algorithm. The objective functions look like this.

2. Previews look good, but it does not transform to the output videos?

Model performs its full potential when the input images are preprocessed with face alignment methods.

Requirements

keras 2.1.5
Tensorflow 1.6.0
Python 3.6.4
OpenCV
keras-vggface
moviepy
prefetch_generator (required for v2.2 model)
face-alignment (required as preprocessing for v2.2 model)

Acknowledgments

Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and reddit user deepfakes' project. The generative network is adopted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.

faceswap-gan's People

Contributors

Stargazers

Watchers

Forkers

5kejun jjdblast giveup shafiahmed y495965825 canphp finderl chaojie lloves angilis wa-sans-tv magnetowang longjohncoder anazou clorr jchacon4 khamv hhgzljp ganzib4fun awesome-archive khchanaq minstrelsy exgile shubhampachori12110095 xiusdk zhubofei shaoweipng jordipala bigmac447 brandhill hunslater-deeplearning dimdrop prabhjotsl tyolab david30907d datar-ai th4tirishguy leezqcst sybersu huleg forsskieken niotw junhuicao sszllx adairzhao aaronedmistone unaventures hangwudy lavrovd primemover2011 fastrocket rodrmp bigrlab cleoag shoff leerock laal65 johnwall2016 keyky phantoan xxradon jonkoi leerisk tcast necrno scholltan bradfox2 wulingtian clover2008 lebronyxm johndpope sophrinix ruah1984 mrthiago kalvar vish25v qsdj huguanglong wilsonleeee loinapex dineshkumares wq49 jffifa flashus bssrdf bradypuz chickenlove andrewchan2022 gurpreetshanky mousechen mulx10 seifer08ms cmschmtt wangmn93 keyboyi cybort cosecant-csc interesting-opensource-projects duyamin lexfreeman

faceswap-gan's Issues

No preview screen showing while training

Running FaceSwap_GAN_v2_train, but no preview screen is showing.

ImportError: No module named 'keras_vggface'

Thank you for developing a great program. I installed keras_vggface as "!Pip install keras_vggface" normally. But, the following error occurred. I searched the problem on Google but could not resolve it. Could you tell me how to fix it? Thank you in advance.

IN [1]: from keras_vggface.vggface import VGGFace

ImportError Traceback (most recent call last)
in ()
----> 1 from keras_vggface.vggface import VGGFace

ImportError: No module named 'keras_vggface'

XGAN and Cycle-GAN with improved w-gan

Have you tried the two methods?
XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Improved Training of Wasserstein GANs
I tried to use your code, but it's hard to achieve the effect you showed in the "readme".
I used a video of myself and a video of Daniel Wu, maybe I am too ugly that the network can not transform??? It really hurts me.

The new GAN version doesn't seem to work

I've been training the new version for 10000 + 6000 iterations and the output (from the show_g function) doesn't even start to change... I saw some outputs where the network tried to mimic PersonB but at the end of the day the three columns [test_A, path_A(test_A), path_B(test_A)] all look the same as test_A and vice versa...

Not only that, when I tried turning use_mixup to False I had this error about different number of channels here:

number of input channels does not match corresponding dimension of filter, 3 != 6
`output_real = netD(real) # positive

It seems we have to manually change nc_D_inp to 3 instead of 6.

Keep up the good work...

dlib_video_face_detection extract 2 different sizes?

186x186 & 223x222

It's ok or it's a bug?

What is your OS？

Training stops at a certain iteration

I have a GTX 1080 and have trained successfully the original NN, but when I try to train these, they run for like an hour and then stop training at a given iteration. If I rerun the function, it stops at the same one.

NameError: name 'bbox_moving_avg_coef' is not defined & Typo errors.

On FaceSwap_GAN_v2_test_video

Also there are Typo Errors, instead VIDEO say VODEO.

But even after correct that, gives the title error.

MoviePy] >>>> Building video OUTPUT_VIDEO.mp4
[MoviePy] Writing video OUTPUT_VIDEO.mp4

  0%|                                                                                                    | 0/341 [00:00<?, ?it/s]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<timed eval> in <module>()

<decorator-gen-176> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in requires_duration(f, clip, *a, **k)
     52         raise ValueError("Attribute 'duration' not set")
     53     else:
---> 54         return f(clip, *a, **k)
     55 
     56 

<decorator-gen-175> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in use_clip_fps_by_default(f, clip, *a, **k)
    135              for (k,v) in k.items()}
    136 
--> 137     return f(clip, *new_a, **new_kw)

<decorator-gen-174> in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in convert_masks_to_RGB(f, clip, *a, **k)
     20     if clip.ismask:
     21         clip = clip.to_RGB()
---> 22     return f(clip, *a, **k)
     23 
     24 @decorator.decorator

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\VideoClip.py in write_videofile(self, filename, fps, codec, bitrate, audio, audio_fps, preset, audio_nbytes, audio_codec, audio_bitrate, audio_bufsize, temp_audiofile, rewrite_audio, remove_temp, write_logfile, verbose, threads, ffmpeg_params, progress_bar)
    347                            verbose=verbose, threads=threads,
    348                            ffmpeg_params=ffmpeg_params,
--> 349                            progress_bar=progress_bar)
    350 
    351         if remove_temp and make_audio:

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\io\ffmpeg_writer.py in ffmpeg_write_video(clip, filename, fps, codec, bitrate, preset, withmask, write_logfile, audiofile, verbose, threads, ffmpeg_params, progress_bar)
    207 
    208     for t,frame in clip.iter_frames(progress_bar=progress_bar, with_times=True,
--> 209                                     fps=fps, dtype="uint8"):
    210         if withmask:
    211             mask = (255*clip.mask.get_frame(t))

C:\ProgramData\Anaconda3\lib\site-packages\tqdm\_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in generator()
    473         def generator():
    474             for t in np.arange(0, self.duration, 1.0/fps):
--> 475                 frame = self.get_frame(t)
    476                 if (dtype is not None) and (frame.dtype != dtype):
    477                     frame = frame.astype(dtype)

<decorator-gen-139> in get_frame(self, t)

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\decorators.py in wrapper(f, *a, **kw)
     87         new_kw = {k: fun(v) if k in varnames else v
     88                  for (k,v) in kw.items()}
---> 89         return f(*new_a, **new_kw)
     90     return decorator.decorator(wrapper)
     91 

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in get_frame(self, t)
     93                 return frame
     94         else:
---> 95             return self.make_frame(t)
     96 
     97     def fl(self, fun, apply_to=[], keep_duration=True):

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\Clip.py in <lambda>(t)
    134 
    135         #mf = copy(self.make_frame)
--> 136         newclip = self.set_make_frame(lambda t: fun(self.get_frame, t))
    137 
    138         if not keep_duration:

C:\ProgramData\Anaconda3\lib\site-packages\moviepy\video\VideoClip.py in <lambda>(gf, t)
    531         `get_frame(t)` by another frame,  `image_func(get_frame(t))`
    532         """
--> 533         return self.fl(lambda gf, t: image_func(gf(t)), apply_to)
    534 
    535     # --------------------------------------------------------------

<ipython-input-18-3ca7ce65bf3e> in process_video(input_img)
    170         if use_smoothed_bbox:
    171             if frames != 0:
--> 172                 x0, x1, y0, y1 = get_smoothed_coord(x0, x1, y0, y1, image.shape, bbox_moving_avg_coef)
    173                 set_global_coord(x0, x1, y0, y1)
    174                 frames += 1

NameError: name 'bbox_moving_avg_coef' is not defined

gtx 1060 with 6G ram, out of memory

I am running the script in gtx 1060 with 6G ram laptop, it halts on 56 niters and said out of memory, is there any way I could lower the memory requirement? Thanks.

ERROR: FaceSwap_GAN_v2_train.ipynb (TENSORFLOW IS UPDATED)

ERROR: https://pastebin.com/mJesKjZh

(gan) C:\Users\ZeroCool22\faceswap-GAN>conda list
packages in environment at C:\ProgramData\Anaconda3\envs\gan:

Name Version Build Channel
absl-py 0.1.10
backports 1.0 py36h81696a8_1
backports.weakref 1.0rc1 py36_0
bleach 1.5.0 py36_0 conda-forge
boost 1.64.0 py36_vc14_4 [vc14] conda-forge
boost-cpp 1.64.0 vc14_1 [vc14] conda-forge
bzip2 1.0.6 vc14_1 [vc14] conda-forge
ca-certificates 2017.08.26 h94faf87_0
certifi 2018.1.18 py36_0
click 6.7
cudatoolkit 8.0 3 anaconda
cudnn 6.0 0 anaconda
decorator 4.0.11 py36_0 conda-forge
dlib 19.4 np112py36_201 conda-forge
dlib 19.9.0
face-recognition 1.2.1
face-recognition-models 0.3.0
ffmpeg 3.4.1 1 conda-forge
freetype 2.8.1 vc14_0 [vc14] conda-forge
h5py 2.7.1 py36_2 conda-forge
hdf5 1.10.1 vc14_1 [vc14] conda-forge
html5lib 0.9999999 py36_0 conda-forge
icc_rt 2017.0.4 h97af966_0
icu 58.2 vc14_0 [vc14] conda-forge
imageio 2.1.2 py36_0 conda-forge
intel-openmp 2018.0.0 hd92c6cd_8
jpeg 9b vc14_2 [vc14] conda-forge
keras 2.0.9 py36_0 conda-forge
libgpuarray 0.7.5 vc14_0 [vc14] conda-forge
libiconv 1.14 vc14_4 [vc14] conda-forge
libpng 1.6.34 vc14_0 [vc14] conda-forge
libtiff 4.0.9 vc14_0 [vc14] conda-forge
libwebp 0.5.2 vc14_7 [vc14] conda-forge
libxml2 2.9.3 vc14_9 [vc14] conda-forge
mako 1.0.7 py36_0 conda-forge
markdown 2.6.9 py36_0 conda-forge
Markdown 2.6.11
markupsafe 1.0 py36_0 conda-forge
mkl 2018.0.1 h2108138_4
moviepy 0.2.3.2 py36_0 conda-forge
numpy 1.12.1 py36hf30b8aa_1 anaconda
numpy 1.14.0
olefile 0.44 py36_0 conda-forge
opencv 3.3.0 py36_200 conda-forge
openssl 1.0.2n h74b6da3_0
pillow 5.0.0 py36_0 conda-forge
pip 9.0.1 py36_1 conda-forge
protobuf 3.5.1 py36_vc14_3 [vc14] conda-forge
protobuf 3.5.1
pygpu 0.7.5 py36_0 conda-forge
python 3.6.4 0 conda-forge
pyyaml 3.12 py36_1 conda-forge
qt 5.6.2 vc14_1 [vc14] conda-forge
scipy 1.0.0 py36h1260518_0
setuptools 38.4.0 py36_0 conda-forge
setuptools 38.5.1
six 1.11.0 py36_1 conda-forge
six 1.11.0
sqlite 3.20.1 vc14_2 [vc14] conda-forge
tensorboard 0.4.0rc3 py36_2 conda-forge
tensorflow 1.4.0 py36_0 conda-forge
tensorflow-gpu 1.5.0
tensorflow-tensorboard 1.5.1
theano 1.0.1 py36_1 conda-forge
tk 8.6.7 vc14_0 [vc14] conda-forge
tqdm 4.11.2 py36_0 conda-forge
vc 14 0 conda-forge
vs2015_runtime 14.0.25420 0 conda-forge
webencodings 0.5 py36_0 conda-forge
Werkzeug 0.14.1
werkzeug 0.14.1 py_0 conda-forge
wheel 0.30.0
wheel 0.30.0 py36_2 conda-forge
wincertstore 0.2 py36_0 conda-forge
yaml 0.1.7 vc14_0 [vc14] conda-forge
zlib 1.2.11 vc14_0 [vc14] conda-forge

Q: What's the reasoning behind using PixelShuffler over Conv2DTranspose?

The upscale block uses pixel shuffler after 4x convolution. I understand that this is a neat way of increasing the number of coefficients and then nicely reshaping everything to bring it one resolution step up, but why this and not Conv2DTranspose?

def upscale_ps(filters, use_norm=True): def block(x): x = Conv2D(filters*4, kernel_size=3, use_bias=False, kernel_initializer=RandomNormal(0, 0.02), padding='same' )(x) x = LeakyReLU(0.1)(x) x = PixelShuffler()(x) return x return block

The Mask geneartion and learn process

I just saw the new update on mask generation and wanted to ask if you could elaborate on how it works and if the code is available to inspect...

By the way, nice idea (and implementation) !

time is 1875.640280

I am wondering if the train screen changes every 1875 seconds and this speed is correct.
Can not use this program with CUDA?

[2/150][50] Loss_DA: 0.205057 Loss_DB: 0.193327 Loss_GA: 0.415360 Loss_GB: 0.413623 time: 1875.640280
[4/150][100] Loss_DA: 0.200707 Loss_DB: 0.211183 Loss_GA: 0.295002 Loss_GB: 0.341839 time: 3606.393274

Video generation runs out of GPU

Using 2x K80s. Training works fine, but video generation always runs out of GPU memory, even with batch size 1.

Any way to fix this?

Error when run FaceSwap_GAN_v2_train.ipynb

Runing on jupyter notebook:

ERROR: https://pastebin.com/egjXVyG3

CONDA LIST:

(gan) C:\Users\ZeroCool22\faceswap-GAN>conda list

packages in environment at C:\ProgramData\Anaconda3\envs\gan:

(gan) C:\Users\ZeroCool22\faceswap-GAN>

"Weights file not found." despite them being present

try:
	encoder.load_weights("models/encoder.h5")
	decoder_A.load_weights("models/decoder_A.h5")
	decoder_B.load_weights("models/decoder_B.h5")
	# netDA.load_weights("models/netDA.h5") 
	# netDB.load_weights("models/netDB.h5") 
	print("model loaded.")
except:
	print("weights file not found.")
	pass

At this point in the code, it always fails, saying it couldn't find the weight files. They are located at ./faceswap-GAN-master/models/. Is this incorrect? I should note that the model is the Trump to Cage model from the deepfakes/faceswap project, and that I commented out netDA and netDB because they do not exist.

Any help? Thank you.

Could you change the input and output to 128, 128 for WGAN model?

Hi, I see a WGAN in temp folder. Could you help change the input and output to 128, 128 for WGAN model? I tried to change it but not succeed, Thanks!!

Make bbox tracking configurable for video processing

Some videos tend to have faster head movements than others, so the default, hardcoded value of 0.65 weight for keeping the previous coords may not suit some of the needs. Consider changing the binary parameter of use_smoothed_bbox = True, to something more like bbox_smoothing_wkeep=0.65 and change the calls to:

def get_smoothed_coord(x0, x1, y0, y1, wkeep=0.65):
    global prev_x0, prev_x1, prev_y0, prev_y1
    x0 = int(wkeep*prev_x0 + (1-wkeep)*x0)
    x1 = int(wkeep*prev_x1 + (1-wkeep)*x1)
    y1 = int(wkeep*prev_y1 + (1-wkeep)*y1)
    y0 = int(wkeep*prev_y0 + (1-wkeep)*y0)
    return x0, x1, y0, y1

if bbox_smoothing_wkeep > 0:
            if frames != 0:
                x0, x1, y0, y1 = get_smoothed_coord(x0, x1, y0, y1, wkeep=bbox_smoothing_wkeep)
                set_global_coord(x0, x1, y0, y1)
            else:
                set_global_coord(x0, x1, y0, y1)
                frames += 1

Eventually it would be great to implement some sort of tracking mechanism with a PID-like behavior.

Swapping mouth movement

Does anyone have any idea how one would approach swapping mouth movement? I.e. transforming the target's mouth shape to match the source, rather than swapping the look. This may be a different project entirely...

Loss of generator cannot be lowered down below 0.26

I used my GTX970 to train for about 45000 iters, with batchsize=16. However I found that both loss_GA and loss_GB cannot be lowered down below 0.26.
I'm not familiar with GAN, is it because my datasets' lack of diversity? Or dicriminators is over trained?
My both sets contain about 3000 images extracted from video.

Automated switching to refined mask generation

I know (and appreciate) that you added snippet to switch loss function for L2 mask refinement, but could you maybe come up with some metric (and not necessarily the number of iterations) that would express the "right moment" in training to switch to smoother loss function? Or maybe even keep the dependency on the iteration number but do it somewhat automatically so that the training can be just left alone overnight and "figure it out" by itself?

mp4 making speed

The mp4 making speed seems to be too slow.

Below (1) is my device type and (2) is the speed image that made the video. I would appreciate it if you could tell me what I need to do.

(1)https://imgur.com/P0mMDlC
(2)https://imgur.com/tqd4lyP

will you provide pretrained model? thanks~

我没有mac 没有 GPU 训练很慢！资源文件，能一个吗？

资源文件，和训练好的模型能给我一个吗？感谢 [email protected]

Have you implemented something like this for the alignment and smoothing?

Here's the Link... Do you think this approach does better with the alignments or the current code is better ?

GAN training

Hi @shaoanlu,

I am a little confused about the GAN training code.
errDA = netDA_train([warped_A,target_A])
errGA = netGA_train([warped_A,target_A])
here warped_A and target_A seems to be of same person "A". These two are referred to distorted_A and real_A in the code. If so, the learned generated fake_A should also be person "A" since your loss_G defined the L1 loss between fake_A and real_A. Where is the relationship to another person "B" in your DA and GA? So as DB and GB.

I have run the code, and indeed learned the swapping between two persons, but I don't quite understand it. Could please give some advice?

May I ask about the walkthrough?

Hi, it is an amazing project and really better result than deepfakes one. About the walkthrough, do you mean :

I run the dlib_video_face_detection.ipynb for person A (with input video of person A), then unzip the zip file and put it into ./TE/ folder
I run the dlib_video_face_detection.ipynb for person B (with input video of person B), then unzip the zip file and put it into ./SH/ folder
I run FaceSwap_GAN_github.ipynb to get the output_movie.mp4

But in the step 3, where should I set the input video in the file? Thanks.

Best Wishes,
Chi Kiu SO

is this normal GAN masks in FaceSwap ?

dots with oversharped image
is this normal , or faceswap developers adapted your plugin with errors?

Idea about matching color before/during training.

I found that one factor for archive good result depends on the variety of lighting/color the data set but sometimes we have a very limit dataset.
Is it possible to match color between two pictures before/during to improve the result of training?
http://answers.opencv.org/question/178127/matching-colors-between-two-pictures-in-opencv/

"14. Make video clips w/ face alignment" is missing?

I about to make mp4 output but I can't find "14. Make video clips w/ face alignment" in FaceSwap_GAN_v2_sz128_train.ipynb where I can find it?

IOPub data rate exceeded. (FaceSwap_GAN_v2_sz128_train)

When try to use FaceSwap_GAN_v2_sz128_train.ipynb i get this error:

PD: This not happen when i use FaceSwap_GAN_v2_train.ipynb, but i would like to try the 128 too.

read me to start for beginner

Can you help to provide the idiot instruction to those new beginner ??

Pre-trained models

First of all, very impressive results. Could you provide some pre-trained models for us to test out your implementation. Google Drive could be a good place to host the models.

Is it supposed to look like this after a night of training?

OUTPUT_VIDEO:

https://www.youtube.com/watch?v=wew_q9yJMug

Train preview images:

Also there doesn't seems to be any attempt for try to match colors?

Universal face encoder, more resilient decoders?

First: thanks for this useful tool! I'm in the process of learning ML, but reading through this project has helped greatly.

Please correct me if I'm wrong, but I see that each encoder model is intended to be specific to a pair of faces. In my experience a given encoder adapts very rapidly to the addition of a new face set. And you may reuse decoders for a given face as these are only trained with already known data.

Is there any effort to make a universal face encoder, a very large, well-trained model that can be shared publicly? This could cut down on training time and produce arbitrary A->B swaps, where the generator for B is already trained. Perhaps this would require more layers or training time, but would it be possible to leverage the features extracted from layers of frozen existing image classification networks, or networks specifically trained on detecting facial orientation or expressions.

One example I'm thinking of is MSG-Net (https://github.com/zhanghang1989/MSG-Net), which extracts VGG-16 based features to train a model for a large set of artistic styles, but also includes a separate 'inspiration' layer for a given style.

As for the decoders: with current techniques I understand that the interpretation of the abstract face vector and the resulting transformations for a particular face set must be baked into the weights of a decoder model. I've seen some quirky decoder behavior when faces A and B are very different, though. In this case, is it possible to tweak the parameters related to the distortion / warping of the training images? While generating the training data for face A, It might also be interesting if there were some way to automatically swap eyes, mouths, etc. (using opencv, face_recognition), drawing from a large training set of these features from other faces, so the decoder for face A isn't just practicing with variations of A's eyes, for instance.

Maybe it's not helpful to throw ideas out without concrete action toward implementation, but please let us know if there are any helpful experiments we can run.

Training iteration duration

How long should a single iteration take to train? For reference, I'm using a Tesla P100 and it's taking about 50 seconds.

1 & a half day of training...

Why the new face looks so out of place? (i'm very glad with the color correction).

https://www.youtube.com/watch?v=sKhBkUVeFyQ

dlib video face detection takes massive amount of time

I am not a jupyter user although I assume I have the repository setup correctly as it seems to be doing work. I have been working with other code bases for a while using the same modules so a dependency issue is likely not an issue.

When I step through the code for dlib_video_face_detection.ipynb I get to the code block where moviepy does some manipulation of an input video. According to the timestamp on the output I am looking at a very long time before it is complete.

0%| | 9/15887 [11:09<329:34:28, 74.72s/it]

The target video is 1280x720, 00:08:50 long with a data rate of 1413kbps. My hardware consists of a 3.5ghz i5, gtx1080, and 16gb of ram. Training large data sets have not been a problem for me so I am unsure of why processing a video frame by frame would take this long.

What is the purpose of the

output = '_.mp4'
clip1 = VideoFileClip("x-cropped.mp4")
clip = clip1.fl_image(process_video)#.subclip(0,10) #NOTE: this function expects color images!!
%time clip.write_videofile(output, audio=False)

block besides to run the process_video method on each frame? I have used the dlib module as a stand alone script and it processes the video in a handful of minutes pulling many faces as a result.

Would it be beneficial to pre-process the video file in ffmpeg before handing the work to the notebook in any way? Perhaps rip the frames beforehand so that moviepy would not need to step frame by frame?

"Weight files not found" + Type error during "define_loss"

I followed the FaceSwap_GAN_github notebook pretty closely. I was able to install all the packages expect dlib, which I believe we don't need for training.

I have two issues:

During "Load Models" it is throwing an error, saying that the model input has 7 layers but expects 8. I just copied over my fakeapp model. I moved on because I assumed that the program would create a new model later on.
During the loss_DA, loss_GA = define_loss(netDA, real_A, fake_A, vggface_feat) function, I am getting the following error stack. I would love some feedback, I am fairly new to all of this.

`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 loss_DA, loss_GA = define_loss(netDA, real_A, fake_A, vggface_feat)
2 loss_DB, loss_GB = define_loss(netDB, real_B, fake_B, vggface_feat)

in define_loss(netD, real, fake, vggface_feat)
3 dist = Beta(mixup_alpha, mixup_alpha)
4 lam = dist.sample()
----> 5 mixup = lam * real + (1 - lam) * fake
6 output_mixup = netD(mixup)
7 loss_D = loss_fn(output_mixup, lam * K.ones_like(output_mixup))

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\ops\math_ops.py in binary_op_wrapper(x, y)
818 with ops.name_scope(None, op_name, [x, y]) as name:
819 if not isinstance(y, sparse_tensor.SparseTensor):
--> 820 y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
821 return func(x, y, name=name)
822

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
637 name=name,
638 preferred_dtype=preferred_dtype,
--> 639 as_ref=False)
640
641

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
702
703 if ret is None:
--> 704 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
705
706 if ret is NotImplemented:

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
111 as_ref=False):
112 _ = as_ref
--> 113 return constant(v, dtype=dtype, name=name)
114
115

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\constant_op.py in constant(value, dtype, shape, name, verify_shape)
100 tensor_value = attr_value_pb2.AttrValue()
101 tensor_value.tensor.CopyFrom(
--> 102 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
103 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
104 const_tensor = g.create_op(

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
368 nparray = np.empty(shape, dtype=np_dt)
369 else:
--> 370 _AssertCompatible(values, dtype)
371 nparray = np.array(values, dtype=np_dt)
372 # check to them.

~\Anaconda3\envs\fakes\lib\site-packages\tensorflow\python\framework\tensor_util.py in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302 (dtype.name, repr(mismatch), type(mismatch).name))
303
304

TypeError: Expected float32, got /input_8 of type 'TensorVariable' instead.
`

Tensorboard support?

Feather edge of Smoothed bbox

I want to suggest adding an algorithm to feather the edges of the bounding box such as the opencv feather blender or use a opacity gradient to blend in the bounding box edges where an radius is defined. This would hide the jitter a bit better.

Question about preview images

First of all, thank you so much for such a detailed notebook!

Could you explain a little bit more about the training previews?

I am a long time developer, but my specialty is at node.js.
With this new face-swapping hype, I decided to jump on the train for fun, but some stuff is still a bit confusing for me.
I've read some GAN papers and I found this to be the most effective (and most fun!) path to do face swapping.

First of all: Why are the faces on the output sample blue-ish? Is this the correct behavior?
Second: I'm noticing some weird hard pixels around the actors' noses, specifically around Tom Hiddleston's nose. Is this also supposed to happen?
Third: The third column at the sample masks is empty. Is this correct?

Hardware
I am currently running the training script on this setup:

NVIDIA GK210 GPU w/ 12GiB DDR5 VRAM
tensorflow 1.5.0
CUDA 9.0

This is running ~1 iteration/sec.
Current loss information:

Loss_DA: 0.001124 Loss_DB: 0.000390 Loss_GA: 0.008428 Loss_GB: 0.010443

Sample after ~1k iterations

Sample after ~2.5k iterations

Most recent sample after ~3.2k iterations with mask preview

Which paper did the code based on?

Please, Which paper did the code based on? I want to understand the pipeline according to the paper.

How can I find definition for nc_D_inp?

https://github.com/shaoanlu/faceswap-GAN/blob/master/FaceSwap_GAN_github.ipynb
netDA = Discriminator(nc_D_inp)
netDB = Discriminator(nc_D_inp)

What is the Repeat Point?

Could you please elaborate a bit about the mixup in the Repeat Point?
Do we really have to manually change and run it or it's optional? and what difference does this snippet make?

For 1 ~ 10000 iteratioons, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake, distorted])
loss_G += K.mean(K.abs(fake_rgb - real))
fake_sz224 = tf.image.resize_images(fake, [224, 224]) # or set use_perceptual_loss = False

For 10000 ~ 13000 iterations, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake_rgb, distorted])
loss_G += K.mean(K.abs(fake - real))
fake_sz224 = tf.image.resize_images(fake, [224, 224]) # Ignore this line if you dont wan to use perceptual loss

For 13000 ~ 16000 or longer iterations, set:
mixup = lam * concatenate([real, distorted]) + (1 - lam) * concatenate([fake_rgb, distorted])
loss_G += K.mean(K.abs(fake - real))
fake_sz224 = tf.image.resize_images(fake_rgb, [224, 224]) # Ignore this line if you dont wan to use perceptual loss

And thank you for your work... I've been learning lots of new things from your take on Keras..

dlib face detection in video

i test the dlib face detection in my own video(about 10s), but it output only one image ,not output the each frame's face image.

Preview during video generation

As the result of the output generation for the video is somewhat unknown until the full video is processed (the pyvideo does not let you view the file during creation due to lock or muxing) it would be nice to consider a method using opencv (I haven't time test it so maybe it's totally inferior). For my own purpose I made something like this:

import cv2

cap = cv2.VideoCapture("./INPUT.mp4")
width, height, fps, fcount = [int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), cap.get(cv2.CAP_PROP_FPS), int(cap.get(cv2.CAP_PROP_FRAME_COUNT))]

fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('./OUTPUT.mp4',fourcc, fps, (1920, 360), isColor=True)

for fnum in range(0, fcount):
    ret, frame = cap.read()
    frame = np.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frame_out = np.array(process_video(frame, wkeep=0.01)) # this method actually lets you pass function parameters nicely
    frame_to_write = cv2.cvtColor(np.clip(frame_out, 0, 255).astype(np.uint8), cv2.COLOR_RGB2BGR)
    
    write_ret = out.write(frame_to_write)
    
    if fnum % 10 == 0:   # should be variable
        clear_output()
        print(frame_to_write.shape)
        print(write_ret)
        print('Frame {} / {}'.format(fnum, fcount))
        plt.figure(figsize=(12,12))
        plt.imshow(frame_out.astype(np.uint8))
        plt.show()
cap.release()
out.release()

How do I obtain decoded face result without masking.

Sorry, I'm a python newbie.
I tried with below code but result image looks darker than the preview.
Could you tell me the correct way to get decoding result without masking?

ae_input = cv2.resize(input_face, (128,128))/255. * 2 - 1        
result = np.squeeze(np.array([path_abgr_B([[ae_input]])]))     

# Trying to get ouput witout mask
raw_face = cv2.cvtColor(result[:,:,1:], cv2.COLOR_BGR2RGB)

Running "10. Start Training" itself results in out of memory error

Executing the third cell of "10. Start Training" results in following error

2018-01-27 11:21:44.433195: I tensorflow/core/common_runtime/bfc_allocator.cc:683] Sum Total of in-use chunks: 1.41GiB
2018-01-27 11:21:44.433220: I tensorflow/core/common_runtime/bfc_allocator.cc:685] Stats: 
Limit:                  1516306432
InUse:                  1516306432
MaxInUse:               1516306432
NumAllocs:                    1278
MaxAllocSize:            143701248

2018-01-27 11:21:44.433336: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***************x*******xxx************************************************************************xx
2018-01-27 11:21:44.433369: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[3,3,512,1024]

OS: ubuntu 17.10
CUDA: 8.0
Tensorflow: 1.4

Nvidia GPU model: GM108M [GeForce 920MX]

dataset for training

Hi, @shaoanlu

Thanks for your nice work!

you have shown a result of trained models transforming Hinako Sano (佐野ひなこ, left) to Emi Takei (武井咲, right). Since you have provided the source video of Hinako Sano (佐野ひなこ, left), the training data of Emi Takei (武井咲, right) is not provided. Could please kindly give a link to this data.

Thanks.