
megvii-research / eccv2022-rife


ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

License: MIT License

Python 96.64% Jupyter Notebook 2.85% Dockerfile 0.44% Shell 0.08%
video-interpolation computer-vision slomo-filter deep-learning aigc

eccv2022-rife's Introduction

Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Introduction

This project is the implementation of Real-Time Intermediate Flow Estimation for Video Frame Interpolation. Currently, our model can run at 30+ FPS for 2X 720p interpolation on a 2080Ti GPU. It supports arbitrary-timestep interpolation between a pair of images.

2024.08 - We find that 4.22.lite is well suited for post-processing some diffusion-model-generated videos.

2023.11 - We recently released v4.7-4.10, optimized for anime scenes, drawing on SAFA's research.

2022.7.4 - Our paper is accepted by ECCV2022. Thanks to all relevant authors, contributors and users!

From 2020 to 2022, we submitted RIFE five times (rejected by CVPR21, ICCV21, AAAI22, and CVPR22). Thanks to all anonymous reviewers; your suggestions have helped significantly improve the paper!

ECCV Poster | ECCV 5-min presentation | Chinese introduction to the paper | rebuttal (2WA1WR->3WA)

Pinned Software: RIFE-App | FlowFrames | SVFI (Chinese)

16X interpolation results from two input images:

Demo Demo

Software

Flowframes | SVFI (Chinese) | Waifu2x-Extension-GUI | Autodesk Flame | SVP | MPV_lazy | enhancr

RIFE-App(Paid) | Steam-VFI(Paid)

We are neither responsible for nor involved in the development of the above software. In accordance with the open-source license, we respect the commercial use of our work by other developers.

VapourSynth-RIFE | RIFE-ncnn-vulkan | VapourSynth-RIFE-ncnn-Vulkan | vs-mlrt

If you are a developer, you are welcome to follow Practical-RIFE, which aims to make RIFE more practical for users by adding various features and designing new, faster models.

You may check this pull request for supporting macOS.

CLI Usage

Installation

git clone git@github.com:megvii-research/ECCV2022-RIFE.git
cd ECCV2022-RIFE
pip3 install -r requirements.txt

Run

Video Frame Interpolation

You can use our demo video or your own video.

python3 inference_video.py --exp=1 --video=video.mp4 

(generate video_2X_xxfps.mp4)

python3 inference_video.py --exp=2 --video=video.mp4

(for 4X interpolation)

python3 inference_video.py --exp=1 --video=video.mp4 --scale=0.5

(If your video has a very high resolution, such as 4K, we recommend setting --scale=0.5 (default 1.0). If you see disordered patterns in your output, try --scale=2.0. This parameter controls the processing resolution of the optical flow model.)
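Roughly speaking, --scale resizes the frames that are fed to the flow estimator and then rescales the resulting flow field back to full resolution. A minimal conceptual sketch of that idea (not the repo's IFNet code; flow_net is a hypothetical flow estimator):

import torch.nn.functional as F

def flow_at_scale(flow_net, img0, img1, scale=0.5):
    # Estimate optical flow at a reduced resolution, then resize the flow back up.
    small0 = F.interpolate(img0, scale_factor=scale, mode="bilinear", align_corners=False)
    small1 = F.interpolate(img1, scale_factor=scale, mode="bilinear", align_corners=False)
    flow = flow_net(small0, small1)
    # Resize the flow field to full resolution and rescale its displacement values.
    flow = F.interpolate(flow, scale_factor=1.0 / scale, mode="bilinear", align_corners=False) / scale
    return flow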

python3 inference_video.py --exp=2 --img=input/

(to read frames from PNGs, e.g. input/0.png ... input/612.png; ensure that the PNG names are numbers)

python3 inference_video.py --exp=2 --video=video.mp4 --fps=60

(adds a slow-motion effect; the audio will be removed)

python3 inference_video.py --video=video.mp4 --montage --png

(if you want to montage the original video and save the output in PNG format)

Extended Application

You may refer to #278 for Optical Flow Estimation and refer to #291 for Video Stitching.

Image Interpolation

python3 inference_img.py --img img0.png img1.png --exp=4

(2^4=16X interpolation results) After that, you can use the PNGs to generate an MP4:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -c:v libx264 -pix_fmt yuv420p output/slomo.mp4 -q:v 0 -q:a 0

You can also use the PNGs to generate a GIF:

ffmpeg -r 10 -f image2 -i output/img%d.png -s 448x256 -vf "split[s0][s1];[s0]palettegen=stats_mode=single[p];[s1][p]paletteuse=new=1" output/slomo.gif
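For reference, a minimal sketch of what --exp does: each pass inserts a midpoint frame between every neighbouring pair, so exp passes yield 2^exp - 1 intermediate frames. It assumes the model.inference(I0, I1) call used by inference_img.py; everything else is illustrative:

import torch

def interpolate_recursive(model, I0, I1, exp):
    # Start from the two input frames and repeatedly insert midpoints.
    frames = [I0, I1]
    for _ in range(exp):
        doubled = [frames[0]]
        for a, b in zip(frames[:-1], frames[1:]):
            with torch.no_grad():
                mid = model.inference(a, b)  # midpoint frame between a and b
            doubled += [mid, b]
        frames = doubled
    return frames  # 2^exp + 1 frames, including the two inputs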

Run in docker

Place the pre-trained models in train_log/*.pkl (as above)

Building the container:

docker build -t rife -f docker/Dockerfile .

Running the container:

docker run --rm -it -v $PWD:/host rife:latest inference_video --exp=1 --video=untitled.mp4 --output=untitled_rife.mp4
docker run --rm -it -v $PWD:/host rife:latest inference_img --img img0.png img1.png --exp=4

Using gpu acceleration (requires proper gpu drivers for docker):

docker run --rm -it --gpus all -v /dev/dri:/dev/dri -v $PWD:/host rife:latest inference_video --exp=1 --video=untitled.mp4 --output=untitled_rife.mp4

Evaluation

Download the RIFE model or the RIFE_m model reported in our paper.

UCF101: Download the UCF101 dataset to ./UCF101/ucf101_interp_ours/

Vimeo90K: Download the Vimeo90K dataset to ./vimeo_interp_test

MiddleBury: Download the MiddleBury OTHER dataset to ./other-data and ./other-gt-interp

HD: Download the HD dataset to ./HD_dataset. We also provide a Google Drive download link.

# RIFE
python3 benchmark/UCF101.py
# "PSNR: 35.282 SSIM: 0.9688"
python3 benchmark/Vimeo90K.py
# "PSNR: 35.615 SSIM: 0.9779"
python3 benchmark/MiddleBury_Other.py
# "IE: 1.956"
python3 benchmark/HD.py
# "PSNR: 32.14"

# RIFE_m
python3 benchmark/HD_multi_4X.py
# "PSNR: 22.96(544*1280), 31.87(720p), 34.25(1080p)"

Training and Reproduction

Download Vimeo90K dataset.

We use 16 CPUs, 4 GPUs, and 20 GB of memory for training:

python3 -m torch.distributed.launch --nproc_per_node=4 train.py --world_size=4

Revision History

2021.3.18 arXiv: Modify the main experimental data, especially the runtime-related results.

2021.8.12 arXiv: Remove the pre-trained model dependency and propose a privileged distillation scheme for frame interpolation. Remove census loss supervision.

2021.11.17 arXiv: Support arbitrary-time frame interpolation (a.k.a. RIFE_m) and add more experiments.

Recommend

We sincerely recommend some related papers:

CVPR22 - Optimizing Video Prediction via Video Frame Interpolation

CVPR22 - Video Frame Interpolation with Transformer

CVPR22 - IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation

CVPR23 - A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

CVPR23 - Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

Citation

If you think this project is helpful, please feel free to leave a star or cite our paper:

@inproceedings{huang2022rife,
  title={Real-Time Intermediate Flow Estimation for Video Frame Interpolation},
  author={Huang, Zhewei and Zhang, Tianyuan and Heng, Wen and Shi, Boxin and Zhou, Shuchang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

Reference

Optical Flow: ARFlow pytorch-liteflownet RAFT pytorch-PWCNet

Video Interpolation: DVF TOflow SepConv DAIN CAIN MEMC-Net SoftSplat BMBC EDSC EQVI

eccv2022-rife's People

Contributors

a1600012888, catscarlet, chappjo, christopher-kapic, dynmi, eonzenx, heylonnhp, hzwer, justin62628, kkwik, ko1n, lazylion22, mafiosnik777, mskycoder, sadig102010, sloganking, stonecypher, talosh, zzh-tech


eccv2022-rife's Issues

Cannot load model properly

Hello,
When I run
python inference_img.py --img demo/I0_0.png demo/I0_1.png --exp=4

I get this error

Traceback (most recent call last):
  File "inference_img.py", line 18, in <module>
    model.load_model('./train_log', -1)
  File "/workspace/interpolation/RIFE/model/RIFE_HD.py", line 179, in load_model
    convert(torch.load('{}/flownet.pkl'.format(path), map_location=device)))
  File "/opt/conda/envs/RIFE/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for IFNet:
Missing key(s) in state_dict: "block3.conv0.0.weight", "block3.conv0.1.weight", "block3.conv0.1.bias", "block3.conv0.1.running_mean", "block3.conv0.1.running_var", "block3.conv0.2.weight",
...

Could you please tell me how to fix this?
(I followed the instructions on the GitHub page and the YouTube video)

Not first realtime

Hi,

In your paper, you write that “Our proposed RIFE is the first flow-based and real-time VFI algorithm that can process 720p videos at 30FPS.”

I believe this is incorrect; DIS (Dense Inverse Search), a VFI algorithm, was published in 2016 (https://github.com/tikroeger/OF_DIS), is flow-based, and on an RTX 2070 can run stably at 1080p at 60 fps. See https://nageru.sesse.net/ for my GPU implementation (scroll down to Futatabi).

Image output

Great work! I wonder if you could add an image-sequence output function, since I always want to use lossless encoding.

Image sequence and input

Thanks for adding the PNG output function. Could you make the output names consistent with ffmpeg, i.e. 0000.png, 0001.png, ..., 7821.png? Then we could use ffmpeg to process the image sequence directly (a sketch of such naming follows below).
Adding image-sequence input would also be great.
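For illustration, a minimal sketch of ffmpeg-compatible zero-padded frame names (the function name and output directory are hypothetical):

import cv2

def write_sequence(frames, out_dir="output"):
    # Writes 0000.png, 0001.png, ..., which ffmpeg can read with -i output/%04d.png
    for cnt, frame in enumerate(frames):
        cv2.imwrite("{}/{:04d}.png".format(out_dir, cnt), frame)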

How do I estimate the memory requirement?

First of all, thanks for the project.
My question: on a CPU-only machine (no GPU) with 32 GB of RAM, running 8X interpolation on a 1080p video exits with "Killed" in the log, presumably because it ran out of memory. How much memory would 8X interpolation on a 4K video need?

about trained-model file in README.md

The file RIFE_trained_model_v1.1.zip from pan.baidu.com described in README.md is broken, but the Google Drive file RIFE_trained_model_new.zip works fine.

Google Colab

Would you be willing to set up a Google Colab notebook or a Docker setup?
I'm interested in seeing results for 3DCG interpolation.

Non-Windows - Multiprocessing for ~2x processing speed

I profiled the code, and you can expect roughly another 2x processing-speed increase if you create a multiprocessing script and split the inference apart from the image writing.

Unfortunately, I just found out the hard way that you cannot pipe CUDA tensors on Windows, but Linux systems should be able to do this.
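A minimal sketch of the suggested split, with inference in the main process and PNG writing in a separate process. It assumes frames arrive as numpy arrays on the CPU (since CUDA tensors cannot be piped on Windows); all names are illustrative, not from the repo:

import multiprocessing as mp
import cv2

def writer(queue, out_dir):
    # Consume frames from the queue and write them to disk.
    cnt = 0
    while True:
        frame = queue.get()
        if frame is None:  # sentinel: no more frames
            break
        cv2.imwrite("{}/{:0>7d}.png".format(out_dir, cnt), frame)
        cnt += 1

if __name__ == "__main__":
    q = mp.Queue(maxsize=64)  # bounded, so inference cannot outrun the disk
    p = mp.Process(target=writer, args=(q, "output"))
    p.start()
    # ... run inference here, calling q.put(frame) for each interpolated frame ...
    q.put(None)  # tell the writer to finish
    p.join()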

Output videos are shorter than input videos

Videos created with commands such as python3 inference_video.py --exp=1 --video=video.mp4 are shorter than their source videos. How much shorter seems to vary significantly depending on the source video. I believe this is due to dropped frames, even though the --skip flag is not being used. There is also the possibility of output frames being out of order, which may also affect timing (my mistake, that was part of the source video I used). I am working on providing examples of this.

This issue will cause desync between video and sound if sound is added, as requested in #12

Counting frames takes forever on some videos

Before interpolation, RIFE seems to run ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 video.mp4 to get the number of frames in the input video. This command is slow on some types of files and can take tens of minutes to complete, delaying the start of interpolation significantly.

Using ffmpeg, ffmpeg -i input.mkv -map 0:v:0 -c copy -f null -, the frames can be counted within seconds instead of minutes, though the output takes some parsing to isolate the frame count.

A discussion involving both commands:
https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg#28376817
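A minimal sketch of the faster approach wrapped in Python, assuming ffmpeg prints its usual "frame=" counter on stderr; the function name is illustrative:

import re
import subprocess

def count_frames(path):
    # Remux the video stream into the null muxer (no re-encoding) and parse
    # the last "frame=NNN" counter from ffmpeg's stderr output.
    cmd = ["ffmpeg", "-i", path, "-map", "0:v:0", "-c", "copy", "-f", "null", "-"]
    err = subprocess.run(cmd, stderr=subprocess.PIPE, text=True).stderr
    matches = re.findall(r"frame=\s*(\d+)", err)
    return int(matches[-1]) if matches else None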

Outputted videos are "very slightly" shorter than input videos

This issue is similar to #23 but I believe a different bug is causing this.

Output videos are usually shorter by only a few seconds. If I rip the audio from the source video and add it to the new video, the beginning of the video is synced with the audio, but the video slowly gets more de-synced with the audio until its end. This issue is more apparent with longer videos. It is caused by frames very occasionally being dropped.

Assertion error:

I made it so that we can directly download and upscale videos from YouTube. It worked for 360p videos but is not working for 720p.
Any idea how to solve this?

Here is the error:

myvideo.mp4, 2664.0 frames in total, 30.0FPS to 120.0FPS
33% 889/2664.0 [03:21<8:35:12, 17.42s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 271, in _read_frame_data
    assert len(arr) == framesize
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference_video.py", line 80, in <module>
    for frame in videogen:
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/io.py", line 253, in vreader
    for frame in reader.nextFrame():
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 297, in nextFrame
    yield self._readFrame()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 281, in _readFrame
    s = self._read_frame_data()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 275, in _read_frame_data
    raise RuntimeError("%s" % (err1,))
RuntimeError
33% 889/2664.0 [03:21<06:42, 4.41it/s]

Link to COLAB Notebook

What does --fps do?

In your colab example, you upsample a 25 fps video by 2x, which seems like it ought to produce 50fps, but then you encode it with --fps 60, and the output is in fact 60fps

does that mean every fifth frame is being repeated? or is it not actually 2x but rather some fraction above 2x

CUDA out of memory though there supposed to be enough VRAM

n00mkrad/flowframes#2 (comment)

I got the message below when running RIFE integrated into Flowframes.

CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)

The full stack trace is below. No other VRAM-intensive apps were running at the time.

12-2-2020 16:29:30: [E] Traceback (most recent call last):
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 138, in <module>
12-2-2020 16:29:30: [E]     inferences = make_inference(model, I0, I1, exp=args.times)
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 110, in make_inference
12-2-2020 16:29:30: [E]     middle = model.inference(I0, I1)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 207, in inference
12-2-2020 16:29:30: [E]     return self.predict(imgs, flow, training=False).detach()
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 191, in predict
12-2-2020 16:29:30: [E]     refine_output, warped_img0, warped_img1, warped_img0_gt, warped_img1_gt = self.fusionnet(
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 118, in forward
12-2-2020 16:29:30: [E]     x = self.up3(torch.cat((x, s0), 1))
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\container.py", line 117, in forward
12-2-2020 16:29:30: [E]     input = module(input)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\conv.py", line 905, in forward
12-2-2020 16:29:30: [E]     return F.conv_transpose2d(
12-2-2020 16:29:30: [E] RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)

What environment variables need to be set (torch.distributed)?

Running train_WIP.py with default arg values fails with an error saying some environment variables are not set.

Original error:

Traceback (most recent call last):
  File "train_WIP.py", line 11, in <module>
    torch.distributed.init_process_group(backend="nccl", world_size=4)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 166, in _env_rendezvous_handler
    raise _env_error("MASTER_ADDR")

The env variable RANK is not set. So I set an arbitrary value with os.environ["RANK"] = "1"; it then complained that another variable, MASTER_ADDR, is not set. I assume there is some pre-setup needed for torch.distributed that is not done on my system.
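For context, torch.distributed.launch normally sets these variables for each worker process. A minimal sketch of setting them by hand for a single-process debug run (values are illustrative):

import os
import torch.distributed as dist

# Variables that torch.distributed.launch would normally provide.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Single-process group; use backend="gloo" if no GPU is available.
dist.init_process_group(backend="nccl", rank=0, world_size=1)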

Parallel processing for x2?

The parallel processing for the x4 option is great, would love to see this added to the x2 version as well.

Don't know what's wrong! :/

So I followed all the defined steps.

  1. Git cloned the repo locally
  2. Downloaded all the requirements
  3. Created a folder inside repo called train_log and moved all the extracted *.pkl files inside
  4. Tried to apply frame interpolation to a video with python3 inference_video.py --exp=1 --video=video.mp4

I get this error:

Traceback (most recent call last):
  File "inference_video.py", line 81, in <module>
    lastframe = next(videogen)
  File "/Users/kabhinavaditya/.pyenv/versions/3.8.5/lib/python3.8/site-packages/skvideo/io/io.py", line 240, in vreader
    assert _HAS_FFMPEG, "Cannot find installation of ffmpeg."
AssertionError: Cannot find installation of ffmpeg.

Could you please help me?

I'm having trouble applying this.

I think I'm doing this incorrectly.

I:

  1. Upload my video.
    1. My video is 30fps, but the original was 23.97fps. I can't change this.
    2. I get a ton of Warning: Your video has 7556 static frames, it may change the duration of the generated video.
    3. Input video is 4 minutes 54 seconds.
  2. I run your command from your readme
    1. I use the 4x version
      1. !python3 inference_mp4_4x.py --video myvideoname.mp4 --fps=120
      2. I change the fps to 120, because I expect 4x30fps
    2. Output file is ... 41 seconds? I had expected 4:54, not 0:41
      1. Maybe ... this is about static frames?
    3. It's the same video, but seemingly at random, most of the frames are dropped
      1. A little over 6 in 7 are missing
      2. It is 120fps though

replicating benchmarks

Thank you for sharing your code! I was trying to replicate the numbers you stated in your paper using this implementation but have unfortunately been unsuccessful so far. Would you be able to share a script that can be used to replicate the Vimeo-90k metrics you quoted? Also, I think the following padding has some issues.

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L26-L28

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L45

The pw - w and [:h, :w] indicate that pw > w (and ph > h). However, pw = 340 // 32 * 32 = 320 for w = 340 which violates this condition. Thanks for looking into this and thanks again for sharing your code!
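For what it's worth, rounding the padded size up instead of down satisfies pw >= w and ph >= h. A minimal sketch (illustrative, not the repo's fix):

import torch.nn.functional as F

def pad_to_multiple(img, div=32):
    # img: tensor of shape (N, C, H, W); pad H and W up to the next multiple of div.
    h, w = img.shape[2], img.shape[3]
    ph = ((h - 1) // div + 1) * div
    pw = ((w - 1) // div + 1) * div
    img = F.pad(img, (0, pw - w, 0, ph - h))  # (left, right, top, bottom)
    return img, (h, w)  # keep (h, w) so the output can be cropped back with [..., :h, :w]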

Better hosting for the models.

Would it be possible to provide better hosting for the models?
Currently, automating downloads when a new version comes out is hard, even if technically possible.
I think you could use the Git LFS feature for that; it gives 1 GB of storage and 1 GB of bandwidth a month, although maybe there are more suitable options.

Error with recompute_scale_factor=True

I've been trying to interpolate a 24fps mp4 video and this error gets thrown whenever I try to run either inference_video.py or inference_video_parallel.py.

C:\Python38\lib\site-packages\torch\nn\functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "

When running the parallel version, I also see about a 20% decrease in interpolation speed. On my 1070 Ti with the latest drivers and CUDA 11.0, a 1080p video interpolates at about 3.6 fps; with the parallel script, that drops to around 2.9-3 fps.

Not the fastest for multi-frame interpolation

Hi,

Thanks for open sourcing the code and contributing to the video frame interpolation community.

In the paper, it is mentioned: "Coupled with the large complexity in the bi-directional flow estimation, none of these methods can achieve real-time speed"

I believe that might be inappropriate to say, as a recently published paper (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720103.pdf) targets efficient multi-frame interpolation.

It utilizes bi-directional flow estimation as well, but it generates 7 frames in 0.12 seconds, whereas your method requires 0.036 * 7 = 0.252 seconds.

The model from that paper is also compact, consisting of only ~2M parameters, whereas your fast model has ~10M parameters.

Data augmentation bug on v1.2~v1.3

We found a data augmentation bug, which was more serious in v1.3, so we cannot confirm the performance improvement of v1.3 and have withdrawn that version update. This bug leads to poor quantitative performance of the current model on the benchmarks and is expected to be fixed within 3 days.

We think this is also a major reason for the poor performance on 2D animation.

Memory leak

Running inference_video.py makes the Python interpreter eat memory until the system crashes. Six commits ago, it never used more than 1.4 GB.

ONNX export

Hi, as we talked about on Reddit, ONNX export does not work due to missing support for the grid_sampler operator in the ONNX spec.
I see that it might be possible to define a custom ONNX operator on export, and then possibly do the same when importing into e.g. TensorFlow. The missing operator would need a custom implementation in the framework the model is imported into, but this seems to already exist (at least from what I can find using Google).

About flow_gt and loss_dis

Hello, I have a question.
In the loss code, judging from the weights, loss_cons should be the loss_dis from the paper, right?

for i in range(3):
    loss_cons += self.epe(flow_list[i], flow_gt[:, :2], 1)
    loss_cons += self.epe(-flow_list[i], flow_gt[:, 2:4], 1)

Given this definition, do flow_list[i] and -flow_list[i] represent the flows 0->1 and 1->0?
In the paper, aren't they 0->t and t->1?

Support HD videos

I found that the effect of the current model on small images is much better than on 1080p video. I plan to release a new model for large-resolution videos. At the same processing speed, the results on 1080p and 2K video can be significantly improved (preliminary verification).

Problems in inference_img.py

I run

$ python3 inference_img.py --img img0.png img1.png --times=4

and I get the error interpolate() got an unexpected keyword argument 'recompute_scale_factor' at IFNet.py line 95. Could you help me figure out what happened? Thanks.
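For context, the recompute_scale_factor argument only exists in newer PyTorch releases, so older installs raise exactly this TypeError. One compatible pattern is to pass it only when the installed torch supports it; a sketch, not the repo's code:

import inspect
import torch.nn.functional as F

def upsample_2x(x):
    # Pass recompute_scale_factor only if this PyTorch version accepts it.
    kwargs = dict(scale_factor=2.0, mode="bilinear", align_corners=False)
    if "recompute_scale_factor" in inspect.signature(F.interpolate).parameters:
        kwargs["recompute_scale_factor"] = False
    return F.interpolate(x, **kwargs)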

Better version declarations.

It would be good to provide the version number in some better way, e.g. commit tags. Currently, if I wanted to make a script to package RIFE automatically, I'd have to parse the first line of README.md for the version, and of course there's a big chance that will break in the future. There might also be better ways to do it than tags; I don't use git often, I just found them after a quick search.

training code

Thank you for your nice work!
Do you have any plans to release the training code?

Thanks in advance!

Add argument to keep sound?

Sorry if I am overstepping the bounds of the project, but I have found that the interpolated output has no sound. It would be awesome if you could add an argument (like --sounds) to keep the sound in the generated output, or at least when the output is the same length as the input.

EDIT: Temporary solution:
ffmpeg -i "$video_name" audio.mp3 -y
ffmpeg -i "video_4x.mp4" -i "audio.mp3" -map 0:0 -map 1:0 -c:v copy -c:a copy "video_4x_audio.mp4" -y

Training - Animation

I'd like to start training a model for animation if you haven't started doing so already.

Any tips for getting started?
e.g. How should I prepare the input data to feed to the training script? Are there any benefits to using a higher or lower learning rate? How much VRAM does a 720p image need vs. 1080p?

FPS Limit

Is there a limit on the maximum FPS that can be created? Or is it a Flowframes app issue? It seems it can't go higher than 500 fps, saying "Invalid target frame rate".

TypeError when trying to output PNG frames

I'm getting this error if I add the --png argument:

Exception ignored in thread started by: <function clear_buffer at 0x00000158A426F310>
Traceback (most recent call last):
  File "inference_video_parallel.py", line 83, in clear_buffer
    cv2.imwrite('output/{:0>7d}.png'.format(cnt), i)
TypeError: unsupported format string passed to numpy.ndarray.__format__
