
CDFI's Issues

class Loss in loss.py returns a loss_sum of size 2, which should be a scalar?

Hello, thanks for your wonderful code! I wonder why your loss function gives a loss that is a two-element tensor instead of a scalar?
I encountered an error when running loss.backward(), which is due to the non-scalar loss. I suppose the code should be fixed as:

for r in self.regularize:
    effective_loss = r['weight'] * output[r['type']]
    # **********************
    effective_loss = effective_loss.sum()  # I added this: reduce to a scalar
    # **********************
    losses.append(effective_loss)

Is that correct?
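
For anyone hitting the same error: loss.backward() only works on a scalar, so any per-term loss tensor has to be reduced first. A toy sketch (shapes are made up, not from the repo):

import torch

# backward() requires a scalar; a per-term loss of shape (2,)
# must be reduced (sum or mean) before calling it.
pred = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)

per_term = ((pred - target) ** 2).mean(dim=1)  # shape (2,): one value per term
per_term.sum().backward()                      # scalar, so backward() succeeds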

Training problem

Hi, my friend.
I used your pretrained CDFI_adacof.pth and trained it on my own dataset (5 fps video with larger object motion, about 10,000 triplets, in the same format as vimeo_triplet).

Training from epoch 88 (your pretrained checkpoint) to epoch 150 took a few days on my 1080 Ti.

Your pretrained model is not too bad on my dataset, but after training I got very little improvement.
Here is my log.
log.txt

Can you tell me why?

Training on low-frame-rate video (5-10 fps)?

I find that your model behaves poorly on low-frame-rate video (5-10 fps). I wonder how to fix this? Maybe training on low-frame-rate videos would help? Thanks a lot.

GPU Memory

Hi,
Thanks for your interesting work!

I'm re-training your model on the Vimeo-90K dataset. Everything works fine so far, but the training process consumes around 24 GiB on an NVIDIA A100. I just want to ask if this is normal. I use the default configuration (batch size 8).
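
In case the footprint is a problem: mixed-precision training usually cuts memory substantially. A minimal sketch of the standard PyTorch AMP pattern, with a placeholder model and data rather than the repo's actual training loop:

import torch
import torch.nn as nn

# Placeholder model/data; only the autocast/GradScaler pattern matters here.
model = nn.Conv2d(3, 3, 3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.L1Loss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(2):  # stand-in for the real training loop
    inp = torch.rand(8, 3, 256, 256, device='cuda')
    gt = torch.rand(8, 3, 256, 256, device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inp), gt)
    scaler.scale(loss).backward()  # loss scaling avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()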

Different results on the same test data at different resolutions. Why?

Hello, my friend.
I trained a model on my own data (same format as Vimeo-90K, 448x256 resolution).
When I test my model, I get different results on the same test data at different resolutions.
Data at 1280x720 resolution is a little worse, while the same data at 448x256 is better.
Here is the result.

Fine-tuning fails when not using BatchNorm

Hi, I notice that your network does not contain BN layers. Is BN really unnecessary in your training? When I fine-tune your pretrained model on my own dataset, the training loss doesn't decrease, but with BN it decreases normally. So maybe adding BN or some residual shortcuts could improve the model a little?
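
To make the suggestion concrete, here is the kind of change I mean; a hypothetical conv block with an optional BatchNorm, not code from this repo:

import torch.nn as nn

# Hypothetical building block: plain Conv+ReLU as in networks without BN,
# with an optional BatchNorm2d inserted in between.
def conv_block(in_ch, out_ch, use_bn=True):
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)]
    if use_bn:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)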

About layer pruning

Congratulations! This work is very helpful to me, but I am still confused about how these channels are pruned in detail. Could you release the pruning code?
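
Since the pruning code isn't released, here is a minimal sketch of what physically removing output channels from a single Conv2d could look like. This is my own illustration (filters ranked by L1 norm), not the authors' method:

import torch
import torch.nn as nn

def prune_conv_out_channels(conv: nn.Conv2d, keep: int) -> nn.Conv2d:
    # Rank filters by L1 norm and keep the `keep` strongest ones.
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    idx = torch.argsort(norms, descending=True)[:keep]
    new_conv = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                         conv.stride, conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[idx].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[idx].clone()
    # Note: the next layer's input channels must be sliced to match.
    return new_conv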

About the training time

Your work is impressive! What kind of GPU did you train on, and for how many hours in total?

cupy and CUDA compatibility

What is the pip3 command for installing cupy?

For CUDA 11.0, the command should be pip3 install cupy-cuda110.

But cupy version 7.7.0 (from the README) is not compatible with cupy-cuda110:

pip3 install cupy-cuda110==7.7.0 gives errors.

The only command that works is pip3 install cupy-cuda100==7.7.0, but that targets CUDA 10.0. Very confusing. Can you help clarify?
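
A quick way to check which CUDA runtime an installed cupy wheel was built against (minimal sketch):

import cupy

print(cupy.__version__)
# CUDA runtime version linked into the wheel,
# e.g. 10000 for CUDA 10.0, 11000 for CUDA 11.0.
print(cupy.cuda.runtime.runtimeGetVersion())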

Testing result on vimeo90k_septuplet

Hello, my friend! I tested with the pretrained model 'FLAVR_4x.pth' (yours) on the 'vimeo90k_septuplet' dataset, and the PSNR I got was 28.376122. I don't know why this occurs.
[image]

CDFI inference speed

What is CDFI's inference speed, and what FPS does it reach on 1080p video? Why is it slower than AdaCoF in some third-party evaluations? Isn't the compressed CDFI model smaller and faster?
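
For anyone who wants to measure this themselves, a minimal timing sketch; the model below is a stand-in, swap in the loaded CDFI network:

import time
import torch

class Stub(torch.nn.Module):
    # Stand-in for the real two-input interpolation model.
    def forward(self, a, b):
        return (a + b) / 2

model = Stub().cuda()
in0 = torch.rand(1, 3, 1080, 1920, device='cuda')  # one 1080p frame pair
in1 = torch.rand(1, 3, 1080, 1920, device='cuda')

with torch.no_grad():
    for _ in range(10):       # warm-up iterations
        model(in0, in1)
    torch.cuda.synchronize()  # CUDA kernels run asynchronously
    start = time.time()
    for _ in range(50):
        model(in0, in1)
    torch.cuda.synchronize()
print('frames per second:', 50 / (time.time() - start))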

Pruning toolkit

Hi, what pruning toolkit do you use to remove layers? The built-in PyTorch toolkit only zeroes out weights, as opposed to removing channels.
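
For example, with the built-in utility the tensor shape (and hence the compute) is unchanged; a minimal sketch:

import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, 3)
# Structured pruning along dim 0 zeroes out whole filters by L2 norm,
# but it only masks them; nothing is physically removed.
prune.ln_structured(conv, name='weight', amount=0.5, n=2, dim=0)
print(conv.weight.shape)  # still torch.Size([32, 16, 3, 3])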

Thanks

inconsistent SSIM computation

I was surprised to see that the ratio between PSNR and SSIM deviates between the methods with dagger and the ones without dagger in Table 3, by a large margin. I noticed that the provided test.py uses the following.

CDFI/test.py

Lines 46 to 47 in d7f79e5

ssim = skimage.metrics.structural_similarity(np.transpose(gt, (1, 2, 0)),
                                             np.transpose(frame_out, (1, 2, 0)), multichannel=True)

In doing so, it does not provide a data_range argument and skimage.metrics.structural_similarity has to guess it. However, it just uses the difference between the smallest and the largest element as a fallback. This significantly alters the results and puts the methods with dagger in Table 3 at a substantial disadvantage though (and half of the methods have a dagger).

I just updated the test.py as follows (which also addresses the quantization issue from #1).

...

gt = (gt * 255).round() / 255
frame_out = (frame_out * 255).round() / 255

psnr = skimage.metrics.peak_signal_noise_ratio(image_true=gt, image_test=frame_out)
ssim = skimage.metrics.structural_similarity(np.transpose(gt, (1, 2, 0)),
                                             np.transpose(frame_out, (1, 2, 0)), data_range=1.0, multichannel=True)

...

With this fix, the SSIM of CDFI on the Middlebury test drops from 0.983 to 0.966 which is quite significant. It would hence be great if Table 3 could get revised such that future work that references it is not subject to the same inconsistencies. Thanks!

quantization in evaluation

Thanks for sharing your code! I just looked into it a little bit and it seems there is no quantization in the evaluation?

CDFI/test.py

Lines 36 to 47 in d7f79e5

frame_out = model(in0, in1)
lps = lpips(self.gt_list[idx].cuda(), frame_out, net_type='squeeze')
imwrite(frame_out, output_dir + '/' + self.im_list[idx] + '/' + output_name + '.png', range=(0, 1))
frame_out = frame_out.squeeze().detach().cpu().numpy()
gt = self.gt_list[idx].numpy()
psnr = skimage.metrics.peak_signal_noise_ratio(image_true=gt, image_test=frame_out)
ssim = skimage.metrics.structural_similarity(np.transpose(gt, (1, 2, 0)),
                                             np.transpose(frame_out, (1, 2, 0)), multichannel=True)

However, it is common practice to quantize your interpolation estimate before computing any metrics, as shown in the examples below. If you submit results to a benchmark, like the one from Middlebury, you have to quantize the interpolation estimates to save them as an image, so it has been the norm to quantize all results throughout the evaluation.

https://github.com/sniklaus/sepconv-slomo/blob/46041adec601a4051b86741664bb2cdc80fe4919/benchmark.py#L28
https://github.com/hzwer/arXiv2020-RIFE/blob/15cb7f2389ccd93e8b8946546d4665c9b41541a3/benchmark/Vimeo90K.py#L36
https://github.com/baowenbo/DAIN/blob/9d9c0d7b3718dfcda9061c85efec472478a3aa86/demo_MiddleBury.py#L162-L166
https://github.com/laomao0/BIN/blob/b3ec2a27d62df966cc70880bb3d13dcf147f7c39/test.py#L406-L410

The reason why this is important is that the quantization step has a negative impact on the metrics. So if one does not quantize the results of their method before computing the metrics while the results from other methods had the quantization step in place, then the evaluation is slightly biased. Would you hence be able to share the evaluation metrics for CDFI with the quantization? This would greatly benefit future work that compares to CDFI to avoid this bias. And thanks again for sharing your code!
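
To make the bias concrete, a toy sketch with synthetic data (not numbers from the paper):

import numpy as np
import skimage.metrics

# Quantizing an estimate to the 8-bit grid adds a small error, so the
# quantized PSNR is typically slightly lower than the unquantized one.
rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))
est = np.clip(gt + rng.normal(0, 0.02, gt.shape), 0, 1)

quant = np.round(est * 255) / 255
print(skimage.metrics.peak_signal_noise_ratio(gt, est))    # unquantized
print(skimage.metrics.peak_signal_noise_ratio(gt, quant))  # typically lower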

LPIPS computation issue

CDFI/test.py

Line 123 in 0de1f7e

lps = lpips(frame1.cuda(), torch.tensor(ref).unsqueeze(0).cuda(), net_type='squeeze')

You seem to use the 'squeeze' network when computing the LPIPS metric, which may cause some problems with the comparison experiment in your Table 3.
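
For context, LPIPS scores are not comparable across backbones. A minimal sketch with the reference lpips package (https://github.com/richzhang/PerceptualSimilarity), whose default backbone is 'alex'; the lpips helper in this repo, with its net_type argument, appears to be a different wrapper:

import torch
import lpips  # pip install lpips

loss_alex = lpips.LPIPS(net='alex')        # default in the reference package
loss_squeeze = lpips.LPIPS(net='squeeze')  # backbone used by this repo

img0 = torch.rand(1, 3, 64, 64) * 2 - 1  # inputs are expected in [-1, 1]
img1 = torch.rand(1, 3, 64, 64) * 2 - 1
print(loss_alex(img0, img1).item(), loss_squeeze(img0, img1).item())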

The results reported by SoftSplat are consistent with EDSC, but your reported EDSC results are not consistent with the original EDSC paper (see https://arxiv.org/pdf/2006.08070.pdf, Table 4). In your paper, the LPIPS of EDSC is much better than that of SoftSplat, and CAIN and EDSC are better than DAIN, which is counter-intuitive. This makes this part of the data look very strange.

I suggest correcting this part of the data so that future researchers can follow your work reliably. Thank you very much.

From EDSC:
[image]

From CDFI:
[image]
