qijiezhao / pseudo-3d-pytorch

PyTorch version of Pseudo-3D Residual Networks (P3D); pretrained models are supported.

License: MIT License


pseudo-3d-pytorch's Introduction

Pseudo-3D Residual Networks

This repo implements the P3D [1] network structure in PyTorch. The pretrained model weights are converted from the caffemodel released in the author's official repo.

Requirements:

pytorch
numpy

Structure details

In the author's official repo, only P3D-199 is released. Besides this deepest model, I also implement P3D-63 and P3D-131, which are modified from ResNet50-3D and ResNet101-3D respectively. These two shallower nets may be more convenient for users with memory-limited GPUs.

Pretrained weights

(Pretrained weights of P3D63 and P3D131 are not yet supported)

(Note: I'm sorry for having removed the download URLs of the pretrained weights for private reasons. For more information, you can email me.) (Update: the model weights are now available.)

1. P3D-199 trained on the Kinetics dataset:

BaiduYun url Google Drive

2. P3D-199 trained on Kinetics optical flow (TVL1):

BaiduYun url Google Drive

3. P3D-199 trained on Kinetics-600, RGB, input sizes 224 and 299:

BaiduYun url Google Drive (Change the global average pooling (GAP) kernel size from 5 to 7 for 224 inputs, or to 9 for 299 inputs.)
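The GAP kernel has to equal the spatial size of the last feature map. That size scales with the input: 5 for the default 160 input, and, per the note above, 7 for 224 and 9 for 299. A generic alternative is adaptive average pooling, which always pools down to 1x1. This is not something the repo does itself. The sketch below checks that the two agree whenever the fixed kernel matches the map size. The tensor sizes are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative final feature maps of size 5x5, 7x7 and 9x9 (channel count is arbitrary)
results = {}
for side in (5, 7, 9):
    feat = torch.rand(2, 16, side, side)
    fixed = nn.AvgPool2d(kernel_size=side, stride=1)(feat)  # kernel must equal the map size
    adaptive = nn.AdaptiveAvgPool2d(1)(feat)                  # pools any map size down to 1x1
    results[side] = (fixed, adaptive)
    print(side, tuple(fixed.shape), torch.allclose(fixed, adaptive))
```

With the adaptive version, the same classifier head would accept 160, 224 or 299 inputs without editing the kernel value. Whether this is convenient depends on how the model's pooling layer is defined.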

Example Code

```
from __future__ import print_function
from p3d_model import *
import torch

model = P3D199(pretrained=True, num_classes=400)
model = model.cuda()
data = torch.rand(10, 3, 16, 160, 160).cuda()  # if modality=='Flow', change the 2nd dimension from 3 to 2
out = model(data)
print(out.size(), out)
```
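The model input is a 5D tensor of shape (batch, channels, frames, height, width), following PyTorch's Conv3d layout. In the example above, 10 is the batch size and 3 is the number of RGB channels (2 for flow). 16 is the clip length and 160x160 is the frame size. The minimal sketch below turns a list of per-frame (C, H, W) tensors into that layout. The frames are random placeholders, not decoded video.

```python
import torch

# 16 placeholder frames, each (C, H, W) as produced by torchvision's ToTensor()
frames = [torch.rand(3, 160, 160) for _ in range(16)]

clip = torch.stack(frames, dim=1)  # (C, T, H, W) = (3, 16, 160, 160)
batch = clip.unsqueeze(0)          # (N, C, T, H, W) = (1, 3, 16, 160, 160)
print(batch.shape)
```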

Ablation settings

ST-Structures:

All P3D models in this repo support various combinations of ST-Structures, such as ('A','B','C'), ('A','B') and ('A',), as follows. Note that a one-element tuple needs a trailing comma; ('C') is just the string 'C'.
```
model = P3D63(ST_struc=('A','B'))
model = P3D131(ST_struc=('C',))
```
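The three ST-Structures come from the P3D paper. In A, the spatial (S, 1x3x3) and temporal (T, 3x1x1) convolutions are cascaded. In B, they run in parallel. In C, they are cascaded, and the S output is also added to the T output. The sketch below is a simplified unit illustrating only this wiring. It has no bottleneck and no BatchNorm, so it is not the repo's block. It also assumes that successive units cycle through the ST_struc tuple.

```python
import torch
import torch.nn as nn

class P3DUnit(nn.Module):
    """Simplified P3D unit: no bottleneck or BatchNorm, just the S/T wiring."""

    def __init__(self, channels, st_struc='A'):
        super().__init__()
        self.st_struc = st_struc
        # S: 1x3x3 spatial conv, T: 3x1x1 temporal conv; both keep (T, H, W) unchanged
        self.conv_s = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False)
        self.conv_t = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0), bias=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        if self.st_struc == 'A':    # cascaded: T(S(x))
            out = self.conv_t(self.relu(self.conv_s(x)))
        elif self.st_struc == 'B':  # parallel: S(x) + T(x)
            out = self.conv_s(x) + self.conv_t(x)
        elif self.st_struc == 'C':  # cascaded plus shortcut: S(x) + T(S(x))
            s = self.relu(self.conv_s(x))
            out = s + self.conv_t(s)
        else:
            raise ValueError(f"unknown ST structure: {self.st_struc!r}")
        return self.relu(x + out)  # residual connection

# assumption: successive units cycle through the ST_struc tuple
st_struc = ('A', 'B', 'C')
blocks = nn.Sequential(*[P3DUnit(8, st_struc[i % len(st_struc)]) for i in range(6)])
x = torch.rand(2, 8, 4, 16, 16)  # (N, C, T, H, W)
out = blocks(x)
print(out.shape)  # torch.Size([2, 8, 4, 16, 16])
```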
Flow and RGB models:

Set modality='RGB' for the RGB model or modality='Flow' for the flow model. The flow model is trained on TVL1 optical flow images.
```
model = P3D199(pretrained=True, modality='Flow')
```
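For the Flow model, the second input dimension is 2 instead of 3. Presumably these are the horizontal and vertical TVL1 components of each frame. The sketch below shows how to stack them. It assumes the x and y flow fields are already loaded as (H, W) float tensors, with x as channel 0. That channel order and the on-disk format and scaling of the flow images are assumptions, not something stated here.

```python
import torch

T, H, W = 16, 160, 160
flow_x = [torch.rand(H, W) for _ in range(T)]  # placeholder horizontal flow per frame
flow_y = [torch.rand(H, W) for _ in range(T)]  # placeholder vertical flow per frame

# (2, T, H, W): channel 0 = x component, channel 1 = y component (assumed order)
clip = torch.stack([torch.stack(flow_x), torch.stack(flow_y)])
batch = clip.unsqueeze(0)  # (1, 2, T, H, W), matching modality='Flow'
print(batch.shape)
```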
Finetune the model

When finetuning the models on your custom dataset, use get_optim_policies() to set different learning rates for different layers. For example, when the dataset is small and only the deepest few layers need training, set slow_rate=0.8 in the code and adjust the lr_mult and decay_mult values that follow accordingly.

Please cite this repo if you use it.
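get_optim_policies() is specific to this repo, and its exact return format isn't shown here. In plain PyTorch, Caffe-style lr_mult and decay_mult are usually expressed as optimizer parameter groups, each with its own lr and weight_decay. One of the issues below passes the result of get_optim_policies() straight to torch.optim.SGD. That suggests, though it is not verified here, that it returns such a list. The sketch below uses a hypothetical backbone/head split and illustrative multipliers.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a "backbone" layer plus a new classifier head
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
backbone, head = model[0], model[2]

base_lr, base_wd = 0.01, 5e-4
# Illustrative Caffe-style multipliers: lr = base_lr * lr_mult, wd = base_wd * decay_mult
policies = [
    {'params': backbone.parameters(), 'lr': base_lr * 0.1, 'weight_decay': base_wd * 1.0},
    {'params': head.parameters(), 'lr': base_lr * 1.0, 'weight_decay': base_wd * 1.0},
]
optimizer = torch.optim.SGD(policies, lr=base_lr, momentum=0.9)

for group in optimizer.param_groups:
    print(group['lr'], group['weight_decay'])
```

To train only the deepest layers, another option is to set requires_grad = False on the earlier layers' parameters and leave them out of the optimizer.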

Experiment Results (not from the paper)

(All of the following results are obtained with end-to-end training.)

Some of them outperform the state of the art.

Action recognition(mean accuracy on UCF101):

modality/model	RGB	Flow	Fusion
P3D199 (Sports-1M)	88.5%	-	-
P3D199 (Kinetics)	91.2%	92.4%	98.3%

Action localization(mAP on Thumos14):

Steps: per-frame + watershed

model/step	per-frame	localization
P3D199 (Sports-1M)	0.451	0.25
P3D199 (Kinetics)	0.569 (fused)	0.307

Reference:

[1] Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks, ICCV 2017.


pseudo-3d-pytorch's Issues

About Image transform

Hello! I tried to use your released P3D PyTorch model, but I got wrong results: every class gets a low score, under 0.01.
Could the way I transform images into tensors be causing this?

```
import torch
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
from p3d_model import P3D199

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = P3D199(pretrained=True)
model = model.eval().to(device)

transforms = Compose([Resize(160), CenterCrop(160), ToTensor(),
                      Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.255), inplace=True)])

clip = [transforms(img) for img in clip]  # clip: list of 16 PIL frames
clip = torch.stack(clip, 0).permute(1, 0, 2, 3).unsqueeze(0).to(device)  # (1, C, T, H, W)

out = model(clip)
```

Thank you for your advice

Why is an .exe installer required to download the pretrained model?

Is it safe to do so?

Feature Extraction

Hi,
```
import torch
import torch.nn as nn
from p3d_model import P3D199

p3d = P3D199(pretrained=True, num_classes=400, modality='RGB')
modules = list(p3d.children())[:-2]
nw_p3d = nn.Sequential(*modules)
data = torch.rand(10, 3, 16, 224, 224)
p3d_out = nw_p3d(data)
```

I wrote the code above to extract video features, but it raises ValueError: Expected 4D tensor as input, got 5D tensor instead. What could be the reason?
Thanks
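One likely cause, not confirmed by the repo author: nn.Sequential only chains the child modules, so any reshaping done inside the model's own forward() is skipped. Other issues here describe such a step: a view that turns the 5D res4 output into a 4D tensor before the 2D res5 stage. A common workaround is to run the full model and capture an intermediate output with a forward hook. The toy model below is a hypothetical stand-in for P3D. With the real model, you would register the hook on whichever submodule's output you need.

```python
import torch
import torch.nn as nn

class ToyVideoNet(nn.Module):
    """Hypothetical stand-in: a 3D stage, a reshape inside forward(), then a 2D stage."""

    def __init__(self, num_classes=400):
        super().__init__()
        self.stage3d = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.stage2d = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.stage3d(x)                            # (N, C, T, H, W)
        n, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(n * t, c, h, w)  # fold T into batch for the 2D stage
        x = self.stage2d(x)
        x = self.pool(x).flatten(1).view(n, t, c).mean(1)  # average over time
        return self.fc(x)

features = {}

def save_output(name):
    # forward hooks receive (module, inputs, output) after the module runs
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model = ToyVideoNet().eval()
handle = model.stage2d.register_forward_hook(save_output('stage2d'))
with torch.no_grad():
    logits = model(torch.rand(2, 3, 4, 16, 16))
handle.remove()
print(logits.shape, features['stage2d'].shape)
```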

Are the conv details right?

```
import torch.nn as nn

def conv_S(in_planes,out_planes,stride=1,padding=1):
    # as described, conv S is 1x3x3
    return nn.Conv3d(in_planes,out_planes,kernel_size=(1,3,3),stride=1,
                     padding=padding,bias=False)

def conv_T(in_planes,out_planes,stride=1,padding=1):
    # conv T is 3x1x1
    return nn.Conv3d(in_planes,out_planes,kernel_size=(3,1,1),stride=1,
                     padding=padding,bias=False)
```

Changed to:

```
import torch.nn as nn

def conv_S(in_planes, out_planes, stride=1, padding=1):
    # as described, conv S is 1x3x3: stride and padding apply to H and W only
    return nn.Conv3d(in_planes, out_planes, kernel_size=(1, 3, 3), stride=(1, stride, stride),
                     padding=(0, padding, padding), bias=False)

def conv_T(in_planes, out_planes, stride=1, padding=1):
    # conv T is 3x1x1: stride and padding apply to T only
    return nn.Conv3d(in_planes, out_planes, kernel_size=(3, 1, 1), stride=(stride, 1, 1),
                     padding=(padding, 0, 0), bias=False)
```
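The difference between the two versions shows up in the output shapes. In the first version, padding=1 is applied to all three axes, so a 1x3x3 kernel turns T=16 into 18, and the stride argument is ignored. With the corrected version, conv_S with stride 2 halves H and W but keeps T, and conv_T with stride 2 halves T only. The snippet copies the corrected definitions; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

def conv_S(in_planes, out_planes, stride=1, padding=1):
    # spatial 1x3x3 conv: stride and padding act on H and W only
    return nn.Conv3d(in_planes, out_planes, kernel_size=(1, 3, 3), stride=(1, stride, stride),
                     padding=(0, padding, padding), bias=False)

def conv_T(in_planes, out_planes, stride=1, padding=1):
    # temporal 3x1x1 conv: stride and padding act on T only
    return nn.Conv3d(in_planes, out_planes, kernel_size=(3, 1, 1), stride=(stride, 1, 1),
                     padding=(padding, 0, 0), bias=False)

x = torch.rand(1, 4, 16, 40, 40)  # (N, C, T, H, W)
s1 = conv_S(4, 8)(x)
s2 = conv_S(4, 8, stride=2)(x)
t2 = conv_T(4, 8, stride=2)(x)
print(s1.shape)  # torch.Size([1, 8, 16, 40, 40])
print(s2.shape)  # torch.Size([1, 8, 16, 20, 20])
print(t2.shape)  # torch.Size([1, 8, 8, 40, 40])
```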

Always predicts the same class

@qijiezhao I tried your code and the model pretrained on Kinetics, but I always get strange results: even when I feed videos from different classes to the model, most of them are predicted as the same label. I first thought there was something wrong with my preprocessing code, but when I fed it random data, the result was the same:

```
import numpy as np
import torch
from p3d_model import P3D199

if __name__ == '__main__':
    model = P3D199(pretrained=True, num_classes=400)
    model = model.cuda()
    model.eval()

    batch_num = 1000  # test 1000 random batches
    cnt_M = np.zeros(400)
    with torch.no_grad():
        for j in range(batch_num):
            data = torch.rand(6, 3, 16, 160, 160).cuda()  # if modality=='Flow', change the 2nd dimension from 3 to 2
            out = model(data)
            out = torch.nn.functional.softmax(out, dim=1)
            vid_result = out.cpu().numpy()
            score = np.mean(vid_result, axis=0)  # average the scores of the 6 clips as the video score
            max_ind = np.argmax(score)
            cnt_M[max_ind] += 1

    cnt_max = np.max(cnt_M)
    ind_max = np.argmax(cnt_M)
    print(cnt_M)
    print("most_predict_index: %d    count/all: %d / %d" % (ind_max, cnt_max, batch_num))
```

And my result is:

load pretrained model success.
[    0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.  1000.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.
     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.]
most_predict_index: 291    count/all: 1000 / 1000

I tried this many times, and the inputs were always predicted as the same label. Is there some problem with the pretrained model? Thanks

Transferred PyTorch pretrained model

You said that the pretrained model weights are converted from the caffemodel in the author's repo.
However, the Caffe model is too hard to download. Can you share the parameters you transferred from the caffemodel to PyTorch?

questions about some training details

Hi,
thanks for the code.
I have some questions about the training of the pretrained network.
1. What is the size of the input data? In the paper, the size is 3x16x160x160. Is your pretrained network consistent with the original?
2. What is the input frame stride, and how do you sample the clips?

Looking forward to your reply, thanks.

The get_optim_policies() function

When I used the function get_optim_policies(), I got the error "ValueError: New atomic module type: <class 'torch.nn.modules.batchnorm.BatchNorm2d'>. Need to give it a learning policy". I changed the code "elif isinstance(m, torch.nn.BatchNorm3d)" to "elif isinstance(m, torch.nn.BatchNorm3d) or isinstance(m, torch.nn.BatchNorm2d):".

Another question: I can't find "lr_mult" and "decay_mult" arguments in PyTorch. Do these parameters affect the learning rate?

How about evaluation on Kinetics?

How do you evaluate on Kinetics? Is there any image normalization, and is it the same as for the ResNet models in PyTorch?

How to train P3D63 to get pretrained weights?

Hi,
I'm new to deep learning and wondering how to train a P3D63 model to obtain pretrained weights. Could you give me some suggestions, or will you provide a pretrained P3D63 model in the future? Thank you for your patience.

Is there a mirror other than Baidu Drive?

@qijiezhao, Hi, thank you for your great work.
I want to follow your code and fine-tune on my own dataset.
So I want to download your pretrained parameters; however, due to regional restrictions, I cannot access Baidu Drive.
Could you consider mirroring the parameter file to other hosting services such as Google Drive or Dropbox?

The model does not converge

Hi, I have trouble training the model: it does not converge. I really need your help, thanks very much. Here is some of my code.
The optimizer setting:
```
policies = get_optim_policies(model)
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(policies, args.lr, momentum=args.momentum,
                            weight_decay=args.weight_decay)
```

The transform setting:
```
train_transform = video_transforms.Compose([
    video_transforms.Scale((182)),
    video_transforms.MultiScaleCrop((160, 160), scale_ratios),
    video_transforms.RandomHorizontalFlip(),
    video_transforms.ToTensor(),
    normalize
])

test_transform = video_transforms.Compose([
    video_transforms.Scale((182)),
    video_transforms.CenterCrop((160)),
    video_transforms.ToTensor(),
    normalize
])
```

The training step:
```
def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top3 = AverageMeter()
    model.train()
    end = time.time()
    for i, (inp, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)
        inp = inp.float().cuda(non_blocking=True)  # `async` is a reserved word in Python 3.7+
        target = target.cuda(non_blocking=True)

        output = model(inp)
        loss = criterion(output, target)
        writer.add_scalar('data/loss', loss.item(), i + epoch * len(train_loader))

        # measure accuracy and record loss
        prec1, prec3 = accuracy(output.data, target, topk=(1, 3))
        losses.update(loss.item(), inp.size(0))
        top1.update(prec1[0], inp.size(0))
        top3.update(prec3[0], inp.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()
        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@3 {top3.val:.3f} ({top3.avg:.3f})'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1, top3=top3))
```

Could you help me check whether there are any mistakes?

Accuracy on training set of Kinetics dataset using RGB pre-trained model

Hello Qijie,

I really need your help: the accuracy I get with your pretrained model on the Kinetics dataset is low.

My codes are as following:

```
import argparse

import imageio
import numpy as np
import torch
from p3d_model import P3D199

parser = argparse.ArgumentParser(description='video (< 17 seconds) label prediction')
parser.add_argument('video_filename')
args = parser.parse_args()

yt_vid, extension = args.video_filename.split('/')[-1].split('.')

vid_reader = imageio.get_reader(args.video_filename, 'ffmpeg')
img_list = get_img_list_from_vid_reader(vid_reader, extension)  # user-defined helper (not shown), frames as (T, H, W, C)
img_list = img_list.transpose(3, 0, 1, 2)  # -> (C, T, H, W)
img_list = np.expand_dims(img_list, axis=0)  # -> (1, C, T, H, W)

model = P3D199(pretrained=True, num_classes=400)
model = model.cuda()

input_tensor = torch.from_numpy(img_list)  # numpy array -> Tensor
data = input_tensor.cuda()  # if modality=='Flow', change the 2nd dimension from 3 to 2
out = model(data)
print("type(out):", type(out))
print(out.size(), out)
out_array = out.data.cpu().numpy()  # Tensor -> numpy array
print("The index for max probability: ")
print(np.argmax(out_array))  # index of the largest score
```

About the Input Size

```
data=torch.autograd.Variable(torch.rand(10,3,16,160,160)).cuda()
```
Is the number 10 the batch size?

About finetuning on UCF101

Hi~
Thanks for sharing the codes and some experiments results.
Recently I have also tried fine-tuning the P3D199 Kinetics models on UCF101, but I only got about 85% accuracy on the action recognition task. I wonder whether you used any other tricks, or whether I made a mistake. Thanks!

How to train the Flow model and the fusion?

Thanks for your great work. I want to apply the P3D network to a two-stream network. Could you provide more details about training the Flow model and fusing the two streams, such as the data loader? Thanks a lot.
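The repo does not say how its Fusion numbers were produced. A common two-stream approach is late fusion: a weighted average of the per-class probabilities from the RGB and Flow models. The sketch below illustrates that generic technique. The function name, the flow_weight parameter and the example logits are hypothetical.

```python
import torch
import torch.nn.functional as F

def late_fusion(rgb_logits, flow_logits, flow_weight=1.0):
    """Weighted average of per-stream class probabilities (generic two-stream late fusion)."""
    rgb_prob = F.softmax(rgb_logits, dim=1)
    flow_prob = F.softmax(flow_logits, dim=1)
    return (rgb_prob + flow_weight * flow_prob) / (1.0 + flow_weight)

# made-up logits for one clip over 3 classes
rgb = torch.tensor([[2.0, 0.5, 0.1]])
flow = torch.tensor([[0.2, 1.5, 0.1]])
fused = late_fusion(rgb, flow)
print(fused, fused.argmax(dim=1))
```

The weight is usually tuned on a validation split. Averaging logits instead of probabilities is another option, with slightly different behavior.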

Extension to a different input size

I'm having trouble with a different input size (e.g. 32,160,160 instead of 16,160,160) because of a dimension mismatch (I get ValueError: Expected input batch_size (128) to match target batch_size (64)). The problem is in x = x.view(-1,sizes[1],sizes[3],sizes[4]). I think it should be x = x.view(-1,sizes[1]*sizes[2],sizes[3],sizes[4]), but how should I change the following layer accordingly?
Thank you
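The error message suggests where 128 comes from. If the temporal length at that point is 1 for a 16-frame input, it is 2 for a 32-frame input, and view(-1, C, H, W) then yields twice as many "samples". When x is a contiguous (N, C, T, H, W) tensor, this view also mixes channel and time values whenever T > 1. The proposed view instead folds T into channels, so the next layer would need C*T input channels, and its pretrained weights would no longer fit. Two other options are not from the repo. One is to transpose before folding T into the batch. The other is to average over T first, which restores the 16-frame shapes. The sizes below are small illustrative values, not read from p3d_model.py.

```python
import torch

torch.manual_seed(0)
x = torch.rand(4, 8, 2, 10, 10)  # illustrative (N, C, T, H, W) with T = 2
sizes = x.size()

a = x.view(-1, sizes[1], sizes[3], sizes[4])             # original: (N*T, C, H, W), channels/time mixed
b = x.view(-1, sizes[1] * sizes[2], sizes[3], sizes[4])  # proposed: (N, C*T, H, W)
proper = x.transpose(1, 2).reshape(-1, sizes[1], sizes[3], sizes[4])  # T into batch, channels intact
c = x.mean(dim=2)                                        # average over T: (N, C, H, W)

print(a.shape, b.shape, c.shape, torch.equal(a, proper))
```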

Predicted results differ for the same video input and model

Hi Qijie,

I just found that the computed probability changes for the same input and model. For example, I predicted the video 12065806-102-009-013814.mp4 with your code (with a softmax layer added), and the predicted result differs each time I run "python3 p3d_model.py 12065806-102-009-013814.mp4".

In my view, with the same video input, the same model and weights, and the same code, the prediction must be the same. Could you please help analyze or fix it? Thanks in advance.
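One possible cause, offered as a hedge: a script that calls model.cuda() but never model.eval() leaves the network in training mode. In that mode, any dropout layer stays active and BatchNorm uses per-batch statistics, so repeated runs on the same input can differ. Whether P3D199 contains dropout is not stated here. The toy model below is only a stand-in that demonstrates the effect.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# toy classifier with dropout; a stand-in, not the P3D architecture
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 4))
x = torch.rand(1, 16)

model.train()  # default mode after construction: dropout is active
with torch.no_grad():
    train_runs = [model(x) for _ in range(2)]

model.eval()  # dropout disabled; BatchNorm layers would use running statistics
with torch.no_grad():
    eval_runs = [model(x) for _ in range(2)]

print(torch.equal(train_runs[0], train_runs[1]))  # almost always False
print(torch.equal(eval_runs[0], eval_runs[1]))    # True
```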

Pretrained weights link is not opening

Hey, can anyone give me a working link from which I can download the weights?

the meaning of res5

Hi,
Thanks for your pseudo-3D code. I have a question about the network: why does the network have 5 dimensions ([batch_size, t_size, n_size, n_size, out_dim]) at res4, while the output of res4 is reshaped to 4 dimensions, with the last two equal to 10? After that, it goes through Conv2d layers. What is the meaning of this? @qijiezhao

weights=torch.load(pretrained_file)['state_dict']

When running weights=torch.load(pretrained_file)['state_dict'], I get:
```
Traceback (most recent call last):
  File "/home/zys/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/zys/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/zys/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/zys/Downloads/pseudo-3d-pytorch/p3d_model.py", line 378, in <module>
    model = P3D199(pretrained=True,num_classes=400)
  File "/home/zys/Downloads/pseudo-3d-pytorch/p3d_model.py", line 295, in P3D199
    weights=torch.load(pretrained_file)['state_dict']
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 416, in _load
    deserialized_objects[key]._set_from_file(f, offset)
RuntimeError: storage has wrong size: expected 4330777570371529151 got 512
```

How to change the number of classes?

Thank you for your great work. I want to use it in my work, but I can't change the number of classes. Could you please tell me how to change the number of classes and how to fine-tune on my own dataset?
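A generic way to change the number of classes is to build the model with the checkpoint's class count, such as num_classes=400, and then swap the final classifier for a new one. Building with the checkpoint's class count keeps the loaded weight shapes intact. Whether P3D199 loads correctly with a different num_classes is not stated here, so that part is an assumption. The helper below is hypothetical and avoids guessing the classifier's attribute name. It replaces the last registered nn.Linear, which is usually the classifier; this should be checked for each model.

```python
import torch.nn as nn

def replace_last_linear(model: nn.Module, num_classes: int) -> nn.Module:
    """Swap the last registered nn.Linear in `model` for a new one with `num_classes` outputs."""
    last_name = None
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            last_name = name
    if last_name is None:
        raise ValueError("model has no nn.Linear layer")
    parent_name, _, child_name = last_name.rpartition('.')
    parent = model.get_submodule(parent_name) if parent_name else model
    old = getattr(parent, child_name)
    setattr(parent, child_name, nn.Linear(old.in_features, num_classes, bias=old.bias is not None))
    return model

# usage on a toy model standing in for a pretrained 400-class network
toy = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 400))
replace_last_linear(toy, 10)
print(toy[2])  # Linear(in_features=16, out_features=10, bias=True)
```

The new layer is randomly initialized, so it needs training on the new dataset. A higher learning rate for it than for the pretrained layers is a common choice.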

Stride is always set to one

Hey,
the stride is given as an argument (pseudo-3d-pytorch/p3d_model.py, line 12 in 09bba23):

```
def conv_S(in_planes,out_planes,stride=1,padding=1):
```

but the function always returns a convolution with a stride of one (pseudo-3d-pytorch/p3d_model.py, line 14 in 09bba23):

```
return nn.Conv3d(in_planes,out_planes,kernel_size=(1,3,3),stride=1,
```

The same holds for conv_T.

Thanks for sharing the code.

About the experiment results

You have posted transfer learning results of the P3D models on UCF101 and THUMOS14, but I can't find them in the original paper. Did you run these experiments yourself? If so, have you tested the model on the Kinetics val/test set, and what is the performance?

Pre-trained model files are broken?

Hi, @qijiezhao .
I really appreciate your work!

I tried to use your pretrained model (https://pan.baidu.com/s/1o8VFtMy#list/path=%2F).
However, these tar files cannot be extracted; I get the following error message:

```
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
```

Could you check and give me valid tar files?

Thanks,
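A likely explanation, offered as a hedge: files named *.pth.tar, such as the p3d_rgb_199.checkpoint.pth.tar file mentioned in another issue here, are usually not tar archives. The suffix is a naming convention for files written by torch.save. They are opened with torch.load, which is how the repo's loader reads ['state_dict'], as shown in the traceback above. The sketch below round-trips a hypothetical checkpoint in that style.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the real network
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.checkpoint.pth.tar')    # hypothetical file name
    torch.save({'state_dict': model.state_dict()}, path)   # a pickled dict, despite the ".tar" suffix
    checkpoint = torch.load(path, map_location='cpu')      # opened with torch.load, not tar
    weights = checkpoint['state_dict']

print(sorted(weights.keys()))  # ['bias', 'weight']
```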

Pre-trained model giving wrong results on Kinetics

@qijiezhao I am using the RGB pretrained model you provided, p3d_rgb_199.checkpoint.pth.tar, but it gives wrong results for my input frames. I even tested it on a video from the Kinetics training set, and it also predicted the wrong label. My preprocessing is shown below, and I am using the labels in ascending order.

For instance, I tried a video of a person playing tennis and got these top 5 results with their corresponding scores:

```
7.55878  dancing ballet
5.82084  stretching arm
4.84113  side kick
4.80862  playing squash or racquetball
4.80691  exercising arm
```

Can you please confirm that the pre-trained model you provided is valid?

```
from glob import glob
from os.path import join

import torch
from PIL import Image
from torchvision import transforms
from p3d_model import P3D199

def get_clip(clip_name):
    clip = sorted(glob(join('data', clip_name, '*.png')))

    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    preprocess = transforms.Compose([
        transforms.Resize((160, 160)),
        transforms.ToTensor(),
        normalize
    ])

    frames = []
    for frame in clip:
        image = Image.open(frame).convert('RGB')  # drop any alpha channel
        image = preprocess(image)
        frames.append(image.unsqueeze(0))

    frames = torch.cat(frames, 0)

    clip = frames.permute(1, 0, 2, 3)  # ch, fr, h, w
    clip = clip.unsqueeze(0)
    return clip

def read_labels_from_file(filepath):
    with open(filepath, 'r') as f:
        labels = [line.strip() for line in f.readlines()]
    return labels

if __name__ == '__main__':
    model = P3D199(pretrained=True, num_classes=400)
    X = get_clip('tennis')
    X = X.cuda()

    model.cuda()
    model.eval()
    with torch.no_grad():
        prediction = model(X)
    prediction = prediction.cpu().numpy()

    # read labels
    labels = read_labels_from_file('kinetics_labels.txt')

    # print top predictions
    top_inds = prediction[0].argsort()[::-1][:5]  # reverse sort and take the five largest items
    print('\nTop 5:')
    for i in top_inds:
        print('{:.5f} {} {}'.format(prediction[0][i], i, labels[i]))
```

Could you provide the pretrained models for finetuning?

Hi, I saw your pretrained models before, but you removed the links for some important reasons. If you are willing, could you send me your pretrained models? I will not distribute them or use them commercially without your permission. My email is [email protected]. Thank you very much!!!

Why are there negative numbers in the final output?

Hello QiJie,

Thanks for your great work.

I am running experiments to check whether your pretrained models work for my needs. The code runs fine, but the final predicted results contain many negative numbers, and numbers greater than 1 (such as 2.8574), as shown below. Could you please help me? Thanks in advance.

Columns 0 to 9
0.2547 2.8574 0.2466 0.5315 0.3087 0.8649 0.4103 -0.6601 -1.4141 -0.2270
Columns 10 to 19
-0.7751 0.1545 0.3906 0.5003 0.7063 -0.7299 1.2138 -0.9078 -0.5577 -0.3110
Columns 20 to 29
0.5861 0.3854 -2.2098 -1.2202 -0.6508 -0.7924 1.5414 0.1893 -0.7203 -0.6547
Columns 30 to 39
-0.6998 -0.2219 -0.6565 -1.1104 0.5413 0.1847 1.1829 1.1412 -0.7261 -0.9064
Columns 40 to 49
-0.3360 0.2168 -0.4412 0.0375 1.2200 0.1787 0.0969 0.3945 0.7286 1.1807
Columns 50 to 59
0.3262 0.3818 -0.3003 0.4018 -0.6653 -0.0746 0.2734 0.5684 -0.8847 -1.1838
Columns 60 to 69
1.1265 -0.1349 -0.2033 1.7768 -0.5521 1.4568 -0.4313 0.4930 0.0090 0.5893
Columns 70 to 79
0.4483 0.4776 -0.8270 -0.1999 0.4683 -0.6691 -0.1553 0.8590 -1.0097 1.4962
Columns 80 to 89
-1.2980 1.0693 -0.1839 0.1081 0.3478 -0.6089 1.1642 0.4668 -0.8454 -0.2026
Columns 90 to 99
-0.3866 0.6512 -0.9677 -1.0000 -0.5366 -0.9803 0.5690 -0.1478 -1.4045 0.0646
Columns 100 to 109
1.9957 1.3143 0.3878 1.0429 0.5111 1.1285 -0.7205 0.5839 -0.1410 1.8795
Columns 110 to 119
0.6499 1.0486 2.6186 2.2763 1.4925 1.8187 1.1356 0.7856 -0.3900 -1.0579
Columns 120 to 129
-0.6244 -0.0601 1.3689 -0.1556 -0.4125 0.4500 -0.4232 1.8318 1.0345 0.3353
Columns 130 to 139
-0.1830 -0.3989 -0.4298 -0.1682 -1.2609 -0.7833 -0.3433 0.6277 0.5956 -0.6453
Columns 140 to 149
-0.0364 -0.9813 -0.7421 0.1030 -0.6083 1.0187 0.0918 -0.3473 -0.4790 1.7018
Columns 150 to 159
2.5567 -0.4137 1.1153 0.1984 -0.5811 0.5228 0.2125 0.2379 1.6737 0.8861
Columns 160 to 169
-1.0300 -0.3617 -1.4262 -1.1236 0.0033 0.4325 -1.3333 -1.7200 0.5755 0.2818
Columns 170 to 179
-0.8611 0.1419 -0.5707 -0.8100 -1.3173 0.8708 1.1916 -1.5630 -0.1792 -0.6229
Columns 180 to 189
2.1504 -1.2198 -1.1930 -0.3797 -0.0317 0.1053 0.2186 -0.2541 0.7459 -1.1295
Columns 190 to 199
-0.5629 -0.9077 -0.0785 0.2610 -0.0439 -0.9229 0.8159 0.9987 0.4082 -0.6426
Columns 200 to 209
0.5981 0.1352 -0.2190 1.8740 0.3333 -1.6436 -1.4595 -0.2055 -2.0410 0.0428
Columns 210 to 219
0.4188 0.4601 1.2329 0.6426 -0.0635 -0.0622 -0.1042 0.2755 -0.4148 0.7330
Columns 220 to 229
0.7128 -0.8897 -0.8962 -0.1775 0.1521 -0.4070 -1.3691 0.0299 0.7416 0.6440
Columns 230 to 239
-0.1439 -0.1009 0.1973 1.7817 -0.6647 -0.8101 0.0690 -0.7127 -0.2753 -0.7935
Columns 240 to 249
-0.9452 -0.0944 -0.2082 1.4317 0.5112 -0.3383 -0.7704 0.5039 1.2479 1.4883
Columns 250 to 259
0.9024 -0.3343 0.0966 -0.1657 -2.1660 -0.1980 2.3596 -0.5610 0.6940 0.2634
Columns 260 to 269
-0.4295 1.0217 0.7458 0.5033 0.9743 0.4465 -0.5388 -0.0811 -0.7372 0.3217
Columns 270 to 279
-0.7245 -1.0703 -1.4170 -0.3830 -0.4575 1.1041 0.9542 1.1681 -0.0909 1.3096
Columns 280 to 289
-0.0418 0.1292 -1.1363 0.0784 -0.2900 -0.2781 -1.2245 -0.1769 2.4483 1.4832
Columns 290 to 299
-0.5490 0.5404 1.6246 0.1691 -0.5037 -0.1408 -0.3451 -0.6183 -0.5691 -0.6429
Columns 300 to 309
0.3554 0.1019 0.7276 -0.1883 1.6244 0.5651 -0.0830 -1.6845 -1.5468 -1.4152
Columns 310 to 319
-2.4527 -0.6569 -1.5320 0.2806 1.3925 -1.1757 1.0803 0.6409 -1.1188 1.2455
Columns 320 to 329
1.5661 -1.6605 -1.5542 -2.0289 -2.0886 0.9124 -0.5289 0.2374 0.2726 -1.4111
Columns 330 to 339
-0.0333 1.6842 -0.3154 0.0249 0.5457 -1.2823 -1.0470 -0.6791 0.8771 -1.0866
Columns 340 to 349
-1.7285 -1.8984 -1.0631 0.3520 0.5620 0.4164 -0.1236 0.2988 -0.2146 0.0687
Columns 350 to 359
-0.7481 -1.0270 0.0341 0.7840 -0.7411 0.6185 0.2475 1.6499 -0.6291 0.8590
Columns 360 to 369
-1.6230 1.5294 -0.3375 0.8977 -0.8415 0.4278 -0.3155 -1.3621 -0.3663 0.6557
Columns 370 to 379
-0.1923 -0.5912 -0.3559 -0.3140 0.2376 -1.0435 -1.4257 -0.1515 1.7692 0.7018
Columns 380 to 389
-0.2411 0.4794 1.0348 -1.2981 -0.9805 0.1539 -0.4159 -0.3165 0.5872 1.0687
Columns 390 to 399
-0.2811 -0.5407 0.2960 -1.5490 -0.0983 0.1577 -0.6394 1.0029 0.1816 -0.9346
[torch.cuda.FloatTensor of size 1x400 (GPU 0)]
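These values look like raw, unnormalized scores (logits) from the final fully connected layer, which nothing constrains to [0, 1]. Other issues here apply softmax to the model output themselves, which fits this reading, though it is an assumption. Under that assumption, class probabilities come from a softmax over the class dimension. The sketch below uses the first five printed values.

```python
import torch
import torch.nn.functional as F

# first five of the 400 printed scores, treated as logits
logits = torch.tensor([[0.2547, 2.8574, 0.2466, 0.5315, 0.3087]])
probs = F.softmax(logits, dim=1)  # non-negative, each row sums to 1
top_prob, top_idx = probs.topk(3, dim=1)
print(probs.sum(dim=1), top_idx)  # top_idx: tensor([[1, 3, 4]])
```

Softmax preserves the ranking, so the top prediction is the same whether you read logits or probabilities.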

Dimensions need a double-check: results differ between the caffemodel and the PyTorch model (with Caffe weights)

Hi Qijie:
I really appreciate your work. Recently I have been focusing on video summarization, and I want to port the P3D caffemodel to PyTorch.

I found that, given the same input video clip, the results differ greatly between the Caffe model and the PyTorch model with Caffe weights. See below.

(Image: Caffe result, blob['fc_kinetic'])

(Image: PyTorch result)

I did some analysis myself and found that after the first pooling layer, the feature map size in your code is (1, 64, 8, 39, 39), while in Caffe it is (1, 64, 8, 40, 40).

I re-checked the pooling parameters in both your code and the official prototxt and found no difference. Then I realized this may be because pooling is implemented differently in Caffe and PyTorch.

See:

And I do observe a tiny difference after the first pooling layer.

I also attached my conversion script. If possible, could you help me figure out why the results are so different?

With my sincere thanks.

convert.py.txt
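The 39 vs. 40 gap matches a known rounding difference. Caffe's pooling layer computes the output size with ceil, whereas PyTorch pooling layers use floor unless ceil_mode=True. The sketch below makes an assumption inferred from the reported sizes, not read from p3d_model.py. It assumes the first pooling is a 2x3x3 max pool with stride 2 and no padding, applied to a 16x80x80 map (a 160 input after a stride-2 conv1). Under that assumption, the two modes reproduce both shapes. The sketch uses 4 channels instead of 64 to stay small.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 4, 16, 80, 80)  # (N, C, T, H, W); 4 channels instead of 64

floor_pool = nn.MaxPool3d(kernel_size=(2, 3, 3), stride=2)                 # PyTorch default
ceil_pool = nn.MaxPool3d(kernel_size=(2, 3, 3), stride=2, ceil_mode=True)  # Caffe-style rounding

y_floor = floor_pool(x)
y_ceil = ceil_pool(x)
print(y_floor.shape)  # torch.Size([1, 4, 8, 39, 39])
print(y_ceil.shape)   # torch.Size([1, 4, 8, 40, 40])
```

If the converted model keeps floor rounding, every later feature map is slightly smaller than in Caffe. That alone could make outputs drift, even with identical weights.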

Should the mean value be subtracted before feeding in videos from the wild?

Hi, @qijiezhao, I am reading and running this repo. I see that a mean value is defined in "self.input_mean=...", but it is not used. So, when using my own video data, is it better to subtract the mean value before feeding it into P3D?
By the way, are the input image channels in RGB order rather than BGR?
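If frames are decoded in B, G, R order, reversing the channel axis converts them to RGB. Mean subtraction and scaling can then be applied per channel. The sketch below shows the mechanics only. The repo's actual input_mean values are elided above, so the statistics used here are placeholders taken from the ImageNet values in other issues. The expected value range, [0, 1] versus [0, 255], is also an assumption.

```python
import torch

def bgr_to_rgb_normalized(clip_bgr, mean, std):
    """clip_bgr: float tensor (N, 3, T, H, W), channels in B, G, R order, values in [0, 1] (assumed)."""
    clip_rgb = clip_bgr.flip(1)  # reverse the channel axis: B, G, R -> R, G, B
    mean = torch.tensor(mean, dtype=clip_rgb.dtype).view(1, 3, 1, 1, 1)
    std = torch.tensor(std, dtype=clip_rgb.dtype).view(1, 3, 1, 1, 1)
    return (clip_rgb - mean) / std

# placeholder statistics (ImageNet values from other issues here), not the repo's input_mean
MEAN, STD = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)

clip_bgr = torch.zeros(1, 3, 2, 4, 4)
clip_bgr[:, 0] = 0.1  # B plane
clip_bgr[:, 1] = 0.2  # G plane
clip_bgr[:, 2] = 0.3  # R plane
out = bgr_to_rgb_normalized(clip_bgr, MEAN, STD)
print(out[0, :, 0, 0, 0])
```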

About the accuracy

Hi, I used your code to train P3D199 on UCF101 with the pretrained weights you provided. My top-1 result is 81%, far from the 93% in the paper. Could you share the details of your training? Mine:
Batch size: 32
Learning rate: 0.001, divided by 10 every 10 epochs
Training data transformation: RandomResizedCrop(160), RandomHorizontalFlip(0.5), ToTensor(), Normalize()
Validation data transformation: Resize(182,242), CenterCrop(160), ToTensor(), Normalize()
Looking forward to your reply.
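The learning-rate schedule above (0.001, divided by 10 every 10 epochs) maps onto PyTorch's StepLR. The minimal sketch below uses a stand-in model and runs no actual training. It only records the learning rate used in each epoch.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 101)  # stand-in for P3D199 with a 101-class UCF101 head
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(25):
    # ... one training epoch (forward, backward, optimizer.step() per batch) would run here ...
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()  # call once per epoch, after the optimizer steps

print(lrs[0], lrs[10], lrs[20])  # roughly 1e-3, 1e-4, 1e-5
```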