egovideo's People

Contributors

byron1201, cg1177, czczup, hyf015, opengvlab-admin, xings19


egovideo's Issues

Validation data for SCOD

What data format should the validation set use, and in which directories should I place it to run the SCOD benchmark? Thank you!

How to modify the number of temporal views for the FHP task

Hi!

I am curious which parameters need to be modified to change the number of temporal views (V) as defined in your experiments. Do you also use the same number of frames (T) during both training and testing?

Thanks!

Code for NLQ fusion

Hi,

Thank you for releasing the features. I was unable to find the code used to fuse them. I'm interested in reproducing the paper's results on the NLQ dataset. Could you share this code?

Question about the provided video features for MQ and NLQ

Thanks a lot for your nice work and for providing the video features.
I have a question about the provided features.
Regarding "The video features extracted by VideoMAE-L pretrained on verb and noun subset.": does this mean that these features were used for the "K700 → Verb" experiment in the paper?

Hardware Requirements for Training?

@czczup @cg1177 A quick follow-up question: could you disclose the GPU memory required by the training script for the Swin-L IN-22K+O365 model on the SCOD task (or just the GPU type and per-GPU memory used in this work)? Also, did you train in full precision? Thanks again!

Access to the FHP checkpoints

The README mentions that the pretraining code and checkpoints for the FHP task have been released. Could you please point me to where I can find them?

SCOD: How to visualize the model's bounding boxes?

I am running the evaluation code for the SCOD task and have saved the outputs to a pickle file. The model appears to return a 100x5 array for each validation image, and I am a little confused about how to translate this output into actual bounding boxes. Any advice on how I can do this? Thank you!
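One possible reading, offered only as a minimal sketch: in mmdetection-style detectors, which Swin-based detectors are commonly built on, each detection array has shape (N, 5) with rows (x1, y1, x2, y2, score) in pixel coordinates of the original image. Assuming the 100x5 SCOD output follows that convention (100 candidate boxes per image; not verified against the egovideo code), the rows can be loaded, filtered by score, and converted to boxes as below. `load_results`, `rows_to_boxes`, and the 0.3 threshold are hypothetical choices, not part of the repository.

```python
import pickle
from typing import Any, List, Sequence


def load_results(path: str) -> Any:
    """Load pickled evaluation outputs.

    ASSUMPTION: the pickle holds one (100, 5) array per validation image.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def rows_to_boxes(dets: Sequence[Sequence[float]], score_thr: float = 0.3) -> List[dict]:
    """Convert an (N, 5) detection array into box dicts, highest score first.

    ASSUMPTION: each row is (x1, y1, x2, y2, score) in absolute pixel
    coordinates, as in mmdetection-style bbox results. score_thr is an
    arbitrary illustrative cutoff.
    """
    boxes = []
    for row in dets:
        x1, y1, x2, y2, score = (float(v) for v in row)
        if score < score_thr:
            continue  # drop low-confidence candidates
        boxes.append({
            "xyxy": (x1, y1, x2, y2),            # corner format
            "xywh": (x1, y1, x2 - x1, y2 - y1),  # top-left corner + size
            "score": score,
        })
    boxes.sort(key=lambda b: b["score"], reverse=True)
    return boxes


if __name__ == "__main__":
    demo = [[10, 20, 110, 220, 0.9], [0, 0, 5, 5, 0.1], [50, 60, 70, 90, 0.5]]
    for box in rows_to_boxes(demo):
        print(box)
```

The `xywh` tuple maps directly onto `matplotlib.patches.Rectangle((x, y), w, h)` if you want to draw the boxes over the image with matplotlib. If the drawn boxes look shifted or scaled, the coordinates are probably relative to a resized input rather than the original image, and the assumption above does not hold.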

Access to pretrained model for STA task

Hi!

Could you please let me know which pretrained model you use for STA task training? ego4d_sta_train.sh contains the following line: MODEL_PATH='/mnt/petrelfs/share_data/chenguo/ego_forecasting/pretrained_models/vitl_v_f.pt'. Which model does it refer to?

Thanks!

ego4d_verb_pretrain_vitl_k700.pt Top-1: 16.60%, Top-5: 49.78%

Accuracy of the network on the 167745 test videos: Top-1: 16.60%, Top-5: 49.78%

export LC_ALL="en_US.UTF-8"

OUTPUT_DIR='./workdir/ego4d_verb_pretrain_vitl_k700'
DATA_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/v2/full_scale'
MODEL_PATH='/home/yangninghua/data/1/PerformDutiesDataset/ActionRecognition/Ego4d/videomae_cls/checkpoint/ego4d_verb_pretrain_vitl_k700.pt'

GPUS=4
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-39500}
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}


# Adjust batch_size to fit your GPU memory.
# vit_large_patch16_224_ego4d: batch=16 uses 12260 MiB of GPU memory.
# batch=42 uses 23022 MiB of GPU memory; 4 GPUs, 999 iterations, 167745 rows.
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=$GPUS \
        --master_port $PORT --nnodes=$NNODES \
        --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR \
        run_ego4d_cls_pretrain.py \
    --model vit_large_patch16_224_ego4d \
    --nb_noun_classes 0 \
    --nb_verb_classes 118 \
    --data_set ego4d_verb \
    --data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 42 \
    --num_sample 1 \
    --warmup_epochs 1 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 1 \
    --num_frames 16 \
    --opt adamw \
    --lr 5e-4 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 10 \
    --dist_eval \
    --test_num_segment 2 \
    --test_num_crop 3 \
    --enable_deepspeed \
    --eval
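The script comments record two memory measurements for vit_large_patch16_224_ego4d (batch=16 at 12260 MiB, batch=42 at 23022 MiB) and 999 iterations over 167745 rows on 4 GPUs. The sketch below reproduces that iteration count as ceil(167745 / (42 × 4)) and, assuming per-GPU memory grows roughly linearly with batch size (an assumption fitted to only those two points), estimates the largest batch_size that fits a given memory budget. The helper names are hypothetical, not part of the egovideo code. If, as the `--eval` flag suggests, the figures come from an evaluation run, a training run with gradients and optimizer state will need considerably more memory, and the linear fit ignores allocator fragmentation.

```python
import math

# (per-GPU batch_size, per-GPU memory in MiB), taken from the script comments.
MEASURED = [(16, 12260.0), (42, 23022.0)]


def iters_per_pass(num_samples: int, batch_size: int, num_gpus: int) -> int:
    """Iterations to see every sample once when each GPU takes batch_size samples per step."""
    return math.ceil(num_samples / (batch_size * num_gpus))


def fit_linear(points):
    """Fit memory = intercept + slope * batch_size through two measured points."""
    (b0, m0), (b1, m1) = points
    slope = (m1 - m0) / (b1 - b0)
    intercept = m0 - slope * b0
    return intercept, slope


def max_batch_size(budget_mib: float, points=MEASURED) -> int:
    """Largest batch_size whose extrapolated memory fits in budget_mib.

    ASSUMPTION: memory scales linearly with batch size; treat the result as a
    starting point and verify with nvidia-smi.
    """
    intercept, slope = fit_linear(points)
    return max(0, math.floor((budget_mib - intercept) / slope))


if __name__ == "__main__":
    print(iters_per_pass(167745, 42, 4))  # 999
    print(max_batch_size(24576))          # 45 (24 GiB card)
    print(max_batch_size(12288))          # 16 (12 GiB card)
```

Under this fit each extra clip per GPU costs about 414 MiB on top of a fixed overhead of about 5.6 GiB, which is why the 12 GiB estimate lands on the measured batch=16.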

NLQ(*) and MQ(*) backbone checkpoints

Hi,

Thank you for releasing the code, checkpoints, and features. I was unable to find the checkpoints corresponding to the NLQ and MQ verb/noun features. I'm interested in extracting features for videos outside the NLQ/MQ dataset. Could you share these checkpoints? It would also be great if you could share the script for extracting features from videos. Thanks in advance.