
zhihou7 / hoi-cl

77 stars, 2 watchers, 11 forks, 13.6 MB

Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration

Home Page: https://sites.google.com/view/hoi-cl

License: MIT License

Languages: MATLAB 7.75% Shell 0.51% Python 91.74%

Topics: hico-det hoi compositional-learning compositionality affordance affordance-learning cvpr2021 eccv2020 transfer-learning eccv2022

hoi-cl's People

Contributors: zhihou7

hoi-cl's Issues

question about the affordance?

Hello, thanks for your work. In the paper I am confused about the affordance: it seems that affordance is a property of objects, but in Figure 1 the affordance is "rideable", while in Figure 3 the affordance predictions are "cut" and "eat". So what exactly is the affordance?

How to train with custom data

Hi, thank you for your great work!
I have been working on HOI detection recently and want to build my own dataset in the HICO-DET format. Do you know how I should annotate the dataset, or what tools I should use?
Thanks again!

Affordance Transfer Learning code

Could you tell me which files contain the code for
Affordance Transfer Learning for Human-Object Interaction Detection (CVPR 2021), please?

Thank You,
Anju

Compositional Splits

Hello,
I hope you are doing well. Where can I find the compositional splits you used in this series of papers? Basically, I want to know which images of HICO-DET go into the unseen split and which go into the seen split in the test set.

Pretrained model

Hi, dear @zhihou7,
Could you please provide the trained model for FCL? I want to use this model to predict results on images.

Thanks very much for your excellent work and code!

Yours,
Wenfeng

File Not Found error

I was training ATL on my own dataset. I stopped training midway and tried to restart it.
But I am getting the error below:

Traceback (most recent call last):
  File "affordance/HOI-CL/tools/Train_ATL_HICO.py", line 210, in
    sw.train_model(sess, args.max_iters)
  File "affordance/HOI-CL/tools/../lib/models/train_Solver_HICO_MultiBatch.py", line 144, in train_model
    self.from_snapshot(sess)
  File "affordance/HOI-CL/tools/../lib/models/train_Solver_HICO.py", line 174, in from_snapshot
    saver.restore(sess, self.switch_checkpoint_path(ckpt.model_checkpoint_path))
tensorflow.python.framework.errors_impl.NotFoundError: affordance/HOI-CL/Data/Weights/ATL_union_batch1_semi_l2_def4_vloss2_rew2_aug5_3_x5new_coco_res101;

The dimensions of A_v and A_o

Hello, thank you for sharing your excellent work! The logic of this article is very clear, but as a beginner I am confused about the N_v and N_o in A_v and A_o. Are N_v and N_o the numbers of verb and object categories of a particular dataset, or something else? Looking forward to your answer! ^_^

Running Time

I would like to know the expected running time when training ATL.
With an input size of 5000 images, the training has been running for 12 hours and is still going.
Is this expected?

I did not change any settings.

Questions about dimensions of tensors

Hi, thanks for your great work.

Just for clarification, it would be great to know the dimension of tensors in Section 3.2.
Below is what I've understood about the tensor dimension when using the HICO-DET dataset.
If there's any misunderstanding, please kindly let me know.

\tilde{l}_o : (1, 80)
A_o : (80, 600)
l_v : (1, 117)
A_v : (117, 600)

Therefore, \bar{y} : (1, 600). Is this correct?
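To make the shapes concrete, here is a minimal NumPy sketch of the composition as I understand it; combining the verb and object branches with an elementwise product is my assumption about the operator, not something taken from the paper:

import numpy as np

n_verb, n_obj, n_hoi = 117, 80, 600  # HICO-DET verb/object/HOI category counts

l_v = np.zeros((1, n_verb), dtype=np.float32)      # verb label, e.g. "ride" set to 1
l_o = np.zeros((1, n_obj), dtype=np.float32)       # object label, e.g. "bicycle" set to 1
A_v = np.zeros((n_verb, n_hoi), dtype=np.float32)  # verb-to-HOI co-occurrence matrix
A_o = np.zeros((n_obj, n_hoi), dtype=np.float32)   # object-to-HOI co-occurrence matrix

# A composed HOI triplet is "on" only if both its verb and its object are "on".
y_bar = (l_v @ A_v) * (l_o @ A_o)
print(y_bar.shape)  # (1, 600)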

And also, since the composed HOI label should be in the original 600 HOI triplet set, is it correct that discovering a novel HOI triplet is impossible using this method and the main focus of the work is correctly learning affordances via feature composition?

Again, thanks for sharing your great work.

Result of training ATL?

What is the expected result of training ATL on a new dataset (5000 images)?

The training finished without errors and with this message:
models.train_Solver_HICO_MultiBatch - INFO - iter: 30000 / 30000, im_id: 59816, total loss: 0.056606, lr: 0.010000, speed: 1.719 s/iter
done solving

Process finished with exit code 0

But, the "Results" folder (HOI-CL/Results) is empty.
What is the expected output?

Thank You

PyTorch version

Hello zhihou,
I'm interested in HOI-CL. Does this repo have a PyTorch version? I'm not familiar with TensorFlow.
Best regards

Ablation Study in FCL

Hi! Thanks for your excellent work!
I am confused about the ablation study.
Could you explain what each noise_type value means, and which one stands for the verb fabricator? (noise_type = 0, 2, 3, 8, 4, 5, 7, 6)

def obtain_gen_type(self):
    """
    This is for ablation study
    :return:
    """
    noise_type = 0
    if self.network.model_name.__contains__('_woa_'):
        noise_type = 2
    elif self.network.model_name.__contains__('_won_'):
        # no noise
        noise_type = 3
    elif self.network.model_name.__contains__('_won1_'):
        # no noise
        noise_type = 8
    elif self.network.model_name.__contains__('_n1_'):
        noise_type = 4
    elif self.network.model_name.__contains__('_woa1_'):
        noise_type = 5
    elif self.network.model_name.__contains__('_woa2_'):
        noise_type = 7
    elif self.network.model_name.__contains__('_woo_'):
        noise_type = 6
    return noise_type

HOI-COCO data format

Hi, thank you again for your great work!

In Trainval_GT_VCOCO_obj_21.pkl, which contains the HOI-COCO split, each training sample seems to have the format
[image_id, action_list1, human_box, object_box, action_list2, object_list].
May I ask what the difference between action_list1 and action_list2 is?
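For reference, here is the minimal snippet I would use to inspect a sample; it assumes the pickle is a flat list of such samples, and the layout in the comment simply restates the format above (neither is confirmed):

import pickle

# Load the local copy of the annotation file (adjust the path as needed).
with open("Trainval_GT_VCOCO_obj_21.pkl", "rb") as f:
    trainval_gt = pickle.load(f)

sample = trainval_gt[0]
# Assumed layout, per the format above (unconfirmed):
# [image_id, action_list1, human_box, object_box, action_list2, object_list]
for i, field in enumerate(sample):
    print(i, field)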

Training ATL on new image set

If I want to train the affordance transfer learning code on a new dataset, how many training images should I have for:

i) Human-object interaction
ii) Objects

zero-shot setting

Hi, thanks for your work!
In the zero-shot setting, for an unseen human-object interaction (HOI), the approach is either to remove the corresponding HOI label (including object, person, and action) or to keep the bounding boxes and labels for objects and persons while marking the action as unknown. Is that correct?

set the co-occurrence matrix

Hi @rouge012,

The co-occurrence matrix $A \in R^{N_v \times N_o}$ is a two-dimensional matrix, where $N_v$ is the number of verb categories and $N_o$ the number of object categories. We can initialize $A$ as a zero matrix. Each object comes with annotated verbs, and for each annotated verb-object pair we set the corresponding position of $A$ to 1. For example, if "apple" is combinable with "eat" and "cut" in the dataset, we set the positions of <eat, apple> and <cut, apple> in $A$ to 1.
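A minimal sketch of that construction; the category counts match HICO-DET, while the pair list and its indices are purely illustrative:

import numpy as np

n_verb, n_obj = 117, 80  # verb and object category counts (HICO-DET)

# Annotated (verb_id, object_id) pairs from the dataset; illustrative indices
# standing in for e.g. <eat, apple> and <cut, apple>.
annotated_pairs = [(12, 5), (40, 5)]

A = np.zeros((n_verb, n_obj), dtype=np.float32)  # initialize A as a zero matrix
for verb_id, obj_id in annotated_pairs:
    A[verb_id, obj_id] = 1.0  # mark the verb-object combination as valid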

Feel free to post if you have further questions

Regards,

Originally posted by @zhihou7 in #4 (comment)

Questions on code (ATL)

Could you answer the questions below, please?

1. What do the following keywords in lib/ult/ult.py indicate?

   i) Neg_select
   ii) pos_h_boxes
   iii) neg_h_boxes
   iv) pattern_type
   v) pattern_channel

2. In which file is the input path for the training dataset specified? In this case,
   'HOI-CL/Data/hico_20160224_det/images/train2015/' ?
   I executed python tools/Train_ATL_HOCO.py

Thank You.

Confusion about Affordance Features

Hello, thanks for your nice work. After reading the ATL paper I am confused about the affordance features. You say in the paper:

We first extract the human, object, and affordance features via the ROI-Pooling from the feature pyramids.

What are these affordance features actually? That is, from where are these features pooled?

object label?

Hi, could you explain the * in Table 3 of ATL?
You describe it as "* means we only use the boxes of the detection results", but how do you use the category of the detection results in the training phase and the inference phase?

View images by category

I downloaded the HICO-DET dataset. Is it possible to view the images by category? For example, all images with bottles:

bottle - drink with
bottle - hold
bottle - carry

etc.?
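A hedged sketch of one way to do this from HICO-DET's anno.mat; the field names ('list_action' with its 'nname' object field, 'anno_train', 'list_train') and the 1-means-positive label convention are my reading of the dataset release, so verify them against your copy:

import numpy as np
from scipy.io import loadmat

anno = loadmat("hico_20160224_det/anno.mat", squeeze_me=True)

# Indices of the HOI categories whose object name is "bottle".
bottle_ids = [i for i, a in enumerate(anno["list_action"])
              if str(a["nname"]) == "bottle"]

labels = np.asarray(anno["anno_train"])       # assumed shape (600, num_train_images)
mask = (labels[bottle_ids] == 1).any(axis=0)  # images with any positive bottle HOI
for name in np.asarray(anno["list_train"])[mask]:
    print(str(name))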

unseen zero-shot setting

Hi, thanks for your work! I am interested in the unseen-composition and unseen-object zero-shot settings in the FCL paper, and I am wondering where I can find the code for generating the training and testing sets for these two zero-shot settings. Thank you!

questions about SCL

Hello, thanks for your excellent work. I have some questions about SCL (ECCV 2022). When I run the command "python tools/Train_ATL_HICO.py --model VCL_union_batch1_semi_vloss2_ml5_zs11_ICL_VERB_def4_l2_aug5_3_x5new_bothzs_res101_affordance_AF713_9 --num_iteration 500000", I encounter AttributeError: 'HOIICLNet' object has no attribute 'obj_to_HO_matrix_orig'. Is something missing in this code?

Model used in affordance feature extraction

Hi,

I'm experimenting with the affordance features from ATL. I see there are pretrained networks posted, but the feature extraction script uses model names like 'ATL_union_batch1_semi_l2_def4_vloss2_rew2_aug5_3_x5new_coco_res101'. I'm wondering whether this checkpoint is provided somewhere or whether I have to generate it on my own.

Thanks in advance!

HICO-DET download not working

When I run the first command in the README, i.e. the download_dataset.sh script, the HICO-DET dataset is not downloaded as a gzip file from the Google Drive link, so the tar command cannot extract it. There seems to be an issue in the lib/ult/Download_data.py file: when I inspected the contents of the downloaded "gz" file, it is merely HTML.
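A minimal check makes the failure mode visible; the archive name below is whatever download_dataset.sh produced locally, so adjust it (a real gzip file starts with the magic bytes 0x1f 0x8b):

# Inspect the first bytes of the downloaded file (adjust the name as needed).
with open("hico_20160224_det.tar.gz", "rb") as f:
    head = f.read(2)

if head == b"\x1f\x8b":
    print("Looks like a real gzip archive.")
else:
    # Large Google Drive files are often returned as an HTML confirmation
    # page when fetched non-interactively, which matches what I see here.
    print("Not gzip; likely an HTML page. First bytes:", head)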

Please help.

ATL as an off-the-shelf affordance recognition module

Hi,

Thanks for your excellent work. Is the pre-trained ATL model suitable as an off-the-shelf affordance recognition module for object images extracted from other datasets? If so, could you please give me some pointers on where to look in the codebase? Thank you for your attention.

Kind regards,
Romero

About number of objects in V-COCO

Hi, Thank you for your great work!

I found that in ATL and FCL you mention there are only two kinds of objects. I wonder what this means, because when I look at the V-COCO annotations, I find 50+ object types.

Thank you!
