gfn's People

Contributors

lukejaffe

gfn's Issues

Don't know why

I'm not sure of the exact reason, but the program no longer runs, even after I updated with pip install ray[tune]==1.13.0.

error?

(bcsj) [G19830015@admin1 GFN-main]$ osr_run --trial_config=./configs/cuhk_train_final.yaml
Traceback (most recent call last):
File "/public/home/G19830015/.local/bin/osr_run", line 33, in
sys.exit(load_entry_point('osr-lib==1.1.0', 'console_scripts', 'osr_run')())
File "/public/home/G19830015/.local/bin/osr_run", line 25, in importlib_load_entry_point
return next(matches).load()
File "/public/home/G19830015/miniconda3/envs/bcsj/lib/python3.7/site-packages/importlib_metadata/init.py", line 209, in load
module = import_module(match.group('module'))
File "/public/home/G19830015/miniconda3/envs/bcsj/lib/python3.7/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 668, in _load_unlocked
File "", line 638, in _load_backward_compatible
File "/public/home/G19830015/.local/lib/python3.7/site-packages/osr_lib-1.1.0-py3.7.egg/osr/engine/main.py", line 21, in
ModuleNotFoundError: No module named 'ray.tune.integration.torch'
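
(A hedged debugging suggestion, not the author's reply: the traceback mixes packages from ~/.local with the miniconda env, which often means two installations are shadowing each other. A quick check of which Ray the interpreter actually imports:)

import ray

# Print the version and install location Python is actually using; if
# ray.__file__ points somewhere other than where ray[tune]==1.13.0 was
# installed, the wrong installation is being picked up.
print(ray.__version__)
print(ray.__file__)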

Occlusion and Resolution

Hello, I would like to ask: if I am building on SeqNet, how do I set up the Occlusion and Resolution datasets in SeqNet?

Wrong checkpoint path in Readme

In README.md, in the checkpoints links,
the checkpoint for the PRW dataset is named cuhk...
and the checkpoint for the CUHK dataset is named prw...
Please rename the checkpoints or update the links to avoid confusion.

Thanks.

I got a RuntimeError

The error occurs when computing the person similarity:

person sim: tensor([], device='cuda:0')

import torch
import torch.nn.functional as F

# For each query image
for query_detect in query_output_list:
    print(query_detect)
    # Get query person embeddings
    query_person_emb = query_detect['det_emb']

    print("query person emb:", query_person_emb )

    print(gallery_output_list)
    # For each gallery image
    for gallery_output_dict in gallery_output_list:
        ## Get gallery person embeddings
        gallery_person_emb = gallery_output_dict['det_emb']

        ## Compute person similarity: cosine similarity of person embeddings
        person_sim = torch.mm(
            F.normalize(query_person_emb, dim=1),
            F.normalize(gallery_person_emb, dim=1).T
        ).flatten()
        
        print("person sim:", person_sim)
        ## Store person similarity
        if 'person_sim' not in gallery_output_dict:
            gallery_output_dict['person_sim'] = []
            
        gallery_output_dict['person_sim'].append(person_sim)

Here is the error

[{'det_boxes': tensor([], device='cuda:0', size=(0, 4)), 'det_scores': tensor([], device='cuda:0'), 'det_labels': tensor([], device='cuda:0'), 'det_emb': tensor([], device='cuda:0', size=(0, 2048)), 'gt_emb': tensor([[-3.4526e-02,  2.3393e-03,  7.4060e-05,  ..., -1.4855e-02,
          3.2582e-02, -1.1846e-02]], device='cuda:0'), 'scene_emb': tensor([[-0.0055,  0.0030,  0.0254,  ...,  0.0161, -0.0118, -0.0061]],
       device='cuda:0')}, {'det_boxes': tensor([], device='cuda:0', size=(0, 4)), 'det_scores': tensor([], device='cuda:0'), 'det_labels': tensor([], device='cuda:0'), 'det_emb': tensor([], device='cuda:0', size=(0, 2048)), 'gt_emb': tensor([[-3.4526e-02,  2.3393e-03,  7.4060e-05,  ..., -1.4855e-02,
          3.2582e-02, -1.1846e-02]], device='cuda:0'), 'scene_emb': tensor([[-0.0055,  0.0030,  0.0254,  ...,  0.0161, -0.0118, -0.0061]],
       device='cuda:0')}]
query person emb: tensor([], device='cuda:0', size=(0, 2048))
tensor([], size=(0, 1))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-82-ec31bd52fe00> in <cell line: 3>()
     16         with torch.no_grad():
     17             print(model.gfn.get_scores(query_person_emb, query_scene_emb, gallery_scene_emb))
---> 18             qg_scene_sim = model.gfn.get_scores(query_person_emb, query_scene_emb, gallery_scene_emb).flatten().item()
     19 
     20         ## Store query-scene similarity

RuntimeError: a Tensor with 0 elements cannot be converted to Scalar
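
(A note on the cause, with a hedged workaround: det_emb has shape (0, 2048) in the output above, i.e. the detector returned no boxes for those images, so the similarity tensor is empty and .item() cannot convert it to a scalar. A minimal guard, only a sketch of one way to skip empty outputs:)

import torch
import torch.nn.functional as F

for query_detect in query_output_list:
    query_person_emb = query_detect['det_emb']
    # Skip query images where the detector produced no boxes (shape (0, 2048))
    if query_person_emb.numel() == 0:
        continue
    for gallery_output_dict in gallery_output_list:
        gallery_person_emb = gallery_output_dict['det_emb']
        # Likewise skip gallery images with no detections
        if gallery_person_emb.numel() == 0:
            continue
        person_sim = torch.mm(
            F.normalize(query_person_emb, dim=1),
            F.normalize(gallery_person_emb, dim=1).T,
        ).flatten()
        gallery_output_dict.setdefault('person_sim', []).append(person_sim)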

Training Error Report

Hi there, I tried to train GFN on my own device (3090 Ti, 24 GB) using
osr_run --trial_config=./configs/prw_train_final.yaml
with the batch size set to 4 and debug changed to True,
but it failed as shown below. Have you ever met the same problem? Hoping for your reply.

==> Computing detections and embeddings
100% 6112/6112 [11:01<00:00,  9.24it/s]
==> Computing detection performance
100%|██████████| 6112/6112 [00:09<00:00, 654.58it/s]
num_det_tot: 27972
gt/tot: 23809 25062
100%|██████████| 6112/6112 [00:07<00:00, 798.01it/s]
num_det_tot: 25062
gt/tot: 25062 25062
det:
{'[email protected]': 0.9250273006517983, '[email protected]': 0.9500039901045407}
gt:
{'[email protected]': 1.0, '[email protected]': 1.0}
==> Computing retrieval performance (protocol)
==> Protocol: test
  0%|          | 0/2057 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/.local/bin/osr_run", line 33, in <module>
    sys.exit(load_entry_point('osr-lib==1.0.0', 'console_scripts', 'osr_run')())
  File "/home/.local/lib/python3.7/site-packages/osr_lib-1.0.0-py3.7.egg/osr/engine/main.py", line 439, in main
  File "/home/.local/lib/python3.7/site-packages/osr_lib-1.0.0-py3.7.egg/osr/engine/main.py", line 333, in run
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/.local/lib/python3.7/site-packages/osr_lib-1.0.0-py3.7.egg/osr/engine/evaluate.py", line 201, in evaluate_performance
  File "/home/.local/lib/python3.7/site-packages/osr_lib-1.0.0-py3.7.egg/osr/engine/evaluate.py", line 505, in evaluate_retrieval_orig
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 233, in average_precision_score
    average_precision, y_true, y_score, average, sample_weight=sample_weight
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/metrics/_base.py", line 75, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 206, in _binary_uninterpolated_average_precision
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 859, in precision_recall_curve
    y_true, probas_pred, pos_label=pos_label, sample_weight=sample_weight
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 737, in _binary_clf_curve
    assert_all_finite(y_score)
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/utils/validation.py", line 134, in assert_all_finite
    _assert_all_finite(X.data if sp.issparse(X) else X, allow_nan)
  File "/home/anaconda3/envs/osr/lib/python3.7/site-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
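
(A hedged stopgap while debugging, not the repository's fix: the ValueError comes from sklearn's input validation, so replacing non-finite similarity scores before average_precision_score at least surfaces where the NaNs originate:)

import numpy as np
from sklearn.metrics import average_precision_score

def safe_average_precision(y_true, y_score):
    # Non-finite scores usually indicate NaNs upstream in the embeddings;
    # log how many there are, then clamp them so sklearn does not raise.
    y_score = np.asarray(y_score, dtype=np.float64)
    bad = ~np.isfinite(y_score)
    if bad.any():
        print(f"warning: {bad.sum()} non-finite scores replaced")
        y_score = np.nan_to_num(y_score, nan=0.0, posinf=1.0, neginf=0.0)
    return average_precision_score(y_true, y_score)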

An implementation detail about the GFN threshold

I don't quite understand how the hard threshold of the GFN is used in your code. Your paper mentions that the GFN uses a hard threshold: low-scoring scenes are discarded, so detection does not need to be performed on them.
But in inference, I think all scenes are detected, and no variable like gfn_score_thresh has been mentioned or used.

    def inference(self, images: List[Tensor], targets:Optional[List[Dict[str, Tensor]]]=None, inference_mode:str='both') -> List[Dict[str, Tensor]]:
        # Record the original image sizes so boxes can be mapped back after the transform
        original_image_sizes = [(img.shape[-2], img.shape[-1]) for img in images]
        num_images = len(original_image_sizes)
        images, targets = self.transform(images, targets)
        bb_features = self.backbone(images.tensors)

        # Get image features from the GFN
        if self.use_gfn:
            scene_emb = self.gfn.get_scene_emb(bb_features).split(1, 0)
        else:
            scene_emb = [torch.empty(0) for _ in range(num_images)]

        # Re-use the backbone features for re-id
        reid_features = bb_features

        detections = [{} for _ in range(num_images)]
        embeddings = [torch.empty(0) for _ in range(num_images)]
        if (inference_mode in ('gt', 'both')) and (targets is not None):
            # query
            boxes = [t["boxes"] for t in targets]
            section_lens = [len(b) for b in boxes]
            box_features = self.roi_heads.reid_roi_pool(reid_features, boxes, images.image_sizes)
            box_features = self.roi_heads.reid_head(box_features)
            _embeddings, _ = self.roi_heads.embedding_head(box_features)
            embeddings = _embeddings.split(section_lens, dim=0)
        if (inference_mode in ('det', 'both')) or (inference_mode in ('gt', 'both')):
            # gallery
            rpn_features = bb_features
            proposals, _ = self.rpn(images, rpn_features, targets)
            detections, _ = self.roi_heads(
                bb_features, proposals, images.image_sizes, targets
            )
            detections = self.transform.postprocess(
                detections, images.image_sizes, original_image_sizes
            )

        # Reorganize outputs into single list of dict
        output_list = [{
            'det_boxes': d['boxes'],
            'det_scores': d['scores'],
            'det_labels': d['labels'],
            'det_emb': d['embeddings'],
            'gt_emb': e,
            'scene_emb': s,
        } for d, e, s in zip(detections, embeddings, scene_emb)]

        # Return output
        return output_list

In evaluate, it is the same situation: although the gfn_scores are calculated, they are only used as weights and do not seem to be compared against any hard threshold.
The only GFN-score threshold variable I can find is gfn_filter_thresh, which keeps 99% of positive gallery scenes, as you mention in your paper. But it seems that this variable is not used any further.

Your explanation would help me understand better. Looking forward to your reply and assistance.
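
(For reference, a hedged sketch of what such a hard pre-detection filter could look like; gfn_score_thresh is a hypothetical variable, not one from the repository, and model, the *_emb tensors, and gallery_images are assumed to exist as in the snippets earlier on this page:)

import torch

# Hypothetical hard-threshold filtering: score every gallery scene against
# the query, then run detection only on scenes above the threshold.
gfn_score_thresh = 0.5  # hypothetical value

with torch.no_grad():
    gfn_scores = model.gfn.get_scores(
        query_person_emb, query_scene_emb, gallery_scene_emb
    ).flatten()

keep = gfn_scores >= gfn_score_thresh
filtered_gallery = [img for img, k in zip(gallery_images, keep.tolist()) if k]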

Torchscript drive file in owner's trash

Hi, for web_demo.ipynb, downloading the torchscript files linked in the readme says the file is in the owner's trash.

(the file for torchscript_path = '../torchscript/cuhk_final.pt')

The checkpoint .pt file is available; I tried it just in case it had been swapped with the torchscript file, but it produces the error "PytorchStreamReader failed locating file constants.pkl: file not found".

Thank you!
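
(A note on that last error, hedged rather than authoritative: the checkpoint .pt files hold pickled weights for torch.load, while the torchscript .pt archives are for torch.jit.load; feeding a plain checkpoint to torch.jit.load produces exactly the constants.pkl error above:)

import torch

# A training checkpoint (state_dict / pickled objects) is read with torch.load
# (the checkpoint filename here is illustrative):
ckpt = torch.load('cuhk_final.pt', map_location='cpu')

# A TorchScript archive is read with torch.jit.load; passing the checkpoint
# here instead raises "PytorchStreamReader failed locating file constants.pkl".
model = torch.jit.load('../torchscript/cuhk_final.pt')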

Which indicators should I focus on?

Hello, I have tested your code on the PRW dataset. I want to know which indicators I should focus on, i.e., which correspond to the mAP and top-1 that you report in your paper.
input: osr_run --trial_config=./configs/prw_test_final.yaml

(func pid=19147) ==> SUCCESS!!!
Result for WrappedDistributedTorchTrainable_cda35_00000:
date: 2023-03-22_21-59-16
done: true
experiment_id: 56fe3ccccab14c33a1421bb441dc2be7
experiment_tag: '0'
hostname: user-virtual-machine
iterations_since_restore: 1
node_ip: 10.60.150.135
pid: 18464
[email protected]: 0.934801125125824
test_cross_cam_id_det_gfn_mAP: 0.5643779315391021
test_cross_cam_id_det_gfn_match: 1.0
test_cross_cam_id_det_gfn_recall: 0.9788652820880083
test_cross_cam_id_det_gfn_top1: 0.8215848322800194
test_cross_cam_id_det_mAP: 0.5525456165277787
test_cross_cam_id_det_match: 1.0
test_cross_cam_id_det_recall: 0.9788652820880083
test_cross_cam_id_det_top1: 0.8045697617890131
test_cross_cam_id_gt_gfn_mAP: 0.5782933392838289
test_cross_cam_id_gt_gfn_match: 1.0
test_cross_cam_id_gt_gfn_recall: 1.0
test_cross_cam_id_gt_gfn_top1: 0.8230432668935342
test_cross_cam_id_gt_mAP: 0.5664176155812705
test_cross_cam_id_gt_match: 1.0
test_cross_cam_id_gt_recall: 1.0
test_cross_cam_id_gt_top1: 0.8045697617890131
test_cross_cam_id_image_gfn_frac_filter: 0.11711236787108889
test_cross_cam_id_image_gfn_mAP: 0.16209892104473658
test_cross_cam_id_image_gfn_thresh_filter: 0.4776982069015503
test_cross_cam_id_image_gfn_top1: 0.42343218279047157
[email protected]: 0.9568270688692043
test_same_cam_id_det_gfn_mAP: 0.8505308214692695
test_same_cam_id_det_gfn_match: 1.0
test_same_cam_id_det_gfn_recall: 0.9790137053352912
test_same_cam_id_det_gfn_top1: 0.9859017987360233
test_same_cam_id_gt_mAP: 0.8492902613364299
test_same_cam_id_gt_match: 1.0
test_same_cam_id_gt_recall: 1.0
test_same_cam_id_gt_top1: 1.0
test_same_cam_id_image_gfn_frac_filter: 0.2125033424679168
test_same_cam_id_image_gfn_mAP: 0.6889602817530124
test_same_cam_id_image_gfn_thresh_filter: 0.5215099453926086
test_same_cam_id_image_gfn_top1: 1.0
test_test_det_gfn_mAP: 0.5827848703489255
test_test_det_gfn_match: 1.0
test_test_det_gfn_recall: 0.9788257565934494
test_test_det_gfn_top1: 0.924161400097229
test_test_det_mAP: 0.5760174356415955
test_test_det_match: 1.0
test_test_det_recall: 0.9788257565934494
test_test_det_top1: 0.8945065629557608
test_test_gt_gfn_mAP: 0.5990268233035214
test_test_gt_gfn_match: 1.0
test_test_gt_gfn_recall: 1.0
test_test_gt_gfn_top1: 0.9309674282936315
test_test_gt_mAP: 0.5917157777048339
test_test_gt_match: 1.0
test_test_gt_recall: 1.0
test_test_gt_top1: 0.8998541565386485
test_test_image_gfn_frac_filter: 0.11459317284876058
test_test_image_gfn_mAP: 0.20687682098861937
test_test_image_gfn_thresh_filter: 0.4808691143989563
test_test_image_gfn_top1: 0.7841516771998055
time_since_restore: 8954.510276317596
time_this_iter_s: 8954.510276317596
time_total_s: 8954.510276317596
timestamp: 1679493556
timesteps_since_restore: 0
training_iteration: 1
trial_id: cda35_00000
warmup_time: 16.191797971725464

== Status ==
Current time: 2023-03-22 21:59:20 (running for 02:29:53.97)
Memory usage on this node: 10.7/62.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/2 GPUs, 0.0/34.04 GiB heap, 0.0/17.02 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /diskvdb/yyl/zqx_PS/paper/GFN-main/logging/prw_final
Number of trials: 1/1 (1 TERMINATED)
+----------------------------------------------+------------+---------------------+--------+------------------+---------------+-------------------+----------------------+
| Trial name | status | loc | iter | total time (s) | [email protected] | [email protected] | test_test_gt_match |
|----------------------------------------------+------------+---------------------+--------+------------------+---------------+-------------------+----------------------|
| WrappedDistributedTorchTrainable_cda35_00000 | TERMINATED | 10.60.150.135:18464 | 1 | 8954.51 | 0.934801 | 0.956827 | 1 |
+----------------------------------------------+------------+---------------------+--------+------------------+---------------+-------------------+----------------------+

2023-03-22 21:59:20,951 INFO tune.py:702 -- Total run time: 8994.12 seconds (8993.95 seconds for the tuning loop).

Train error report on CUHK-SYSU

Hi, I tried to train GFN on my own device (Tesla T4, 16 GB) on the CUHK dataset using
osr_run --trial_config=./configs/cuhk_train_final.yaml

just setting
batch size: 4
emb_norm_type: 'layernorm' (using batchnorm may also cause problems like the one in error.txt)

and I got a problem in epoch 22:
error.txt

Debug is true?

Hello, I changed debug to True to train GFN, but I could not reproduce the results in the paper. The value of test_test_det_gfn_mAP is 0.487, and the value of test_test_det_gfn_top1 is 0.871. But the values for [email protected] and [email protected] are normal, at 0.9264 and 0.9512, respectively. What is the reason? Hoping for your reply.

An error while model loading in web_demo.ipynb

I'm trying to run web_demo.ipynb but encounter the following error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_11836\3154094266.py in <module>
      1 # Load torchscript version of model
----> 2 model = torch.jit.load(torchscript_path)

c:\Users\SUSIE\Anaconda3\envs\osr\lib\site-packages\torch\jit\_serialization.py in load(f, map_location, _extra_files)
    160     cu = torch._C.CompilationUnit()
    161     if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 162         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
    163     else:
    164         cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: 
Unknown builtin op: torchvision::roi_align.
Could not find any similar ops to torchvision::roi_align. This op may not exist or may not be currently supported in TorchScript.
:
  File "code/__torch__/torchvision/ops/roi_align.py", line 18
  else:
    rois = unchecked_cast(Tensor, boxes)
  _6 = ops.torchvision.roi_align(input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned)
       ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _6
'roi_align' is being compiled since it was called from '_multiscale_roi_align'
Serialized   File "code/__torch__/torchvision/ops/poolers.py", line 128
  _30 = "scales and mapper should not be None"
  _31 = __torch__.torchvision.ops.poolers._convert_to_roi_format
  _32 = __torch__.torchvision.ops.roi_align.roi_align
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _33 = "This Python function is annotated to be ignored and cannot be run"
  _34 = uninitialized(Tensor)
'_multiscale_roi_align' is being compiled since it was called from 'MultiScaleRoIAlign.forward'
Serialized   File "code/__torch__/torchvision/ops/poolers.py", line 19
    _0 = __torch__.torchvision.ops.poolers._filter_input
    _1 = __torch__.torchvision.ops.poolers._setup_scales
    _2 = __torch__.torchvision.ops.poolers._multiscale_roi_align
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    featmap_names = self.featmap_names
    x_filtered = _0(x, featmap_names, )


A similar problem (https://stackoverflow.com/questions/65705160/torch-vision-c-interface-error-unknown-builtin-op-torchvisionnms) was solved by changing the torch version. I also read some related issues (https://github.com/pytorch/vision/issues) but didn't find a solution.
My environment is pytorch=1.12.0 + torchvision=0.13.0, the same as yours in conda.yaml.
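
(A hedged suggestion that has resolved this error elsewhere: torchvision's custom ops such as torchvision::roi_align are only registered with TorchScript once torchvision has been imported in the running process, so importing it before torch.jit.load may be enough:)

import torch
import torchvision  # importing registers torchvision's custom ops
                    # (e.g. torchvision::roi_align) with TorchScript

torchscript_path = '../torchscript/cuhk_final.pt'  # path from web_demo.ipynb
model = torch.jit.load(torchscript_path)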

Converting a Trained Model to Torchscript Format with PyTorch

I'm glad that the code runs successfully! However, I encountered an error while converting my trained model to TorchScript:

RuntimeError: getattr's second argument must be a string literal:
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_metric_learning/utils/common_functions.py", line 275
    def reset_stats(input_obj):
        for attr_list in ["_record_these_stats"]:
            for r in getattr(input_obj, attr_list, []):
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

It seems the issue is with the following code:

def reset_stats(input_obj):
    for attr_list in ["_record_these_stats"]:
        for r in getattr(input_obj, attr_list, []):
            setattr(input_obj, r, 0)

I have been trying to resolve this for over two weeks and am not sure how to correctly convert the model to TorchScript format. Do you have any suggestions?
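
(Not the author's answer, just a hedged suggestion: the failing getattr is inside pytorch_metric_learning's training utilities, which TorchScript cannot compile. If those modules are only needed for the training losses, one workaround is to drop them from the model before scripting; the attribute names below are hypothetical and should be replaced with whatever the model actually uses:)

import torch

model.eval()

# Remove training-only submodules that pull pytorch_metric_learning into the
# scripted graph (attribute names are hypothetical; inspect the model first).
for name in ("loss_fn", "miner"):
    if hasattr(model, name):
        delattr(model, name)

scripted = torch.jit.script(model)
scripted.save("gfn_scripted.pt")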
