
PseCo's Introduction

Class-agnostic object counting with PseCo

This repo provides the official implementation of our PseCo (CVPR 2024) for class-agnostic object counting, which combines the advantages of two computer vision foundation models: CLIP and the Segment Anything Model (SAM).

Point, Segment and Count: A Generalized Framework for Object Counting
https://arxiv.org/abs/2311.12386
Abstract: Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names, a.k.a. few-shot and zero-shot counting. Current state-of-the-art methods rely heavily on density maps to predict object counts, which lack model interpretability. In this paper, we propose a generalized detection-based framework for both few-shot and zero-shot object counting. Our framework combines the advantages of two foundation models without compromising their zero-shot capability: (i) SAM to segment all possible objects as mask proposals, and (ii) CLIP to classify the proposals and obtain accurate object counts. However, this strategy meets the obstacles of efficiency overhead and of small, crowded objects that cannot be localized and distinguished. To address these issues, our framework, termed PseCo, follows three steps: point, segment, and count. Specifically, we first propose a class-agnostic object localization that provides accurate but minimal point prompts for SAM, which consequently not only reduces computation cost but also avoids missing small objects. Furthermore, we propose a generalized object classification that leverages CLIP image/text embeddings as the classifier, following a hierarchical knowledge distillation to obtain discriminative classifications among hierarchical mask proposals. Extensive experimental results on the FSC-147 dataset demonstrate that PseCo achieves state-of-the-art performance in both few-shot and zero-shot object counting and detection, with additional results on the large-scale COCO and LVIS datasets.
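
To make the point-segment-count pipeline concrete, here is a minimal, hypothetical sketch in Python. The callables point_decoder, segmenter, and classifier are placeholders for the components described above, not the repo's actual API:

from typing import Callable, List, Tuple

def pseco_count(image, prompts,
                point_decoder: Callable,   # placeholder: image -> list of candidate points
                segmenter: Callable,       # placeholder: (image, point) -> mask/box proposal (SAM)
                classifier: Callable,      # placeholder: (image, proposal, prompts) -> score (CLIP head)
                threshold: float = 0.5) -> Tuple[int, List]:
    # 1. Point: class-agnostic localization produces sparse point prompts
    points = point_decoder(image)
    # 2. Segment: SAM turns each point prompt into a mask/box proposal
    proposals = [segmenter(image, p) for p in points]
    # 3. Count: classify each proposal against the example boxes or class name, keep the matches
    scores = [classifier(image, b, prompts) for b in proposals]
    kept = [b for b, s in zip(proposals, scores) if s >= threshold]
    return len(kept), kept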

If you find this code helpful for your work, do not hesitate to cite our paper or star this repo!

Framework

Qualitative Comparisons


Results on object counting: FSC-147

Install requirements

Install the environment:

python -m venv venv
source venv/bin/activate

Install detectron2: https://detectron2.readthedocs.io/en/latest/tutorials/install.html

python -m pip install detectron2==0.6 -f \
https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

Install torch (2.0.0) and torchvision: https://pytorch.org/get-started/previous-versions/ (note: the prebuilt detectron2 wheel above targets torch 1.10/cu113, so with torch 2.0.0 you may need to build detectron2 from source instead)

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

Then install the other requirements:

pip install -r requirements.txt

Training

The training scripts for FSC-147 and FSCD-LVIS are in the folders /fsc147 and /fscd_lvis, with the following steps:

# 1. Generate the image features, annotations, and example/text CLIP prompts
#    (a hedged CLIP text-embedding example is sketched after this block).
#    This is done in advance to reduce training time, although it is impractical for
#    large-scale datasets, e.g., COCO (almost 0.5 TB).
#    At least 20 GB of memory is needed to train the model.
1_generate_data.ipynb
# 2. Train the point decoder
2_train_heatmap.ipynb
# 3. Extract the proposals and CLIP image embeddings
# with support of multiple GPUs
torchrun --master_port 17673 --nproc_per_node=4 extract_proposals.py
# 4. Train the ROI classification head
## Ours few-shot
python train_roi_head.py --wandb --entity zzhuang
## Ours zero-shot
python train_roi_head.py --wandb --entity zzhuang --zeroshot
## ViLD
4_2_train_vild.ipynb
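
As a rough illustration of the CLIP prompts generated in step 1, the zero-shot text classifiers can be built roughly as follows. This is a hedged sketch using the openai CLIP package; the model variant, prompt template, and class names below are illustrative and may differ from what the notebooks actually use:

import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)  # model variant is an assumption

class_names = ["strawberry", "sea shell"]  # example class names, not taken from the repo
tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_embeddings = model.encode_text(tokens)
    text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
print(text_embeddings.shape)  # (num_classes, embedding_dim)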

I have provided the preprocessed datasets in:

One may need to merge the split files into a single large file:

cat all_predictions_vith.pth* > all_predictions_vith.pth
rm -rf all_predictions_vith.ptha*
cat all_data_vith.pth* > all_data_vith.pth
rm -rf all_data_vith.ptha*
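
To check that the merge worked, a quick sanity check (not part of the repo) is to load the merged files with torch:

import torch

# If this fails, the cat step above likely went wrong or a split part is missing.
# Note that this loads each merged file entirely into CPU memory.
for path in ["all_predictions_vith.pth", "all_data_vith.pth"]:
    data = torch.load(path, map_location="cpu")
    print(path, type(data), len(data) if hasattr(data, "__len__") else "")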

We can also enable WANDB to visualize the training!

Set the wandb parameters to true, and log in to wandb.ai:

wandb login xxx
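
Equivalently, you can log in and initialize a run from Python with the standard wandb API (the project and entity names below are placeholders, not the repo's defaults):

import wandb

wandb.login()  # prompts for your API key, or pass key="..." explicitly
run = wandb.init(project="pseco", entity="your-entity")  # placeholder names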

DEMO

This demo uses preprocessed data and cannot be executed on Colab due to limited resources: demo_fsc147.ipynb

This demo can be executed on Colab: demo_in_the_wild.ipynb (Explore in Colab)

Sorry, I have not had time to write an interactive demo.

Citation

If you find this code or our work useful, please cite us:

@inproceedings{zhizhong2024point,
  title={Point, Segment and Count: A Generalized Framework for Object Counting},
  author={Huang, Zhizhong and Dai, Mingliang and Zhang, Yi and Zhang, Junping and Shan, Hongming},
  booktitle={CVPR},
  year={2024}
}


PseCo's Issues

The meaning of cls_loss2?

Thank you for your excellent work and for open-sourcing it! I notice that another loss named cls_loss2 is computed during training, which is not introduced in the paper. I'm curious about the effect of cls_loss2; can you explain it? Thank you!

with torch.autograd.set_grad_enabled(True) and torch.autocast(device_type='cuda', enabled=amp):
    # Classification loss on the query (example/text) labels, averaged over valid labels (>= 0)
    cls_outs = cls_head(features, anchor_boxes, query_features)
    cls_loss = F.binary_cross_entropy_with_logits(cls_outs, query_labels, reduction='none')
    loss_mask = (query_labels >= 0).float()
    cls_loss = (cls_loss * loss_mask).sum() / (loss_mask.sum() + 1e-5)

    # Second classification loss, computed on the CLIP boxes/targets (the cls_loss2 in question)
    cls_outs2 = cls_head(features_a, clip_boxes, clip_target_features)
    cls_loss2 = F.binary_cross_entropy_with_logits(cls_outs2, clip_query_labels, reduction='none')
    loss_mask = (clip_query_labels >= 0).float()
    cls_loss2 = (cls_loss2 * loss_mask).sum() / (loss_mask.sum() + 1e-5)

    # Total loss with a weight on cls_loss2
    loss = cls_loss + cls_loss2 * cls_loss2_weight

    # Gradient accumulation over acc_grd_step iterations
    update_params = (n_iter % acc_grd_step == 0)
    loss = loss / acc_grd_step
    scaler(loss, optimizer=optimizer, update_grad=update_params)

Mask Decoder Output

Based on the diagram in the paper, it seems that mask proposals are output by the mask decoder. If that is the case, I want to save these masks somewhere, but in the demo_in_the_wild.ipynb file I can't really tell where the mask outputs are. Can you point me to that place?

Using prompts from different images

Hi, nice work here! Very glad you shared it. I was wondering whether you have tried, or whether it is possible, to use visual prompts from images other than the one being predicted. Thanks!

About MLP_small_box_w1_fewshot.tar

Could you please clarify which training file trained the MLP_small_box_w1_fewshot.tar file that cls_head is loaded from? I suspect it was trained by 4_1_train_roi_head.py and saved without an extension, but when I renamed that file to .tar, I found that their dictionary keys are different. Therefore, I'm unsure about the role of 'cls_head-10000' and how to obtain the MLP_small_box_w1_fewshot.tar file.

About weight file loading failure

Sorry to bother you; I was fascinated by your excellent work. While trying to run your code, I followed your steps to cat the weight files for the preprocessed dataset, but the combined weights could not be loaded normally. Is there a problem with my cat command? By the way, these files are unusually large.

Evaluation on FSC-147

I want to test a newly trained and slightly adjusted PseCo model on the FSC-147 benchmark. Is there code that you used for evaluation, by chance?
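
For reference, FSC-147 counting results are usually reported as MAE and RMSE over per-image predicted vs. ground-truth counts. A minimal sketch of those metrics (not the repo's evaluation script):

import numpy as np

def counting_metrics(pred_counts, gt_counts):
    # Mean absolute error and root mean squared error over per-image object counts
    pred = np.asarray(pred_counts, dtype=np.float64)
    gt = np.asarray(gt_counts, dtype=np.float64)
    mae = np.abs(pred - gt).mean()
    rmse = np.sqrt(((pred - gt) ** 2).mean())
    return mae, rmse

mae, rmse = counting_metrics([12, 48, 7], [10, 50, 7])  # toy numbers
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")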
