| Vision-Language Learning | Composed Video Retrieval | Composed Video Retrieval via Enriched Context and Discriminative Embeddings | composed-video-retrieval | CVPR'24 |
| Self-supervision | Multi-Spectral Satellite Imagery | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | satmae_pp | CVPR'24 |
| Vision-Language Learning | Video grounding | Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | Video-GroundingDINO | CVPR'24 |
| Vision-Language Learning | Language Driven VLM for Remote Sensing | GeoChat: Grounded large vision-language model for remote sensing | GeoChat | CVPR'24 |
| Vision-Language Learning | Leveraging LLM to generate complex scenes (Zero-Shot) | LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts | llmblueprint | ICLR'24 |
| Self-supervision | Self-structural Alignment of Foundational Models (Zero-Shot) | Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment | S3A | AAAI'24-Oral |
| Vision-Language Learning | Test-Time Alignment of Foundational Models (Zero-Shot) | Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization | PromptAlign | NeurIPS'23 |
| Vision-Language Learning | Regulating Foundational Models | Self-regulating Prompts: Foundational Model Adaptation without Forgetting | PromptSRC | ICCV'23 |
| Network Engineering | Video Recognition | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | Video-FocalNets | ICCV'23 |
| Vision-Language Learning | Face Anti-spoofing | FLIP: Cross-domain Face Anti-spoofing with Language Guidance | FLIP | ICCV'23 |
| 3D Medical Segmentation | Adversarial Training | Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation | VAFA | MICCAI'23 |
| Vision-Language Learning | Facial Privacy | CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search | Clip2Protect | CVPR'23 |
| Vision-Language Learning | Video Recognition (Zero-Shot) | Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting | Vita-CLIP | CVPR'23 |
| Prompt learning | Image Recognition (Category Discovery) | PromptCAL for Generalized Novel Category Discovery | PromptCAL | CVPR'23 |
| Prompt learning | Adversarial Attack | Boosting Adversarial Transferability using Dynamic Cues | DCViT-AT | ICLR'23 |
| Self-supervision | Video Recognition | Self-Supervised Video Transformer | SVT | CVPR'22-Oral |
| Contrastive learning | Adversarial Defense | Stylized Adversarial Training | SAT | IEEE-TPAMI'22 |
| Self-supervision | Adversarial Attack | Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations | ARP | BMVC'22-Oral |
| Self-supervision | Image Recognition | How to Train Vision Transformer on Small-scale Datasets? | VSSD | BMVC'22 |
| Self-distillation | Image Recognition (Domain Generalization) | Self-Distilled Vision Transformer for Domain Generalization | SDViT | ACCV'22-Oral |
| Attention Analysis | Understanding Vision Transformer | Intriguing Properties of Vision Transformers | IPViT | NeurIPS'21-Spotlight |
| Self-ensemble | Adversarial Attack | On Improving Adversarial Transferability of Vision Transformers | ATViT | ICLR'21-Spotlight |
| Distribution matching | Adversarial Attack | On Generating Transferable Targeted Perturbations | TTP | ICCV'21 |
| Contrastive learning | Image Recognition | Orthogonal Projection Loss | OPL | ICCV'21 |
| Self-supervision | Adversarial Defense | A Self-supervised Approach for Adversarial Robustness | NRP | CVPR'20-Oral |
| Relativistic optimization | Adversarial Attack | Cross-Domain Transferability of Adversarial Perturbations | CDA | NeurIPS'19 |
| Gradient Smoothing | Adversarial Defense | Local Gradients Smoothing: Defense Against Localized Adversarial Attacks | LGS | WACV'19 |