xmed-lab / clipn
ICCV 2023: CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
License: MIT License
I highly appreciate your work and would like to explore new models based on your research. However, I lack the computational power for pre-training. Could you provide the model parameters pre-trained on the CC-3M dataset? Thank you very much.
Hi, I'd like to test CLIPN on my own datasets, but I didn't see any links to download the CLIPN weights.
How can I get the weights?
Really appreciate your work. It is really impressive.
I'd like to replicate the results using CIFAR-100 as the in-distribution dataset. Is it possible to access the code for the CIFAR-100 dataset or receive guidance on image transformations and dataset splitting, such as using CIFAR-10 as the out-of-distribution dataset?
Thank you very much.
First of all, thank you for your work.
The method is promising and your article is very interesting, so I tried to use it in two ways:
I'm using the .pt weights you kindly provided, and I tried to implement the ATD and the CTW methods.
However, the results were really bad, which leads me to think I missed something. In my first use case the only prompt was:
"A photo of a person with a {}" ("A photo of a person without a {}") with "hat", "cap", "helmet" as the class names.
Using ATD, everything is classified as OOD; using CTW, almost everything is classified as ID.
I have some questions regarding your paper:
Do you have a reference or a paper explaining where Eq. 4 comes from? Regarding the CTW method, Eq. 4 should be over 0.5 for the classification to be OOD.
Also, where does Eq. 8 come from?
As for Eq. 6, computing pij is a kind of softmax, right? Just with the temperature parameter added?
In that case, wouldn't the ATD method be unusable when you have only one class and just want to discard false positives, since pij is then equal to 1?
The first thing that came to my mind was to find the index of the maximum value in logits and check logits[index] > logits_no[index] to decide whether the sample is ID or OOD. However, I suppose this is mathematically incorrect, as you didn't mention it in your paper, and the test I ran also gave bad results.
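For the single-class question above, the reasoning can be worked through explicitly. This is only a sketch under the reading of Eq. 6 and Eq. 8 that the ATD function posted in this issue encodes; it is not a quotation from the paper. The symbols s_{i1} (the "yes" logit of image i for the single class) and p_{C+1} (the OOD probability) are notation assumed here for illustration. With C = 1:

```latex
p_{i1} = \frac{e^{s_{i1}/\tau}}{e^{s_{i1}/\tau}} = 1,
\qquad
p_{C+1} = 1 - \left(1 - p^{no}_{i1}\right) p_{i1} = p^{no}_{i1} < 1 = \max_j p_{ij}.
```

Under that reading the OOD probability can never exceed the largest ID probability, so ATD would label every input as ID when there is only one class. CTW compares p^{no} with 1 - p^{no} for the selected class and does not depend on the softmax over classes, so it does not degenerate in the same way.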
Here are the functions I wrote for ATD and CTW, based on my understanding of your paper. They are rough, as this is a work in progress. I used the code in the "handcrafted" folder, which, as far as I understand, is the one to use with custom prompts rather than learned ones.
Both of them take the logits and logits_no computed this way:
logits = F.normalize(feat, dim=-1, p=2) @ fc_yes.T
logits_no = F.normalize(feat, dim=-1, p=2) @ fc_no.T
They also take a tau parameter, which I set to 1 for now.
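One thing that may be worth checking is the scale of the logits. This is a guess, not something the authors confirm in this thread. Features normalized with F.normalize give cosine similarities in [-1, 1], and with tau = 1 a softmax over such values is close to uniform. CLIP-style models normally multiply the similarities by a learned logit scale before the softmax. The sketch below uses made-up similarity values and a hypothetical `softmax` helper to show the effect of the temperature:

```python
import math

def softmax(logits, tau):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities for three prompts (made-up values).
cosine_logits = [0.28, 0.25, 0.22]

p_tau_1 = softmax(cosine_logits, tau=1.0)       # nearly uniform
p_tau_small = softmax(cosine_logits, tau=0.01)  # sharply peaked

print([round(p, 3) for p in p_tau_1])
print([round(p, 3) for p in p_tau_small])
```

With tau = 1 no class gets much more than one third of the probability mass, whereas tau = 0.01 (equivalent to a logit scale of 100) concentrates it on the first prompt. Which value the released weights expect is not stated in this issue.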
import math

def CTW(logits_yes, logits_no, tau):
    yes = logits_yes[0].detach().tolist()
    no = logits_no[0].detach().tolist()
    # Softmax over the "yes" logits (pij).
    pij = []
    denominator = 0
    for i in range(len(yes)):
        denominator += math.exp(yes[i] / tau)
    for i in range(len(yes)):
        pij.append(math.exp(yes[i] / tau) / denominator)
    # Per-class "no" probability (pijno).
    pijno = []
    for i in range(len(no)):
        pijno.append(math.exp(no[i]/tau) / (math.exp(yes[i]/tau) + math.exp(no[i]/tau)))
    index = pij.index(max(pij))
    bestood = pijno[index]
    # ID if the "no" probability of the best class is below 0.5.
    return (index, 1 - bestood > bestood)

def ATD(logits_yes, logits_no, tau):
    ood = 1.
    yes = logits_yes[0].detach().tolist()
    no = logits_no[0].detach().tolist()
    # Per-class "no" probability (pijno).
    pijno = []
    for i in range(len(no)):
        pijno.append(math.exp(no[i]/tau)/(math.exp(yes[i]/tau) + math.exp(no[i]/tau)))
    # Softmax over the "yes" logits (pij).
    pij = []
    denominator = 0
    for i in range(len(yes)):
        denominator += math.exp(yes[i]/tau)
    for i in range(len(yes)):
        pij.append(math.exp(yes[i]/tau)/denominator)
    index = pij.index(max(pij))
    # OOD probability: 1 minus the sum of (1 - pno) * pij over all classes.
    for i, pno in enumerate(pijno):
        ood -= (1 - pno)*pij[i]
    # ID if any class probability exceeds the OOD probability.
    res = 0
    for pyes in pij:
        if pyes > ood:
            res = 1
    return (index, res)
The return value is 1 if it's an ID and 0 otherwise.
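For comparison, the same two decision rules can be written more compactly. This is a sketch of the logic as implemented in this issue, not the authors' reference code. It takes plain Python lists instead of tensors so that it runs without PyTorch. The helper names `softmax`, `no_probs`, `ctw` and `atd` are hypothetical, and the logit values in the usage lines are made up.

```python
import math

def softmax(xs, tau):
    """Temperature-scaled softmax, shifted by the max for stability."""
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def no_probs(yes, no, tau):
    """Per class: exp(no/tau) / (exp(yes/tau) + exp(no/tau)), written as a sigmoid."""
    return [1.0 / (1.0 + math.exp((y - n) / tau)) for y, n in zip(yes, no)]

def ctw(yes, no, tau=1.0):
    """Return (best class index, True if the sample is treated as ID)."""
    p = softmax(yes, tau)
    p_no = no_probs(yes, no, tau)
    index = max(range(len(p)), key=p.__getitem__)
    return index, p_no[index] < 0.5

def atd(yes, no, tau=1.0):
    """Return (best class index, True if the sample is treated as ID)."""
    p = softmax(yes, tau)
    p_no = no_probs(yes, no, tau)
    index = max(range(len(p)), key=p.__getitem__)
    # OOD probability: 1 - sum_j (1 - p_no_j) * p_j
    p_ood = 1.0 - sum((1.0 - pn) * pj for pn, pj in zip(p_no, p))
    return index, max(p) > p_ood

# Made-up cosine-like logits for three classes.
yes = [0.30, 0.20, 0.10]
no = [0.10, 0.25, 0.30]
print(ctw(yes, no, tau=0.01), atd(yes, no, tau=0.01))
print(ctw(yes, no, tau=1.0), atd(yes, no, tau=1.0))
```

With these made-up values, `atd` treats the sample as ID at tau = 0.01 but as OOD at tau = 1, because the softmax is nearly uniform and the OOD probability lands near 0.5. That matches the behaviour described in this issue but does not by itself confirm the cause.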
The model is in eval mode, and I use the process_test function returned by load_model() to preprocess the images, which I load with PIL's Image.open().
So I don't know whether I did something wrong or whether I "just" need to retrain the model.
Thanks for your help!
Appreciate your impressive work.
In Table 2 of the main paper, are the MSP and MaxLogit results reproduced on CLIP or on CLIPN? I tested MaxLogit on CLIP (ViT-B-32) with CIFAR-100 as ID and CIFAR-10 as OOD, but only got 74.8% AUROC.
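To make the comparison concrete, the following is a minimal sketch of how a MaxLogit AUROC of this kind can be computed. It is not the paper's evaluation code. The function names `max_logit_scores` and `auroc` are hypothetical, the logits are made-up numbers, and the pairwise loop is quadratic, so it only suits small examples.

```python
def max_logit_scores(logits_per_image):
    """MaxLogit score: the largest class logit of each image."""
    return [max(row) for row in logits_per_image]

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores higher than a random
    OOD sample, counting ties as one half."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Made-up logits: three ID images and two OOD images, two classes each.
id_logits = [[0.9, 0.1], [0.6, 0.2], [0.3, 0.2]]
ood_logits = [[0.4, 0.1], [0.2, 0.1]]

score = auroc(max_logit_scores(id_logits), max_logit_scores(ood_logits))
print(round(score, 4))
```

Differences in prompt templates, backbone or preprocessing can all shift the score, so a gap from the published number does not by itself show which model the table used.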
I tried to run pip install -r ./requirements as instructed but encountered a couple of issues.
First:
error in blessings setup command: use_2to3 is invalid.
This indicates an error when installing blessings==1.6, which can in fact be solved with pip install setuptools==58.
Second, the following packages cannot be installed through pip on macOS:
cloud-init
command-not-found
cupshelpers
defer==1.06
distro-info===0.23ubuntu1
language-selector==0.1
nvidia-cublas-cu11==11.10.3.66
ERROR: Could not find a version that satisfies the requirement cloud-init==23.1.2 (from versions: none)
ERROR: No matching distribution found for cloud-init==23.1.2
A quick Google search suggests that cloud-init, which is terribly undocumented, is usually built into virtual machines.
Any suggestions for installing these packages?
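One possible workaround, sketched here under the assumption that the packages listed above are operating-system packages captured by a pip freeze and are not needed by the CLIPN code itself (the issue does not confirm this), is to filter them out of the requirements before installing. The names `SYSTEM_PACKAGES`, `requirement_name` and `filter_requirements` are hypothetical, and the `torch` entry in the sample is only an illustration.

```python
import re

# Package names copied from the issue text; treating them as safe to skip
# is an assumption, not something confirmed by the repository.
SYSTEM_PACKAGES = {
    "cloud-init", "command-not-found", "cupshelpers", "defer",
    "distro-info", "language-selector", "nvidia-cublas-cu11",
}

def requirement_name(line):
    """Return the lowercased package name of a requirements line, or None
    for blank lines, comments and pip options."""
    stripped = line.split("#", 1)[0].strip()
    if not stripped or stripped.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
    return match.group(0).lower() if match else None

def filter_requirements(lines, skip=SYSTEM_PACKAGES):
    """Keep every line whose package name is not in the skip set."""
    return [line for line in lines if requirement_name(line) not in skip]

# Sample lines; only some of them appear in the issue text.
sample = [
    "blessings==1.6",
    "cloud-init==23.1.2",
    "distro-info===0.23ubuntu1",
    "# a comment line",
    "torch",
]
print(filter_requirements(sample))
```

The filtered lines can be written to a separate file and passed to pip install -r, which leaves the original requirements file untouched.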
Hello,
Thank you for the interesting work! Could you provide the weights for the ViT-B-32 model?
Only the ViT-B-16 version is available in the README.
Thanks in advance!
Best,
Hi, how can we get the model weights for CLIPN?
In the train_one_epoch function, your model outputs 4 variables (image_features, text_features, text_features_no, logit_scale). However, in the evaluate function, your model outputs only 3 variables, without text_features_no. This is quite strange, because you train the "no" text encoder but do not use it during evaluation.