Comments (5)
Do we need to normalize the CLIP embedding before feeding it to the model?
from clip-based-nsfw-detector.
I downloaded several images from laion2B-en-aesthetic and used the CLIP model (ViT-L/14) to extract embeddings, which I fed into the NSFW detector. However, my results differed from those shown in laion2B-en-aesthetic.
Yes, I tried to reuse the normalization from improved-aesthetic-predictor:
def normalized(self, a, axis=-1, order=2):
    # L2-normalize along the given axis, guarding against zero vectors
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

def __call__(self, clip_embs):
    # Accept either a torch tensor or a numpy array; normalize both
    if isinstance(clip_embs, torch.Tensor):
        clip_embs = clip_embs.cpu().numpy()
    clip_embs = self.normalized(clip_embs)
    return self.model.predict_on_batch(clip_embs)
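As a quick sanity check, the helper above should return unit-norm rows and leave zero vectors untouched. A standalone NumPy sketch (the free function `normalized` here mirrors the method above):

```python
import numpy as np

def normalized(a, axis=-1, order=2):
    # Divide each row by its L2 norm; rows with zero norm pass through unchanged
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

x = np.array([[3.0, 4.0], [0.0, 0.0]])
out = normalized(x)
print(out[0])                        # [0.6 0.8]
print(np.linalg.norm(out, axis=-1))  # [1. 0.] -- zero vector preserved
```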
But the results still did not match the ones shown on laion2B-en-aesthetic. I checked the img_embs in your provided dataset, and their data type is float16. I tried fp16 inference to get float16 embeddings, but the results were still wrong. Did I miss something? Thanks.
with autocast(enabled=fp16_model):
    clip_embs = extractor.encode_image(data['image'].cuda())
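One way to check whether float16 storage alone explains the mismatch is to round-trip a float32 embedding through float16 before normalizing; after normalization the two versions should agree to roughly fp16 precision. A minimal NumPy sketch (the `l2_normalize` helper and the embedding shapes are illustrative assumptions, not the repo's code):

```python
import numpy as np

def l2_normalize(a, axis=-1):
    # L2-normalize along the last axis; guard against zero vectors
    norm = np.linalg.norm(a, ord=2, axis=axis, keepdims=True)
    norm[norm == 0] = 1
    return a / norm

rng = np.random.default_rng(0)
emb = rng.standard_normal((2, 768)).astype(np.float32)  # ViT-L/14 embedding dim

# Round-trip through float16 to mimic the dataset's storage dtype
emb_fp16 = emb.astype(np.float16).astype(np.float32)

# After normalization the difference is tiny (fp16 quantization noise only)
diff = np.abs(l2_normalize(emb) - l2_normalize(emb_fp16)).max()
print(diff < 1e-2)  # True
```

If the detector's scores change far more than this quantization noise would suggest, the discrepancy likely comes from somewhere else in the pipeline (preprocessing, model weights) rather than from the fp16 storage.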
I constructed the CLIP model as follows:
class ClipExtractor(nn.Module):
    def __init__(self, model_name="ViT-L/14", jit=False):
        super().__init__()
        self.jit = jit
        self.model, self.transform = clip.load(model_name, device='cpu', jit=jit)

    @torch.no_grad()
    def encode_image(self, images):
        # The decorator already disables gradients for this method
        images_embeddings = self.model.encode_image(self.transform(images))
        return images_embeddings
@rom1504 @christophschuhmann Any ideas on my problems? Thanks.
Related Issues (12)
- add doc
- What is the safety_settings.yml used for?
- ViT-H-14 variant
- How to use the violence detection prompts?
- Would it possible to have a seperate NSFW prompt too? like the violence detection one?
- add inference code
- Annotations for the NSFW test set?
- How do you determine the thresholds???
- Wrong definition of NSFW p values in readme?
- torch version
- Colab demo notebook raises exception