Coder Social home page Coder Social logo

basti0nz / private-detector Goto Github PK

View Code? Open in Web Editor NEW

This project forked from bumble-tech/private-detector

0.0 0.0 0.0 61 KB

Bumble's Private Detector - a pretrained model for detecting lewd images

Home Page: https://medium.com/bumble-tech/bumble-inc-open-sources-private-detector-and-makes-another-step-towards-a-safer-internet-for-women-8e6cdb111d81

License: Apache License 2.0

Python 100.00%

private-detector's Introduction

Private Detector

This is the repo for Bumble's Private Detectorโ„ข model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project to allow for the wider community to use and finetune a Private Detector model of their own. You can download the pretrained SavedModel and checkpoint here

Model

The SavedModel can be found in saved_model/ within private_detector.zip above

The model is based on Efficientnet-v2 and trained on our internal dataset of lewd images - more information can be found at the whitepaper here or here

Inference

Inference is pretty simple and an example has been given in inference.py

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg \
Sample Output
Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg

Additional Training

You can finetune the model yourself on your own data, to do so is fairly simple - though you will need the checkpoint files as can be found in saved_checkpoint/ in private_detector.zip

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
         "path": "/home/sofarrell/private_detector/No.txt",
         "label": 1
    }
}

With each .txt file listing off the image paths to your images

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector

The training script has several parameters that can be tweaked:

Command Description Type Default
train_id ID for this particular training run str
train_json JSON file(s) which describes classes and contains lists of filenames of data files List[str]
eval_json Validation json file which describes classes and contains lists of filenames of data files str
num_epochs Number of epochs to train for int
batch_size Number of images to process in a batch int 64
checkpoint_dir Directory to store checkpoints in str
model_dir Directory to store graph in str .
data_format Data format: [channels_first, channels_last] str channels_last
initial_learning_rate Initial learning rate float 1e-4
min_learning_rate Minimal learning rate float 1e-6
min_eval_metric Minimal evaluation metric to start saving models float 0.01
float_dtype Float Dtype to use in image tensors: [16, 32] int 16
steps_per_train_epoch Number of steps per train epoch int 800
steps_per_eval_epoch Number of steps per evaluation epoch int 1
reset_on_lr_update Whether to reset to the best model after learning rate update bool False
rotation_augmentation Rotation augmentation angle, value <= 0 disables it float 0
use_augmentation Add speckle, v0, random or color distortion augmentation str
scale_crop_augmentation Resize image to the model's size times this scale and then randomly crop needed size float 1.4
reg_loss_weight L2 regularization weight float 0
skip_saving_epochs Do not save good checkpoint and update best metric for this number of the first epochs int 0
sequential Use sequential run over randomly shuffled filenames vs equal sampling from each class bool False
eval_threshold Threshold above which to consider a prediction positive for evaluation float 0.5
epochs_lr_update Maximum number of epochs without improvement used to reset/decrease learning rate int 20

private-detector's People

Contributors

steeeephen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.