
dlmacedo / distinction-maximization-loss


A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inference (i.e., no increase in inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.

License: Apache License 2.0

Python 88.59% Shell 11.41%
classification deep-learning machine-learning open-set-recognition out-of-distribution-detection pytorch robust-machine-learning trustworthy-ai trustworthy-machine-learning uncertainty-estimation

distinction-maximization-loss's Introduction

Visit "The Robust Deep Learning Library" (our newest work) to quickly use this loss and much more:


Distinction Maximization Loss (DisMax)

Efficiently Improving Out-of-Distribution Detection and Uncertainty Estimation by Replacing the Loss and Calibrating

We keep single-network inference efficiency. No hyperparameter tuning. We need to train only once. State-of-the-art results.

Read the full paper: Distinction Maximization Loss: Efficiently Improving Out-of-Distribution Detection and Uncertainty Estimation by Replacing the Loss and Calibrating.

Train on CIFAR10, CIFAR100, and ImageNet.

Results

Dataset=ImageNet, Model=ResNet18, Near OOD=ImageNet-O

| Loss [Score]         | Class (ACC) | Near OOD (AUROC) |
|----------------------|-------------|------------------|
| Cross-Entropy [MPS]  | 69.9        | 52.4             |
| DisMax [MMLES]       | 69.6        | 75.8             |

Dataset=CIFAR (the full CIFAR10 and CIFAR100 results tables are reported in the paper).

Use DisMax in your project!!!

Replace the SoftMax loss with the DisMax loss by changing just a few lines of code!

Replace the last layer of the model classifier with the DisMax loss first part:

class Model(nn.Module):
    def __init__(self):
        (...)
        # self.classifier = nn.Linear(num_features, num_classes)
        self.classifier = losses.DisMaxLossFirstPart(num_features, num_classes)

Replace the criterion with the DisMax loss second part:

model = Model()
#criterion = nn.CrossEntropyLoss()
criterion = losses.DisMaxLossSecondPart(model.classifier)
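
If you start from a pretrained torchvision backbone instead of your own Model class, the same two replacements apply to its final layer. The sketch below is a minimal illustration; the ResNet-50 backbone and the num_classes value are assumptions that you should adapt to your task.

import torchvision.models as models
import losses  # the DisMax losses module from this repository

num_classes = 10  # assumption: set this to the number of classes in your task

model = models.resnet50(pretrained=True)
num_features = model.fc.in_features
# Replace the last linear layer with the DisMax loss first part.
model.fc = losses.DisMaxLossFirstPart(num_features, num_classes)
# Build the criterion from the DisMax loss second part.
criterion = losses.DisMaxLossSecondPart(model.fc)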

Preprocess before forwarding in the training loop:

# In the training loop, add the line of code below for preprocessing before forwarding.
inputs, targets = criterion.preprocess(inputs, targets) 
(...)
# The code below already exists in your project. Just keep the following lines unchanged!
outputs = model(inputs)
loss = criterion(outputs, targets)
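
Putting the two snippets above together, one complete training step looks roughly like the sketch below. The optimizer choice and the train_loader name are assumptions for illustration; DisMax itself only requires the preprocess call and the criterion.

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # assumption: any standard optimizer works

for inputs, targets in train_loader:  # assumption: a standard PyTorch DataLoader
    optimizer.zero_grad()
    # Preprocess before forwarding (required by DisMax).
    inputs, targets = criterion.preprocess(inputs, targets)
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()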

Detect during inference:

# Return the score values during inference.
scores = model.classifier.scores(outputs) 
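
The returned scores can then be thresholded to flag likely out-of-distribution inputs. The sketch below is an assumed usage pattern, not part of the library API: the threshold should be calibrated on held-out in-distribution data, and you should check the score convention of the detector you use (here we assume higher scores mean more in-distribution).

import torch

model.eval()
with torch.no_grad():
    outputs = model(inputs)
    scores = model.classifier.scores(outputs)

threshold = 0.5  # assumption: calibrate this value on in-distribution validation data
is_ood = scores < threshold  # boolean mask: True for inputs flagged as out-of-distribution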

Run the example:

python example.py

Code

Software requirements

Much of the code is reused from deep_Mahalanobis_detector, odin-pytorch, and entropic-out-of-distribution-detection.

Please install all package requirements by running the command below:

pip install -r requirements.txt

Preparing the data

Please move to the data directory and run all the data preparation bash scripts:

# Download and prepare out-of-distribution data for the CIFAR10 and CIFAR100 datasets.
./prepare_cifar.sh
# Download and prepare out-of-distribution data for ImageNet.
./prepare_imagenet.sh

Reproducing the experiments

Train and evaluate the classification, uncertainty estimation, and out-of-distribution detection performances:

./run_cifar100_densenetbc100.sh
./run_cifar100_resnet34.sh
./run_cifar100_wideresnet2810.sh
./run_cifar10_densenetbc100.sh
./run_cifar10_resnet34.sh
./run_cifar10_wideresnet2810.sh
./run_imagenet1k_resnet18.sh

Analyzing the results

Print the experiment results:

./analyze.sh

Citation

Please cite our papers if you use our loss in your work:

@article{DBLP:journals/corr/abs-2205-05874,
  author    = {David Mac{\^{e}}do and
               Cleber Zanchettin and
               Teresa Bernarda Ludermir},
  title     = {Distinction Maximization Loss:
  Efficiently Improving Out-of-Distribution Detection and Uncertainty Estimation
  Simply Replacing the Loss and Calibrating},
  journal   = {CoRR},
  volume    = {abs/2205.05874},
  year      = {2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05874},
  doi       = {10.48550/arXiv.2205.05874},
  eprinttype = {arXiv},
  eprint    = {2205.05874},
  timestamp = {Tue, 17 May 2022 17:31:03 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2205-05874.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2208-03566,
  author    = {David Mac{\^{e}}do},
  title     = {Towards Robust Deep Learning using Entropic Losses},
  journal   = {CoRR},
  volume    = {abs/2208.03566},
  year      = {2022},
  url       = {https://doi.org/10.48550/arXiv.2208.03566},
  doi       = {10.48550/arXiv.2208.03566},
  eprinttype = {arXiv},
  eprint    = {2208.03566},
  timestamp = {Wed, 10 Aug 2022 14:49:54 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2208-03566.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

distinction-maximization-loss's People

Contributors

dlmacedo

distinction-maximization-loss's Issues

Using DisMax loss for my own classifier.

Hi,
Thanks for open sourcing your work.
Let's say my current classifier is a 3-layer MLP with Dropout, as shown below:
'''
model.classifier = nn.Sequential(
    nn.Linear(in_features=1536, out_features=625),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(in_features=625, out_features=256),
    nn.Linear(in_features=256, out_features=8),
)
'''
As such, what changes should I make to this in order to use the distinction maximization loss?
From the readme, the model must be defined as follows:

class Model(nn.Module):
    def __init__(self):
        (...)
        # self.classifier = nn.Linear(num_features, num_classes)
        self.classifier = losses.DisMaxLossFirstPart(num_features, num_classes)

Small score difference between target and "background"/ OOD class

Thank you very much for your research contribution. An easy-to-use loss function that can differentiate between target classes and "background"/ OOD classes is extremely useful.

I want to use your solution so that my classifier, trained to predict the gender of cartoon faces, ignores images which don't contain a face (e.g., predicts the gender of a "background" or non-face image with a very low confidence). Unfortunately, I'm finding that my model doesn't greatly differentiate between faces and "background" images.

For example, the score tensors for face and "background" images are very similar in value (screenshots omitted).

Where am I going wrong? I greatly appreciate any feedback you can give me.

How I train the model
Example batch: (screenshot omitted)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    history = []

    for epoch in range(num_epochs):
        result = {}
        epoch_start = time.time()
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)

        # each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            correct_females = 0
            correct_males = 0

            # iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    inputs, labels = criterion.preprocess(inputs, labels) #dismax

                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                for i, pred in enumerate(preds):
                    if labels[i] == 0 and labels[i] == pred: 
                      correct_females += 1
                    if labels[i] == 1 and labels[i] == pred:
                      correct_males += 1
            
            if phase == 'train':
                # record & update learning rate
                result['lrs'] = get_lr(optimizer)
                scheduler.step()

            # for printing to command line
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double().item() / dataset_sizes[phase]
            female_acc = correct_females / gender_dist[phase]['females']
            male_acc = correct_males / gender_dist[phase]['males']
            # for plotting
            result[phase+'_loss'] = epoch_loss
            result[phase+'_acc'] = epoch_acc
            result[phase+'_female_acc'] = female_acc
            result[phase+'_male_acc'] = male_acc

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                # Save if the model has best accuracy till now
                torch.save(model, data_dir+'_model_best.pt')

            # print stats to command line
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
            total_females = gender_dist[phase]['females']
            total_males = gender_dist[phase]['males']
            print(f'Female Acc: {correct_females}/{total_females} Male Acc: {correct_males}/{total_males}')
            print(f'Female Acc: {female_acc:.4f} Male Acc: {male_acc:.4f}')

            if phase == 'val':
                epoch_time = time.time() - epoch_start
                print('Time per epoch {:.0f}m {:.0f}s'.format(
                    epoch_time // 60, epoch_time % 60))

        history.append(result)
        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    torch.save(history, data_dir+'_history.pt')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, history

# Load pretrained ResNetXX Model
# dismax: https://github.com/dlmacedo/distinction-maximization-loss
import losses # dismax
model = models.resnet50(pretrained=True)

num_ftrs = model.fc.in_features
model.fc = losses.DisMaxLossFirstPart(num_ftrs, len(class_names)) # dismax
model = model.to(device)
criterion = losses.DisMaxLossSecondPart(model.fc) # dismax

# Observe that all parameters are being optimized
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9) #lr=0.003

# Freeze until 2nd block (included): https://raminnabati.com/post/002_adv_pytorch_freezing_layers/
ct = 0
for child in model.children():
  ct += 1
  if ct < 6:
      for param in child.parameters():
          param.requires_grad = False

# Decay LR
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)

# Train
model, history = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=30)

How I evaluate the model (one image at a time)

## Make predictions, and save images with title
import torch.nn.functional as F
import pandas as pd
model.eval()

with torch.no_grad():
  for x in [f'{data_dir}']:
    # batch size 1 so can get file names correctly
    for i, (inputs, labels) in enumerate(dataloaders[x]):
      inputs = inputs.to(device)

      outputs = model(inputs)
      scores = model.fc.scores(outputs).item() # dismax 
      _, preds = torch.max(outputs, 1)
      path, _ = dataloaders[x].dataset.samples[i]
      fname = path.split('/')[-1]

      for j in range(inputs.size()[0]):
        ax = plt.subplot(1, 1, 1)
        ax.axis('off')

        pred = class_names[preds[j]]
        ax.set_title(f'Predicted Gender: {pred} ({scores}%)') # conf:.2f

        inp = inputs.cpu().data[j].numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        inp = std * inp + mean # denormalise image
        inp = np.clip(inp, 0, 1)
        ax.imshow(inp) # has to show otherwise canvas is blank
        plt.savefig(f'{DST_PATH}/{pred}/{fname}')
        plt.clf()

Text data usage ?

Hello,
Is this method applicable to text data?
I want to use this method for text data with a tokenizer. How can I do that?
Thanks in advance

Conference or Journal

Hello,

May I ask what conference or journal you plan to submit your work to, or have you already submitted it?
