structural-regularity's Introduction

Characterizing Structural Regularities of Labeled Data in Overparameterized Models

Paper | Project | C-scores for CIFAR-10 | C-scores for CIFAR-100 | C-scores for ImageNet | Checkpoints

We demonstrate the held-out training algorithm and the C-score estimation procedure with an example on MNIST. C-score estimation on larger and more challenging datasets (CIFAR / ImageNet) is essentially the same as in this example, except that extra infrastructure such as GPU clusters, job scheduling, and checkpoint saving and resuming is needed. Because MNIST is small and can be fit easily with a small network in very few epochs, we can provide a demo that shows the core algorithm with minimal dependence on irrelevant infrastructure code and that runs in reasonable time on a single GPU. We also provide pre-computed C-scores on CIFAR-10, CIFAR-100, and ImageNet for people who are interested in playing with those datasets.

Example Code on MNIST

The demo consists of a single Python file, mnist.py, which trains multi-layer perceptrons on MNIST to estimate the C-scores and plots examples ranked by the estimated C-scores.
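The held-out estimation idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in mnist.py: it swaps the multi-layer perceptron for a nearest-centroid classifier, uses synthetic 2-D data instead of MNIST, and the function names (nearest_centroid_predict, estimate_cscores) are hypothetical. For each subset ratio it repeatedly trains on a random subset, records whether each held-out example is classified correctly, and then averages the per-example held-out accuracy over the subset ratios.

```python
import numpy as np

def nearest_centroid_predict(x_train, y_train, x_test, num_classes):
    """Stand-in learner (NOT the MLP from mnist.py): predict the nearest class mean."""
    centroids = np.stack([
        x_train[y_train == c].mean(axis=0) if np.any(y_train == c)
        else np.full(x_train.shape[1], np.inf)  # class absent from this subset
        for c in range(num_classes)
    ])
    dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def estimate_cscores(x, y, subset_ratios, n_runs, num_classes, seed=0):
    """Average held-out accuracy per example, then average over subset ratios."""
    rng = np.random.default_rng(seed)
    n = len(x)
    per_ratio = []
    for ratio in subset_ratios:
        correct = np.zeros(n)
        held_out_count = np.zeros(n)
        for _ in range(n_runs):
            train_mask = np.zeros(n, dtype=bool)
            train_idx = rng.choice(n, size=int(round(ratio * n)), replace=False)
            train_mask[train_idx] = True
            test_mask = ~train_mask
            pred = nearest_centroid_predict(
                x[train_mask], y[train_mask], x[test_mask], num_classes)
            correct[test_mask] += (pred == y[test_mask])
            held_out_count[test_mask] += 1
        # Examples never held out at this ratio get NaN and are skipped below.
        acc = np.full(n, np.nan)
        np.divide(correct, held_out_count, out=acc, where=held_out_count > 0)
        per_ratio.append(acc)
    return np.nanmean(np.stack(per_ratio), axis=0)

# Toy data: two well-separated blobs, with example 0 deliberately mislabeled.
data_rng = np.random.default_rng(0)
x = np.concatenate([data_rng.normal(-2.0, 1.0, (50, 2)),
                    data_rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
x[0] = [-2.0, -2.0]
y[0] = 1  # sits in the class-0 blob but carries label 1

ratios = [r / 10 for r in range(1, 10)]
scores = estimate_cscores(x, y, ratios, n_runs=20, num_classes=2)
print("mislabeled example:", scores[0], "mean of the rest:", scores[1:].mean())
```

In this toy setting the mislabeled example is almost never predicted correctly when held out, so its estimated score should come out far below that of the regular examples.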

The code has the following dependencies:

After running, the code saves the computed C-scores in cscores.npy and exports a figure to mnist-examples.pdf like the one below. The figure shows some MNIST training examples from each of the 10 classes. The left block shows the examples with the highest C-scores, and the right block shows the examples with the lowest C-scores.
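Selecting the highest- and lowest-scoring examples per class could look roughly like the sketch below. It assumes that cscores.npy holds a 1-D array with one score per training example, in the same order as the label array; that layout is an assumption, and the helper name rank_examples_per_class is hypothetical. Synthetic arrays stand in for the saved scores and the MNIST labels so that the snippet runs on its own.

```python
import numpy as np

def rank_examples_per_class(cscores, labels, k=5):
    """Return {class: (top_k_indices, bottom_k_indices)} ranked by C-score."""
    ranked = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        order = idx[np.argsort(cscores[idx], kind="stable")]  # ascending scores
        ranked[int(c)] = (order[::-1][:k], order[:k])
    return ranked

# Synthetic stand-ins; with real outputs one would use
# cscores = np.load("cscores.npy") and the matching MNIST training labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
cscores = rng.uniform(0.0, 1.0, size=1000)

ranked = rank_examples_per_class(cscores, labels, k=3)
highest, lowest = ranked[7]
print("class 7, highest C-scores:", cscores[highest])
print("class 7, lowest C-scores:", cscores[lowest])
```

The returned indices can then be used to pull the corresponding images for a figure with a high-score block and a low-score block.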

MNIST Examples

On a single NVIDIA V100 GPU, with subset ratios of 0.1, 0.2, ..., 0.9 and 200 runs per subset ratio, the demo takes less than 2 hours to run.
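For a sense of scale, the figures above imply the following training budget. This is plain arithmetic on the numbers quoted in the text; since "less than 2 hours" is an upper bound, the per-run time is an upper bound as well.

```python
subset_ratios = [r / 10 for r in range(1, 10)]      # 0.1, 0.2, ..., 0.9
runs_per_ratio = 200

total_runs = len(subset_ratios) * runs_per_ratio    # 9 * 200 = 1800 training runs
budget_seconds = 2 * 60 * 60                        # 2 hours, the quoted upper bound
avg_seconds_per_run = budget_seconds / total_runs   # at most 4 seconds per run

print(total_runs, avg_seconds_per_run)
```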

Note: tensorflow-datasets stores the MNIST examples in a different order from the official MNIST dataset binary.

Pre-computed Scores and Pre-trained Checkpoints

We provide pre-computed C-scores for download. The files are in NumPy's .npz format, exported via numpy.savez. Please see the project website for a detailed description of the file format and download links.
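Files written by numpy.savez can be inspected with numpy.load before relying on any particular array name. The sketch below first writes a small stand-in archive so that it runs on its own; the file name example-cscores.npz and the keys labels and scores are hypothetical, and the real key names are the ones documented on the project website.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in archive (hypothetical file name and keys).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example-cscores.npz")
np.savez(path, labels=np.array([3, 1, 4]), scores=np.array([0.9, 0.2, 0.7]))

# List the arrays stored in the archive, with their shapes and dtypes.
with np.load(path) as archive:
    summary = {name: (archive[name].shape, str(archive[name].dtype))
               for name in archive.files}

for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Listing archive.files first avoids guessing at key names when working with a downloaded file.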

Pre-trained model checkpoints can be found here, along with supporting code to load those models and run evaluations with them.

Disclaimer

This is not an officially supported Google product.

structural-regularity's People

Contributors

pluskid

structural-regularity's Issues

pre-computed MNIST c-scores

Hi, I would like to ask if you could upload the pre-computed c-scores for the MNIST dataset. I could find the cifar10/100 and imagenet ones, and I'm having difficulties running JAX under windows (even with WSL).

Best regards,
Artur
