alejoacelas / arena_2.0_exhibit

Solution to ML assignments from the Alignment Research Engineering Accelerator (ARENA) in-person program

Languages: Python 21.49%, HTML 14.08%, Dockerfile 0.03%, Shell 0.12%, Jupyter Notebook 64.26%, Makefile 0.01%, Batchfile 0.01%
Topics: cuda, mechanistic-interpretability, nlp, pytorch, rl, torch-lightning, transformers

arena_2.0_exhibit's Introduction

This is a frozen copy of the ARENA 2.0 GitHub repo containing the exercises that my pair for the day and I completed as part of the in-person version of the ARENA program. The code we contributed can be found in the answers.py file within the exercises folder for each chapter. The solutions for some days of the curriculum are missing because I wasn't diligent enough about committing our progress every day.

Description of the Curriculum

You can find a summary of each of the chapters below. For more detailed information (including the different ways you can access the exercises), click on the links in the chapter headings.

The material on this page covers the first five days of the curriculum. It can be seen as a grounding in all the fundamentals necessary to complete the more advanced sections of this course (such as RL, transformers, mechanistic interpretability, and generative models).

Some highlights from this chapter include:

  • Building your own 1D and 2D convolution functions
  • Building and loading weights into a Residual Neural Network, and finetuning it on a classification task
  • Working with Weights and Biases to optimise hyperparameters
  • Implementing your own backpropagation mechanism
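As an illustration of the first highlight, here is a minimal sketch of a 1D convolution written in plain Python. It is not the code from this repo's answers.py (the exercises use PyTorch tensors); the function name `conv1d` and its list-based interface are assumptions made for the example. Like the convolution layers in deep learning libraries, it computes a cross-correlation (the kernel is not flipped), with no padding and a stride of 1.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation with stride 1 and no padding.

    Hypothetical helper for illustration; the course exercises operate
    on PyTorch tensors rather than Python lists.
    """
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    # Slide the kernel along the signal; each output is one dot product.
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


if __name__ == "__main__":
    # A simple difference kernel applied to a linear ramp.
    print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]))
```

The output has `len(signal) - len(kernel) + 1` elements, which is the same shape rule that applies along each spatial axis in the 2D case.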

The material on this page covers the next 8 days of the curriculum. It will cover transformers (what they are, how they are trained, how they are used to generate output) as well as mechanistic interpretability (what it is, what are some of the most important results in the field so far, why it might be important for alignment).

Some highlights from this chapter include:

  • Building your own transformer from scratch, and using it to sample autoregressive output
  • Using the TransformerLens library developed by Neel Nanda to locate induction heads in a 2-layer model
  • Finding a circuit for indirect object identification in GPT-2 small
  • Interpreting models trained on toy tasks, e.g. classification of bracket strings, or modular arithmetic
  • Replicating Anthropic's results on superposition
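To make "sampling autoregressive output" concrete, the following is a rough sketch of a temperature-based sampling loop in plain Python. It assumes a model that maps a list of token ids to a list of next-token logits; `toy_model`, `sample_next_token`, and `generate` are hypothetical names for this example and are not taken from the repo, where the model would be a trained transformer.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, subtracting the max for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick the next token id; temperature 0 means greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(next_logits, prompt, max_new_tokens, temperature=1.0, rng=random):
    """Autoregressive loop: each sampled token is fed back in as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        tokens.append(sample_next_token(logits, temperature, rng))
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for a transformer over a 4-token vocabulary.

    It strongly prefers the token that follows the last one (mod 4).
    """
    logits = [0.0] * 4
    logits[(tokens[-1] + 1) % 4] = 5.0
    return logits


if __name__ == "__main__":
    print(generate(toy_model, [0], 8, temperature=1.0, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the highest-logit token, while higher temperatures flatten the distribution and produce more varied output.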

Unlike the first chapter (where all the material was compulsory), this chapter has 4 days of compulsory content and 4 days of bonus content. During the compulsory days you will build and train transformers and get a basic understanding of mechanistic interpretability of transformer models, including induction heads and the use of TransformerLens. For the next 4 days, you have the option to continue with whatever material interests you out of the remaining sets of exercises. There will also be bonus material if you want to leave the beaten track of exercises altogether!

Reinforcement learning is an important field of machine learning. It works by teaching agents to take actions in an environment to maximise their accumulated reward.

In this chapter, you will be learning about some of the fundamentals of RL, and working with OpenAI’s Gym environment to run your own experiments.

Some highlights from this chapter include:

  • Building your own agent to solve the multi-armed bandit problem, implementing methods from Sutton & Barto
  • Implementing a Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to play the CartPole game
  • Applying RLHF to autoregressive transformers like the ones you built in the previous chapter
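The multi-armed bandit highlight can be sketched with an epsilon-greedy agent, one of the standard methods described by Sutton & Barto. This is a self-contained illustration rather than the repo's solution: it does not use Gym, and the function name `run_epsilon_greedy`, the Gaussian reward model, and the default parameters are all assumptions made for the example.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit.

    Hypothetical setup: arm `a` pays a reward drawn from N(true_means[a], 1).
    Returns the value estimates, pull counts, and total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.0, 0.5, 2.0])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

The `epsilon` parameter controls the exploration/exploitation trade-off: with probability `epsilon` the agent tries a random arm, otherwise it pulls the arm it currently believes is best.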

With the advent of large language models, training at scale has become a necessity to create highly competent models. In this chapter we will go through the basics of GPUs and distributed training, along with introductions to libraries that make training at scale easier.

Some highlights from this chapter include:

  • Quantizing your model to INT8 for blazing fast inference
  • Implementing distributed training loops using torch.distributed
  • Getting hands-on with Hugging Face Accelerate and Microsoft DeepSpeed
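For the INT8 highlight, the core numeric idea can be sketched as symmetric "absmax" quantization: scale the values so the largest magnitude maps to 127, round to integers, and multiply by the scale to recover approximate floats. This is one common scheme and not necessarily the one used in the course material; `quantize_int8` and `dequantize` are hypothetical helper names, and the sketch shows only the value mapping, not the speed-up that comes from integer kernels on real hardware.

```python
def quantize_int8(values):
    """Symmetric per-tensor absmax quantization to the range [-127, 127].

    Hypothetical helper for illustration; returns the integer codes and
    the scale needed to map them back to floats.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.8, -1.0, 0.3, 0.0]
    codes, scale = quantize_int8(weights)
    print("codes:", codes)
    print("reconstructed:", dequantize(codes, scale))
```

Each reconstructed value differs from the original by at most about half a quantization step (`scale / 2`), which is the accuracy cost paid for storing one byte per weight.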

arena_2.0_exhibit's People

Contributors

callummcdougall, pranavgade20, alejoacelas, atagade, aprillion
