Build NanoGPT Plus

This repo is mainly a refactored, modularized, and extended version of the original BuildNanoGPT. The BuildNanoGPT project, together with Andrej's step-by-step video, is one of the best learning materials, even for deep learning researchers. I learned a lot from it. The problem with the original project is that it is not modularized, which makes it inconvenient for follow-up use and testing. So I decided to refactor and extend it to make it more modular and easier to use and experiment with.

I include model implementations other than just GPT-2. Currently, I have added an implementation of Llama, taken from hengjiUSTC's learn-llm with small modifications. That implementation appears to be partially copied from the one in transformers. I will add more models in the future.

Below is a list of what has been done in this project:

  • Decouple model implementation, evaluation, and training.

The training script is train.py. The model implementations, the HellaSwag evaluation, and text generation are moved to their own files. This makes it convenient to evaluate a trained model on HellaSwag or to generate text with it.

  • Continue Training.

A partially trained model can now be loaded in train.py to continue training, rather than starting from scratch every time.

  • Add Key Value Cache.

A key-value cache is added for faster inference (only for Llama for now).
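As a rough illustration of why a key-value cache speeds up generation, the sketch below runs single-head attention in plain Python. It is not the code used in this repo: the names `KVCache`, `decode_step`, and `full_recompute` are hypothetical, and the query, key, and value are simply the token vector itself instead of learned projections.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(dim)
    ]


class KVCache:
    """Stores the keys and values of every token processed so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_step(token_vec, cache):
    """Process one new token, reusing cached keys/values of earlier tokens.

    Assumption for brevity: query, key, and value are the token vector
    itself; a real model would apply learned projections first.
    """
    cache.append(token_vec, token_vec)
    return attend(token_vec, cache.keys, cache.values)


def full_recompute(token_vecs):
    """Reference: recompute the last position's output from the whole sequence."""
    return attend(token_vecs[-1], token_vecs, token_vecs)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
for vec in tokens:
    cached_out = decode_step(vec, cache)
```

With the cache, each new token only computes its own key and value and attends over the stored ones, so earlier tokens are never re-encoded. The output for the last position matches a full recomputation over the sequence.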

  • Printing Training Progress and Estimated Time.

Previously, you had to calculate the training progress and completion time yourself. Now both are printed.
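One simple way to estimate the remaining time is to assume that future steps take as long as the average step so far. The helper below is a minimal sketch under that assumption. The name `format_progress` and the output format are hypothetical and are not taken from train.py.

```python
def format_progress(step, max_steps, elapsed_seconds):
    """Return a one-line progress summary with an estimated time remaining.

    `step` is the number of completed steps. The estimate assumes future
    steps take as long as the average step so far.
    """
    if step <= 0:
        return f"step 0/{max_steps} (0.0%) | eta: unknown"
    fraction = step / max_steps
    seconds_per_step = elapsed_seconds / step
    remaining = seconds_per_step * (max_steps - step)
    hours, rem = divmod(int(remaining), 3600)
    minutes, seconds = divmod(rem, 60)
    return (
        f"step {step}/{max_steps} ({fraction:.1%}) | "
        f"eta: {hours:02d}:{minutes:02d}:{seconds:02d}"
    )
```

In a training loop, `elapsed_seconds` would typically be the difference between two `time.monotonic()` readings, one taken before the loop and one taken at the current step.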

  • Add Model Format Conversion.

A trained model can now be converted to the Hugging Face transformers format with the function convert_to_hf from convert_to_hf.py.

  • Support loading any transformers model.

Previously, only a fixed set of pretrained models could be loaded. Now any of them can be loaded, and the config can be read from the transformers model config.

  • Add Comparison Evaluation by an LLM.

Two LLMs can now be compared on text completion, with a well-trained LLM acting as the judge, using auto_evaluation.py.
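The general idea of a judged comparison can be sketched as a pairwise win tally. This is only an illustration and does not describe how auto_evaluation.py works: `compare_models`, `complete_a`, `complete_b`, and `judge` are all hypothetical names, and the judge here is any callable that returns 1 or 2 for the better of two completions.

```python
import random


def compare_models(prompts, complete_a, complete_b, judge, seed=0):
    """Tally pairwise wins for two completion functions.

    `judge(prompt, first, second)` is a hypothetical callable returning
    1 or 2 for the better completion. Presentation order is randomized per
    prompt so that a judge favoring one position does not favor one model.
    """
    rng = random.Random(seed)
    wins = {"a": 0, "b": 0}
    for prompt in prompts:
        out_a, out_b = complete_a(prompt), complete_b(prompt)
        a_first = rng.random() < 0.5
        first, second = (out_a, out_b) if a_first else (out_b, out_a)
        choice = judge(prompt, first, second)
        # Model A wins if the judge picked the slot that A was shown in.
        a_won = (choice == 1) == a_first
        wins["a" if a_won else "b"] += 1
    return wins
```

In practice the judge would wrap a call to a strong LLM. For a quick check, any stand-in works, for example a function that prefers the longer completion.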

  • Add Training of Transformers CausalLM Models.

train.py can now train a transformers CausalLM model.

  • Add Model Loading and Generation.

The function load_model loads a trained model, and the function complete generates text with it. Both are in utils.py.

  • Plot All Training Logs.

Previously, only the loss was plotted. Now, all the training logs printed to the command line can be plotted with plot_console_log.py.
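To give a feel for how console logs can be turned into plottable series, here is a minimal parsing sketch. The log line format, with `|`-separated `name: value` fields, is an assumption for illustration, and `parse_log_line` and `collect_series` are hypothetical names, not functions from plot_console_log.py.

```python
import re

# Assumed format: fields like "loss: 3.25" or "lr 6.0e-04" separated by "|".
FIELD = re.compile(r"([A-Za-z/_]+):?\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Extract every numeric field from one console log line into a dict."""
    return {name: float(value) for name, value in FIELD.findall(line)}


def collect_series(lines):
    """Group values by field name as (step, value) pairs, ready for plotting.

    Lines without a "step" field are skipped.
    """
    series = {}
    for line in lines:
        fields = parse_log_line(line)
        if "step" not in fields:
            continue
        step = fields.pop("step")
        for name, value in fields.items():
            series.setdefault(name, []).append((step, value))
    return series
```

Each entry of the returned dict can then be passed to a plotting library, with one curve per field name.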

TODO:

  • Add support for training with transformers' CausalLM model.
  • Add more models.
  • Add more evaluation datasets.
  • Add more training datasets.

License

This project is licensed under the terms of the Apache 2.0 License.

Discord Server

Join our Discord server here.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
