Build NanoGPT Plus

This repo is mainly a refactored, modularized, and extended version of the original BuildNanoGPT. The BuildNanoGPT project, together with Andrej's step-by-step video, is one of the best learning materials available, even for deep learning researchers, and I learned a lot from it. The problem with the original project is that it is not modularized, which makes it inconvenient for follow-up use and testing. So I decided to refactor and extend it to make it more modular and easier to use and experiment with.

I include model implementations other than GPT-2. Currently, I have added an implementation of Llama, taken from hengjiUSTC's learn-llm with small modifications. (That implementation appears to be partially copied from the one in transformers.) I will add more models in the future.

Below is a list of what has been done in this project:

Decouple model implementation, evaluation, and training.

The training script is train.py; the model implementation, the HellaSwag evaluation, and text generation have been moved to their own files. This makes it convenient to evaluate a trained model on HellaSwag or to generate text with it.

Continue Training.

A partially trained model can now be loaded in train.py to continue training, rather than starting from scratch every time.

Add Key Value Cache.

A key-value cache is now used for faster inference (currently only for Llama).
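To illustrate the idea behind a key-value cache, here is a minimal single-head sketch in plain Python. It is not this repo's Llama implementation: the names `KVCache`, `attend`, and `decode_step` are hypothetical, and the query, key, and value vectors are passed in directly rather than produced by learned projections. The point is only that each decoding step appends one new key and value and attends over the stored ones, instead of recomputing keys and values for the whole prefix.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


class KVCache:
    """Hypothetical cache holding the keys and values of all past positions."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(q, k, v, cache):
    """Process one new token: store its key/value, then attend over the cache."""
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)
```

In a real model the cache would typically hold one such pair of tensors per layer and per attention head, and causal masking comes for free because the cache only ever contains past and current positions.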

Printing Training Progress and Estimated Time.

Previously, you had to calculate the training progress and estimated completion time yourself. Now they are printed during training.
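One simple way to produce such an estimate is to extrapolate from the average time per step so far. The sketch below assumes 0-indexed steps and a known total step count; the function names `progress_and_eta` and `format_duration` are hypothetical and are not taken from train.py.

```python
def progress_and_eta(step, max_steps, elapsed_seconds):
    """Return (percent_done, eta_seconds) after finishing 0-indexed `step`.

    The remaining time is extrapolated from the average step time so far.
    """
    steps_done = step + 1
    avg_step_time = elapsed_seconds / steps_done
    remaining_steps = max(max_steps - steps_done, 0)
    percent_done = 100.0 * steps_done / max_steps
    return percent_done, remaining_steps * avg_step_time


def format_duration(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, `progress_and_eta(49, 100, 100.0)` returns `(50.0, 100.0)`, and `format_duration(3725)` returns `"01:02:05"`. An average over all steps is slow to react if the step time changes during training; a moving average over recent steps is a common alternative.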

Add Model Format Conversion.

The trained model can now be converted to the Hugging Face transformers format with the function convert_to_hf from convert_to_hf.py.

Support loading any transformers model.

Previously, only a fixed set of pretrained models could be loaded. Now any transformers model can be loaded, and the config can be read from the transformers model config.

Add Comparison Evaluation by an LLM.

Now, completions from two LLMs can be compared, with a well-trained LLM acting as the judge, using auto_evaluation.py.
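The general shape of such a pairwise comparison can be sketched as follows. This is an assumption-laden illustration, not the logic of auto_evaluation.py: `compare_models`, the three callables it takes, and the "first"/"second"/"tie" verdict protocol are all hypothetical. Alternating the presentation order between prompts is one simple way to reduce position bias in the judge.

```python
from collections import Counter


def compare_models(prompts, complete_a, complete_b, judge):
    """Tally judge verdicts over completions from two models.

    complete_a / complete_b: callables mapping a prompt to a completion.
    judge: callable (prompt, first, second) -> "first", "second", or "tie".
    """
    tally = Counter()
    for i, prompt in enumerate(prompts):
        a, b = complete_a(prompt), complete_b(prompt)
        swapped = i % 2 == 1  # show model B first on odd-numbered prompts
        first, second = (b, a) if swapped else (a, b)
        verdict = judge(prompt, first, second)
        if verdict not in ("first", "second", "tie"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        if verdict == "tie":
            tally["tie"] += 1
        elif (verdict == "first") != swapped:
            tally["a"] += 1  # the preferred completion came from model A
        else:
            tally["b"] += 1
    return tally
```

In practice, `judge` would wrap a call to the stronger LLM with a prompt asking which completion is better, and the completions would come from the two models under comparison.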

Add Training Transformers CausalLM model.

Now, transformers' CausalLM models can be trained in train.py.

Add Model Loading and Generation.

The function load_model loads a trained model, and the function complete generates text with it. Both are in utils.py.

Plot All Training Logs.

Previously, only the loss was plotted. Now, all the training logs printed to the command line can be plotted with plot_console_log.py.
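As a rough illustration of how console output can be turned into plottable series, the sketch below parses lines in a format modeled on the original BuildNanoGPT training loop (`step ... | loss: ... | lr ... | norm: ... | dt: ...ms | tok/sec: ...`). Both the line format and the function names `parse_log_line` and `collect_series` are assumptions; plot_console_log.py may work differently.

```python
import re

# Matches a field name, an optional colon, and a number (units such as "ms"
# after the number are ignored).
FIELD_RE = re.compile(r"([A-Za-z_/]+):?\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Parse one 'step ... | name: value | ...' line into a dict, else None."""
    if not line.lstrip().startswith("step"):
        return None
    record = {}
    for part in line.split("|"):
        match = FIELD_RE.search(part)
        if match:
            record[match.group(1)] = float(match.group(2))
    return record or None


def collect_series(lines):
    """Group parsed values into {metric_name: [(step, value), ...]}."""
    series = {}
    for line in lines:
        record = parse_log_line(line)
        if record is None or "step" not in record:
            continue
        step = int(record.pop("step"))
        for name, value in record.items():
            series.setdefault(name, []).append((step, value))
    return series
```

Each resulting series is a list of `(step, value)` pairs, which can be passed to any plotting library, one curve per metric.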

TODO:

Add support for training with transformers' CausalLM model.
Add more models.
Add more evaluation datasets.
Add more training datasets.

License

This project is licensed under the terms of the Apache 2.0 License.

Discord Server

Join our Discord server here.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
