
FLM-101B: An Open LLM and How to Train It with $100K Budget

FLM-101B is an open-source decoder-only LLM with 101 billion parameters. A model growth technique was employed during training: the model rapidly acquires knowledge at a small scale (16B) in the early stages of training and gradually grows to 101B, resulting in a cost-effective 100B-scale training run (costing approximately $100,000). FLM-101B supports both Chinese and English. It was trained with a context window of 2048 tokens; thanks to the xPos rotary position embedding, the window can be efficiently extended at inference time.
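
The snippet below is a minimal NumPy sketch of the xPos idea (rotary position embedding combined with a per-dimension exponential decay applied to queries and keys), included only to illustrate why the attention pattern extrapolates to longer windows at inference time. It follows the published xPos formulation with commonly used defaults (gamma = 0.4, scale_base = 512) and is not the implementation used in this repository.

import numpy as np

def xpos_scale(positions, head_dim, gamma=0.4, scale_base=512):
    # Per-dimension-pair decay base: zeta_i = (i/(d/2) + gamma) / (1 + gamma).
    pair_frac = np.arange(0, head_dim, 2) / head_dim              # (head_dim/2,)
    zeta = (pair_frac + gamma) / (1.0 + gamma)
    # Position n scales dimension pair i by zeta_i ** (n / scale_base).
    return zeta[None, :] ** (positions[:, None] / scale_base)     # (seq, head_dim/2)

def rotate_half(x):
    # (x1, x2) -> (-x2, x1) for each even/odd dimension pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return np.stack([-x2, x1], axis=-1).reshape(x.shape)

def apply_xpos(q, k, positions, theta_base=10000.0, scale_base=512):
    # q, k: (seq_len, head_dim) projections for one attention head;
    # positions: np.arange(seq_len) or any integer position array.
    head_dim = q.shape[-1]
    inv_freq = 1.0 / theta_base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = positions[:, None] * inv_freq[None, :]                # (seq, head_dim/2)
    cos = np.repeat(np.cos(angles), 2, axis=-1)
    sin = np.repeat(np.sin(angles), 2, axis=-1)
    scale = np.repeat(xpos_scale(positions, head_dim, scale_base=scale_base), 2, axis=-1)
    # Queries get zeta**n and keys get zeta**(-n), so the score q_m . k_n
    # decays smoothly with the relative distance m - n in causal attention.
    q_rot = (q * cos + rotate_half(q) * sin) * scale
    k_rot = (k * cos + rotate_half(k) * sin) / scale
    return q_rot, k_rot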

To advance the development of 100B-scale Large Language Models (LLMs), FLM-101B has now been fully open-sourced.

As part of the FLM-101B open-source project, this code repository is the training framework for FLM-101B. It is a detached fork of Megatron-LM. On top of it, we have carried out some development and optimization, including but not limited to: the FreeLM training method, loss prediction, xPos, and a new configurable dataloader system (the general idea of a weighted data mixture is sketched below).
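
As a rough illustration of what a configurable data mixture does, the sketch below samples datasets according to configured weights. The dataset names, weights, and config structure are hypothetical and do not reflect this repository's actual configuration format.

import numpy as np

# Hypothetical mixture config; names and weights are placeholders.
data_mixture = [
    {"name": "english_web", "weight": 0.55},
    {"name": "chinese_web", "weight": 0.35},
    {"name": "code",        "weight": 0.10},
]

def sample_dataset(mixture, rng):
    # Choose which dataset to read the next sample from, proportionally to its weight.
    weights = np.array([d["weight"] for d in mixture], dtype=float)
    weights /= weights.sum()
    return mixture[rng.choice(len(mixture), p=weights)]["name"]

rng = np.random.default_rng(0)
print([sample_dataset(data_mixture, rng) for _ in range(5)])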

We have prepared a comprehensive code repository that is essential to our paper's main claims. The reproduction code has been released at https://huggingface.co/CofeAI/FLM-101B, and this repository (FLM-101B) contains the pre-training code. We intend to make the complete repository publicly accessible upon the publication of our paper.

Why use FLM-101B

  • It's an open-source 100B-scale Chinese-English bilingual model.
  • It's the largest known language model trained with xPos.
  • It's the largest known language model that successfully implements μP transfer and loss prediction (a sketch of the loss-prediction idea follows this list).
  • It's the largest known language model that successfully implements progressive learning with model growth.
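
The sketch below illustrates only the general idea behind loss prediction: with μP keeping optimal hyperparameters stable across model widths, one can fit a scaling-law curve on a few cheap small-scale proxy runs and extrapolate the loss of the full-size model before training it. It assumes SciPy is available; the functional form is a common power law and the data points are placeholders, not numbers from the paper.

import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, b, c):
    # L(N) = a * N^(-b) + c, a common scaling-law form for loss vs. model size.
    return a * np.power(n_params, -b) + c

# Placeholder (size in billions of parameters, final loss) pairs from
# hypothetical small proxy runs; NOT numbers from the paper.
proxy_sizes = np.array([0.1, 0.5, 1.0, 3.0])
proxy_losses = np.array([3.80, 3.20, 2.95, 2.70])

fit, _ = curve_fit(power_law, proxy_sizes, proxy_losses, p0=(1.0, 0.3, 2.0))
print(f"predicted loss at 101B parameters: {power_law(101.0, *fit):.3f}")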

Parallel Strategies & HFU (Hardware FLOPs Utilization)

This training framework supports training models at the three parameter scales reported in our paper: 16B, 51B, and 101B. In principle it can also support TB-scale models, although convergence at that scale has not been confirmed experimentally. We conducted the actual training experiments on a cluster of 24 nodes with 8 NVIDIA A800 GPUs each (192 GPUs in total). The following table shows the parallel strategy actually configured for FLM-101B and the achieved floating-point efficiency (the GPU Utilization column can be reproduced as sketched after the table).

Params (billion)  TP Size  PP Size  DP Size  Number of GPUs  Batch Size  TFLOP/s per GPU  GPU Utilization
16                2        1        96       192             2304        162              51.90%
51                4        2        24       192             2304        160              51.30%
101               4        4        12       192             2160        165              52.88%
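
The GPU Utilization column is hardware FLOPs utilization: achieved per-GPU throughput divided by the GPU's peak throughput. The short sketch below reproduces those figures, assuming 312 TFLOP/s as the dense BF16/FP16 peak of an A100/A800 (small differences from the table come from rounding).

PEAK_TFLOPS_A800 = 312.0   # assumed dense BF16/FP16 peak of an A100/A800 GPU

def hfu(achieved_tflops_per_gpu, peak_tflops=PEAK_TFLOPS_A800):
    # Hardware FLOPs utilization = achieved throughput / peak throughput.
    return achieved_tflops_per_gpu / peak_tflops

for params_b, achieved in [(16, 162), (51, 160), (101, 165)]:
    print(f"{params_b}B: {hfu(achieved):.2%}")   # ~51.9%, ~51.3%, ~52.9%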

System Requirements

Hardware requirements

FLM-101B is designed to be trained on a cluster equipped with NVIDIA A100/A800 GPUs. To support large-model training, we recommend a configuration of at least:

  • Total number of cluster nodes: 16
  • GPUs per node: 8
  • Memory per node: 1TB
  • Inter-GPU communication solution: NVLink
  • Inter-Node communication solution: InfiniBand (400/800 Gb/s)

Software requirements

OS Requirements

This package is supported on Linux and has been tested on Ubuntu 20.04.

Python Dependencies

FLM-101B mainly depends on the Python packages listed in requirements.txt and on the NVIDIA apex package.

Installation Guide

  1. Install apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check .  2>&1 | tee build.log
  2. Install cofe-ai/FLM-101B
git clone https://github.com/cofe-ai/FLM-101B.git
cd FLM-101B
pip install -r requirements.txt
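
As an optional sanity check, you can verify that apex was built with its CUDA extensions; the module names below assume a standard apex --cuda_ext build.

python -c "import apex, amp_C, fused_layer_norm_cuda; print('apex CUDA extensions OK')"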

Important Methods

Citation

If you find our work useful, please consider citing FLM-101B:

@article{flm-101b,
  author       = {Xiang Li and Yiqun Yao and Xin Jiang and Xuezhi Fang and Xuying Meng and
                  Siqi Fan and Peng Han and Jing Li and Li Du and Bowen Qin and Zheng Zhang and
                  Aixin Sun and Yequan Wang},
  title        = {FLM-101B: An Open LLM and How to Train It with \$100K Budget},
  year         = {2023}
}

You may also consider citing FreeLM's original work:

@article{freelm,
  author       = {Xiang Li and Xin Jiang and Xuying Meng and Aixin Sun and Yequan Wang},
  title        = {FreeLM: Fine-Tuning-Free Language Model},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.01616}
}

Contact

[email protected]

License

This project is covered under the Apache 2.0 License.

flm-101b's Issues

Some thoughts on optimizing FLM-101B training framework

I truly appreciate the FLM-101B team open-sourcing this large-scale language model! After reading the paper, I also have some thoughts on optimizing the training framework, mainly in these aspects:

  • Progressive data selection strategy: use different datasets for models at different scales to achieve gradual enhancement.
  • Parameter-update-driven growth: insert new layers based on how each layer's parameters are updating.
  • Layer-wise learning rates: set independent learning rates for different layers.
  • Genetic-algorithm-based model expansion.
  • Incremental fine-tuning for transfer learning.

I drafted a document elaborating these ideas in detail. If the team finds it relevant, I would be happy to discuss these training-framework optimizations further. Please reply to this issue or contact me via [email protected].

Again, thank you for the contributions of the FLM-101B team!
