
Bamboo

We introduce Bamboo-v0.1, a new 7B LLM that combines high activation sparsity with performance comparable to Mistral-7B. This repo provides the details of the model.

Models

| Model | Transformers (HF) | PowerInfer/llama.cpp (GGUF) |
|---|---|---|
| Bamboo-7B-base-v0.1 | Bamboo-base-v0.1 | Bamboo-base-v0.1-gguf |
| Bamboo-7B-DPO-v0.1 | Bamboo-DPO-v0.1 | Bamboo-DPO-v0.1-gguf |
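
For quick experimentation, the Transformers checkpoints load like any causal LM. Below is a minimal sketch; the Hub repo id `PowerInfer/Bamboo-base-v0.1` is our assumption based on the names above, so substitute the actual id if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/Bamboo-base-v0.1"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom sparse-FFN code may ship with the repo
)

inputs = tokenizer("Bamboo is a 7B LLM that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```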

Performance at Different Sparsity Levels

Recent studies (Zhang et al., 2024) have shown that LLMs exhibit activation sparsity: keeping only the top-k activated neurons in each layer preserves most of the model's quality. In this subsection, we report Bamboo's performance at different sparsity levels, using activation-magnitude thresholding to select neurons (a sketch of the thresholding follows the table). We evaluate perplexity on wikitext-2-raw-v1.

| Top-k Neurons | PPL |
|---|---|
| 100% | 6.484 |
| 20% | 6.484 |
| 15% | 6.485 |
| 12% | 6.497 |
| 10% | 6.524 |
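
The thresholding step itself is simple. Here is a minimal PyTorch sketch of top-k activation-magnitude masking; it is an illustration under our own assumptions (including the hypothetical FFN width), not the authors' implementation.

```python
import torch

def topk_mask(activations: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zero all but the top-k% of neurons by absolute activation value.

    activations: (..., num_neurons) FFN hidden states after the activation fn.
    keep_ratio:  fraction of neurons kept, e.g. 0.15 for the 15% row above.
    """
    k = max(1, int(activations.shape[-1] * keep_ratio))
    # Per-token threshold: the magnitude of the k-th largest activation.
    thresh = activations.abs().topk(k, dim=-1).values[..., -1:]
    return activations * (activations.abs() >= thresh)

# Example: keep 15% of a hypothetical 11008-wide FFN for a batch of tokens.
x = torch.randn(4, 11008)
sparse_x = topk_mask(x, 0.15)  # the rest of the FFN consumes sparse_x
```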

CDF of Neuron Activation Distribution

Here we report the CDF of the neuron activation distribution in every FFN layer of Bamboo-7B-base-v0.1. We profile neuron activations on the cosmopedia dataset over roughly 1M tokens; a profiling sketch follows the figure.

[Figure: CDF of neuron activation distribution across FFN layers]
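
A profile like this can be collected with a forward hook on each FFN activation. The sketch below is our own illustration: the module path and the `> 0` "active" criterion are assumptions, not the authors' exact profiling code.

```python
import torch

counts, total_tokens = None, 0

def count_activations(module, inputs, output):
    """Accumulate, per neuron, how many tokens activate it."""
    global counts, total_tokens
    act = output.flatten(0, -2)           # (tokens, num_neurons)
    fired = (act > 0).sum(dim=0)          # ReLU-style "active" criterion
    counts = fired if counts is None else counts + fired
    total_tokens += act.shape[0]

# Hypothetical module path; adapt to the real model definition:
# handle = model.model.layers[0].mlp.act_fn.register_forward_hook(count_activations)
# ... run ~1M tokens of cosmopedia through the model, then handle.remove() ...

# CDF over neurons, sorted from most- to least-frequently activated:
freqs = (counts.float() / total_tokens).sort(descending=True).values
cdf = freqs.cumsum(0) / freqs.sum()
```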

Model Performance

Our evaluation is based on the lm-evaluation-harness framework (see the reproduction sketch after the table). The evaluation details are as follows:

  • Hugging Face Open LLM Leaderboard tasks.
  • Other popular benchmarks: we also report accuracies on Big-Bench Hard (BBH, 3-shot) and HumanEval.
| Model | Average | MMLU | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH |
|---|---|---|---|---|---|---|---|---|---|
| Bamboo | 57.1 | 63.89 | 76.16 | 44.06 | 82.17 | 52.84 | 62.20 | 25.6 | 50.35 |
| Mistral-v0.1 | 56.5 | 62.65 | 79.24 | 42.62 | 83.32 | 40.18 | 61.43 | 26.21 | 56.35 |
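
As a reproduction sketch, the harness's v0.4+ Python API can be driven as below. The Hub repo id and the task list are our assumptions, and the Leaderboard uses task-specific few-shot settings, so exact numbers may differ.

```python
import lm_eval

# The "hf" backend loads the model through Transformers; repo id is assumed.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=PowerInfer/Bamboo-base-v0.1,trust_remote_code=True",
    tasks=["mmlu", "winogrande", "truthfulqa_mc2",
           "hellaswag", "gsm8k", "arc_challenge"],
)
print(results["results"])
```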

Inference Efficiency

[Demo video: bamboo-live-demo.mp4]

In the demo above, both PowerInfer and llama.cpp run on the same hardware: an Intel Core i7-13700 (8 threads) and an NVIDIA RTX 2080 Ti (11GB).

Below is a detailed comparison of inference speeds (tokens/second) achieved on Bamboo-7B-base with PowerInfer and llama.cpp across various hardware configurations; a timing sketch follows the table.

| Scenario | Hardware | PowerInfer (tokens/s) | llama.cpp (tokens/s) | Speedup |
|---|---|---|---|---|
| CPU+GPU Hybrid | Core i7-13700 (8T) + RTX 2080 Ti (11GB) | 33.50 | 7.64 | 4.38x |
| Full GPU | RTX 4090 (24GB) | 92.46 | 58.34 | 1.58x |
| Full CPU | Core i9-13900K (8T) | 9.94 | 4.78 | 2.08x |
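
To reproduce a rough throughput number, both engines can be timed from the command line; PowerInfer inherits llama.cpp's CLI, and both print their own eval-time tokens/second, which is the more precise figure. The sketch below wraps such a run in Python; the binary and model paths are assumptions for illustration.

```python
import subprocess
import time

cmd = [
    "./build/bin/main",                        # PowerInfer/llama.cpp binary (assumed path)
    "-m", "Bamboo-base-v0.1-gguf/model.gguf",  # assumed local GGUF path
    "-n", "128",                               # tokens to generate
    "-t", "8",                                 # CPU threads, matching the 8T setups above
    "-p", "Once upon a time",
]
start = time.time()
subprocess.run(cmd, check=True)
print(f"wall time: {time.time() - start:.1f}s for 128 new tokens")
```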

Contamination Results

Here we report our contamination results using https://github.com/fblgit/detect-pretrain-code-contamination/tree/winofix, with llama-2-7b as the reference model. A score greater than 0.85 indicates that the model has very likely been trained on the dataset.

| Model | TruthfulQA | Winogrande | ARC | MMLU | Hellaswag | GSM8K |
|---|---|---|---|---|---|---|
| Bamboo | 0.22 | 0.02 | 0.08 | 0.24 | 0.02 | 0.99 |
| Mistral-v0.1 | 0.45 | 0.03 | 0.08 | 0.24 | 0.04 | 0.91 |

Note that GSM8K often scores very high with this toolkit; see https://huggingface.co/spaces/Yeyito/llm_contamination_detector.
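
For intuition, the toolkit builds on membership statistics in the Min-K% Prob family: text a model has memorized contains unusually few low-probability tokens. Below is a minimal sketch of that statistic only (our illustration, not the toolkit's full pipeline, which additionally normalizes against the reference model).

```python
import torch

def min_k_prob(model, tokenizer, text: str, k: float = 0.2) -> float:
    """Mean log-prob of the k% least-likely tokens under the model.

    Higher (less negative) values suggest the text may have been seen in
    training. Sketch of the Min-K% Prob statistic; the toolkit also
    compares scores against a reference model (llama-2-7b above).
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = logits[:, :-1].log_softmax(dim=-1)        # predict token t+1
    token_lp = logprobs.gather(-1, ids[:, 1:, None]).squeeze(-1)[0]
    n = max(1, int(len(token_lp) * k))
    return token_lp.sort().values[:n].mean().item()      # lowest k% only
```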

Limitations

  • Having been trained on only 200B tokens, Bamboo may still exhibit performance gaps on certain tasks.
  • Bamboo has been trained only on English-language datasets, so its capabilities in other languages remain limited.
  • The model may produce unexpected outputs due to its small size and the probabilistic nature of generation.

Future Work

  • A sparsely activated model at the level of Mixtral 8x7B
  • Better 7B base and chat models

License

The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial use.

Citation

Please kindly cite using the following BibTeX:

@misc{bamboo,
    title={Bamboo: Harmonizing Sparsity and Performance in Large Language Models},
    author={Yixin Song and Haotong Xie and Zeyu Mi and Li Ma and Haibo Chen},
    year={2024}
}


Issues

Non-ReLU LLM inference: sparse-activation speedup

@YixinSong-e @ZeyuMi Excellent work! By the way, have you compared the inference speedup on non-ReLU LLMs, such as the original Mistral-7B or LLaMA-7B? If non-ReLU LLMs are also sparse to some degree (Figure 1 in the ReLU² Wins paper), maybe we can accelerate inference directly through sparse activation, without the extra, expensive ReLUfication step.
So I wonder: have you run any non-ReLU LLM experiments? If the results were not positive, which factor is key?

  1. Non-ReLU LLMs are not sparse enough to yield a significant speedup.
  2. It is harder to predict the activation sparsity of non-ReLU LLMs.

Or something else? Thank you!
