bipe's Introduction

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation 🔥, ICML 2024

News

🔥May 2 2024: BiPE is accepted to ICML 2024!

🔥Apr 11 2024: Release a 1.6B BiPE-RoPE model pre-trained on 300B tokens, demonstrating consistent extrapolation ability comparable to that of the 151M-parameter version.

🔥Apr 4 2024: Initial commits. More code (YaRN finetuning, SCROLLS finetuning) is coming soon.

Overview

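This repository contains the source code for the ICML 2024 paper "Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation".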

Figure: Overview of BiPE

Setup Environment

conda create -n bipe python=3.9
conda activate bipe
pip3 install -r requirements.txt

Data for Pretraining

For pretraining, we use the Pile with all copyrighted data removed.

cd BiPE;
DATA_DIR=./data # the directory to save the data
python3 download_data.py --dataset-cache-dir $DATA_DIR

Pretraining

The scripts under script/ cover the commands for training and perplexity evaluation.

For training, the key modification in BiPE is the get_bilevel_ids function, which computes token ids (intra-segment positions) and position ids (inter-segment positions). The token ids are then used to obtain absolute positional encodings (via get_ape_embeddings), and the position ids are used to obtain relative positional encodings. For example, you can start training a 151M BiPE-RoPE model with the following command:

cd BiPE
OUTPUT_DIR=./output  # path to save checkpoints and tensorboard
DATA_DIR=./data  # path to load data
CONFIG_NAME=config/bipe_rope.json
bash script/train.sh

You can change CONFIG_NAME to choose a different positional encoding variant (one of config/bipe_rope.json, config/bipe_alibi.json, config/rope.json, config/alibi.json).
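To make the two kinds of ids concrete, here is a minimal sketch of how bilevel ids could be derived from a token sequence. It assumes that segments end at separator tokens and that a separator belongs to the segment it closes. The function name bilevel_ids_sketch, the sep_ids argument, and the example token ids are hypothetical; this is not the repository's get_bilevel_ids.

```python
from typing import Iterable, List, Sequence, Tuple


def bilevel_ids_sketch(
    input_ids: Sequence[int], sep_ids: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Return (token_ids, position_ids) for one sequence.

    token_ids:    index of each token inside its segment (intra-segment).
    position_ids: index of the segment each token belongs to (inter-segment).
    Assumption: a segment ends at any token whose id is in sep_ids.
    """
    seps = set(sep_ids)
    token_ids: List[int] = []
    position_ids: List[int] = []
    intra, segment = 0, 0
    for tok in input_ids:
        token_ids.append(intra)
        position_ids.append(segment)
        if tok in seps:
            # The separator closes the current segment; start a new one.
            intra = 0
            segment += 1
        else:
            intra += 1
    return token_ids, position_ids


# Hypothetical example: id 0 plays the role of the separator token.
example_ids = [5, 6, 0, 7, 8, 9, 0, 4]
tok_ids, pos_ids = bilevel_ids_sketch(example_ids, sep_ids=[0])
print(tok_ids)  # [0, 1, 2, 0, 1, 2, 3, 0]
print(pos_ids)  # [0, 0, 0, 1, 1, 1, 1, 2]
```

In this sketch the intra-segment index resets at every segment boundary, while the segment index only grows by one per segment, so the two id sequences vary on different scales.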

Perplexity Evaluation

For perplexity evaluation, you can use the following command:

cd BiPE;
DATA_DIR=./data  # path to load data
MODEL=./bipe_rope # model checkpoint path
bash script/eval.sh
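The internals of script/eval.sh are not shown here. As a reminder of what the reported metric means, perplexity is the exponential of the mean per-token negative log-likelihood. A minimal sketch, where perplexity_from_nll is a hypothetical helper and not part of this repository:

```python
import math
from typing import Sequence


def perplexity_from_nll(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    token_nlls holds one natural-log NLL value per evaluated token.
    """
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 1/4 to every token has a
# perplexity of (approximately, up to floating point) 4.
nlls = [math.log(4.0)] * 10
print(perplexity_from_nll(nlls))
```

Lower values are better: a perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k tokens at each step.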

You can also download our pre-trained models (note that the 1.6B model was pre-trained with a batch size of 1024):

Model HuggingFace Checkpoint 🤗
BiPE_RoPE-151M link
BiPE_RoPE-1.6B link
RoPE-151M link
BiPE_ALiBi-151M link
ALiBi-151M link

For example, to evaluate BiPE-RoPE-151M, you can use the following command:

git lfs install
git clone https://huggingface.co/hzy00/BiPE_RoPE-151M
DATA_DIR=./data  # path to load data
MODEL=./BiPE_RoPE-151M # model checkpoint path
bash script/eval.sh

Citations


@inproceedings{
he2024two,
title={Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation},
author={Zhenyu He and Guhao Feng and Shengjie Luo and Kai Yang and Liwei Wang and Jingjing Xu and Zhi Zhang and Hongxia Yang and Di He},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=luqH1eL4PN}
}

bipe's Issues

It seems that bipe_alibi is not working yet.

get_ape_embeddings returns a tuple, unlike embed_tokens.

None of the code from this point on works:

if self.config.rpe_type == "bipe_alibi":
    inputs_embeds = self.get_ape_embeddings(torch.stack([input_ids, token_ids], dim=-1))
else:
    inputs_embeds = self.embed_tokens(input_ids)
