rerope's Introduction

Rectified Rotary Position Embeddings (ReRoPE)

Using ReRoPE, we can extend the context length of LLMs more effectively, without any fine-tuning.

Blog

Idea

Results

Loss computed on llama2-13b with samples_15k.jsonl:

Method                           Loss
RoPE-4k (original llama2-13b)    1.4967
RoPE-8k (original llama2-13b)    8.8615
NTK-RoPE-4k (not dynamic)        1.6081
NTK-RoPE-8k (not dynamic)        1.5417
NTK-RoPE-16k (not dynamic)       1.5163
ReRoPE-w1024-4k                  1.4996
ReRoPE-w1024-8k                  1.4267
ReRoPE-w1024-16k                 1.4001

At the training length (4k), ReRoPE's loss is almost unchanged from the original model, and it has the ideal property of "longer context, lower loss".
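
The trend in the table can be checked with a few lines of arithmetic. The sketch below hard-codes the reported losses and computes each method's change relative to the RoPE-4k baseline; the dictionary layout and variable names are illustrative and not part of the repository.

```python
# Losses copied from the results table (llama2-13b, samples_15k.jsonl).
losses = {
    "RoPE-4k": 1.4967,
    "RoPE-8k": 8.8615,
    "NTK-RoPE-4k": 1.6081,
    "NTK-RoPE-8k": 1.5417,
    "NTK-RoPE-16k": 1.5163,
    "ReRoPE-w1024-4k": 1.4996,
    "ReRoPE-w1024-8k": 1.4267,
    "ReRoPE-w1024-16k": 1.4001,
}

baseline = losses["RoPE-4k"]

# Change in loss relative to the unmodified model at its training length.
deltas = {name: round(loss - baseline, 4) for name, loss in losses.items()}

for name, delta in deltas.items():
    print(f"{name:<18} {delta:+.4f}")

# "Longer context, lower loss": ReRoPE losses should fall as context grows.
rerope = [losses[f"ReRoPE-w1024-{n}k"] for n in (4, 8, 16)]
rerope_monotone = all(a > b for a, b in zip(rerope, rerope[1:]))
```

A positive delta means the method is worse than the original model at 4k, a negative one means it is better.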

Usage

Dependency: transformers 4.31.0

Run python test.py to test chatting or run python eval_loss.py to calculate loss with llama2.
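
Since the only stated dependency is a pinned transformers version, it may be worth checking the installed version before running either script. The helper below is a minimal sketch; the function names are hypothetical and are not part of this repository.

```python
from importlib import metadata

REQUIRED = "4.31.0"  # version given in the Dependency line


def version_matches(installed, required=REQUIRED):
    """Return True when the installed version string equals the required one."""
    return installed is not None and installed.strip() == required


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if version_matches(found):
        print(f"transformers {found}: OK")
    else:
        print(f"transformers {found} found, but {REQUIRED} is expected")
```

The exact-match comparison is deliberately strict; whether neighbouring versions also work is not stated in the README.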

From here and here, we can see what modifications ReRoPE/Leaky ReRoPE has made compared to the original llama implementation.
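
As a rough illustration of what the modification amounts to: the patch code quoted in the issues clips relative positions with .clip(max=window). The sketch below builds the causal relative-position matrix for plain RoPE, for ReRoPE, and for a Leaky variant. The Leaky form (step 1/k beyond the window) is an assumption about the method rather than something stated on this page, and the function name is hypothetical.

```python
def relative_positions(seq_len, window=None, k=None):
    """Build the causal relative-position matrix seen by the attention scores.

    window=None        -> plain RoPE: position is the distance i - j.
    window=w, k=None   -> ReRoPE: distances are clipped at w.
    window=w, k=k      -> Leaky ReRoPE (assumed form): beyond w the step is 1/k.
    """
    rows = []
    for i in range(seq_len):
        row = []
        for j in range(i + 1):  # causal: only keys at or before the query
            d = i - j
            if window is not None and d > window:
                d = window if k is None else window + (d - window) / k
            row.append(d)
        rows.append(row)
    return rows


if __name__ == "__main__":
    settings = [
        ("RoPE", {}),
        ("ReRoPE", {"window": 2}),
        ("Leaky ReRoPE", {"window": 2, "k": 4}),
    ]
    for name, kwargs in settings:
        # Show the last row: distances from the final query to every key.
        print(name, relative_positions(5, **kwargs)[-1])
```

With ReRoPE, no relative position ever exceeds the window, which is why positions unseen during training never appear.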

Other

Triton Implementation of ReRoPE: https://gist.github.com/chu-tianxiang/4307937fd94b49c75b61a6967716bae9

Cite

@misc{rerope2023,
  title={Rectified Rotary Position Embeddings},
  author={Jianlin Su},
  year={2023},
  howpublished={\url{https://github.com/bojone/rerope}},
}

Communication

QQ discussion group: 67729435, for WeChat group, please add the robot WeChat ID spaces_ac_cn

rerope's Issues

problem with rerope_patch

Are the ReRoPE experimental results based on the PyTorch code in this repository? I cannot get it to run correctly on my machine.

Blogs in English

Thank you very much for sharing your awesome work!

As the blogs mentioned in the README are in Chinese, I am working on translating them into English. I guarantee I am not using any AI translation. I believe people working on extending context length will love these blogs and draw inspiration from them.

I have finished some parts, and for people who can't read Chinese, please check

Generating same token

I was running your ReRoPE implementation with vicuna-7b on a code completion task, but every time it produces the same token over and over, like 'nobody nobody nobody nobody nobody ...' (repeated for the entire output). I am using transformers 4.41. Can you please tell me what could be causing this? The dataset I am using is https://huggingface.co/datasets/microsoft/LCC_python. I was implementing this to reproduce the baseline results of HiRoPE.

<q_len=1> question

rerope/rerope_patch.py

Lines 68 to 74 in 3710a35

if q_len == 1:
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
    position_ids = (position_ids[:, -1] - position_ids).clip(max=window)
    _, key_states = apply_rotary_pos_emb(None, key_states, cos, -sin, position_ids)
    key_states = repeat_kv(key_states, self.num_key_value_groups)
    value_states = repeat_kv(value_states, self.num_key_value_groups)
    attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

In line 71, can you explain why sin is negated? And why does it appear to have nothing to do with the window size?
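
One possible reading of the snippet, offered as a sketch rather than the author's answer: the query is passed as None, so it is left unrotated, and each key is rotated by minus its clipped distance to the query. Negating sin is what makes the rotation angle negative, and the window enters through .clip(max=window). The 2-D example below uses hypothetical vectors, angle, and window to compare this shortcut with standard RoPE.

```python
import math


def rotate(vec, cos, sin):
    """2-D rotation, as applied to each frequency pair in RoPE."""
    x, y = vec
    return (x * cos - y * sin, x * sin + y * cos)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def scores(q, k, m, n, theta, window):
    """Return (standard RoPE score, single-query shortcut score)."""
    # Standard RoPE: rotate q by its position m and k by its position n.
    q_m = rotate(q, math.cos(m * theta), math.sin(m * theta))
    k_n = rotate(k, math.cos(n * theta), math.sin(n * theta))
    # Shortcut: leave q unrotated, rotate k by minus the clipped distance.
    # Passing -sin is what turns the rotation angle negative.
    rel = min(m - n, window)
    k_rel = rotate(k, math.cos(rel * theta), -math.sin(rel * theta))
    return dot(q_m, k_n), dot(q, k_rel)


q, k = (0.3, -1.2), (0.7, 0.5)  # arbitrary illustrative vectors
theta, window = 0.1, 4          # hypothetical angle and window size

inside = scores(q, k, m=10, n=8, theta=theta, window=window)   # distance 2
outside = scores(q, k, m=10, n=2, theta=theta, window=window)  # distance 8
print(inside, outside)
```

Inside the window the two scores agree up to floating-point error; beyond it the shortcut uses the clipped distance, so the scores differ by design.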

Monkey patch for original LLaMA2 code.

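Hello, and thank you for sharing!
My current work is built on the original llama2 code and weights released by Meta. Are there any plans to provide a ReRoPE patch for the original llama2 repository?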

Dataset of ReRoPE eval

Hi, I was reading a paper on length extrapolation for code models, "HiRoPE: Length Extrapolation for Code Models" (https://arxiv.org/pdf/2403.19115v1). In this paper the authors mention the ReRoPE-eval dataset. Could you please help me obtain it? I have not found any dataset in this repository.

cos and sin dimension

cos = torch.cat([cos[:, :, :window], cos2[:, :, window + offset:]], axis=2)
sin = torch.cat([sin[:, :, :window], sin2[:, :, window + offset:]], axis=2)

The cos and sin tensors are two-dimensional, but here they are indexed as 3-D. Is this a problem?

Can old_init in ntk_rope_mixed_init be simplified by omitting the inv_freq and _set_cos_sin_cache() steps?

Hello,
The implementation of ntk_rope_mixed_init first calls old_init(self, dim, max_position_embeddings, base, device). As I understand it, apart from defining the values below,
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base

the remaining steps of old_init, namely computing inv_freq, calling self.register_buffer("inv_freq", inv_freq, persistent=False), and calling self._set_cos_sin_cache(), are all overwritten by the computation that follows in ntk_rope_mixed_init. Could old_init therefore be reduced to just the following? This would save a lot of GPU memory.
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base

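Running test.py runs out of GPU memory

What GPU configuration does the author use? I ran python test.py directly on two A100 80G GPUs and it ran out of memory, and I have not been able to find the cause.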

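Tested on Qwen

[image]
The first few questions work, but after that the output collapses. NTK behaves about the same. It may be a problem with the model.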
