problem with rerope_patch

Rectified Rotary Position Embeddings (ReRoPE)

Using ReRoPE, we can more effectively extend the context length of LLMs without the need for fine-tuning.

Blog

https://kexue.fm/archives/9706 (Chinese)
https://kexue.fm/archives/9708 (Chinese)
https://normxu.github.io/Rethinking-Rotary-Position-Embedding-2/ (English by @NormXU)
https://normxu.github.io/Rethinking-Rotary-Position-Embedding-3/ (English by @NormXU)

Idea

Results

Calculated the loss on llama2-13b with samples_15k.jsonl:

Method	loss
RoPE-4k(original llama2-13b)	1.4967
RoPE-8k(original llama2-13b)	8.8615
NTK-RoPE-4k(not dynamic)	1.6081
NTK-RoPE-8k(not dynamic)	1.5417
NTK-RoPE-16k(not dynamic)	1.5163
ReRoPE-w1024-4k	1.4996
ReRoPE-w1024-8k	1.4267
ReRoPE-w1024-16k	1.4001

At the training length (4k), ReRoPE's loss is almost unchanged from the original RoPE (1.4996 vs. 1.4967), and it has the ideal property of "longer context, lower loss": the loss falls to 1.4267 at 8k and 1.4001 at 16k.
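The comparison in the table can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the table, while the dictionary layout and the helper names `delta_vs_baseline` and `improves_with_length` are made up for illustration and are not part of the repository.

```python
# Loss values copied from the results table (llama2-13b, samples_15k.jsonl).
losses = {
    "RoPE": {"4k": 1.4967, "8k": 8.8615},
    "NTK-RoPE": {"4k": 1.6081, "8k": 1.5417, "16k": 1.5163},
    "ReRoPE-w1024": {"4k": 1.4996, "8k": 1.4267, "16k": 1.4001},
}

# Reference point: the original model evaluated at its training length.
baseline = losses["RoPE"]["4k"]


def delta_vs_baseline(method, length):
    """Loss difference relative to RoPE-4k (negative means better)."""
    return round(losses[method][length] - baseline, 4)


def improves_with_length(method):
    """True if the loss strictly decreases as the context length grows."""
    values = list(losses[method].values())
    return all(a > b for a, b in zip(values, values[1:]))


for method, by_length in losses.items():
    deltas = {length: delta_vs_baseline(method, length) for length in by_length}
    print(method, deltas, improves_with_length(method))
```

Note that the NTK-RoPE rows also decrease with length, but every one of them stays above the RoPE-4k baseline, whereas the ReRoPE rows at 8k and 16k fall below it.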

Usage

Dependency: transformers 4.31.0

Run python test.py to test chatting or run python eval_loss.py to calculate loss with llama2.
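Before running the scripts, it may help to confirm that the installed transformers version matches the one listed above. The sketch below uses only the standard library. The helper names are hypothetical, and the exact-match check rests on the assumption (not stated in this README) that the patch depends on internals that may differ in other releases.

```python
from importlib import metadata

REQUIRED_TRANSFORMERS = "4.31.0"  # version named in the Usage section


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def transformers_version_ok(installed, required=REQUIRED_TRANSFORMERS):
    """True only when the installed version matches the listed one exactly."""
    return installed == required


if __name__ == "__main__":
    found = installed_version()
    if transformers_version_ok(found):
        print(f"transformers {found}: matches the listed version")
    else:
        print(f"transformers {found}: expected {REQUIRED_TRANSFORMERS}")
```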

From here and here, we can see the modifications that ReRoPE/Leaky ReRoPE make to the original llama implementation.

Other

Triton Implementation of ReRoPE: https://gist.github.com/chu-tianxiang/4307937fd94b49c75b61a6967716bae9

Cite

@misc{rerope2023,
  title={Rectified Rotary Position Embeddings},
  author={Jianlin Su},
  year={2023},
  howpublished={\url{https://github.com/bojone/rerope}},
}

Communication

QQ discussion group: 67729435. For the WeChat group, please add the robot WeChat ID spaces_ac_cn.

	# Decoding step: the query has length 1 and attends to all cached keys.
	if q_len == 1:
		cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
		# Distance from the newest position to each cached position, capped at `window`.
		position_ids = (position_ids[:, -1] - position_ids).clip(max=window)
		# Rotate only the keys; passing -sin negates the rotation angle.
		_, key_states = apply_rotary_pos_emb(None, key_states, cos, -sin, position_ids)
		key_states = repeat_kv(key_states, self.num_key_value_groups)
		value_states = repeat_kv(value_states, self.num_key_value_groups)
		attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

	cos = torch.cat([cos[:, :, :window], cos2[:, :, window + offset:]], axis=2)
	sin = torch.cat([sin[:, :, :window], sin2[:, :, window + offset:]], axis=2)
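To make the clipping of position_ids in the q_len == 1 branch above concrete, here is a plain-Python sketch of the same arithmetic for a single sequence. The function name `decode_step_distances` is hypothetical and does not appear in the repository. It works on a flat list of positions rather than a batched tensor, and it covers only that one line, not the rotary embedding itself.

```python
def decode_step_distances(position_ids, window):
    """Distance from the last position to every position, capped at `window`.

    Mirrors `(position_ids[:, -1] - position_ids).clip(max=window)` for a
    single sequence given as a list of integer positions.
    """
    last = position_ids[-1]
    return [min(last - p, window) for p in position_ids]


# Eight cached positions, window of 4: every key further away than the
# window is treated as if it were exactly `window` steps back.
print(decode_step_distances(list(range(8)), window=4))
# [4, 4, 4, 4, 3, 2, 1, 0]
```

With a window at least as large as the sequence, nothing is clipped and the distances are the ordinary relative positions.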
