point-alpaca's Introduction

point-alpaca

What is this?

These are released weights recreated from Stanford Alpaca, an experiment in fine-tuning LLaMA on a synthetic instruction dataset.

This is not LoRA; it is a full fine-tune, trained for 3 epochs on 8x A100 80 GB GPUs (loss ≈2 ➔ ≈0.5).

Can I try this somewhere?

Yes! See the announcement thread for our frontend, where you can try the 7B model: https://twitter.com/PointNetwork/status/1637178814210908160

Try it here: https://alpaca.point.space

What are the hardware requirements to run it locally?

It takes 16 GB of VRAM unquantized, or 8 GB of VRAM when 8-bit quantized (plus 11 GB of regular RAM to load it).

It's confirmed that it can run on a single RTX 3090 unquantized. To try 8-bit mode, set load_in_8bit=True in chat.py.
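As a sanity check on these numbers, the weight memory alone can be estimated from the parameter count. This is back-of-the-envelope arithmetic only; it ignores activations, the KV cache, and framework overhead, which is why real usage is somewhat higher:

```python
# Rough weight-memory estimate for the 7B model. Ignores activations,
# KV cache and framework overhead, so actual usage runs higher.
def weight_gib(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 2**30

fp16_gib = weight_gib(7e9, 2)  # unquantized (fp16): ~13 GiB of weights
int8_gib = weight_gib(7e9, 1)  # 8-bit quantized:    ~6.5 GiB of weights

assert 12 < fp16_gib < 14   # consistent with the 16 GB VRAM figure above
assert 6 < int8_gib < 7     # consistent with the 8 GB VRAM figure above
```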

How to distill the weights

  1. Put the LLaMA weights into the original/ folder, so that the 7B version is at original/7B

  2. Download point-alpaca diffs into encrypted/ folder:

wget -P encrypted/ -i filelist.txt
  3. Run the following command to decrypt:
for f in encrypted/*; do
  if [ -f "$f" ]; then
    python3 decrypt.py "$f" "original/7B/consolidated.00.pth" "result/"
  fi
done

Windows users can use the equivalent PowerShell command (using -File so only regular files are passed, which is more robust than checking the Archive attribute):

Get-ChildItem "encrypted" -File | ForEach-Object {
    python3 decrypt.py $_.FullName "original/7B/consolidated.00.pth" "result/"
}

You will then have the fine-tuned weights in the result/ folder.

Once you have them, you can delete the files in the encrypted/ folder.

How to chat with the model

Other people will probably build better UIs, but for now, try running python3 chat.py.

Before that, install the requirements via pip3 install -r requirements.txt. (We strongly recommend installing them in a separate environment, e.g. via conda.)
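If you want to script the model directly instead of going through chat.py, the fine-tune follows Stanford Alpaca, which was trained on a fixed prompt template. The sketch below shows the standard instruction-only variant of that template; the exact wording chat.py uses may differ:

```python
# Standard Stanford Alpaca prompt template (instruction-only variant).
# chat.py's exact template may differ; treat this as an illustrative sketch.
def build_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_prompt("Explain what a LoRA fine-tune is in one sentence.")
assert prompt.endswith("### Response:\n")
```

The model's completion is then generated after the "### Response:" marker.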

Questions? Suggestions?

Find us in our Telegram chat: https://t.me/pointnetworkchat

Why are weights "encrypted"?

We are not allowed to publish the LLaMA weights, of course, even fine-tuned ones, but there is no problem publishing the difference: a patch that we suggest applying to the files. The encryption is a simple XOR between files (not very secure; not recommended for other applications!), ensuring that only people who have access to the original weights (from completely legal sources, of course) can transform them into the fine-tuned weights.
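The idea can be sketched in a few lines. This is a toy illustration with a hypothetical xor_bytes helper, not the actual decrypt.py:

```python
# Toy illustration of the XOR-diff scheme described above. The published
# file is (original XOR fine-tuned); XOR-ing it with the original bytes
# recovers the fine-tuned bytes. Hypothetical helper, not the real decrypt.py.
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the corresponding key byte, cycling the key
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

original  = b"original LLaMA weights"
finetuned = b"finetuned alpaca weight"
diff = xor_bytes(finetuned, original)          # what gets published
assert xor_bytes(diff, original) == finetuned  # holders of originals recover it
```

Without the original bytes as the key, the published diff reveals nothing directly usable, which is the whole point of the scheme.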

What are the checksums so I can check if something is wrong?

$ md5sum encrypted/*
4b8622230b59b3f3bcad791c8c1bae51  encrypted/added_tokens.json.75e3ca5df2973756aa612cb17246ef6020a68ff8d94671508987d373642f7a36.enc
876376085d79041818bb7a41bced7819  encrypted/config.json.caf9cac32580e31af8254f66c5a070741d70b15a651721748189180325b7d5a8.enc
44b1feec4c0d1b7c87da24b81c8b8b9e  encrypted/generation_config.json.c5c8961ed243834883fb4e45e8850d3873d6100fde97817f59d275a90eba269d.enc
d127aabb6ad5375bfa97c6ac529c166d  encrypted/pytorch_model-00001-of-00003.bin.90d2ab95a32aeb9362814d8b86db2af5454baab8ea3aa8230c271d6962abb9db.enc
e4b12501e99cf6a30a244af20f5c20ec  encrypted/pytorch_model-00002-of-00003.bin.f3c10a4f5c8beafc6667d34557b64ba479e4dde6ef10672287857b329b7e3229.enc
d212294c06feeb0f14672b68417dbc9e  encrypted/pytorch_model-00003-of-00003.bin.72bf4c96aa6b0c7b56b0336791960da9c75de324ea1131ea4bfc20fde41115c8.enc
e813854dede95a03e5f5b459c7fb32b2  encrypted/pytorch_model.bin.index.json.07ca8edea996b6c3274395fdb2b6c9108f2ffdd610ae55e35c126c21a9d535b1.enc
62503bbf4e91f2b50bf9834757d555d3  encrypted/special_tokens_map.json.4ad09c72922c015ba04f09eabebe38fb34ecb721ca712922c62038eaf2d0bc61.enc
39ec1b33fbf9a0934a8ae0f9a24c7163  encrypted/tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc
2c34a03919b6b2b299ad6f77713d0ba0  encrypted/tokenizer_config.json.a5f5efb2240276709a923b1404e08d93cc896fd1bd31fbe173e1e2789ea210ef.enc
560ecf526666cbd485b81f0f16bb9972  encrypted/trainer_state.json.43964ae247e74f4055fe1cf99a7a16efc3114402a1cd918b3cd9e2ebf2858ca9.enc
fe8b25ba7c8dd66d57ce1d3d60f13abd  encrypted/training_args.bin.02f8c3ba14e3c48c05f76880975d7385c878b0e5a0863e352c82f331150d2bd4.enc
$ md5sum original/7B/consolidated.00.pth
6efc8dab194ab59e49cd24be5574d85e  original/7B/consolidated.00.pth
$ md5sum result/*
880c59f7618832454595e9820960c360  result/added_tokens.json
d39ed682be60de38e12c5d1974c45620  result/config.json
5300908d1f82b0bc7a4bc79ea00dad66  result/generation_config.json
5d17f8837f9f15538acd65b7d37add2c  result/pytorch_model-00001-of-00003.bin
834b0748527482d60236bc1ec0c71750  result/pytorch_model-00002-of-00003.bin
03dda8d1057b06632fecf399020353b4  result/pytorch_model-00003-of-00003.bin
82559775d42e04199b5a8be8df974b36  result/pytorch_model.bin.index.json
40df8792c753f0d3f5786829efdd2954  result/special_tokens_map.json
eeec4125e9c7560836b4873b6f8e3025  result/tokenizer.model
f2da7d9c67a3b7d2e60a17c540055b85  result/tokenizer_config.json
883795093c1f18baa9b111880b800bf1  result/trainer_state.json
f07e553d22ebe37908bc996953f1bb11  result/training_args.bin
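A small helper for checking downloads against the tables above; it reads in chunks so the multi-GB model shards don't need to fit in memory:

```python
# Compute an md5 digest without loading the whole file into memory,
# for comparison against the checksum tables above.
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. md5_of("result/config.json") should match the value listed above
```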

What about larger models?

13B is coming for sure; larger versions, maybe. Consider supporting us if you want it done faster. :)

point-alpaca's People

Contributors

0xbitches, neon4o4, sergevar


point-alpaca's Issues

Decryption fails on several files despite matching md5

Model files are fine but a few of the .json files aren't working.

Decrypting file encrypted/added_tokens.json.75e3ca5df2973756aa612cb17246ef6020a68ff8d94671508987d373642f7a36.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/config.json.caf9cac32580e31af8254f66c5a070741d70b15a651721748189180325b7d5a8.enc with 31 workers
Writing final chunks...
Error: Checksums do not match. The file may be corrupted.
Decrypting file encrypted/generation_config.json.c5c8961ed243834883fb4e45e8850d3873d6100fde97817f59d275a90eba269d.enc with 31 workers
Writing final chunks...
Error: Checksums do not match. The file may be corrupted.
Decrypting file encrypted/pytorch_model-00001-of-00003.bin.90d2ab95a32aeb9362814d8b86db2af5454baab8ea3aa8230c271d6962abb9db.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/pytorch_model-00002-of-00003.bin.f3c10a4f5c8beafc6667d34557b64ba479e4dde6ef10672287857b329b7e3229.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/pytorch_model-00003-of-00003.bin.72bf4c96aa6b0c7b56b0336791960da9c75de324ea1131ea4bfc20fde41115c8.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/pytorch_model.bin.index.json.07ca8edea996b6c3274395fdb2b6c9108f2ffdd610ae55e35c126c21a9d535b1.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/special_tokens_map.json.4ad09c72922c015ba04f09eabebe38fb34ecb721ca712922c62038eaf2d0bc61.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/tokenizer_config.json.a5f5efb2240276709a923b1404e08d93cc896fd1bd31fbe173e1e2789ea210ef.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/trainer_state.json.43964ae247e74f4055fe1cf99a7a16efc3114402a1cd918b3cd9e2ebf2858ca9.enc with 31 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/training_args.bin.02f8c3ba14e3c48c05f76880975d7385c878b0e5a0863e352c82f331150d2bd4.enc with 31 workers
Writing final chunks...
Decryption completed.

Question about fine-tuning LLaMA

I was wondering: could we fine-tune LLaMA with our own training data and then apply this diff to transform it into Alpaca, and would that work? Or would it be better to fine-tune Alpaca directly? Is it possible at all?

Latest Commit Gives Incorrect Decryption

The decrypt.py file in the latest commit doesn't produce the correct decryption according to the MD5 checksums. However, commit 9ee0219, which doesn't utilize multiple CPU cores, gives the correct decryption.

Question about random output

First of all, thank you for your work. However, when I use it, the same input produces different generation results on each run. How can I make the generation deterministic?

As shown in the figure, these are the results of several fresh runs, and the outputs are completely different.

[screenshot: the same prompt producing different outputs across runs]
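A note on why this happens: generation samples from the model's token distribution, so each run differs unless the RNG is seeded or sampling is disabled. The toy analogy below uses Python's own RNG; with the transformers library, the equivalents would be calling torch.manual_seed(...) before generating, or passing do_sample=False to generate() for greedy decoding:

```python
# Why identical inputs yield different outputs: sampling draws from a
# distribution, so runs differ unless the RNG is seeded. Toy analogy with
# Python's random module; with transformers the equivalents are
# torch.manual_seed(...) or do_sample=False (greedy decoding) in generate().
import random

def sample_run(seed=None):
    rng = random.Random(seed)
    return [rng.choice("abcde") for _ in range(8)]

assert sample_run(seed=0) == sample_run(seed=0)  # seeded runs are reproducible
```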

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 while running on an RTX 3060 12GB, using 8-bit.

After loading the 8bit model I am facing the following issue:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:14<00:00,  [28/1000$
Human: asd

/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Traceback (most recent call last):
  File "/home/sadmin/point-alpaca/chat.py", line 102, in <module>
    go()
  File "/home/sadmin/point-alpaca/chat.py", line 72, in go
    generated_ids = generator(
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/home/sadmin/miniconda3/envs/pa/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

What I tried so far:

quantization_config = BitsAndBytesConfig(
    llm_int8_threshold=1.0,
)

as a variable, then

then passed quantization_config=quantization_config to model = transformers.LLaMAForCausalLM.from_pretrained([...]).cuda()
I also tried to just pass llm_int8_threshold=1.0 to the loader. Both ways the model loads, but at generation I get another error:

    return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!

Hardware: RTX 3060 12GB, Ryzen 5700X, 24GB RAM

Running the model on alpaca.cpp

I've got the decrypted model, but I am unable to run chat.py due to insufficient video memory. How would I go about running the model on alpaca.cpp?
Also, a question on size: given that the original 7B model is a single 13.5 GB file, how come the decryption process produces 3 files adding up to 27 GB? Thanks!
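On the size question: the doubling is consistent with a precision change. This is an assumption, but the arithmetic lines up if the original checkpoint is stored in fp16 (2 bytes per parameter) while the fine-tuned shards were saved in fp32 (4 bytes per parameter):

```python
# If 13.5 GB corresponds to 2 bytes/param (fp16), the same parameters at
# 4 bytes/param (fp32) occupy exactly twice the space, matching ~27 GB.
gb = 1e9
n_params = 13.5 * gb / 2     # ~6.75e9 parameters implied by the fp16 file
fp32_gb = n_params * 4 / gb  # size if re-saved in fp32

assert abs(fp32_gb - 27.0) < 1e-9
```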

Corrupted Model Weights

Hey,

I am trying to update the 7B model weights. For all weights except one, it works fine.

The corrupted one is: pytorch_model-00003-of-00003.bin.72bf4c96aa6b0c7b56b0336791960da9c75de324ea1131ea4bfc20fde41115c8.enc

Any idea why this particular one is not working?

Greetings

How to support you for 13B

Hi there,

How do we support you to train 13B?

13B is coming for sure, larger versions - maybe. Consider supporting us if you want it done faster. :)

Please let me know what all is needed.

Error: Checksums do not match. The file may be corrupted.

I saw the closed issue about the same problem, but I just pulled the repo and used the scripts. If a brand-new installation returns this error, it might be worth investigating.

Decrypting file encrypted/config.json.caf9cac32580e31af8254f66c5a070741d70b15a651721748189180325b7d5a8.enc with 5 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/generation_config.json.c5c8961ed243834883fb4e45e8850d3873d6100fde97817f59d275a90eba269d.enc with 5 workers
Writing final chunks...
Decryption completed.
Decrypting file encrypted/pytorch_model-00001-of-00003.bin.90d2ab95a32aeb9362814d8b86db2af5454baab8ea3aa8230c271d6962abb9db.enc with 5 workers
Writing final chunks...
Error: Checksums do not match. The file may be corrupted.

encounter exception running chat.py: PytorchStreamReader failed reading zip archive: failed finding central directory

Decryption works fine with no issue.

When running py chat.py, encounter following error:

Loading ./result...
gpu_count 1
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards:  33%|██████████████████▎                                    | 1/3 [00:07<00:14,  7.28s/it]
Traceback (most recent call last):
  File "/home/dsu/ai/xf/src/transformers/modeling_utils.py", line 415, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/dsu/p3/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/dsu/p3/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dsu/ai/xf/src/transformers/modeling_utils.py", line 419, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dsu/ai/palpaca/mychat.py", line 41, in <module>
    load_model("./result")
  File "/home/dsu/ai/palpaca/mychat.py", line 27, in load_model
    model = transformers.LlamaForCausalLM.from_pretrained(
  File "/home/dsu/ai/xf/src/transformers/modeling_utils.py", line 2709, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/dsu/ai/xf/src/transformers/modeling_utils.py", line 3023, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/dsu/ai/xf/src/transformers/modeling_utils.py", line 431, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for './result/pytorch_model-00002-of-00003.bin' at './result/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I'm running with the latest transformers, 4.28.0.dev0 (pulled the code today), which has LlamaTokenizer; hence the tokenizer class warning I got earlier. I also tried the specific transformers pinned in requirements.txt (git+https://github.com/zphang/transformers.git@68d640f7c368bcaaaecfc678f11908ebbd3d6176) and got the same error.

torch version: 2.0.0.

Has anyone encountered a similar issue, and do you have suggestions to resolve it? Thanks.

Question about Alpaca maximum sequence length

LLaMA seems to have a maximum sequence length of 2048, as stated and written in its source code. However, I see that the tokenizer config here sets a maximum sequence length of 512. Does Alpaca have a smaller maximum sequence length than LLaMA? Is that possible?
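One way to see which limit applies is to compare the two configs in the decrypted result/ folder. In the Hugging Face LLaMA port, the model-side context window lives in max_position_embeddings in config.json, while model_max_length in tokenizer_config.json is a separate tokenizer-side cap. The sketch uses stand-in dicts with the values from the question; in practice you would json.load the real files:

```python
# Compare the model-side context window against the tokenizer-side cap.
# Stand-in JSON with the values mentioned above; in practice, json.load
# result/config.json and result/tokenizer_config.json instead.
import json

model_cfg = json.loads('{"max_position_embeddings": 2048}')
tok_cfg = json.loads('{"model_max_length": 512}')

# When both limits are enforced, the smaller one wins.
effective = min(model_cfg["max_position_embeddings"], tok_cfg["model_max_length"])
assert effective == 512
```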

load_model("./result")

There is a mistake on line 42 of chat.py; it should read load_model("./result").

Error: Checksums do not match. The file may be corrupted

I get Error: Checksums do not match. The file may be corrupted when attempting to decrypt the following files

tokenizer_config.json.a5f5efb2240276709a923b1404e08d93cc896fd1bd31fbe173e1e2789ea210ef.enc

and

special_tokens_map.json.4ad09c72922c015ba04f09eabebe38fb34ecb721ca712922c62038eaf2d0bc61.enc

However, I get the correct checksums for the two (encrypted) files above, as well as for consolidated.00.pth. For the other decrypted files, the checksums are also correct.

Yet the decrypted files tokenizer_config.json and special_tokens_map.json are empty.

Do you know what might be causing the issue? Thank you in advance.

Can the nice team of PointNetwork share some production details?

Hi,

I think I just got banned from the Telegram chat for a hosting-related question regarding Alpaca, even though it says:
Questions? Feedback? Chat with us in our Telegram community!

Sorry if I misunderstood the purpose of that chat, so let me ask my question here again:

Can someone please share some insights on how to host Alpaca the way you did? I noticed that it is snappy and I would like to understand how you do it. Stack, optimizations, hardware? Load balancing, dynamic batching?

Thanks for any recommendations
