Comments (3)
hey there - yes, I definitely plan to allow the use of the new codegen models, and possibly others like Santacoder.
The new models have a slightly different architecture from the original codegen models, which means some modification will be needed to get them working - the best solution may be to implement the new architecture directly in GGML rather than converting via GPT-J. I need to do some exploration to work out the best path.
from turbopilot.
After some exploration, I have completed the following conversion script, which converts the original codegen2 model directly to ggml - there is no need to convert to GPT-J first.
codegen2-1B runs successfully, but the output of codegen2-7B seems to be abnormal.
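For context on the architecture difference: codegen stores the attention q/k/v weights as a single fused qkv_proj tensor, interleaved across 8 tensor-parallel shards, which the script below has to un-fuse. A minimal numpy sketch of that reshape/split (toy n_embd, not real model sizes):

```python
import numpy as np

# Toy fused qkv_proj weight: shape (3 * n_embd, n_embd), interleaved
# across mp_num = 8 shards (sizes here are illustrative, not real).
n_embd = 16
mp_num = 8
qkv = np.arange(3 * n_embd * n_embd, dtype=np.float32).reshape(3 * n_embd, n_embd)

# View as (mp_num, 3 * n_embd // mp_num, n_embd) and cut the middle
# axis into three equal chunks -- q, v, k (note the v-before-k order).
q, v, k = np.split(qkv.reshape(mp_num, -1, n_embd), 3, axis=1)
q, v, k = (t.reshape(-1, n_embd) for t in (q, v, k))

print(q.shape, v.shape, k.shape)  # each (16, 16)
```

Each un-fused matrix ends up square, (n_embd, n_embd), matching what a separate q_proj/k_proj/v_proj layer would hold.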
import sys
import struct
import json

import torch
import numpy as np
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, AutoTokenizer


def bytes_to_unicode():
    # Reversible byte -> unicode mapping, as used by the GPT-2 BPE tokenizer.
    bs = (
        list(range(ord("!"), ord("~") + 1)) + list(range(ord("¡"), ord("¬") + 1)) + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))


if len(sys.argv) < 2:
    print("Usage: codegen2-to-ggml.py codegen2-1B(dir)\n")
    sys.exit(1)

# output in the same directory as the model
dir_model = sys.argv[1]

with open(dir_model + "/vocab.json", "r", encoding="utf8") as f:
    encoder = json.load(f)

with open(dir_model + "/added_tokens.json", "r") as f:
    encoder_added = json.load(f)

with open(dir_model + "/config.json", "r") as f:
    hparams = json.load(f)

ftype = 0  # f32
fname_out = sys.argv[1] + "/ggml-model-f32.bin"

model = AutoModelForCausalLM.from_pretrained(dir_model, trust_remote_code=True, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(dir_model)
list_vars = model.state_dict()

fout = open(fname_out, "wb")

# header: magic, hyperparameters, ftype
fout.write(struct.pack("i", 0x67676D6C))  # magic: "ggml" in hex
fout.write(struct.pack("i", hparams["vocab_size"]))
fout.write(struct.pack("i", hparams["n_positions"]))
fout.write(struct.pack("i", hparams["n_embd"]))
fout.write(struct.pack("i", hparams["n_head"]))
fout.write(struct.pack("i", hparams["n_layer"]))
fout.write(struct.pack("i", hparams["rotary_dim"]))
fout.write(struct.pack("i", ftype))

byte_encoder = bytes_to_unicode()
byte_decoder = {v: k for k, v in byte_encoder.items()}

# vocabulary, sorted by token id
fout.write(struct.pack("i", hparams["vocab_size"]))

for word, idx in sorted(tokenizer.vocab.items(), key=lambda x: x[1]):
    text = bytearray([byte_decoder[c] for c in word if c in byte_decoder])
    if len(text) < 1:
        text = bytearray(word.encode("utf8"))
    fout.write(struct.pack("i", len(text)))
    fout.write(text)

# pad the vocabulary up to vocab_size with "<|endoftext|>"
empty_vocab = hparams["vocab_size"] - tokenizer.vocab_size
for i in range(hparams["vocab_size"] - len(encoder) - len(encoder_added)):
    text = "<|endoftext|>".encode("utf8")
    fout.write(struct.pack("i", len(text)))
    fout.write(text)

# codegen2 fuses q/k/v into a single interleaved qkv_proj tensor;
# split it back into separate q_proj / v_proj / k_proj tensors
new_list_vars = {}
for name in list_vars.keys():
    if name.endswith("attn.qkv_proj.weight"):
        data = list_vars[name]
        n_dims = len(data.shape)
        assert n_dims == 2
        n_embd = hparams["n_embd"]
        q_unshaped, v_unshaped, k_unshaped = torch.split(data.reshape(8, -1, n_embd), n_embd // 8, dim=1)
        q_shaped, v_shaped, k_shaped = (
            q_unshaped.reshape(-1, n_embd),
            v_unshaped.reshape(-1, n_embd),
            k_unshaped.reshape(-1, n_embd),
        )
        new_list_vars[name.replace(".qkv_proj.", ".q_proj.")] = q_shaped
        new_list_vars[name.replace(".qkv_proj.", ".v_proj.")] = v_shaped
        new_list_vars[name.replace(".qkv_proj.", ".k_proj.")] = k_shaped
    else:
        new_list_vars[name] = list_vars[name]
list_vars = new_list_vars

# tensor data: skip attention masks/biases that ggml does not need
for name in list_vars.keys():
    data = list_vars[name].squeeze().numpy()
    if name.endswith("attn.masked_bias") or name.endswith(".attn.bias") or name.endswith("attn.causal_mask"):
        continue

    n_dims = len(data.shape)
    ftype_cur = 0
    if data.dtype != np.float32:
        print("  Converting to float32")
        data = data.astype(np.float32)
        ftype_cur = 0

    name_bytes = name.encode("utf-8")
    fout.write(struct.pack("iii", n_dims, len(name_bytes), ftype_cur))
    for i in range(n_dims):
        fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
    fout.write(name_bytes)
    data.tofile(fout)

fout.close()

print("Done. Output file: " + fname_out)
print("")
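To sanity-check a converted file, the fixed-size header the script writes can be read straight back with struct. A small hypothetical checker (the field order mirrors the writes in the script; the hparams values below are made up for the smoke test):

```python
import os
import struct
import tempfile

def read_ggml_header(path):
    # Mirror of the writer: magic, then six int32 hyperparameters + ftype.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("i", f.read(4))
        assert magic == 0x67676D6C, "not a ggml file"
        names = ["vocab_size", "n_positions", "n_embd", "n_head", "n_layer", "rotary_dim", "ftype"]
        return dict(zip(names, struct.unpack("7i", f.read(28))))

# Smoke test on a hand-written header (hypothetical hparams values):
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(struct.pack("i", 0x67676D6C))
    f.write(struct.pack("7i", 51200, 2048, 2048, 16, 20, 32, 0))
    path = f.name

print(read_ggml_header(path))
os.remove(path)
```

If the dict printed for a real converted model does not match the model's config.json, the file was written with the wrong field order or types.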
hey there @czkoko - this is great progress, thank you for contributing. I'd love to add this script to the repo, which will allow us to support the 1B codegen2 model. Along with merging recent ggml libraries that support starcoder, this will allow Turbopilot to support some really interesting new use cases and additional libraries.
Related Issues (20)
- Local build failing to run (NO AVX2) HOT 3
- use WebSocket for Real-time reception
- terminated by signal SIGABRT (Abort)
- Fauxpilot client does not communicate with TurboPilot 🚀server
- How to use it with cuda in v0.0.5 HOT 2
- Any chance for cuda 12 support? HOT 1
- Is there any roadmap to add support for replit models? HOT 2
- Add support for StableCode
- Support Huggingface Code plugin
- "symbol not found" error in docker image running under ARM64 HOT 3
- How to build for Mac OS Apple Silicon? HOT 2
- ggml_new_tensor_impl: not enough space in the context's memory pool HOT 7
- docker turbopilot:v0.1.0-cuda12 not using gpu HOT 3
- OOM - Segmentation fault (core dumped) HOT 3
- CAN NOT RUN TURBOPILOT USING DOCKER HOT 5
- Only huggingface client works, and crashes server HOT 3
- Support for Code Llama HOT 1
- Docker Image Fail to load model
- Failed to load model wizardcoder (Illegal Instruction)
- [Feature request] Add refact model