This is a port of Andrej Karpathy's llama2.c to Julia.
> [!IMPORTANT]
> This project is part of the JuliaML course at Technical University Berlin, so the repository will not be actively maintained long-term.
- Read tokenizer and model weights from `.bin` files in the format specified by llama2.c
- Inference, generation loop and chat loop
- argmax, multinomial and top-p sampling
- Tokenizer for encoding text to LLM input and decoding LLM output to text
- Multi-threading in the transformer forward function (#37)
- Compatibility tested with all of Andrej Karpathy's models
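As context for the sampling strategies listed above, here is a minimal sketch of top-p (nucleus) sampling in plain Julia. The function name and signature are illustrative only and not necessarily the package's actual API:

```julia
# Top-p (nucleus) sampling sketch: sample from the smallest set of
# tokens whose cumulative probability mass is at least p.
# `probs` is assumed to be a normalized probability vector.
function sample_top_p(probs::Vector{Float64}, p::Float64)
    order = sortperm(probs; rev=true)   # token indices, most probable first
    cum = 0.0
    cutoff = length(order)
    for (k, idx) in enumerate(order)
        cum += probs[idx]
        if cum >= p                     # smallest prefix reaching mass p
            cutoff = k
            break
        end
    end
    kept = order[1:cutoff]
    w = probs[kept] ./ sum(probs[kept]) # renormalize over the kept set
    r = rand()
    acc = 0.0
    for (i, idx) in enumerate(kept)     # inverse-CDF draw over kept tokens
        acc += w[i]
        if r <= acc
            return idx
        end
    end
    return kept[end]                    # guard against floating-point slack
end
```

Lower `p` makes generation more deterministic (in the limit it behaves like argmax), while `p = 1.0` recovers full multinomial sampling.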
Add the package to your local environment via Pkg by running
```
add https://github.com/kleincode/Llama2.jl
```
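Equivalently, you can install it from the Julia API using the standard `Pkg.add(url=...)` call instead of the Pkg REPL mode:

```julia
# Install the package directly from its Git URL via the Pkg API.
using Pkg
Pkg.add(url="https://github.com/kleincode/Llama2.jl")
```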
To get started, check out the docs.