This is a port of Andrej Karpathy's llama2.c to Julia.
> [!IMPORTANT]
> This project is part of the JuliaML course at Technical University Berlin, so the repository will not be actively maintained long-term.
- Read tokenizer and model weights from `.bin` files in the format specified by llama2.c
- Inference, generation loop and chat loop
- argmax, multinomial and top-p sampling
- Tokenizer for encoding text to LLM input and decoding LLM output to text
- Multi-threading in the transformer forward function (#37)
- Compatibility tested with all of Andrej Karpathy's models
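As context for the sampling strategies listed above, here is a minimal sketch of top-p (nucleus) sampling in plain Julia. The function name and signature are illustrative only and not necessarily the package's actual API:

```julia
# Top-p (nucleus) sampling sketch: sample from the smallest set of
# tokens whose cumulative probability mass is at least p.
# `probs` is assumed to be a normalized probability vector.
function sample_top_p(probs::Vector{Float64}, p::Float64)
    order = sortperm(probs; rev=true)   # token indices, most probable first
    cum = 0.0
    cutoff = length(order)
    for (k, idx) in enumerate(order)
        cum += probs[idx]
        if cum >= p                     # smallest prefix reaching mass p
            cutoff = k
            break
        end
    end
    kept = order[1:cutoff]
    w = probs[kept] ./ sum(probs[kept]) # renormalize over the kept set
    r = rand()
    acc = 0.0
    for (i, idx) in enumerate(kept)     # inverse-CDF draw over kept tokens
        acc += w[i]
        if r <= acc
            return idx
        end
    end
    return kept[end]                    # guard against floating-point slack
end
```

Lower `p` makes generation more deterministic (in the limit it behaves like argmax), while `p = 1.0` recovers full multinomial sampling.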
Add the package to your local environment via Pkg by running
```
add https://github.com/kleincode/Llama2.jl
```
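Equivalently, you can install it from the Julia API using the standard `Pkg.add(url=...)` call instead of the Pkg REPL mode:

```julia
# Install the package directly from its Git URL via the Pkg API.
using Pkg
Pkg.add(url="https://github.com/kleincode/Llama2.jl")
```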
To get started, check out the docs.