llama2.go

This is a native Go implementation of LLaMA-2 inference. As of 2023-08-19, LLaMA-2 is a state-of-the-art open source large language model from Meta. The code is ported from github.com/karpathy/llama2.c@bd18228 (2023-08-19). Additional features may be added.

How to run?

  1. Get tokenizer.bin from llama2.c.
  2. Get the weights: wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
  3. Install: go install github.com/nikolaydubina/llama2.go@latest
  4. Run: llama2.go -checkpoint=stories110M.bin -prompt="good morning said sun to trees"
$ llama2.go -checkpoint=stories110M.bin -prompt="good morning said sun to trees"
2023/07/29 09:30:22 config: llama2.Config{Dim:768, HiddenDim:2048, NumLayers:12, NumHeads:12, NumKVHeads:12, VocabSize:32000, SeqLen:1024}
<s>
good morning said sun to trees: "Let's organize an operation!"
The trees clapped their branches and asked "What will we do?"
Badger smiled and replied "We will build a treehouse together!"
The trees got blocks of wood and started to build. Badger put nails in the tiny pieces of wood, while the trees put the blocks together to make a
 solid base. 
When they finished their treehouse, Goodger and the trees sat inside. Badger said, "Look how fancy we made it!"
The trees smiled and nodded. They said, "It's very fancy! Thank you for helping us organize this operation." 
Then they lived happily in their fancy treehouse together!
<s>
Once upon a time, there was a boy named Timmy. Timmy was very hungry and wanted to eat his meal. He asked his mom, "What are we having for dinner
?" His mom said, "We are having chicken and rice." Timmy said, "Yum! I love chicken and rice."
While they were eating, Timmy's dad came in and said, "Hey Timmy, do you want to watch a movie after
2023/07/29 09:30:58 achieved tok/s: 28.619646
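
The last log line reports throughput in tokens per second. A minimal sketch of how such a figure can be derived from a token count and the elapsed wall time is shown below. The function name `tokensPerSecond` is hypothetical and is not taken from the llama2.go source.

```go
package main

import "time"

// tokensPerSecond converts a generated-token count and the elapsed wall time
// into a rate. The name is hypothetical, not part of llama2.go.
func tokensPerSecond(tokens int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0 // avoid dividing by zero on very short runs
	}
	return float64(tokens) / elapsed.Seconds()
}
```

In a generation loop one would record `start := time.Now()` before the first step and pass `time.Since(start)` together with the number of generated tokens.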

Performance

system                   model        llama2.c      llama.cpp    llama2.go [1]  llama2.go [2]
Apple M1 Max 10CPU 64GB  stories110M  101.84 tok/s  -            10.47 tok/s    39.28 tok/s
Apple M1 Max 10CPU 64GB  llama2_7b    1.83 tok/s    20.36 tok/s  -              0.87 tok/s
Apple M1 Max 10CPU 64GB  llama2_13b   (segfault)    11.71 tok/s  -              0.38 tok/s

Optimizations

  • transformer steps parallelism
  • loop unrolling
  • in-matrix parallelism
  • (todo) SIMD
  • (todo) quantization
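
As an illustration of in-matrix parallelism, here is a minimal sketch of a matrix-vector product whose output rows are split across goroutines. It is not the code used in this repository. The names `matMulSimple` and `matMulParallel`, the flat row-major layout, and the one-chunk-per-CPU split are all assumptions made for the example.

```go
package main

import (
	"runtime"
	"sync"
)

// matMulSimple computes out = W*x for a row-major (rows x cols) matrix stored
// in a flat slice. It serves as the baseline for the parallel version.
func matMulSimple(out, x, w []float32, rows, cols int) {
	for i := 0; i < rows; i++ {
		var sum float32
		for j := 0; j < cols; j++ {
			sum += w[i*cols+j] * x[j]
		}
		out[i] = sum
	}
}

// matMulParallel splits the output rows into contiguous chunks and computes
// each chunk in its own goroutine. Rows are independent of each other, so the
// goroutines write to disjoint parts of out and need no locking.
func matMulParallel(out, x, w []float32, rows, cols int) {
	workers := runtime.NumCPU()
	if workers > rows {
		workers = rows
	}
	if workers < 1 {
		return // nothing to compute
	}
	chunk := (rows + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			matMulSimple(out[start:end], x, w[start*cols:end*cols], end-start, cols)
		}(start, end)
	}
	wg.Wait()
}
```

Because every row is still summed in the same order as in the baseline, both functions return bit-identical results. Whether the goroutine overhead pays off depends on the matrix size.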

All optimizations are fuzz-tested against the basic algorithm, which is itself tested. To disable optimizations, change the import in llama2/transformer.go to the package without optimizations and rebuild.
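
The idea of checking an optimized routine against a basic one can be sketched with Go's native fuzzing. The example below is not the repository's test suite. It compares a hypothetical four-way unrolled dot product (`dotUnrolled`) with a plain loop (`dotSimple`). The seed-based input generation and the tolerance are assumptions made for the example.

```go
package main

import (
	"math"
	"math/rand"
	"testing"
)

// dotSimple is the reference implementation: one multiply-add per iteration.
func dotSimple(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// dotUnrolled handles four elements per iteration with separate accumulators,
// then finishes the remaining tail one element at a time.
func dotUnrolled(a, b []float32) float32 {
	var s0, s1, s2, s3 float32
	i := 0
	for ; i+4 <= len(a); i += 4 {
		s0 += a[i] * b[i]
		s1 += a[i+1] * b[i+1]
		s2 += a[i+2] * b[i+2]
		s3 += a[i+3] * b[i+3]
	}
	sum := s0 + s1 + s2 + s3
	for ; i < len(a); i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// randomVectors builds two equal-length vectors with values in [-1, 1) from a seed.
func randomVectors(seed int64, n int) (a, b []float32) {
	r := rand.New(rand.NewSource(seed))
	a, b = make([]float32, n), make([]float32, n)
	for i := range a {
		a[i] = r.Float32()*2 - 1
		b[i] = r.Float32()*2 - 1
	}
	return a, b
}

// agree reports whether both versions match within a budget proportional to
// the sum of |a[i]*b[i]|. Exact equality is not expected, because the two
// versions add the terms in a different order.
func agree(a, b []float32) bool {
	var scale float64
	for i := range a {
		scale += math.Abs(float64(a[i]) * float64(b[i]))
	}
	diff := math.Abs(float64(dotUnrolled(a, b)) - float64(dotSimple(a, b)))
	return diff <= 1e-4*scale
}

// FuzzDot is the fuzz target. The fuzzer mutates the seed and the length.
func FuzzDot(f *testing.F) {
	f.Add(int64(1), uint8(7))
	f.Fuzz(func(t *testing.T, seed int64, n uint8) {
		a, b := randomVectors(seed, int(n))
		if !agree(a, b) {
			t.Fatalf("mismatch for seed=%d n=%d", seed, n)
		}
	})
}
```

Placed in a `_test.go` file, the target runs with `go test -fuzz=FuzzDot`. Without the `-fuzz` flag, `go test` only replays the seed corpus.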

Footnotes

  1. No linear algebra optimizations

  2. All linear algebra optimizations

llama2.go's Issues

[Bug] fix steps check

llama2.go/main.go

Lines 60 to 62 in 5d67280

	if steps <= 0 || steps > config.SeqLen {
		steps = config.SeqLen
	}

steps should be strictly less than config.SeqLen. Proposed fix:

	// right now we cannot run for more than config.SeqLen steps
	if steps <= 0 || steps >= config.SeqLen {
		steps = config.SeqLen - 1
	}

Without this change, the KeyCache access in the Transformer function goes out of range:

				for t := 0; t <= pos; t++ {
					// get the key vector for this head and at this timestamp
					k := s.KeyCache[(loff + t*dim + h*headSize):(loff + (t+1)*dim + h*headSize)]

Test case: -steps=1023 runs fine, while -steps=1024 makes the program panic.
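
The index arithmetic behind the report can be sketched as follows. The example assumes the cache holds NumLayers * SeqLen * Dim values and that the layer offset is `layer * SeqLen * Dim`, as in llama2.c. Neither assumption is confirmed by the excerpt above. The helper names `keySliceBounds`, `keyCacheLen`, and `clampSteps` are hypothetical.

```go
package main

// keySliceBounds returns the [lo, hi) bounds of the KeyCache slice taken in
// the loop quoted above, for a given layer, timestep t, and head h.
// The layer offset formula is an assumption borrowed from llama2.c.
func keySliceBounds(layer, t, h, dim, headSize, seqLen int) (lo, hi int) {
	loff := layer * seqLen * dim
	lo = loff + t*dim + h*headSize
	hi = loff + (t+1)*dim + h*headSize
	return lo, hi
}

// keyCacheLen is the assumed total cache length: one dim-sized vector per
// layer and timestep.
func keyCacheLen(numLayers, seqLen, dim int) int {
	return numLayers * seqLen * dim
}

// clampSteps applies the bound proposed in the issue: at most seqLen-1 steps.
func clampSteps(steps, seqLen int) int {
	if steps <= 0 || steps >= seqLen {
		return seqLen - 1
	}
	return steps
}
```

With the stories110M config from the log (Dim 768, 12 heads of size 64, 12 layers, SeqLen 1024), the upper bound for the last layer at t = 1023 and h = 1 is 64 elements past the assumed cache length, while every head at t = 1022 stays inside it. Under these assumptions, the overrun comes from the slice being Dim elements long rather than headSize, so capping steps at SeqLen-1 avoids the last timestep rather than changing the slice itself.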
