certik / fastGPT
Fast GPT-2 inference written in Fortran
License: MIT License
Currently the attention over heads runs in serial:
Line 101 in 01eb84b
We should try to parallelize it and see whether we get a speedup.
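To illustrate why this loop is a good parallelization target, here is a toy Python sketch (made-up sizes, not the fastGPT code): each head's attention is independent of the others, so the heads can be farmed out to a thread pool. In the Fortran code this would correspond to something like an `!$omp parallel do` over the head index.

```python
# Toy multi-head attention: the loop over heads carries no dependencies,
# so serial and parallel execution give identical results.
import math
from concurrent.futures import ThreadPoolExecutor

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def one_head(q, k, v):
    # q, k, v: lists of per-position vectors for a single head
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v)) for t in range(d)])
    return out

def attention_serial(heads):
    return [one_head(q, k, v) for q, k, v in heads]

def attention_parallel(heads):
    # each head runs independently, so we can map them onto worker threads
    with ThreadPoolExecutor() as ex:
        return list(ex.map(lambda h: one_head(*h), heads))

heads = [([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
         for _ in range(4)]
assert attention_serial(heads) == attention_parallel(heads)
```

Since the per-head work is identical and independent, the results are bit-for-bit the same in either order; the open question is only whether the heads are large enough for the parallel overhead to pay off.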
We should add a version number to model.dat. This will allow us to keep changing the format without breaking things. If an older version of fastGPT is used with a newer file, it will say "Incompatible model.dat version, use the create_model.py script to generate a compatible version."
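A minimal sketch of the idea in Python (the header layout and version constant here are hypothetical, not the actual model.dat format):

```python
# Hypothetical versioned header: a little-endian integer at the front of
# the file, checked before anything else is read.
import io
import struct

MODEL_DAT_VERSION = 1  # bump whenever the format changes

def write_header(f):
    f.write(struct.pack("<i", MODEL_DAT_VERSION))

def check_header(f):
    (version,) = struct.unpack("<i", f.read(4))
    if version != MODEL_DAT_VERSION:
        raise RuntimeError(
            "Incompatible model.dat version, use the create_model.py "
            "script to generate a compatible version.")

buf = io.BytesIO()
write_header(buf)
buf.seek(0)
check_header(buf)  # a matching version passes silently
```

The Fortran side would do the equivalent with one integer read at the top of the loader and a `stop` with the same message on mismatch.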
I suggest adding the topic gpt-2 in the About section.
And change the instructions / workflow to simply download model.dat. That way we eliminate the need to use Python at all, and things become more robust. One would only use the Python create_model.py script to re-generate model.dat.
We should upload all 4 versions, so probably something like:
model-fastGPT-124M.dat
model-fastGPT-1558M.dat
Not sure if we should put the version number into the filename itself as well or not.
We should only do this after #30 is fixed, to prevent downloading an incompatible model.dat (from an older or newer version of fastGPT).
The decoding is done token by token, and token boundaries do not align with word boundaries, so we can wait until there is a space or punctuation before printing, and then print the buffered text.
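A toy Python sketch of the buffering logic (the decoded fragments and helper names below are made up for illustration):

```python
# Buffer decoded fragments and flush only up to the last space or
# punctuation mark, so a partial word is never printed.
import string

BREAKS = set(string.whitespace + string.punctuation)

def flush_point(buf):
    # index one past the last break character, or 0 if there is none
    for i in range(len(buf) - 1, -1, -1):
        if buf[i] in BREAKS:
            return i + 1
    return 0

buf = ""
printed = []
for piece in ["The", " wor", "ld of", " tomo", "rrow."]:  # decoded fragments
    buf += piece
    cut = flush_point(buf)
    if cut:
        printed.append(buf[:cut])  # safe to show: ends at a break
        buf = buf[cut:]
printed.append(buf)  # flush whatever remains at the end
assert "".join(printed) == "The world of tomorrow."
```

Every flushed chunk ends at a space or punctuation mark, so the user sees whole words streaming out while nothing is held back longer than one word.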
Currently we only support the default Fortran matmul and the macOS Accelerate framework. Add support for OpenBLAS as well, using the same technique (implement a Fortran module for it and allow selecting it in CMake).
Currently only the "greedy" sampling is implemented (the token with the highest probability is selected).
Implement other sampling methods, some options are:
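For illustration, a Python sketch of two standard alternatives, temperature scaling and top-k sampling (these are the generic formulations, not anything fastGPT currently defines):

```python
# Greedy picks argmax(logits). Temperature rescales the distribution
# (lower = closer to greedy), and top-k restricts sampling to the k
# highest-probability tokens.
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sample(logits, temperature=1.0, top_k=None, rng=random):
    if top_k is not None:
        kth = sorted(logits, reverse=True)[top_k - 1]
        # mask everything below the k-th best (ties may keep a few more)
        logits = [l if l >= kth else float("-inf") for l in logits]
    probs = softmax([l / temperature for l in logits])
    return rng.choices(range(len(probs)), weights=probs)[0]

logits = [2.0, 1.0, 0.1, -3.0]
tok = sample(logits, temperature=0.8, top_k=2)
assert tok in (0, 1)  # top-2 restricts the draw to the two best tokens
```

Top-p (nucleus) sampling is similar, except the cutoff is the smallest set of tokens whose cumulative probability exceeds p rather than a fixed count k.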
Investigate what the best way to parallelize is across nodes using MPI or coarrays.
So far we only benchmarked against PyTorch+OpenBLAS. We should also benchmark against PyTorch+Accelerate.
Here are a few ways to do it:
conda install "libblas=*=*_accelerate"
Here is some information on what a KV cache is: https://kipp.ly/blog/transformer-inference-arithmetic/#kv-cache
Roughly speaking, when new tokens are appended to the input and a new token is generated, a lot of the computation can be reused from the previous iteration. We need to cache the results and reuse them.
Here is a reference implementation in picoGPT: jaymody/picoGPT#7 (and the accompanying blog post https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/) that should be straightforward to adapt.
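A minimal Python sketch of the caching idea (the per-token "projection" below is a stand-in for the real K/V computation, not actual GPT-2 math):

```python
# Per layer, keep K and V for tokens already processed; on each
# generation step, only the newly added token needs projecting.
def kv_for(token):
    # stand-in for the expensive per-token K/V projection
    return (token * 2, token * 3)

class Layer:
    def __init__(self):
        self.k_cache, self.v_cache = [], []

    def step(self, new_tokens):
        # only the new tokens are projected; cached entries are reused
        for t in new_tokens:
            k, v = kv_for(t)
            self.k_cache.append(k)
            self.v_cache.append(v)
        return self.k_cache, self.v_cache

layer = Layer()
layer.step([464, 995, 286])   # prompt: K/V computed once, up front
k, v = layer.step([9439])     # one new token: only one projection
assert len(k) == 4 and k[-1] == 9439 * 2
```

This turns the per-step cost from O(sequence length) projections into O(1), at the price of storing the K/V arrays per layer for the whole context.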
Currently the input tokenizer is in Python, taken from the original OpenAI implementation: https://github.com/certik/fastGPT/blob/01eb84b015d89a567245da0445c0abb7d53a8500/encode_input.py. We should implement it in Fortran; that will eliminate the need to call a Python script before running fastGPT.
We have to write tests that exercise each code path in the Python implementation to ensure our Fortran implementation is correct.
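For reference, the core of the GPT-2 BPE encoder is a loop that repeatedly merges the highest-priority adjacent pair; this is the main code path the tests need to cover. A toy Python sketch with a made-up merge table (the real vocabulary and merge ranks come from OpenAI's released files):

```python
# Repeatedly merge the adjacent pair with the lowest (best) rank until
# no mergeable pair remains -- the heart of GPT-2's BPE algorithm.
def get_pairs(word):
    return {(a, b) for a, b in zip(word, word[1:])}

def bpe(word, ranks):
    word = list(word)
    while True:
        pairs = get_pairs(word)
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")),
                   default=None)
        if best is None or best not in ranks:
            break
        a, b = best
        merged, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == a and word[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one symbol
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = merged
    return word

ranks = {("l", "o"): 0, ("lo", "w"): 1}  # hypothetical merge priorities
assert bpe("low", ranks) == ["low"]
```

The other code paths in encode_input.py (the regex pre-tokenizer, the byte-to-unicode mapping, the cache) each need their own test cases, since the Fortran port has to reproduce all of them exactly.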
Using the 1558M model and the following input:
python encode_input.py \
"Alan Turing theorized that computers would one day become very powerful, but even he could not imagine" \
-n 100
I get the following output with tanh() (equal to PyTorch):
Output tokens:
703 484 561 307 973 13 198 198 1 40 836 470 892 314 1053 1683 1775 257 3644 326 714 466 1997 326 257 1692 852 714 466 553 339 531 13 198 198 1537 783 11 5176 284 262 670 286 257 1074 286 4837 379 262 2059 286 3442 11 14727 11 9061 389 852 973 284 466 1243 326 547 1752 1807 5340 13 198 198 464 1074 468 4166 257 3644 326 460 711 262 983 286 1514 11 257 3716 4811 983 326 9018 3867 5207 1088 257 3096 13 198 198 464 3644
Decoded output as text:
how they would be used.
"I don't think I've ever seen a computer that could do anything that a human being could do," he said.
But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.
The team has developed a computer that can play the game of Go, a complex strategy game that involves moving pieces around a board.
The computer
But with fast_tanh() I get the following output:
Output tokens:
703 484 561 307 973 13 198 198 1 40 836 470 892 314 1053 1683 1775 257 3644 326 714 466 1997 326 257 1692 852 714 466 553 339 531 13 198 198 1537 783 11 5176 284 262 670 286 257 1074 286 4837 379 262 2059 286 3442 11 14727 11 9061 389 852 973 284 466 1243 326 547 1752 1807 5340 13 198 198 464 1074 468 4166 257 3644 326 460 711 262 983 286 1514 11 257 3716 4811 983 326 9018 3867 257 3704 1088 257 3096 284 8006 517 7674
Decoded output as text:
how they would be used.
"I don't think I've ever seen a computer that could do anything that a human being could do," he said.
But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.
The team has developed a computer that can play the game of Go, a complex strategy game that involves moving a piece around a board to capture more territory
The last 9 tokens are different.
So the exact numerical shape of the tanh function makes a difference. At the very least, from a reproducibility perspective we have to maintain both versions. I don't know how to judge the quality: it may be that the quality is the same, just with slightly different probabilities that give different results in "greedy" mode while being statistically equivalent.
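To illustrate the effect, here is a Python comparison of math.tanh against one common rational approximation (this particular formula is an assumption; fastGPT's fast_tanh() may use a different one). The two agree to only a few decimal places, and since tanh sits inside GELU in every layer, such tiny differences can compound across layers and flip a near-tie in the greedy argmax, which is exactly the kind of late divergence seen above:

```python
# Compare the exact tanh with a cheap rational approximation.
import math

def fast_tanh(x):
    # hypothetical rational approximation; fastGPT's fast_tanh() may differ
    x2 = x * x
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)

# maximum deviation from the exact tanh on [-4, 4]
diff = max(abs(math.tanh(0.1 * i) - fast_tanh(0.1 * i))
           for i in range(-40, 41))
# diff is small but nonzero, so two logits within ~diff of each other
# can swap order between the two implementations
```

Because a monotone approximation preserves the ordering of its own inputs, the flips only appear after the perturbed activations have been mixed through subsequent matmuls, consistent with the two runs agreeing for most of the output and diverging near the end.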
Batching means that multiple input streams are computed at the same time, which can vectorize better and thus speed up the inference per token.
cmake --build build
[ 6%] Building Fortran object CMakeFiles/fastgpt.dir/tokenizer.f90.o
[ 12%] Building Fortran object CMakeFiles/fastgpt.dir/gpt2.f90.o
[ 18%] Building Fortran object CMakeFiles/fastgpt.dir/omp_dummy.f90.o
[ 25%] Building Fortran object CMakeFiles/fastgpt.dir/driver.f90.o
semantic error: Namelists not implemented yet
--> /home/christoph/computing/fastGPT/driver.f90:20:1
|
20 | namelist / input_fastGPT / n_tokens_to_generate
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note: Please report unclear or confusing messages as bugs at
https://github.com/lfortran/lfortran/issues.
gmake[2]: *** [CMakeFiles/fastgpt.dir/build.make:101: CMakeFiles/fastgpt.dir/driver.f90.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:93: CMakeFiles/fastgpt.dir/all] Error 2
gmake: *** [Makefile:101: all] Error 2
With the ifort compiler I get a crash after a while.
$ FC=ifort cmake .. -DFASTGPT_BLAS=Fortran -DCMAKE_BUILD_TYPE=Debug
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is Intel 20.2.5.20211109
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort supports Fortran 90
-- Checking whether /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort supports Fortran 90 -- yes
Configuration results
---------------------
Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
Build type: Debug
Fortran compiler flags: -warn all -check all,noarg_temp_created -traceback -O1 -g
Installation prefix: /usr/local
FASTGPT_BLAS: Fortran
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ivan/lrz/fastGPT/build
Output:
(fastgpt) ivan@maxwell:~/lrz/fastGPT$ gprofng collect app ./build/gpt2
Creating experiment directory test.10.er (Process ID: 260289) ...
Loading the model...
done.
Model parameters:
n_vocab = 50257
n_ctx = 1024
n_embd = 768
n_layer = 12
n_head = 12
Input parameters:
n_seq = 6
n_tokens_to_generate = 20
Input tokens:
464 995 286 9439 14448 284
Decoded input as text:
The world of tomorrow belongs to
Running model...
1 262
2 661
3 286
4 9439
5 13
6 198
7 198
8 464
9 995
10 286
11 9439
12 14448
13 284
14 262
15 661
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
gpt2 000000000042F1FA Unknown Unknown Unknown
libpthread-2.31.s 00007F2B36905420 Unknown Unknown Unknown
gpt2 00000000004268D3 linalg_mp_matmul_ 18 linalg_f.f90
gpt2 000000000041EB03 gpt2_mod_mp_gpt2_ 165 gpt2.f90
gpt2 00000000004208D6 gpt2_mod_mp_gener 194 gpt2.f90
gpt2 000000000040A56F MAIN__ 85 main.f90
gpt2 0000000000403862 Unknown Unknown Unknown
libc-2.31.so 00007F2B3671D083 __libc_start_main Unknown Unknown
gpt2 000000000040376E Unknown Unknown Unknown
I'm guessing this is related to the stack vs. heap array issue with the Intel compiler and matmul (https://fortran-lang.discourse.group/t/testing-the-performance-of-matmul-under-default-compiler-settings/4096/27).
Add a mode to interact with fastGPT as a chat bot.