fastgpt's People

Contributors

certik, scivision


fastgpt's Issues

Implement model.dat format versioning

  • Add a version number to model.dat
  • Increment the version with every change to the format (in both the Python writer and the Fortran reader)
  • In the reader, check the version; if it does not match, refuse to load the file.

This will allow us to keep changing the format without breaking things. If an older version of fastGPT is used with a newer file, it will say "Incompatible model.dat version, use the create_model.py script to generate a compatible version."
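A minimal sketch of what such a header could look like, assuming (hypothetically) a little-endian int32 version as the first four bytes of model.dat; the actual layout and version number would be whatever the writer and reader agree on:

```python
import struct

MODEL_DAT_VERSION = 1  # hypothetical current format version


def pack_header() -> bytes:
    # written by create_model.py as the first 4 bytes of model.dat
    return struct.pack("<i", MODEL_DAT_VERSION)


def check_header(blob: bytes) -> int:
    # the same check would be mirrored on the Fortran reader side
    (version,) = struct.unpack("<i", blob[:4])
    if version != MODEL_DAT_VERSION:
        raise ValueError(
            "Incompatible model.dat version, use the create_model.py "
            "script to generate a compatible version."
        )
    return version
```

The point is only that both sides bump the same constant on every format change, so a stale file fails fast with a clear message instead of being misread.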

add topic

I suggest adding the topic gpt-2 in the About section.

Upload model.dat online

And change the instructions / workflow to simply download it. That way we eliminate the need to use Python at all, and things become more robust. The Python create_model.py script would then only be needed to re-generate model.dat.

We should upload all 4 versions, so probably something like:

  • model-fastGPT-124M.dat
  • model-fastGPT-1558M.dat

Not sure whether we should also put the version number into the filename itself.

We should only do this after #30 is fixed, to prevent downloading incompatible model.dat (from older or newer versions of fastGPT).

Add support for OpenBLAS on both Linux and macOS

Currently we only support the default Fortran matmul and the macOS Accelerate framework. Add support for OpenBLAS as well, using the same technique (implement a Fortran module for it and allow selecting it in CMake).

Implement other sampling methods

Currently only the "greedy" sampling is implemented (the token with the highest probability is selected).

Implement other sampling methods; some options are:

  • top-p
  • top-k
  • temperature (here is an example how it could be done: jaymody/picoGPT#19)
  • categorical sampling
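The options above can be combined in one sampler. Here is a rough Python sketch (a toy, not fastGPT's actual code, and the function names are made up): temperature rescales the logits, top-k and top-p restrict the candidate set, and the final categorical draw picks from the renormalized candidates; top_k=1 recovers the existing greedy behavior.

```python
import math
import random


def softmax(logits):
    # numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_token(logits, temperature=1.0, top_k=None, top_p=None,
                 rng=random.random):
    # temperature: T < 1 sharpens the distribution, T > 1 flattens it
    probs = softmax([x / temperature for x in logits])
    # rank token ids by probability, highest first
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    if top_k is not None:
        # top-k: keep only the k most likely tokens
        ranked = ranked[:top_k]
    if top_p is not None:
        # top-p: keep the smallest prefix with cumulative probability >= top_p
        kept, total = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            total += p
            if total >= top_p:
                break
        ranked = kept
    # categorical draw from the renormalized candidate set
    r = rng() * sum(p for _, p in ranked)
    acc = 0.0
    for tok, p in ranked:
        acc += p
        if r <= acc:
            return tok
    return ranked[-1][0]
```

For example, `sample_token(logits, top_k=1)` always returns the argmax (greedy), while `sample_token(logits, temperature=0.8, top_p=0.9)` samples from the nucleus of the sharpened distribution.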

Implement kv cache

Here is some background on what a kv cache is: https://kipp.ly/blog/transformer-inference-arithmetic/#kv-cache

Roughly speaking, when a new token is appended to the input and the next token is generated, much of the computation from the previous iteration can be reused. We need to cache those results and reuse them.

Here is a reference implementation in picoGPT: jaymody/picoGPT#7 (and the accompanying blog post https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/) that should be straightforward to adapt.
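To illustrate the idea with a toy single-head attention in Python (not fastGPT's actual code): at each generation step only the new token's query, key, and value are computed; the key and value are appended to the cache, and the query attends over all cached entries, so earlier keys/values are never recomputed.

```python
import math


def attend(q, ks, vs):
    # scaled dot-product attention of one query over all cached keys/values
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    out = [0.0] * len(vs[0])
    for wi, v in zip(w, vs):
        for j, vj in enumerate(v):
            out[j] += (wi / z) * vj
    return out


class KVCache:
    """Toy per-layer key/value cache for one attention head."""

    def __init__(self):
        self.ks = []  # cached keys, one per processed token
        self.vs = []  # cached values

    def step(self, q, k, v):
        # only the new token's k/v are computed; older ones are reused
        self.ks.append(k)
        self.vs.append(v)
        return attend(q, self.ks, self.vs)
```

In fastGPT the cache would live per layer (and per head) and turn each generation step from O(n) attention inputs recomputed into O(1) new work plus a lookup.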

An example where the current fast_tanh() gives different results

Using the 1558M model and the following input:

python encode_input.py \
        "Alan Turing theorized that computers would one day become very powerful, but even he could not imagine" \
        -n 100

I get the following output with tanh() (equal to PyTorch):

Output tokens:
   703   484   561   307   973    13   198   198     1    40   836   470   892   314  1053  1683  1775   257  3644   326   714   466  1997   326   257  1692   852   714   466   553   339   531    13   198   198  1537   783    11  5176   284   262   670   286   257  1074   286  4837   379   262  2059   286  3442    11 14727    11  9061   389   852   973   284   466  1243   326   547  1752  1807  5340    13   198   198   464  1074   468  4166   257  3644   326   460   711   262   983   286  1514    11   257  3716  4811   983   326  9018  3867  5207  1088   257  3096    13   198   198   464  3644
Decoded output as text:
 how they would be used.

"I don't think I've ever seen a computer that could do anything that a human being could do," he said.

But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.

The team has developed a computer that can play the game of Go, a complex strategy game that involves moving pieces around a board.

The computer

But with fast_tanh() I get the following output:

Output tokens:
   703   484   561   307   973    13   198   198     1    40   836   470   892   314  1053  1683  1775   257  3644   326   714   466  1997   326   257  1692   852   714   466   553   339   531    13   198   198  1537   783    11  5176   284   262   670   286   257  1074   286  4837   379   262  2059   286  3442    11 14727    11  9061   389   852   973   284   466  1243   326   547  1752  1807  5340    13   198   198   464  1074   468  4166   257  3644   326   460   711   262   983   286  1514    11   257  3716  4811   983   326  9018  3867   257  3704  1088   257  3096   284  8006   517  7674
Decoded output as text:
 how they would be used.

"I don't think I've ever seen a computer that could do anything that a human being could do," he said.

But now, thanks to the work of a team of researchers at the University of California, Berkeley, computers are being used to do things that were once thought impossible.

The team has developed a computer that can play the game of Go, a complex strategy game that involves moving a piece around a board to capture more territory

The last 9 tokens are different.

So the exact numerical shape of the tanh function makes a difference. At the very least, from a reproducibility perspective we have to maintain both versions. I don't know how to judge the quality; it may well be the same, just with slightly different probabilities that in "greedy" mode give different tokens while being statistically equivalent.
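A contrived example (with made-up logit values, not the actual fastGPT numbers) of why greedy decoding amplifies such tiny differences: once two logits are nearly tied, an approximation error far smaller than the logits themselves can flip the argmax and send the generation down a different path, as happened above after token 91.

```python
def argmax(xs):
    # index of the largest element, as used by greedy sampling
    return max(range(len(xs)), key=lambda i: xs[i])


# two candidate tokens with nearly tied logits (hypothetical values)
logits_exact = [2.00000, 1.99995, 0.50000]

# a ~1e-4 perturbation, e.g. from an approximate tanh propagating
# through the network, is enough to flip the greedy choice
logits_approx = [2.00000, 2.00005, 0.50000]
```

Here `argmax(logits_exact)` picks token 0 while `argmax(logits_approx)` picks token 1, even though the two distributions are nearly identical.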

Implement batching

Batching means that multiple input streams are computed at the same time, which vectorizes better and thus speeds up inference per token.
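A minimal sketch of the idea in Python (toy matrices, hypothetical function names): stacking B input vectors turns B matrix-vector products into one larger matrix product, which a BLAS-backed implementation can execute as a single GEMM.

```python
def matvec(w, x):
    # one stream: apply weight matrix w (a list of rows) to one token vector x
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matvec_batched(w, xs):
    # B streams at once: the stacked inputs xs form a (B x k) matrix, so the
    # whole batch is one larger matrix product; with BLAS this becomes a
    # single GEMM instead of B separate, less vectorizable matvecs
    return [matvec(w, x) for x in xs]
```

The toy version just loops, but the point is the shape change: per-layer work becomes one (n x k) by (k x B) product shared across all streams.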

Does not build with lfortran-0.33.1

cmake --build build
[  6%] Building Fortran object CMakeFiles/fastgpt.dir/tokenizer.f90.o
[ 12%] Building Fortran object CMakeFiles/fastgpt.dir/gpt2.f90.o
[ 18%] Building Fortran object CMakeFiles/fastgpt.dir/omp_dummy.f90.o
[ 25%] Building Fortran object CMakeFiles/fastgpt.dir/driver.f90.o
semantic error: Namelists not implemented yet
  --> /home/christoph/computing/fastGPT/driver.f90:20:1
   |
20 | namelist / input_fastGPT / n_tokens_to_generate
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 


Note: Please report unclear or confusing messages as bugs at
https://github.com/lfortran/lfortran/issues.
gmake[2]: *** [CMakeFiles/fastgpt.dir/build.make:101: CMakeFiles/fastgpt.dir/driver.f90.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:93: CMakeFiles/fastgpt.dir/all] Error 2
gmake: *** [Makefile:101: all] Error 2

Segfault with ifort

With the ifort compiler I get a crash after a while.

Build information
$ FC=ifort cmake .. -DFASTGPT_BLAS=Fortran -DCMAKE_BUILD_TYPE=Debug
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is Intel 20.2.5.20211109
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort supports Fortran 90
-- Checking whether /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort supports Fortran 90 -- yes


Configuration results
---------------------
Fortran compiler: /opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
Build type: Debug
Fortran compiler flags: -warn all -check all,noarg_temp_created -traceback -O1 -g
Installation prefix: /usr/local
FASTGPT_BLAS: Fortran
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ivan/lrz/fastGPT/build

Output:

(fastgpt) ivan@maxwell:~/lrz/fastGPT$ gprofng collect app ./build/gpt2 
Creating experiment directory test.10.er (Process ID: 260289) ...
Loading the model...
    done.
Model parameters:
n_vocab = 50257
n_ctx   =  1024
n_embd  =   768
n_layer =    12
n_head  =    12
 
Input parameters:
n_seq                =   6
n_tokens_to_generate =  20
 
Input tokens:
   464   995   286  9439 14448   284
Decoded input as text:
The world of tomorrow belongs to
Running model...
           1         262
           2         661
           3         286
           4        9439
           5          13
           6         198
           7         198
           8         464
           9         995
          10         286
          11        9439
          12       14448
          13         284
          14         262
          15         661
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
gpt2               000000000042F1FA  Unknown               Unknown  Unknown
libpthread-2.31.s  00007F2B36905420  Unknown               Unknown  Unknown
gpt2               00000000004268D3  linalg_mp_matmul_          18  linalg_f.f90
gpt2               000000000041EB03  gpt2_mod_mp_gpt2_         165  gpt2.f90
gpt2               00000000004208D6  gpt2_mod_mp_gener         194  gpt2.f90
gpt2               000000000040A56F  MAIN__                     85  main.f90
gpt2               0000000000403862  Unknown               Unknown  Unknown
libc-2.31.so       00007F2B3671D083  __libc_start_main     Unknown  Unknown
gpt2               000000000040376E  Unknown               Unknown  Unknown

I'm guessing this is related to the stack vs heap array issue with the Intel compiler and matmul (https://fortran-lang.discourse.group/t/testing-the-performance-of-matmul-under-default-compiler-settings/4096/27).
