llmbench

A library for validating and benchmarking LLM inference.

Run ScaleLLM Benchmark

python3 scalellm_run_benchmark.py --input_file /data/dataset/F_alpaca_group_10_2.json --model_dir=/data/llama-2-7b-hf --batch_size=16
  • --input_file path to the input JSON file
  • --model_dir path to the model directory
  • --batch_size batch size used for the run
  • --data_format v1 or v2, selects the input JSON format
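
When comparing several batch sizes, it can help to generate the command lines instead of typing them by hand. The following is a minimal sketch, not part of llmbench: the helper name `build_benchmark_cmd` is hypothetical, and it assumes the benchmark scripts accept the flags exactly as listed above. It only prints the commands; it does not run them.

```python
import shlex


def build_benchmark_cmd(script, input_file, model_dir, batch_size, data_format=None):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    cmd = [
        "python3", script,
        "--input_file", input_file,
        f"--model_dir={model_dir}",
        f"--batch_size={batch_size}",
    ]
    if data_format is not None:
        # The README documents only two values for --data_format.
        if data_format not in ("v1", "v2"):
            raise ValueError("data_format must be 'v1' or 'v2'")
        cmd += ["--data_format", data_format]
    return cmd


if __name__ == "__main__":
    # Print one command per batch size; paths are the ones used in the README.
    for batch_size in (1, 8, 16):
        cmd = build_benchmark_cmd(
            "scalellm_run_benchmark.py",
            "/data/dataset/F_alpaca_group_10_2.json",
            "/data/llama-2-7b-hf",
            batch_size,
        )
        print(shlex.join(cmd))
```

Because the script name is a parameter, the same helper can produce commands for any benchmark script that takes these four flags.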

Run vLLM Benchmark

python3 vllm_run_benchmark.py --input_file /data/dataset/Chatbot_group_10_2.json --model_dir=/data/llama-2-7b-hf --batch_size=16
  • --input_file path to the input JSON file
  • --model_dir path to the model directory
  • --batch_size batch size used for the run
  • --data_format v1 or v2, selects the input JSON format

Run TensorRT-LLM Benchmark

1. Download TensorRT-LLM

git clone https://github.com/NVIDIA/TensorRT-LLM.git

2. Convert the Hugging Face model to a TensorRT-LLM checkpoint

python TensorRT-LLM/examples/qwen/convert_checkpoint.py --model_dir /data/qwen-7b --output_dir /data/qwen-7b-ckpt --dtype float16
  • --workers number of parallel workers (tensor parallel number)
  • --model_dir Hugging Face model directory
  • --dtype data type of the converted weights, e.g. float16
  • --output_dir output checkpoint directory

3. Build TensorRT Engine

trtllm-build --checkpoint_dir /data/qwen-7b-ckpt --gemm_plugin float16 --gpt_attention_plugin float16 --max_batch_size 256 --output_dir /data/qwen-7b-engine
  • --max_batch_size maximum batch size the engine supports
  • --max_input_len maximum input length
  • --max_output_len maximum output length
  • --output_dir output directory for the built engine
  • --checkpoint_dir checkpoint directory produced in step 2
  • --workers number of parallel workers (tensor parallel number)
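
Steps 2 and 3 depend on each other: the checkpoint directory written by the conversion script is the input of `trtllm-build`. The sketch below chains the two commands so the directory is passed only once. It is an illustration under assumptions, not part of llmbench: the function names and the `dry_run` switch are hypothetical, only flags that appear in the commands above are used, and plugin flags other than `--gemm_plugin` are left out for brevity.

```python
import shlex
import subprocess


def convert_cmd(model_dir, ckpt_dir, dtype="float16"):
    """Command for step 2: Hugging Face model -> TensorRT-LLM checkpoint."""
    return [
        "python", "TensorRT-LLM/examples/qwen/convert_checkpoint.py",
        "--model_dir", model_dir,
        "--output_dir", ckpt_dir,
        "--dtype", dtype,
    ]


def build_cmd(ckpt_dir, engine_dir, max_batch_size=256, gemm_plugin="float16"):
    """Command for step 3: checkpoint -> TensorRT engine."""
    return [
        "trtllm-build",
        "--checkpoint_dir", ckpt_dir,
        "--gemm_plugin", gemm_plugin,
        "--max_batch_size", str(max_batch_size),
        "--output_dir", engine_dir,
    ]


def run_pipeline(model_dir, ckpt_dir, engine_dir, dry_run=True):
    """Print both commands in order; execute them only when dry_run is False."""
    steps = [convert_cmd(model_dir, ckpt_dir), build_cmd(ckpt_dir, engine_dir)]
    for step in steps:
        print(shlex.join(step))
        if not dry_run:
            # check=True stops the pipeline if a step fails.
            subprocess.run(step, check=True)
    return steps


if __name__ == "__main__":
    run_pipeline("/data/qwen-7b", "/data/qwen-7b-ckpt", "/data/qwen-7b-engine")
```

With the default `dry_run=True` nothing is executed, so the printed commands can be reviewed before a long build is started.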

4. Run TensorRT-LLM on a single GPU

python3 tensorrtllm_run_benchmark.py --max_output_len=100 --tokenizer_dir /data/llama-2-7b-hf --engine_dir /data/llama-2-7b-engine --input_file /data/dataset/Chatbot_group_10_2.json --batch_size 16

5. Run TensorRT-LLM on two GPUs

mpirun -n 2 python run.py --max_output_len=100 --every_batch_cost_print True --tokenizer_dir /data/tensorrtllm_test/opt-13b/ --engine_dir /data/tensorrtllm_test/opt-13b-trtllm-build/ --input_file /data/opt-13b-test/Chatbot_group_10.json --batch_size 8
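
The single-GPU and two-GPU commands differ mainly in the `mpirun -n <N>` prefix. A minimal sketch of a launcher that adds the prefix only when more than one GPU is requested is shown below. The function `launch_cmd` and its parameters are hypothetical, and it assumes one MPI process per GPU, as in the two-GPU command above.

```python
import shlex


def launch_cmd(script, script_args, num_gpus=1, interpreter="python3"):
    """Build the launch command, prefixing mpirun when num_gpus > 1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Assumption: one MPI rank per GPU, mirroring "mpirun -n 2" for two GPUs.
    prefix = [] if num_gpus == 1 else ["mpirun", "-n", str(num_gpus)]
    return prefix + [interpreter, script] + list(script_args)


if __name__ == "__main__":
    args = [
        "--max_output_len=100",
        "--tokenizer_dir", "/data/tensorrtllm_test/opt-13b/",
        "--engine_dir", "/data/tensorrtllm_test/opt-13b-trtllm-build/",
        "--input_file", "/data/opt-13b-test/Chatbot_group_10.json",
        "--batch_size", "8",
    ]
    print(shlex.join(launch_cmd("run.py", args, num_gpus=2)))
```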

Contributors

liutongxuan, guocuimi