
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

License: MIT License


inferflow's Introduction

Inferflow


Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). With Inferflow, users can serve most of the common transformer models by simply modifying some lines in corresponding configuration files, without writing a single line of source code. Further details can be found in our technical report.

Quick Links

  1. Getting started (on Windows | on Linux, Mac, and Windows Subsystem for Linux (WSL))
  2. Serving 34B or 40B models on a single 24GB-VRAM GPU (e.g., RTX 3090 and 4090)

Milestones

  • 2024-2-18: Added support for mixture-of-experts (MoE) models.
  • 2024-1-17: Version 0.1.0 was formally released.

Main Features

  1. Extensible and highly configurable: The typical way to serve a new model with Inferflow is to edit a model specification file, not to add or edit source code. Inferflow implements a modular framework of atomic building blocks and technologies, making it compositionally generalizable to new models. A new model can be served by Inferflow as long as all of the atomic building blocks and technologies it uses are already "known" to Inferflow.
  2. 3.5-bit quantization: Inferflow implements 2-bit, 3-bit, 3.5-bit, 4-bit, 5-bit, 6-bit, and 8-bit quantization. Among these schemes, 3.5-bit quantization is new, introduced by Inferflow (see the sketch after this list).
  3. Hybrid model partition for multi-GPU inference: Inferflow supports multi-GPU inference with three model partitioning strategies to choose from: partition-by-layer (pipeline parallelism), partition-by-tensor (tensor parallelism), and hybrid partitioning (hybrid parallelism). Hybrid partitioning is seldom supported by other inference engines.
  4. Wide file format support (and safe loading of pickle data): Inferflow supports loading models in multiple file formats directly, without relying on an external converter. Supported formats include pickle, safetensors, llama.cpp gguf, etc. Reading pickle files with Python code is known to carry security risks; by implementing a simplified pickle parser in C++, Inferflow loads models from pickle data safely.
  5. Wide network type support: Supporting three types of transformer models: decoder-only models, encoder-only models, and encoder-decoder models.
  6. GPU/CPU hybrid inference: Supporting GPU-only, CPU-only, and GPU/CPU hybrid inference.
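
The 3.5-bit scheme itself is detailed in the technical report. As a rough illustration of how a fractional bit width can arise (a hypothetical sketch, not Inferflow's actual algorithm): two weights can share 7 bits if each is quantized to 11 levels, since 11 × 11 = 121 ≤ 2^7 = 128.

    # Hypothetical illustration of 3.5-bit packing (NOT Inferflow's scheme):
    # two weights share one 7-bit code by using 11 quantization levels each.
    LEVELS = 11  # 11 * 11 = 121 <= 128

    def quantize_pair(w0, w1, scale, zero):
        """Map two float weights to a single 7-bit code."""
        q0 = min(LEVELS - 1, max(0, round(w0 / scale) + zero))
        q1 = min(LEVELS - 1, max(0, round(w1 / scale) + zero))
        return q0 * LEVELS + q1  # value in [0, 120]

    def dequantize_pair(code, scale, zero):
        """Recover approximate weights from a packed 7-bit code."""
        q0, q1 = divmod(code, LEVELS)
        return (q0 - zero) * scale, (q1 - zero) * scale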

Below is a comparison between Inferflow and some other inference engines:

| Inference Engine | New Model Support | Supported File Formats | Network Structures | Quantization Bits | Hybrid Parallelism for Multi-GPU Inference | Programming Languages |
|---|---|---|---|---|---|---|
| Huggingface Transformers | Adding/editing source codes | pickle (unsafe), safetensors | decoder-only, encoder-decoder, encoder-only | 4b, 8b | - | Python |
| vLLM | Adding/editing source codes | pickle (unsafe), safetensors | decoder-only | 4b, 8b | - | Python |
| TensorRT-LLM | Adding/editing source codes | - | decoder-only, encoder-decoder, encoder-only | 4b, 8b | - | C++, Python |
| DeepSpeed-MII | Adding/editing source codes | pickle (unsafe), safetensors | decoder-only | - | - | Python |
| llama.cpp | Adding/editing source codes | gguf | decoder-only | 2b, 3b, 4b, 5b, 6b, 8b | - | C/C++ |
| llama2.c | Adding/editing source codes | llama2.c | decoder-only | - | - | C |
| LMDeploy | Adding/editing source codes | pickle (unsafe), TurboMind | decoder-only | 4b, 8b | - | C++, Python |
| Inferflow | Editing configuration files | pickle (safe), safetensors, gguf, llama2.c | decoder-only, encoder-decoder, encoder-only | 2b, 3b, 3.5b, 4b, 5b, 6b, 8b | ✔ | C++ |
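
Multi-GPU partitioning shows up in the service configuration through the devices setting; an example from the issue reports below is devices = 0&1&2&3;4&5&6&7. The annotated values that follow are hypothetical, and the reading of the separators (";" between layer groups, "&" between tensor-parallel devices within a group) is an assumption inferred from that example rather than documented semantics:

    ; hypothetical "devices" values (separator semantics assumed, see above)
    devices = 0;1          ; partition-by-layer across two GPUs
    devices = 0&1          ; partition-by-tensor across two GPUs
    devices = 0&1;2&3      ; hybrid: two layer groups, each two-way tensor-parallel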

Support Matrix

Supported Model File Formats

  • Pickle (Inferflow avoids the security issues that most other inference engines face when loading pickle-format files)
  • Safetensors
  • llama.cpp gguf
  • llama2.c

Supported Technologies, Modules, and Options

  • Supported modules and technologies related to model definition:

    • Normalization functions: STD, RMS
    • Activation functions: RELU, GELU, SILU
    • Position embeddings: ALIBI, RoPE, Sinusoidal
    • Grouped-query attention
    • Parallel attention
  • Supported technologies and options related to serving:

    • Linear quantization of weights and KV cache elements: 2-bit, 3-bit, 3.5-bit, 4-bit, 5-bit, 6-bit, 8-bit
    • The option of moving part or all of the KV cache from VRAM to regular RAM
    • The option of placing the input embedding tensor(s) in regular RAM
    • Model partitioning strategies for multi-GPU inference: partition-by-layer, partition-by-tensor, hybrid partitioning
    • Dynamic batching
    • Decoding strategies: Greedy, top-k, top-p, FSD, typical, mirostat...
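
As a concrete instance of one decoding strategy from the list above, here is a minimal, engine-independent sketch of top-p (nucleus) sampling in Python; it illustrates the general technique only and is not Inferflow's implementation:

    import numpy as np

    def sample_top_p(logits, p=0.9, rng=None):
        """Sample a token id from the smallest set of tokens whose
        cumulative probability exceeds p (nucleus sampling).
        logits: 1-D numpy array of vocabulary logits."""
        rng = rng or np.random.default_rng()
        probs = np.exp(logits - np.max(logits))    # numerically stable softmax
        probs /= probs.sum()
        order = np.argsort(-probs)                 # tokens by descending probability
        cum = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix covering p
        nucleus = order[:cutoff]
        return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))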

Supported Transformer Models

  • Decoder-only: Inferflow supports many types of decoder-only transformer models.
  • Encoder-decoder: Some types of encoder-decoder models are supported.
  • Encoder-only: Some types of encoder-only models are supported.

Models with Predefined Specification Files

Users can serve a model with Inferflow by editing a model specification file. We have built predefined specification files for some popular or representative models. Below is a list of such models.

  • Aquila (aquila_chat2_34b)
  • Baichuan (baichuan2_7b_chat, baichuan2_13b_chat)
  • BERT (bert-base-multilingual-cased)
  • Bloom (bloomz_3b)
  • ChatGLM (chatglm2_6b)
  • Deepseek (deepseek_moe_16b_base)
  • Facebook m2m100 (facebook_m2m100_418m)
  • Falcon (falcon_7b_instruct, falcon_40b_instruct)
  • FuseLLM (fusellm_7b)
  • Gemma (gemma_2b_it)
  • Internlm (internlm-chat-20b)
  • LLAMA2 (llama2_7b, llama2_7b_chat, llama2_13b_chat)
  • MiniCPM (minicpm_2b_dpo_bf16)
  • Mistral (mistral_7b_instruct)
  • Mixtral (mixtral_8x7b_instruct_v0.1)
  • Open LLAMA (open_llama_3b)
  • OPT (opt_350m, opt_13b, opt_iml_max_30b)
  • Orion (orion_14b_chat)
  • Phi-2 (phi_2)
  • Qwen (qwen1.5_7b_chat)
  • XVERSE (xverse_13b_chat)
  • YI (yi_6b, yi_34b_chat)

Getting Started

Windows users: Please refer to docs/getting_started.win.md for the instructions about building and running the Inferflow tools and service on Windows.

The following instructions are for Linux, Mac, and WSL (Windows Subsystem for Linux).

Get the Code

git clone https://github.com/inferflow/inferflow
cd inferflow

Build

  • Build the GPU version (that supports GPU/CPU hybrid inference):

    mkdir -p build/gpu
    cd build/gpu
    cmake ../.. -DUSE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES=75  # set CMAKE_CUDA_ARCHITECTURES to your GPU's compute capability (75 = Turing)
    make install -j 8
  • Build the CPU-only version:

    mkdir -p build/cpu
    cd build/cpu
    cmake ../.. -DUSE_CUDA=0
    make install -j 8

Upon a successful build, executables are generated and copied to bin/release/.

Run the LLM Inferencing Tool (bin/llm_inference)

  • Example-1: Load a tiny model and perform inference

    • Step-1: Download the model

      #> cd {inferflow-root-dir}/data/models/llama2.c/
      #> bash download.sh
      

      Instead of running the above script, you can also manually download the model files and copy them to the above folder. The source URL and file names can be found in download.sh.

    • Step-2: Run the llm_inference tool:

      #> cd {inferflow-root-dir}/bin/
      #> release/llm_inference llm_inference.tiny.ini
      

      Please note that it is fine for llm_inference and llm_inference.tiny.ini to be in different folders (llm_inference.tiny.ini is in bin/, while llm_inference is in bin/release/).

  • Example-2: Run the llm_inference tool to load a larger model for inference

    • Step-1: Edit the configuration file bin/inferflow_service.ini to choose a model.

      In the "transformer_engine" section of bin/inferflow_service.ini, there are multiple lines starting with "models = " or ";models = ". The lines starting with the ";" character are comments. To choose a model for inference, please uncomment the line corresponding to this model, and comment the lines of other models. By default, the phi-2 model is selected. Please refer to docs/model_serving_config.md for more information about editing the configuration of inferflow_service.

    • Step-2: Download the selected model

      #> cd {inferflow-root-dir}/data/models/{model-name}/
      #> bash download.sh
      
    • Step-3: Edit the configuration file bin/llm_inference.ini to choose or edit a query.

      In the configuration file, queries are organized into query lists. A query list can contain one or more queries, and different query lists serve different purposes. For example, query_list.decoder_only is for testing decoder-only models, and its details can be configured in the query_list.decoder_only section. The first line of this section is "query_count = 1", which means only one query is included in this query list. Among the following lines with key query1, only one is uncommented and therefore effective; the others (i.e., the lines starting with a ";" character) are commented out. You can choose a query for testing by uncommenting it and commenting out all the other queries. You can, of course, also add new queries or change the content of an existing query.
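
      Below is a hypothetical excerpt illustrating the structure just described (the exact query syntax in the real file may differ):

      [query_list.decoder_only]
      query_count = 1
      ;query1 = Tell me a joke about the weather.
      query1 = Write an article about the weather of Seattle.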

    • Step-4: Run the tool:

      #> cd {inferflow-root-dir}/bin/
      #> release/llm_inference
      

Run the Inferflow Service (bin/inferflow_service)

  • Step-1: Edit the service configuration file (bin/inferflow_service.ini)

  • Step-2: Start the service:

    #> cd bin
    #> release/inferflow_service
    

Test the Inferflow service

Run an HTTP client to interact with the Inferflow service over HTTP and obtain inference results.

  • Option-1. Run the Inferflow client tool: inferflow_client

    • Step-1: Edit the configuration file (bin/inferflow_client.ini) to set the service address, query text, and options.

    • Step-2: Run the client tool to get inference results.

    #> cd bin
    #> release/inferflow_client
    
  • Option-2. The curl command

    You can also use curl to send an HTTP POST request to the Inferflow service and get inference results. Below is an example:

    curl -X POST -d '{"text": "Write an article about the weather of Seattle.", "res_prefix": "", "decoding_alg": "sample.top_p", "random_seed": 1, "temperature": 0.7, "is_streaming_mode": false}' localhost:8080
    
  • Option-3. Use a GUI REST client (e.g., the Tabbed Postman Chrome extension).

    • URL: http://localhost:8080 (If you access the service from a different machine, please replace "localhost" with the service IP)

    • HTTP method: POST

    • Example body text: {"text": "Write an article about the weather of Seattle.", "res_prefix": "", "decoding_alg": "sample.top_p", "random_seed": 1, "temperature": 0.7, "is_streaming_mode": 0}
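
For scripted testing, the same request can also be sent from Python. Below is a minimal sketch using the third-party requests library (pip install requests), assuming the service listens on localhost:8080 as configured above:

    import requests

    # The JSON body mirrors the curl example above; is_streaming_mode=False
    # asks for a single, non-streaming response.
    body = {
        "text": "Write an article about the weather of Seattle.",
        "res_prefix": "",
        "decoding_alg": "sample.top_p",
        "random_seed": 1,
        "temperature": 0.7,
        "is_streaming_mode": False,
    }
    response = requests.post("http://localhost:8080", json=body, timeout=60)
    print(response.json())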

Compatibility with OpenAI's Chat Completions API

The Inferflow service also provides support for OpenAI's Chat Completions API. The API can be tested in one of the following ways.

  • Option-1: The OpenAI Python API Library

    Below is sample code. Please install the openai Python module (pip install openai) before running it.

    import openai
    
    openai.base_url = "http://localhost:8080"
    openai.api_key = "sk-no-key-required"
    
    is_streaming = True
    
    response = openai.chat.completions.create(
        model="default",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write an article about the weather of Seattle."}
        ],
        stream=is_streaming
    )
    
    if is_streaming:
        for chunk in response:
            print(chunk.choices[0].delta.content or "", end="")
    else:
        print(response.choices[0].message.content)
  • Option-2: The curl command

    curl -X POST -d '{"model": "gpt-3.5-turbo","messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write an article about the weather of Seattle."}], "stream": true}' http://localhost:8080/chat/completions

Reference

If you are interested in our work, please kindly cite:

@misc{shi2024inferflow,
    title={Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models},
    author={Shuming Shi and Enbo Zhao and Deng Cai and Leyang Cui and Xinting Huang and Huayang Li},
    year={2024},
    eprint={2401.08294},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Acknowledgements

Inferflow is inspired by the awesome llama.cpp and llama2.c projects. The CPU inference part of Inferflow is based on the ggml library. The FP16 data type in the CPU-only version of Inferflow is from the Half-precision floating-point library. We express our sincere gratitude to the maintainers and implementers of these codebases and tools.

inferflow's People

Contributors

chenglin, ghrua, inferflow, jcyk, nealcly, shumingshi, whiplashzeb


inferflow's Issues

Support of Whisper

As the Whisper model has an encoder-decoder structure, do you have any example of using it with Inferflow?

Baichuan2 prompt template

Model: Baichuan2 7B Chat
Input: tell me a 50 word story
Inferflow Service Log:

decoding_alg: , strategy_id: 4, temperature: 0.00
Encoder input text:
Decoder input text:

{<reserved_106>}
tell me a 50 word story{<reserved_107>}

query_id: 1, output_len: 165, is_end: true

<reserved_106> and <reserved_107> should not be surrounded with {}, but in inferflow_service.ini they are surrounded with {}. Should I just delete the {} from inferflow_service.ini, or is it expected to be removed in the C++ code?

baichuan2 prompt template is: \n\n<reserved_106>\n{query}<reserved_107>\n

Is support for the OpenAI REST API in your roadmap?

We are using the OpenAI REST API with vLLM. Will Inferflow support the OpenAI REST API later? That would make it easier for us to migrate to Inferflow if it is more performant than vLLM.

By the way, have you done any performance comparisons with vLLM or other LLM inference engines?

Connecting error: (timeout: 100) Failed to process the request (error-code: 2)

[screenshot]
The bash file in Step 1 requires wget; rather than installing wget, I downloaded the files manually.
I assume Step 2 contains a typo, as it asks to execute a configuration file (which doesn't exist).

[screenshot]
As I've already downloaded the files from the example above, I'll choose to use that model (llama2.c).
I open the configuration file, but I'm not sure what to edit.
I assume that I need to add a new entry here:
[screenshot]
However, the naming convention is unclear.
Let's look at the entries that exist and see if we can match them to models in the model folder, here:
[screenshot]
We can see that facebook_m2m100 exists (not yet downloaded); however, the names do not match, i.e. it is called facebook_m2m100_418m.
Also, bert is in the list; however, it's called bert_base_multilingual_cased.
Therefore I conclude that this is not the entry that I need to edit.

Can you please clarify exactly what needs to be edited in this file?

[screenshots]
I'm not sure what needs to be edited here either.

Can you clarify this too, please?

Here's my attempt:
I'm assuming that I need to uncomment the models I downloaded earlier.
[screenshot]
I've tried with just one and with both.

I also assume that I need to set devices = 1 to use the GPU? (I only built the GPU version.)
[screenshot]

[screenshots]

[screenshot]
It doesn't look like I need to change anything in this file, so I keep it as it is.

[screenshots]

Log files:
[screenshot]

client log
; --------------------------------------------
#1; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#109@app_environment.cc
Application environment is set successfullly.
#2; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#110@app_environment.cc
app_dir: C:\Users\micha\source\repos\inferflow-main\bin\x64_Release
#3; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#111@app_environment.cc
app_name: inferflow_client; Version: 0.1
#4; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#116@app_environment.cc
config_dir: C:\Users\micha\source\repos\inferflow-main\bin
#5; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#117@app_environment.cc
data_root_dir: C:\Users\micha\source\repos\inferflow-main\data/inferflow/
#6; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#118@app_environment.cc
is_daemon: false
#7; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#124@app_environment.cc
Configuration = release; Platform = x64
#8; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::Init#127@app_environment.cc
========== ========== ========== ========== ========== ==========
#9; 2024-1-19 17:28:4; 0x8400(info_key); Run#60@inferflow_client.cc
decoding_strategy: sample.std
#10; 2024-1-19 17:28:4; 0x8400(info_key); Run#61@inferflow_client.cc
query_random_seed: 1
#11; 2024-1-19 17:28:4; 0x8400(info_key); Run#62@inferflow_client.cc
temperature: 0.70
#12; 2024-1-19 17:28:4; 0x200(warning); sslib::HttpClient::ExecuteInner#316@http_client.cc
Connecting error (timeout: 100)
#13; 2024-1-19 17:28:4; 0x300(error); Run#85@inferflow_client.cc
Failed to process the request (error-code: 2)
#14; 2024-1-19 17:28:4; 0x8400(info_key); sslib::AppEnv::LogProcessMemoryUsage#445@app_environment.cc
Memory usage (MB): 10.06, 10.12 (Peak)

(The same log block repeats, differing only in timestamps and memory figures, for each of the seven subsequent attempts.)

service log
; --------------------------------------------
#1; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#109@app_environment.cc
Application environment is set successfullly.
#2; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#110@app_environment.cc
app_dir: C:\Users\micha\source\repos\inferflow-main\bin\x64_Release
#3; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#111@app_environment.cc
app_name: inferflow_service; Version: 0.1.0
#4; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#116@app_environment.cc
config_dir: C:\Users\micha\source\repos\inferflow-main\bin
#5; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#117@app_environment.cc
data_root_dir: C:\Users\micha\source\repos\inferflow-main\data/inferflow/
#6; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#118@app_environment.cc
is_daemon: false
#7; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#124@app_environment.cc
Configuration = release; Platform = x64
#8; 2024-1-19 17:27:3; 0x8400(info_key); sslib::AppEnv::Init#127@app_environment.cc
========== ========== ========== ========== ========== ==========
#9; 2024-1-19 17:27:3; 0x8400(info_key); Run#13@inferflow_service_main.cc
Initializing the Inferflow service...
#10; 2024-1-19 17:27:3; 0x8400(info_key); inferflow::transformer::InferenceEngine::LoadConfig#1450@inference_engine.cc
Loading model specifications...

(The same log block repeats, with different timestamps, for each of the six subsequent service starts.)

inference log
; --------------------------------------------
#1; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#109@app_environment.cc
Application environment is set successfullly.
#2; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#110@app_environment.cc
app_dir: C:\Users\micha\source\repos\inferflow-main\bin\x64_Release
#3; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#111@app_environment.cc
app_name: llm_inference; Version: 0.1.0
#4; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#116@app_environment.cc
config_dir: C:\Users\micha\source\repos\inferflow-main\bin
#5; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#117@app_environment.cc
data_root_dir: C:\Users\micha\source\repos\inferflow-main\data/inferflow/
#6; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#118@app_environment.cc
is_daemon: false
#7; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#124@app_environment.cc
Configuration = release; Platform = x64
#8; 2024-1-19 17:19:14; 0x8400(info_key); sslib::AppEnv::Init#127@app_environment.cc
========== ========== ========== ========== ========== ==========

(The same log block repeats, with different timestamps, for each of the six subsequent runs.)

To summarise: the instructions were unclear, so I had to join the dots. Everything appeared to work until I ran the client and got an error. Logs are included above.

As a sidenote, the error seems to imply that it failed after waiting only 100 ms? That timeout could probably be increased.

Does the opt_13b model support tensor parallelism via Inferflow?

The settings are as follows:

devices = 0&1&2&3;4&5&6&7
decoder_cpu_layer_count = 0
cpu_threads = 8

max_concurrent_queries = 6

return_output_tensors = true

;debug options
is_study_mode = false
show_tensors = false

When I run opt_13b with Inferflow, the error is as follows:

Configuration = release; Platform = x64
========== ========== ========== ========== ========== ==========
Loading model specifications...
Loading model opt_13b...
vocab_size: 50272, embd_dims: 5120, decoder layers: 40, decoder heads: 40, decoder kv heads: 40
qkv_format 1 is not compatible with tensor parallelism
Failed to load the model
Failed to initialize the inference engine
Memory usage (MB): 203.21, 203.21 (Peak)
Press the enter key to quit...

Get broken output text for long input prompt

I get the following output when the input is 700+ tokens; the same input yields the complete output in vLLM.

{
    "ret_code": "succ",
    "time_cost": 1.03,
    "text": "[{\""
}

Inferflow service log for this request:

query_id: 13, output_len: 729, is_end: true

Which parameter should I modify?

Fails to build on Windows 11 with Visual Studio 2022 Community

Step 1) Clone the repo
[screenshot]

Step 2) Open inferflow.sln with Visual Studio 2022
[screenshot]

Step 3) Visual Studio asks to retarget; accept the defaults and click OK
[screenshot]

Step 4) Switch to the release configuration
[screenshot]

Step 5) Build the solution
[screenshot]

Step 6) Notice the errors; the build fails
[screenshot]

Step 7) Notice the build folder is created and some items get built, but most of the build has failed
[screenshots]

The projects won't load; if I try to reload them:
[screenshots]

I get the following error:

The imported project "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.2.props" was not found. Confirm that the expression in the Import declaration "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\\BuildCustomizations\CUDA 12.2.props" is correct, and that the file exists on disk. C:\Users\michael\Documents\GitHub\inferflow\build\vs_projects\inferflow\inferflow_common.vcxproj

If I navigate to that folder, the file it's looking for does not exist.
[screenshot]

wget error while loading shared libraries cannot open shared object file no such file or directory

Steps to reproduce:

  1. Clone git repository
  2. Open visual studio and build all
  3. Ignore the thousands of warnings and download Windows SDK
  4. Delete and re-clone the repository
  5. Rebuild the solution
  6. Ignore the thousands of warnings because Windows SDK is already installed
  7. Right click solution and retarget
  8. Rebuild the solution
  9. Ignore the warnings
  10. Read the instructions
  11. cd into the llama2.c directory using Windows Explorer
  12. Right click and open a command prompt there
  13. Type in "bash download.sh" and press enter
  14. Hit your head against the wall because wget is not installed (wtf is wget?)
  15. Search the internet for wget
  16. Find wget on a pre-World War Two era website
  17. Download it
  18. Oh wait, it's a Linux source file without an .exe, happy days
  19. Download wget.exe from a pirate-infested website
  20. Start again -> wget.exe: error while loading shared libraries: ?: cannot open shared object file: No such file or directory

Did I miss anything?
