llm-corpus

Build a large language model knowledge base from scratch, step by step

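This project implements, from scratch, the full pipeline for connecting a large language model to an external knowledge base:

  1. Chinese dataset processing
  2. Word-vector model training
  3. Document vectorization
  4. Storing the knowledge base in a vector database
  5. Local deployment of the ChatGLM2-6B model
  6. A simple knowledge-base application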
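Steps 3 to 6 rest on one idea. The question is embedded with the same model as the documents. The stored vectors most similar to it are then fetched as context for the chat model.

The sketch below shows that retrieval step using plain cosine similarity over an in-memory dict. The `store` dict, the toy 3-dimensional vectors and the function names are hypothetical. They stand in for the Qdrant collection and the real embeddings, and are not code from this project.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query vector.

    `store` is a hypothetical in-memory stand-in for the Qdrant collection.
    """
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from word2vec or OpenAI.
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.1, 0.0], store, k=2)])  # ['doc_a', 'doc_b']
```

A brute-force scan like this is fine for a handful of documents. A vector database such as Qdrant exists to keep this search fast as the collection grows.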

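Project structure

  • corpus: folder containing the knowledge-base documents
  • data: data for training the word-vector model (the model files are large, so please download them yourself)
  • doc: source code and documentation for training the word-vector model
  • llm_server: a simple knowledge-base application
  • vector_db: stores the documents in corpus in a Qdrant vector database
  • config.json: project configuration
    • OPENAI_API_KEY: your OpenAI API key
    • EMBEDDING_MODEL_TYPE: the text-embedding model, either openai or word2vec
    • CHAT_MODEL_TYPE: the chat model, either openai or chatglm
    • CHATGLM_PORT: the port of the locally deployed ChatGLM
    • **PATH: various paths, relative to the project root
    • COLLECTION_NAME: the name of the vector database collection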
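Below is a minimal sketch of how the config.json keys above might be loaded and checked. It makes three assumptions:

- config.json sits at the project root.
- Every key whose name ends in `PATH` holds a path relative to that root.
- The example uses a hypothetical `CORPUS_PATH` key.

The project's own loading code may differ.

```python
import json
from pathlib import Path

# Allowed values, as listed in the config.json description.
EMBEDDING_TYPES = {"openai", "word2vec"}
CHAT_TYPES = {"openai", "chatglm"}


def resolve_config(cfg: dict, root: Path) -> dict:
    """Validate the model-type keys and turn *PATH keys into paths under root.

    Assumption: any key whose name ends in "PATH" holds a path relative
    to the project root.
    """
    if cfg.get("EMBEDDING_MODEL_TYPE") not in EMBEDDING_TYPES:
        raise ValueError(f"EMBEDDING_MODEL_TYPE must be one of {sorted(EMBEDDING_TYPES)}")
    if cfg.get("CHAT_MODEL_TYPE") not in CHAT_TYPES:
        raise ValueError(f"CHAT_MODEL_TYPE must be one of {sorted(CHAT_TYPES)}")
    return {
        key: root / value if key.endswith("PATH") else value
        for key, value in cfg.items()
    }


def load_config(config_path: Path) -> dict:
    """Read config.json, assumed to live at the project root."""
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    return resolve_config(cfg, config_path.parent)


# Example with hypothetical values; CORPUS_PATH is an illustrative key name.
cfg = resolve_config(
    {"EMBEDDING_MODEL_TYPE": "word2vec", "CHAT_MODEL_TYPE": "chatglm",
     "CHATGLM_PORT": 8000, "CORPUS_PATH": "corpus", "COLLECTION_NAME": "kb"},
    Path("/project"),
)
print(cfg["CORPUS_PATH"])
```

The model types are validated up front, so a typo such as `word2vect` fails at startup rather than deep inside the pipeline.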

Running

Building the persistent knowledge base

cd vector_db
pip install -r requirements.txt
python main.py

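main.py automatically creates a vector database collection named by the COLLECTION_NAME setting, then vectorizes the documents in the corpus folder and stores them in that collection.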
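When EMBEDDING_MODEL_TYPE is word2vec, a common way to get one vector per document is to average the word vectors of its tokens. The sketch below does that and builds id/vector/payload records. We have not checked that vector_db/main.py works this way.

The sketch also assumes:

- Tokens arrive already segmented. Chinese text needs a word segmenter first.
- The toy 2-dimensional `word_vectors` dict and the function names are hypothetical.

```python
import math


def doc_vector(tokens: list[str], word_vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Average the vectors of in-vocabulary tokens, then L2-normalize the mean.

    Out-of-vocabulary tokens are skipped; a document with no known tokens
    gets a zero vector.
    """
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # unknown word: skip it
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    if count == 0:
        return total
    mean = [x / count for x in total]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def to_points(docs: list[tuple[str, list[str]]], word_vectors: dict[str, list[float]], dim: int) -> list[dict]:
    """Build records with id, vector and payload fields, one per (text, tokens) pair."""
    return [
        {"id": i, "vector": doc_vector(tokens, word_vectors, dim), "payload": {"text": text}}
        for i, (text, tokens) in enumerate(docs)
    ]


word_vectors = {"知识": [1.0, 0.0], "库": [0.0, 1.0]}  # hypothetical toy vectors
docs = [("知识库", ["知识", "库"]), ("未知", ["未知"])]
points = to_points(docs, word_vectors, dim=2)
print(points[1]["vector"])  # [0.0, 0.0]: no known tokens
```

With the qdrant-client package, records of this shape map onto the `id`, `vector` and `payload` fields of the points passed to the client's `upsert` method.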

Running the application

cd llm_server
pip install -r requirements.txt
python main.py
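A knowledge-base application's core job is to combine the retrieved passages with the user's question before sending them to the chat model. A sketch of that prompt assembly follows.

The template wording, the `max_chars` budget and `build_prompt` itself are hypothetical. They are not taken from llm_server.

```python
def build_prompt(question: str, passages: list[str], max_chars: int = 1000) -> str:
    """Assemble a retrieval-augmented prompt (hypothetical template).

    Passages are assumed to be sorted by relevance; they are added in order
    until the character budget would be exceeded, so the best matches survive.
    """
    kept = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break
        kept.append(passage)
        used += len(passage)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(kept, start=1))
    return (
        "Answer the question using only the known information below. "
        "If it is not sufficient, say that you do not know.\n\n"
        f"Known information:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("What is Qdrant?", ["Qdrant is a vector database."]))
```

Capping the context matters because ChatGLM2-6B, like any LLM, has a finite context window.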

Running the locally deployed ChatGLM2-6B

See the official ChatGLM2-6B documentation.
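ChatGLM2-6B can be served with the api.py script from its official repository. To our knowledge, that script accepts a JSON POST with `prompt` and `history` fields and replies with `response` and `history`. If the model is served that way, a standard-library client could look like the sketch below.

Some caveats:

- The port would come from CHATGLM_PORT.
- The host `127.0.0.1` assumes the model runs on the same machine.
- The function names are hypothetical.
- We have not confirmed that llm_server talks to ChatGLM this way.

```python
import json
import urllib.request


def build_request(prompt: str, history: list, port: int) -> urllib.request.Request:
    """POST request in the JSON shape assumed for ChatGLM2-6B's api.py."""
    body = json.dumps({"prompt": prompt, "history": history}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/",  # assumes a local deployment
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_reply(raw: bytes) -> tuple[str, list]:
    """Extract the answer and updated history from the JSON reply."""
    data = json.loads(raw.decode("utf-8"))
    return data["response"], data.get("history", [])


def chat(prompt: str, history: list, port: int, timeout: float = 120.0) -> tuple[str, list]:
    """Send one turn to the local server and return (answer, history)."""
    req = build_request(prompt, history, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_reply(resp.read())
```

Passing the returned history back into the next `chat` call keeps a multi-turn conversation going.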

Tutorial

Contributors

oceanpresentchao
