
gottingen / search-legend


docs for search system and ai infra

License: Apache License 2.0


search-legend's Introduction

search legend

Chinese | English | Release Notes | Contributors

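End-to-end search

In the past, when a company built a search system, it weighed technical difficulty, completeness of documentation, community support, and staffing, and most teams chose Elasticsearch (ES) as the foundation and kept iterating on top of it. ES is undeniably an excellent product and plays an important role in many businesses. However, ES has inherent limits. On the engineering side, ES can support data volumes on the order of 1 billion; beyond that, operations and user experience suffer badly. On the business side, techniques such as jieba for word segmentation and BM25 for coarse ranking no longer deliver the search quality the business requires now that neural networks (NN) are mature. Continuing to invest in ES plugins is not worth the cost either. Overall, a first-version engine built on ES can reach a score of about 60 out of 100. Raising that score to 80, from an ROI perspective, costs no less than building a system from scratch.

The goal of this project is to build an end-to-end search framework that supports data volumes on the order of 10 billion and systematically reaches the 80-point level.

The product is meant to work out of the box. Initially it integrates modules for NN ranking, inference, vector recall, feature engineering, and so on. On top of a unified NN interface, users can improve search quality at low cost by iterating on algorithms.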

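IR technology is another highlight of the system (it is inspired by AI compilers). It further decouples engineering from algorithms: users write business logic in Python, the framework automatically generates the corresponding C++ code, and the logic runs as a C++ program. This improves program performance and shortens the product iteration cycle by skipping intermediate steps such as review, C++ development, and C++ testing. The main contribution of this project is a high-level abstraction of the components in each stage of search: the engineering side focuses on developing high-performance components and operators, while the algorithm side concentrates on business results, which improves the quality and efficiency of communication.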
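To make the Python-to-C++ idea concrete, here is a minimal sketch built only on Python's standard `ast` module. It is not the project's code generator: the function names `expr_to_cpp` and `function_to_cpp`, the `blend` example, and the restriction to a single arithmetic `return` are all assumptions chosen for illustration.

```python
import ast

# Binary operators this toy translator understands.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def expr_to_cpp(node: ast.expr) -> str:
    """Translate a small arithmetic expression tree into C++ source text."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = expr_to_cpp(node.left)
        right = expr_to_cpp(node.right)
        return f"({left} {_OPS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(float(node.value))
    raise NotImplementedError(f"unsupported syntax: {ast.dump(node)}")


def function_to_cpp(source: str) -> str:
    """Emit a C++ function for a Python function made of a single return."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    stmt = func.body[0]
    if len(func.body) != 1 or not isinstance(stmt, ast.Return) or stmt.value is None:
        raise NotImplementedError("only a single return expression is supported")
    # Every parameter is treated as a double (an assumption of this sketch).
    params = ", ".join(f"double {arg.arg}" for arg in func.args.args)
    return f"double {func.name}({params}) {{ return {expr_to_cpp(stmt.value)}; }}"


# Hypothetical business logic written by the algorithm side.
PY_SOURCE = """
def blend(bm25, ctr):
    return 0.7 * bm25 + 0.3 * ctr
"""

cpp_source = function_to_cpp(PY_SOURCE)
print(cpp_source)
```

A real framework would lower the Python code to an intermediate representation and handle types, control flow, and library calls; the sketch only shows the basic shape of walking a syntax tree and emitting C++ text.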

Authors

Related subprojects

flowchart TB
    A[turbo]-->B[libtext]
    A[turbo]-->BB[titan-redis]
    A[turbo]-->BC[titan-ann]
    BB-->C[lambda]
    BC-->C[lambda]
    A[turbo]-->E[rigel]
    E[rigel]-->D[hercules]
    D[hercules]-->F[hvm]
    C[lambda]-->F[hvm]
    B[libtext]-->F[hvm]
    E[rigel]-->G[melon-rpc]
    G[melon-rpc]-->F[hvm]
    
  • turbo: C++ foundation library.

  • libtext: Chinese segmentation and normalization library.

  • titan-redis: disk storage exposed through the Redis interface.

  • titan-ann: graph-based ANN search engine.

  • lambda: host-local search engine for mixed term and vector search.

  • rigel: AI and RPC runtime library that manages memory and fiber or thread scheduling.

  • hercules: Python AOT framework for integrating component interfaces, RPC, and AI backends such as TensorFlow or PyTorch.

  • hvm: (not started yet) framework gateway that lets us write Python, generate C++ code, and run it in LLVM.

  • tensorflow build: build and installation scripts for TensorFlow C++.
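The subproject flowchart above can be read as a dependency graph. The sketch below assumes that each arrow points from a dependency to the project that builds on it (for example, turbo to libtext), which the README does not state explicitly, and uses the standard library's `graphlib` (Python 3.9+) to derive one valid build order.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of projects it is assumed to depend on,
# transcribed from the arrows in the flowchart.
DEPENDENCIES = {
    "turbo": set(),
    "libtext": {"turbo"},
    "titan-redis": {"turbo"},
    "titan-ann": {"turbo"},
    "rigel": {"turbo"},
    "lambda": {"titan-redis", "titan-ann"},
    "hercules": {"rigel"},
    "melon-rpc": {"rigel"},
    "hvm": {"hercules", "lambda", "libtext", "melon-rpc"},
}

# static_order() yields every project after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(build_order)
```

Under this reading, turbo always comes first and hvm last; the relative order of the projects in between can vary between valid orderings.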

the goals

flowchart LR
    A[python code]-->B[hvm.compile]
    B-->C[so library and graph]
    G[ci/cd]-->C[so library and graph]
    D[c++ runtime]-.->G
    D[c++ runtime]-->E[c++ app]
    F[service mesh]-->E[c++ app]

This reduces the C++ communication and development phases during business development. It may shorten business development considerably and aims to produce bug-free code.

However, in a system with poor modularity, you need to do a lot of work in testing and deployment. That is very frustrating, and it gets messy when many people develop together.

flowchart LR
  A[python algorithm]-->B[c++ review]-->C[c++ develop]-->D[test and online]
  AA[python algorithm]-->E[generate c++]-->D[test and online]

Inspired by AI AOT compilation, the design is to write a small amount of Python code like the following:

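from typing import Any, List

import hvm

q: str = '刘德华'

def do_search(q: str) -> List[Any]:
  # parse the raw query string
  qp = hvm.QU.parse(q)
  # build a boolean query whose term sub-query must match the title field
  qt = hvm.query.boolen_query
  sq = hvm.query.term_query(qp)
  sq.must('title')
  qt.add_sub_query(sq)
  # configure relevance scoring, the ranking model, and the two sort stages
  searcher = hvm.search
  searcher.set_relavece('bm25')
  searcher.set_ranker('dw_fm')
  searcher.set_l1_sort('default')
  searcher.set_l2_sort('ctr_pri_sorter')
  # `engine` is not defined in this sketch; it is assumed to come from the host application
  result = searcher.search(engine)
  return result

# compile the function into an artifact that the C++ runtime can load
hvm.compile(do_search, "./search_demo")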

On the C++ side:

hvm::Session session;
// load the artifact produced by hvm.compile
bool rs = session.load("./search_demo");
if (!rs) {
    return false;
}
std::string query;
session.run(query);

So the work runs as C++ code and you get the benefits of C++, while the difficult C++ code and the complex service governance logic stay hidden behind the framework.
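The Python sketch above configures a staged search: BM25 relevance, then an L1 sort and an L2 sort. Since hvm has not been started yet, the following pure-Python toy only illustrates that staging and is not the project's implementation. The corpus, the CTR prior values, and the idea that the L2 stage re-sorts by a click-through prior (a guess at what `ctr_pri_sorter` means) are all assumptions; the `dw_fm` ranker is not modeled.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Toy corpus: doc id -> (title tokens, hypothetical click-through prior).
DOCS: Dict[str, Tuple[List[str], float]] = {
    "d1": (["andy", "lau", "concert"], 0.10),
    "d2": (["andy", "lau", "movie", "andy"], 0.30),
    "d3": (["jazz", "concert"], 0.90),
}


def bm25_scores(query: List[str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document containing at least one query term with BM25."""
    n_docs = len(DOCS)
    avg_len = sum(len(tokens) for tokens, _ in DOCS.values()) / n_docs
    doc_freq = Counter(term for tokens, _ in DOCS.values() for term in set(tokens))
    scores: Dict[str, float] = {}
    for doc_id, (tokens, _) in DOCS.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return scores


def search(query: List[str], l1_keep: int = 2) -> List[str]:
    """Recall and score with BM25, keep the top l1_keep, then re-sort by CTR prior."""
    scores = bm25_scores(query)
    l1 = sorted(scores, key=scores.get, reverse=True)[:l1_keep]  # coarse (L1) sort
    return sorted(l1, key=lambda d: DOCS[d][1], reverse=True)    # fine (L2) sort


print(search(["andy", "lau"]))
```

Keeping the cheap BM25 stage in front means the more expensive or business-specific sort only sees a short candidate list, which is the usual reason for splitting ranking into L1 and L2 stages.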

nlp roadmap

Table of contents

search-legend's People

Contributors

lilothar
