txtai: AI-powered search engine
txtai builds an AI-powered index over sections of text. These indices support similarity search and can serve as the basis for extractive question-answering systems.
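To make the idea of a similarity search over text sections concrete, here is a minimal sketch that uses only the Python standard library. It is not txtai's API or implementation: txtai relies on transformer-based embeddings and faiss, while this toy version assumes plain term-count vectors and a linear scan. The function names (`vectorize`, `cosine`, `build_index`, `search`) and the sample sentences are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(sections):
    """Index each section as (position, vector)."""
    return [(i, vectorize(text)) for i, text in enumerate(sections)]


def search(index, query, limit=1):
    """Return up to `limit` (position, score) pairs, best match first."""
    query_vector = vectorize(query)
    scored = [(i, cosine(query_vector, vector)) for i, vector in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Made-up sample sections, for illustration only
sections = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed",
    "Maine man wins $1M from $25 lottery ticket",
]

index = build_index(sections)
best_id, best_score = search(index, "ice shelf collapsed")[0]
print(sections[best_id], round(best_score, 3))
```

Term counts only match on shared words, so a query such as "climate change" would score zero against every section here. Learned sentence embeddings are meant to close exactly that gap by matching on meaning instead of exact tokens.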
NeuML uses txtai and/or the concepts behind it to power all of our Natural Language Processing (NLP) applications. Example applications:
- cord19q - COVID-19 literature analysis
- paperai - AI-powered literature discovery and review engine for medical/scientific papers
- neuspo - a fact-driven, real-time sports event and news site
- codequestion - Ask coding questions directly from the terminal
txtai is built on the following stack:
- sentence-transformers
- transformers
- faiss
- Python 3.6+
Installation
The easiest way to install is via pip and PyPI:
pip install txtai
You can also install txtai directly from GitHub. Using a Python virtual environment is recommended.
pip install git+https://github.com/neuml/txtai
Python 3.6+ is supported.
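A quick way to confirm that an interpreter meets the stated requirement before installing is sketched below. The helper name `is_supported` and the `MINIMUM` constant are hypothetical, not part of txtai. The message uses `%` formatting rather than f-strings so that the script still parses on interpreters older than 3.6.

```python
import sys

# Minimum version stated in this README
MINIMUM = (3, 6)


def is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if is_supported():
    print("Python version OK")
else:
    print("Python %d.%d+ is required, found %d.%d"
          % (MINIMUM + tuple(sys.version_info[:2])))
```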
Notes for Windows
This project has dependencies that require compiling native code. Linux environments usually work without issue. Windows requires the following extra steps.
- Install C++ Build Tools - https://visualstudio.microsoft.com/visual-cpp-build-tools/
- If PyTorch errors are encountered, run the following command before installing txtai. See pytorch.org for more information.
pip install torch===1.6.0 torchvision===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
Examples
The examples directory has a series of examples and notebooks giving an overview of txtai. See the list of notebooks below.
Notebooks
Notebook | Description
---|---
Introducing txtai | Overview of the functionality provided by txtai
Extractive QA with txtai | Extractive question-answering with txtai
Build an Embeddings index from a data source | Embeddings index from a data source backed by word embeddings
Extractive QA with Elasticsearch | Extractive question-answering with Elasticsearch