This project forked from run-llama/llama_index

LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.

Home Page: https://gpt-index.readthedocs.io/en/latest/

License: MIT License

llama_index's Introduction

🗂️ LlamaIndex 🦙

PyPI: https://pypi.org/project/llama-index/.

Documentation: https://gpt-index.readthedocs.io/.

Twitter: https://twitter.com/gpt_index.

Discord: https://discord.gg/dGcwcsnxhU.

Ecosystem

🚀 Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
  • How do we best augment LLMs with our own private data?
  • One paradigm that has emerged is in-context learning (the other is finetuning), where we insert context into the input prompt. That way, we take advantage of the LLM's reasoning capabilities to generate a response.
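The in-context learning idea above can be sketched in a few lines of plain Python: the private context is spliced into the prompt text before it is sent to the model. This is a minimal illustration under stated assumptions, not LlamaIndex's own prompt: the `QA_TEMPLATE` wording and the `build_prompt` helper are hypothetical, and no LLM is actually called.

```python
from typing import List

# Hypothetical prompt template for illustration; it is NOT LlamaIndex's built-in template.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information, answer the question: {question}\n"
)


def build_prompt(context_chunks: List[str], question: str) -> str:
    """In-context learning: splice private context into the input prompt."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return QA_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    chunks = ["Our office is closed on Fridays.", "Support tickets are answered within two days."]
    # The resulting string is what would be sent to the LLM.
    print(build_prompt(chunks, "Is the office open on Friday?"))
```

The model never sees the private documents during training; it only reasons over whatever text lands in `{context}` at query time.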

To augment LLMs with data in a performant, efficient, and cheap manner, we need to solve two problems:

  • Data Ingestion
  • Data Indexing
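To make the two problems concrete, here is a toy, standard-library-only sketch: "ingestion" reads `.txt` files from a directory into documents, and "indexing" builds an inverted keyword index over them. The helper names (`ingest`, `tokenize`, `build_keyword_index`, `lookup`) are hypothetical and this is not how LlamaIndex's connectors or indices are implemented; it only illustrates what each step is responsible for.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set


def ingest(data_dir: str) -> Dict[str, str]:
    """Data ingestion: read every .txt file under data_dir into {doc_id: text}."""
    root = Path(data_dir)
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.txt"))
    }


def tokenize(text: str) -> List[str]:
    """Lowercased alphanumeric runs; a crude stand-in for real tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Data indexing: map each keyword to the set of doc ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the doc ids that contain every keyword in the query."""
    hits = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*hits) if hits else set()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        Path(data_dir, "notes.txt").write_text("LlamaIndex connects LLMs to data.", encoding="utf-8")
        Path(data_dir, "faq.txt").write_text("LLMs can reason over private data.", encoding="utf-8")
        index = build_keyword_index(ingest(data_dir))
        print(sorted(lookup(index, "LLMs data")))  # both files mention both words
```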

Proposed Solution

That's where LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:

  • Offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
  • Provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning:
    • Storing context in an easy-to-access format for prompt insertion.
    • Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
    • Dealing with text splitting.
  • Provides users an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented output.
  • Offers a comprehensive toolset for trading off cost and performance.
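The "prompt limitations" and "text splitting" points above both come down to cutting documents into pieces small enough to fit in a model's context window. Below is a minimal sliding-window splitter, assuming whitespace-separated words as a stand-in for model tokens (a real setup would count tokens with a tokenizer such as tiktoken). `split_text` and its default sizes are hypothetical and are not LlamaIndex's text splitter API.

```python
from typing import List


def split_text(text: str, chunk_size: int = 256, chunk_overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size whitespace-delimited words.

    Consecutive chunks share chunk_overlap words, so text near a boundary keeps
    some surrounding context. Word counts only approximate model token counts.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    for chunk in split_text(text, chunk_size=4, chunk_overlap=1):
        print(chunk)  # w0 w1 w2 w3 / w3 w4 w5 w6 / w6 w7 w8 w9
```

In practice the chunk size would be chosen well below the model's limit (e.g. the 4096 tokens mentioned for Davinci) so that the prompt template, the question, and the generated answer still fit.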

💡 Contributing

Interested in contributing? See our Contribution Guide for more details.

📄 Documentation

Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

💻 Example Usage

pip install llama-index

Examples are in the examples folder. Indices are in the indices folder.

To build a simple vector store index:

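import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"  # replace with your own key

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# load every file in the ./data directory as a Document
documents = SimpleDirectoryReader('data').load_data()
# chunk and embed the documents into an in-memory vector index
index = GPTVectorStoreIndex.from_documents(documents)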

To query:

query_engine = index.as_query_engine()
response = query_engine.query("<question_text>?")
print(response)
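Conceptually, querying a vector store index first retrieves the stored chunks most similar to the question and then hands them to the LLM as context. The sketch below shows only that retrieval step, with a toy bag-of-words "embedding" and cosine similarity standing in for a real embedding model; `embed`, `cosine`, and `top_k` are hypothetical helpers and do not mirror LlamaIndex internals.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (a real index would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Retrieve the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        "Llamas live in the Andes.",
        "A vector index stores one embedding per chunk.",
        "The Andes are a mountain range.",
    ]
    for score, chunk in top_k("Where do llamas live?", chunks):
        print(f"{score:.3f}  {chunk}")
```

The retrieved chunks would then be inserted into the prompt, as in the in-context learning step described in the overview.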

By default, data is stored in memory. To persist it to disk (under ./storage):

index.storage_context.persist()

To reload from disk:

from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='./storage')
# load index
index = load_index_from_storage(storage_context)
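The persist and reload steps can be combined into a "load if already persisted, otherwise build and persist" step, so the expensive indexing runs only once. The sketch below shows that pattern with the standard library alone; the `persist`, `load`, and `load_or_build` helpers, the toy keyword index, and the `index.json` file name are hypothetical and do not reflect LlamaIndex's actual storage format. With LlamaIndex itself, the two branches would correspond to `load_index_from_storage(storage_context)` and `GPTVectorStoreIndex.from_documents(documents)` followed by `index.storage_context.persist()`.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# keyword -> list of doc ids; lists (not sets) so the structure is JSON-serializable
IndexData = Dict[str, List[str]]


def persist(index: IndexData, persist_dir: str = "./storage") -> None:
    """Write the index to <persist_dir>/index.json, creating the directory if needed."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "index.json").write_text(json.dumps(index, sort_keys=True), encoding="utf-8")


def load(persist_dir: str = "./storage") -> IndexData:
    """Rebuild the index from <persist_dir>/index.json."""
    return json.loads((Path(persist_dir) / "index.json").read_text(encoding="utf-8"))


def load_or_build(build: Callable[[], IndexData], persist_dir: str = "./storage") -> IndexData:
    """Reuse a persisted index when present; otherwise build it once and persist it."""
    if (Path(persist_dir) / "index.json").exists():
        return load(persist_dir)
    index = build()
    persist(index, persist_dir)
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage:
        first = load_or_build(lambda: {"llms": ["a.txt", "b.txt"]}, storage)
        # The second call finds index.json and skips the (possibly expensive) build.
        second = load_or_build(lambda: {}, storage)
        print(first == second)  # True
```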

🔧 Dependencies

The main third-party package requirements are tiktoken, openai, and langchain.

All requirements should be contained within the setup.py file. To run the package locally without building the wheel, simply run pip install -r requirements.txt.

📖 Citation

Reference to cite if you use LlamaIndex in a paper:

@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/llama_index},
year = {2022}
}
