redis-developer / redis-arxiv-search

Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.

Home Page: https://docsearch.redisvl.com

License: BSD 3-Clause "New" or "Revised" License

Languages: Dockerfile 1.06%, Python 29.74%, Shell 0.14%, Jupyter Notebook 29.04%, HTML 2.50%, TypeScript 36.59%, JavaScript 0.94%
Topics: arxiv, arxiv-papers, document-retrieval, document-search, huggingface, machine-learning, nlp, redis, vector-search, openai

redis-arxiv-search's Introduction

🔎 Redis arXiv Search

This repository is the official codebase for the arXiv paper search app hosted at https://docsearch.redisvl.com

Redis is a highly performant, production-ready vector database, which can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in the technical blog post published by our partners, Data Science Dojo.

Dataset

The arXiv papers dataset was sourced from Kaggle. arXiv is widely used for scientific research across many fields. Exposing a semantic search layer lets users discover relevant papers with natural-language queries.

Application

This app was built as a Single Page Application (SPA). As the repository description notes, the stack includes RedisVL, FastAPI, React, and Redis.

Some inspiration was taken from a Cookiecutter project, adapted to serve a single SPA rather than run a separate front-end server.

Embedding Providers

Embeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports HuggingFace, OpenAI, and Cohere embeddings out of the box.

Provider      Embedding Model                            Required?
HuggingFace   sentence-transformers/all-mpnet-base-v2    Yes
OpenAI        text-embedding-ada-002                     Yes
Cohere        embed-multilingual-v3.0                    Yes
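
To illustrate how embeddings enable vector similarity search, here is a minimal sketch that ranks documents by cosine similarity. The toy three-dimensional vectors, the paper names, and both helper functions are illustrative assumptions, not code from this repository; real embeddings from the models above have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real provider would compute these from paper text.
TOY_PAPERS = {
    "paper_on_transformers": [0.9, 0.1, 0.0],
    "paper_on_graph_theory": [0.1, 0.9, 0.2],
    "paper_on_attention": [0.8, 0.2, 0.1],
}

query = [1.0, 0.0, 0.0]  # stand-in for an embedded user query
ranking = rank_by_similarity(query, TOY_PAPERS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In the deployed app this ranking is performed by Redis rather than in application code; the sketch only shows the underlying idea of comparing a query vector against stored document vectors.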

Interested in a different embedding provider? Feel free to open a PR and make a suggested addition.

Want to use a different model than the one listed? Set the following environment variables in your .env file (see below) to change:

  • SENTENCE_TRANSFORMER_MODEL
  • OPENAI_EMBEDDING_MODEL
  • COHERE_EMBEDDING_MODEL
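
As a rough sketch of how such overrides might be resolved, the snippet below reads the three variables and falls back to the models listed in the provider table. The `resolve_models` helper and the fallback behavior are assumptions made for illustration; the source does not show how the application itself reads these variables.

```python
import os

# Defaults mirror the models listed in the provider table.
DEFAULT_MODELS = {
    "SENTENCE_TRANSFORMER_MODEL": "sentence-transformers/all-mpnet-base-v2",
    "OPENAI_EMBEDDING_MODEL": "text-embedding-ada-002",
    "COHERE_EMBEDDING_MODEL": "embed-multilingual-v3.0",
}


def resolve_models(env=None):
    """Return the model name per variable, preferring values set in env.

    `env` defaults to the process environment; unset or empty variables
    fall back to the defaults above (hypothetical helper).
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULT_MODELS.items()}


# Example: override only the OpenAI model, keep the other defaults.
models = resolve_models({"OPENAI_EMBEDDING_MODEL": "text-embedding-3-small"})
print(models)
```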

🚀 Running the App

  1. Before running the app, install Docker Desktop.
  2. Clone (and optionally fork) this GitHub repo to your machine.
    $ git clone https://github.com/RedisVentures/redis-arXiv-search.git
  3. Make a copy of the .env.template file:
    $ cd redis-arXiv-search/
    $ cp .env.template .env
  4. Decide which Redis deployment you plan to use, then follow one of the methods below:
    • Redis Stack runs Redis as a local docker container.
    • Redis Cloud will manage a Redis database on your behalf in the cloud.

Redis Stack Docker (Local)

Using Redis Stack locally doesn't require any additional steps. However, it will consume more resources on your machine and have performance limitations.

Use the provided docker-compose file for running the application locally:

$ docker compose -f docker-local-redis.yml up

Redis Cloud

  1. Get a FREE Redis Cloud Database. Make sure to include the Search module.

  2. Add the REDIS_HOST, REDIS_PASSWORD, and REDIS_PORT environment variables to your .env file.

  3. Run the App:

    $ docker compose -f docker-cloud-redis.yml up
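
For illustration, the three variables from step 2 could be combined into a connection URL along the following lines. The `build_redis_url` helper, the `localhost` and `6379` fallbacks, and the plain `redis://` scheme are assumptions for this sketch; the source does not show how the application builds its connection, and the hostname and password in the example are made up.

```python
import os
from urllib.parse import quote


def build_redis_url(env=None):
    """Build a redis:// URL from REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD.

    Hypothetical helper: missing values fall back to a local, password-less
    Redis on the standard port.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "localhost")
    port = int(env.get("REDIS_PORT", "6379"))
    password = env.get("REDIS_PASSWORD", "")
    # Percent-encode the password so characters like "@" do not break the URL.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}"


# Example values are invented for demonstration only.
example_env = {
    "REDIS_HOST": "example.com",
    "REDIS_PORT": "12345",
    "REDIS_PASSWORD": "p@ss",
}
print(build_redis_url(example_env))
```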

Customizing (optional)

  • You can use the provided Jupyter Notebook in the data/ directory to create paper embeddings and metadata. The output JSON files are written to the data/ directory and are used when you build your own container.
  • Use the ./build.sh script to build your own docker image based on the application source code and dataset changes.
  • If you want to use K8s instead of Docker Compose, we have some resources to help you get started.

React Dev Environment

It's typically easier to build the front end in an interactive environment, where you can test changes in real time.

  1. Deploy the app using the steps above.
  2. Install packages (you may need to use npm to install yarn)
    $ cd frontend/
    $ yarn install --no-optional
  3. Use yarn to serve the application from your machine
    $ yarn start
  4. Navigate to http://localhost:3000 in a browser.

All changes to your front-end code will be reflected in the browser in near real time.

Troubleshooting

Every once in a while, you may need to clear out cached Docker artifacts. Run docker system prune, restart Docker Desktop, and try again.

This project is maintained by Redis on a good faith basis. Please open an issue on GitHub if you run into problems, and we will try to respond.

redis-arxiv-search's People

Contributors

tylerhutcherson

redis-arxiv-search's Issues

Human readable category names

Instead of using the arXiv category labels, translate those to human readable names, on the front end (in the filter dropdown).
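
One way to approach this is a lookup table from arXiv labels to display names, with the raw label as a fallback. The sketch below is written in Python for illustration only, since the issue targets the front end; the `CATEGORY_NAMES` table covers just a few categories and the `display_name` helper is hypothetical.

```python
# Partial mapping of arXiv category labels to their human-readable names.
CATEGORY_NAMES = {
    "cs.AI": "Artificial Intelligence",
    "cs.CL": "Computation and Language",
    "cs.LG": "Machine Learning",
    "math.CO": "Combinatorics",
}


def display_name(label):
    """Return the readable name for a label, or the label itself if unknown."""
    return CATEGORY_NAMES.get(label, label)


for label in ["cs.AI", "math.CO", "unknown.XX"]:
    print(f"{label} -> {display_name(label)}")
```

Falling back to the raw label keeps the dropdown usable for categories that have not been added to the table yet.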

Mount datasets as a volume

We need to reduce the size of the docker image by mounting the dataset as a volume. This is also just the right way to do it and will enable easier customization moving forward.

Add category filter

Filter the results by category of the paper. If the paper belongs to a specific category then it can be included in the search.
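
A minimal sketch of the requested behavior, assuming each result is a dictionary with a `categories` list. The field names, sample records, and `filter_by_category` helper are hypothetical; in the app itself the filter would more likely be applied as part of the Redis query than in application code.

```python
def filter_by_category(papers, category):
    """Keep only the papers whose category list contains `category`."""
    return [paper for paper in papers if category in paper.get("categories", [])]


# Invented sample results for demonstration.
SAMPLE_RESULTS = [
    {"title": "Paper A", "categories": ["cs.AI", "cs.LG"]},
    {"title": "Paper B", "categories": ["math.CO"]},
    {"title": "Paper C", "categories": ["cs.LG"]},
]

machine_learning = filter_by_category(SAMPLE_RESULTS, "cs.LG")
print([paper["title"] for paper in machine_learning])
```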

2023 papers

Is it possible to update your store to include 2023 papers?
