
wikipedia-search-engine-web-app's Introduction

Wikipedia Information Retrieval System Project

A simple information retrieval system for Wikipedia articles with AI-powered support.

Introduction

This project is a simple information retrieval system for Wikipedia articles, written in Python 3.10. It indexes the articles and, at search time, ranks them by their cosine similarity to the query. It can handle multiple queries at once and can be used through a Streamlit web app.
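To make the ranking step concrete, here is a minimal sketch of cosine-similarity search over a tiny in-memory corpus. It is not the project's actual code: the TF-IDF weighting, the whitespace tokenizer, and the names `tokenize`, `build_idf`, `vectorize`, `cosine_similarity`, and `search` are all assumptions made for illustration. The `debug` flag mimics the debug mode by printing each query with its scores.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Compute a smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (a dict), ignoring terms the corpus has never seen."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(queries, articles, top_k=3, debug=False):
    """Rank articles (title -> text) for each query; returns query -> [(title, score), ...]."""
    titles = list(articles)
    docs_tokens = [tokenize(articles[title]) for title in titles]
    idf = build_idf(docs_tokens)
    doc_vectors = [vectorize(tokens, idf) for tokens in docs_tokens]

    results = {}
    for query in queries:
        query_vector = vectorize(tokenize(query), idf)
        scored = [
            (title, cosine_similarity(query_vector, vector))
            for title, vector in zip(titles, doc_vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[query] = scored[:top_k]
        if debug:
            print(f"query={query!r} results={results[query]}")
    return results


if __name__ == "__main__":
    # Hypothetical three-article corpus, for illustration only.
    corpus = {
        "Python": "python is a programming language",
        "Snake": "a snake is a reptile",
        "Coffee": "coffee is a brewed drink",
    }
    search(["programming language", "brewed drink"], corpus, debug=True)
```

In this sketch the index is rebuilt on every call to keep the example short; a real system would build the document vectors once and reuse them across queries.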

Features

  • Debug mode (prints the query and the results along with their similarity scores)

  • Optional AI-powered support for your search

Requirements

  • Docker or Docker-Compose
  • Ollama
    • Your favorite LLM (e.g. llama2)

Installation

Installation via Docker Compose (Recommended)

  1. Clone the repository
git clone https://github.com/112523chen/wikipedia-search-engine-web-app.git
cd wikipedia-search-engine-web-app
  2. Run the docker-compose file

You can change the environment variables in the docker-compose file depending on your computer resources.

environment:
  - OLLAMA_HOST=ollama # The hostname of the ollama server
  - OLLAMA_PORT=11434 # The port of the ollama server
  - OLLAMA_MODEL=llama2 # The model of the ollama server
docker-compose up
  3. Open your browser and go to http://localhost:1234

Installation via Docker

  1. Clone the repository
git clone https://github.com/112523chen/wikipedia-search-engine-web-app.git
cd wikipedia-search-engine-web-app
  2. Build and run the Docker image

Set the following environment variables in your shell, choosing values that suit your computer resources.

export OLLAMA_HOST=host.docker.internal # The hostname of the ollama server
export OLLAMA_PORT=11434 # The port of the ollama server
export OLLAMA_MODEL=llama2 # The model of the ollama server

You may only need to update the OLLAMA_MODEL variable.

docker build -t wikipedia-search-engine-web-app .
docker run -p 1234:1234 \
  -e OLLAMA_HOST=$OLLAMA_HOST \
  -e OLLAMA_PORT=$OLLAMA_PORT \
  -e OLLAMA_MODEL=$OLLAMA_MODEL \
  wikipedia-search-engine-web-app
  3. Open your browser and go to http://localhost:1234
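Both installation paths pass the same three variables to the container. The sketch below shows one way an app could turn them into a request against Ollama's `/api/generate` endpoint. It is an assumption about how the variables might be consumed, not the project's actual code: the helper names `ollama_settings`, `build_generate_request`, and `ask_ollama` are hypothetical, and so are the fallback defaults.

```python
import json
import os
import urllib.request


def ollama_settings(env=None):
    """Read the three variables from the environment (defaults here are assumptions)."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "localhost")
    port = env.get("OLLAMA_PORT", "11434")
    model = env.get("OLLAMA_MODEL", "llama2")
    return {"base_url": f"http://{host}:{port}", "model": model}


def build_generate_request(prompt, settings):
    """Build (but do not send) a non-streaming POST request for Ollama's /api/generate."""
    payload = json.dumps(
        {"model": settings["model"], "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        settings["base_url"] + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, settings, timeout=60):
    """Send the request and return the generated text; needs a reachable Ollama server."""
    request = build_generate_request(prompt, settings)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Docker Compose the host would be the service name (`ollama`), while with plain Docker it would be `host.docker.internal`, so that the container can reach an Ollama server running on the host machine.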

Usage

  1. Enter your query in the search bar and click search

Setbacks

  • The current AI-powered support is not very good: it is a fairly slow process, and a better approach is needed.
  • The corpus is not very big: it has around 15,000 articles because of GitHub's file size limits. (Email me if you want the full corpus.)

Roadmap

  • Improve the AI-powered support
  • Improve the IR system
  • Add CLI support
