
xin007-kong / coventry-purehub-search-engine


This project forked from maladeep/coventry-purehub-search-engine


Python streamlit app to uncover the brilliance: explore profiles, groundbreaking work, and cutting-edge research by the exceptional minds of Coventry University.

License: MIT License

Python 100.00%

coventry-purehub-search-engine's Introduction


Uncover the brilliance: Explore profiles, groundbreaking work, and cutting-edge research by the exceptional minds of Coventry University.


Overview

The Coventry PureHuB Search Engine is a web application for searching research publications and authors affiliated with Coventry University. It combines natural language processing techniques, such as stemming and TF-IDF weighting, with an inverted index to return relevant search results through a user-friendly interface.
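
As a rough illustration of how stemming, an inverted index, and TF-IDF can work together, here is a minimal pure-Python sketch. It is not the project's actual code: the `stem` function is a toy suffix stripper standing in for a real stemmer (for example NLTK's `PorterStemmer`), the scoring uses one common TF-IDF variant, and the sample titles and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and stem each token."""
    return [stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]


def build_index(docs):
    """Inverted index: stemmed term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, docs, index):
    """Rank doc ids by the summed TF-IDF weight of the query terms."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(len(docs) / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical sample titles, not real PureHuB data.
docs = [
    "Deep learning for medical imaging",
    "Learning analytics in higher education",
    "Sustainable transport networks",
]
index = build_index(docs)
results = search("imaging networks learned", docs, index)
```

Because the query and the documents pass through the same `tokenize` step, "learned" in the query matches "learning" in the titles. Only documents that share at least one stemmed term with the query are scored, which is what makes the index lookup cheap.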

Features

  • Research Publication Search: Users can search for research publications by entering relevant keywords or phrases. The search engine employs advanced techniques such as stemming and TF-IDF to match the user's query with the indexed publication data accurately.

  • Author Search: Users can also search for specific authors by name or related keywords. The search engine applies the same techniques to match the user's input against the indexed author data.

  • Stemming and TF-IDF: The search engine uses stemming to reduce words to their base or root form, which broadens search coverage. It also uses TF-IDF to weight the importance of each term in the documents and generate relevance scores for ranking search results.

  • Inverted Index: The search engine builds an inverted index that maps terms to the publication and author records containing them, enabling efficient retrieval of relevant information.

  • Multinomial Naïve Bayes Classification: The search engine incorporates the Multinomial Naïve Bayes classification technique to categorize publications into different subject categories.

  • Cron job: The crawler, "Scrapper.py", is scheduled with the cron expression "0 0 * * 0", so it runs every Sunday at midnight. This keeps the indexed data up to date by retrieving fresh information at the start of each week.
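
To make the Multinomial Naïve Bayes feature listed above more concrete, the following is a minimal pure-Python sketch of the technique with add-one (Laplace) smoothing. It is an illustration under stated assumptions, not the project's implementation (the Dependencies section suggests scikit-learn is used in practice); the training samples, category names, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label). Collect the counts needed to predict."""
    label_counts = Counter()             # documents per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(tokens, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; the subject categories are illustrative only.
samples = [
    ("neural network image classification".split(), "computing"),
    ("software security network protocol".split(), "computing"),
    ("patient care clinical trial".split(), "health"),
    ("nursing practice patient outcomes".split(), "health"),
]
model = train(samples)
label = predict("clinical patient study".split(), model)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a category.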


Try PureHuB

Installation

  1. Clone the repository:

    git clone https://github.com/maladeep/Coventry-PureHub-Search-Engine.git

  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

Run Live App

or

Run locally:

  1. From the cloned repository, start the app, replacing <app_script>.py with the Streamlit entry-point script in the repository:

    streamlit run <app_script>.py

  2. Open the provided URL in your web browser.
  3. Enter your search query, select the search filter and search type, and click the "SEARCH" button.
  4. View the search results displayed in cards.
  5. Scroll down to view more search results.

Dependencies

The Coventry PureHub Search Engine relies on the following key dependencies:

  • streamlit: The web application framework used for building the user interface.
  • Pillow: A library for opening and manipulating images, used to display an image in the streamlit application.
  • ujson: A fast JSON encoder and decoder library, used to load JSON data.
  • scikit-learn: A machine learning library, used for text preprocessing, TF-IDF vectorization, and cosine similarity calculation.
  • nltk: The Natural Language Toolkit, used for tokenization, stemming, and stop-word removal.
  • numpy: A powerful library for numerical computations in Python.
  • pandas: A data manipulation library, used for handling and processing structured data.
  • seaborn: A data visualization library, used for creating attractive and informative plots.
  • matplotlib: A versatile plotting library, used for generating various types of charts and graphs.
  • scikit-multilearn: A library for multi-label classification, used for advanced search features.
  • requests: A library for making HTTP requests, used for fetching external resources.
  • beautifulsoup4: A library for web scraping, used for extracting data from web pages.
  • selenium: A library for web automation, used for interacting with web pages.
  • webdriver_manager: A library for managing web drivers, used for browser automation.

Contributing

Contributions to this project are welcome. If you find any issues or would like to suggest improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Note

This work was completed in partial fulfillment of the STW7071CEM Information Retrieval coursework at Coventry University.

coventry-purehub-search-engine's People

Contributors

maladeep
