This project forked from nwihardjo/spaghettisearch

A Concurrent Search Engine built with Go

SpaghettiSearch: A Concurrent Search Engine

A fully functioning search engine built to satisfy the HKUST COMP4321 requirements. It uses Go (Golang) for the backend and React for the frontend.

Live Demo

http://spaghetti-search.herokuapp.com/

Features

  • Implements Topic-Sensitive PageRank (T. H. Haveliwala, 2003), using the query as the sole context and assuming the user's interest is spread equally across all topics
  • Combines PageRank and the Vector-Space Model to rank the results
  • Uses anchor text and meta tags, as suggested in Google's paper, to increase precision and index many more webpages
  • Uses the generator, future, and fan-in/fan-out concurrency patterns in Go to improve retrieval performance
  • Dynamic document summary retrieval
  • Uses BadgerDB, a database optimised for SSDs
  • Supports keyword-list search and phrase search (use double quotes for phrase search)
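
The generator and fan-in/fan-out patterns named in the feature list can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (generate, lookup, fanIn, retrieve) are hypothetical, the future pattern is not shown, and the "lookup" step simply uppercases each term as a stand-in for reading postings from the index.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// generate is the generator stage: it emits each query term on a channel
// and closes the channel once all terms have been sent.
func generate(terms ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, t := range terms {
			out <- t
		}
	}()
	return out
}

// lookup is one fan-out worker. It reads terms from the shared input
// channel and emits one result per term. Uppercasing is a placeholder
// (assumption) for a real index lookup.
func lookup(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for t := range in {
			out <- strings.ToUpper(t)
		}
	}()
	return out
}

// fanIn merges several result channels into one and closes the merged
// channel after every input channel has been drained.
func fanIn(chans ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	wg.Add(len(chans))
	for _, c := range chans {
		go func(c <-chan string) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(c)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// retrieve wires the stages together with the given number of workers.
// Results are sorted because their arrival order is not deterministic.
func retrieve(workers int, terms ...string) []string {
	in := generate(terms...)
	outs := make([]<-chan string, workers)
	for i := range outs {
		outs[i] = lookup(in)
	}
	var results []string
	for r := range fanIn(outs...) {
		results = append(results, r)
	}
	sort.Strings(results)
	return results
}

func main() {
	fmt.Println(retrieve(3, "spaghetti", "search", "engine"))
}
```

All workers read from the same input channel, so a slow lookup on one term does not block the others; the merged channel is closed only after every worker finishes, which lets the caller use a plain range loop.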

Setup & Installation

Backend

  • Install Go: extract the downloaded release archive to /usr/local and add its bin directory to your PATH
$ sudo tar -C /usr/local -xzf go$VERSION.$OS-$ARCH.tar.gz
$ export PATH=$PATH:/usr/local/go/bin
  • Download this repo using go get
$ go get github.com/nwihardjo/SpaghettiSearch

Frontend

  • Install Node.js and npm only if you want to rebuild the frontend.
  • The frontend build is already included in the repo, so Node.js is not needed to get this running.

Dependencies

dep is used for package management to ensure that the installed dependencies are the correct versions from the correct vendors. Run dep ensure in the project root to install the required packages, or run go get ./... to do the same thing.

Building

  • Run make in the project root directory. It installs the necessary binaries to the bin/ directory and installs the dependencies.
  • Run the crawler with the arguments shown below, then start the server. The backend and the React frontend have been integrated, so only the Go server needs to be started.
$ ./bin/start_crawl [-numPages=<number of pages to be crawled>] [-startURL=<starting entry point for the crawler to crawl>] [-domainOnly=<whether to crawl only webpages in the domain of the given starting URL>]
$ ./bin/server
  • Open your browser and go to localhost:8080. The server is hosted on port 8080; if in doubt, check the port in your terminal output.
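
For readers curious how flags such as -numPages, -startURL, and -domainOnly are typically handled in Go, here is a minimal sketch using the standard flag package. The flag names come from the command above; the crawlConfig type, the parseCrawlFlags helper, and every default value are assumptions made for illustration and may differ from what start_crawl actually does.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// crawlConfig is a hypothetical container for the crawler options.
type crawlConfig struct {
	NumPages   int
	StartURL   string
	DomainOnly bool
}

// parseCrawlFlags parses the three crawler flags from args.
// The defaults below are placeholders, not the project's real defaults.
func parseCrawlFlags(args []string) (crawlConfig, error) {
	var cfg crawlConfig
	fs := flag.NewFlagSet("start_crawl", flag.ContinueOnError)
	fs.IntVar(&cfg.NumPages, "numPages", 100, "number of pages to be crawled")
	fs.StringVar(&cfg.StartURL, "startURL", "https://example.com/", "starting entry point for the crawler")
	fs.BoolVar(&cfg.DomainOnly, "domainOnly", true, "crawl only pages in the domain of the starting URL")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseCrawlFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the error and usage
	}
	fmt.Printf("%+v\n", cfg)
}
```

Boolean flags in Go need the -name=value form to be set to false, which matches the -domainOnly=<...> syntax shown in the command above.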

Contributors

ak2411, nwihardjo, pgabriela
