Coder Social home page Coder Social logo

prnan4 / domain-spell-checker Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 2.29 MB

Spell Corrector functionality for medical domain in Scala which consists modules to build a medical word corpus and correct misspelled words.

Scala 100.00%
end-to-end-pipeline scala spellchecker text-processing web-parsing

domain-spell-checker's Introduction

Domain Spell Checker

The Domain specific Spell Checker tool mainly consists of three modules namely the Web scraping module, Text processing module and the Spell Checker tool.

Web Scraping module

Web scraping module is used to access and download the papers hosted in the BioRxiv site. User has to enter the number of papers he wants to download and the file location where he wants to save the papers.

Text processing module

This module is used to build word corpus from the extracted pdfs. User enters the file location where the papers are stored, number of papers to parse and the location where the corpus should be built.

Spell checker tool

Scala implementation of Peter Norvig's algorithm for spell checker. This tool takes a word as inout and checks if it is spelled correctly. If the word is spelled incorrectly, it returns a possible set of suggestions to the user.

img

Setting up the project

This project uses scala version "2.12.8" and sbt version "1.3.8". It also uses jsoup, apache pdfbox, httpcomponents, scalatest and log4j logging dependencies. These can be found in the build.sbt file.

To set up the project, clone the master branch to the local. Run the following commands inside the directory.

sbt compile

img

sbt assembly

img img

sbt run

On running "sbt run", the main classes in the project are displayed. img

To perform web scraping, choose option 2. Enter the number of papers to download and the file location to save the papers. img

To parse pdfs and build the corpus, choose option 1. Enters the file location where the papers are stored, number of papers to parse and the location where the corpus should be built. img

To use the Spell Checker functionality, choose option 3. Enter the owrd to check spelling and Q to exit out of tool. img

domain-spell-checker's People

Contributors

prnan4 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.