Semantic Text Matching

A machine-learning model for finding semantically similar text documents.

The model processes a new incoming text question and returns a list of N similar questions from an existing dataset of 4,567 medical-related questions.

Model

Unsupervised learner for implementing neighbor searches: Nearest Neighbors

Metric to use for distance computation: cosine
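The search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's NearestNeighbors and a TfidfVectorizer with default settings, and the QUESTIONS list, build_index and find_similar are hypothetical names invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for the dataset (hypothetical questions, not taken from the real data).
QUESTIONS = [
    "What are the symptoms of diabetes?",
    "How is high blood pressure treated?",
    "Can stress cause headaches?",
    "What causes type 2 diabetes?",
]


def build_index(questions, n_neighbors=2):
    """Vectorize the questions with TF-IDF and fit a cosine nearest-neighbor index."""
    vectorizer = TfidfVectorizer()  # default settings: an assumption for this sketch
    matrix = vectorizer.fit_transform(questions)
    index = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine")
    index.fit(matrix)
    return vectorizer, index


def find_similar(question, vectorizer, index, questions):
    """Return the stored questions closest to the input question."""
    query = vectorizer.transform([question])
    _, neighbor_ids = index.kneighbors(query)
    return [questions[i] for i in neighbor_ids[0]]


vectorizer, index = build_index(QUESTIONS)
print(find_similar("early symptoms of diabetes", vectorizer, index, QUESTIONS))
```

With the cosine metric, a smaller distance means a larger overlap in weighted terms, so the first element of the result is the closest stored question.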

Preprocessing

  1. Tokenization: TfidfVectorizer
    • tokenizer: custom function using the regular expression r"(?u)\b\w\w+\b"
    • token_pattern: None
  2. Stop words: nltk.corpus.stopwords
  3. Minimum word length: 2
  4. Remove punctuation
  5. Decode special symbols: html.unescape
  6. Lemmatization: spaCy en_core_web_sm model
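Most of the steps above can be approximated with the standard library alone. The sketch below is an assumption-laden illustration rather than the project's implementation: STOP_WORDS is a tiny hypothetical list standing in for nltk.corpus.stopwords, lowercasing is assumed, the preprocess name is invented, and lemmatization is left out because it needs the spaCy model.

```python
import html
import re

# Same pattern as listed above: word tokens of at least two characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical mini stop list; the project uses nltk.corpus.stopwords instead.
STOP_WORDS = {"the", "is", "are", "of", "and", "what", "my"}


def preprocess(text):
    """Decode HTML entities, tokenize, and drop stop words.

    Punctuation and one-character tokens are discarded by the regex itself.
    Lemmatization (done with spaCy in the project) is omitted here.
    """
    text = html.unescape(text)                # "&amp;" -> "&"
    tokens = TOKEN_RE.findall(text.lower())   # lowercasing is an assumption
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("What are the side-effects of aspirin &amp; ibuprofen?"))
# -> ['side', 'effects', 'aspirin', 'ibuprofen']
```

Decoding entities before tokenizing matters: without html.unescape, the string "&amp;" would leak a stray "amp" token into the vocabulary.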

Input

question (str): the input question

Output

questions (List[str]): similar questions retrieved from the dataset

Metric

Top-N accuracy (accuracy@n)

Accuracy@5: 0.887
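Top-N accuracy is the fraction of queries whose expected match appears among the first N returned results. The sketch below shows one way to compute it; the exact evaluation protocol behind the reported score is not described here, so the function name, its arguments and the toy data are all hypothetical.

```python
def accuracy_at_n(ranked_results, expected, n=5):
    """Fraction of queries whose expected item is within the top-n results.

    ranked_results: one ranked list of candidates per query.
    expected: the single correct candidate for each query.
    """
    if not expected or len(ranked_results) != len(expected):
        raise ValueError("need one non-empty ranked list per expected item")
    hits = sum(
        1
        for ranked, target in zip(ranked_results, expected)
        if target in ranked[:n]
    )
    return hits / len(expected)


# Toy data: three queries, each with three ranked candidates.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
expected = ["b", "f", "x"]

print(accuracy_at_n(ranked, expected, n=2))  # only the first query hits within the top 2
print(accuracy_at_n(ranked, expected, n=3))  # the first two queries hit within the top 3
```

Increasing n can only keep the score the same or raise it, which is why the value of n is always reported alongside the score.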

Run

1. Clone repo

git clone https://github.com/DimionX/semantic-text-matching.git

cd semantic-text-matching

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate

Dev:

pip install -r requirements.dev.txt
pre-commit install

Production:

pip install -r requirements.txt

3. Download dataset

Dataset info: Medical Questions Pairs

python ./src/load_dataset.py

4. Model Training

Download the spaCy model

python -m spacy download en_core_web_sm

Train and save the model

python ./src/save_model.py

Jupyter Notebook

main.ipynb

Run Streamlit

streamlit run ./src/stream.py

Open the Streamlit app in your browser: http://127.0.0.1:8501

Run CLI

python ./src/matcher.py

Run Docker Compose

docker compose up

Open the Streamlit app in your browser: http://127.0.0.1:8501
