thisarawelmilla / simplewebapplication

A web application that validates an XML payload, converts it to JSON format, and saves it in a database (technologies used: Maven, JAX-RS, Tomcat, MySQL).

Java 3.44% Jupyter Notebook 96.56%

simplewebapplication's Introduction

IR_project-song_search_engine

Songs can be searched on eight metadata fields: the song title, singer, composer, music producer, lyrics, beat, genre, and rating. A user interface supports the basic functions that improve the user experience: a search bar, facets in checkbox form, and a list of search results. The search engine is implemented with Elasticsearch and its Python client.

System Architecture

[Figure: system architecture diagram]

Indexing

I created a mapping specific to this scenario. I used positional indexing because the order in which words occur matters, especially for the song title and the lyrics.
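As an illustration only, a mapping of this kind could be written as a Python dictionary and passed to the Elasticsearch client when the index is created. The field names and types below are assumptions, since the source does not list the actual mapping. `index_options: positions` is the standard setting that stores term positions for `text` fields (it is also the default for them).

```python
# Hypothetical mapping: field names and types are assumed for illustration.
SONG_MAPPING = {
    "mappings": {
        "properties": {
            # Positions are stored so that phrase queries can check word order.
            "title": {"type": "text", "index_options": "positions"},
            "lyrics": {"type": "text", "index_options": "positions"},
            "singer": {"type": "text"},
            "composer": {"type": "text"},
            "producer": {"type": "text"},
            "beat": {"type": "keyword"},
            "genre": {"type": "keyword"},
            "rating": {"type": "integer"},
        }
    }
}


def positional_fields(mapping: dict) -> list:
    """Return the names of fields that explicitly request positional indexing."""
    props = mapping["mappings"]["properties"]
    return sorted(
        name for name, spec in props.items()
        if spec.get("index_options") == "positions"
    )
```

Calling `positional_fields(SONG_MAPPING)` lists the two fields for which word order is called out as important.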

Tokenization

The query is tokenized on every symbol, since no symbol carries special meaning in this scenario.
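A minimal sketch of this kind of tokenizer, assuming that "symbol" means any character that is not a letter or digit. Lowercasing is an added assumption, not something the source states.

```python
import re


def tokenize(query: str) -> list:
    """Split a query on every run of non-alphanumeric characters."""
    # [\W_]+ matches punctuation, whitespace and underscores.
    # Lowercasing is an assumption made for this sketch.
    return [token for token in re.split(r"[\W_]+", query.lower()) if token]
```

For example, `tokenize("Hello, world! it's-me")` splits on the comma, the exclamation mark, the apostrophe, and the hyphen alike.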

Spell correction

Each word in the query is checked against the corpus. If a word is not found, a replacement term is suggested by computing the weighted edit distance between that word and each word in the corpus.
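The idea can be sketched as follows. The source does not give the operation weights, so the costs used here (insertion 1.0, deletion 1.0, substitution 1.5) are placeholders, and `suggest` is a hypothetical helper name.

```python
def weighted_edit_distance(a: str, b: str,
                           ins: float = 1.0,
                           dele: float = 1.0,
                           sub: float = 1.5) -> float:
    """Edit distance from a to b with per-operation costs (assumed values)."""
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * dele]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else sub
            curr.append(min(prev[j] + dele,       # delete ca
                            curr[j - 1] + ins,    # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: set) -> str:
    """Return the word itself if known, else the closest vocabulary term."""
    if word in vocabulary:
        return word
    # Sorting makes the result deterministic when distances tie.
    return min(sorted(vocabulary), key=lambda v: weighted_edit_distance(word, v))
```

Comparing the query word with every vocabulary term is linear in the vocabulary size, which is workable for a small song corpus but would need an index for a large one.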

Remove stopwords

Stop words are removed from the query: any token that appears in the stop-word list is ignored during the search. This step is skipped when searching the song title and lyrics fields, because those fields must be searched with their stop words intact. For all other fields, every stop word is removed.
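A small sketch of this field-dependent rule. The stop-word list and the field names `title` and `lyrics` are illustrative assumptions.

```python
# Illustrative stop-word list; the project's actual list is not given.
STOPWORDS = {"a", "an", "the", "of", "by", "in", "and"}

# Fields that must keep their stop words (field names are assumed).
PHRASE_FIELDS = {"title", "lyrics"}


def remove_stopwords(tokens: list, field: str) -> list:
    """Drop stop words unless the target field is the title or the lyrics."""
    if field in PHRASE_FIELDS:
        return list(tokens)
    return [token for token in tokens if token not in STOPWORDS]
```

The same token list therefore produces different results depending on the field being searched.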

Lemmatization

Lemmatization is applied to the tokens that remain after stop-word removal. It is implemented with a focus on the artist's name only. As with stop-word removal, this step does not apply when searching the song title and lyrics fields, because those fields must be searched with the words in their given form.

Text classification

When a query is given, text classification is used to identify which metadata field the user is interested in. I implemented the classifier as a one-layer neural network. Because no training data was available, I handcrafted the weights and the bias. To do so, I identified candidate words for each metadata class, assuming that only those words are used to refer to a particular metadata field. The classifier outputs the probability that the user query targets each metadata class.
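One way to picture such a classifier is a single linear layer followed by a softmax, where each cue word adds a fixed weight to the logit of its class. The cue words, the weight value, and the zero biases below are all invented for illustration; the handcrafted values used in the project are not given in the source.

```python
import math

CLASSES = ["title", "singer", "composer", "producer",
           "lyrics", "beat", "genre", "rating"]

# Hypothetical cue words per class.
CUE_WORDS = {
    "title": {"titled", "named"},
    "singer": {"sung", "singer", "artist"},
    "composer": {"composed", "composer"},
    "producer": {"produced", "producer"},
    "lyrics": {"lyrics", "words"},
    "beat": {"beat", "tempo"},
    "genre": {"genre", "style"},
    "rating": {"rating", "rated", "top"},
}
WEIGHT = 2.0                       # assumed weight of every cue word
BIAS = {c: 0.0 for c in CLASSES}   # assumed bias per class


def classify(tokens: list) -> dict:
    """Return a probability for each metadata class (linear layer + softmax)."""
    logits = [
        BIAS[c] + sum(WEIGHT for t in tokens if t in CUE_WORDS[c])
        for c in CLASSES
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(CLASSES, exps)}
```

A query with no cue words yields a uniform distribution, which matches the case where the user expresses no field preference.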

Boosting fields

Using the results of the text classification, the relevant fields are boosted. Boosting affects the scores of the search results returned by Elasticsearch.
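As a hedged sketch, class probabilities could be turned into per-field boosts with the `field^boost` syntax that Elasticsearch accepts in a `multi_match` query. The linear mapping from probability to boost and the `max_boost` parameter are assumptions made here, not the project's actual formula.

```python
def boosted_fields(probabilities: dict, max_boost: float = 5.0) -> list:
    """Map each class probability in [0, 1] to a boost in [1, max_boost]."""
    return [
        f"{field}^{1.0 + (max_boost - 1.0) * p:.2f}"
        for field, p in probabilities.items()
    ]


def build_boosted_query(text: str, probabilities: dict) -> dict:
    """Build a multi_match query body whose fields carry the boosts."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(probabilities),
            }
        }
    }
```

The resulting dictionary is a plain query body, so it can be inspected or logged before it is sent to the search client.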

Facets

Facets are available for advanced filtering. A search can use any number and any combination of facets, either together with the main query or on their own.
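A possible shape for such a request, assuming the facets are sent as `terms` filters inside a `bool` query. The function name and the list of searched fields are hypothetical.

```python
# Hypothetical list of text fields searched by the main query.
SEARCH_FIELDS = ["title", "lyrics", "singer", "composer", "producer"]


def build_faceted_query(text=None, facets=None) -> dict:
    """Combine an optional free-text query with any number of facet filters."""
    filters = [
        {"terms": {field: list(values)}}
        for field, values in (facets or {}).items()
        if values  # skip facets with no checkbox selected
    ]
    if text:
        must = [{"multi_match": {"query": text, "fields": SEARCH_FIELDS}}]
    else:
        # Facet-only search: match everything, then filter.
        must = [{"match_all": {}}]
    return {"query": {"bool": {"must": must, "filter": filters}}}
```

Filters do not contribute to the relevance score, so the selected facets narrow the result set without changing the ranking produced by the main query.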

Retrieve documents

When retrieving songs, two separate query-processing paths are used, as shown in the architecture diagram. One handles song titles and lyrics; the other handles all remaining fields. match_phrase is used in the search because word position matters, especially for the song title and the lyrics.
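The two paths might be combined along these lines. The field names are assumptions, and using a plain `match` query for the remaining fields is a choice made for this sketch; the source only confirms that `match_phrase` is used.

```python
PHRASE_FIELDS = ["title", "lyrics"]              # assumed field names
OTHER_FIELDS = ["singer", "composer", "producer"]  # assumed field names


def build_retrieval_query(raw_text: str, processed_tokens: list) -> dict:
    """Build one query body from the two processing paths."""
    # Path 1: title and lyrics see the query as typed, as an ordered phrase.
    clauses = [{"match_phrase": {field: raw_text}} for field in PHRASE_FIELDS]
    # Path 2: other fields see the tokens left after stop-word removal
    # and lemmatization.
    if processed_tokens:
        processed = " ".join(processed_tokens)
        clauses += [{"match": {field: processed}} for field in OTHER_FIELDS]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}
```

A document is returned if it satisfies at least one clause, so a phrase hit in the title and a term hit in the singer field both count.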
