eyadmshokry / searchquranbytopic

My graduation project for the Faculty of Computers and Information, Helwan University, Computer Science Department

License: MIT License

Python 100.00%
python word2vec deep-learning cosine-similarity search-engine flask quran topic word2vec-model verses

searchquranbytopic's Introduction

QuranSearchByTopic

Research Paper

This project was developed to help Muslims work with the Holy Quran more easily and quickly. It lets them search the Quran for a specific keyword or verse, or for a concrete or conceptual topic, and it also helps with memorization and recitation (see the Recitation part).

This project consists of two parts. The first is a search engine based on a deep learning model called word2vec, which searches the Quran by keyword, verse, or topic with an accuracy of about 70%. The second is an iOS application that serves as the front end for the search engine and also helps users memorize and recite the Quran by voice and find their mistakes. This system evaluates users' recitation with high accuracy compared with other applications.

It is a search engine for the Quran, written in Python, that lets you search by topic or concept, such as صلة الرحم (maintaining family ties) or الميراث (inheritance). Our search engine does not just match words: it uses a deep learning model called word2vec (word-to-vector) to take the meaning of the words into account. Download it from here.

Dataset Preparation

We needed a documented, trusted representation of the verses of the whole Quran and their corresponding topics. Because this is religious material, we could not create the mapping ourselves and expect users of the application to trust it.

Mushaf Al Tajweed Quran book

  • Author: compiled by Dr. Mohammed Fayez Kamel under the supervision of Dr. Ali Abu Al-Kheir.
  • Publisher: published by Dar Al-Maarifa in Syria and authenticated by Al Azhar Islamic Research Academy in Egypt. An online version is available here.

We used this book to annotate each verse with its related topic. This lets us map each user query to the most closely related topic using our word2vec model and cosine similarity, and then retrieve the verses of that topic.
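The query-to-topic mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the three-dimensional vectors, the English placeholder tokens, the verse identifiers, and the function names (`cosine`, `embed`, `search_by_topic`) are all made up for the example. A real run would take 300-dimensional vectors from the trained word2vec model and the topic annotations from the book.

```python
import math

# Toy 3-dimensional "embeddings" (made up for illustration). A trained
# word2vec model would supply one vector per vocabulary word instead.
TOY_VECTORS = {
    "inheritance": [0.9, 0.1, 0.0],
    "heirs": [0.8, 0.2, 0.1],
    "kinship": [0.1, 0.9, 0.1],
    "relatives": [0.2, 0.8, 0.0],
    "prayer": [0.0, 0.1, 0.9],
}

# Hypothetical topic -> verse annotation; the identifiers are placeholders.
TOPIC_VERSES = {
    "inheritance": ["verse-id-1", "verse-id-2"],
    "kinship": ["verse-id-3"],
    "prayer": ["verse-id-4"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(text):
    """Average the vectors of the known words in `text` (None if none are known)."""
    vectors = [TOY_VECTORS[t] for t in text.split() if t in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def search_by_topic(query):
    """Return the topic closest to the query and the verses annotated with it."""
    query_vec = embed(query)
    if query_vec is None:
        return None, []
    # Every topic name in this toy example is in the vocabulary,
    # so embed(topic) never returns None here.
    best = max(TOPIC_VERSES, key=lambda topic: cosine(query_vec, embed(topic)))
    return best, TOPIC_VERSES[best]


topic, verses = search_by_topic("heirs")
print(topic, verses)
```

Averaging word vectors is one simple way to represent a multi-word query; the source does not say how multi-word queries or topic names are embedded, so that step is an assumption.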

Word2Vec Model

Arabic Islamic Corpus for training the Model

We collected the corpus used to train the word2vec model from several sources:

  • King Saud University Corpus of Classical Arabic (KSUCCA)
  • Quran text, with a total of 751,291 words
  • Watan-2004 (Abbas et al., 2011), with a total of 106,289,288 words
  • CNN-arabic (Saad and Ashour, 2010), OSAC: Open Source Arabic Corpus, with a total of 23,984,550 words
  • BBC-arabic (Saad and Ashour, 2010), OSAC: Open Source Arabic Corpus, with a total of 19,833,141 words
  • Arabic Book Reviews (Aly and Atiya, 2013), LABR: Large Scale Arabic Book Reviews, with a total of 38,065,922 words
  • Hadith dataset with 2,410,569 words and 34,409 unique words

Training the Model

We combined all of these corpora into a single txt file (access it from here). After processing it, we used it to train our word2vec model with this command:

    $ ./word2vec -train corpus.txt -output model.bin -cbow 1 -size 300 -window 10 -threads 8 -binary 1 -iter 15
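For readers who want to script the training step, the sketch below rebuilds the same argument list from a dictionary of hyperparameters. The values are copied from the training command above; the comments give the usual meaning of each flag in the original word2vec command-line tool. The names `PARAMS` and `build_command` are hypothetical and not part of the project.

```python
import shlex

# Hyperparameters copied from the training command in this README.
PARAMS = {
    "train": "corpus.txt",  # input text file (the merged corpus)
    "output": "model.bin",  # file the trained vectors are written to
    "cbow": 1,              # 1 = continuous bag-of-words, 0 = skip-gram
    "size": 300,            # dimensionality of the word vectors
    "window": 10,           # maximum context window around each word
    "threads": 8,           # number of training threads
    "binary": 1,            # 1 = save the model in binary format
    "iter": 15,             # number of training iterations
}


def build_command(params, executable="./word2vec"):
    """Turn a {flag: value} mapping into an argv list for the word2vec tool."""
    argv = [executable]
    for flag, value in params.items():
        argv += [f"-{flag}", str(value)]
    return argv


command = build_command(PARAMS)
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would start training, assuming the `word2vec` binary has been built in the current directory and `corpus.txt` exists there.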

Note: This is not the final version of the project. It is still under development.
