
islamandai / quran-nlp


Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP

Home Page: https://www.kaggle.com/datasets/alizahidraja/quran-nlp

License: Apache License 2.0

Jupyter Notebook 100.00%
ai corpus corpus-linguistics hadees hadith islam nlp quran tafsir translation

quran-nlp's Introduction

QURAN NLP

NLP & AI on the Quran!

Dataset Structure

  • data
    • quran
      • corpus (190,655)
        • dictionary (53,924)
        • morphology (128,219)
        • verbs (1,475)
        • lemmas (3,680)
        • lemmas (grouped) (3,357)
      • quran.csv (6,236)
    • hadith (700,000+ hadiths!)
      • Sanadset (650,000 hadith; this data exceeds GitHub's file size limit, so download it from Kaggle)
      • arabichadith (62,169 hadith)
      • thaqalayn (26,975 hadith)
      • kaggle_hadith_clean.csv (34,410 hadith)
      • kaggle_rawis.csv (24,028 rawis)
    • namesofallah (99)
    • surah (114)
    • tafseer (4 * 6,236)
    • translation (9 * 6,236)
    • main_df.csv (6,236)

Motivation

I wanted to apply my knowledge of ML & NLP to the Quran and build something useful with it. So far I have tried summarizing the Verses and Tafasir and running sentiment analysis on them, and I have built a Search Engine so that any query can be searched as easily as on Google.

This is an open source project and I am trying to host it somewhere so people can use it and make the most out of it.

Collaborations are HIGHLY welcome! If anyone can help with the code or help fact-check the search results or summaries that would be a HUGE help!

Looking forward to doing something great with the Quran & NLP!

Search Engine

Work till now

  1. Notebook to scrape data from the website: https://www.altafsir.com/
  2. Provided English translation and Tafseer of Quran in easy-to-use CSV format
  3. Used NLP to get the top 1000 words used in the Quran
  4. Ran sentiment analysis on the Quran and on each Surah
  5. Text Summarization for the Quran & each Surah
  6. Search Engine for Quran using Google USE (Universal Sentence Encoder)
  7. Similarity Index of Translation & Tafseer
  8. Notebook to scrape data from https://thaqalayn.net/ which is a comprehensive Shia Hadith library
  9. Notebook to scrape https://corpus.quran.com/ which contains a corpus of the Quran, including dictionary, verbs, lemmas, and morphology

Top 100 most common words
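
A frequency count of this kind can be sketched with the standard library alone. This is a minimal sketch, not the notebook's actual code: the function name `top_words`, the English-only tokenizer, and the placeholder sentences are all assumptions made for illustration. In practice the input would be the translation column of one of the CSV files.

```python
import re
from collections import Counter


def top_words(texts, n=100, stopwords=frozenset()):
    """Return the n most frequent lowercase word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Assumption: English text; split on anything that is not a letter or apostrophe.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Placeholder sentences, not real translation text.
sample = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps over the dog",
]
result = top_words(sample, n=3, stopwords={"over"})
print(result)
```

Passing a stopword set is optional; without one, function words such as "the" will dominate the list.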

Similarity Index
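
One simple way to compute such an index is cosine similarity over word counts. The sketch below is an assumption for illustration and not necessarily what the notebooks do; the names `bag_of_words`, `cosine_similarity` and `similarity_index` are hypothetical. A sentence-embedding model could replace the word-count vectors without changing the cosine step.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in text (assumes English input)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_index(text_a, text_b):
    """Score in [0, 1]: 1 means identical word distributions, 0 means no shared words."""
    return cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))
```

A call such as `similarity_index(translation_text, tafseer_text)` on each verse would give one score per row.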

Future Goals

  1. Add more Data!
  2. Add more Tafaseer and translations to better train the NLP model for Search Engine & Analysis
  3. Make an end-to-end application so that everyone can benefit from the newly trained models
  4. Find insightful things from the Quran
  5. Make an Arabic NLP model capable of understanding the Quran
  6. Make a single graph database encompassing Islamic knowledge
  7. Make an AI tool to authenticate Hadith

Important Note

If you find any error or mistake in the translation, please correct me. If you find the work interesting, feel free to build on it!

How To Contribute

Feel free to make notebooks on the current data, add more data (authentic and with sources) and have a look at the current data to make sure it is authentic and up-to-date!

The dataset is also available at https://www.kaggle.com/datasets/alizahidraja/quran-nlp. You can use Kaggle to work on it online too!

Project started: March 1, 2023

Islam & AI

quran-nlp's People

Contributors

abdullahan1928, alizahidraja, muhammadsaadsiddique


quran-nlp's Issues

DATA NEEDED!

We need enough data sources to make sufficient progress on this project.

I need help downloading or scraping the data from the following Islam-related websites into CSV format so that we can use it efficiently for our goal:
https://www.islamicstudies.info/
https://myislam.org/
https://dorar.net/
https://tafsir.app/
https://www.cef.org.pk/
https://easyquranwahadees.com/
https://modoee.com/

The data should follow a reasonably uniform format.

For Quran:
Name,Surah,Ayat,Arabic
Translations and tafaseer should cite who wrote them.

For Hadees:
name, reference number, hadees
Translations and tafaseer should cite who wrote them.

The same goes for stories and historical events.

A timeline of when each ayat was revealed is another interesting dataset that we need.
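
A contributor could check a scraped Quran file against the requested column layout before submitting it. The following is a minimal sketch under stated assumptions: the function name `validate_quran_csv` and the specific checks (numeric Surah and Ayat, non-empty Arabic) are illustrative choices that go beyond what is specified above, and the sample rows use placeholder text.

```python
import csv
import io

# Column order requested above for Quran data.
QURAN_COLUMNS = ["Name", "Surah", "Ayat", "Arabic"]


def validate_quran_csv(csv_text):
    """Return a list of problems found; an empty list means the text conforms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != QURAN_COLUMNS:
        return [f"expected header {QURAN_COLUMNS}, got {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in ("Surah", "Ayat"):
            # Assumed check: these columns hold positive integers.
            if not (row[col] or "").strip().isdigit():
                problems.append(f"line {line_no}: {col} is not a number")
        if not (row["Arabic"] or "").strip():
            problems.append(f"line {line_no}: Arabic text is empty")
    return problems


good = "Name,Surah,Ayat,Arabic\nAl-Fatihah,1,1,<arabic text>\n"
bad = "Name,Surah,Ayat,Arabic\nAl-Fatihah,one,1,\n"
print(validate_quran_csv(good))
print(validate_quran_csv(bad))
```

The same pattern would apply to the Hadees layout by swapping the column list and the per-row checks.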

Make Flask/Fast APIs for getting data

The purpose of the APIs would be to fetch results for the front end after the NLP or other ML functions have run.

For example, an API call named "get_tafseer" would take a surah number, an ayat number, and an optional tafseer version (there are several), and return a string.

This is the initial idea; it may evolve as the project evolves.
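
The "get_tafseer" call described above could start as a plain function that a Flask or FastAPI route later wraps. This is a minimal, framework-independent sketch: the in-memory `TAFSEER_STORE`, the version names, and the placeholder strings are all hypothetical, and a real implementation would read from the tafseer CSV files instead.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store: version name -> {(surah, ayat): tafseer text}.
# The entries are placeholders, not real tafseer text.
TAFSEER_STORE: Dict[str, Dict[Tuple[int, int], str]] = {
    "version_a": {(1, 1): "<tafseer text A for 1:1>"},
    "version_b": {(1, 1): "<tafseer text B for 1:1>"},
}
DEFAULT_VERSION = "version_a"  # assumed default when no version is given


def get_tafseer(surah: int, ayat: int, version: Optional[str] = None) -> str:
    """Return the tafseer string for one ayat, using the default version if none is given."""
    chosen = version or DEFAULT_VERSION
    if chosen not in TAFSEER_STORE:
        raise KeyError(f"unknown tafseer version: {chosen}")
    try:
        return TAFSEER_STORE[chosen][(surah, ayat)]
    except KeyError:
        raise KeyError(
            f"no tafseer for surah {surah}, ayat {ayat} in {chosen}"
        ) from None


print(get_tafseer(1, 1))
print(get_tafseer(1, 1, version="version_b"))
```

Keeping the lookup separate from the web framework means it can be tested on its own, and a route handler would only need to turn the `KeyError` into a 404 response.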
