Due to the social media now, Arabic dialects have begun to appear in written form. The problem of automatically determining the dialect of an Arabic text remains a major challenge for researchers. This project use deep learning model and machine learning model for the problem of automatically identifying the dialect of a text written in Arabic.
The process of computationally identifying the language of a given text is considered the cornerstone of many important NLP applications such as machine translation, social media analysis, etc. Since the dialects could be considered as a closely related languages, dialect identification could be referred to as a special (more difficult) case of language identification problem.
Dataset collected of tweets belonging to a wide range of country level Arabic dialects covering 18 different countries in the Middle East and North Africa region. Our method for building this dataset relies on applying multiple filters to identify users who belong to different countries.
- download the dataset from link above
- Download model trained above link if not train from scratch
- install Libraries in requirment
- run app.py in folder Deployment