A-Monolingual-Arabic-Parallel-Corpus-

A7'ta: A Monolingual Arabic Parallel Corpus for Grammar Checking

Collected by Nora Madi
Email: nmadi at ksu dot edu dot sa
Site: https://github.com/iwan-rg

Reference: N. Madi and H. S. Al‐Khalifa, “A7’ta: Data on a Monolingual Arabic Parallel Corpus for Grammar Checking,” Data in Brief, vol. 22, pp. 237–240, 2019.

Resource

The parallel corpus is a collection of Modern Standard Arabic (MSA) sentences (and words) extracted from the book كشاف الأخطاء اللغوية - الصحافة السعودية أنموذجاً (Linguistic Error Detector – Saudi Press as a Sample).

Data Files:

Contains erroneous Arabic sentences and their correct counterparts.

Data Structure:

1- Text format
2- UTF-8 encoding
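Because the files are plain UTF-8 text, it is safest to pass the encoding explicitly instead of relying on the platform default, which is not UTF-8 on every system. The following is an illustration only. The helper names `decode_sentences` and `read_sentences` are hypothetical, and the code assumes one sentence per line, which the description above does not state. The `utf-8-sig` codec decodes ordinary UTF-8 and also drops a leading byte-order mark if a file happens to have one.

```python
from pathlib import Path


def decode_sentences(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and return the non-blank lines, stripped.

    Assumption: one sentence per line (not stated in the corpus notes).
    "utf-8-sig" also removes a leading byte-order mark, if present.
    """
    text = raw.decode("utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_sentences(path) -> list[str]:
    """Read one corpus text file (hypothetical helper)."""
    return decode_sentences(Path(path).read_bytes())
```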

Statistics:

The data contains 300 documents, 445 erroneous sentences and their error-free counterparts, and a total of 3,532 words. Each pair of sentences differs in only one word.
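Because each pair differs in exactly one word, the location of the error can be recovered by comparing the two sentences token by token. Below is a minimal sketch, assuming whitespace tokenization and equal token counts. A correction that splits or merges words would break this alignment, so the function returns `None` in that case rather than guessing. `diff_word` is a hypothetical helper, not something shipped with the corpus.

```python
def diff_word(erroneous: str, corrected: str):
    """Locate the single differing word in a sentence pair.

    Assumes whitespace tokenization and equal token counts. Returns
    (index, wrong_word, right_word), or None if the pair does not differ
    in exactly one aligned position.
    """
    wrong_tokens = erroneous.split()
    right_tokens = corrected.split()
    if len(wrong_tokens) != len(right_tokens):
        return None  # tokens cannot be aligned one-to-one
    diffs = [
        (i, w, r)
        for i, (w, r) in enumerate(zip(wrong_tokens, right_tokens))
        if w != r
    ]
    return diffs[0] if len(diffs) == 1 else None
```

Returning `None` for pairs that do not fit the assumption makes it easy to count how many pairs a simple whitespace alignment actually covers.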

Folder structure:

  1. There are eight folders, one for each of the eight main categories in the book.
  2. Within each main folder, there is a sub-folder for each sub-category of that category, if any.
  3. Inside each main folder or sub-folder, there are folders for each type of error.
  4. Within each error-type folder, there are two files: one for the correctly written sentences (الصواب) and another for the erroneous sentences (الخطأ).
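The layout above can be walked to collect every sentence pair together with the path of its error-type folder. This is a sketch under several assumptions that the description does not confirm. It assumes the two files can be told apart because their names contain الخطأ and الصواب. It assumes line i of the erroneous file corresponds to line i of the correct file. The names `load_pairs`, `pair_lines`, `CORRECT_MARK`, and `ERROR_MARK` are hypothetical. File names are compared after NFC normalization so that decomposed names on some file systems still match.

```python
import unicodedata
from pathlib import Path

# Hypothetical markers: the README labels the two files الصواب (correct)
# and الخطأ (erroneous); that the file names contain them is an assumption.
CORRECT_MARK = "الصواب"
ERROR_MARK = "الخطأ"


def _has_mark(path: Path, mark: str) -> bool:
    # Compare in NFC so decomposed file names still match the marker.
    nfc = unicodedata.normalize
    return nfc("NFC", mark) in nfc("NFC", path.name)


def pair_lines(wrong_text: str, right_text: str) -> list[tuple[str, str]]:
    """Align two texts line by line, skipping blank lines.

    Assumes line i of the erroneous file matches line i of the correct file.
    """
    wrong = [s.strip() for s in wrong_text.splitlines() if s.strip()]
    right = [s.strip() for s in right_text.splitlines() if s.strip()]
    if len(wrong) != len(right):
        raise ValueError("erroneous and correct files have different line counts")
    return list(zip(wrong, right))


def load_pairs(root) -> list[tuple[str, str, str]]:
    """Return (error_type, erroneous, correct) triples for every folder
    under root that holds exactly one file of each kind."""
    root = Path(root)
    pairs = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        files = [f for f in folder.iterdir() if f.is_file()]
        wrong = [f for f in files if _has_mark(f, ERROR_MARK)]
        right = [f for f in files if _has_mark(f, CORRECT_MARK)]
        if len(wrong) != 1 or len(right) != 1:
            continue  # not an error-type folder (e.g. a category folder)
        error_type = folder.relative_to(root).as_posix()
        for w, r in pair_lines(
            wrong[0].read_text(encoding="utf-8-sig"),
            right[0].read_text(encoding="utf-8-sig"),
        ):
            pairs.append((error_type, w, r))
    return pairs
```

Skipping folders that do not hold exactly one file of each kind lets the same walk handle error-type folders placed directly under a main category as well as those nested under a sub-category.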

Contributors:

iwan-rg, noraamm