BRAD: Books Reviews in Arabic Dataset

Description

This dataset contains roughly 510,600 book reviews in Arabic. The reviews were collected from GoodReads.com during June and July 2016. This work extends LABR, an earlier large-scale Arabic dataset of around 63K Arabic book reviews, also collected from GoodReads.com. The reviews are written mainly in Modern Standard Arabic, but some are in dialectal Arabic. The following table summarizes some statistics on the BRAD dataset.

Property                   Number        Property                   Number
Number of reviews          510,598       Median reviews per book    37
Number of users            76,530        Min reviews per book       1
Avg. reviews per user      7             Number of books            4,993
Max reviews per user       396           Avg. reviews per book      102
Median reviews per user    2             Number of tokens           39,886,898
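
The per-user and per-book figures in the table can be recomputed from the id columns of the review records. The following is a minimal sketch, not the authors' code: the helper name `per_key_stats` and the sample ids are hypothetical and used only for illustration.

```python
from collections import Counter
from statistics import mean, median


def per_key_stats(keys):
    """Summarize how many reviews each key (a user id or a book id) has."""
    counts = Counter(keys)            # key -> number of reviews
    values = list(counts.values())
    return {
        "count": len(counts),         # number of distinct users or books
        "avg": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical user ids, one entry per review.
user_ids = ["u1", "u1", "u2", "u3", "u3", "u3"]
stats = per_key_stats(user_ids)
```

Passing the `user_id` column gives the per-user rows of the table, and passing the `book_id` column gives the per-book rows. The averages in the table are consistent with rounding: 510,598 / 76,530 is about 6.7 reviews per user, and 510,598 / 4,993 is about 102.3 reviews per book.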

The following figure depicts the distribution of reviews in the BRAD datasets: balanced (inner) and unbalanced (outer).

[Figure: distribution of reviews in the balanced (inner) and unbalanced (outer) BRAD datasets]

You may refer to our paper, cited below, for details on the dataset.

Dataset

  • data/

    • bal-reviews.csv.rar: a RAR-compressed, tab-separated file containing the balanced dataset of positive and negative reviews. Only positive (ratings 4 & 5) and negative (ratings 1 & 2) reviews are included. The dataset consists of slightly more than 156K reviews. The format of each review record is:

                   rating<TAB>review_id<TAB>book_id<TAB>user_id<TAB>review
      
    • unbal-reviews.csv.rar: a RAR-compressed, tab-separated file containing the whole dataset of more than 510K reviews. This is a clean dataset that includes all reviews. The format of each review record is:

                   rating<TAB>review_id<TAB>book_id<TAB>user_id<TAB>review
      

      where:

        • rating: the user's rating on a scale of 1 to 5
        • review_id: the id of the review (identifies a specific review)
        • book_id: the id of the reviewed book
        • user_id: the id of the user who wrote the review
        • review: the text of the review
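
The record format above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the archive has already been extracted, each record sits on one line, and any line that does not start with a rating from 1 to 5 (for example a header row, if one exists) is skipped. The names `Review`, `parse_reviews`, and `polarity` are hypothetical, and the sample records are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Review(NamedTuple):
    rating: int
    review_id: str
    book_id: str
    user_id: str
    text: str


def parse_reviews(lines: Iterable[str]) -> Iterator[Review]:
    """Yield one Review per well-formed tab-separated line."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the first four tabs only, so tabs inside the review survive.
        parts = line.split("\t", 4)
        if len(parts) != 5 or parts[0] not in {"1", "2", "3", "4", "5"}:
            continue  # skip blank lines, headers, and malformed records
        rating, review_id, book_id, user_id, text = parts
        yield Review(int(rating), review_id, book_id, user_id, text)


def polarity(rating: int) -> Optional[str]:
    """Map ratings as described for the balanced set: 4-5 positive, 1-2 negative."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None  # rating 3 is left out of the positive/negative split


# Made-up sample records; with a real file, pass an open file object instead.
sample = [
    "rating\treview_id\tbook_id\tuser_id\treview\n",
    "5\t101\t7\t42\tكتاب رائع\n",
    "3\t102\t7\t43\tلا بأس به\n",
    "2\t103\t8\t42\tممل جدا\n",
]
reviews = list(parse_reviews(sample))
labels = [polarity(r.rating) for r in reviews]
```

With an extracted file, the same function accepts an open file object, for example `open(path, encoding=...)`. The encoding of the released files is not stated here, so it should be checked before use. Note that `polarity` only labels reviews; it does not reproduce whatever sampling was used to balance the positive and negative classes.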

Citation

Please cite the following conference paper if you decide to use the dataset:

Elnagar A. and Einea O. 'BRAD 1.0: Book reviews in Arabic dataset'. 2016 IEEE/ACS 13th International Conference of 
Computer Systems and Applications (AICCSA), pp. 1-8, Nov 2016. DOI: 10.1109/AICCSA.2016.7945800.
