
twitter-user-gender-classification's Introduction

This data set was used to train a CrowdFlower AI gender predictor. You can read all about the project here. Contributors were asked to simply view a Twitter profile and judge whether the user was a male, a female, or a brand (non-individual). The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even link and sidebar color.

The data can be downloaded from Kaggle.

# The Data Description

The dataset contains the following fields:

  • _unit_id: a unique id for the user
  • _golden: whether the user was included in the gold standard for the model; TRUE or FALSE
  • _unit_state: state of the observation; one of finalized (for contributor-judged) or golden (for gold standard observations)
  • _trusted_judgments: number of trusted judgments (int); always 3 for non-golden, and what may be a unique id for gold standard observations
  • _last_judgment_at: date and time of last contributor judgment; blank for gold standard observations
  • gender: one of male, female, or brand (for non-human profiles)
  • gender:confidence: a float representing confidence in the provided gender
  • profile_yn: "no" here seems to mean that the profile was meant to be part of the dataset but was not available when contributors went to judge it
  • profile_yn:confidence: confidence in the existence/non-existence of the profile
  • created: date and time when the profile was created
  • description: the user's profile description
  • fav_number: number of tweets the user has favorited
  • gender_gold: if the profile is golden, what is the gender?
  • link_color: the link color on the profile, as a hex value
  • name: the user's name
  • profile_yn_gold: whether the profile y/n value is golden
  • profileimage: a link to the profile image
  • retweet_count: number of times the user has retweeted (or possibly, been retweeted)
  • sidebar_color: color of the profile sidebar, as a hex value
  • text: text of a random one of the user's tweets
  • tweet_coord: if the user has location turned on, the coordinates as a string with the format "[latitude, longitude]"
  • tweet_count: number of tweets that the user has posted
  • tweet_created: when the random tweet (in the text column) was created
  • tweet_id: the tweet id of the random tweet
  • tweet_location: location of the tweet; seems to not be particularly normalized
  • user_timezone: the timezone of the user
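
Every field arrives as a string in the CSV, so a few of them need converting before use. The following is a minimal sketch of that step, using only the Python standard library. The helper names (`parse_row`, `read_rows`), the 0.9 confidence cutoff, and the two sample rows are made up for illustration and do not come from the dataset; only the column names and formats follow the field list above.

```python
import csv
import io
import json


def parse_row(row):
    """Convert a few string fields of one CSV row to typed Python values."""
    coord = row.get("tweet_coord", "").strip()
    confidence = row.get("gender:confidence", "").strip()
    return {
        "unit_id": row["_unit_id"],
        # _golden is stored as the text TRUE or FALSE
        "golden": row["_golden"].strip().upper() == "TRUE",
        "gender": row["gender"] or None,
        "gender_confidence": float(confidence) if confidence else None,
        # "[latitude, longitude]" is valid JSON, so json.loads can parse it
        "tweet_coord": tuple(json.loads(coord)) if coord else None,
    }


def read_rows(handle):
    """Read all rows from an open CSV file object."""
    return [parse_row(r) for r in csv.DictReader(handle)]


# Hypothetical sample rows (not real data), with only a subset of the columns.
SAMPLE = '''_unit_id,_golden,gender,gender:confidence,tweet_coord
1,FALSE,male,1.0,
2,TRUE,brand,0.6625,"[40.5, -74.25]"
'''

rows = read_rows(io.StringIO(SAMPLE))

# Keep only rows whose gender label has high confidence (cutoff is an assumption).
confident = [
    r for r in rows
    if r["gender_confidence"] is not None and r["gender_confidence"] >= 0.9
]
```

To run this on the downloaded file, pass an open file object to `read_rows` instead of the in-memory sample. The file name and its text encoding depend on the download and are not stated here.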
