
subredditsashashtags's Introduction

Analyzing the phenomenon of 'subreddits as hashtags': Reddit users leaving comments that consist only of a subreddit name, as a form of commentary. Subreddits commonly used for this purpose include /r/nocontext, /r/thathappened, and /r/titlegore.

Files

  • top_hts_comms.csv: The 15k most frequently hashtagged subreddits, with the number of times each has been hashtagged ('tot') and the total number of comments posted in the actual subreddit, as a measure of its level of activity.
  • lt_10k_comments_top_hts.csv: The most commonly hashtagged low-activity subreddits (those with fewer than 10k total comments).
  • long_subs.csv: Hashtagged subreddits sorted by length of subreddit name. long_subs_gt_2.csv is the same thing, restricted to subreddits that have been hashtagged more than a couple of times.
  • 15k_unique_hts.csv: A random sample of 15,000 subreddit names that have been hashtagged only once (case sensitive). Most aren't actual subreddits.
  • context.csv: hashtagged subreddits with 'context' in their name
  • theydidthe.csv: hashtagged subreddit names that are variants on the 'they did the [monster] math' snowclone.

Numbers

  • 380k distinct hashtags (case sensitive). 325k after lowercasing.
    • 280k appear only once, 35k twice, and ~40k between 3 and 10 times.
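
The drop from 380k to 325k comes from merging names that differ only in case. Below is a minimal sketch of that merge and of the frequency breakdown, assuming the case-sensitive counts are available as a plain dict. The function names and the sample counts are made up for illustration and are not taken from the repo.

```python
from collections import Counter


def merge_case(counts):
    """Merge case-sensitive hashtag counts by lowercasing the names."""
    merged = Counter()
    for name, n in counts.items():
        merged[name.lower()] += n
    return merged


def frequency_buckets(counts):
    """Count how many distinct hashtags fall into each frequency range."""
    buckets = Counter()
    for n in counts.values():
        if n == 1:
            buckets["1"] += 1
        elif n == 2:
            buckets["2"] += 1
        elif n <= 10:
            buckets["3-10"] += 1
        else:
            buckets[">10"] += 1
    return buckets


# Hypothetical sample counts, for illustration only.
sample = {
    "/r/NoContext": 3,
    "/r/nocontext": 40,
    "/r/thathappened": 2,
    "/r/TitleGore": 1,
}
merged = merge_case(sample)
buckets = frequency_buckets(merged)
```

Here the two spellings of /r/nocontext collapse into one entry, so four case-sensitive names become three after lowercasing.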

Methodology

All of this data was gathered from the Reddit comments dataset hosted on Google BigQuery.

  • The RE used was '^\s*/[rR]/[[:alnum:]]*\s*$'
    • I probably should have made the leading and trailing forward slashes optional (but a leading slash with no trailing slash is by far the most common format)
    • This also wouldn't have caught cases where people linkified the subreddit name in Markdown (though I don't think this is that common)
    • Running the query that builds the hashtags table, which applies this RE to every comment body, eats through about 0.8 TB. That is about $4 worth, or most of the free limit for one month.
  • Case is preserved in the initial intermediate table, but most of the files in this repo are post-lowercasing and merging
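
As a rough illustration of what the RE accepts and rejects, here is a Python sketch. Python's re module has no POSIX character classes, so [[:alnum:]] is spelled out as [A-Za-z0-9]. The helper names are hypothetical, and the actual query ran in BigQuery rather than Python.

```python
import re

# Same pattern as the RE above, with [[:alnum:]] written as an explicit
# ASCII class because Python's re does not support POSIX classes.
HASHTAG_RE = re.compile(r"^\s*/[rR]/[A-Za-z0-9]*\s*$")


def is_hashtag_comment(body):
    """True if the whole comment body is just a /r/name token."""
    return HASHTAG_RE.match(body) is not None


def normalize(body):
    """Return the lowercased hashtag, or None if the body is not one."""
    if not is_hashtag_comment(body):
        return None
    return body.strip().lower()


examples = [
    "/r/nocontext",            # leading slash, no trailing slash: matches
    "  /r/ThatHappened \n",    # surrounding whitespace is allowed
    "r/nocontext",             # no leading slash: missed
    "/r/nocontext/",           # trailing slash: missed
    "[/r/nocontext](https://www.reddit.com/r/nocontext)",  # linkified: missed
    "lol /r/nocontext",        # extra text: not a hashtag-only comment
]
results = [is_hashtag_comment(e) for e in examples]
```

Only the first two examples match, which lines up with the limitations listed above.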

Contributors

colinmorris
