Peyman Mortazavi
Interestingly, our data has some issues. I wanted to find the ratio, but the only value we have for likes is null:
db.reddit.distinct("likes") returns [ null ]
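One way to confirm that likes never holds a real value is to count the documents where it is non-null. In the mongo shell that could be db.reddit.countDocuments({ likes: { $ne: null } }). Below is a minimal plain-JavaScript sketch of the same check on a few hypothetical in-memory documents, so it runs without a database. distinctValues and countNonNull are illustrative helpers, not MongoDB functions.

```javascript
// Plain-JavaScript sketch of the check behind db.reddit.distinct("likes").
// The sample documents are hypothetical; they only mimic the situation we saw.

// Collect the distinct values of `field`; this sketch treats a missing field as null.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    seen.add(doc[field] === undefined ? null : doc[field]);
  }
  return [...seen];
}

// Count documents whose `field` holds a real value (neither null nor missing),
// which is what db.reddit.countDocuments({ likes: { $ne: null } }) counts in the mongo shell.
function countNonNull(docs, field) {
  return docs.filter((doc) => doc[field] !== undefined && doc[field] !== null).length;
}

const sampleComments = [
  { body: "first", ups: 5, likes: null },
  { body: "second", ups: 2, likes: null },
  { body: "third", ups: 9 }, // likes missing entirely
];

console.log(distinctValues(sampleComments, "likes")); // [ null ]
console.log(countNonNull(sampleComments, "likes")); // 0
```

If the count on the real collection is 0, no ratio involving likes can be computed from this field.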
You could find interesting social facts, such as the most downvoted posts, which show what people in society, or at least on Reddit, don't like.
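As a rough sketch of finding the most downvoted posts, assume each document has a numeric downs field (an assumption; the real data may store votes differently, for example only as a net score). The mongo shell query could then be db.reddit.find().sort({ downs: -1 }).limit(10). The plain-JavaScript version below shows the same ranking on hypothetical sample posts; topDownvoted is an illustrative helper.

```javascript
// Sketch: rank posts by downvotes, most downvoted first.
// ASSUMPTION: each document has a numeric `downs` field.
// Rough mongo shell equivalent: db.reddit.find().sort({ downs: -1 }).limit(n)
function topDownvoted(docs, n) {
  return docs
    .filter((doc) => typeof doc.downs === "number") // skip documents without a vote count
    .sort((a, b) => b.downs - a.downs) // filter() returned a copy, so the input is not reordered
    .slice(0, n);
}

const samplePosts = [
  { title: "A", downs: 3 },
  { title: "B", downs: 42 },
  { title: "C", downs: 17 },
];

console.log(topDownvoted(samplePosts, 2).map((p) => p.title)); // [ 'B', 'C' ]
```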
You could also see what most topics are about and what the community likes and dislikes; for example, whether the majority of people on Reddit lean towards a particular political party.
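A first step towards seeing which topics dominate could be counting posts per subreddit. This sketch assumes each document has a subreddit field (not confirmed for our data) and uses hypothetical sample documents; countByTopic is an illustrative helper that mirrors a $group plus $sort pipeline.

```javascript
// Sketch: count posts per subreddit to see which topics dominate.
// ASSUMPTION: documents carry a `subreddit` field naming the topic.
// Rough mongo shell equivalent:
//   db.reddit.aggregate([{ $group: { _id: "$subreddit", count: { $sum: 1 } } },
//                        { $sort: { count: -1 } }])
function countByTopic(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.subreddit, (counts.get(doc.subreddit) || 0) + 1);
  }
  // Largest topics first, mirroring the $sort stage.
  return [...counts.entries()]
    .map(([topic, count]) => ({ _id: topic, count }))
    .sort((a, b) => b.count - a.count);
}

const sampleTopics = [
  { subreddit: "politics" },
  { subreddit: "gaming" },
  { subreddit: "politics" },
];

console.log(countByTopic(sampleTopics));
```

Counting posts only shows how much a topic is discussed; judging whether people lean towards a party would also need the votes or the text of the posts.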
It will affect our analysis dramatically. Keeping only the people who are good writers on Reddit and often attract many upvotes will remove many posts from our data.
Yes, very much. If we only study posts, or people whose posts are upvoted often, our results and conclusions will be biased.
Our results will be biased towards popularity.
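The popularity bias can be made concrete with a small made-up example: the mean score of all posts versus the mean after keeping only posts above a cutoff. The scores and the threshold of 10 below are hypothetical illustration values, not taken from our data.

```javascript
// Sketch: how keeping only highly upvoted posts skews a statistic.
// The scores and the threshold are made-up illustration values.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const allScores = [1, 2, 2, 3, 5, 8, 40, 120];
const threshold = 10; // hypothetical "popular post" cutoff

const popularOnly = allScores.filter((s) => s >= threshold);

console.log(mean(allScores)); // 22.625
console.log(mean(popularOnly)); // 80
```

Dropping the six low-scoring posts more than triples the average, so any conclusion drawn from the filtered set describes popular posts, not Reddit as a whole.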
Private subreddits are filtered out, and only popular subreddits are included.
To analyze both the real data and the modified data, and see whether the results are biased in a specific direction.
db.precipitation.aggregate([{"$match":{"date":{$regex:/20100425/}, "station.name":"MADISON DANE CO REGIONAL AIRPORT WI US"}},{"$group":{"_id":null,"total":{"$sum":"$hpcp"}}}])
ANSWER: 62
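To show what the $match and $group stages above compute, here is a plain-JavaScript sketch over in-memory data. The field names (date, station.name, hpcp) come from the query; the sample readings and their date format are hypothetical. Like MongoDB's $sum, the sketch skips non-numeric hpcp values. totalPrecipitation is an illustrative helper.

```javascript
// Sketch of the $match + $group/$sum pipeline on in-memory readings.
// Sample readings (and the "YYYYMMDD HH:MM" date format) are hypothetical.
function totalPrecipitation(readings, dayPattern, stationName) {
  return readings
    .filter((r) => dayPattern.test(r.date) && r.station.name === stationName) // $match
    .reduce((total, r) => (typeof r.hpcp === "number" ? total + r.hpcp : total), 0); // $sum of "$hpcp"
}

const MADISON = "MADISON DANE CO REGIONAL AIRPORT WI US";
const sampleReadings = [
  { date: "20100425 01:00", station: { name: MADISON }, hpcp: 10 },
  { date: "20100425 02:00", station: { name: MADISON }, hpcp: 5 },
  { date: "20100426 01:00", station: { name: MADISON }, hpcp: 7 }, // other day, excluded
];

console.log(totalPrecipitation(sampleReadings, /20100425/, MADISON)); // 15
```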
db.normal.aggregate([ {$match: {"DATE": {$regex: /20100425/}, "STATION_NAME": {$regex: /LAS VEGAS/}}}, {$group: {"_id":null, avg: { $avg:"$HLY-WIND-AVGSPD"}}} ])
ANSWER: 110.08333333333333
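The same idea applies to the $avg query: a regex match on DATE and STATION_NAME, then the mean of HLY-WIND-AVGSPD. This sketch uses hypothetical rows and station names. Like MongoDB's $avg, it ignores non-numeric values and returns null when none remain. averageWindSpeed is an illustrative helper.

```javascript
// Sketch of the $match + $group/$avg pipeline on in-memory rows.
// Field names come from the query above; rows and station names are hypothetical.
function averageWindSpeed(rows, dayPattern, stationPattern) {
  const speeds = rows
    .filter((r) => dayPattern.test(r.DATE) && stationPattern.test(r.STATION_NAME)) // $match
    .map((r) => r["HLY-WIND-AVGSPD"])
    .filter((v) => typeof v === "number"); // $avg skips non-numeric values
  if (speeds.length === 0) return null; // $avg of no numeric values is null
  return speeds.reduce((sum, v) => sum + v, 0) / speeds.length;
}

const sampleNormals = [
  { DATE: "20100425 01:00", STATION_NAME: "LAS VEGAS EXAMPLE STATION", "HLY-WIND-AVGSPD": 100 },
  { DATE: "20100425 02:00", STATION_NAME: "LAS VEGAS EXAMPLE STATION", "HLY-WIND-AVGSPD": 120 },
  { DATE: "20100426 01:00", STATION_NAME: "LAS VEGAS EXAMPLE STATION", "HLY-WIND-AVGSPD": 500 }, // other day
  { DATE: "20100425 01:00", STATION_NAME: "RENO EXAMPLE STATION", "HLY-WIND-AVGSPD": 90 }, // other station
];

console.log(averageWindSpeed(sampleNormals, /20100425/, /LAS VEGAS/)); // 110
```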
db.businesses.aggregate([{ $match: { city: "Madison", state: "WI" } }, { $group: { _id: null, total: { $sum: "$review_count" } } }, { $sort: { total: -1 } } ])
ANSWER: 34410
db.businesses.aggregate([{ $match: { city: "Las Vegas", state: "NV" } }, { $group: { _id: null, total: { $sum: "$review_count" } } }, { $sort: { total: -1 } } ])
ANSWER: 577509
db.businesses.aggregate([{ $match: { city: "Phoenix", state: "AZ" } }, { $group: { _id: null, total: { $sum: "$review_count" } } }, { $sort: { total: -1 } } ])
ANSWER: 200089
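Each of the three queries groups with _id: null, so it returns a single document and its $sort stage has no effect. One possible alternative, sketched here as an assumption rather than the required answer, is a single pipeline that groups by city and state, where sorting by total becomes meaningful. The plain-JavaScript version runs that logic on hypothetical sample businesses; reviewTotalsByCity is an illustrative helper.

```javascript
// Sketch: one pipeline for all three cities, grouped by city and state.
// Rough mongo shell equivalent (not run against our collection):
//   db.businesses.aggregate([
//     { $match: { $or: [ { city: "Madison", state: "WI" },
//                        { city: "Las Vegas", state: "NV" },
//                        { city: "Phoenix", state: "AZ" } ] } },
//     { $group: { _id: { city: "$city", state: "$state" }, total: { $sum: "$review_count" } } },
//     { $sort: { total: -1 } }
//   ])
const CITIES = [
  { city: "Madison", state: "WI" },
  { city: "Las Vegas", state: "NV" },
  { city: "Phoenix", state: "AZ" },
];

function reviewTotalsByCity(businesses, cities) {
  const totals = new Map();
  for (const b of businesses) {
    // $match: keep only the listed city/state pairs
    if (!cities.some((c) => c.city === b.city && c.state === b.state)) continue;
    const key = `${b.city}, ${b.state}`;
    totals.set(key, (totals.get(key) || 0) + b.review_count); // $group + $sum
  }
  return [...totals.entries()]
    .map(([key, total]) => ({ _id: key, total }))
    .sort((a, b) => b.total - a.total); // $sort: { total: -1 }
}

const sampleBusinesses = [
  { city: "Phoenix", state: "AZ", review_count: 12 },
  { city: "Madison", state: "WI", review_count: 3 },
  { city: "Las Vegas", state: "NV", review_count: 40 },
  { city: "Phoenix", state: "AZ", review_count: 8 },
  { city: "Madison", state: "AL", review_count: 99 }, // same city name, other state: excluded
];

console.log(reviewTotalsByCity(sampleBusinesses, CITIES));
```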