snoww / twitchoverlap
stats.roki.sh, polls twitch every 30 minutes to calculate intersection of all channels above 1,000 viewers
Home Page: https://stats.roki.sh
License: MIT License
Hi,
I built something similar a while back at https://channel-similarity.johnpyp.com/, but it didn't generate much interest, presumably because (among other things) it lacked visualization. Visualization is a core part of this project: it's well executed and makes the data cool and interesting to look at.
However, I think the similarity measure I used does a better job of capturing similarity between channel communities, and it could be implemented here without too much trouble. The math is outlined at https://channel-similarity.johnpyp.com/details, but it essentially boils down to a couple of differences from what you currently do:
The weight of a viewer should be normalized by the number of channels they're in. The reasoning is that a user's relative weight should reflect what share of their viewing is dedicated to that channel.
For example: if channel A and channel B share a viewer who ONLY watches these two channels on the entire site, that should count for more than if channel A and channel B happen to share Nightbot, which is present in a large chunk of channels on the site. Currently both contribute the same weight to similarity.
(This should be fairly simple to implement, doesn't require any scraping changes, and can also be used for the realtime channel page view.)
The weight of a viewer should be determined not only by whether they happened to be in that channel during the collection period, but also by the amount of time they spent there (i.e., the number of scrapes they appeared in).
An example of the shortcoming of the current approach: if channel A happens to host channel B during the collection period, then all those chatters appearing momentarily in channel B's chat currently contribute as much weight to similarity as chatters who spend long periods of time in both channels.
(This could require scraping changes to store these values.)
It'd be nice to be able to view the ranking of similar channels over time (say, the same period as the Atlas) from within the channel page/view.
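To make the first two suggestions above concrete, here is a minimal sketch in Python. It is not the formula from the linked details page and not how stats.roki.sh works; the function names, the input shapes, and the choice of `min(...) / total` for the time-based weight are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def channel_count_weighted_overlap(chatters_by_channel):
    """Hypothetical helper: each shared viewer counts 1 / (channels they appear in).

    chatters_by_channel maps a channel name to the set of usernames seen there.
    """
    channels_per_viewer = defaultdict(int)
    for viewers in chatters_by_channel.values():
        for viewer in viewers:
            channels_per_viewer[viewer] += 1

    scores = {}
    for a, b in combinations(sorted(chatters_by_channel), 2):
        shared = chatters_by_channel[a] & chatters_by_channel[b]
        # A viewer present in many channels (e.g. a bot) adds very little.
        scores[(a, b)] = sum(1.0 / channels_per_viewer[v] for v in shared)
    return scores


def time_weighted_overlap(scrape_counts):
    """Hypothetical helper: weight shared viewers by time spent, not mere presence.

    scrape_counts maps a channel name to {username: number of scrapes seen in}.
    Assumed weight per shared viewer: the smaller of their two scrape counts,
    divided by their total scrape count across all channels.
    """
    total_scrapes = defaultdict(int)
    for counts in scrape_counts.values():
        for viewer, n in counts.items():
            total_scrapes[viewer] += n

    scores = {}
    for a, b in combinations(sorted(scrape_counts), 2):
        shared = scrape_counts[a].keys() & scrape_counts[b].keys()
        scores[(a, b)] = sum(
            min(scrape_counts[a][v], scrape_counts[b][v]) / total_scrapes[v]
            for v in shared
        )
    return scores
```

With the first function, a viewer found only in channels A and B contributes 1/2 to the A-B score, while a bot found in 300 channels contributes 1/300. With the second, a chatter who shows up in channel B for a single scrape after a host contributes far less than one who appears in both channels scrape after scrape.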
The system you have built is impressive. I just wish it used Venn diagrams for representation. E.g., if I select 3 streamers (Bugha, Clix and BuckeFPS), I could see the overlap among all three, the unique viewers who watch only one of them, the viewers who watch exactly two of the three, and a universal set of users who don't watch any of the three (including the major streamers that fall in that set).
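The regions such a Venn diagram would need can be computed directly from per-channel chatter sets. The sketch below is only an illustration of that request: `venn_regions` and its input shape are hypothetical and not part of the project.

```python
from collections import defaultdict


def venn_regions(chatters_by_channel, universe=None):
    """Hypothetical helper: split viewers into disjoint Venn regions.

    chatters_by_channel maps a channel name to the set of usernames seen there.
    Returns {frozenset of channel names: set of viewers in exactly those channels}.
    If universe (all known viewers) is given, the empty frozenset key holds
    the viewers who are in none of the selected channels.
    """
    names = sorted(chatters_by_channel)
    all_selected = set().union(*chatters_by_channel.values())

    regions = defaultdict(set)
    for viewer in all_selected:
        # The region is identified by exactly which channels contain the viewer.
        key = frozenset(n for n in names if viewer in chatters_by_channel[n])
        regions[key].add(viewer)

    if universe is not None:
        regions[frozenset()] = set(universe) - all_selected
    return dict(regions)
```

For three selected channels this yields up to seven regions (three "only one", three "exactly two", one "all three"), plus the "none of them" region when a universe is supplied. The size of each region is what a diagram would display.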
Maybe this isn't a bug and is just a question on my part, but why are some streamers' names missing from the list of overlapped streamers? Is it because there isn't enough overlap to warrant a place on the list below the graph?
For example, Hirona has ~400 overlapping chatters with RatedEpicz's channel at one point according to the graph.
Yet Hirona is not listed in the probability list below, at any percent. Why is this? Is the list below the graph based on chatting trends over the past month or something, so a single day isn't enough to matter?
A 2nd example is Hasan overlapping with moonmoon. There are 1600 overlapping chatters, but Hasanabi is not listed in the probability list.
This site is awesome, I'm just trying to better understand the data in front of me.
Oh, and a 2nd question: what timezone are the graphs using? Is it my local timezone or a set one?