anvaka / gazer

GitHub analysis and discovery

Home Page: http://www.yasiv.com/github/#/

License: Other

JavaScript 60.71% HTML 16.78% SCSS 22.51%

gazer's Introduction

gazer

This project analyzes the follower base of a GitHub repository and suggests related projects. It uses the number of stars two projects share to calculate a similarity index between them.

Try it yourself

A hosted version of the app is available here: http://www.yasiv.com/github/#/ It already knows about 15,162 popular projects. If your project had more than 200 stars on Nov 30th, 2014, you will most likely get suggestions immediately. Otherwise the site will build similarities in real time. Make sure to sort by "Similarity coefficient" when the application finishes gathering information.

The offline index is produced by ghindex.

Hows and Whys

This is something of an experiment driven by my own curiosity. I wanted to find a mobile UI library for the web. After googling around I found a library, but I wanted to see more related projects. GitHub did not provide this feature, so I developed a simple metric to calculate the similarity of two projects.

// Metric 1: Similarity measure of two projects A and B.
similarity = 2 * sharedStarsCount(A, B)/(numberOfStars(A) + numberOfStars(B));

While this is a very naive formula, in practice it gives interesting results. For example, here are some of the top related projects for my graph drawing library vivagraph.js (650 stars):

strathausen/dracula - JavaScript browser based layout and representation of connected graphs. (274 stars)
jacomyal/sigma.js - an open-source lightweight JavaScript graph drawing library (1,395 stars)
samizdatco/arbor - a graph visualization library using web workers and jQuery (1,221 stars)
dhotson/springy - A force directed graph layout algorithm in JavaScript (639 stars)
uskudnik/GraphGL - A network visualization library (70 stars)
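
Metric 1 has the same shape as the Sørensen-Dice coefficient applied to two sets of stargazers. Here is a minimal sketch of it, assuming each project's stargazers are already available as an array of logins. The helper names and the input shape are illustrative and are not taken from gazer's source.

```javascript
// Hypothetical helper: counts users who starred both projects.
// Each argument is an array of stargazer logins (assumed input shape).
function sharedStarsCount(stargazersA, stargazersB) {
  const setA = new Set(stargazersA);
  let shared = 0;
  for (const login of new Set(stargazersB)) {
    if (setA.has(login)) shared += 1;
  }
  return shared;
}

// Metric 1: 2 * shared / (stars(A) + stars(B)).
function similarity(stargazersA, stargazersB) {
  const total = new Set(stargazersA).size + new Set(stargazersB).size;
  if (total === 0) return 0; // neither project has stars: treat as unrelated
  return 2 * sharedStarsCount(stargazersA, stargazersB) / total;
}

const a = ['alice', 'bob', 'carol', 'dave'];
const b = ['carol', 'dave', 'erin'];
console.log(similarity(a, b)); // 2 * 2 / (4 + 3)
```

The value ranges from 0 (no shared stargazers) to 1 (identical stargazer sets).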

For popular projects, with more than 2-3k stars, metric [1] can be polluted by other popular projects (like Backbone or Bootstrap): we developers all tend to like beautiful code. Surprisingly, the amount of "popular noise" can be significantly reduced by analyzing a limited random subset of stargazers. Metric [1] can be rewritten as

// Metric 2: Similarity measure of two popular projects A and B
weight = randomSubsetSize/numberOfStars(A);
similarity = 2 * sharedStarsCount(A, B)/(weight * (randomSubsetSize + numberOfStars(B)));

Here is an example analysis of angular.js (11K followers) on a random subset of 500 followers:

angular-ui/angular-ui - AngularUI - The companion suite for AngularJS (2,119 stars)
angular/angular-seed - Seed project for angular apps. (1,998 stars)
jmcunningham/AngularJS-Learning - A bunch of links to blog posts, articles, videos, etc for learning AngularJS (2,465 stars)
angular-ui/bootstrap - Native AngularJS (Angular) directives for Twitter's Bootstrap. Small footprint (5kB gzipped!), no 3rd party JS dependencies (jQuery, bootstrap JS) required! (1,320 stars)
mgcrea/angular-strap - Bootstrap directives for Angular (1,177 stars)
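
Here is a rough sketch of how metric 2 could be computed. It assumes the shared count is taken over the sampled stargazers only, and it picks the sample with a partial Fisher-Yates shuffle. The function names and the sampling method are illustrative choices, not necessarily what gazer does internally.

```javascript
// Picks `size` random items without replacement (partial Fisher-Yates shuffle).
// `random` is injectable so a run can be made repeatable; defaults to Math.random.
function randomSubset(items, size, random = Math.random) {
  const pool = items.slice();
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Metric 2 as written above. `sharedInSubset` is the number of sampled
// stargazers of A who also starred B (assumption: counted on the sample only).
function sampledSimilarity(sharedInSubset, subsetSize, starsA, starsB) {
  if (subsetSize === 0 || starsA === 0) return 0; // nothing sampled
  const weight = subsetSize / starsA;
  return 2 * sharedInSubset / (weight * (subsetSize + starsB));
}

const stargazersOfA = Array.from({ length: 1000 }, (_, i) => 'user' + i);
const subset = randomSubset(stargazersOfA, 500);
console.log(subset.length); // 500
```

When the subset covers every stargazer of A, `weight` is 1 and the expression reduces to metric 1.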

Caveats

GitHub does not have a bulk API, which makes processing popular projects extremely time consuming. This could be mitigated by serving precomputed suggestions. If suggestions are not available, the time can be reduced further by limiting the number of stars to analyze (see metric [2]).
The number of requests to the GitHub API is rate limited (60 per hour). Sign in to the application with OAuth to increase the rate limit to 5,000 requests per hour.
Analyzing a randomized subset may produce a different ranking and pick different projects as the best match. But you will notice that the same projects keep being picked across multiple runs of the algorithm. Pay attention to those.
The algorithm will not work for projects with a small number of stars. I'm still not sure what the lower bound is (100 stars?). For projects with 500+ stars the results are quite often interesting.
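
To get a feel for why the rate limit matters, here is a back-of-the-envelope estimate. It assumes stargazer lists are fetched 100 per request and that each analyzed stargazer costs at least one more request to list what they starred. Both the request model and the function names are simplifying assumptions, not a description of gazer's exact traffic.

```javascript
// Assumed page size for listing stargazers (100 per request).
const PER_PAGE = 100;

// Rough lower bound on requests needed to analyze one repository.
// `analyzedStargazers` defaults to all of them; pass a smaller number
// to model the random-subset approach of metric 2.
function estimateRequests(stars, analyzedStargazers = stars) {
  const listPages = Math.ceil(stars / PER_PAGE);
  return listPages + analyzedStargazers;
}

// Hours needed to spend `requests` at a given hourly rate limit.
function hoursAtLimit(requests, requestsPerHour) {
  return requests / requestsPerHour;
}

const full = estimateRequests(11000);         // 110 pages + 11000 users = 11110
const sampled = estimateRequests(11000, 500); // 110 pages + 500 users = 610
console.log(hoursAtLimit(full, 5000));    // a bit over 2 hours when signed in
console.log(hoursAtLimit(sampled, 5000)); // a few minutes when signed in
console.log(hoursAtLimit(sampled, 60));   // roughly 10 hours anonymously
```

Under these assumptions, sampling 500 followers of an 11K-star project cuts the request count by more than an order of magnitude.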

Local build

git clone https://github.com/anvaka/gazer.git
cd gazer
npm install
bower install
grunt server

What do you think?

I would love to hear your feedback!

If you know how to make the distance calculation better, I'm very open to incorporating your metrics.

This is my first angular app, so I'm still learning.

If you work at GitHub - I would love to see this feature implemented by you guys :)

Do not hesitate to open an issue or submit a pull request :).

gazer's Issues

GitHub API changes

Hello!

First of all, thank you for this tool! It is a great way to discover projects.

Sadly, it seems GitHub changed their API, and the OAuth endpoint no longer works; my browser shows a 302 for the request on this URL:
https://github.com/login/oauth/authorize

This means the rate limit is low (60 every... hour?).

Trouble logging into GitHub and running searches on a local machine

so i've made it running on ubuntu 18.04 with the following steps:

install npm, bower, grunt, fix some issues. might wanna look at:
  https://stackoverflow.com/questions/12369390/bower-command-not-found
  https://stackoverflow.com/questions/55921442/how-to-fix-referenceerror-primordials-is-not-defined-in-node-js
install ruby-full using apt, then sudo gem install compass
  note this compass is not mongodb's compass!
at this point grunt server should be running, yet another error occurs, might wanna look at:
  https://stackoverflow.com/questions/55763428/react-native-error-enospc-system-limit-for-number-of-file-watchers-reached
after all those should see page same as this one:
  http://www.yasiv.com/github/#/
at this point should be able to connect to own github account otherwise check internet restrictions

yet after an initial successful run, i wasn't able to run any searches or log into github. i was able to get redirected to github's login page, but it doesn't show my account after i successfully log in.

i am using a socks5 proxy and firefox. so that might be why. but still i wonder if there's a way to work with the proxy server?

thanks in advance.

Option to limit followers analyzed

When processing N out of M followers, I see:

Processing indyfromoz: 300/1200
Processing jeancarlozapata: 100/700
Processing mindinpanic: 200/300
Processing ramiy

The X/Y number at the end is something I think should also be configurable because it appears to have an impact on the analysis time too. Perhaps implement and see whether it has an impact on result quality? :)

Make call to action more clear on homepage

When I first landed on http://www.yasiv.com/github/#/ after logging in, it wasn't entirely clear what I should do. I saw the same video and search/input box at the top and there wasn't much guidance in the UI.

Because a logged in user is more likely to want to look at their own repositories, is it worth using the GitHub API to pull this info down and display in this view as a list? Selecting can then begin the analysis.

403 Forbidden on AWS

It seems that when looking at a repository that was not crawled before, the request to AWS returns a 403 Forbidden, for example:
http://s3.amazonaws.com/github_yasiv/out/tracboat/tracboat.json

Also, the page shows the message "Sorry, I couldn't find this repository on GitHub.com. Make sure it exists. ". I believe it might be related.
The GitHub "stargazers" request still succeeds:
https://api.github.com/repos/tracboat/tracboat/stargazers?per_page=100&page=1&callback=angular.callbacks._1&access_token=[...]

Better sorting method

Both "Shared # of stars" and "Similarity coefficient" are useful ways to sort, but they also each have problems. The first is heavily weighted toward popular repos while the second is heavily weighted toward obscure repos.

Example query:
http://www.yasiv.com/github/#/costars?q=timber%2Ftimber

Top result by shared stars:
url: https://github.com/zeit/next.js
stars: 39,617
shared_stars: 3
similarity coefficient: 0.00007572506752 (I assume this is shared_stars / total_stars)

Top result by similarity coefficient:
url: https://github.com/dylanvorster/exia-wordpress
stars: 1
shared_stars: 1
similarity coefficient: 1

Maybe something as simple as score = Math.log(stars) + similarity_coefficient * 10 would give a good balance?

zeit/next.js would have a score of ~10.6 (Math.log(39617) + 0.00007572506752 * 10)
dylanvorster/exia-wordpress would have a score of 10 (Math.log(1) + 1 * 10)
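
A quick sketch of the score proposed above, applied to the two results from the example query. The field names `stars` and `similarity` are placeholders for illustration, not gazer's actual data model.

```javascript
// Proposed combined score: natural log of the star count plus ten times
// the similarity coefficient. Note that Math.log(0) is -Infinity, so a
// repository with zero stars would need special handling.
function combinedScore(repo) {
  return Math.log(repo.stars) + repo.similarity * 10;
}

const results = [
  { name: 'dylanvorster/exia-wordpress', stars: 1, similarity: 1 },
  { name: 'zeit/next.js', stars: 39617, similarity: 0.00007572506752 },
];

// Sort a copy in descending score order.
const ranked = results.slice().sort((x, y) => combinedScore(y) - combinedScore(x));
console.log(ranked.map(r => r.name)); // zeit/next.js first, by a small margin
```

With this weighting the two results end up close together, which is the balance the proposal is after; the multiplier 10 is the knob to tune if obscure repositories should rank higher or lower.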

Clicking a username takes you to the incorrect profile

Currently clicking on my profile name takes me to http://www.yasiv.com/anvaka :)

Limit follower gathering option

In the below screenshot a lot of time is spent in the gathering followers step. What is actually happening in this process? Are we making bulk requests for just a list of followers from the API or are we fetching individual records for each of those followers?

In either case I would probably also enforce an option to limit this. Again maybe down to 300-500. It will affect the results, but should speed up overall analysis time.