labhr / octohatrack

🐙👒 Show _all_ the contributors to a GitHub repository.

License: BSD 3-Clause "New" or "Revised" License

Python 99.81% Dockerfile 0.19%

github-api contributions hat-rack

octohatrack's Introduction

🐙👒 - octohatrack

It's easy to see some code contributions on a GitHub repo, but what about everything else?

pip install octohatrack
octohatrack LABHR/octohatrack

Octohatrack takes a GitHub repo name, and returns two lists:

A list of all users defined by GitHub as contributors
A list of all contributors to the project

What is a 'direct contributor'?

On any GitHub repo page, the header at the top of the file listings shows a number of commits, branches, releases and contributors. If you click the 'contributors' link, you get a list of users that contributed code to the master branch of the repo, ordered by the commits and lines of code contributed. This list is limited to the top 100 users.

GitHub has acknowledged 'Community Contributors', those that have contributed code to the master branch of repos that are dependencies of the current repo. The total count of these contributors was visible by hovering over the 'contributors' link on the main repo.

Update 2019-06-13 - GitHub now uses the term Direct and Community contributor.

Update 2020-06 - GitHub removed Community contributor visibility with a UX update.

So, what are 'all contributors', then?

That's everyone who has worked on a GitHub project.

It compiles a complete list of:

the GitHub-defined contributors (not just the top 100), plus
everyone who has
- created an issue,
- opened a pull request,
- commented on an issue,
- replied to a pull request,
- made any in-line comments on code,
- edited the repo wiki
or in any other way interacted with the repo.

It also adds anyone manually added to the CONTRIBUTORS file on a repo (if it exists). See the bottom of CONTRIBUTORS for details on the formatting of this file.

Limitations

GitHub Reactions are not counted. Issue #87
Does not iterate over dependencies (although octohatrack could be run over these independently.)

#LABHR

"Let's All Build a Hat Rack" (#LABHR) is an original concept by Leslie Hawthorn

Installation

pip install octohatrack

octohatrack requires Python 3. Check your pip --version to ensure that it's pointing to a Python 3 installation. If you have both Python 2.7 and Python 3 on your system, you may need to install using:

pip3 install octohatrack

See "Debugging: Python 3 requirement" for more information.

Usage

usage: octohatrack [-h] [--no-cache] [--wait-for-reset] [-v] username/repo

positional arguments:
  username/repo     the name of the repo to parse

optional arguments:
  -h, --help        show this help message and exit
  --no-cache        Disable local caching of API results
  --wait-for-reset  Enable waiting for rate limit reset rather than erroring
  -v, --version     show program's version number and exit

Set the GITHUB_TOKEN environment variable to an authentication token to avoid the unauthenticated rate limit of 60 requests per hour (this allows for deeper searching).
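
As a rough illustration of how a token from the environment can be attached to GitHub API requests, here is a minimal sketch. The helper name `github_headers` is hypothetical and this is not octohatrack's actual code; it only assumes the `Authorization: token ...` header that GitHub accepts.

```python
import os


def github_headers(environ=None):
    """Build request headers, adding auth only when GITHUB_TOKEN is set."""
    environ = os.environ if environ is None else environ
    headers = {"Accept": "application/vnd.github+json"}
    token = environ.get("GITHUB_TOKEN", "").strip()
    if token:
        # Authenticated requests get the higher rate limit.
        headers["Authorization"] = "token " + token
    return headers
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out without touching your real environment.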

Development Usage

For advanced use cases, like development, you have more options than the published version.

Run this repo locally

git clone https://github.com/labhr/octohatrack
cd octohatrack
virtualenv venv
source venv/bin/activate
pip install -e .
python3 -m octohatrack [arguments]

Run octohatrack in a Docker container

git clone https://github.com/labhr/octohatrack
cd octohatrack
docker build -t octohatrack .
docker run -e GITHUB_TOKEN octohatrack [arguments]

Example output

$ octohatrack LABHR/octohatrack

Checking repo exists....
Getting API Contributors...................
Getting Issue and Pull Request Contributors...............................................................................................................................................................................................................................................................................................................................................
Getting File Contributors....
Getting Wiki Contributors...

All Contributors:
Jiagod
Anna Ossowski (ossanna16)
Chandler Song (Chandler-Song)
Christopher Hiller (boneskull)
Cory Benfield (Lukasa)
Davey Shafik (dshafik)
David Beitey (davidjb)
Deb Nicholson (baconandcoconut on twitter)
Deleted user (ghost)
E. Dunham (edunham)
Gawain Lynch (GawainLynch)
Isabel Drost-Fromm (MaineC)
Jan Niggemann (jniggemann)
Joel Nothman (jnothman)
John Vandenberg (jayvdb)
Katie McLaughlin (glasnt)
Kenneth Reitz (kennethreitz)
Kirstie Whitaker (KirstieJane)
Kristian Perkins (kristianperkins)
Laura (alicetragedy)
Leslie Hawthorn (lhawthorn on twitter)
Marc Tamlyn (mjtamlyn)
Mike Sampson (mfs)
Nick Coghlan (ncoghlan)
Opal Symes (software-opal)
Parker Moore (parkr)
Patrick Connolly (patcon)
Rogerio Prado de Jesus (rogeriopradoj)
Russell Keith-Magee (freakboy3742)
Sumana Harihareswara (brainwane)
Sven Dowideit (SvenDowideit)
Tennessee Leeuwenburg (tleeuwenburg)
The Gitter Badger (gitter-badger)
Thomas A Caswell (tacaswell)
Thomas Winwood (ketsuban)
Tim Groeneveld (timgws)
Tobias Kunze (rixx)
Tom Clark (tclark)
Vladislav Mihov (skilldeliver)

Repo: LABHR/octohatrack
GitHub Contributors: 14
All Contributors: 39 👏

Debugging

Python 3 requirement

octohatrack requires Python 3.

This is because there are a number of features that require Python 3, and octohatrack is not --universal. More specifically, there are some system utils that are Python 3 only, and Unicode support in Python 3 is so much easier than in Python 2.

If you are having issues installing and are getting an "octohatrack requires a Python 3 environment" error, check:

python --version
pip --version

If you are running in an environment with both Python 2 and Python 3, you may need to use pip3 to install.

There are two checks, in setup.py and __main__.py, that will end the installation or execution, respectively, if they don't detect a Python 3 environment.

If you are running in a Python 3 environment and it kicks you out, please log an issue, including your python --version, and if you're running in a virtualenv.

Cache

In order to not make duplicate API calls, octohatrack uses caching via requests-cache.

Previous versions used a local cache_file.json.

Any time an external API call is made, it gets saved to a local cache file so that any subsequent calls don't have to burn an API call.

You can disable the cache by using the --no-cache flag.

To reset the cache, remove the cache_file.json or cache.sqlite file.

If you experience ongoing issues with the caching, please log a detailed issue describing what you're seeing.

Rate limiting

Even if you define a GITHUB_TOKEN, you may be rate limited for a popular repository. Using --wait-for-reset will have Octohatrack sleep until GitHub says your token is usable again.

Faster Pull Request/Issue results with Big Query

The slowest function of octohatrack is iterating through all the issues and pull requests of a repo and getting a list of all the issue/pull request openers and all commenters. This part of the data collection is probably the part that will be rate limited.

This section can be faster, but more expensive, with BigQuery.

Following the GH Archive instructions, you can get the list of the events from a repo in under 10 seconds.

⚠️ Check the BigQuery pricing page for more details, but the following query should stay under the free quota (~150GB of the 1TB limit).

SELECT
  actor.login
FROM
  `githubarchive.month.*`
WHERE
  repo.name = "username/repo"
  AND type NOT IN ("WatchEvent", "ForkEvent") -- octohatrack doesn't consider these participation
GROUP BY 
  actor.login
ORDER BY
  LOWER(actor.login) ASC

This data set doesn't understand renaming of users or repos. You may end up with old/dead aliases in your results.

If you have renamed your repo, make the following change:

WHERE
-  repo.name = "username/repo"
+ (repo.name = "username/repo" OR repo.name = "username/oldrepo")

A sample implementation of this method is available in octohatrack_bigquery.py.
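
For repos that have been renamed more than once, the query text can be generated rather than edited by hand. The following is only a sketch: `build_query` and the `REPO_NAME` pattern are hypothetical names, it is not taken from octohatrack_bigquery.py, and it uses `IN (...)` where the example above uses `OR`, which matches the same rows.

```python
import re

# Conservative pattern for "owner/repo" names; an assumption for this sketch.
REPO_NAME = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def build_query(repo_names):
    """Return the GH Archive query for one repo and any of its old names."""
    if not repo_names:
        raise ValueError("at least one repo name is required")
    for name in repo_names:
        # Reject anything that does not look like owner/repo before quoting it.
        if not REPO_NAME.match(name):
            raise ValueError("unexpected repo name: %r" % name)
    quoted = ", ".join('"%s"' % name for name in repo_names)
    return (
        "SELECT actor.login\n"
        "FROM `githubarchive.month.*`\n"
        "WHERE repo.name IN (%s)\n"
        '  AND type NOT IN ("WatchEvent", "ForkEvent")\n'
        "GROUP BY actor.login\n"
        "ORDER BY LOWER(actor.login) ASC" % quoted
    )
```

Calling `build_query(["username/repo", "username/oldrepo"])` gives text you can paste into the BigQuery console.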

Wiki

Because GitHub doesn't have an API endpoint for being able to parse gollum-based repo-wikis, octohatrack defaults to cloning wiki repos locally, and parsing via gitpython.

If there are issues cloning the wiki, or other issues, it shouldn't break an octohatrack run, but if you do encounter issues, please log an issue, and be sure to include platform information (this functionality has been tested on Mac OSX Yosemite and Ubuntu Xenial).
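
To make the wiki step a little more concrete: GitHub serves a repo's wiki as a separate git repository at `<repo>.wiki.git`, so once it is cloned the contributors are just the commit authors. The sketch below is not how octohatrack does it (that goes through gitpython); it assumes you already have the plain-text output of `git log --format=%an`, and both function names are made up for illustration.

```python
def wiki_clone_url(repo_name):
    """Return the clone URL of the wiki repo for an owner/repo name."""
    return "https://github.com/%s.wiki.git" % repo_name


def unique_authors(git_log_output):
    """Deduplicate author names from `git log --format=%an` output, keeping order."""
    seen = []
    for line in git_log_output.splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.append(name)
    return seen
```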

To do

include merge-only contributors

Code of Conduct

Octohatrack operates under a Code of Conduct.

License

Octohatrack is distributed under the MIT license.

This project is not affiliated with GitHub.

octohatrack's Issues

Parse a common form of CONTRIBUTORS for additions

This idea has been bouncing around my head for a bit, so I thought I'd braindump here if anyone had any comments.

Projects sometimes maintain a CONTRIBUTORS file, where people are listed. In the age of (known limited) contribution lists already being aggregated, I thought it might be an idea to add a semi-automated grokking of these kinds of files along with the existing parsing.

My thought is that using a git-commit-like format with git log formatting, there could be a pseudo-standard of human readable and machine parseable contributions. When getting the metadata for a repo, the master version of this file, if it exists, can be added to the listings.

I'm thinking something like this:

# These people are additionally awesome for helping with things

Alanis Morrisett <[email protected]>
Kaywinnet Lee Frye <[email protected]>

# More comments could go anywhere here

For any line that looks like a display name and angle-bracket, these should be added to the non-coding contributions list.

In my use case, I'd add people who didn't commit code to this list. They might not have a GitHub account, so I'd probably fallback to pulling their Gravatar for the display versions.
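
The "display name and angle-bracket" rule described above could be parsed along these lines. This is a sketch only: `parse_contributors` and the pattern are hypothetical, and the addresses in the usage note are placeholders.

```python
import re

# A display name followed by something in angle brackets, e.g. "Name <address>".
CONTRIBUTOR_LINE = re.compile(r"^([^<>]+?)\s*<([^<>]*)>$")


def parse_contributors(text):
    """Return (name, address) pairs, skipping comments and blank lines."""
    people = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = CONTRIBUTOR_LINE.match(line)
        if match:
            people.append((match.group(1), match.group(2)))
    return people
```

For example, a file containing a comment line followed by `Kaywinnet Lee Frye <kaylee@example.com>` yields a single `("Kaywinnet Lee Frye", "kaylee@example.com")` pair; lines that don't fit the pattern are ignored rather than treated as errors.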

Include name as well as username in output

This would help identify the people when many contributors don't have Gravatars, as well as making the text output more usable in release notes.

Updating README.md with a couple of n00b Q & As

Firstly - thank you for making this! My team and I really want to recognise all types of contributions to our project and this seems like a great way to do just that!

I've just installed octohatrack on a Windows machine (successfully) and it took me a little while to figure out a couple of the options so I thought I'd offer to update the README.md with the info I was looking for to make it easier for the next person.

Specifically, the README.md doesn't explain the -g, --generate-html option which meant I was scratching my head for a little bit trying to figure out what I was supposed to do with the .json file that was generated. It wasn't hard to figure out, but I just thought I could add a little note to point out how useful it is!

Then there was one other question: am I right that a simple use case would be to download this locally and then every now and again re-run the code to update your html file? Is there a better/different way that you'd recommend?

Thanks again!

Update README to show all parameters, and sample invocations.

Known issue: Contributor mismatch

(Possible cause: Pull Requests with multiple participants)

I need to dive into this more, but it appears that the statistics for what GitHub reports in the banner of a repo and what the API endpoint is reporting might not be the same, and I think this is possibly because of issues when the participants of a Pull Request differ from the participants in a git branch.

A PR would be attributed to one person, but when there's more than one person working on a branch and it gets merged, there appears to be an inconsistency.

In the wild: https://twitter.com/nibalizer/status/696279727120654336

Just ran octohatrack on voxpupul/puppet-collectd:
Code Contributors: 30
Non Code Contributors: 133

However, https://github.com/voxpupuli/puppet-collectd reports 94 contributors

Full investigation pending

Improve documentation on setup

Reticketed from #87 (comment)

After a little stumbling around, I got this working (I use pipenv)

brew install pipenv # for keeping each app in a virtualenv
brew install pyenv # for installing other versions of python
pipenv --version
>>> pipenv, version 2018.10.9
pyenv --version
>>> pyenv 1.2.7
pipenv run python --version
>>> Python 3.6.5
# Install a 3.x version with pyenv if you don't have it
git clone https://github.com/LABHR/octohatrack
cd octohatrack
pipenv run python setup.py install
pipenv run octohatrack
>>> usage: octohatrack [-h] [--no-cache] [--wait-for-reset] [-v] username/repo
>>> octohatrack: error: the following arguments are required: username/repo
pipenv run octohatrack LABHR/octohatrack
>>> Checking repo exists....Repo does not exist: LABHR/octohatrack
export GITHUB_TOKEN=xxxxxxxxxxxxxxxxx
pipenv run octohatrack LABHR/octohatrack
>>> Checking repo exists....
>>> Getting API Contributors................. etc etc working

cc: @Chandler-Song

local client install instructions not currently working

Client output when attempting to run locally:

amcasari-macbookpro% python3 -m octohatrack LABHR/octohatrack
amcasari-macbookpro% pip install octohatrack
Requirement already satisfied: octohatrack in /Users/amcasari/repos-mine/octohatrack (1.0.0a0)
Requirement already satisfied: requests in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from octohatrack) (2.24.0)
Requirement already satisfied: gitpython in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from octohatrack) (3.1.11)
Requirement already satisfied: requests-cache in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from octohatrack) (0.5.2)
Requirement already satisfied: idna<3,>=2.5 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from requests->octohatrack) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from requests->octohatrack) (1.25.10)
Requirement already satisfied: certifi>=2017.4.17 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from requests->octohatrack) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from requests->octohatrack) (3.0.4)
Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from gitpython->octohatrack) (4.0.5)
Requirement already satisfied: smmap<4,>=3.0.1 in /Users/amcasari/.pyenv/versions/3.8.6/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython->octohatrack) (3.0.4)
WARNING: You are using pip version 20.2.4; however, version 20.3.1 is available.
You should consider upgrading via the '/Users/amcasari/.pyenv/versions/3.8.6/bin/python3.8 -m pip install --upgrade pip' command.
amcasari-macbookpro% octohatrack --help
usage: octohatrack [-h] [--no-cache] [--wait-for-reset] [-v] username/repo

positional arguments:
  username/repo     the name of the repo to parse

optional arguments:
  -h, --help        show this help message and exit
  --no-cache        Disable local caching of API results
  --wait-for-reset  Enable waiting for rate limit reset rather than erroring
  -v, --version     show program's version number and exit
amcasari-macbookpro% python3 -m octohatrack amcasari/octohatrack
amcasari-macbookpro% octohatrack amcasari/octohatrack

Checking repo exists....Repo does not exist: amcasari/octohatrack

Does octohatrack include wiki edits?

I did a quick look and it looks like the project doesn't currently include/count wiki edits as a form of contribution. Did I miss something? Is this planned? If not, I could help implement that.

Investigate using the BigQuery cache instead of live GitHub

https://cloud.google.com/blog/products/gcp/github-on-bigquery-analyze-all-the-open-source-code
https://github.blog/2016-06-29-making-open-source-data-more-available/

Blockers would be:

if signup is required
if data octohatrack needs isn't included.

Make a CHANGELOG

http://keepachangelog.com/

Human readable, important information, etc.

Show also commits, additions and deletions of a contributor in the list

For example, this would be very handy for showing top10 contributors of a repo.

Add tests

Probably would mock github responses, and/or do a self live test.

Handle ratelimiting errors nicely

At the moment, if you hit the rate limit (60/hour untokened, 5000/hour tokened), the app will say the repo doesn't exist, instead of saying you've hit the limit. It should ideally also get the X-RateLimit-Reset header and return the time to try again, possibly with a (in xx minutes) message.
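
A sketch of the "(in xx minutes)" message, assuming the response headers are available as a dict. `X-RateLimit-Reset` is the UTC epoch time in seconds at which the limit resets; the function name and message wording are made up for illustration.

```python
import math


def rate_limit_message(headers, now):
    """Describe when to retry, based on the X-RateLimit-Reset epoch seconds."""
    reset = int(headers["X-RateLimit-Reset"])
    # Round up so the user is never told to retry too early.
    minutes = max(0, math.ceil((reset - now) / 60))
    return "GitHub rate limit reached. Try again in ~%d minutes." % minutes
```

In real use `now` would be `time.time()`; taking it as a parameter keeps the function easy to test.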

Enable run from local

Somewhere in my packaging messes, I have removed the ability to run octohat from the local folder. I need to work out how to do a thing that makes it go ./octohat.py user/repo for local testing.

I think it's a case of making an octohat.py file that does the same as the packaging entrypoint.

Optionally output all code contributors as well

The information is being gathered anyway, so it's no extra work to output this.

Bonus: only the top 100 code contributors are readily accessible from the github interface anyway, so this is super helpful

add octohat generated report to octohat readme

This would be very fancy.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0153' in position 1

This error is reproducible with python 2.7.9 on Debian 8.1:
U+0153 is LATIN SMALL LIGATURE OE: œ

$> octohat -g sebsauvage/MinigalNano
Collecting contributors....
Collecting commentors...........................................................
................................................................................                                                                              
Code contributions: 11
Non-code contributions: 28
jdedba
AnatomicJC
Riduidel
andersk
Niols
cepcasa
qwertygc
bitbybit
ygbillet
thomas30
Denys06
meh-uk
cborne
lanner
fufroma
k1ka
jabalv
10cre
indoushka
arthurlutz
farbrorlarry
nicosomb
Mermouy
fpp-gh
patcou
fredtantini
Knah-Tsaeb
danc
Traceback (most recent call last):
  File "/usr/local/bin/octohat", line 9, in <module>
    load_entry_point('octohat==0.1.3', 'console_scripts', 'octohat')()
  File "/usr/local/lib/python2.7/dist-packages/octohat/__init__.py", line 59, in main
    f.write(display_user_html(user, args))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0153' in position 197: ordinal not in range(128)
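
For context, Python 2 implicitly encodes unicode text as ASCII when writing to a plain file object, which is what trips on œ. One way to avoid this class of error, sketched here for Python 3, is to open the output file with an explicit encoding. `write_html` is a hypothetical helper, not the function from the traceback.

```python
import os
import tempfile


def write_html(path, html):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)


# Usage: a name containing U+0153 round-trips intact.
path = os.path.join(tempfile.mkdtemp(), "contributors.html")
write_html(path, "<li>c\u0153ur</li>")
with open(path, encoding="utf-8") as handle:
    content = handle.read()
```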

pull data from Bugzilla

This is totally a wishlist item.

we have an internal Bugzilla instance, it's even linked into other bug systems (big company)

There is an upcoming REST API to Bugzilla, but in the meantime there's the old XMLRPC one ( https://www.bugzilla.org/docs/4.4/en/html/api/Bugzilla/WebService/Server/XMLRPC.html ).

Through that, it should be possible to search for bugs on a particular product and extract who's filing and commenting on them.

Travis-CI GITHUB_TOKEN environment variable not found

The DEBUG function shows that the GITHUB_TOKEN isn't being seen by the script. I've set this up as a secret environment variable so as not to expose it to the public, but it should still be visible to the script.

The problem is that to get all the contributor information for octohat now takes more than 60 API calls, which means that without the token the self-referencing test cannot run. 😞

Exception when a user has no name

On https://github.com/coala/gh-board, the following all cause an exception:

{'user_name': 'domlobo', 'name': None}
{'user_name': 'greenkeeper[bot]', 'name': None}
{'user_name': 'brentirwin', 'name': None}
{'user_name': 'adjohnson916', 'name': None}
{'user_name': 'domlobo', 'name': None}
{'user_name': 'brentirwin', 'name': None}
{'user_name': 'ANURADHAJHA99', 'name': None}

All Contributors:
Traceback (most recent call last):
  File "/home/jayvdb/.local/bin/octohatrack", line 11, in <module>
    load_entry_point('octohatrack', 'console_scripts', 'octohatrack')()
  File "/home/jayvdb/projects/ghuser/octohatrack/octohatrack/__main__.py", line 53, in main
    display_results(repo_name, contributors, len(api))
  File "/home/jayvdb/projects/ghuser/octohatrack/octohatrack/helpers.py", line 16, in display_results
    for user in sorted(contributors, key=lambda k: k['name'].lower()):
  File "/home/jayvdb/projects/ghuser/octohatrack/octohatrack/helpers.py", line 16, in <lambda>
    for user in sorted(contributors, key=lambda k: k['name'].lower()):
AttributeError: 'NoneType' object has no attribute 'lower'
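
The traceback comes from calling .lower() on a name that is None. A minimal sketch of a None-safe sort key that falls back to the login is below; `sort_key` is a hypothetical name, and this is not necessarily the fix that was applied in helpers.py.

```python
def sort_key(user):
    """Sort by display name, falling back to the login when name is None."""
    return (user.get("name") or user["user_name"]).lower()


contributors = [
    {"user_name": "domlobo", "name": None},
    {"user_name": "glasnt", "name": "Katie McLaughlin"},
    {"user_name": "brentirwin", "name": None},
]
ordered = [u["user_name"] for u in sorted(contributors, key=sort_key)]
```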

gh-pages landing page

I already have glasnt.github.io -> glasnt.com, so if glasnt/octohat gets a gh-pages branch, then it can be all shiny over on glasnt.com/octohat

I just need some shiny CSS and copy to make it look nice.

Unsure if this will turn into the landing point for #34 or not. Probably not, because static. 🤷

Output includeable HTML snippet with names

It'd be great if the HTML output could not have <title> in it and instead be a candidate for directly including into a web page somewhere rather than having to either edit it or have it in a separate file.

Migrate from argparse to click (minor)

It'd be nice to have pretty colours in the CLI without having to re-learn TERM colour codes.

Absolutely minor, and much lower priority than anything else open.

Ensure non-GitHub committers are listed

For people with commits in a repo that do not have an associated GitHub account (e.g. git-based commits made outside of GitHub, particularly for repos imported that predate GitHub), octohatrack should ensure they are listed in the contributors list.

Timeouts / running out of tokens

Hi there,

I'm trying to run this on fchollet/keras. It's an active project. I am using a github token to increase the number of requests, but it keeps running out of tokens anyway. I'm not sure if the results from previous runs are cached (re-runs will help) or not (re-runs don't help).

I could use some advice on how to run this on a bigger project.

Thanks,
-Tennessee

GitHub Reactions API integration

https://developer.github.com/changes/2016-06-07-reactions-api-update/

Still in preview mode, but once it's official, octohatrack should be updated to suit.

Upgrade octohatrack to use GraphQL

I have a proof of concept implementation of octohatrack using GitHub's GraphQL API v4.

proof of concept graphql implementation

As per the limitations, because the API won't allow more than the last 100 results for any item, the results are limited.

But compared to octohatrack 0.6.1 it's just a touch faster.

$ time octohatrack labhr/octohatrack
All Contributors: 31

real	2m47.853s

$ time python octohatrack_graphql.py labhr/octohatrack
# 28 results returned

real	0m3.106s

Just a touch.

The numbers are out because I haven't included the CONTRIBUTORS file contributors, but that's just a case of appending the existing functionality in contributors_file.py.

To Do:

get feature parity
release a beta command-line version to pypi
either:
- migrate functionality to javascript, migrating graphql functionality to web presence
- using command line/python version, make a basic python server with functionality
Advanced:
- using a persistent storage web server, make a badge server (a suggestion by @phildini, having a badge like , instead of the shields.io version )

octohat as a service

*dun dun dun*

Ensure co-authors are supported

https://twitter.com/shiftkey/status/958081366469492736

https://github.com/blog/2496-commit-together-with-co-authors

Pending checking how the api presents these to rest requests
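
Independent of how the REST API ends up presenting them, co-authors are recorded as `Co-authored-by: Name <email>` trailers in the commit message, so one possible approach is to parse them from there. This is a sketch under that assumption, with a hypothetical function name.

```python
import re

# Trailer format used for co-authored commits: "Co-authored-by: Name <email>".
CO_AUTHOR = re.compile(
    r"^Co-authored-by:\s*(.+?)\s*<([^<>]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(commit_message):
    """Return (name, email) pairs from Co-authored-by trailers."""
    return CO_AUTHOR.findall(commit_message)
```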

cleanup flags for 1.0

Major releases can totally break backwards compatibility, right? 😄

General plan:

remove the optional nature of showing contributor names and code contributors
- results in a full list for each invocation
remove the HTML generation[0]
keep the disabling of cache an option

[0] Yes, this is contentious. As BDFL[1], I reckon that the amazing work in LABHR/js-hatrack in regards to generating visual representation of contributions overrides anything my hacky python CLI can generate.

[1] I think I can call myself this, in jest[2]

[2] Are nested footnotes legal?

Internal rework

Because pagination is hard, I'd like to clean up the internals and possibly use https://github.com/sigmavirus24/github3.py as a replacement for octohub

Not having to embed things would be a plus. And I believe when I was choosing a github wrapper library I discounted github3.py because the github releases aren't being kept up to date, but it's hit 1.0.0 on pypi, and being actively maintained, so yay!

Segmentation fault when trying to install through pip3

Environment:

Ubuntu 17.10
pip3

Executed command:
pip3 install octohatrack

Expected output: Installed package.

Actual output:

Collecting octohatrack
Collecting gitpython (from octohatrack)
  Using cached GitPython-2.1.8-py2.py3-none-any.whl
Collecting requests (from octohatrack)
  Using cached requests-2.18.4-py2.py3-none-any.whl
Collecting simplejson (from octohatrack)
Collecting gitdb2>=2.0.0 (from gitpython->octohatrack)
  Using cached gitdb2-2.0.3-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->octohatrack)
  Using cached idna-2.6-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->octohatrack)
  Using cached urllib3-1.22-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->octohatrack)
  Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->octohatrack)
  Using cached certifi-2017.11.5-py2.py3-none-any.whl
Collecting smmap2>=2.0.0 (from gitdb2>=2.0.0->gitpython->octohatrack)
  Using cached smmap2-2.0.3-py2.py3-none-any.whl
Installing collected packages: smmap2, gitdb2, gitpython, idna, urllib3, chardet, certifi, requests,     simplejson, octohatrack
Successfully installed certifi-2017.11.5 chardet-3.0.4 gitdb2-2.0.3 gitpython-2.1.8 idna-2.6     octohatrack-0.6.1 requests-2.18.4 simplejson-3.13.2 smmap2-2.0.3 urllib3-1.22
Segmentation fault (core dumped)

Labelling of sections of contributors

I've been mulling over this after conversations I've had with @ossanna16; I don't think I'm using the correct naming conventions for the sections of the report.

I think it should be renamed, but I'm unsure on the new names.

I have two thoughts for this at the moment:

Option One: Keep the separation, but rename the sections

Repo: (repo_name)
GitHub contributors:
[list a]

All Contributors:
[list b]

GitHub Contributors: len(list a)
All Contributors: len(list b)

Where [list a] is the list defined by GitHub, and [list b] is everything else

Option Two: make the CLI match the web interface

Repo: (repo_name)
Contributors: 
[list]

Contributors (as per GitHub): len(list a)
All Contributors: len(list)

In this form, len(list a) would be the original list as per the GitHub contributor API endpoint, but the entire list would be an alphabetic list of all contributors (maybe with the 👏 emoji at the end for what was previously defined as 'non-coding' contributors)

@ossanna16: if you have the time, I'd appreciate your input on this.

Reduce caching

I just let octohatrack run on a moderately large repository and the cache file easily reached 1MB after two runs, which is a bit large for inspection (especially since dumping everything in one line makes syntax highlighting take a while …). I think reducing the amount of data written to the cache file might be a good idea :)

Have actual tests

Needs to test for: a full up repo, a repo with no activity, generate_html, and probably some sort of unit testing.

Make --generate-html option, and default to false

The app should not create files locally unless asked, and should only output to stdout in the default case

Error while 'Collecting wiki contributors' from phpdocbrbridge/traducao

Hello, @glasnt!

I got this error in v0.5.0 while trying to run octohatrack against https://github.com/phpdocbrbridge/traducao.

via pip installed version on OS X

$ octohatrack --version

octohatrack version 0.5.0

$ octohatrack phpdocbrbridge/traducao

Collecting API contributors...
Collecting all repo contributors...
Collecting wiki contributors.....'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

via Docker

$ docker run \
    --interactive \
    --tty \
    --rm \
    --env GITHUB_TOKEN=$GITHUB_TOKEN \
    --volume $HOME/.octohatrack:/cache \
     \
    rogeriopradoj/octohatrack:latest \
    \
     --version

octohatrack version 0.5.0

$ docker run \
    --interactive \
    --tty \
    --rm \
    --env GITHUB_TOKEN=$GITHUB_TOKEN \
    --volume $HOME/.octohatrack:/cache \
     \
    rogeriopradoj/octohatrack:latest \
    \
     phpdocbrbridge/traducao

Collecting API contributors...
Collecting all repo contributors...
Collecting wiki contributors....Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/git/cmd.py", line 614, in execute
    **subprocess_kwargs
  File "/usr/local/lib/python3.5/subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.5/subprocess.py", line 1544, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'git'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/octohatrack", line 9, in <module>
    load_entry_point('octohatrack==0.5.0', 'console_scripts', 'octohatrack')()
  File "/usr/local/lib/python3.5/site-packages/octohatrack/__init__.py", line 50, in main
    wiki_contributors = get_wiki_contributors(repo_name)
  File "/usr/local/lib/python3.5/site-packages/octohatrack/wiki.py", line 54, in get_wiki_contributors
    repo = Repo.clone_from(wiki_url, tmp_folder)
  File "/usr/local/lib/python3.5/site-packages/git/repo/base.py", line 957, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/git/repo/base.py", line 898, in _clone
    v=True, **add_progress(kwargs, git, progress))
  File "/usr/local/lib/python3.5/site-packages/git/cmd.py", line 459, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/git/cmd.py", line 920, in _call_process
    return self.execute(make_call(), **_kwargs)
  File "/usr/local/lib/python3.5/site-packages/git/cmd.py", line 617, in execute
    raise GitCommandNotFound(str(err))
git.exc.GitCommandNotFound: [Errno 2] No such file or directory: 'git'

Date Range limitation

https://github.com/pydanny/contributors has a date-range limitation, which is quite useful for working out contributors to sprints and other time-based things.

octohatrack does have a --limit flag already, but that's for "x most recent issues/PRs"

Date ranges would be useful.

Reference: https://twitter.com/glasnt/status/741102742119157761
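
A date-range filter could look something like the sketch below. It assumes each issue or pull request is a dict carrying the `created_at` timestamp in the `YYYY-MM-DDTHH:MM:SSZ` form the GitHub API returns; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2016-06-10T02:30:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def in_range(items, start, end):
    """Keep items whose created_at falls within [start, end)."""
    return [i for i in items if start <= parse_timestamp(i["created_at"]) < end]
```

The half-open range means back-to-back sprints can share a boundary date without counting anything twice.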

Rate limited boom.

Probably should check to see if the GITHUB_TOKEN is set, and stop (unless there's a --forceset)

because

octohatrack -c -n boot2docker/boot2docker
Collecting contributors....
Collecting commentors................Traceback (most recent call last):
  File "octohatrack.py", line 2, in <module>
    octohatrack.main()
  File "/usr/src/app/octohatrack/__init__.py", line 25, in main
    code_commentors = get_code_commentors(repo_name, args.limit)
  File "/usr/src/app/octohatrack/helpers.py", line 44, in get_code_commentors
    users.append(get_user("/repos/%s/pulls/%d" % (repo_name, index)))
  File "/usr/src/app/octohatrack/helpers.py", line 86, in get_user
    entry = get_data(uri)
  File "/usr/src/app/octohatrack/helpers.py", line 55, in get_data
    resp = conn.send("GET", uri)
  File "/usr/src/app/octohatrack/connection.py", line 71, in send
    return parse_response(response)
  File "/usr/src/app/octohatrack/response.py", line 103, in parse_response
    raise ValueError(message)
ValueError: You have run out of GitHub request tokens. Set a GITHUB_TOKEN to increase your limit to 5000/hour. Try again in ~44 minutes.

just looks ew.

add octohat related talk and article links

Create a list of known github robots to exclude

There are some robots that automatically create issues and pull requests that probably shouldn't be attributed as contributions.

From what I can tell, these include:

gitter-badger
a travis-ci bot of sorts
probably a circle-ci bot as well

Ideally, there should be a file listing robots that are removed from the two contributor lists before they are compared.
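
A sketch of what that filtering might look like. The starter set below is hypothetical and only contains gitter-badger, the one account named above; the real logins of any CI bots would need to be confirmed before adding them.

```python
# Hypothetical starter list; only gitter-badger is named in the issue above.
KNOWN_ROBOTS = {"gitter-badger"}


def without_robots(logins, robots=KNOWN_ROBOTS):
    """Drop known robot accounts, comparing logins case-insensitively."""
    blocked = {name.lower() for name in robots}
    return [login for login in logins if login.lower() not in blocked]
```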

More badges!

http://shields.io/ has setup.py-sourced badges!

Eg.

https://img.shields.io/pypi/l/octohatrack.svg

https://img.shields.io/pypi/pyversions/octohatrack.svg

Issues of optional arguments

When I enter octohatrack -h, the result is as follows
optional arguments:
-h, --help show this help message and exit
--no-cache Disable local caching of API results
-v, --version show program's version number and exit
-l 10, --limit 10 Limit to the last x Issues/Pull Requests
Which version should I use or what should I do to get the optional arguments as shown in the readme?

Review if octohatrack is required in a Contributors world

Following the GitHub Satellite 2019 keynote, I want to review what that native functionality adds, and if octohatrack adds any value any more.

add --version flag

Helpful for debugging and versioning

Really would like some kind of partial data caching

So that I can run the info on a large repo, without constantly re-requesting everything.

Unbreak local invocation mechanism

I think I need to find someone to explain this one to me IRL because I'm not getting it.

What I want:

pip install octohatrack; octohatrack blah/blah to "just work" as a command-line tool
git clone https://github.com/labhr/octohatrack; cd octohatrack; python3 somefile.py blah/blah to "just work" as an invocation of the local code

What I have is octohatrack.py which goes

import octohatrack
octohatrack.main()

But this doesn't run on systems without octohatrack the package already installed.

pkg_resources.DistributionNotFound: The 'octohatrack' distribution was not found and is required by the application

What I think might be happening is that I've done doofed the __init__.py file, so that I can't import the local file functionality. The current setup gives a valid setup.py and installation mechanism, but it just doesn't work for local invocation.

However, if you install octohatrack locally, the code in the local package doesn't actually run (provable by changing the files in the current directory to change the functionality, and checking whether those changes are seen in the output).

I think this is something to do with modules, packages, setup.py's, and the hackery I did in #1

Add automated self report update

Make the section of the repo that self-reports the contributors to this repo update itself.

Similar to the workflow in glasnt/glasnt/.github/workflows/build.yaml, probably on a weekly schedule.

The update process should replace a specially named section in the README, and commit it to the repo. If there's no change, then no commit is made.
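
The replace-and-compare step could be sketched as follows. The marker strings are hypothetical (any pair of unique HTML comments in the README would do), and the commit itself is left to the workflow.

```python
START = "<!-- contributors:start -->"  # hypothetical marker names
END = "<!-- contributors:end -->"


def replace_section(readme, body):
    """Return (new_text, changed) with the text between the markers replaced."""
    head, found_start, rest = readme.partition(START)
    _old, found_end, tail = rest.partition(END)
    if not (found_start and found_end):
        raise ValueError("README is missing the section markers")
    new_text = head + START + "\n" + body.strip() + "\n" + END + tail
    return new_text, new_text != readme
```

When `changed` is False the workflow can exit without committing, which covers the "no change, no commit" requirement.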

Investigate using sqlite

Potentially then able to use datasette for ad-hoc analytics

AttributeError: module 'octohatrack' has no attribute 'main'

With e1b52cd,

# docker build -t octohatrack .
.....
# docker run -e GITHUB_TOKEN octohatrack
Traceback (most recent call last):
  File "octohatrack.py", line 2, in <module>
    octohatrack.main()
AttributeError: module 'octohatrack' has no attribute 'main'