
iqtlabs / gitgeo


Discover the geography of open-source software.

License: Apache License 2.0

Python 99.82% Dockerfile 0.18%
open-source-community git github pypi pypi-package geography open-source-software open-source-community-management osint

gitgeo's Introduction


GitGeo

Discover the geography of open-source software. Explore the geographic locations of software developers associated with a GitHub repository or a Python (PyPI) package.

See, for instance, the geography of the contributors to the Python package requests.

[Map image: geography of the contributors to requests]

Why use GitGeo?

  • Curiosity
  • Open source software community management
  • Research on open source software ecosystems
  • IT security compliance

Related Writing

Installation

pip install gitgeo

Or:

git clone https://github.com/IQTLabs/GitGeo

Usage

(Requires an internet connection.)

  • First, create one or more GitHub personal access tokens.

  • Second, run these commands in the command line to set environment variables:

    export GITHUB_USERNAME='[github_username]'
    export GITHUB_TOKEN='[github_token]'
  • Alternatively, to use multiple tokens, create a file called tokens.txt in the code's directory and enter a GitHub personal access token on each line.

  • Third, run these commands in the command line:

gitgeo --package [package_name]

gitgeo --repo [github_repo_url]

For example:

$ gitgeo --package requests

-----------------
PACKAGE: requests
-----------------
CONTRIBUTOR, LOCATION
* indicates PyPI maintainer
---------------------
kennethreitz42 | Virginia, USA
Lukasa * | London, England
sigmavirus24 | Madison, WI
nateprewitt * | None
slingamn | None
BraulioVM | Malaga & Granada, Spain
dpursehouse | Kawasaki
jgorset | Oslo, Norway
...

Or:

$ gitgeo --repo www.github.com/psf/requests

-----------------
GITHUB REPO: psf/requests
-----------------
CONTRIBUTOR, LOCATION
---------------------
kennethreitz42 | Virginia, USA | United States
Lukasa | London, England | United Kingdom
sigmavirus24 | Madison, WI | United States
nateprewitt | None | None
...
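The token setup described in the usage steps (the GITHUB_TOKEN environment variable, or a tokens.txt file with one token per line) can be sketched in Python. This is a minimal, hypothetical helper: load_github_tokens and next_token are illustrative names, not GitGeo's API. It assumes tokens.txt takes precedence over the environment variable and that multiple tokens are used round-robin; GitGeo's actual behavior may differ.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def load_github_tokens(
    tokens_file: Path = Path("tokens.txt"),
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the GitHub tokens to use for API requests.

    Assumed precedence (hypothetical): one token per non-empty line of
    tokens_file if that file exists, otherwise GITHUB_TOKEN (if set).
    """
    if environ is None:
        environ = os.environ
    if tokens_file.is_file():
        lines = tokens_file.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    token = environ.get("GITHUB_TOKEN")
    return [token] if token else []


def next_token(tokens: List[str], request_number: int) -> str:
    """Pick a token round-robin (a hypothetical rotation strategy)."""
    if not tokens:
        raise ValueError("no GitHub tokens configured")
    return tokens[request_number % len(tokens)]
```

Passing environ explicitly keeps the helper testable without touching the real process environment.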

There are other command line options too:

Add --summary to get the results summarized by country. For example:

$ gitgeo --package requests --summary

-----------------
PACKAGE: requests
GITHUB REPO: psf/requests
-----------------
COUNTRY | # OF CONTRIBUTORS
---------------------------
United States 37
None 23
United Kingdom 4
Canada 4
Germany 4
Switzerland 4
Spain 2
Russia 2
...
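The --summary view amounts to counting contributors per resolved country. A minimal sketch with collections.Counter, assuming each contributor has already been mapped to a country string or None (summarize_by_country and print_summary are hypothetical names, not GitGeo functions):

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def summarize_by_country(
    countries: Iterable[Optional[str]],
) -> List[Tuple[str, int]]:
    """Count contributors per country, most common first.

    Unresolved locations (None) are counted under the label "None",
    mirroring the --summary output above.
    """
    counts = Counter("None" if country is None else country for country in countries)
    # most_common() keeps first-seen order among countries with equal counts.
    return counts.most_common()


def print_summary(rows: List[Tuple[str, int]]) -> None:
    """Print rows in the same layout as the --summary example."""
    print("COUNTRY | # OF CONTRIBUTORS")
    print("---------------------------")
    for country, count in rows:
        print(f"{country} {count}")
```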

Add --map when using the --repo option to create an HTML map saved in the results folder. See the image above for a static example; the real map supports zooming and tooltips.

Add --output_csv to output a CSV of the results to the results folder.

To create a CSV of contributors from many repositories, list the repositories in the repos.txt file, one per line. Then use the --multirepo flag.

Add --multirepo_map followed by a filename to create a map from CSV output. The CSV file must be located in the results folder.

Add --num followed by a multiple of 100 from 100 (the default) to 500 to set the maximum number of contributors analyzed per repo.
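The repos.txt input for --multirepo can be sketched as one GitHub URL per line, reduced to the owner/repo form shown in the "GITHUB REPO: psf/requests" header of the --repo output. The accepted URL pattern and the helper names (repo_slug, read_repo_list) are assumptions for illustration, not GitGeo's own parser:

```python
import re
from pathlib import Path
from typing import List

# Hypothetical pattern: optional scheme and "www.", then github.com/<owner>/<repo>,
# with an optional trailing ".git" or "/".
_REPO_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?/?$",
    re.IGNORECASE,
)


def repo_slug(url: str) -> str:
    """Turn 'www.github.com/psf/requests' into 'psf/requests'."""
    match = _REPO_RE.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub repo URL: {url!r}")
    return f"{match.group(1)}/{match.group(2)}"


def read_repo_list(path: Path) -> List[str]:
    """Read repos.txt: one repository URL per line, blank lines skipped."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [repo_slug(line) for line in lines if line.strip()]
```

Failing loudly on an unrecognized line surfaces typos in repos.txt before any API requests are made.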

Run tests:

pytest

Want to contribute?

  • Open a PR. We are glad to accept pull requests. We use black, pylint, and pydocstyle, and we are glad to help if you haven't used those tools before.
  • Open an issue. Tell us your problem or a functionality you want.
  • Want to help build a community related to GitGeo and similar open source software ecosystem exploration tools? Please send an email to [email protected].

gitgeo's People

Contributors

codacy-badger, dependabot[bot], jbenjoseph, jspeed-meyers, kdobolyi, renovate-bot


gitgeo's Issues

Multirepo scan capability now broken

Describe the bug
The multirepo scan capability is currently broken.

To Reproduce
gitgeo --multirepo

Expected behavior
Scan the repos specified in repos.txt, returning a csv of contributors and their locations.

Screenshots

Traceback (most recent call last):
  File "/Users/jmeyers/opt/anaconda3/bin/gitgeo", line 8, in <module>
    sys.exit(main())
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/main.py", line 123, in main
    scan_multiple_repos(num=args.num)
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/multi_repo_scan.py", line 26, in scan_multiple_repos
    create_csv("multirepo", timestamp)
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/custom_csv.py", line 23, in create_csv
    with open(filename, "w", encoding="utf-8", newline="") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'results/multirepo_20211104-093528.csv'

Desktop (please complete the following information):

  • OS: macOS Catalina 10.15.7
  • Python 3.7.4

Additional context
I suspect the PyPI packaging of this project (specifically, placing the Python files in a gitgeo subdirectory) is preventing GitGeo from accessing the files in the results directory. Can you please advise and, if possible, fix?
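The traceback points at a relative path, results/multirepo_20211104-093528.csv, which open() resolves against the current working directory, so it fails whenever that directory has no results folder. A minimal sketch of one possible fix, assuming results should be written relative to the working directory: create the folder before opening the file. write_results_csv is a hypothetical helper, not the project's create_csv.

```python
import csv
from pathlib import Path
from typing import Iterable, Sequence


def write_results_csv(
    rows: Iterable[Sequence[str]],
    prefix: str,
    timestamp: str,
    results_dir: Path = Path("results"),
) -> Path:
    """Write rows to results_dir/<prefix>_<timestamp>.csv and return the path."""
    # Create the results folder (and any missing parents) first; exist_ok=True
    # makes this a no-op when the folder already exists.
    results_dir.mkdir(parents=True, exist_ok=True)
    filename = results_dir / f"{prefix}_{timestamp}.csv"
    with open(filename, "w", encoding="utf-8", newline="") as file:
        csv.writer(file).writerows(rows)
    return filename
```

A timestamp from time.strftime("%Y%m%d-%H%M%S") yields filenames in the same pattern as the one in the traceback. Writing next to the installed package instead would avoid the working-directory dependence, but site-packages is often not writable.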

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)

GitGeo PyPI Package Bug?

Describe the bug
When using the PyPI package of GitGeo, I get an error.

To Reproduce

  1. pip install gitgeo
  2. gitgeo --help

Expected behavior
This output:

usage: main.py [-h] [--package PACKAGE] [--repo REPO] [--multirepo]
               [--multirepo_map MULTIREPO_MAP] [--summary] [--output_csv]
               [--map] [--num {100,200,300,400,500}]

optional arguments:
  -h, --help            show this help message and exit
  --package PACKAGE     Specify Python (PyPI) package.
  --repo REPO           Specify GitHub repo.
  --multirepo           Scan multiple repos from input file.
  --multirepo_map MULTIREPO_MAP
                        Convert mutlirepo scan file into map.
  --summary             Display results by country.
  --output_csv          Output results in csv.
  --map                 Display country by country results in map.
  --num {100,200,300,400,500}
                        Specify max number of contributors per repo.

The Error I Get

Traceback (most recent call last):
  File "/Users/jmeyers/opt/anaconda3/bin/gitgeo", line 5, in <module>
    from gitgeo.main import main
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/main.py", line 6, in <module>
    from gitgeo.mapping import make_map
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/mapping.py", line 16, in <module>
    from gitgeo.geolocation import get_country_from_location
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/geolocation.py", line 3, in <module>
    from gitgeo.geographies_list import (
  File "/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/geographies_list.py", line 41, in <module>
    with open(Path(__file__).with_name("country_codes.csv"), errors="ignore", newline="") as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/jmeyers/opt/anaconda3/lib/python3.7/site-packages/gitgeo/country_codes.csv'

Desktop (please complete the following information):

  • OS: OS X 10.15.7
  • Python 3.7.4

Any help would be appreciated!
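The traceback shows that country_codes.csv is absent from the installed package directory even though geographies_list.py looks for it next to itself. setuptools only installs non-Python files that are declared as package data (for example through package_data in setup.py, or include_package_data=True together with a MANIFEST.in entry), so an undeclared data file is one likely cause. A small, hypothetical check of which data files actually made it into an installed package (both function names are illustrative):

```python
import sysconfig
from pathlib import Path
from typing import Iterable, List


def installed_package_dir(package: str) -> Path:
    """Guess where pip placed `package` in this environment's site-packages."""
    return Path(sysconfig.get_paths()["purelib"]) / package


def missing_data_files(package_dir: Path, required: Iterable[str]) -> List[str]:
    """Return the names from `required` that are not files inside package_dir."""
    return [name for name in required if not (package_dir / name).is_file()]
```

For example, missing_data_files(installed_package_dir("gitgeo"), ["country_codes.csv", "world.json"]) would list any of those files the installation lacks; world.json is included because the test failures reported in another issue also look for it.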

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update dependency pandas to v1.5.3
  • Update dependency requests to v2.28.2
  • Update actions/checkout action to v3
  • Update actions/setup-python action to v4
  • Update codecov/codecov-action action to v3
  • Update dependency pytest to v7
  • Update github/codeql-action action to v2
  • ๐Ÿ” Create all rate-limited PRs at once ๐Ÿ”

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

dockerfile
Dockerfile
github-actions
.github/workflows/codeql-analysis.yml
  • actions/checkout v2
  • github/codeql-action v1
  • github/codeql-action v1
  • github/codeql-action v1
.github/workflows/pypi.yml
  • actions/checkout v2
.github/workflows/python-package.yml
  • actions/checkout v2
  • actions/setup-python v2
  • codecov/codecov-action v1
pip_requirements
requirements.txt
  • beautifulsoup4 ==4.10.0
  • folium ==0.12.1
  • pandas ==1.3.4
  • pytest ==6.2.5
  • requests ==2.26.0

  • Check this box to trigger a request for Renovate to run again on this repository

GitHub Actions Tests Failing

Describe the bug
Test case failing.

To Reproduce
See GitHub actions log.

Expected behavior
Tests should pass.

@kdobolyi, can you please fix the tests or the codebase so that the GitHub Actions tests pass?

Desktop (please complete the following information):

  • GitHub Actions runs on Linux (Ubuntu 20).

Additional context
I think the new PR has tests that don't pass yet.

Tests Fail Locally

Describe the bug
The tests fail locally.

To Reproduce
git clone https://www.github.com/iqtlabs/gitgeo
cd gitgeo
pytest

Expected behavior
The tests should pass locally.

Desktop (please complete the following information):

  • OS: macOS Catalina
  • Python 3.7.4

Additional context
Three tests fail:

FAILED tests/test_gitgeo.py::TestMapping::test_add_contributor_count_to_json - FileNotFoundError: [Errno 2] No such file or directory: 'world.json'
FAILED tests/test_gitgeo.py::TestMapping::test_make_map_from_repo - FileNotFoundError: [Errno 2] No such file or directory: 'world.json'
FAILED tests/test_gitgeo.py::TestMapping::test_make_map_from_csv - FileNotFoundError: [Errno 2] No such file or directory: 'world.json'

Request
@jbenjoseph, can you please either help me fix this or provide a PR? The bug clearly has to do with the Path-related functionality, but I don't understand that functionality that well. Thank you!
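open("world.json") is resolved against the current working directory, so it succeeds only when pytest is launched from the folder that contains the file. Anchoring the lookup on the module's own __file__, as geographies_list.py already does with Path(__file__).with_name("country_codes.csv") in the PyPI traceback, makes it independent of where the process starts. A minimal sketch; resource_path is a hypothetical helper, and it assumes world.json sits next to the module that reads it:

```python
from pathlib import Path


def resource_path(module_file: str, name: str) -> Path:
    """Return the path of data file `name` stored next to `module_file`.

    Call it as resource_path(__file__, "world.json") inside a module. Unlike a
    bare open("world.json"), the result does not depend on the current
    working directory.
    """
    return Path(module_file).resolve().with_name(name)
```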

Use of command line arguments

Describe the bug
I tried to use the command-line arguments, specifically --help, and GitGeo returned nothing. Is this the intended behavior?

To Reproduce
python main.py --help

Expected behavior
I expect this output:

usage: main.py [-h] [--package PACKAGE] [--repo REPO] [--multirepo]
               [--multirepo_map MULTIREPO_MAP] [--summary] [--output_csv]
               [--map] [--num {100,200,300,400,500}]

optional arguments:
  -h, --help            show this help message and exit
  --package PACKAGE     Specify Python (PyPI) package.
  --repo REPO           Specify GitHub repo.
  --multirepo           Scan multiple repos from input file.
  --multirepo_map MULTIREPO_MAP
                        Convert mutlirepo scan file into map.
  --summary             Display results by country.
  --output_csv          Output results in csv.
  --map                 Display country by country results in map.
  --num {100,200,300,400,500}
                        Specify max number of contributors per repo.

Screenshots
I got no output.

Desktop (please complete the following information):

  • OS: macOS Catalina 10.15.7
  • Python 3.7.4

Additional context
I replaced the main() function definition line in main.py with this line:

if __name__ == "__main__":

and the program performed as I expected. Is it okay if I replace this line? Or was this change made because of the recent PyPI-package-ification changes? Thank you, @jbenjoseph!
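The traceback in the PyPI issue shows that the installed gitgeo command runs "from gitgeo.main import main", so main.py has to keep a main() function; replacing its definition with the guard would break that console script. A sketch that supports both entry points, with the argparse options reconstructed from the expected --help output (the definitions are inferred from that help text, and build_parser is a hypothetical name):

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Options inferred from the expected --help output above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--package", help="Specify Python (PyPI) package.")
    parser.add_argument("--repo", help="Specify GitHub repo.")
    parser.add_argument("--multirepo", action="store_true",
                        help="Scan multiple repos from input file.")
    parser.add_argument("--multirepo_map", help="Convert multirepo scan file into map.")
    parser.add_argument("--summary", action="store_true",
                        help="Display results by country.")
    parser.add_argument("--output_csv", action="store_true",
                        help="Output results in csv.")
    parser.add_argument("--map", action="store_true",
                        help="Display country by country results in map.")
    parser.add_argument("--num", type=int, choices=[100, 200, 300, 400, 500],
                        default=100, help="Specify max number of contributors per repo.")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Entry point used by the installed console script."""
    args = build_parser().parse_args(argv)
    # Hypothetical placeholder: dispatch on args.package, args.repo, etc. here.
    return args


# Keep main() defined AND add the guard: the console script imports and calls
# main(), while `python main.py --help` executes this block.
if __name__ == "__main__":
    main()
```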

Potentially Use CLAVIN for Geoparsing

https://github.com/Novetta/CLAVIN

This repo from Novetta does geoparsing ("Paris, France" --> "France") and is likely superior to our quick and dirty geoparser. The code is in Java, I think, but the REST version can also be run as a Docker container.

There is no need to pursue this currently. If there is strong interest in improving GitGeo's geoparsing, then this is simply one option to be evaluated.

@kdobolyi, for your situational awareness. Peter Bronez brought this to my attention because Charlie Greenbacker was a contributor to CLAVIN.
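To make the geoparsing task concrete: the goal is to turn a free-text profile location into a country, as in the README's "Madison, WI | United States". A deliberately naive sketch (the LOOKUP table and naive_country are hypothetical and are not GitGeo's geoparser) shows the approach and its limits: a location such as "Kawasaki" resolves to nothing unless the table happens to list it, which is the gap a full geoparser like CLAVIN could fill.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table; a real geoparser needs
# complete country, region, and city data.
LOOKUP = {
    "france": "France",
    "spain": "Spain",
    "norway": "Norway",
    "usa": "United States",
    "wi": "United States",        # US state abbreviation
    "virginia": "United States",  # US state name
    "england": "United Kingdom",
}


def naive_country(location: Optional[str]) -> Optional[str]:
    """Scan comma-separated parts from last to first; return the first match."""
    if not location:
        return None
    for part in reversed(location.split(",")):
        country = LOOKUP.get(part.strip().lower())
        if country is not None:
            return country
    return None
```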
