
zkoppert / innersource-crawler


This project creates a repos.json that can be utilized by the SAP InnerSource Portal.

License: MIT License

Languages: Python 75.96%, Dockerfile 10.96%, Makefile 13.08%
Topics: python, innersource, innersource-commons, github-actions, innersource-portal, actions, hacktoberfest

innersource-crawler's Introduction

InnerSource Crawler


This project creates a repos.json that can be utilized by the SAP InnerSource Portal. The current approach assumes that the repos that you want to show in the portal are available in a GitHub organization, and that they all are tagged with a certain topic.
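
The exact contents of repos.json depend on what the crawler collects from the GitHub API and on what the portal expects. Purely as an illustration (the field names and values below are placeholders based on common GitHub repository fields, not a specification of the actual output), an entry might look roughly like:

[
  {
    "name": "example-service",
    "full_name": "my-org/example-service",
    "html_url": "https://github.com/my-org/example-service",
    "description": "An example InnerSource repository",
    "language": "Python",
    "stargazers_count": 12
  }
]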

Support

If you need support using this project or have questions about it, please open an issue in this repository. Requests made directly to GitHub staff or the support team will be redirected here to open an issue. GitHub SLAs and support/services contracts do not apply to this repository.

Use as a GitHub Action

  1. Create a repository to host this GitHub Action or select an existing repository
  2. Create repository secrets for the env values used in the sample workflow below (GH_TOKEN, ORGANIZATION) with your information. More info on creating secrets can be found here. Note: your GitHub token will need read/write access to all the repositories in the organization
  3. Copy the example workflow below to your repository and put it in the .github/workflows/ directory with the file extension .yml (i.e. .github/workflows/crawler.yml)
  4. Don't forget to do something with the resulting repos.json file. You can move it to another repository or save it as a build artifact (an optional upload step is sketched after the example workflow), depending on what you are doing with it and which repository you run this action from.

Example workflow

name: InnerSource repo crawler

on:
  workflow_dispatch:
  schedule:
    - cron: '00 5 * * *'

jobs:
  build:
    name: InnerSource repo crawler
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v2
    
    - name: Run crawler tool
      uses: docker://ghcr.io/zkoppert/innersource-crawler:v1
      env:
        GH_TOKEN: ${{ secrets.GH_TOKEN }}
        ORGANIZATION: ${{ secrets.ORGANIZATION }}
        # for multiple topics, add them after a comma eg:
        # TOPIC: inner-source,actions,security,python
        TOPIC: inner-source
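
If you also want to keep the generated repos.json as a build artifact (step 4 above), one possible approach is to append an upload step such as the sketch below. This is an assumption, not part of the shipped workflow, and it presumes the crawler writes repos.json into the workflow workspace:

    - name: Upload repos.json
      uses: actions/upload-artifact@v4
      with:
        name: repos
        path: repos.json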

Local usage without Docker

  1. Copy .env-example to .env (a filled-in example is shown after this list)
  2. Fill out the .env file with a token from a user that has access to the organization to scan (specified in step 4 below). Tokens should have admin:org or read:org access.
  3. Fill out the .env file with the exact topic name you are searching for
  4. Fill out the .env file with the exact organization that you want to search in
  5. (Optional) Fill out the .env file with the exact URL of the GitHub Enterprise instance that you want to search in. Leave it empty if you want to search the public github.com.
  6. pip install -r requirements.txt
  7. Run python3 ./crawler.py, which will create a repos.json file containing the relevant metadata for the GitHub repos for the given topic
  8. Copy repos.json to your instance of the SAP-InnerSource-Portal and launch the portal as outlined in their installation instructions
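
For reference, a filled-in .env might look roughly like this (the values are placeholders; take the exact variable names and formatting from .env-example, and add the Enterprise URL variable from there if you need it):

GH_TOKEN=ghp_exampletoken1234567890
ORGANIZATION=my-org
TOPIC=inner-source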

License

MIT

innersource-crawler's People

Contributors

3cpt, dependabot[bot], issamansur, jasonmacgowan, mcantu, sicot-f, spier, zkoppert

innersource-crawler's Issues

What kind of permission is needed?

When we use this script, which permissions should we grant when creating the GH_TOKEN token?

Do we only need the admin:org / read:org scope?

Fix linting issues

This application has the following linting issues that should be resolved (a generic sketch of typical fixes follows the list):

  • crawler.py:1:0: C0114: Missing module docstring (missing-module-docstring)
  • crawler.py:40:24: C0209: Formatting a regular string which could be a f-string (consider-using-f-string)
  • crawler.py:47:22: C0209: Formatting a regular string which could be a f-string (consider-using-f-string)
  • crawler.py:95:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
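
Without claiming these are the exact lines in crawler.py, warnings of these kinds are typically resolved along the following lines (a generic sketch; names and values are illustrative only):

"""Example module docstring describing what the crawler does (fixes C0114)."""

organization = "my-org"   # placeholder values, not taken from crawler.py
topic = "inner-source"

# C0209: build the search string with an f-string instead of %-style formatting
query = f"org:{organization} topic:{topic}"

# W1514: always pass an explicit encoding when opening files
with open("repos.json", "w", encoding="utf-8") as repos_file:
    repos_file.write("[]\n")  # placeholder content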

repositories with no description cause an error

Repositories that don't have a description cause the following error:

Traceback (most recent call last):
  File "/action/workspace/crawler.py", line 84, in <module>
    innersource_repo["score"] = repo_activity.score.calculate(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.[11](https://github.com/innersource-portal-demo/project-portal-for-innersource/actions/runs/6279610170/job/17055587875#step:4:12)/site-packages/repo_activity/score.py", line 54, in calculate
    len(repo["description"]) > 30
    ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

A workaround is to add a description, but it would be better if the crawler handled repositories without descriptions gracefully and continued running.
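
One possible defensive fix (a sketch only; whether innersource_repo is exactly the dict that reaches score.calculate is an assumption based on the traceback above) is to fall back to an empty string before computing the score:

# GitHub returns None for repositories without a description, which later
# breaks len(repo["description"]); substitute an empty string instead.
if innersource_repo.get("description") is None:
    innersource_repo["description"] = ""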
