apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

Home Page: https://devlake.apache.org/

License: Apache License 2.0

JavaScript 0.03% Shell 0.29% Dockerfile 0.21% Makefile 0.08% Go 87.22% HTML 0.02% TypeScript 9.81% CSS 0.03% Python 2.30%
data data-analysis data-engineering data-integration data-transfers devops domain-layer etl golang integration

incubator-devlake's Introduction



Apache DevLake (Incubating)



🤔 What is Apache DevLake?

Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data from DevOps tools to extract insights for engineering excellence, developer experience, and community growth.

Apache DevLake is designed for developer teams looking to make better sense of their development process and to bring a more data-driven approach to their own practices. You can ask Apache DevLake many questions regarding your development process. Just connect and query.

🎯 What can be accomplished with Apache DevLake?

  1. Your Dev Data lives in many silos and tools. DevLake brings them all together to give you a complete view of your Software Development Life Cycle (SDLC).
  2. From DORA to scrum retros, DevLake implements metrics effortlessly with prebuilt dashboards supporting common frameworks and goals.
  3. DevLake fits teams of all shapes and sizes, and can be readily extended to support new data sources, metrics, and dashboards, with a flexible framework for data collection and transformation.

👉 Live Demo

DORA Dashboard

All Dashboards

💪 Supported Data Sources

Here you can find all data sources supported by DevLake, their scopes, supported versions, and more!

🚀 Getting Started

🤓 How do I use DevLake?

1. Set up DevLake

You can set up Apache DevLake by following our step-by-step instructions to install via Docker Compose or via Helm. Please see the detailed instructions here, and ask the community if you get stuck at any point.

2. Create a Blueprint

The DevLake Configuration UI will guide you through the process (a Blueprint) of defining the data connections, data scope, transformations, and sync frequency for the data you wish to collect.

3. Track the Blueprint's progress

You can track the progress of the Blueprint you have just set up.

4. View the pre-built dashboards

Once the first run of the Blueprint is completed, you can view the corresponding dashboards.

5. Customize the dashboards with SQL

If the pre-built dashboards are limited for your use cases, you can always customize or create your own metrics or dashboards with SQL.

๐Ÿ˜ How to Contribute

Please read the contribution guidelines before making a contribution. The following docs list the resources you may need once you decide to contribute.

📄 Contributing to Documentation:

⌚ Roadmap

  • Roadmap: Detailed roadmaps for DevLake.

💙 Community

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

incubator-devlake's People

Contributors

abeizn, basicthinker, camilleteruel, coldgust, d4x1, e2corporation, hezyin, keon94, kevin-kline, klesh, leric, liangjingyang, likyh, long2ice, mappjzc, matrixji, merico-devlake, mindlesscloud, mintsweet, narrowizard, oddscenes, perhapzz, pranshu-raj, snowmoon-dev, startrekzky, thenicetgp, warren830, xgdyp, yumengwang03, zhangning10


incubator-devlake's Issues

npm run commit has some issues?

@narrowizard when I run the command npm run commit, I get this message:

hint: The '.husky/commit-msg' hook was ignored because it's not set as executable.
hint: You can disable this warning with `git config advice.ignoredHook false`.

Is this something that can be ignored? Or do you want this husky thing running?

Allow developers to not pull all data all the time

As a developer, I want to call the API URLs but only get the data that I really want for the component I am working on. I would like a config something like this:

jira: {
    host: 'https://merico.atlassian.net',
    basicAuth: 'anVzdGluLmJyYXplYXVAbWVyaWNvLmRldjpMTVlPVnNtZXhXekZlNW5sakdMY0VERDU=',
    proxy: 'http://localhost:4780',
    timeout: 15000,
    maxPagesForTest: 2,
    skipIssueCollection: false,
    skipChangelogCollection: true
  },
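One possible shape for honoring these flags inside a collector is sketched below, assuming the flag names from the config above (skipIssueCollection, skipChangelogCollection, maxPagesForTest); the helper names shouldCollect and pageLimit are hypothetical:

```javascript
// Sketch: helpers a collector could use to consult the proposed flags.
// Flag names mirror the config above; helper names are hypothetical.
function shouldCollect (jiraConfig, kind) {
  // kind is 'issue' or 'changelog'; default to collecting when a flag is absent
  if (kind === 'issue') return !jiraConfig.skipIssueCollection
  if (kind === 'changelog') return !jiraConfig.skipChangelogCollection
  return true
}

function pageLimit (jiraConfig) {
  // cap paging during local development; Infinity means "collect everything"
  return jiraConfig.maxPagesForTest || Infinity
}
```

A collector could then check shouldCollect(config, 'changelog') before starting the expensive changelog pull, and stop paging once pageLimit(config) pages have been fetched.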

differences between typescript app and javascript app

typescript branch is ts-main
javascript branch is main

collector and enrichment

  • same: add plugins to implement collector and enricher

  • different: the task executor would not care about task type.

we may implement the collector and enricher separately, but the enricher is tightly coupled to the collector.

e.g. the Jira lead-time enrichment is tightly coupled to the Jira issue and Jira issue-changelog collectors.

changes:

  • no separate collector service and enrichment service; only one queue service for task execution
  • collector and enricher would implement the same interface for task execution

todo:

  • move the collector and enrichment code from the javascript branch and implement the interface
  • compose the task execution, running enrichment after the collector finishes
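The shared-interface idea above can be sketched as follows; the class and function names (JiraCollector, JiraEnricher, runTask) are hypothetical placeholders, not the actual plugin API:

```javascript
// Sketch: collector and enricher expose the same execute() interface,
// so the task executor doesn't care about task type. Enrichment runs
// only after collection finishes. All names here are hypothetical.
class JiraCollector {
  async execute (options) { return { collected: options.boardId } }
}

class JiraEnricher {
  async execute (options) { return { enriched: options.boardId } }
}

async function runTask (options) {
  const results = []
  // stages run strictly in order: collect first, then enrich
  for (const stage of [new JiraCollector(), new JiraEnricher()]) {
    results.push(await stage.execute(options))
  }
  return results
}
```

With a single queue service, each dequeued task would simply call runTask, and the ordering guarantees that the enricher always sees the collector's output.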

source and source type

  • same: the source type is the plugin type/name; we currently have jira and gitlab
  • different: one source type can have multiple sources

The js branch temporarily uses config files to load sources. That limits each source type to a single source, and the app must be restarted for changes to take effect. Now we use a REST API and a source table to save sources, and trigger source tasks by source id.

changes:

  • one rest api service
  • add source with rest api
  • trigger source tasks with rest api

todo:

  • add rest apis referenced above

schema and migrations

  • different: use TypeORM, which is more friendly to TypeScript (see ORM Comparison: TypeORM vs Sequelize)

  • different: plugins would manage their own data schemas and the migrations

  • different: plugins would have collector schema and enrichment schema

TODO:

  • add TypeORM entities in plugins
  • add a migration step when initializing plugins in the queue service
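For plugins that want to define entities in plain JavaScript, TypeORM's EntitySchema accepts an options object instead of decorators. A hedged sketch of what a plugin-owned entity definition might look like (the entity, table, and column names here are hypothetical):

```javascript
// Sketch: a plugin-owned entity defined as a plain options object.
// In real code this object would be passed to `new EntitySchema(...)`
// from 'typeorm'; all names below are illustrative only.
const JiraIssueEntity = {
  name: 'JiraIssue',
  tableName: 'jira_issues',
  columns: {
    id: { type: 'varchar', primary: true },
    boardId: { type: 'int' },
    status: { type: 'varchar' },
    resolutionDate: { type: 'timestamp', nullable: true }
  }
}
```

Keeping the definition as data makes it easy for the queue service to collect every plugin's entities and hand them to the connection during the proposed migration step.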

why nestjs

Nest (NestJS) is a framework for building efficient, scalable Node.js server-side applications. It uses progressive JavaScript, is built with and fully supports TypeScript (yet still enables developers to code in pure JavaScript) and combines elements of OOP (Object Oriented Programming), FP (Functional Programming), and FRP (Functional Reactive Programming).

Easy to set up a monorepo; easy to add REST API, GraphQL API, WebSocket server, and microservice apps.

Dependency injection, with test tooling that uses dependency injection in the testing environment for easily mocking components.

Easy for Angular and Java Spring Boot developers to get started.

MIT-licensed open source project.

No data to calculate MR Review time

Data is missing to calculate 'MR Review time'.
No record in table 'gitlab_merge_requests' has a valid 'first_comment_time'.
The DB I connected to is 13.212.147.78:45432.

Standardize how we lint code locally

We need a solution for the linting problem. The goal is to standardize how we lint locally so we never see lint-only changes in PRs in the future.

Rebase failed due to 'create file grafana/dashboards/Home.json permission denied'

Hi, all

I was trying to rebase, and it failed with the following message:

[klesh@klesh-laptop] ~/P/m/t/lake (gitlab|โœ” )> git pull --rebase upstream main                                                                                                                                              25.064s
remote: Enumerating objects: 272, done.
remote: Counting objects: 100% (272/272), done.
remote: Compressing objects: 100% (116/116), done.
remote: Total 272 (delta 165), reused 238 (delta 142), pack-reused 0
Receiving objects: 100% (272/272), 185.07 KiB | 35.00 KiB/s, done.
Resolving deltas: 100% (165/165), completed with 22 local objects.
From https://github.com/merico-dev/lake
 * branch            main       -> FETCH_HEAD
   28d3631..aaa678c  main       -> upstream/main
First, rewinding head to replay your work on top of it...
error: unable to create file grafana/dashboards/Home.json: Permission denied
fatal: Could not detach HEAD

Possible reason: the folders were created by Docker, so they belong to root.

Decide about whether we need resolution time

There are some calculations going on in Grafana with lead times and resolution times, but we don't currently collect resolution times from Jira.

We should gather resolution times as well as lead times (currently we only collect lead times).
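Jira's REST API already exposes a resolutiondate field on each issue (alongside created), so a resolution time can be derived per issue. A minimal sketch, assuming raw Jira issue JSON as input; the helper name issueDurations is hypothetical:

```javascript
// Sketch: derive resolution time (in ms) from a raw Jira issue.
// Jira issue JSON carries fields.created and fields.resolutiondate;
// `issueDurations` is a hypothetical helper name.
function issueDurations (issue) {
  const created = Date.parse(issue.fields.created)
  const resolved = issue.fields.resolutiondate
    ? Date.parse(issue.fields.resolutiondate)
    : null
  return {
    // null means the issue is still unresolved
    resolutionTime: resolved === null ? null : resolved - created
  }
}
```

Storing this alongside lead time during enrichment would let the Grafana dashboards compute both without ad-hoc SQL.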

Hook up Jira API to get all issues data

Collect issues from JIRA and save them in raw form

  • Get a jira access token for their API
  • store that token in a gitignored config file
  • update the readme so others can get this token as well
  • Create a file like /src/collectors/jira/index.js
  • Create a file like /src/collectors/jira/issues.js
  • Call the Jira API to get a list of issues given a project id?
  • Use the file test/test-docker-compose as an example of how to connect to MongoDB
  • create a file in db/connections/mongo.js that handles all saves of raw data
  • Store the data in MongoDB locally with that file
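One possible shape for the db/connections/mongo.js helper described above, written so it takes a driver db handle rather than owning the connection; the function name saveRaw and the key scheme are assumptions, not the project's actual API:

```javascript
// Sketch of db/connections/mongo.js: a single place that handles
// saving raw collected data. The db handle is a mongodb-driver Db;
// `saveRaw` and the _id scheme below are hypothetical.
async function saveRaw (db, collectionName, record, idKeys) {
  // build a deterministic _id from the record's natural keys
  const _id = idKeys.map(k => String(record[k])).join(':')
  // upsert so re-running a collection is idempotent
  await db.collection(collectionName).replaceOne(
    { _id },
    { _id, ...record },
    { upsert: true }
  )
  return _id
}
```

Every collector would then funnel raw writes through this one function, which also satisfies the "store the token/data handling in one place" spirit of the checklist.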

Collector should write data with data schema

  • we should define the data schema
  • the data schema would simply handle some validation and the primary keys
    • for 'mongodb' the primary keys should be used to generate '_id'
    • maybe we will use some ORM later, e.g. TypeORM
  • the schema would have an abstract implementation
    • jira schemas would extend the abstract schema
    • likewise gitlab
    • and github
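The bullets above could be sketched roughly as follows; the class names and primary-key fields are hypothetical, and real validation would likely go beyond presence checks:

```javascript
// Sketch of the proposed abstract schema: validation plus primary keys,
// with '_id' generation for mongodb. All names here are hypothetical.
class BaseSchema {
  constructor (primaryKeys) { this.primaryKeys = primaryKeys }

  // for mongodb, generate '_id' from the declared primary keys
  mongoId (record) {
    return this.primaryKeys.map(k => String(record[k])).join(':')
  }

  validate (record) {
    // minimal validation: every primary key must be present
    return this.primaryKeys.every(k => record[k] !== undefined && record[k] !== null)
  }
}

// per-plugin schemas extend the abstract one
class JiraIssueSchema extends BaseSchema {
  constructor () { super(['boardId', 'issueId']) }
}
```

gitlab and github schemas would extend BaseSchema the same way, each declaring its own primary keys.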

Failed to enrich TypeError: Cannot read property 'id' of undefined

I got an error message as below:
[screenshot]

I found the spot in the source code, and I think the error may be caused by there being no data in boardsCollection.
That means the collector may have failed while syncing data.
The full logs are at the end of this issue.

[screenshot]

[0] Publishing task to collection {"jira":{"boardId":8},"gitlab":{"projectId":8967944}}
[1] mongodb://lake:lakeIScoming@mongodb:27017/lake?authSource=admin
[1] INFO >>> jira collecting { boardId: 8 }
[1] INFO >>> jira collecting board 8
[1] INFO >>> gitlab collecting { projectId: 8967944 }
[1] INFO >>> gitlab collecting project 8967944
[1] INFO >>> jira fetching data from agile/1.0/board/8 #0
[1] INFO >>> gitlab fetching data from projects/8967944 #0
[1] INFO >>> gitlab fetched data from projects/8967944
[1] INFO >>> gitlab collecting project done! 8967944
[1] INFO >>> gitlab collecting commits for project 8967944
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=1 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #1
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=1
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=2 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #2
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=2
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=3 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #3
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=3
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=4 #0
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=4
[1] INFO >>> jira fetching data from agile/1.0/board/8 #4
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=5 #0
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=5
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=6 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #5
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=6
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=7 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #6
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=7
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=8 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #7
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=8
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=9 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #8
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=9
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=10 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #9
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=10
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=11 #0
[1] Failed to collect Error: INFO >>> jira fetching data failed: retry limit exceeding
[1] at Object.fetch (/var/www/lake/src/plugins/jira-pond/src/collector/fetcher.js:33:11)
[1] at runMicrotasks (<anonymous>)
[1] at processTicksAndRejections (internal/process/task_queues.js:95:5)
[1] at async collectByBoardId (/var/www/lake/src/plugins/jira-pond/src/collector/boards.js:16:20)
[1] at async Object.collect (/var/www/lake/src/plugins/jira-pond/src/collector/boards.js:10:3)
[1] at async Object.collect (/var/www/lake/src/plugins/jira-pond/src/collector/index.js:6:3)
[1] at async Object.exec [as jira] (/var/www/lake/src/plugins/jira-pond/index.js:13:7)
[1] at async Promise.all (index 0)
[1] at async jobHandler (/var/www/lake/src/collection/worker.js:20:5)
[1] at async channel.consume.noAck (/var/www/lake/src/queue/consumer.js:24:9)
[1] INFO >>> closing mongo connection
[2] Publishing task to enrichment {"jira":{"boardId":8},"gitlab":{"projectId":8967944}}
[3] mongodb://lake:lakeIScoming@mongodb:27017/lake?authSource=admin
[3] INFO >>> recieve enriche job
[3] INFO >>> jira enriching board 8
[3] INFO >>> gitlab enriching { projectId: 8967944 }
[3] INFO >>> gitlab enriching project 8967944
[3] Failed to enrich TypeError: Cannot read property 'id' of undefined
[3] at enrichBoardById (/var/www/lake/src/plugins/jira-pond/src/enricher/boards.js:16:15)
[3] at processTicksAndRejections (internal/process/task_queues.js:95:5)
[3] at async Object.enrich (/var/www/lake/src/plugins/jira-pond/src/enricher/boards.js:8:3)
[3] at async Object.enrich (/var/www/lake/src/plugins/jira-pond/src/enricher/index.js:6:3)
[3] at async Object.exec [as jira] (/var/www/lake/src/plugins/jira-pond/index.js:26:7)
[3] at async Promise.all (index 0)
[3] at async jobHandler (/var/www/lake/src/enrichment/worker.js:20:5)
[3] at async channel.consume.noAck (/var/www/lake/src/queue/consumer.js:24:9)
[3] INFO >>> closing mongo connection
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=11

Too many commits seem to be pulled in for gitlab pond collector

async collectByProjectId (db, projectId, forceAll) {
  const commitsCollection = await findOrCreateCollection(db, collectionName)
  for await (const commit of fetcher.fetchPaged(`projects/${projectId}/repository/commits?all=true&with_stats=true`)) {
    commit.projectId = projectId
    console.log('JON >>> saving commit to mongo')
    await commitsCollection.findOneAndUpdate(
      { id: commit.id },
      { $set: commit },
      { upsert: true }
    )
  }
}

I sent a post with:

{
    "jira": {
        "boardId": 8
    },
    "gitlab": {
        "projectId": 8967944
    }
}

This is for vdev.co

Screen Shot 2021-07-30 at 2 37 29 PM

I expected to see only 15k commits in my mongo db but I saw many more. I stopped the process when it hit 350 pages.

Screen Shot 2021-07-30 at 2 38 13 PM

EXPECTED:

  • I should get the exact number of commits in the repo stored in mongo

Questions

  • Is this maybe all commits from all branches, even if they are not merged?
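That seems likely: per GitLab's repository commits API, the all=true parameter in the snippet above returns commits from every ref, while ref_name restricts the listing to a single branch. A hedged sketch of a URL builder that makes the choice explicit (commitsPath is a hypothetical helper name):

```javascript
// Sketch: build the commits endpoint path. Per GitLab's repository
// commits API, `all=true` returns commits from every ref, while
// `ref_name` limits the listing to one branch — likely the difference
// observed above. `commitsPath` is a hypothetical helper.
function commitsPath (projectId, { refName, withStats = true } = {}) {
  const params = new URLSearchParams({ with_stats: String(withStats) })
  if (refName) {
    params.set('ref_name', refName)   // only this branch
  } else {
    params.set('all', 'true')         // every ref, merged or not
  }
  return `projects/${projectId}/repository/commits?${params}`
}
```

Passing the project's default branch as refName should bring the count back in line with what the project page reports.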

collecting Jira change log

Requirement lead time should be broken down by issue status, so we can find out in which state a requirement stays the longest.
The breakdown metrics will also be mentioned in Jinglei's demo.

Grafana should not use the enrichment database

Hi, all

I was checking out the collected data the other day, and the lake database looked like this:
[screenshot]

It's kind of messy. I think it was caused by Grafana using the same PG database as our enrichment.
Actually, Grafana doesn't require a database to run; we can simply remove the environment variables of the grafana service from docker-compose.yml.
Or, we can use another database for Grafana only.

Regards
Klesh Wong

Running commands in bulk against mongo

When we insert many records at once, or when we want to iterate over collections to enrich data, we need to run many promises. One way to avoid awaiting them one by one is with Promise.all():

const promises = []
// for each issue, queue its changelog inserts without awaiting them one by one
console.log('JON >>> changelog', changelog)
for (const change of changelog.values) {
  // todo: we need to add our own primary key
  // todo: only update based on the primary key
  promises.push(changelogCollection.insertOne({
    issueId: issue.id,
    ...change
  }))
}
await Promise.all(promises)

But the downside is that if one promise rejects, Promise.all() rejects immediately, even though the remaining operations keep running in the background.

Having said that, what happens when one fails? Do we try to do it again later?
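Promise.allSettled (Node 12.9+) addresses both concerns: no single rejection short-circuits the batch, and the rejected entries come back labeled so they can be retried later. A sketch, where partitionSettled and runBatch are hypothetical helper names:

```javascript
// Sketch: Promise.allSettled keeps partial failures from sinking the
// batch, and failed entries can be queued for a later retry.
// `partitionSettled` and `runBatch` are hypothetical helpers.
function partitionSettled (results) {
  return {
    ok: results.filter(r => r.status === 'fulfilled').map(r => r.value),
    failed: results.filter(r => r.status === 'rejected').map(r => r.reason)
  }
}

async function runBatch (promises) {
  const { ok, failed } = partitionSettled(await Promise.allSettled(promises))
  // callers can re-enqueue `failed` instead of redoing the whole batch
  return { ok, failed }
}
```

For pure insert workloads, the mongodb driver's collection.bulkWrite is another option worth considering, since it sends many writes in a single round trip instead of one promise per document.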
