apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

Home Page: https://devlake.apache.org/

License: Apache License 2.0

JavaScript 0.03% Shell 0.29% Dockerfile 0.21% Makefile 0.08% Go 87.22% HTML 0.02% TypeScript 9.81% CSS 0.03% Python 2.30%
data data-analysis data-engineering data-integration data-transfers devops domain-layer etl golang integration

incubator-devlake's Introduction



Apache DevLake (Incubating)



🤔 What is Apache DevLake?

Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data from DevOps tools to extract insights for engineering excellence, developer experience, and community growth.

Apache DevLake is designed for developer teams looking to make better sense of their development process and to bring a more data-driven approach to their own practices. You can ask Apache DevLake many questions regarding your development process. Just connect and query.

🎯 What can be accomplished with Apache DevLake?

  1. Your Dev Data lives in many silos and tools. DevLake brings them all together to give you a complete view of your Software Development Life Cycle (SDLC).
  2. From DORA to scrum retros, DevLake implements metrics effortlessly with prebuilt dashboards supporting common frameworks and goals.
  3. DevLake fits teams of all shapes and sizes, and can be readily extended to support new data sources, metrics, and dashboards, with a flexible framework for data collection and transformation.

👉 Live Demo

DORA Dashboard

All Dashboards

💪 Supported Data Sources

Here you can find all data sources supported by DevLake, their scopes, supported versions, and more!

🚀 Getting Started

🤓 How do I use DevLake?

1. Set up DevLake

You can set up Apache DevLake by following our step-by-step instructions to install via Docker Compose or via Helm. Please see the detailed instructions here, and ask the community if you get stuck at any point.

2. Create a Blueprint

The DevLake Configuration UI will guide you through the process (a Blueprint) of defining the data connections, data scope, transformations, and sync frequency for the data you wish to collect.

3. Track the Blueprint's progress

You can track the progress of the Blueprint you have just set up.

4. View the pre-built dashboards

Once the first run of the Blueprint is completed, you can view the corresponding dashboards.

5. Customize the dashboards with SQL

If the pre-built dashboards are limited for your use cases, you can always customize or create your own metrics or dashboards with SQL.

๐Ÿ˜ How to Contribute

Please read the contribution guidelines before making a contribution. The following docs list the resources you may need once you decide to contribute.

📄 Contributing to Documentation:

⌚ Roadmap

  • Roadmap: Detailed roadmaps for DevLake.

💙 Community

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

incubator-devlake's People

Contributors

abeizn, basicthinker, camilleteruel, coldgust, d4x1, e2corporation, hezyin, keon94, kevin-kline, klesh, leric, liangjingyang, likyh, long2ice, mappjzc, matrixji, merico-devlake, mindlesscloud, mintsweet, narrowizard, oddscenes, perhapzz, pranshu-raj, snowmoon-dev, startrekzky, thenicetgp, warren830, xgdyp, yumengwang03, zhangning10


incubator-devlake's Issues

npm run commit has some issues?

@narrowizard when I run the command npm run commit, I get this message:

hint: The '.husky/commit-msg' hook was ignored because it's not set as executable.
hint: You can disable this warning with `git config advice.ignoredHook false`.

Is this something that can be ignored? Or do you want this husky thing running?

Allow developers to not pull all data all the time

As a developer, I want to call the API URLs but only get the data that I really want for the component I am working on. I would like a config something like this:

jira: {
    host: 'https://merico.atlassian.net',
    basicAuth: 'anVzdGluLmJyYXplYXVAbWVyaWNvLmRldjpMTVlPVnNtZXhXekZlNW5sakdMY0VERDU=',
    proxy: 'http://localhost:4780',
    timeout: 15000,
    maxPagesForTest: 2,
    skipIssueCollection: false,
    skipChangelogCollection: true
  },
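One possible shape for honoring these flags inside a collector is sketched below, assuming the flag names from the config above (skipIssueCollection, skipChangelogCollection, maxPagesForTest); the helper names shouldCollect and pageLimit are hypothetical:

```javascript
// Sketch: helpers a collector could use to consult the proposed flags.
// Flag names mirror the config above; helper names are hypothetical.
function shouldCollect (jiraConfig, kind) {
  // kind is 'issue' or 'changelog'; default to collecting when a flag is absent
  if (kind === 'issue') return !jiraConfig.skipIssueCollection
  if (kind === 'changelog') return !jiraConfig.skipChangelogCollection
  return true
}

function pageLimit (jiraConfig) {
  // cap paging during local development; Infinity means "collect everything"
  return jiraConfig.maxPagesForTest || Infinity
}
```

A collector could then check shouldCollect(config, 'changelog') before starting the expensive changelog pull, and stop paging once pageLimit(config) pages have been fetched.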

differences between typescript app and javascript app

typescript branch is ts-main
javascript branch is main

collector and enrichment

  • same: add plugins to implement collector and enricher

  • different: the task executor would not care about task type.

we may implement the collector and enricher separately, but the enricher is tightly coupled to the collector.

e.g. the Jira lead-time enrichment is tightly coupled to the Jira issue and Jira issue-changelog collectors.

changes:

  • no separate collector service and enrichment service; only one queue service for task execution
  • collector and enricher would implement the same interface for task execution

todo:

  • move the collector and enrichment code from the javascript branch and implement the interface
  • compose the task execution, running enrichment after the collector finishes
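The shared-interface idea above can be sketched as follows; the class and function names (JiraCollector, JiraEnricher, runTask) are hypothetical placeholders, not the actual plugin API:

```javascript
// Sketch: collector and enricher expose the same execute() interface,
// so the task executor doesn't care about task type. Enrichment runs
// only after collection finishes. All names here are hypothetical.
class JiraCollector {
  async execute (options) { return { collected: options.boardId } }
}

class JiraEnricher {
  async execute (options) { return { enriched: options.boardId } }
}

async function runTask (options) {
  const results = []
  // stages run strictly in order: collect first, then enrich
  for (const stage of [new JiraCollector(), new JiraEnricher()]) {
    results.push(await stage.execute(options))
  }
  return results
}
```

With a single queue service, each dequeued task would simply call runTask, and the ordering guarantees that the enricher always sees the collector's output.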

source and source type

  • same: the source type is the plugin type/name; we currently have jira and gitlab
  • different: one source type can have multiple sources

The js branch temporarily uses config files to load sources. That limits each source type to a single source, and the app must be restarted for changes to take effect. Now we use a REST API and a source table to save sources, and trigger source tasks by source id.

changes:

  • one rest api service
  • add source with rest api
  • trigger source tasks with rest api

todo:

  • add rest apis referenced above

schema and migrations

  • different: use TypeORM, which is more friendly to TypeScript (see ORM Comparison: TypeORM vs Sequelize)

  • different: plugins would manage their own data schemas and the migrations

  • different: plugins would have collector schema and enrichment schema

TODO:

  • add TypeORM entities in plugins
  • add a migration step when initializing plugins in the queue service
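For plugins that want to define entities in plain JavaScript, TypeORM's EntitySchema accepts an options object instead of decorators. A hedged sketch of what a plugin-owned entity definition might look like (the entity, table, and column names here are hypothetical):

```javascript
// Sketch: a plugin-owned entity defined as a plain options object.
// In real code this object would be passed to `new EntitySchema(...)`
// from 'typeorm'; all names below are illustrative only.
const JiraIssueEntity = {
  name: 'JiraIssue',
  tableName: 'jira_issues',
  columns: {
    id: { type: 'varchar', primary: true },
    boardId: { type: 'int' },
    status: { type: 'varchar' },
    resolutionDate: { type: 'timestamp', nullable: true }
  }
}
```

Keeping the definition as data makes it easy for the queue service to collect every plugin's entities and hand them to the connection during the proposed migration step.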

why nestjs

Nest (NestJS) is a framework for building efficient, scalable Node.js server-side applications. It uses progressive JavaScript, is built with and fully supports TypeScript (yet still enables developers to code in pure JavaScript) and combines elements of OOP (Object Oriented Programming), FP (Functional Programming), and FRP (Functional Reactive Programming).

Easy to set up a monorepo; easy to add REST API, GraphQL API, WebSocket server, and microservice apps.

Dependency injection, with test tooling that uses dependency injection in the testing environment for easily mocking components.

Easy for Angular and Java Spring Boot developers to get started.

MIT-licensed open source project.

No data to calculate MR Review time

Data is missing to calculate 'MR Review time'.
No record in table 'gitlab_merge_requests' has a valid 'first_comment_time'.
The DB I connected to is 13.212.147.78:45432.

Standardize how we lint code locally

We need a solution for the linting problem. The goal is to standardize how we lint locally so we never see lint-only changes in PRs in the future.

Rebase failed due to 'create file grafana/dashboards/Home.json permission denied'

Hi, all

I was trying to rebase, and it failed with the following message:

[klesh@klesh-laptop] ~/P/m/t/lake (gitlab|โœ” )> git pull --rebase upstream main                                                                                                                                              25.064s
remote: Enumerating objects: 272, done.
remote: Counting objects: 100% (272/272), done.
remote: Compressing objects: 100% (116/116), done.
remote: Total 272 (delta 165), reused 238 (delta 142), pack-reused 0
Receiving objects: 100% (272/272), 185.07 KiB | 35.00 KiB/s, done.
Resolving deltas: 100% (165/165), completed with 22 local objects.
From https://github.com/merico-dev/lake
 * branch            main       -> FETCH_HEAD
   28d3631..aaa678c  main       -> upstream/main
First, rewinding head to replay your work on top of it...
error: unable to create file grafana/dashboards/Home.json: Permission denied
fatal: Could not detach HEAD

Possible reason: the folders were created by Docker, so they belong to root.

Decide about whether we need resolution time

There are some calculations going on in Grafana with lead times and resolution times, but we don't currently collect resolution times from Jira.

We should gather resolution times as well as lead times (currently we only collect lead times).
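Jira's REST API already exposes a resolutiondate field on each issue (alongside created), so a resolution time can be derived per issue. A minimal sketch, assuming raw Jira issue JSON as input; the helper name issueDurations is hypothetical:

```javascript
// Sketch: derive resolution time (in ms) from a raw Jira issue.
// Jira issue JSON carries fields.created and fields.resolutiondate;
// `issueDurations` is a hypothetical helper name.
function issueDurations (issue) {
  const created = Date.parse(issue.fields.created)
  const resolved = issue.fields.resolutiondate
    ? Date.parse(issue.fields.resolutiondate)
    : null
  return {
    // null means the issue is still unresolved
    resolutionTime: resolved === null ? null : resolved - created
  }
}
```

Storing this alongside lead time during enrichment would let the Grafana dashboards compute both without ad-hoc SQL.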

Hook up Jira API to get all issues data

Collect issues from JIRA and save them in raw form

  • Get a jira access token for their API
  • store that token in a gitignored config file
  • update the readme so others can get this token as well
  • Create a file like /src/collectors/jira/index.js
  • Create a file like /src/collectors/jira/issues.js
  • Call the Jira API to get a list of issues given a project id?
  • Use the file test/test-docker-compose as an example of how to connect to MongoDB
  • create a file in db/connections/mongo.js that handles all saves of raw data
  • Store the data in MongoDB locally with that file
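One possible shape for the db/connections/mongo.js helper described above, written so it takes a driver db handle rather than owning the connection; the function name saveRaw and the key scheme are assumptions, not the project's actual API:

```javascript
// Sketch of db/connections/mongo.js: a single place that handles
// saving raw collected data. The db handle is a mongodb-driver Db;
// `saveRaw` and the _id scheme below are hypothetical.
async function saveRaw (db, collectionName, record, idKeys) {
  // build a deterministic _id from the record's natural keys
  const _id = idKeys.map(k => String(record[k])).join(':')
  // upsert so re-running a collection is idempotent
  await db.collection(collectionName).replaceOne(
    { _id },
    { _id, ...record },
    { upsert: true }
  )
  return _id
}
```

Every collector would then funnel raw writes through this one function, which also satisfies the "store the token/data handling in one place" spirit of the checklist.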

Collector should write data with data schema

  • we should define the data schema
  • the data schema would simply handle some validation and the primary keys
    • for 'mongodb' the primary keys should be used to generate '_id'
    • maybe we will use some ORM later, e.g. TypeORM
  • the schema would have an abstract implementation
    • jira schemas would extend the abstract schema
    • likewise gitlab
    • and github
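The bullets above could be sketched roughly as follows; the class names and primary-key fields are hypothetical, and real validation would likely go beyond presence checks:

```javascript
// Sketch of the proposed abstract schema: validation plus primary keys,
// with '_id' generation for mongodb. All names here are hypothetical.
class BaseSchema {
  constructor (primaryKeys) { this.primaryKeys = primaryKeys }

  // for mongodb, generate '_id' from the declared primary keys
  mongoId (record) {
    return this.primaryKeys.map(k => String(record[k])).join(':')
  }

  validate (record) {
    // minimal validation: every primary key must be present
    return this.primaryKeys.every(k => record[k] !== undefined && record[k] !== null)
  }
}

// per-plugin schemas extend the abstract one
class JiraIssueSchema extends BaseSchema {
  constructor () { super(['boardId', 'issueId']) }
}
```

gitlab and github schemas would extend BaseSchema the same way, each declaring its own primary keys.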

Failed to enrich TypeError: Cannot read property 'id' of undefined

I got an error message as below:
[screenshot]

I found the spot in the source code, and I think the error may be caused by there being no data in boardsCollection.
That means the collector may have failed while syncing data.
The full logs are at the end of this issue.

[screenshot]

[0] Publishing task to collection {"jira":{"boardId":8},"gitlab":{"projectId":8967944}}
[1] mongodb://lake:lakeIScoming@mongodb:27017/lake?authSource=admin
[1] INFO >>> jira collecting { boardId: 8 }
[1] INFO >>> jira collecting board 8
[1] INFO >>> gitlab collecting { projectId: 8967944 }
[1] INFO >>> gitlab collecting project 8967944
[1] INFO >>> jira fetching data from agile/1.0/board/8 #0
[1] INFO >>> gitlab fetching data from projects/8967944 #0
[1] INFO >>> gitlab fetched data from projects/8967944
[1] INFO >>> gitlab collecting project done! 8967944
[1] INFO >>> gitlab collecting commits for project 8967944
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=1 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #1
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=1
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=2 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #2
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=2
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=3 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #3
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=3
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=4 #0
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=4
[1] INFO >>> jira fetching data from agile/1.0/board/8 #4
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=5 #0
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=5
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=6 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #5
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=6
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=7 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #6
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=7
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=8 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #7
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=8
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=9 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #8
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=9
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=10 #0
[1] INFO >>> jira fetching data from agile/1.0/board/8 #9
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=10
[1] INFO >>> gitlab fetching data from projects/8967944/repository/commits?withStats=true&per_page=100&page=11 #0
[1] Failed to collect Error: INFO >>> jira fetching data failed: retry limit exceeding
[1] at Object.fetch (/var/www/lake/src/plugins/jira-pond/src/collector/fetcher.js:33:11)
[1] at runMicrotasks (<anonymous>)
[1] at processTicksAndRejections (internal/process/task_queues.js:95:5)
[1] at async collectByBoardId (/var/www/lake/src/plugins/jira-pond/src/collector/boards.js:16:20)
[1] at async Object.collect (/var/www/lake/src/plugins/jira-pond/src/collector/boards.js:10:3)
[1] at async Object.collect (/var/www/lake/src/plugins/jira-pond/src/collector/index.js:6:3)
[1] at async Object.exec [as jira] (/var/www/lake/src/plugins/jira-pond/index.js:13:7)
[1] at async Promise.all (index 0)
[1] at async jobHandler (/var/www/lake/src/collection/worker.js:20:5)
[1] at async channel.consume.noAck (/var/www/lake/src/queue/consumer.js:24:9)
[1] INFO >>> closing mongo connection
[2] Publishing task to enrichment {"jira":{"boardId":8},"gitlab":{"projectId":8967944}}
[3] mongodb://lake:lakeIScoming@mongodb:27017/lake?authSource=admin
[3] INFO >>> recieve enriche job
[3] INFO >>> jira enriching board 8
[3] INFO >>> gitlab enriching { projectId: 8967944 }
[3] INFO >>> gitlab enriching project 8967944
[3] Failed to enrich TypeError: Cannot read property 'id' of undefined
[3] at enrichBoardById (/var/www/lake/src/plugins/jira-pond/src/enricher/boards.js:16:15)
[3] at processTicksAndRejections (internal/process/task_queues.js:95:5)
[3] at async Object.enrich (/var/www/lake/src/plugins/jira-pond/src/enricher/boards.js:8:3)
[3] at async Object.enrich (/var/www/lake/src/plugins/jira-pond/src/enricher/index.js:6:3)
[3] at async Object.exec [as jira] (/var/www/lake/src/plugins/jira-pond/index.js:26:7)
[3] at async Promise.all (index 0)
[3] at async jobHandler (/var/www/lake/src/enrichment/worker.js:20:5)
[3] at async channel.consume.noAck (/var/www/lake/src/queue/consumer.js:24:9)
[3] INFO >>> closing mongo connection
[1] INFO >>> gitlab fetched data from projects/8967944/repository/commits?withStats=true&per_page=100&page=11

Too many commits seem to be pulled in for gitlab pond collector

async collectByProjectId (db, projectId, forceAll) {
  const commitsCollection = await findOrCreateCollection(db, collectionName)
  for await (const commit of fetcher.fetchPaged(`projects/${projectId}/repository/commits?all=true&with_stats=true`)) {
    commit.projectId = projectId
    console.log('JON >>> saving commit to mongo')
    await commitsCollection.findOneAndUpdate(
      { id: commit.id },
      { $set: commit },
      { upsert: true }
    )
  }
}

I sent a post with:

{
    "jira": {
        "boardId": 8
    },
    "gitlab": {
        "projectId": 8967944
    }
}

This is for vdev.co

Screen Shot 2021-07-30 at 2 37 29 PM

I expected to see only 15k commits in my mongo db but I saw many more. I stopped the process when it hit 350 pages.

Screen Shot 2021-07-30 at 2 38 13 PM

EXPECTED:

  • I should get the exact number of commits in the repo stored in mongo

Questions

  • Is this maybe all commits from all branches, even if they are not merged?
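That seems likely: per GitLab's repository commits API, the all=true parameter in the snippet above returns commits from every ref, while ref_name restricts the listing to a single branch. A hedged sketch of a URL builder that makes the choice explicit (commitsPath is a hypothetical helper name):

```javascript
// Sketch: build the commits endpoint path. Per GitLab's repository
// commits API, `all=true` returns commits from every ref, while
// `ref_name` limits the listing to one branch — likely the difference
// observed above. `commitsPath` is a hypothetical helper.
function commitsPath (projectId, { refName, withStats = true } = {}) {
  const params = new URLSearchParams({ with_stats: String(withStats) })
  if (refName) {
    params.set('ref_name', refName)   // only this branch
  } else {
    params.set('all', 'true')         // every ref, merged or not
  }
  return `projects/${projectId}/repository/commits?${params}`
}
```

Passing the project's default branch as refName should bring the count back in line with what the project page reports.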

collecting Jira change log

Requirement lead time should be broken down by issue status, so we can find out in which state a requirement stays the longest.
The breakdown metrics will also be mentioned in Jinglei's demo.

Grafana should not use the enrichment database

Hi, all

I was checking out the collected data the other day, and the lake database looked like this:
[screenshot]

It's kind of messy. I think it was caused by Grafana using the same PG database as our enrichment.
Actually, Grafana doesn't require a database to run; we can simply remove the environment variables of the grafana service from docker-compose.yml.
Or, we can use another database for Grafana only.

Regards
Klesh Wong

Running commands in bulk against mongo

When we insert many records at once, or when we want to iterate over collections to enrich data, we need to run many promises. One way to avoid awaiting them one by one is with Promise.all():

const promises = []
// for each issue, queue its changelog inserts without awaiting them one by one
console.log('JON >>> changelog', changelog)
for (const change of changelog.values) {
  // todo: we need to add our own primary key
  // todo: only update based on the primary key
  promises.push(changelogCollection.insertOne({
    issueId: issue.id,
    ...change
  }))
}
await Promise.all(promises)

But the downside is that if one promise rejects, Promise.all() rejects immediately, even though the remaining operations keep running in the background.

Having said that, what happens when one fails? Do we try to do it again later?
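Promise.allSettled (Node 12.9+) addresses both concerns: no single rejection short-circuits the batch, and the rejected entries come back labeled so they can be retried later. A sketch, where partitionSettled and runBatch are hypothetical helper names:

```javascript
// Sketch: Promise.allSettled keeps partial failures from sinking the
// batch, and failed entries can be queued for a later retry.
// `partitionSettled` and `runBatch` are hypothetical helpers.
function partitionSettled (results) {
  return {
    ok: results.filter(r => r.status === 'fulfilled').map(r => r.value),
    failed: results.filter(r => r.status === 'rejected').map(r => r.reason)
  }
}

async function runBatch (promises) {
  const { ok, failed } = partitionSettled(await Promise.allSettled(promises))
  // callers can re-enqueue `failed` instead of redoing the whole batch
  return { ok, failed }
}
```

For pure insert workloads, the mongodb driver's collection.bulkWrite is another option worth considering, since it sends many writes in a single round trip instead of one promise per document.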
