varangian's Issues

MVP: prototype run of training, prediction and issue creating on new code

Is your feature request related to a problem? Please describe.
As a Product Owner, I want to see a manual demo of the monolith, so that new issues are created based on new code commits of one specific repository.

High-level Goals

  • show interaction between model and bot
  • new issues on new code
  • automated issue creation

Describe the solution you'd like
TBA

Describe alternatives you've considered
TBA

Additional context
TBA

Acceptance Criteria
TBA

Determine the contents of a sample issue

A first draft of the contents and a sample bug trace are given below.

We will probably need multiple iterations and feedback from users before finalizing the content.

Issue Contents:

Subject:

Infer bug type: UNINITIALIZED_VALUE

Location: /openssl/src/test/dtls_mtu_test.c:100

Description: The value read from mtus[_] was never initialized.

The buggy line: 100. > for (s = mtus[0]; s <= mtus[29]; s++) {

Confidence:

Priority Rank:

Link to the full trace.


Sample Bug Trace:

/openssl/src/test/dtls_mtu_test.c: 100 : error: UNINITIALIZED_VALUE

The value read from mtus[_] was never initialized.
Showing all 1 steps of the trace

/gpfs/automountdir/r92gpfs02/zhengyu/work/ai4code/benchmarks/openssl/src/test/dtls_mtu_test.c:100:10:
98. * that size and see what actual record size we end up with.
99. */
100. > for (s = mtus[0]; s <= mtus[29]; s++) {
101. size_t reclen;
102.
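
The draft issue contents above could be rendered from a single bug record. The following is a minimal sketch, not the project's implementation: the `BugRecord` class, its field names, and `render_issue` are hypothetical, and fields that are still undecided in the draft (confidence, priority rank, trace link) are printed as "TBA".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugRecord:
    """Hypothetical container for one reported bug (field names are assumptions)."""
    bug_type: str
    file: str
    line: int
    description: str
    buggy_line: str
    confidence: Optional[float] = None
    priority_rank: Optional[int] = None
    trace_url: Optional[str] = None


def render_issue(bug: BugRecord) -> str:
    """Render the draft issue contents as plain text, one field per line."""
    def show(value) -> str:
        # Fields not yet produced by the pipeline are shown as TBA.
        return "TBA" if value is None else str(value)

    lines = [
        f"Infer bug type: {bug.bug_type}",
        f"Location: {bug.file}:{bug.line}",
        f"Description: {bug.description}",
        f"The buggy line: {bug.line}. > {bug.buggy_line}",
        f"Confidence: {show(bug.confidence)}",
        f"Priority Rank: {show(bug.priority_rank)}",
        f"Link to the full trace: {show(bug.trace_url)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = BugRecord(
        bug_type="UNINITIALIZED_VALUE",
        file="/openssl/src/test/dtls_mtu_test.c",
        line=100,
        description="The value read from mtus[_] was never initialized.",
        buggy_line="for (s = mtus[0]; s <= mtus[29]; s++) {",
    )
    print(render_issue(sample))
```

Keeping the record separate from the rendering should make it easier to iterate on the issue layout after user feedback without touching the code that reads the analyzer output.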

Determine a candidate list of projects for the initial run

There was some discussion on this during the last meeting.

It was mentioned that we need a candidate list of projects which are:

  1. Active and "healthy"
  2. High number of users

Once we have the candidate list, we can pick the project best suited for differential analysis and begin.

Projects we worked with so far: Libtiff, Nginx, HTTPD, OpenSSL, Libav, FFMpeg

Tasks List for first run of monolith prototype

Tasks for creation of the monolith prototype and its first run:

  • Determine target projects
  • Training data split
    • Temporal slicing of commits into training, dev splits [Burn/Saurabh]
  • Manual Verification of model output
  • ML
    • Feature Engineering:
      • Incorporate Yunchung's features in Feature Extractor [Burn]
      • Check if new features improve performance [Saurabh]
      • Add new features to feature extractor [Saurabh]
    • Separate training and inference [Burn]
    • Separate Voting from training [Burn]
  • C-BERT:
    • Create new train file for prototype [Luca]
    • Create tokenization script [Saurabh]
    • Inference mode tokenization [Luca/Saurabh]
    • Write script for artifacts package creation [Luca]
    • Run experiments to pick the right C-BERT model for prototype [Luca/Saurabh]
  • Engineering
    • Prepare system diagram [Saurabh]
    • Formalize output across components [Saurabh/Burn/Luca]
    • Create Infer output code extractor [Burn]
    • Write Python code for first run
    • Integration run
    • Model selector after training [Saurabh]
    • Inference output selector [Saurabh]
    • Get metrics at bug level
  • #13
    • Provide libtiff data to Kevin [Saurabh/Burn]
    • Bot Development @KPostOffice
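
The "temporal slicing of commits into training, dev splits" task in the list above could look roughly like this. It is a sketch under stated assumptions: commits are represented as dicts with hypothetical `sha` and `date` keys, and the 80/20 ratio is only an illustrative default, not a value from the project.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def temporal_split(
    commits: List[Dict], train_fraction: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Split commits by time: the oldest go to train, the newest to dev.

    Sorting by date before cutting guarantees that no dev commit is older
    than any train commit, so the model is never evaluated on the past.
    """
    ordered = sorted(commits, key=lambda c: c["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Hypothetical commit history, deliberately out of order.
    history = [
        {"sha": "c", "date": datetime(2021, 3, 1)},
        {"sha": "a", "date": datetime(2021, 1, 1)},
        {"sha": "e", "date": datetime(2021, 5, 1)},
        {"sha": "b", "date": datetime(2021, 2, 1)},
        {"sha": "d", "date": datetime(2021, 4, 1)},
    ]
    train, dev = temporal_split(history)
    print([c["sha"] for c in train], [c["sha"] for c in dev])
```

Splitting on time rather than at random mirrors how the prototype would be used: trained on past commits, then run on new ones.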

Tasks List for Libtiff pipeline

Milestone 2: Implement the AugSA inference pipeline on Libtiff (https://gitlab.com/libtiff/libtiff)

Updates to the bot

Issues created by the bot need the following updates:

  1. Title
    Title needs to be updated to: [Priority]-[Issue Type]-[Bug location]
    Where Priority is rank of the issue in the ML output.

  2. Include priority in the description

  3. Bug trace:
    Show only the first few lines, but hide the full bug trace in the markup.

  4. Bug location should point to the code so that the user can click through and examine it.

  5. If possible, create issues on actual clone of the Libtiff project so that the links work.

Edit: Edited the title to avoid markup error.
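
The requested bot updates could be sketched as follows. This is an illustration, not the bot's actual code: the function names are hypothetical, the square brackets in the title pattern are treated as placeholders, the link format assumes a GitHub-style `blob/<commit>/<path>#L<line>` URL on the clone, and the collapsed trace assumes the issue body is rendered as Markdown that accepts an HTML `<details>` element.

```python
from typing import List

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def build_title(priority: int, issue_type: str, location: str) -> str:
    """Title pattern from the request: Priority-Issue Type-Bug location."""
    return f"{priority}-{issue_type}-{location}"


def code_link(repo_url: str, commit: str, path: str, line: int) -> str:
    """Clickable link to the buggy line (GitHub-style URL is an assumption)."""
    return f"{repo_url}/blob/{commit}/{path}#L{line}"


def build_body(
    priority: int, location: str, link: str, trace: List[str], preview: int = 3
) -> str:
    """Show the priority, a linked location, a short trace preview,
    and the full trace hidden inside a collapsible section."""
    parts = [
        f"Priority: {priority}",
        f"Location: [{location}]({link})",
        "",
        "Bug trace (first lines):",
        FENCE,
        *trace[:preview],
        FENCE,
        "",
        "<details>",
        "<summary>Full bug trace</summary>",
        "",
        FENCE,
        *trace,
        FENCE,
        "",
        "</details>",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    trace = [f"step {i}" for i in range(1, 7)]  # placeholder trace lines
    link = code_link("https://example.org/owner/repo", "abc123", "src/file.c", 100)
    print(build_title(1, "UNINITIALIZED_VALUE", "src/file.c:100"))
    print(build_body(1, "src/file.c:100", link, trace))
```

Pinning the link to a commit rather than a branch keeps the line number valid even after the file changes.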

OKR and Tasks for Inference Scale Out (Milestone 3)

Milestone 3
Period: October to December 2021

OKRs

  • Objective: Scale Varangian to new OS projects 
    • Key Result: We have 6 repos by Jan 2022
    • Tasks:
      • Generate D2A V2 training data for 6 repos
      • Train models for the 6 repos
      • Automate D2A Retraining
      • Containerize and auto-start inference pipeline
      • Automate the training pipeline
      • Containerize and auto-start training pipeline. 
  • Objective: Scale models to repos with less data
    • Key Result: Improve Model Performance on cross project datasets
    • Tasks:
      • #30 Generate Ensemble baselines
      • #30 Generate C-BERT baselines
      • Improve ensemble performance:
        • #31 Feature Engineering
        • Separability, Segmentation
      • Improve C-BERT performance: More data, better model
  • Objective: Add features for better user engagement 
    • Key Result: Bot will have 5 new features
    • Tasks:
      • Tag/Assign users to issue
      • Access to full ranked list, traces
      • Issue management
        • Close resolved open issues
        • Issues marked FP are not opened again
        • Issues marked FP are reviewed
        • Duplicate issues are not opened
        • Resolved Issue Feedback to backend 
      • Aggregate issues 
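
The issue-management tasks above (no duplicates, no reopening of issues marked FP, closing resolved issues) could be planned with a small fingerprint-based filter. This is a sketch under assumptions: the bug dict keys, the status strings `"open"` and `"false_positive"`, and the choice to fingerprint on bug type, file, and procedure (rather than line number, which shifts as code changes) are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(bug: Dict) -> str:
    """Stable identifier for a bug; deliberately ignores the line number."""
    key = f'{bug["bug_type"]}|{bug["file"]}|{bug["procedure"]}'
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def plan_actions(
    detected: List[Dict], tracked: Dict[str, str]
) -> Tuple[List[Dict], List[str]]:
    """Decide which issues to open and which to close.

    detected: bugs reported by the current run.
    tracked:  fingerprint -> status of issues the bot already knows about.
    """
    # Collapsing by fingerprint drops duplicates within the current run.
    by_fp = {fingerprint(b): b for b in detected}

    # Anything already tracked (open or marked false positive) is not reopened.
    to_open = [bug for fp, bug in by_fp.items() if fp not in tracked]

    # Open issues whose bug no longer shows up are treated as resolved.
    to_close = [
        fp for fp, status in tracked.items()
        if status == "open" and fp not in by_fp
    ]
    return to_open, to_close


if __name__ == "__main__":
    a = {"bug_type": "NULL_DEREFERENCE", "file": "a.c", "procedure": "f"}
    b = {"bug_type": "UNINITIALIZED_VALUE", "file": "b.c", "procedure": "g"}
    gone = {"bug_type": "BUFFER_OVERRUN", "file": "c.c", "procedure": "h"}
    tracked = {fingerprint(b): "false_positive", fingerprint(gone): "open"}
    to_open, to_close = plan_actions([a, dict(a), b], tracked)
    print(len(to_open), len(to_close))
```

The list of fingerprints to close could also serve as the "resolved issue feedback" signal sent to the backend.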

user feedback loop

Is your feature request related to a problem? Please describe.
As a Data Scientist, I want to have a channel for user feedback, so that we can fine-tune the model

Failed to update dependencies to their latest version

Automatic dependency update failed for the current master with SHA a922988.

The automatic dependency management cannot continue. Please fix the errors reported below.

Command
  $ pipenv update --dev
Standard output

Standard error
Warning: Python 3.9 was not found on your system...
Neither 'pyenv' nor 'asdf' could be found to install Python.
You can specify specific versions of Python with:
$ pipenv --python path/to/python

Environment details

Kebechet version: 1.5.2
Python version: 3.8.6
Platform: Linux-4.18.0-305.19.1.el8_4.x86_64-x86_64-with-glibc2.2.5
pipenv version: pipenv, version 2020.11.15


Dependency graph
Unable to obtain dependency graph:

Warning: Python 3.9 was not found on your system...
Neither 'pyenv' nor 'asdf' could be found to install Python.
You can specify specific versions of Python with:
$ pipenv --python path/to/python

Notes

For more information, see Pipfile and Pipfile.lock.

Once this issue is resolved, the issue will be automatically closed by bot.

/label thoth/potential-flake
/kind bug
/priority critical-urgent

Failed to update dependencies to their latest version

Automatic dependency update failed for the current master with SHA 0b65211.

The automatic dependency management cannot continue. Please fix the errors reported below.

Command
  $ pipenv update --dev
Standard output

Standard error
Warning: Python 3.9 was not found on your system...
Neither 'pyenv' nor 'asdf' could be found to install Python.
You can specify specific versions of Python with:
$ pipenv --python path/to/python

Environment details

Kebechet version: 1.4.0
Python version: 3.8.6
Platform: Linux-4.18.0-193.56.1.el8_2.x86_64-x86_64-with-glibc2.2.5
pipenv version: pipenv, version 2020.11.15


Dependency graph
Unable to obtain dependency graph:

Warning: Python 3.9 was not found on your system...
Neither 'pyenv' nor 'asdf' could be found to install Python.
You can specify specific versions of Python with:
$ pipenv --python path/to/python

Notes

For more information, see Pipfile and Pipfile.lock.

Once this issue is resolved, the issue will be automatically closed by bot.

/label thoth/potential-flake
/kind bug
/priority critical-urgent
