deinspanjer / bugzilla_etl

This project forked from mozilla-metrics/bugzilla_etl

Reconstructs bug versions from bugzilla history and stores them in ElasticSearch

Shell 10.63% JavaScript 89.37%

Bugzilla ETL

Notice: This ETL is no longer used - active development has moved to https://github.com/klahnakoski/Bugzilla-ETL.

A set of Pentaho DI jobs that extract bug versions from a Bugzilla database and store them in an Elasticsearch index. This ETL drives dashboards over BMO (bugzilla.mozilla.org) data for various teams at Mozilla Corporation.

Requirements

  • an Elasticsearch cluster where you can create, read, update, and delete the index bugs
  • a working PDI (a.k.a. Kettle) installation (the free community edition should work fine). Tested with PDI CE 4.3.

Minimal instructions

  • Clone this project into a local directory

  • Configure the elasticsearch indexes (put a cluster node in place of localhost):

  • Configure Pentaho DI:

    • add a directory .kettle in your $KETTLE_HOME
    • there, create a file kettle.properties
    • in that file, add settings for bugs_db_host, bugs_db_port, bugs_db_user, bugs_db_pass and bugs_db_name for your Bugzilla database connection
    • add settings for ES_NODES, ES_CLUSTER, ES_INDEX
  • If necessary, modify bin/import_bugs.sh, then run it to import the full data set.

  • Later on, use bin/update_bugs_incr.sh to read incremental modifications from the MySQL database
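
The Pentaho DI configuration step above can be sketched as a small shell script. This is only an illustration: the property names come from the list above, but every value is a placeholder, the exact format expected for ES_NODES is not documented here, and falling back to $HOME when KETTLE_HOME is unset is an assumption.

```shell
#!/bin/sh
# Sketch: create $KETTLE_HOME/.kettle/kettle.properties with placeholder values.
set -e

# Assumption: use $HOME when KETTLE_HOME is not set.
KETTLE_HOME="${KETTLE_HOME:-$HOME}"
PROPS="$KETTLE_HOME/.kettle/kettle.properties"

mkdir -p "$KETTLE_HOME/.kettle"

if [ -e "$PROPS" ]; then
    # Never overwrite an existing configuration.
    echo "$PROPS already exists; leaving it untouched" >&2
else
    cat > "$PROPS" <<'EOF'
# Bugzilla database connection (all values are placeholders)
bugs_db_host=localhost
bugs_db_port=3306
bugs_db_user=bugzilla_reader
bugs_db_pass=changeme
bugs_db_name=bugs

# Elasticsearch settings (placeholders; the ES_NODES format is an assumption)
ES_NODES=localhost
ES_CLUSTER=elasticsearch
ES_INDEX=bugs
EOF
fi
```

Replace the placeholder values with those of your own database and cluster before running bin/import_bugs.sh.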

Known issues

  • Some cases where a user's Bugzilla ID changes partway through a bug's history cannot be handled automatically; these should be added to configuration/kettle/bugzilla_aliases.txt. Several alias-related scripts and transformations help to detect such changes: see bin/find_aliases.sh, bin/find_all_aliases.sh, transformations/find_aliases.ktr, and transformations/detect_new_aliases.ktr.
  • Mozilla Bug 804946 causes some trouble with the ETL. See Bug 804961 for details.
