gitbase-spark-connector

gitbase-spark-connector is a Scala library that exposes gitbase tables as Spark SQL DataFrames, so you can run scalable analysis and processing pipelines on source code.

Pre-requisites

Import as a dependency

For the moment, it is served through JitPack, where you can find examples of how to import it into your project.
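For sbt users, a dependency declaration might look like the sketch below. It assumes JitPack's usual `com.github.<user>:<repo>:<tag>` coordinate scheme and that the project lives under the `src-d` GitHub organization; `<tag>` is a placeholder, so check the JitPack page for the exact artifact name and a released tag before using it.

```scala
// build.sbt (sketch, not verified against the published artifact)
// Add JitPack as an extra resolver so sbt can fetch artifacts built from GitHub.
resolvers += "jitpack" at "https://jitpack.io"

// Assumed coordinates: replace <tag> with a released tag or commit hash.
libraryDependencies += "com.github.src-d" % "gitbase-spark-connector" % "<tag>"
```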

Usage

First of all, you need a running gitbase instance, which exposes your repositories through a SQL interface.

docker run -d --name gitbase -p 3306:3306 -v /path/to/repos/directory:/opt/repos srcd/gitbase:v0.17.0

Note that you must replace /path/to/repos/directory with the actual path to the directory containing your git repositories.

You may also need a bblfsh server, which is required for some operations on UASTs:

docker run -d --name bblfshd --privileged -p 9432:9432 bblfsh/bblfshd:v2.9.1-drivers

You can configure where gitbase and bblfsh are listening with the following environment variables:

  • BBLFSH_HOST (default: "0.0.0.0")
  • BBLFSH_PORT (default: "9432")
  • GITBASE_SERVERS (default: "0.0.0.0:3306")
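As a rough illustration of how these defaults combine, here is a hypothetical helper that resolves the three variables from an environment map. `ConnectorEndpoints`, `Endpoints` and `resolve` are illustrative names, not part of the connector's API, and the comma-separated format for `GITBASE_SERVERS` is an assumption the text above does not confirm.

```scala
// Hypothetical sketch: resolves endpoint settings using the documented defaults.
object ConnectorEndpoints {
  final case class Endpoints(bblfshHost: String, bblfshPort: Int, gitbaseServers: List[String])

  def resolve(env: Map[String, String]): Endpoints = {
    val host = env.getOrElse("BBLFSH_HOST", "0.0.0.0")
    val port = env.getOrElse("BBLFSH_PORT", "9432").toInt
    // Assumption: several gitbase servers are given as a comma-separated list of host:port.
    val servers = env.getOrElse("GITBASE_SERVERS", "0.0.0.0:3306")
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toList
    Endpoints(host, port, servers)
  }

  // Print the settings resolved from the current process environment.
  def main(args: Array[String]): Unit =
    println(resolve(sys.env))
}
```

Passing the environment in as a `Map` rather than reading `sys.env` directly keeps the resolution logic easy to test.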

Finally, register the gitbase data source and its configuration in the Spark session:

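import org.apache.spark.sql.SparkSession
import tech.sourced.gitbase.spark.util.GitbaseSessionBuilder

// Build a local Spark session and register the gitbase data source on it
val spark = SparkSession.builder().appName("test")
    .master("local[*]")
    .config("spark.driver.host", "localhost")
    .registerGitbaseSource()
    .getOrCreate()

// gitbase tables can now be loaded as regular Spark SQL tables
val refs = spark.table("ref_commits")
val commits = spark.table("commits")

// Join each reference with its commits and keep only the commit
// the reference points to (history_index 0)
val df = refs
  .join(commits, Seq("repository_id", "commit_hash"))
  .filter(refs("history_index") === 0)

// Print the results without truncating long values
df.select("ref_name", "commit_hash", "committer_when").show(false)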

Output:

+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
|ref_name                                                                       |commit_hash                             |committer_when     |
+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
|refs/heads/HEAD/015dcc49-9049-b00c-ba72-b6f5fa98cbe7                           |fff7062de8474d10a67d417ccea87ba6f58ca81d|2015-07-28 08:39:11|
|refs/heads/HEAD/015dcc49-90e6-34f2-ac03-df879ee269f3                           |fff7062de8474d10a67d417ccea87ba6f58ca81d|2015-07-28 08:39:11|
|refs/heads/develop/015dcc49-9049-b00c-ba72-b6f5fa98cbe7                        |880653c14945dbbc915f1145561ed3df3ebaf168|2015-08-19 01:02:38|
|refs/heads/HEAD/015da2f4-6d89-7ec8-5ac9-a38329ea875b                           |dbfab055c70379219cbcf422f05316fdf4e1aed3|2008-02-01 16:42:40|
+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
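To make the semantics of the query above concrete without a running gitbase or Spark, here is a plain-Scala model using in-memory collections. The case classes `RefCommit` and `Commit` are hypothetical and mirror only the columns the example uses; timestamps are kept as strings for simplicity. Filtering on `history_index` before joining gives the same rows as the inner join followed by the filter, because the predicate only touches the `ref_commits` side.

```scala
// Plain-Scala model of the DataFrame query; no Spark or gitbase involved.
object RefHeadCommits {
  final case class RefCommit(repositoryId: String, commitHash: String, refName: String, historyIndex: Long)
  final case class Commit(repositoryId: String, commitHash: String, committerWhen: String)

  /** Inner join on (repository_id, commit_hash), keeping only history_index == 0. */
  def headCommits(refs: Seq[RefCommit], commits: Seq[Commit]): Seq[(String, String, String)] = {
    // Index commits by the join key so each lookup is a map access.
    val byKey = commits.groupBy(c => (c.repositoryId, c.commitHash))
    for {
      r <- refs if r.historyIndex == 0
      c <- byKey.getOrElse((r.repositoryId, r.commitHash), Seq.empty)
    } yield (r.refName, c.commitHash, c.committerWhen)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, for illustration only.
    val refs = Seq(
      RefCommit("repo1", "aaa", "refs/heads/master", 0),
      RefCommit("repo1", "bbb", "refs/heads/master", 1)
    )
    val commits = Seq(
      Commit("repo1", "aaa", "2015-07-28 08:39:11"),
      Commit("repo1", "bbb", "2015-07-27 10:00:00")
    )
    headCommits(refs, commits).foreach(println)
  }
}
```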

Contributors

ajnavarro, erizocosmico, mcarmonaa