
flokkr's Introduction

Docker images for Open Source bigdata/hadoop projects

Flokkr is an umbrella GitHub organization that collects all of my containerization work for Apache big data and data science projects, such as Apache Hadoop and Apache Spark.

At a high level, there are two main types of subprojects (git repositories) under this organization: containers and runtime configuration examples.

If you would like to run a single Apache big data project, open its repository and use the included docker-compose file. If you need a more sophisticated cluster that includes multiple products and different configurations, look through the runtime repositories and choose the method that suits you best.

Containers

All of the containers are based on one smart base image defined in flokkr/docker-baseimage. It contains the configuration loading scripts (based on environment variables or consul servers) and other extensions (e.g. btrace instrumentation).

For more information about the available environment variables, check the flokkr/launcher repository.

All the other containers can be found under the flokkr organization with the docker- prefix.

The containers are usually built on travis-ci and pushed to Docker Hub rather than built with Docker Hub automated builds, because of Docker Hub's limitations (for example, it is hard to generate matrix builds covering all the older versions).
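As an illustration of environment-variable-based configuration loading, here is a minimal Python sketch. It is not the actual launcher code: the naming convention (`<FILE>.XML_<key>`), the helper names `group_settings` and `render_hadoop_xml`, and the sample values are assumptions made for this example. Only the Hadoop `<configuration>`/`<property>` XML layout is standard.

```python
import os
from collections import defaultdict
from xml.sax.saxutils import escape


def group_settings(environ):
    """Group variables named '<FILE>.XML_<key>' by target file (assumed convention)."""
    files = defaultdict(dict)
    for name, value in environ.items():
        file_name, sep, key = name.partition("_")
        # Only names that look like '<something>.xml_<key>' count as configuration.
        if sep and key and file_name.lower().endswith(".xml"):
            files[file_name.lower()][key] = value
    return dict(files)


def render_hadoop_xml(settings):
    """Render key/value pairs in the Hadoop <configuration> XML layout."""
    lines = ["<configuration>"]
    for key in sorted(settings):
        lines.append("  <property>")
        lines.append(f"    <name>{escape(key)}</name>")
        lines.append(f"    <value>{escape(settings[key])}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print one rendered file per detected prefix in the current environment.
    for file_name, settings in group_settings(os.environ).items():
        print(f"--- {file_name}")
        print(render_hadoop_xml(settings))
```

With a hypothetical variable such as `CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000` set on a container, this sketch would produce a `core-site.xml` body with a single property. Check the flokkr/launcher repository for the variable names that are actually supported.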

Available images:

Repository        Product
docker-baseimage  Base image with all the configuration loading magic
docker-hadoop     Apache Hadoop components (hdfs/yarn)
docker-spark      Apache Spark components
docker-storm      Apache Storm components
docker-zookeeper  Apache Zookeeper components
docker-kafka      Apache Kafka components
docker-hbase      Apache HBase components
docker-zeppelin   Apache Zeppelin interface
docker-krb5       Highly insecure kerberos container, with an open REST api to request new kerberos keytab files.

Note: previous versions of the containers (and some that have not been migrated yet) can be found under the github.com/elek account.

Runtime examples

Creating a Docker image is easy: it takes just a few lines to download and unpack the Apache projects. The tricky part is making the containers work together: service discovery, configuration management, data locality, multi-tenancy, etc.

There are various examples of how the containers can be used, and each of them has a separate repository with the runtime- prefix.

Repository          Details
runtime-compose     docker-compose based pseudo clusters (multiple containers, but only on one host). Configuration is defined by environment variables. For development and local experiments.
runtime-consul      Multi-host real cluster with consul (for storing the configuration and docker-compose definitions) and docker-compose. Small scripts help to maintain the cluster state (restarting components on every config change). Full data locality is achieved by using the docker host network.
runtime-nomad       Multi-host real cluster with consul (for storing the configuration and docker-compose definitions) and nomad (to start the instances). Small scripts help to maintain the cluster state (restarting components on every config change). Full data locality is achieved by using the docker host network.
runtime-swarm       Similar to the previous one, but the container scheduling part is simplified with docker-compose + swarm. No host network, so no data locality. Environment variable based configuration management.
runtime-kubernetes  Kubernetes managed cluster with a kubernetes ConfigMap based configuration set.
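The "restart components on every config change" idea mentioned for the consul and nomad runtimes can be sketched as follows. This is a hedged illustration only, not the scripts from those repositories: the function names, the fingerprinting approach, and the component names are all assumptions.

```python
import hashlib


def fingerprint(config):
    """Return a stable hash of a component's key/value configuration."""
    digest = hashlib.sha256()
    for key in sorted(config):
        digest.update(f"{key}={config[key]}\n".encode("utf-8"))
    return digest.hexdigest()


def components_to_restart(previous, current):
    """List components whose configuration is new or differs from the last run.

    `previous` maps component name -> fingerprint recorded earlier;
    `current` maps component name -> its configuration dictionary.
    """
    return sorted(
        name
        for name, config in current.items()
        if previous.get(name) != fingerprint(config)
    )
```

A maintenance script along these lines would store the fingerprints after each run and restart only the components returned by `components_to_restart`, leaving unchanged services untouched.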
