
data_engineer_exercise's Introduction

XRef Report pipe

Part I - Query Executor

You should implement a Java application that:

Gets the following as an input:

  • Query with parameters (for example: :from)
  • Parameters and their values (for example: -Dfrom='2019-01-05')
  • Output format (for example: CSV)
  • Output file path

Executes the query against a provided data source and writes the result to the output file path in the provided output format.
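The -D example above suggests that parameter values arrive as JVM system properties. A minimal sketch of collecting them, assuming the parameter names are already known; the class name ParameterResolver and its method are hypothetical and not part of the exercise:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: looks up query parameter values passed as -Dname=value. */
final class ParameterResolver {
    private ParameterResolver() {}

    /** Returns name -> value in the given order; fails fast when a value is missing. */
    static Map<String, String> resolve(List<String> names, Properties props) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : names) {
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException(
                        "Missing value for parameter '" + name + "' (pass -D" + name + "=...)");
            }
            values.put(name, value);
        }
        return values;
    }

    public static void main(String[] args) {
        // Example: java -Dfrom='2019-01-05' ParameterResolver from
        System.out.println(resolve(List.of(args), System.getProperties()));
    }
}
```

Failing fast on a missing value keeps a half-bound query from ever reaching the database, which matters for an application meant to run unattended.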

  1. You should implement only PostgreSQL support as the data source, but design the code so that other data sources can be added easily.
  2. You should implement only CSV (with header) as the output format, but design the code so that other formats (for example JSON) can be added easily.
  3. Keep in mind that you may want to combine different data sources with different output formats.
  4. You are welcome to use any external library (for example JDBC/JDBI).
  5. The application should be ready to use in production.
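One way to meet requirements 1 to 3 is to keep the data source and the output format behind separate interfaces, so any source can be paired with any format. The sketch below is only an illustration: QuerySource, QueryResult, ResultWriter and CsvResultWriter are hypothetical names, and an in-memory source stands in for PostgreSQL so that the example runs without a database.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical abstraction over a data source (PostgreSQL, MySQL, ...). */
interface QuerySource {
    QueryResult execute(String sql, Map<String, String> parameters) throws Exception;
}

/** Column names plus a row iterator, so large results need not be held in memory. */
final class QueryResult {
    final List<String> columns;
    final Iterator<List<Object>> rows;

    QueryResult(List<String> columns, Iterator<List<Object>> rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

/** Hypothetical abstraction over an output format (CSV, JSON, ...). */
interface ResultWriter {
    void write(QueryResult result, Writer out) throws IOException;
}

/** CSV with a header row; quotes cells that contain commas, quotes or line breaks. */
final class CsvResultWriter implements ResultWriter {
    @Override
    public void write(QueryResult result, Writer out) throws IOException {
        out.write(joinRow(result.columns));
        while (result.rows.hasNext()) {
            out.write(joinRow(result.rows.next()));
        }
    }

    private static String joinRow(List<?> cells) {
        return cells.stream().map(CsvResultWriter::escape).collect(Collectors.joining(",")) + "\r\n";
    }

    static String escape(Object cell) {
        if (cell == null) {
            return ""; // assumption: NULL is written as an empty cell
        }
        String s = cell.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }
}

final class QueryExecutorDemo {
    public static void main(String[] args) throws Exception {
        // In-memory stand-in for a real source; a JDBC-backed QuerySource would go here.
        QuerySource source = (sql, params) -> new QueryResult(
                List.of("id", "name"),
                List.of(Arrays.<Object>asList(1, "a,b"), Arrays.<Object>asList(2, null)).iterator());
        StringWriter out = new StringWriter();
        new CsvResultWriter().write(source.execute("select 1", Map.of()), out);
        System.out.print(out);
    }
}
```

Because the two interfaces only meet at QueryResult, adding a JSON writer or a second database would not require touching the existing implementations.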

To test your code, you can use the following public database and run your code with the following query:

select * from xref where timestamp > :from;
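Plain JDBC only understands positional ? placeholders, so a named parameter such as :from has to be rewritten before the statement is prepared (JDBI can bind named parameters itself). A minimal sketch of that rewrite follows; NamedQuery is a hypothetical class, and the regular expression is a simplification that skips PostgreSQL :: casts but does not recognise colons inside string literals or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites :name placeholders into JDBC-style ? placeholders. */
final class NamedQuery {
    // A colon not preceded by another colon, followed by an identifier.
    private static final Pattern PARAM = Pattern.compile("(?<!:):([A-Za-z_][A-Za-z0-9_]*)");

    final String sql;         // query text with positional placeholders
    final List<String> names; // parameter names in order of appearance

    private NamedQuery(String sql, List<String> names) {
        this.sql = sql;
        this.names = names;
    }

    static NamedQuery parse(String query) {
        Matcher m = PARAM.matcher(query);
        StringBuilder rewritten = new StringBuilder();
        List<String> names = new ArrayList<>();
        int last = 0;
        while (m.find()) {
            rewritten.append(query, last, m.start()).append('?');
            names.add(m.group(1));
            last = m.end();
        }
        rewritten.append(query, last, query.length());
        return new NamedQuery(rewritten.toString(), List.copyOf(names));
    }

    public static void main(String[] args) {
        NamedQuery q = parse("select * from xref where timestamp > :from;");
        System.out.println(q.sql + " " + q.names);
    }
}
```

The value for each name would then be bound by position with PreparedStatement. Since values passed on the command line are strings, a value like '2019-01-05' probably needs converting to a date or timestamp type before it is compared with the timestamp column.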

Part II - XRef Report

Using Airflow, you should generate an hourly CSV with new records in the xref table and send the file as an email.

High level steps

  1. (Optional) Fork this repository, clone it and use it to implement the next steps (it should help you)
  2. Install Airflow docker
  3. Implement the Airflow DAG xref_pipe_dag.py with 2 tasks and their dependencies:
     • query_executor - Generates an hourly CSV with the new records in the xref table
     • xref_report - Sends the file generated by query_executor by email (to a fake email address) (use the following if you want)
  4. Share your solution using GitHub (data_engineer_exercise + Java code)

Installation instruction (Optional)

To keep this task from becoming too complex, we have provided a few steps that should help you prepare the Airflow environment:

  1. Fork this repository and clone it
  2. Install Docker
  3. Install airflow docker with LocalExecutor and Java:
cd <PATH_TO_DATA_ENGINEER_EXERCISE_FOLDER>
docker-compose -f ./airflow/docker-compose-LocalExecutor.yml up -d

To install Java, we extended the puckel/docker-airflow:1.10.2 image. The original Docker image can be found here if you want to read about it (you don't really need to, but it might help if you run into problems).

  4. Go to http://localhost:8080 and make sure that example_dag.py and its task print "Hello World!"
  5. Now you can implement the tasks and create the new DAG
