You should implement a Java application that:
Gets the following as input:
- Query with named parameters (for example: `:from`)
- Parameters and their values (for example: `-Dfrom='2019-01-05'`)
- Output format (for example: CSV)
- Output file path
Executes the query against the provided data source and writes the result to the output file path in the requested output format.
- You only need to implement PostgreSQL support as the data source, but design it so that new data sources can be added easily.
- You only need to implement CSV (with header) as the output format, but design it so that new formats (for example JSON) can be added easily.
- Keep in mind that any data source may need to be combined with any output format.
- You are welcome to use any external library (for example JDBC/JDBI).
- The application should be ready to use in production.
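One way to keep data sources and output formats pluggable and freely combinable is to put each behind a small interface. A minimal sketch, assuming an ordered column-name-to-value map per row (the names `DataSource`, `ResultFormatter`, and `CsvFormatter` are illustrative, not part of the exercise):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Executes a parameterized query and returns the rows.
// A PostgreSQL implementation (e.g. via JDBI) would be one of many.
interface DataSource {
    List<Map<String, Object>> execute(String query, Map<String, Object> params);
}

// Renders a result set into the requested output text (CSV, JSON, ...).
interface ResultFormatter {
    String format(List<String> header, List<List<Object>> rows);
}

// CSV with a header line; any formatter can be paired with any data source.
class CsvFormatter implements ResultFormatter {
    @Override
    public String format(List<String> header, List<List<Object>> rows) {
        StringBuilder sb = new StringBuilder(String.join(",", header)).append('\n');
        for (List<Object> row : rows) {
            sb.append(row.stream().map(String::valueOf)
                         .collect(Collectors.joining(","))).append('\n');
        }
        return sb.toString();
    }
}
```

With this split, a future `JsonFormatter` or a second `DataSource` implementation can be added and combined without touching the existing classes.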
To test your code you can use the following public database and execute your code with the following query:
select * from xref where timestamp > :from;
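Libraries such as JDBI bind `:from`-style placeholders for you; if you roll your own binding, the named parameters in a query like the one above can be located with a simple regex before substituting the values passed via `-D` system properties. A hedged sketch (`ParamScanner` is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds :name placeholders in a SQL string so their values
// (e.g. passed as -Dfrom='2019-01-05') can be bound before execution.
class ParamScanner {
    private static final Pattern PARAM = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)");

    static List<String> namedParams(String sql) {
        List<String> names = new ArrayList<>();
        Matcher m = PARAM.matcher(sql);
        while (m.find()) {
            names.add(m.group(1)); // the name without the leading ':'
        }
        return names;
    }
}
```

Each discovered name could then be resolved with `System.getProperty(name)` and bound as a real prepared-statement parameter, never concatenated into the SQL text.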
Using Airflow, you should generate an hourly CSV with the new records in the xref
table and send the file by email.
- (Optional) Fork this repository, clone it and use it to implement the next steps (it should help you)
- Install Airflow via Docker
- Implement an Airflow DAG `xref_pipe_dag.py` with 2 tasks and their dependencies:
  - `query_executor` - Generate an hourly CSV with the new records in the `xref` table
  - `xref_report` - Take the file that was generated by `query_executor` and send it by email (to a fake email address) (use the following if you want)
- Share your solution via GitHub (data_engineer_exercise + Java code)
To keep this task from becoming too complex, we have provided a few steps that should help you prepare the Airflow environment:
- Fork this repository and clone it
- Install Docker
- Install the Airflow Docker image with LocalExecutor and Java:
cd <PATH_TO_DATA_ENGINEER_EXERCISE_FOLDER>
docker-compose -f ./airflow/docker-compose-LocalExecutor.yml up -d
To install Java we extended the puckel/docker-airflow:1.10.2
image; the original Docker image can be found here if you want to read about it (you don't really need to, but it might help if you run into problems)
- Go to http://localhost:8080 and make sure that `example_dag.py` and its task print "Hello World!"
- Now you can implement the tasks and create the new DAG