Thought Industries Data Engineering Code Challenge

Congratulations on making it to the code challenge step of the interview process!

This challenge is designed to test your knowledge of Python, ETL, LookML and data modeling. This readme will outline how to get started and what's expected of the two challenge components.

Getting Started

To get started with the challenge, it's strongly encouraged that you leverage GitHub by either creating a private repository from this template or creating a private fork and setting that up on your machine. As a last resort, you can download a zip file.

If using a Git fork/repository, the suggested approach is to create a copy of the problem branch called solution. You will commit and push to this branch and open a PR to the problem branch for review.

Once you have your files set up, you're ready to begin the ETL portion of the challenge.

Data Model

For both the ETL and Looker challenges, please refer to the following data model.

ETL Challenge

The ETL challenge will test your Python and ETL skills by requiring you to implement the extract and load functions of a Python library. The documentation within the code outlines what's expected.

The goal is to move data from a RethinkDB instance into a PostgreSQL instance. Your work is verified via pytest cases.
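As a rough illustration of the extract-and-load flow described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names extract_rows and load_rows, the example_items table, and its fields are hypothetical and are not the signatures documented in lib/etl.py. The standard-library sqlite3 module stands in for PostgreSQL, and a plain list of dictionaries stands in for RethinkDB documents, so that the sketch runs without any services.

```python
import sqlite3
from typing import Any, Dict, Iterable, List, Tuple


def extract_rows(documents: Iterable[Dict[str, Any]], fields: List[str]) -> List[Tuple[Any, ...]]:
    """Flatten schemaless documents into fixed-width tuples (hypothetical helper).

    Fields missing from a document become None, which maps to SQL NULL.
    """
    return [tuple(doc.get(field) for field in fields) for doc in documents]


def load_rows(conn: sqlite3.Connection, table: str, fields: List[str],
              rows: List[Tuple[Any, ...]]) -> int:
    """Insert rows in a single transaction and return how many were sent (hypothetical helper).

    The table and field names are trusted here; only the values are parameterized.
    """
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)  # sqlite3 placeholder style
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows
        )
    return len(rows)


# Stand-in source documents; the real source is a RethinkDB table.
docs = [{"id": 1, "name": "alpha"}, {"id": 2}]
fields = ["id", "name"]

# Stand-in target; the real target is a PostgreSQL table from the data model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_items (id INTEGER PRIMARY KEY, name TEXT)")

rows = extract_rows(docs, fields)
loaded = load_rows(conn, "example_items", fields, rows)
stored = conn.execute("SELECT id, name FROM example_items ORDER BY id").fetchall()
```

The point of the sketch is the shape of the work: normalize documents to a fixed column order, then insert them in one transaction. The actual functions must follow the documentation in lib/etl.py and use the drivers available in the Python container.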

The system requirements, general procedure and rules are outlined below.

System Requirements

UNIX-based environment (MacOS, Ubuntu/Linux). Windows systems should work but may require tweaks to the setup and run scripts; alternatively, the steps in those scripts can be performed manually.
Docker

Procedure

The code is written to run within a Docker Compose deployment to reduce local system dependencies. All commands to deploy and run the ETL test are in the run script.

Review and run the run script. The tests will fail, but you'll get familiar with how the system runs.
Review the code in tests/test.py and lib/etl.py to see what needs to be implemented.
Implement the necessary functions.
When confident in your solution, confirm that the run script runs successfully and all tests are passing.

If you want to keep the docker-compose deployment up so you can interactively run code in the Python container as you make changes, run docker-compose up -d. Remember that RethinkDB and Postgres need to be initialized when they are first deployed.

Rules

Do not modify or add any files other than lib/etl.py.
The functions should be implemented as outlined by the docs; do not add parameters or change return types.
Your solution can be as simple or complex as you like, so long as the tests pass.

Looker Challenge

This section will test your Looker/LookML and data modeling skills. You will be implementing code necessary to expose data within Looker, namely views, models and explores.

Unless you have a Looker instance to develop in, this will be freehand coding (a Looker instance is not required for this challenge). You are expected to follow LookML syntax to the best of your ability. If you are not familiar with LookML, please check out Looker's free course and documentation.

All code should be stored in the looker folder, with each file in its respective subfolder.

Procedure

Create one view for each table in the data model, stored in the views folder. There should be one dimension for each field, as well as a count measure.
Create a model called ti and an additional ti_shared.lkml file, both stored in the models folder. The ti model should contain includes for ti_shared.lkml and the explores. This should be accomplished using only 2 include statements. The model should also contain a connection string for a connection named postgres.
In ti_shared.lkml, use access grants to create three levels of access based on an access_level user attribute. These access levels should be called internal, company and client. Access should be additive starting from internal (internal also gets company and client access, company gets client access).
Create one explore file for each view, stored in the explore folder. Explore files should be structured so that there is one base explore (extension required) which is extended once per access level defined in ti_shared. Each explore should have joins for all related tables, with join conditions and relationship (cardinality) defined. Explore file names should match the name of the view they are based on.

For the company and client access levels, add access filters based on the company and client user attributes (mapped to the company and client dimensions of the base view). Company access needs an access filter for company, while client access needs filters for both company and client.
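The additive access rules above are easy to get backwards, so here is a small sketch that models them in plain Python rather than LookML. It assumes each grant is represented by the list of access_level values that satisfy it, which mirrors the idea of an access grant with a list of allowed values. The names LEVELS, GRANTS, REQUIRED_FILTERS and has_access are hypothetical and exist only for this illustration.

```python
from typing import Dict, List

# Levels ordered from most to least privileged, as described in the text.
LEVELS: List[str] = ["internal", "company", "client"]

# For each grant, the access_level values that satisfy it. Each grant admits
# its own level plus every more privileged level, which makes access additive.
GRANTS: Dict[str, List[str]] = {
    level: LEVELS[: index + 1] for index, level in enumerate(LEVELS)
}

# User attributes that must be applied as access filters at each level.
REQUIRED_FILTERS: Dict[str, List[str]] = {
    "internal": [],
    "company": ["company"],
    "client": ["company", "client"],
}


def has_access(user_level: str, grant: str) -> bool:
    """Return True if a user with the given access_level satisfies the grant."""
    return user_level in GRANTS.get(grant, [])


# An internal user passes every grant; a client user passes only the client grant.
internal_passes = [grant for grant in LEVELS if has_access("internal", grant)]
client_passes = [grant for grant in LEVELS if has_access("client", grant)]
```

Read this way, the client grant is the most permissive (every level satisfies it) and the internal grant is the most restrictive, while the filters get stricter as the level becomes less privileged.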

Review

It's strongly encouraged that you start the challenge early and reach out early and often for feedback and help. You should approach this challenge as you would a normal work task.

If you're using a GitHub repository, opening a PR for your solution branch into the problem branch is the easiest way to review. When you're ready to start sharing work, add mgirard772 ([email protected]) as a collaborator, then add them as a reviewer for the PR.

If you're not using version control, then you can zip up your work and share via email.
