Congratulations on making it to the code challenge step of the interview process!
This challenge is designed to test your knowledge of Python, ETL, LookML, and data modeling. This README outlines how to get started and what's expected for the two challenge components.
To get started with the challenge, it's strongly encouraged that you leverage GitHub by either creating a private repository from this template or creating a private fork and setting that up on your machine. As a last resort, you can download a zip file.
If using a Git fork/repository, the suggested approach is to create a copy of the `problem` branch called `solution`. You will commit and push to this branch and open a PR to the `problem` branch for review.
Once you have your files set up, you're ready to begin the ETL portion of the challenge.
For both the ETL and Looker challenges, please refer to the following data model.
The ETL challenge will test your Python and ETL skills by requiring you to implement the extract and load functions of a Python library. The documentation within the code outlines what's expected.
The goal is to move data from a RethinkDB instance into a PostgreSQL instance. Your work is verified via pytest cases.
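The exact interfaces are defined in `lib/etl.py`, but as a rough illustration of the load side (the function, table, and column names below are hypothetical, not the challenge's API), building a parameterized INSERT from an extracted document might look like:

```python
def build_insert(table, row):
    """Build a parameterized INSERT statement for one extracted document.

    `table` and `row` are hypothetical inputs; the real challenge defines
    its own schema, and real code would execute the result via a driver
    such as psycopg2 rather than print it.
    """
    cols = sorted(row)  # deterministic column order
    placeholders = ", ".join(["%s"] * len(cols))
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    return sql, [row[c] for c in cols]


# Example: a document pulled from RethinkDB becomes a statement + params
sql, params = build_insert("users", {"id": 1, "name": "Ada"})
print(sql)     # INSERT INTO users (id, name) VALUES (%s, %s)
print(params)  # [1, 'Ada']
```

Parameterized statements like this keep the load step safe from quoting issues regardless of what the source documents contain.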
The system requirements, general procedure and rules are outlined below.
- UNIX-based environment (macOS, Ubuntu/Linux). Windows systems should work, but may require tweaks to the setup and run scripts; these steps can also be performed manually.
- Docker
The code is written to be run within a docker compose deployment to reduce local system dependencies.
All commands to deploy and run the ETL test are in the `run` script.
- Review and run the `run` script. The tests will fail, but you'll get familiar with how the system runs.
- Review the code in `tests/test.py` and `lib/etl.py` to see what needs to be implemented.
- Implement the necessary functions.
- When confident in your solution, confirm that the `run` script runs successfully and all tests are passing.
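For a rough sense of what verification amounts to (the real assertions live in `tests/test.py` and may differ; this helper is purely illustrative), the tests presumably compare what was extracted from RethinkDB against what landed in Postgres:

```python
def rows_match(source_docs, loaded_rows, key="id"):
    """Hypothetical check: source documents and loaded rows agree when
    indexed by a key column. The actual pass/fail criteria are defined
    by the pytest cases in tests/test.py."""
    src = {doc[key]: doc for doc in source_docs}
    dst = {row[key]: row for row in loaded_rows}
    return src == dst


# A loaded row matching its source document passes; a missing row fails.
assert rows_match([{"id": 1, "x": 2}], [{"id": 1, "x": 2}])
assert not rows_match([{"id": 1, "x": 2}], [])
```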
If you want to keep the docker-compose deployment up so you can interactively run code in the Python container as you make changes, just run `docker-compose up -d`. Just remember that RethinkDB and Postgres need initialization when initially deployed.
- No modifying or adding of any files except `lib/etl.py`.
- The functions should be implemented as outlined by the docs: no adding of parameters or changing of return types.
- Your solution can be as simple or complex as you like, so long as the tests pass.
This section will test your Looker/LookML and data modeling skills. You will be implementing code necessary to expose data within Looker, namely views, models and explores.
Unless you have a Looker instance to develop in, this will be free hand coding (a Looker instance is not required for this challenge). You are expected to follow LookML syntax to the best of your ability. If you are not familiar with LookML, please check out their free course and documentation.
All code should be stored in the `looker` folder, in the respective subfolders.
- Create one view for each table in the data model, stored in the views folder. There should be one dimension for each field, as well as a count measure.
- Create a model called `ti` and an additional `ti_shared.lkml` file, stored in the models folder. The `ti` model should contain includes for `ti_shared.lkml` and the explores; this should be accomplished using only 2 include statements. The model should also contain a connection string for a connection named `postgres`.
- In `ti_shared.lkml`, use access grants to create three levels of access based on an `access_level` user attribute. These access levels should be called internal, company and client. Access should be additive starting from internal (internal also gets company and client access; company gets client access).
- Create one explore file for each view, stored in the explore folder. Explore files should be structured so that there is one base explore (extension required) which is extended once per access level defined in `ti_shared`. Each explore should have joins for all related tables, with join conditions and relationship (cardinality) defined. Explore file names should match the names of the views they are based on.
For the company and client access levels, add access filters for the company and client user attributes (mapped to the company and client dimensions of the base view). Company access needs an access filter for company, while client access needs filters for both company and client.
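To illustrate the intended shape only (the view name `orders` and its fields are made up here; the real views come from the data model above), the pieces might look roughly like:

```lookml
# views/orders.view.lkml -- one dimension per field, plus a count measure
view: orders {
  sql_table_name: public.orders ;;

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: company {
    type: string
    sql: ${TABLE}.company ;;
  }

  measure: count {
    type: count
  }
}

# models/ti_shared.lkml -- additive access grants keyed on access_level
access_grant: can_see_internal {
  user_attribute: access_level
  allowed_values: ["internal"]
}

access_grant: can_see_company {
  user_attribute: access_level
  allowed_values: ["internal", "company"]
}

access_grant: can_see_client {
  user_attribute: access_level
  allowed_values: ["internal", "company", "client"]
}

# explores/orders.explore.lkml -- base explore extended per access level
explore: orders_base {
  extension: required
  # joins for related tables, with sql_on and relationship, go here
}

explore: orders_client {
  extends: [orders_base]
  required_access_grants: [can_see_client]
  access_filter: {
    field: orders.company
    user_attribute: company
  }
  access_filter: {
    field: orders.client
    user_attribute: client
  }
}
```

Note how `can_see_client` lists all three attribute values, which is what makes access additive: internal and company users also satisfy the client-level grant.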
It's strongly encouraged that you start the challenge early and reach out early and often for feedback and help. You should approach this challenge as you would a normal work task.
If you're using a GitHub repository, opening a PR from your solution branch into the problem branch is the easiest way to review. When you're ready to start sharing work, add mgirard772 ([email protected]) as a collaborator, then add them as a reviewer for the PR.
If you're not using version control, then you can zip up your work and share via email.