Insight Data Engineering Problem
The problem statement can be found here.
This document explains how to run the donation analytics. The backend program is written in Python v3.6.
The program uses numpy, a fundamental scientific computing package, which must be installed if it is not already on the system. The other modules it uses (re, sys and datetime) are part of the Python standard library.
If numpy is not installed locally, install it with:
$ python -m pip install --user numpy
The directory structure for the repo is like this:
├── README.md
├── run.sh
├── src
│   ├── donation-analytics.py
│   └── validate.py
├── input
│   ├── percentile.txt
│   └── itcont.txt
├── output
│   └── repeat_donors.txt
└── insight_testsuite
    ├── run_tests.sh
    └── tests
        ├── test_1
        │   ├── input
        │   │   ├── percentile.txt
        │   │   └── itcont.txt
        │   └── output
        │       └── repeat_donors.txt
        ├── test_2_date_format_validation
        └── test_3_rounding_for_percentile
There are two files in the src folder, donation-analytics.py and validate.py.
validate.py is a utility file that contains:
- a function validateDate that checks whether the date is in the required format
- another function malformed that checks whether any required field in the record is malformed, or whether the donation is from an organization instead of an individual
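The two checks described above could look roughly like the sketch below. It is not the code from validate.py: the function names is_valid_date and is_malformed, their signatures, and every rule they apply are assumptions made for illustration. In particular, the sketch assumes dates are written as MMDDYYYY, that a usable zip code starts with five digits, and that a non-empty "other ID" field marks a donation from an organization.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """Return True if text is an 8-digit MMDDYYYY string naming a real date.

    The MMDDYYYY layout is an assumption; the README only says
    "required format".
    """
    if not re.fullmatch(r"[0-9]{8}", text):
        return False
    month, day, year = int(text[:2]), int(text[2:4]), int(text[4:])
    try:
        datetime(year, month, day)  # raises ValueError for e.g. month 13
    except ValueError:
        return False
    return True


def is_malformed(recipient, name, zip_code, date, amount, other_id):
    """Return True if the record should be skipped (all rules are assumed)."""
    if other_id.strip():
        # Assumed: a non-empty other_id means the donor is an organization.
        return True
    if not recipient.strip() or not name.strip():
        return True
    if not re.match(r"[0-9]{5}", zip_code):
        # Assumed: only the first five digits of the zip code matter.
        return True
    if not is_valid_date(date):
        return True
    try:
        float(amount)
    except ValueError:
        return True
    return False
```

A caller would skip any record for which is_malformed returns True before doing further work on it.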
The donation-analytics.py file imports the malformed function from validate.py and performs the main analytics, which involves:
- reading the input contributions file
- skipping records that are malformed or that come from an organization
- writing the repeat donors file
- parsing the important fields out of every transaction tuple
- checking if the donor is a repeat donor
- emitting to a file the contributions each recipient received from repeat donors in a given zip code for the current year
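The steps listed above can be sketched end to end as follows. This is an illustration under stated assumptions, not the code in donation-analytics.py. It assumes itcont.txt is pipe-delimited with the column positions shown, that a donor is identified by name plus five-digit zip code, that a repeat donor is one already seen in an earlier calendar year, and that the percentile is computed with the nearest-rank method. The names parse_record, nearest_rank and repeat_donor_stats are hypothetical. The sketch uses only the standard library, whereas the actual program uses numpy in a way this README does not describe.

```python
import bisect
import math
from collections import defaultdict

# Assumed 0-based column positions in a pipe-delimited itcont.txt record.
CMTE_ID, NAME, ZIP_CODE, TRANSACTION_DT, TRANSACTION_AMT = 0, 7, 10, 13, 14


def parse_record(line):
    """Pull out the fields of interest from a record that already passed validation."""
    cols = line.rstrip("\n").split("|")
    return {
        "recipient": cols[CMTE_ID],
        "donor": (cols[NAME], cols[ZIP_CODE][:5]),  # assumed donor identity
        "year": int(cols[TRANSACTION_DT][4:]),      # assumed MMDDYYYY layout
        "amount": float(cols[TRANSACTION_AMT]),
    }


def nearest_rank(sorted_amounts, percentile):
    """Nearest-rank percentile of a non-empty, ascending list (assumed method)."""
    rank = max(1, math.ceil(percentile * len(sorted_amounts) / 100))
    return sorted_amounts[rank - 1]


def repeat_donor_stats(lines, percentile):
    """Return one row per repeat-donor contribution, in input order.

    Each row is (recipient, zip, year, percentile amount, running total,
    running count) for that recipient, zip code and year.
    """
    first_year = {}               # donor -> earliest year seen so far
    amounts = defaultdict(list)   # (recipient, zip, year) -> sorted amounts
    rows = []
    for line in lines:
        rec = parse_record(line)
        donor, year = rec["donor"], rec["year"]
        seen = first_year.get(donor)
        if seen is None or year < seen:
            first_year[donor] = year
        if seen is None or seen >= year:
            continue  # no contribution in an earlier year: not a repeat donor
        key = (rec["recipient"], donor[1], year)
        bisect.insort(amounts[key], rec["amount"])  # keep the list sorted
        bucket = amounts[key]
        rows.append((*key, nearest_rank(bucket, percentile), sum(bucket), len(bucket)))
    return rows
```

Keeping each bucket sorted with bisect.insort lets the percentile be read directly by index after every new contribution, at the cost of a linear-time insert.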
The test script run_tests.sh can be run from the insight_testsuite folder:
insight_testsuite~$ ./run_tests.sh
To obtain the analytics, paste the itcont.txt and percentile.txt files into the ./input folder and then run the script run.sh as follows:
$ ./run.sh