The Data Science Pipeline Documentation can be found here.
0.5 - 2021-07-16
- Removed the containers and other resources that have moved to Kubernetes, which now runs all the containers.
- Replaced the Spark Worker VMs with Azure Databricks.
- Replaced the Spark Postgres Facts DB with a Redis cache for improved performance.
- Fixed bugs that prevented deployment to Azure US Government Cloud.
0.4.0 - 2021-01-29
- Added Terraform remote state
- Separated infrastructure resources from application resources
- Moved Ansible playbooks to separate repository
- Created a DevOps Docker container
0.3.0 - 2020-12-09
- Added documentation on Administration, Deployment, and FAQ.
- Improved the resiliency of the Spark worker nodes by auto-starting Hadoop and YARN on reboot.
- Updated Spark to 3.0.
- Locked down SSH access to the VMs.
- Grafana now stores its information in Consul.
- Updated existing Grafana dashboards and added new ones.
0.2.2 - 2020-10-29
- Disable anonymous access to the Mosquitto MQTT broker
- Synchronize Event Hubs creation with creation of their access rules
- Add Prometheus server and exporters
- Add pipeline health and status Grafana dashboards
- Store Jupyter Notebook password in Consul
0.2.1 - 2020-10-07
- Remove analytics job specifics from pipeline repo
- Tweak HDFS settings to allow multiple jobs on YARN
- Deployment fixes
0.2.0 - 2020-09-14
- Deploy three worker nodes managed by YARN
- Deploy Consul server and use it for storing deployment facts
- Integrate Grafana visualizations for all data topics
0.1.0 - 2020-05-11
- Initial release