# Making the power grid smarter

An Insight Data Engineering project (May 2020 session) by Kevin Yi-Wei Lin.
Power outages cause economic loss, and with the rise of working from home, the aging power grid can struggle to adjust to new demand patterns from regular households. To reduce the chance of blackouts, utility companies need a way to turn off or throttle non-essential appliances when there is a regional imbalance between power demand and supply. Instead of overhauling the grid infrastructure, utilities can exploit the trend of smart appliances and smart plugs, controlling them from the cloud through home assistants, guided by long-term and short-term metrics.
The pipeline separates fine-grained analysis (Spark) from coarse-grained analysis. The latter needs only a Python script, thanks to roll-up on ingestion and the powerful queries provided by Druid. The current setup handles at least 100k messages/s from 10k appliances, a message velocity on the level of a single power distribution station.
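To give a sense of the coarse-grained side: because Druid rolls data up on ingestion, a batch script can answer aggregate questions with a single call to Druid's SQL API. The sketch below is illustrative only; the datasource and column names (`history`, `appliance_id`, `kwh`) are assumptions, not the repo's actual schema.

```python
import requests

DRUID_SQL = "http://localhost:8888/druid/v2/sql"  # Druid router address; adjust for your cluster

# Weekly energy per appliance; datasource and column names are assumptions.
query = """
SELECT appliance_id, SUM(kwh) AS weekly_kwh
FROM history
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY appliance_id
ORDER BY weekly_kwh DESC
"""

rows = requests.post(DRUID_SQL, json={"query": query}).json()
for row in rows:
    print(row["appliance_id"], row["weekly_kwh"])
```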
```
├── batch              Batch processing Python script with Airflow
├── data               Usage of the GREEND and REDD data sets
├── database           Druid config files
├── example config     Example configuration file
├── frontend           Imply Pivot config files
├── ingestion          Kafka producer scripts
└── stream_processing  Spark Structured Streaming script
```
## Data sets

- The Reference Energy Disaggregation Data Set (REDD) [1]: the low-frequency data set was used.
- GREEND: Energy Metering Data Set [2]: version `GREEND_0-2_300615.zip` was used.

Please refer to the instructions in `/data`.
## Cluster setup

(I strongly recommend that future fellows use AWS managed clusters.)

- Kafka v2.2.1: AWS MSK, three m5.large nodes
- Spark v2.4.5: AWS EMR v5.30.1, three m5.large nodes (1 master, 2 workers) with the bootstrap action script `stream_processing/init_emr.sh`
- Druid v0.18.1: single-server "small" deployment on an i3.2xlarge
- Kafka producers: four t2.xlarge
- Batch with Airflow: t2.small
- Imply Pivot: t2.medium

pip3 requirements files are in the individual folders.
## Create Kafka Topics

Create the topics `powerraw`, `history`, and `dutycycle` with 6 partitions and a replication factor of 2.
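One way to do this is with the kafka-python admin client, sketched below (an assumption on my part; the standard Kafka CLI tools work just as well):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Replace with your MSK bootstrap brokers.
admin = KafkaAdminClient(bootstrap_servers="broker1:9092")

admin.create_topics(new_topics=[
    NewTopic(name=topic, num_partitions=6, replication_factor=2)
    for topic in ("powerraw", "history", "dutycycle")
])
```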
## Initiate Druid Datasources

Change the relevant addresses and import the specifications into datasources.
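For the streaming datasources this amounts to posting the Kafka ingestion specs in `database/` to Druid's supervisor API. A minimal sketch, assuming the specs are Kafka supervisor specs (the file name below is a placeholder):

```python
import json
import requests

DRUID = "http://localhost:8081"  # overlord/coordinator address; adjust for your cluster

with open("database/example-supervisor.json") as f:  # placeholder file name
    spec = json.load(f)

resp = requests.post(f"{DRUID}/druid/indexer/v1/supervisor", json=spec)
resp.raise_for_status()
```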
## Start Kafka Producers

- Change the relevant parameters in `config.ini`.
- Place the Python and Bash scripts, `config.ini`, and `schema.avsc` under the same directory, then run

```
./run_GREEND.sh [starting day shift] [ending day shift]
```

or

```
./run_REDD.sh [starting day shift] [ending day shift]
```

This replays the whole data set once for each day shift specified in the arguments. Do not run more than 20 playbacks on a single machine.
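For orientation, the core of such a replay producer might look like the sketch below. The config section names, the Avro encoding details, and the row-reading stub are all assumptions; the scripts in `ingestion/` are authoritative.

```python
import configparser
import io

from fastavro import schemaless_writer
from fastavro.schema import load_schema
from kafka import KafkaProducer

cfg = configparser.ConfigParser()
cfg.read("config.ini")

schema = load_schema("schema.avsc")
producer = KafkaProducer(bootstrap_servers=cfg["kafka"]["bootstrap_servers"])  # section/key names assumed


def encode(record: dict) -> bytes:
    """Serialize one reading with the Avro schema."""
    buf = io.BytesIO()
    schemaless_writer(buf, schema, record)
    return buf.getvalue()


def iter_readings():
    # Stand-in for the data-set parsing in the real scripts;
    # yielded fields must match schema.avsc (these names are assumptions).
    yield {"house_id": "1", "appliance_id": "fridge", "power_w": 120.0, "timestamp": 0}


for record in iter_readings():
    producer.send("powerraw", encode(record))
producer.flush()
```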
## Submit Spark Structured Streaming Job

- Change the relevant parameters in `config.ini`.
- Place `duty_cycle_avro.py`, `config.ini`, and `schema.avsc` under the same directory, then run

```
spark-submit --master yarn --deploy-mode client \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5,org.apache.spark:spark-avro_2.11:2.4.5 \
  duty_cycle_avro.py
```
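For readers who want the shape of the job without opening `duty_cycle_avro.py`, here is a minimal sketch. The topic names come from this README, but the record schema, the "on" threshold, and the windowing are assumptions, and JSON parsing stands in for the Avro decoding the real script does with spark-avro.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("duty_cycle_sketch").getOrCreate()

# Assumed per-reading layout; the real schema lives in schema.avsc.
schema = StructType([
    StructField("appliance_id", StringType()),
    StructField("power_w", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # replace with your brokers
       .option("subscribe", "powerraw")
       .load())

readings = (raw
            .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

# Duty cycle per appliance: fraction of readings above an assumed 5 W "on" threshold,
# over a 5-minute window sliding every minute.
duty = (readings
        .withWatermark("event_time", "1 minute")
        .groupBy(F.window("event_time", "5 minutes", "1 minute"), "appliance_id")
        .agg(F.avg((F.col("power_w") > 5.0).cast("double")).alias("duty_cycle")))

query = (duty
         .select(F.to_json(F.struct(
             F.col("window.start").alias("window_start"),
             F.col("appliance_id"),
             F.col("duty_cycle"))).alias("value"))
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("topic", "dutycycle")
         .option("checkpointLocation", "/tmp/duty_cycle_ckpt")
         .outputMode("update")
         .start())

query.awaitTermination()
```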
## Start Batch Historical Processing

- Put `druid_batch.py` and `config.ini` under `/home/ubuntu`, or another path specified in the DAG file.
- Change the relevant parameters in `config.ini`.
- Put the DAG script in the dags folder and turn it on in Airflow.
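The DAG can be as small as a single BashOperator. The sketch below assumes a daily schedule and the `/home/ubuntu` path from the step above; the DAG ID, owner, and schedule are placeholders, and the DAG file in the repo is authoritative.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "igrid",  # assumed owner name
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="druid_batch_daily",  # assumed DAG id and schedule
    default_args=default_args,
    start_date=datetime(2020, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_batch = BashOperator(
        task_id="run_druid_batch",
        bash_command="python3 /home/ubuntu/druid_batch.py",
    )
```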
## Dashboard

- Connect Pivot to the Druid datasources.
- Import the dashboard config file `frontend/dashboard-iGridDemo.json`.
## References

[1] J. Zico Kolter and Matthew J. Johnson. "REDD: A public data set for energy disaggregation research." In Proceedings of the SustKDD Workshop on Data Mining Applications in Sustainability, 2011.

[2] S. D'Alessandro, A. M. Tonello, A. Monacchi, and W. Elmenreich. "GREEND: An Energy Consumption Dataset of Households in Italy and Austria." In Proceedings of IEEE SMARTGRIDCOMM 2014, Venice, Italy, November 3-6, 2014.
## License

This project is licensed under the MIT License; see the LICENSE file for details.