
buds-lab / the-building-data-genome-project

A collection of non-residential buildings for performance analysis and algorithm benchmarking

Home Page: http://www.buildingdatagenome.org

License: MIT License

Check out the Building Data Genome 2, the latest version that supersedes this one: https://github.com/buds-lab/building-data-genome-project-2

  • Does your data science technique actually scale across hundreds of buildings?
  • Is it actually faster or more accurate?

These are questions that researchers should ask when developing data-driven methods. Building performance prediction, classification, and clustering algorithms are becoming an essential part of analysis for anomaly detection, control optimization, and demand response. But how do we actually compare each individual technique against previously created methods?

The time-series data mining community identified this problem as early as 2003: “Much of this work has very little utility because the contribution made”...“offer an amount of improvement that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details.” (Keogh, E. and Kasetty, S.: On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 7(4):349–371, Oct. 2003.)

They created a time-series benchmarking data set that enables testing of new techniques on an assortment of real-world data sets. For commercial building data, we are doing the same!

The Need for a Benchmark Data Set for Non-residential Building Data Analytics

Most existing building performance data science studies rely on each researcher creating their own methods, finding a case-study data set, and determining efficacy on their own. Not surprisingly, most of those researchers find positive, yet questionably meaningful, results.

[Figure: the old way]

Using a large, consistent benchmark data set from hundreds (or thousands) of buildings, a researcher can determine how well their methods actually perform across a heterogeneous data set. If multiple researchers use the same data set, then there can be meaningful comparisons of accuracy, speed, and ease of use.

[Figure: the new way]
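To make the cross-building comparison concrete, the sketch below scores one forecasting baseline on every building and summarises the spread of errors instead of reporting a single case study. It is a minimal sketch under stated assumptions: each building's hourly readings are a plain Python list, the seasonal-naive baseline (predict each hour from the same hour one day earlier) and the CV(RMSE) metric (a goodness-of-fit measure also used in ASHRAE Guideline 14) are illustrative choices rather than a protocol prescribed by this project, and all function names are hypothetical.

```python
import math
import statistics


def cv_rmse(actual, predicted):
    """Coefficient of variation of the RMSE, in percent (lower is better)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mean = statistics.fmean(actual)
    if mean == 0:
        raise ValueError("CV(RMSE) is undefined when the mean reading is zero")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * math.sqrt(mse) / mean


def seasonal_naive(series, lag=24):
    """Hypothetical baseline: predict each hour with the reading `lag` hours earlier.

    Returns aligned (predicted, actual) lists; lag=24 means "same hour yesterday".
    """
    if len(series) <= lag:
        raise ValueError("series must be longer than the lag")
    return series[:-lag], series[lag:]


def benchmark(buildings, lag=24):
    """Score the baseline on every building, then summarise across buildings."""
    scores = {}
    for building_id, series in buildings.items():
        predicted, actual = seasonal_naive(series, lag)
        scores[building_id] = cv_rmse(actual, predicted)
    values = sorted(scores.values())
    summary = {
        "n_buildings": len(values),
        "median_cv_rmse": statistics.median(values),
        "worst_cv_rmse": values[-1],
    }
    return scores, summary
```

Evaluating a new method means replacing `seasonal_naive` while keeping the loop and the summary fixed; reporting the median and worst case across hundreds of buildings, rather than one building's score, is what exposes the variance Keogh and Kasetty describe.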

Introducing the Building Data Genome Project

It is an open data set from 507 non-residential buildings that includes hourly whole-building electrical meter data for one year. Each building has meta data such as floor area, weather, and primary use type. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool for practicing with measured performance data from large numbers of non-residential buildings. The charts below illustrate the breakdown of the buildings according to location, building industry, sub-industry, and primary use type.

[Figure: meta data breakdown of the buildings]
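The meta data is what allows results to be grouped the way the charts are. The following sketch counts buildings per category of one meta data column using only the standard library. The inline rows are made-up placeholders, and the column names (`uid`, `industry`, `subindustry`, `primaryspaceusage`) are assumptions; check the header of the actual meta data file in /data/raw/ before relying on them.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; the column names are assumptions, so check the
# header of the real meta data file before reusing them.
META_CSV = """uid,industry,subindustry,primaryspaceusage
bldg_1,Education,College/University,Office
bldg_2,Education,College/University,College Laboratory
bldg_3,Education,College/University,Office
bldg_4,Commercial Property,Commercial Real Estate,Office
"""


def load_meta(text):
    """Parse meta data CSV text into a list of dicts, one per building."""
    return list(csv.DictReader(io.StringIO(text)))


def breakdown(meta_rows, column):
    """Count how many buildings fall into each category of one meta data column."""
    return Counter(row[column] for row in meta_rows)


if __name__ == "__main__":
    meta = load_meta(META_CSV)
    for column in ("industry", "subindustry", "primaryspaceusage"):
        print(column, breakdown(meta, column).most_common())
```

The same per-category grouping can be reused to report benchmark scores separately for each primary use type.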

Please contribute new data sets or provide analysis examples in Jupyter or R Markdown using the data.

Citation of the Data Set

Clayton Miller, Forrest Meggers, The Building Data Genome Project: An open, public data set from non-residential building electrical meters, Energy Procedia, Volume 122, September 2017, Pages 439-444, ISSN 1876-6102, https://doi.org/10.1016/j.egypro.2017.07.400.

BibTeX:
@article{Miller2017439,
title = "The Building Data Genome Project: An open, public data set from non-residential building electrical meters",
journal = "Energy Procedia",
volume = "122",
pages = "439--444",
year = "2017",
note = "{CISBAT} 2017 International Conference -- Future Buildings \& Districts -- Energy Efficiency from Nano to Urban Scale",
issn = "1876-6102",
doi = "10.1016/j.egypro.2017.07.400",
url = "http://www.sciencedirect.com/science/article/pii/S1876610217330047",
author = "Clayton Miller and Forrest Meggers",
keywords = "Open Data, Non-Residential Building Meter Data, Benchmark Data Set, Big Data, Machine Learning",
abstract = "As of 2015, there are over 60 million smart meters installed in the United States; these meters are at the forefront of big data analytics in the building industry. However, only a few public data sources of hourly non-residential meter data exist for the purpose of testing algorithms. This paper describes the collection, cleaning, and compilation of several such data sets found publicly on-line, in addition to several collected by the authors. There are 507 whole building electrical meters in this collection, and a majority are from buildings on university campuses. This group serves as a primary repository of open, non-residential data sources that can be built upon by other researchers. An overview of the data sources, subset selection criteria, and details of access to the repository are included. Future uses include the application of new, proposed prediction and classification models to compare performance to previously generated techniques."
}

Getting Started

We recommend you download the Anaconda Python Distribution and use Jupyter to get an understanding of the data.

  • Raw temporal and meta data are found in /data/raw/
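A common first step is turning a wide temporal file into per-building series. The sketch below assumes a wide layout: one timestamp column (assumed here to be named `timestamp`), then one column of hourly readings per building ID, with empty cells as gaps. The inline rows are made-up placeholders and no units are implied; `load_wide` and `coverage` are hypothetical helpers, not part of the repository.

```python
import csv
import io
from datetime import datetime

# Made-up placeholder rows in an assumed wide layout: one timestamp column, then
# one column of hourly readings per building ID. An empty cell marks a gap.
TEMPORAL_CSV = """timestamp,bldg_1,bldg_2
2015-01-01 00:00:00,10.5,3.2
2015-01-01 01:00:00,10.1,
2015-01-01 02:00:00,9.8,3.0
"""


def load_wide(text, time_column="timestamp"):
    """Return {building_id: [(datetime, reading or None), ...]} from wide CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    building_ids = [name for name in reader.fieldnames if name != time_column]
    buildings = {building_id: [] for building_id in building_ids}
    for row in reader:
        stamp = datetime.fromisoformat(row[time_column])
        for building_id in building_ids:
            cell = (row[building_id] or "").strip()  # short rows yield None
            buildings[building_id].append((stamp, float(cell) if cell else None))
    return buildings


def coverage(series):
    """Fraction of hours that have a reading, e.g. for screening buildings."""
    return sum(value is not None for _, value in series) / len(series)
```

For a real file, read its contents with `open(path, newline="")` and pass the text to `load_wide`, after checking the actual file name and timestamp header in /data/raw/.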

Example notebooks are found in /notebooks/ -- a few good overview examples:

Publications or Projects that use this data set:

Please update this list if you add notebooks or R Markdown files to the notebooks folder.

Contact -- (Add yours if you contribute to the data set)

Dr. Clayton Miller, Building and Urban Data Science (BUDS) Group, National University of Singapore, http://budslab.org/

Dr. Forrest Meggers, Cooling and Heating for Architecturally Optimized System (CHAOS) Lab, Princeton University, http://chaos.princeton.edu/

Anjukan Kathirgamanathan, PhD Student, Energy Institute, University College Dublin, https://energyinstitute.ucd.ie/

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                          generated with `pip freeze > requirements.txt`
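The notebook naming convention can be checked mechanically. A minimal sketch, assuming the initials are lowercase letters and the description uses only lowercase letters, digits, and hyphens (the tree above does not specify this); `parse_notebook_name` is a hypothetical helper, not part of the project.

```python
import re
from pathlib import PurePath

# Assumed reading of the convention "<number>-<initials>-<description>",
# e.g. "1.0-jqp-initial-data-exploration"; lowercase-only parts are an assumption.
NOTEBOOK_NAME = re.compile(
    r"(?P<order>\d+(?:\.\d+)?)-(?P<initials>[a-z]+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_notebook_name(filename):
    """Return the name's parts as a dict, or None if it breaks the convention."""
    name = PurePath(filename).name
    if name.endswith(".ipynb"):
        name = name[: -len(".ipynb")]  # strip only the notebook extension
    match = NOTEBOOK_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

Stripping only `.ipynb` (rather than using `PurePath.stem`) matters here, because `stem` would treat `.0-jqp-...` in an extensionless name like `1.0-jqp-initial-data-exploration` as a suffix.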

The MIT License (MIT)

Copyright (c) 2016, Clayton Miller

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributors

anjukan, cmiller8, samy101

Issues

Meaning of the data?

I was wondering about data/processed/temp_open_utc_complete.csv: each building ID (each column) has time-series data, but what does it represent? Electricity consumption? Some other sensor data?

Downloading with pip

Hi there,

I'm relatively new to cloning packages from GitHub. I'm trying to download this package using pip, and I've followed advice from here:
https://stackoverflow.com/questions/15268953/how-to-install-python-package-from-github
and here:
https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support

with command:
pip install -e git+https://github.com/buds-lab/the-building-data-genome-project.git

but got the following error:
could not detect requirement name, please specify one with #egg=

Could anyone suggest what is causing the issue, or a solution?

cheers

references

Does Github have a place to put reference papers/docs? Or are we still using Zotero separately?

Notebook doesn't work

Hi, the temporal data exploration notebook doesn't work as uploaded: in cell 27, it uses an argument variable called building (function plot_buildingtype) that has not been declared.

Units?

Are the units defined anywhere? I can't seem to find them.
