NY House Price Estimator

Aim

This is an example repository illustrating the use of KDL on a simple machine learning classification task.

In this project, the aim is to create an estimator for the price of a rental property in New York. We use a publicly available dataset containing property attributes such as the type of property (house, apartment, etc.), the number of bedrooms and bathrooms, the neighbourhood, and amenities (breakfast, TV, internet, WiFi, etc.). The goal is a classification model that uses these attributes to predict whether the rental price of a property falls within the low-, mid-, high- or luxury-priced category. The data are imbalanced: there are many more low- and mid-priced properties than luxury-priced ones, so class imbalance has to be handled during model training.
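The two ideas in the paragraph above (binning prices into four categories and compensating for class imbalance) can be illustrated with a minimal sketch. It is not taken from the repository: the bin edges, the example prices and the function names `price_to_category` and `balanced_class_weights` are assumptions made for illustration, and inverse-frequency class weighting is only one common way to handle imbalance.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges; the thresholds actually used by the project are not stated here.
PRICE_BIN_EDGES = [90, 180, 400]
PRICE_CATEGORIES = ["low", "mid", "high", "luxury"]


def price_to_category(price):
    """Map a price to one of the four categories using the bin edges."""
    return PRICE_CATEGORIES[bisect_right(PRICE_BIN_EDGES, price)]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so misclassifying them
    costs more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up prices, skewed towards the cheaper categories.
prices = [50, 60, 75, 80, 120, 150, 170, 250, 900]
labels = [price_to_category(p) for p in prices]
weights = balanced_class_weights(labels)

for category in PRICE_CATEGORIES:
    print(category, weights[category])
```

Weights of this kind can usually be passed to a classifier (many libraries accept a per-class weight mapping), so that the rare luxury class is not ignored in favour of the frequent low and mid classes.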

Project structure

The project repository has the following directory structure:

├── goals         <- Acceptance criteria (typically as automated tests describing desired behaviour)
│
├── lab
│   │
│   ├── analysis  <- Analyses of data, models etc. (notebooks)
│   │
│   ├── lib       <- Importable functions used by analysis notebooks and process scripts
│   │                (including unit tests)
│   │
│   └── processes           <- Source code for reproducible workflow steps.
│       │
│       ├── preprocess_data
│       │   ├── main.py                      <- Process main
│       │   ├── mappings.py                  <- Variable mappings
│       │   ├── process_house_data.py        <- Process logic
│       │   └── process_house_data_test.py   <- Integration test for the process
│       │
│       ├── train_model
│       │   ├── main.py                      <- Process main
│       │   ├── classifiers.py               <- Process logic
│       │   └── classifiers_test.py          <- Integration test for the process
│       │
│       ├── config.ini         <- Config for Drone runs
│       └── config_test.ini    <- Config for integration tests
│
├── runtimes      <- Code for generating deployment runtimes (.krt)
│
├── .drone.yml    <- Instructions for Drone runners
├── .env          <- Local environment variables for VS Code IDE
├── .gitignore    
└── README.md     <- Main README

Launching experiment runs (Drone)

To enable full traceability and reproducibility, all executions that generate results or artifacts (e.g. processed datasets, trained models, validation metrics, plots of model validation, etc.) are run on Drone runners instead of in the user's Jupyter or VS Code tools.

Pipeline executions are launched by the trigger specified in .drone.yml for each pipeline. An example is shown below:

trigger:
  ref:
    - refs/tags/preprocess-data-*

With this trigger in place, the pipeline will be executed on Drone agents whenever a tag matching the pattern specified in the trigger is pushed to the remote repository, for example:

git tag preprocess-data-v0
git push origin preprocess-data-v0
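To check ahead of time whether a tag name would match a trigger pattern, a rough local approximation is possible with Python's standard `fnmatch` module. This is only a sketch: Drone's own glob matching may differ in details (for example in how `*` treats `/`), and the tag `train-model-v0` is a hypothetical name used for contrast.

```python
from fnmatch import fnmatchcase

# Pattern copied from the trigger example above.
TRIGGER_PATTERN = "refs/tags/preprocess-data-*"


def tag_triggers_pipeline(tag, pattern=TRIGGER_PATTERN):
    """Return True if pushing `tag` yields a ref that matches the pattern."""
    return fnmatchcase(f"refs/tags/{tag}", pattern)


# "train-model-v0" is a hypothetical tag that should not match this pattern.
for tag in ["preprocess-data-v0", "train-model-v0"]:
    print(tag, tag_triggers_pipeline(tag))
```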

Note: When using an external repository (e.g. hosted on GitHub), a delay in synchronization between Gitea and the mirrored external repository may delay the launch of the pipeline on the Drone runners. To avoid the wait, manually force a synchronization of the repository from the Settings page in the Gitea UI.

Testing

The repository contains some unit tests, e.g. in lab/lib/data_processing_test.py.

To run the tests, you may use the terminal:

$ pytest lab

... or the various GUI options provided in VS Code for running tests.
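For readers new to pytest, a unit test is a function whose name starts with `test_`, placed in a file that pytest collects by name (for example one ending in `_test.py`, as in this repository). The sketch below shows the general shape only; `parse_amenities` is a hypothetical helper invented for illustration and is not claimed to exist in `lab/lib`.

```python
def parse_amenities(raw):
    """Hypothetical helper: turn a string such as "{TV, Wifi}" into a
    sorted list of lowercase amenity names."""
    items = (item.strip().lower() for item in raw.strip("{}").split(","))
    return sorted(item for item in items if item)


def test_parse_amenities_normalizes_case_and_whitespace():
    assert parse_amenities("{TV, Wifi ,Breakfast}") == ["breakfast", "tv", "wifi"]


def test_parse_amenities_handles_empty_input():
    assert parse_amenities("{}") == []
```

In practice the helper would live in its own module and the test file would import it; both are kept together here so the example is self-contained.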
