gmel's Introduction

Intro

This repo contains the code and data for our paper Learning Geo-Contextual Embeddings for Commuting Flow Prediction, published at the Thirty-Fourth AAAI Conference on Artificial Intelligence (2020).

GMEL uses land use information, such as NYC's PLUTO, together with commuting trip data to study commuting flow prediction with graph neural networks.

GMEL Framework

Prerequisites

The code is written in Python and requires the following:

Python 3.x
PyTorch 1.4.0
Deep Graph Library (DGL) 0.4.1
scikit-learn 0.21.3
NumPy 1.17.2
pandas 0.25.1
numpy_indexed

Other tools:

Tensorboard
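
To compare an existing environment against the versions listed above, a small check like the following may help. It is only a sketch: the distribution names (torch, dgl, scikit-learn, numpy-indexed, ...) are the usual pip names and are an assumption, not something this repo specifies, and importlib.metadata needs Python 3.8 or newer. A reported difference is not necessarily an error, since some builds append a local tag to the version string.

```python
from importlib import metadata

# Versions from the prerequisites list. The keys are the usual pip
# distribution names (an assumption, not taken from the repo).
EXPECTED = {
    "torch": "1.4.0",
    "dgl": "0.4.1",
    "scikit-learn": "0.21.3",
    "numpy": "1.17.2",
    "pandas": "0.25.1",
    "numpy-indexed": None,  # no version is listed for this package
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {name: (installed_or_None, listed)} for every difference."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or (wanted is not None and installed != wanted):
            problems[name] = (installed, wanted)
    return problems
```

Calling check_versions() with no arguments inspects the current environment; an empty dict means every listed package is installed at the listed version.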

Structure

The code directory contains all the code needed to run the experiments:

  • 01train_GMEL.py runs GMEL training under multiple experimental settings. It imports train.py to train each setting.

  • 02train_Predictor.py trains the predictor for each GMEL setting. It requires the embeddings generated by 01train_GMEL.py.

  • train.py trains a single GMEL model with one specific setting. Read this file if you are interested in how a graph neural network is trained.

  • model.py defines GMEL: a graph neural network combined with the interfaces for the multitask loss, embedding generation, etc. Read this file if you are interested in multitask learning with graph neural networks.

  • layers.py implements the graph neural network layers. Read this file if you are interested in message propagation in a graph neural network.

  • utils.py contains our own utilities, e.g. the data loader, mini-batch generator, and evaluation metrics. Read this file if you are interested in how we preprocess the data.

The data directory contains all the data described in our paper.

  • LODES contains the commuting trip data collected from LODES. We randomly split the data into train, validation, and test sets with a ratio of 6 : 2 : 2. You can merge these sets and reshuffle them to create your own split.

  • PLUTO contains the census tract features aggregated from PLUTO. Note that the data provided are preprocessed using the location quotient, a relative measure of how salient a feature is in one sample compared with the entire sample set.

  • CensusTract2010 contains the census tract adjacency matrix and a node ID mapping table. We use the 2010 version of the census tracts. The original census tract IDs are preserved so that you can look up each tract's location online.

  • OSRM contains the distance matrix between census tracts, measured with OSRM.
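
Two of the data notes above can be illustrated with a short sketch: re-splitting the LODES trips into your own train/validation/test sets, and the location quotient applied to the PLUTO features. Both functions are illustrative and not taken from this repo. resplit assumes the three pieces have already been loaded as pandas DataFrames, and location_quotient uses the standard textbook definition (a feature's share within one tract divided by its share over all tracts), which may differ in detail from the preprocessing actually used for the shipped data.

```python
import numpy as np
import pandas as pd


def resplit(parts, ratios=(0.6, 0.2, 0.2), seed=0):
    """Merge the given DataFrames, shuffle the rows, and cut them by ratio."""
    merged = pd.concat(parts, ignore_index=True)
    shuffled = merged.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(shuffled)
    n_train = int(round(ratios[0] * n))
    n_valid = int(round(ratios[1] * n))
    train = shuffled.iloc[:n_train]
    valid = shuffled.iloc[n_train:n_train + n_valid]
    test = shuffled.iloc[n_train + n_valid:]  # whatever remains
    return train, valid, test


def location_quotient(counts):
    """LQ[i, j] = (x_ij / sum_j x_ij) / (sum_i x_ij / sum_ij x_ij).

    Rows are samples (e.g. census tracts) and columns are features.
    Rows or columns that sum to zero would divide by zero and are
    not handled here.
    """
    counts = np.asarray(counts, dtype=float)
    local_share = counts / counts.sum(axis=1, keepdims=True)
    global_share = counts.sum(axis=0) / counts.sum()
    return local_share / global_share
```

A location quotient above 1 means the feature is more prominent in that tract than in the data set as a whole; a value below 1 means it is less prominent.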

Usage

Step 1. Set your working directory to code.

Step 2. Run python 01train_GMEL.py. This may take a long time if you add more experimental settings. While training, you can check the running status in code/log, or open TensorBoard with logdir set to code/runs to follow the gradient descent process, etc.

Step 3. After Step 2 finishes, run python 02train_Predictor.py. When it finishes, check the code/outputs directory for the test performance of each GMEL setting. The trained models are stored in code/models if you want to explore them, and the generated embeddings are stored in code/embeddings, where they can be loaded with NumPy.
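
As a minimal sketch of loading the generated embeddings with NumPy: the helper below assumes that the embeddings are stored as .npy files, which is an assumption about the file format, and load_embeddings is a hypothetical helper that is not part of this repo.

```python
from pathlib import Path

import numpy as np


def load_embeddings(directory):
    """Load every .npy file in `directory` into a {file stem: array} dict."""
    paths = sorted(Path(directory).glob("*.npy"))
    return {p.stem: np.load(p) for p in paths}
```

Run from the code directory, load_embeddings("embeddings") returns one array per file, so you can inspect each array's shape before using it elsewhere.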

Citation

@inproceedings{liu2020gmel,
  title={Learning Geo-Contextual Embeddings for Commuting Flow Prediction},
  author={Liu, Zhicheng and Miranda, Fabio and Xiong, Weiting and Yang, Junyan and Wang, Qiao and Silva, Claudio T.},
  booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence},
  year={2020}
}

Contact

Please send any questions you might have about the code and/or the algorithm to zhi-cheng-liu AT seu.edu.cn (remove the dashes from the address).

gmel's Issues

The file census_tract_trip_duration_matrix_bycar.csv seems incomplete

When I run code/02train_Predictor.py, I get this error:
IndexError: index 2088 is out of bounds for axis 0 with size 2075
at

feat_dist = dist[triplets[:, 0], triplets[:, 1]].reshape(-1, 1)

I checked that the distance matrix (distm) loaded from GMEL-master/data/OSRM/census_tract_trip_duration_matrix_bycar.csv has shape (2075, 2168). There should be 2168 census tracts, so the distance matrix should have shape (2168, 2168).
I then inspected the file and found that the rows whose index (the census tract geocode) lies between 5005000 and 5990100 are missing, so the code cannot run correctly.

Could you please provide a complete version of GMEL-master/data/OSRM/census_tract_trip_duration_matrix_bycar.csv? Thanks for reading.
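
A check along these lines could catch the problem before training starts. It is a sketch under one assumption: that the CSV is loaded with pandas so that the row index and the column header both carry the census tract geocodes. The two helper names are hypothetical and not part of the repo.

```python
import pandas as pd


def missing_row_ids(dist):
    """Return column labels that have no matching row label, in column order."""
    # CSV headers are read as strings while the index may be numeric,
    # so both sides are compared as strings.
    row_ids = set(dist.index.astype(str))
    return [c for c in dist.columns.astype(str) if c not in row_ids]


def check_square(dist):
    """Raise ValueError naming some missing IDs if the matrix is not square."""
    rows, cols = dist.shape
    if rows != cols:
        missing = missing_row_ids(dist)
        raise ValueError(
            f"expected a square matrix, got {dist.shape}; "
            f"{len(missing)} column IDs have no row, e.g. {missing[:5]}"
        )
```

For example, after dist = pd.read_csv(path, index_col=0), where path points at the CSV above, calling check_square(dist) fails early with a list of affected geocodes instead of an IndexError deep inside the predictor script.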

How is the adjacency matrix created?

Thanks for sharing the code. I have read the paper and the documentation, but I am still unable to figure out how the adjacency matrix ("adjacency_matrix_withweight.csv") is created. Could you please explain it briefly?
