UrbanGPT: Spatio-Temporal Large Language Models

A PyTorch implementation of the paper: [UrbanGPT: Spatio-Temporal Large Language Models]

Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang* (*Correspondence)

Data Intelligence Lab@University of Hong Kong, South China University of Technology, Baidu Inc

This repository hosts the code, data, and model weights of UrbanGPT.

🎉 News

🎯🎯📢📢 We have uploaded the models and data used in UrbanGPT to 🤗 Hugging Face. Please refer to the table below for details:

🤗 Hugging Face Address	🎯 Description
https://huggingface.co/bjdwh/UrbanGPT	Checkpoint of UrbanGPT, based on Vicuna-7B-v1.5-16k and tuned on the instruction data train-data.
https://huggingface.co/datasets/bjdwh/ST_data_urbangpt	We release a portion of the dataset for evaluation.

[2024.02.23] 🚀🚀 Released the code of UrbanGPT.
[2024.02.29] Added the demo video.
[2024.03.05] Released the full paper.
[2024.03.11] Uploaded the new checkpoint of UrbanGPT.

👉 TODO

Release more ST datasets.
Release the instruction generation code.
Release the baseline code.
...

Introduction

In this work, we present a spatio-temporal large language model that generalizes well across a wide range of downstream urban tasks. To this end, we propose UrbanGPT, which integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables large language models (LLMs) to comprehend the complex inter-dependencies across time and space, supporting more comprehensive and accurate predictions under data scarcity. Extensive experiments highlight the potential of building LLMs for spatio-temporal learning, particularly in zero-shot scenarios.

Demo Video

urbangpt_1.mp4

Getting Started

1. Code Structure [Back to Top]

.
|   README.md
|   urbangpt_eval.sh
|   urbangpt_train.sh
|   
+---checkpoints
|   \---st_encoder
|           pretrain_stencoder.pth
|           
+---playground
|   |   inspect_conv.py
|   |   
|   +---test_embedding
|   |       README.md
|   |       test_classification.py
|   |       test_semantic_search.py
|   |       test_sentence_similarity.py
|   |       
|   \---test_openai_api
|           anthropic_api.py
|           openai_api.py
|           
+---tests
|       test_openai_curl.sh
|       test_openai_langchain.py
|       test_openai_sdk.py
|       
\---urbangpt
    |   constants.py
    |   conversation.py
    |   utils.py
    |   __init__.py
    |   
    +---eval
    |   |   run_urbangpt.py                     # evaluation
    |   |   run_vicuna.py
    |   |   
    |   \---script
    |           run_model_qa.yaml
    |           
    +---model
    |   |   apply_delta.py
    |   |   apply_lora.py
    |   |   builder.py
    |   |   compression.py
    |   |   convert_fp16.py
    |   |   make_delta.py
    |   |   model_adapter.py
    |   |   model_registry.py
    |   |   monkey_patch_non_inplace.py
    |   |   STLlama.py                          # model
    |   |   utils.py
    |   |   __init__.py
    |   |   
    |   \---st_layers
    |           args.py
    |           ST_Encoder.conf
    |           ST_Encoder.py                   # ST-Encoder
    |           __init__.py
    |           
    +---protocol
    |       openai_api_protocol.py
    |       
    +---serve
    |   |   api_provider.py
    |   |   bard_worker.py
    |   |   cacheflow_worker.py
    |   |   cli.py
    |   |   controller.py
    |   |   controller_graph.py
    |   |   gradio_block_arena_anony.py
    |   |   gradio_block_arena_named.py
    |   |   gradio_css.py
    |   |   gradio_patch.py
    |   |   gradio_web_server.py
    |   |   gradio_web_server_graph.py
    |   |   gradio_web_server_multi.py
    |   |   huggingface_api.py
    |   |   inference.py
    |   |   model_worker.py
    |   |   model_worker_graph.py
    |   |   openai_api_server.py
    |   |   register_worker.py
    |   |   test_message.py
    |   |   test_throughput.py
    |   |   __init__.py
    |   |   
    |   +---examples
    |   |       extreme_ironing.jpg
    |   |       waterview.jpg
    |   |       
    |   +---gateway
    |   |       nginx.conf
    |   |       README.md
    |   |       
    |   \---monitor
    |           basic_stats.py
    |           clean_battle_data.py
    |           elo_analysis.py
    |           hf_space_leaderboard_app.py
    |           monitor.py
    |           
    \---train
            llama2_flash_attn_monkey_patch.py
            llama_flash_attn_monkey_patch.py
            stchat_trainer.py
            train_lora.py
            train_mem.py
            train_st.py                         # train

2. Environment [Back to Top]

Please first clone the repo and install the required environment, which can be done by running the following commands:

conda create -n urbangpt python=3.9.13

conda activate urbangpt

# Torch with CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

# To support vicuna base model
pip3 install "fschat[model_worker,webui]"

# To install pyg and pyg-relevant packages
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html

# Clone our UrbanGPT or download it
git clone https://github.com/HKUDS/UrbanGPT.git
cd UrbanGPT

# Install required libraries
# (We recommend installing them one by one, as follows)
pip install deepspeed
pip install ray
pip install einops
pip install wandb
# (There is a version compatibility issue between "flash-attn" and "transformers". Please refer to the flash-attn GitHub repository, https://github.com/Dao-AILab/flash-attention, for more information.)
pip install flash-attn==2.3.5  # or download from (https://github.com/Dao-AILab/flash-attention/releases, e.g. flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp39-cp39-linux_x86_64.whl)
pip install transformers==4.34.0

# (Alternatively, install everything from the requirements file.)
pip install -r requirements.txt
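
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: `EXPECTED` and `check_environment` are hypothetical names, and the version pins are simply copied from the install commands above. It only reads package metadata, so it runs even when some packages are missing.

```python
from importlib import metadata

# Version pins copied from the install commands above.
# Packages installed without a pin map to None.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "transformers": "4.34.0",
    "flash-attn": "2.3.5",
    "deepspeed": None,
    "einops": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local build tags such as '+cu117' are ignored when comparing.
        base = installed.split("+")[0]
        if pinned is None or base == pinned:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:15s} {status}")
```

A mismatch reported for "flash-attn" or "transformers" is worth resolving first, given the compatibility issue noted in the install commands.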

3. Training UrbanGPT [Back to Top]

3.1. Preparing Pre-trained Checkpoint [Back to Top]

UrbanGPT is trained on top of the following excellent existing models. Please follow the instructions to prepare the checkpoints.

Vicuna: Prepare Vicuna, an instruction-tuned chatbot that serves as the base model in our implementation. Please download its weights here. We generally use the v1.5 and v1.5-16k models with 7B parameters. You should update the 'config.json' of Vicuna; for example, the 'config.json' for v1.5-16k can be found in config.json
Spatio-temporal Encoder: We employ a simple TCN-based spatio-temporal encoder to encode the spatio-temporal dependencies. The weights of st_encoder are pre-trained on a typical multi-step spatio-temporal prediction task.
Spatio-temporal Training Data: We use pre-training data consisting of New York City's taxi, bike, and crime data, including spatio-temporal statistics, recorded timestamps, and information about regional points of interest (POIs). These data are organized in train_data. Please download them and put them at ./UrbanGPT/ST_data_urbangpt/train_data
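
Before launching training, a quick check that these files are where the scripts expect them can save a failed run. The sketch below is an illustration rather than repository code: `REQUIRED_FILES` and `missing_files` are hypothetical names, and the relative paths are taken from the code-structure tree and the example training script in this README, assuming the script is run from the repository root.

```python
from pathlib import Path

# Relative paths taken from the code-structure tree and the example
# training script in this README (assumed relative to the repo root).
REQUIRED_FILES = [
    "checkpoints/vicuna-7b-v1.5-16k/config.json",
    "checkpoints/st_encoder/pretrain_stencoder.pth",
    "ST_data_urbangpt/train_data/multi_NYC.json",
    "ST_data_urbangpt/train_data/multi_NYC_pkl.pkl",
]


def missing_files(repo_root=".", required=REQUIRED_FILES):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required files are in place.")
```

If you store the Vicuna weights or the data elsewhere, adjust the list to match the paths you pass to the training script.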

3.2. Instruction Tuning [Back to Top]

Start tuning: After the steps above, you can start instruction tuning by filling in the blanks in urbangpt_train.sh. An example is shown below:

# Fill in the following paths to run UrbanGPT
model_path=./checkpoints/vicuna-7b-v1.5-16k
instruct_ds=./ST_data_urbangpt/train_data/multi_NYC.json
st_data_path=./ST_data_urbangpt/train_data/multi_NYC_pkl.pkl
pretra_ste=ST_Encoder
output_model=./checkpoints/UrbanGPT

wandb offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=8 --master_port=20001 \
    urbangpt/train/train_mem.py \
    --model_name_or_path ${model_path} \
    --version v1 \
    --data_path ${instruct_ds} \
    --st_content ./TAXI.json \
    --st_data_path ${st_data_path} \
    --st_tower ${pretra_ste} \
    --tune_st_mlp_adapter True \
    --st_select_layer -2 \
    --use_st_start_end \
    --bf16 True \
    --output_dir ${output_model} \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2400 \
    --save_total_limit 1 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
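
To make the hyperparameters in the command above concrete, the sketch below works out the effective batch size and the shape of a linear-warmup-plus-cosine learning-rate schedule. It is an approximation for intuition only: the function names are hypothetical, the total number of optimizer steps is a made-up value (it depends on the dataset size, which is not stated here), and the exact values produced by the trainer may differ.

```python
import math

# Values copied from the example command above.
NUM_GPUS = 8            # --nproc_per_node
PER_DEVICE_BATCH = 4    # --per_device_train_batch_size
GRAD_ACCUM = 1          # --gradient_accumulation_steps
PEAK_LR = 2e-3          # --learning_rate
WARMUP_RATIO = 0.03     # --warmup_ratio


def effective_batch_size(num_gpus, per_device_batch, grad_accum):
    """Number of samples consumed per optimizer step."""
    return num_gpus * per_device_batch * grad_accum


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(NUM_GPUS, PER_DEVICE_BATCH, GRAD_ACCUM))  # 32
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 15, 30, 515, 1000):
        print(step, cosine_lr(step, total))
```

If you train on fewer GPUs, raising --gradient_accumulation_steps so that the product stays at 32 keeps the effective batch size of the example.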

4. Evaluating UrbanGPT [Back to Top]

4.1. Preparing Checkpoints and Data [Back to Top]

Checkpoints: You can evaluate UrbanGPT using either your own model or our released checkpoints.
Data: We split out test sets from the NYC-taxi datasets and build the instruction data for evaluation. Please refer to the evaluating.
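
One way to fetch the released checkpoint and evaluation data is `snapshot_download` from the `huggingface_hub` package. The sketch below is an assumption-laden illustration: the repository ids come from the Hugging Face table near the top of this README, while the local target folders (`checkpoints/UrbanGPT`, `ST_data_urbangpt`) and the helper names are hypothetical choices, so adapt them to the paths you use in urbangpt_eval.sh.

```python
from pathlib import Path

# Repository ids come from the Hugging Face table in this README.
# The local sub-directories are assumptions, not a layout required by the repo.
RELEASED = [
    {"repo_id": "bjdwh/UrbanGPT", "repo_type": "model",
     "subdir": "checkpoints/UrbanGPT"},
    {"repo_id": "bjdwh/ST_data_urbangpt", "repo_type": "dataset",
     "subdir": "ST_data_urbangpt"},
]


def download_plan(repo_root="."):
    """Build the keyword arguments for each download without downloading."""
    root = Path(repo_root)
    return [
        {
            "repo_id": item["repo_id"],
            "repo_type": item["repo_type"],
            "local_dir": str(root / item["subdir"]),
        }
        for item in RELEASED
    ]


def download_all(repo_root="."):
    """Download every released artifact (needs huggingface_hub and network access)."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for kwargs in download_plan(repo_root):
        snapshot_download(**kwargs)


if __name__ == "__main__":
    for kwargs in download_plan():
        print(kwargs)
```

Calling `download_all()` from the repository root performs the actual download; running the file directly only prints the plan.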

4.2. Running Evaluation [Back to Top]

You can start the evaluation by filling in the blanks in urbangpt_eval.sh. An example is shown below:

# Fill in the following paths to run the evaluation
output_model=./checkpoints/tw2t_multi_reg-cla-gird
datapath=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi.json
st_data_path=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi_pkl.pkl
res_path=./result_test/cross-region/NYC_taxi
start_id=0
end_id=51920
num_gpus=8

python ./urbangpt/eval/run_urbangpt.py --model-name ${output_model}  --prompting_file ${datapath} --st_data_path ${st_data_path} --output_res_path ${res_path} --start_id ${start_id} --end_id ${end_id} --num_gpus ${num_gpus}
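
The start_id, end_id, and num_gpus arguments suggest that the evaluation samples are divided among the GPUs. The sketch below shows one plausible way such a split could work; it is not taken from run_urbangpt.py, `split_range` is a hypothetical helper, and treating end_id as exclusive is an assumption.

```python
def split_range(start_id, end_id, num_workers):
    """Split [start_id, end_id) into num_workers contiguous, near-equal chunks."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    total = end_id - start_id
    base, extra = divmod(total, num_workers)
    chunks = []
    cursor = start_id
    for worker in range(num_workers):
        # The first `extra` workers each take one additional sample.
        size = base + (1 if worker < extra else 0)
        chunks.append((cursor, cursor + size))
        cursor += size
    return chunks


if __name__ == "__main__":
    # Values from the example above: start_id=0, end_id=51920, num_gpus=8.
    for gpu, (lo, hi) in enumerate(split_range(0, 51920, 8)):
        print(f"GPU {gpu}: samples {lo}..{hi - 1}")
```

With the example values, each of the 8 GPUs would handle 6490 samples under this scheme. To evaluate on a subset, narrowing start_id and end_id is the natural knob.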

Citation

If you find UrbanGPT useful in your research or applications, please kindly cite:

@misc{li2024urbangpt,
      title={UrbanGPT: Spatio-Temporal Large Language Models}, 
      author={Zhonghang Li and Lianghao Xia and Jiabin Tang and Yong Xu and Lei Shi and Long Xia and Dawei Yin and Chao Huang},
      year={2024},
      eprint={2403.00813},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements

You may refer to the related work that serves as the foundation for our framework and code repository, Vicuna. We also partially draw inspiration from GraphGPT. The design of our website and README.md was inspired by NExT-GPT, and the design of our system deployment was inspired by gradio and Baize. We thank the authors for their wonderful work.