"UrbanGPT: Spatio-Temporal Large Language Models"

Home Page: https://urban-gpt.github.io

Python 99.47% Shell 0.53%
fundation-models instruction-tuning large-language-models pre-trained-model smart-cities spatio-temporal-prediction urban-computing urban-data-science

UrbanGPT: Spatio-Temporal Large Language Models

A PyTorch implementation of the paper: [UrbanGPT: Spatio-Temporal Large Language Models]

Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang* (*Correspondence)

Data Intelligence Lab@University of Hong Kong, South China University of Technology, Baidu Inc


YouTube • 🌐 Chinese Blog

This repository hosts the code, data, and model weights of UrbanGPT.


🎉 News

🎯🎯📢📢 We have uploaded the models and data used in UrbanGPT to 🤗 Hugging Face. Please see the table below for details:

🤗 Hugging Face Address | 🎯 Description
https://huggingface.co/bjdwh/UrbanGPT | Checkpoint of UrbanGPT, based on Vicuna-7B-v1.5-16k and tuned on the instruction data (train-data).
https://huggingface.co/datasets/bjdwh/ST_data_urbangpt | A portion of the dataset, released for evaluation.

  • [2024.02.23] 🚀🚀 Release the code of UrbanGPT.
  • [2024.02.29] Add video.
  • [2024.03.05] Release the full paper.
  • [2024.03.11] Upload the new checkpoint of UrbanGPT.

👉 TODO

  • Release more spatio-temporal (ST) datasets.
  • Release the instruction generation code.
  • Release the baseline code.
  • ...

Introduction

In this work, we present a spatio-temporal large language model that exhibits exceptional generalization capabilities across a wide range of downstream urban tasks. To achieve this objective, we introduce UrbanGPT, which seamlessly integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables large language models (LLMs) to comprehend the complex inter-dependencies across time and space, facilitating more comprehensive and accurate predictions under data scarcity. Extensive experimental findings highlight the potential of building LLMs for spatio-temporal learning, particularly in zero-shot scenarios.

The detailed framework of the proposed UrbanGPT.

Demo Video

urbangpt_1.mp4

Getting Started


1. Code Structure

.
|   README.md
|   urbangpt_eval.sh
|   urbangpt_train.sh
|   
+---checkpoints
|   \---st_encoder
|           pretrain_stencoder.pth
|           
+---playground
|   |   inspect_conv.py
|   |   
|   +---test_embedding
|   |       README.md
|   |       test_classification.py
|   |       test_semantic_search.py
|   |       test_sentence_similarity.py
|   |       
|   \---test_openai_api
|           anthropic_api.py
|           openai_api.py
|           
+---tests
|       test_openai_curl.sh
|       test_openai_langchain.py
|       test_openai_sdk.py
|       
\---urbangpt
    |   constants.py
    |   conversation.py
    |   utils.py
    |   __init__.py
    |   
    +---eval
    |   |   run_urbangpt.py                     # evaluation
    |   |   run_vicuna.py
    |   |   
    |   \---script
    |           run_model_qa.yaml
    |           
    +---model
    |   |   apply_delta.py
    |   |   apply_lora.py
    |   |   builder.py
    |   |   compression.py
    |   |   convert_fp16.py
    |   |   make_delta.py
    |   |   model_adapter.py
    |   |   model_registry.py
    |   |   monkey_patch_non_inplace.py
    |   |   STLlama.py                          # model
    |   |   utils.py
    |   |   __init__.py
    |   |   
    |   \---st_layers
    |           args.py
    |           ST_Encoder.conf
    |           ST_Encoder.py                   # ST-Encoder
    |           __init__.py
    |           
    +---protocol
    |       openai_api_protocol.py
    |       
    +---serve
    |   |   api_provider.py
    |   |   bard_worker.py
    |   |   cacheflow_worker.py
    |   |   cli.py
    |   |   controller.py
    |   |   controller_graph.py
    |   |   gradio_block_arena_anony.py
    |   |   gradio_block_arena_named.py
    |   |   gradio_css.py
    |   |   gradio_patch.py
    |   |   gradio_web_server.py
    |   |   gradio_web_server_graph.py
    |   |   gradio_web_server_multi.py
    |   |   huggingface_api.py
    |   |   inference.py
    |   |   model_worker.py
    |   |   model_worker_graph.py
    |   |   openai_api_server.py
    |   |   register_worker.py
    |   |   test_message.py
    |   |   test_throughput.py
    |   |   __init__.py
    |   |   
    |   +---examples
    |   |       extreme_ironing.jpg
    |   |       waterview.jpg
    |   |       
    |   +---gateway
    |   |       nginx.conf
    |   |       README.md
    |   |       
    |   \---monitor
    |           basic_stats.py
    |           clean_battle_data.py
    |           elo_analysis.py
    |           hf_space_leaderboard_app.py
    |           monitor.py
    |           
    \---train
            llama2_flash_attn_monkey_patch.py
            llama_flash_attn_monkey_patch.py
            stchat_trainer.py
            train_lora.py
            train_mem.py
            train_st.py                         # train
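
As a quick orientation aid, the following sketch checks that a few of the files highlighted in the tree above exist in a local checkout. The helper name `missing_files` and the selection of files to check are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Paths copied from the directory tree above. Which ones to check is an
# illustrative choice, not something the repository prescribes.
KEY_FILES = [
    "urbangpt_train.sh",
    "urbangpt_eval.sh",
    "checkpoints/st_encoder/pretrain_stencoder.pth",
    "urbangpt/model/STLlama.py",
    "urbangpt/model/st_layers/ST_Encoder.py",
    "urbangpt/train/train_st.py",
    "urbangpt/eval/run_urbangpt.py",
]


def missing_files(repo_root, relative_paths=KEY_FILES):
    """Return the relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All key files found.")
```

Run it from the repository root; an empty result means the checked files are in place.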
            

2. Environment

Please first clone the repo and install the required environment, which can be done by running the following commands:

conda create -n urbangpt python=3.9.13

conda activate urbangpt

# Torch with CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

# To support vicuna base model
pip3 install "fschat[model_worker,webui]"

# To install pyg and pyg-relevant packages
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html

# Clone UrbanGPT or download it
git clone https://github.com/HKUDS/UrbanGPT.git
cd UrbanGPT

# Install required libraries
# (We recommend installing each package separately, as follows.)
pip install deepspeed
pip install ray
pip install einops
pip install wandb
# There is a version compatibility issue between "flash-attn" and "transformers".
# See the flash-attn repository for more information: https://github.com/Dao-AILab/flash-attention
# Alternatively, download a prebuilt wheel from https://github.com/Dao-AILab/flash-attention/releases
# (e.g. flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp39-cp39-linux_x86_64.whl).
pip install flash-attn==2.3.5
pip install transformers==4.34.0

# Alternatively, install everything from the requirements file.
pip install -r requirements.txt
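
Because the note above mentions a version compatibility issue between flash-attn and transformers, it can help to confirm what actually got installed. The sketch below compares installed versions against the pins from the commands above using only the standard library. The helper names `installed_version` and `find_mismatches` are hypothetical, and ignoring local build tags such as `+cu117` is an assumption about how you may want to compare.

```python
from importlib import metadata

# Version pins copied from the install commands above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "flash-attn": "2.3.5",
    "transformers": "4.34.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu117" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches().items():
        print(f"{package}: expected {wanted}, found {found}")
```

No output means every pinned package matches.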

3. Training UrbanGPT

3.1. Preparing Pre-trained Checkpoint

UrbanGPT is trained on top of the following excellent existing models. Please follow the instructions to prepare the checkpoints.

  • Vicuna: Our base model is Vicuna, an instruction-tuned chatbot. Please download its weights here. We generally use the v1.5 and v1.5-16k models with 7B parameters. You should also update Vicuna's 'config.json'; for example, the 'config.json' for v1.5-16k can be found in config.json

  • Spatio-temporal Encoder: We employ a simple TCN-based spatio-temporal encoder to encode the spatio-temporal dependencies. The weights of st_encoder are pre-trained on a typical multi-step spatio-temporal prediction task.

  • Spatio-temporal Train Data: Our pre-training data consists of New York City taxi, bike, and crime data, including spatio-temporal statistics, recorded timestamps, and information about regional points of interest (POIs). These data are organized in train_data. Please download it and place it in ./UrbanGPT/ST_data_urbangpt/train_data
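
The encoder bullet above mentions pre-training on a multi-step spatio-temporal prediction task. As a rough illustration of what such a task looks like, the sketch below builds (past, future) pairs from one region's time series with a sliding window. The function name `make_windows`, the window lengths, and the sample counts are all made up for illustration and are not taken from the repository.

```python
def make_windows(series, history, horizon):
    """Split one region's time series into (past, future) training pairs.

    Each pair holds `history` consecutive observations as input and the
    next `horizon` observations as the multi-step prediction target.
    """
    if history <= 0 or horizon <= 0:
        raise ValueError("history and horizon must be positive")
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        pairs.append((past, future))
    return pairs


if __name__ == "__main__":
    # Made-up hourly taxi counts for a single region.
    counts = [5, 7, 6, 9, 12, 10, 8]
    for past, future in make_windows(counts, history=3, horizon=2):
        print(past, "->", future)
```

A series shorter than `history + horizon` simply yields no pairs.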

3.2. Instruction Tuning

  • Start tuning: After completing the steps above, you can start instruction tuning by filling in the blanks in urbangpt_train.sh. An example is given below:
# Fill in the following paths to run UrbanGPT.
model_path=./checkpoints/vicuna-7b-v1.5-16k
instruct_ds=./ST_data_urbangpt/train_data/multi_NYC.json
st_data_path=./ST_data_urbangpt/train_data/multi_NYC_pkl.pkl
pretra_ste=ST_Encoder
output_model=./checkpoints/UrbanGPT

wandb offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=8 --master_port=20001 \
    urbangpt/train/train_mem.py \
    --model_name_or_path ${model_path} \
    --version v1 \
    --data_path ${instruct_ds} \
    --st_content ./TAXI.json \
    --st_data_path ${st_data_path} \
    --st_tower ${pretra_ste} \
    --tune_st_mlp_adapter True \
    --st_select_layer -2 \
    --use_st_start_end \
    --bf16 True \
    --output_dir ${output_model} \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2400 \
    --save_total_limit 1 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
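
The flags in the example command determine how many samples contribute to each optimizer step. The sketch below works through that arithmetic, assuming the usual data-parallel meaning of `--nproc_per_node`, `--per_device_train_batch_size`, and `--gradient_accumulation_steps`. The helper names are hypothetical, and the lower-memory variant is only one possible adjustment for GPUs with less memory, not a tested configuration.

```python
def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps):
    """Number of samples contributing to each optimizer step (data parallel)."""
    return num_gpus * per_device_batch * grad_accum_steps


def accumulation_for(target_batch, num_gpus, per_device_batch):
    """Gradient-accumulation steps needed to reach target_batch exactly."""
    per_step = num_gpus * per_device_batch
    if target_batch % per_step != 0:
        raise ValueError("target batch is not a multiple of num_gpus * per_device_batch")
    return target_batch // per_step


if __name__ == "__main__":
    # Values from the example command: 8 processes, batch 4, no accumulation.
    reference = effective_batch_size(8, 4, 1)
    print("effective batch size:", reference)
    # Hypothetical lower-memory variant: one sample per GPU per forward pass.
    print("accumulation steps at batch 1 per GPU:", accumulation_for(reference, 8, 1))
```

Lowering the per-device batch while raising accumulation keeps the effective batch size the same but may reduce peak activation memory.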

4. Evaluating UrbanGPT

4.1. Preparing Checkpoints and Data

  • Checkpoints: You can evaluate UrbanGPT using your own model or our released checkpoints.
  • Data: We split out test sets from the NYC-taxi datasets and built the instruction data for evaluation. Please refer to the evaluating data.
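
One way to fetch the released checkpoint and evaluation data is through the `huggingface_hub` package. The sketch below uses the two repository IDs from the News table. The local target directories are assumptions chosen to resemble paths used elsewhere in this README, and the helper names are hypothetical.

```python
def build_download_plan(root="."):
    """Describe the two Hugging Face repositories listed in the News table.

    The local target directories are assumptions chosen to resemble the
    paths used elsewhere in this README; adjust them to your own layout.
    """
    return [
        {
            "repo_id": "bjdwh/UrbanGPT",
            "repo_type": "model",
            "local_dir": f"{root}/checkpoints/UrbanGPT",
        },
        {
            "repo_id": "bjdwh/ST_data_urbangpt",
            "repo_type": "dataset",
            "local_dir": f"{root}/ST_data_urbangpt",
        },
    ]


def download_all(plan):
    """Fetch every repository in the plan (requires `pip install huggingface_hub`)."""
    # Imported lazily so the plan can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    for item in plan:
        snapshot_download(**item)


if __name__ == "__main__":
    # Only print the plan here; call download_all(plan) to start the download.
    for item in build_download_plan():
        print(item["repo_type"], item["repo_id"], "->", item["local_dir"])
```

Calling `download_all(build_download_plan())` performs the actual download, which needs network access and enough disk space for a 7B-parameter checkpoint.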

4.2. Running Evaluation

You can start the evaluation by filling in the blanks in urbangpt_eval.sh. An example is given below:

# Fill in the following paths to run the evaluation.
output_model=./checkpoints/tw2t_multi_reg-cla-gird
datapath=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi.json
st_data_path=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi_pkl.pkl
res_path=./result_test/cross-region/NYC_taxi
start_id=0
end_id=51920
num_gpus=8

python ./urbangpt/eval/run_urbangpt.py \
    --model-name ${output_model} \
    --prompting_file ${datapath} \
    --st_data_path ${st_data_path} \
    --output_res_path ${res_path} \
    --start_id ${start_id} \
    --end_id ${end_id} \
    --num_gpus ${num_gpus}
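
The example sets `start_id`, `end_id`, and `num_gpus` together, which suggests the prompts are divided among the GPUs. How run_urbangpt.py actually does this is not stated here, so the sketch below only shows one plausible scheme: contiguous, near-equal chunks, with `end_id` treated as exclusive. Both the scheme and the helper name `shard_ranges` are assumptions.

```python
def shard_ranges(start_id, end_id, num_gpus):
    """Split the half-open range [start_id, end_id) into num_gpus contiguous chunks.

    The last chunk absorbs any remainder so that every index is covered once.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total = end_id - start_id
    chunk = total // num_gpus
    ranges = []
    for rank in range(num_gpus):
        lo = start_id + rank * chunk
        hi = end_id if rank == num_gpus - 1 else lo + chunk
        ranges.append((lo, hi))
    return ranges


if __name__ == "__main__":
    # Values from the example above: ids 0 to 51920 across 8 GPUs.
    for rank, (lo, hi) in enumerate(shard_ranges(0, 51920, 8)):
        print(f"GPU {rank}: prompts {lo} to {hi - 1}")
```

With the example values, each of the 8 chunks holds 6,490 prompts.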

Citation

If you find UrbanGPT useful in your research or applications, please kindly cite:

@misc{li2024urbangpt,
      title={UrbanGPT: Spatio-Temporal Large Language Models}, 
      author={Zhonghang Li and Lianghao Xia and Jiabin Tang and Yong Xu and Lei Shi and Long Xia and Dawei Yin and Chao Huang},
      year={2024},
      eprint={2403.00813},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements

For the foundations of our framework and code repository, you may refer to the related work Vicuna. We also drew partial inspiration from GraphGPT. The design of our website and README.md was inspired by NExT-GPT, and the design of our system deployment was inspired by gradio and Baize. We thank the authors for their wonderful work.

urbangpt's Issues

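Multi-GPU training problem

Hello author, I don't have much experience with large models. While deploying over the past few days, training keeps reporting multi-GPU errors. After looking into it, I found that it may be a multiprocessing issue. I would now like to record logs; where should I add the settings? Thank you for your help.

Model parallelism or data parallelism? What GPU resources were used for training?

What was the GPU setup when fine-tuning vicuna-1.5-7b-16k? Model parallelism or data parallelism?

We are trying to reproduce the fine-tuning process on eight 3090 GPUs (24 GB each), but the following error occurred: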

File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 6 has a total capacty of 23.69 GiB of which 84.94 MiB is free. Process 40141 has 23.61 GiB memory in use. Of the allocated memory 22.62 GiB is allocated by PyTorch, and 8.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2024-03-29 16:22:19,103] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 616 closing signal SIGTERM
[2024-03-29 16:22:19,103] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 618 closing signal SIGTERM
[2024-03-29 16:22:19,103] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 619 closing signal SIGTERM
[2024-03-29 16:22:19,104] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 621 closing signal SIGTERM
[2024-03-29 16:22:19,104] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 622 closing signal SIGTERM
[2024-03-29 16:22:20,873] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 615) of binary: /usr/local/miniconda3/bin/python
Traceback (most recent call last):
File "/usr/local/miniconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/miniconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 810, in <module>
main()
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

urbangpt/train/train_mem.py FAILED

Failures:
[1]:
time : 2024-03-29_16:22:19
host : train-urbangpt-llm-kg-0
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 617)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-03-29_16:22:19
host : train-urbangpt-llm-kg-0
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 620)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-03-29_16:22:19
host : train-urbangpt-llm-kg-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 615)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

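Please help me take a look.

Questions about the encoder

Dear authors, I have two questions about the encoder:

  1. How can I train without using st_encoder? Can this be changed directly in ST_Llama.py?
  2. Is there a pipeline for pre-training the encoder? I would like to try training it on other datasets.
    Thank you!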
