ryanwangzf / survtrace

SurvTRACE: Transformers for Survival Analysis with Competing Events

License: MIT License

Jupyter Notebook 41.53% Python 58.47%

survival-analysis time-to-event

survtrace's Introduction

⭐SurvTRACE: Transformers for Survival Analysis with Competing Events

This repo provides the implementation of SurvTRACE for survival analysis. It takes only a few lines of code to use:

from survtrace.dataset import load_data
from survtrace.model import SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use METABRIC dataset
STConfig['data'] = 'metabric'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize model
model = SurvTraceSingle(STConfig)

# execute training
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluating
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))

print("done!")

🔥See the demo

Please refer to experiment_metabric.ipynb and experiment_support.ipynb!

🔥How to configure the environment

Use our pre-saved conda environment!

conda env create --name survtrace --file=survtrace.yml
conda activate survtrace

Or install from requirements.txt:

pip3 install -r requirements.txt

🔥How to get SEER data

Go to https://seer.cancer.gov/data/ and submit a data request to SEER, following the guide there.
After completing that step, you should have the seerstat software for data access. Open it and sign in with the username and password sent by SEER.

Use seerstat to open the ./data/seer.sl file.

Click the 'execute' icon to query the SEER database. This produces a csv file.

Move the csv file to ./data/seer_raw.csv, then run the python script process_seer.py:
```
python process_seer.py
```
This produces the processed SEER data file, seer_processed.csv.

📝Functions

single event survival analysis
competing events survival analysis
multi-task learning
automatic hyperparameter grid-search
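
As an illustration of the last item, a grid search enumerates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below is generic and is not the repository's implementation: the parameter names in `grid` and the `score_fn` callback are placeholders standing in for a train-and-validate routine.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps a hyperparameter name to a list of candidate values.
    `score_fn` takes a dict of parameters and returns a number where
    higher is better (e.g. a validation C-index).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Placeholder grid and toy scoring function, for illustration only.
    grid = {"learning_rate": [1e-4, 1e-3], "hidden_size": [16, 32]}
    toy_score = lambda p: -abs(p["learning_rate"] - 1e-3) - abs(p["hidden_size"] - 16)
    print(grid_search(grid, toy_score))
```

In practice `score_fn` would train a model with the given parameters and return a validation metric; the cost grows with the product of the list lengths, so small grids are advisable.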

😄If you find this result interesting, please consider citing this paper:

@article{wang2021survtrace,
      title={Surv{TRACE}: Transformers for Survival Analysis with Competing Events}, 
      author={Zifeng Wang and Jimeng Sun},
      year={2021},
      eprint={2110.00855},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

survtrace's Issues

equation 16 counterintuitive

Hi, I'm new to survival analysis, and equation 16 seems rather counterintuitive. More specifically, shouldn't the hazard ratio before i (the third term in equation 16) be close to zero, meaning the survival rate stays high until the event at i? Since the loss should decrease, I think this term forces lambda to get bigger, which I don't understand. The equation in Reference 20 is intuitive, but may I ask how you derived yours? I'm new to this field, so please excuse my limited knowledge. Thanks.

your paper's eq 16

this is reference

Failed to install the environment

Hi Zifeng,
Your work is very good, and I really want to use this method.
But I ran into a problem at the first step:

conda env create --name survtrace --file=survtrace.yml

Here is the problem

And my computer is _Architecture:

My conda version is 4.10.3.
Thanks a lot. I would appreciate a reply to this stupid question.

Wrong License

Hey,

You set your license to MIT, but scikit-survival is GPL. Since GNU licenses are contagious, this repository is GPL as well. Please consider updating it to avoid legal issues.

How to prepare model input from my own data?

Hi Dr. Wang,
I'm a surgeon in China. I'm really interested in your SurvTRACE, and I'd like to apply it in my research to predict the prognosis of cancer patients. However, I only learned Python recently. Could you show me how to prepare the model input from local files? For example, an m x n matrix where each row is a patient ID and the columns contain overall survival time, events, and features for modeling.
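
A minimal sketch of one way to shape such a table with pandas is shown below. It is not taken from the repository: the column names `time` and `event`, the toy values, and the `split_features_labels` helper are assumptions, and the exact label format the model expects should be checked against `survtrace.dataset.load_data`.

```python
import pandas as pd


def split_features_labels(df, time_col="time", event_col="event"):
    """Split a patient table into a feature frame and a label frame.

    Rows are patients (indexed by ID); `time_col` holds the follow-up time
    and `event_col` holds 1 if the event was observed, 0 if censored.
    """
    labels = df[[time_col, event_col]].rename(
        columns={time_col: "duration", event_col: "event"}
    )
    features = df.drop(columns=[time_col, event_col])
    return features, labels


if __name__ == "__main__":
    # Toy table; in practice use pd.read_csv("your_file.csv", index_col=0).
    table = pd.DataFrame(
        {"time": [12.0, 30.5, 7.2], "event": [1, 0, 1], "age": [61, 54, 70]},
        index=["P001", "P002", "P003"],
    )
    x, y = split_features_labels(table)
    print(x)
    print(y)
```

The same split would then be applied to the training, validation, and test subsets before they are passed on, mirroring the `(df_train, df_y_train)` pairs in the usage example.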

Checkpoint file name is not unique to the run

The checkpoint file name is always the same by default.
If you run multiple concurrent survtrace jobs (e.g. on different datasets), they will crash each other or even scramble numerical results, even when an RNG seed is set.
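
One way to avoid such collisions is to derive the checkpoint path from something unique to the run. The sketch below is an assumption about how a caller might do this, not the repository's code: `unique_checkpoint_path` is a hypothetical helper, and how the resulting path is handed to the trainer depends on the SurvTRACE version in use.

```python
import os
import time
import uuid


def unique_checkpoint_path(directory="checkpoints", prefix="survtrace"):
    """Return a checkpoint path that differs for every call.

    Hypothetical helper: combines a timestamp, the process id, and a random
    suffix so that concurrent jobs do not write to the same file.
    """
    os.makedirs(directory, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:8]
    filename = f"{prefix}-{stamp}-{os.getpid()}-{suffix}.pt"
    return os.path.join(directory, filename)


if __name__ == "__main__":
    print(unique_checkpoint_path())
```

The random suffix covers jobs that start within the same second on different machines, where the timestamp and process id alone could coincide.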

Question about inverse propensity score loss

Hi Zifeng,

I read your paper on arXiv and got interested in the inverse propensity score loss that you implemented for debiasing the competing events. However, I still have some questions about it and hope you can help me with them.

I can see from the paper that IPS-weighting, , is trained to estimate the true distribution of the competing events. Based on your equation 20, , this IPS weighting seems to be obtained from scratch using a different model, not a downstream model after the latent representation .

However, I didn't find this implementation in this repo. Can you let me know where you implemented this IPS loss? Sorry if the questions are naive or due to my carelessness. I'm looking forward to hearing from you.

Best,
Shiang

Cannot run experiment with SEER

Hi,

Thanks for the great paper.
I am trying to reproduce your work.
But I cannot run your notebook for the experiment with the SEER dataset.
I got this error:

Could you help check it out?

Best,
Hoang

Inquiry about how to visualize an attention map

Would you please share with me how to visualize an attention map?
You have provided an Attention Map within your paper.
How did you make this figure?
I am trying to visualize a similar figure by trial and error.
How do you use the last layer of attention?
I would appreciate it if you could share your method if possible.
Best Regards.

Account for sequential data in SEER

Hi,

I am curious how you transform the SEER data into sequential data. The SEER data appears to be time-invariant (one patient, one row), so I am wondering what your exact input is for the SEER data.

Thanks!
