
survtrace's Introduction

⭐SurvTRACE: Transformers for Survival Analysis with Competing Events

This repo provides the implementation of SurvTRACE for survival analysis. It is easy to use: loading data, training, and evaluation take only the following code:

from survtrace.dataset import load_data
from survtrace.model import SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use METABRIC dataset
STConfig['data'] = 'metabric'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize model
model = SurvTraceSingle(STConfig)

# execute training
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluating
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))

print("done!")

🔥See the demo

Please refer to experiment_metabric.ipynb and experiment_support.ipynb.

🔥How to configure the environment

Use our pre-saved conda environment!

conda env create --name survtrace --file=survtrace.yml
conda activate survtrace

or install the dependencies from requirements.txt:

pip3 install -r requirements.txt

🔥How to get SEER data

  1. Go to https://seer.cancer.gov/data/ and request access to the data by following the guide there.

  2. After completing step 1, you should have the SEER*Stat software for data access. Open it and sign in with the username and password sent to you by SEER.

  3. Use SEER*Stat to open the ./data/seer.sl file, then click the 'Execute' icon to submit the request to the SEER database. You will obtain a CSV file.

  4. Move the CSV file to ./data/seer_raw.csv, then run the Python script process_seer.py:

    python process_seer.py

    This produces the processed SEER data, named seer_processed.csv.

📝Functions

  • single event survival analysis
  • competing events survival analysis
  • multi-task learning
  • automatic hyperparameter grid-search

😄If you find this work interesting, please consider citing this paper:

@article{wang2021survtrace,
      title={Surv{TRACE}: Transformers for Survival Analysis with Competing Events}, 
      author={Zifeng Wang and Jimeng Sun},
      year={2021},
      eprint={2110.00855},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

survtrace's People

Contributors

ivanrossi, ryanwangzf

survtrace's Issues

Equation 16 seems counterintuitive

Hi, I'm new to survival analysis and equation 16 seems rather counterintuitive to me. Specifically, shouldn't the hazard before i (the third term in equation 16) be close to zero, meaning the survival probability stays high until the event at i? Since the loss should decrease, I think this term forces lambda to become larger, which I don't understand. Reference 20's equation is intuitive to me, so may I ask how you derived yours? I'm new to this field, so please excuse my limited knowledge. Thanks.

[Screenshots attached: equation 16 from your paper, and the corresponding equation from reference 20.]
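For intuition, here is a minimal sketch of the standard discrete-time hazard negative log-likelihood, the form used by logistic-hazard survival models; it is not necessarily identical to equation 16 of the paper. Each interval the subject survived contributes -log(1 - λ_j), so minimizing the loss drives those earlier hazards toward zero, while an observed event contributes -log(λ_i), which pushes only the hazard at the event interval upward. The function name is our own.

```python
import math


def discrete_hazard_nll(hazards, event_interval, event_observed):
    """Negative log-likelihood of one subject under a discrete-time hazard model.

    hazards: per-interval hazards lambda_1..lambda_K, each strictly in (0, 1).
    event_interval: 0-based index of the interval where the event or censoring occurs.
    event_observed: True for an observed event, False for censoring.
    """
    nll = 0.0
    # Surviving each interval before event_interval contributes -log(1 - lambda_j);
    # minimizing the loss therefore pushes these earlier hazards toward zero.
    for j in range(event_interval):
        nll -= math.log(1.0 - hazards[j])
    if event_observed:
        # The event itself contributes -log(lambda_i), pushing that hazard up.
        nll -= math.log(hazards[event_interval])
    else:
        # A censored subject also survived its last observed interval.
        nll -= math.log(1.0 - hazards[event_interval])
    return nll
```

For example, with an event in the third interval, hazards [0.1, 0.1, 0.5] give a loss of about 0.90, whereas [0.6, 0.6, 0.5] give about 2.53: low hazards before the event are rewarded.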

Failed to install the environment

Hi Zifeng,
Your work is very good and I really want to use this method.
But I ran into a problem at the very first step:

conda env create --name survtrace --file=survtrace.yml

Here is the error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - vs2015_runtime==14.27.29016=h5e58377_2
  - m2w64-gmp==6.1.0=2
  - cvxopt==1.2.5=py36h542453d_0
  - glpk==4.65=hdc00fd2_2
  - multiprocess==0.70.11.1=py36hf4a77e7_0
  - mkl_fft==1.3.0=py36h46781fe_0
  - icc_rt==2019.0.0=h0cc432a_1
  - setuptools==58.0.4=py36haa95532_0
  - libcblas==3.9.0=5_hd5c7e75_netlib
  - fastcache==1.1.0=py36he774522_0
  - sqlite==3.36.0=h2bbff1b_0
  - wincertstore==0.2=py36h7fe50ca_0
  - certifi==2021.5.30=py36ha15d459_0
  - vc==14.2=h21ff451_1
  - python==3.6.13=h3758d61_0
  - scikit-learn==0.22.1=py36h7208079_1
  - numexpr==2.7.3=py36hcbcaa1e_0
  - scikit-survival==0.14.0=py36he350917_0
  - scs==2.1.2=py36haa4650d_0
  - ecos==2.0.7.post1=py36haa4650d_3
  - msys2-conda-epoch==20160418=1
  - scipy==1.5.2=py36h9439919_0
  - mkl_random==1.1.1=py36h47e9c7a_0
  - numpy-base==1.19.2=py36ha3acd2a_0
  - m2w64-gcc-libs==5.3.0=7
  - cvxpy-base==1.0.31=py36h6538335_0
  - intel-openmp==2021.3.0=haa95532_3372
  - m2w64-libwinpthread-git==5.0.0.4634.697f757=2
  - libblas==3.9.0=1_h8933c1f_netlib
  - mkl-service==2.3.0=py36h196d8e1_0
  - pandas==1.1.5=py36hd77b12b_0
  - osqp==0.5.0=py36haa4650d_3
  - m2w64-gcc-libgfortran==5.3.0=6
  - pip==21.0.1=py36haa95532_0
  - m2w64-gcc-libs-core==5.3.0=7

And here is my computer's configuration (lscpu output):

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
Stepping: 7
CPU MHz: 3400.000
CPU max MHz: 4100.0000
CPU min MHz: 1200.0000
BogoMIPS: 6800.00
Virtualization: VT-x
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 16 MiB
L3 cache: 35.8 MiB
NUMA node0 CPU(s): 0-31
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

My conda version is 4.10.3.
Thanks a lot; I would really appreciate a reply to this basic question.
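Most of the unresolved packages (vs2015_runtime, vc, wincertstore, icc_rt, msys2-conda-epoch and the m2w64-* toolchain) exist only on Windows, and every entry pins a platform-specific build string (the part after the last `=`), which suggests survtrace.yml was exported on Windows. A minimal sketch of a workaround, assuming you edit the conda dependency list yourself: drop the Windows-only packages and strip the build strings so conda can pick Linux builds of the same versions. The helper names and the Windows-only set are our own, taken from the log above, and the sketch covers conda entries only, not a nested `pip:` list. (On the exporting side, `conda env export --no-builds` avoids writing build strings in the first place.)

```python
# Packages from the error log above that exist only on Windows (our own list).
WINDOWS_ONLY = {"vs2015_runtime", "vc", "wincertstore", "icc_rt", "msys2-conda-epoch"}
WINDOWS_ONLY_PREFIXES = ("m2w64-",)


def relax_conda_spec(spec):
    """Hypothetical helper: map 'name=version=build' to 'name=version'.

    Accepts both '=' and '==' separators, as seen in environment files and in
    conda's error log. Returns None for packages that only exist on Windows.
    """
    spec = spec.strip().lstrip("-").strip()
    parts = [p for p in spec.split("=") if p]
    name = parts[0]
    if name in WINDOWS_ONLY or name.startswith(WINDOWS_ONLY_PREFIXES):
        return None
    # Keep the version pin but drop the platform-specific build string.
    return f"{name}={parts[1]}" if len(parts) > 1 else name


def relax_dependencies(specs):
    """Apply relax_conda_spec to a list of conda dependency strings."""
    relaxed = (relax_conda_spec(s) for s in specs)
    return [s for s in relaxed if s is not None]
```

Even with relaxed pins, some Python 3.6-era versions may not be available for every platform, so a few entries might still need to be loosened further.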

Wrong License

Hey,

You set your license to MIT, but scikit-survival is GPL. Since GNU licenses are contagious, this repository is GPL as well. Please consider updating it to avoid legal issues.

How to prepare model input from my own data?

Hi Dr. Wang,
I'm a surgeon in China. I'm really interested in SurvTRACE and would like to apply it in my research to predict the prognosis of cancer patients. However, I only started learning Python recently. Could you show me how to prepare the model input from local files? For example, an m × n matrix in which each row is a patient (identified by patient ID) and the columns contain the overall survival time, the event indicator, and the features used for modeling.
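A minimal pandas sketch of the generic preparation steps for such a table: index by patient ID, separate the features from the (duration, event) labels, split into train/validation/test sets, and standardize numeric features using training statistics only. The function name, column names, and split fractions are hypothetical. SurvTRACE's own loaders (the `load_data` function in `survtrace.dataset` used in the README) may additionally discretize durations and encode categorical features, so check that module for the exact columns the Trainer expects before feeding these frames in.

```python
import numpy as np
import pandas as pd


def split_survival_table(df, id_col="patient_id", duration_col="duration",
                         event_col="event", val_frac=0.1, test_frac=0.2, seed=0):
    """Split a one-row-per-patient table into (features, labels) per subset.

    All column names are hypothetical defaults; rename them to match your file.
    Returns {"train": (X, y), "val": (X, y), "test": (X, y)}, where y holds the
    duration and event columns and X holds every other column.
    """
    df = df.set_index(id_col)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_test = int(len(df) * test_frac)
    n_val = int(len(df) * val_frac)
    test_idx = df.index[order[:n_test]]
    val_idx = df.index[order[n_test:n_test + n_val]]
    train_idx = df.index[order[n_test + n_val:]]

    X = df.drop(columns=[duration_col, event_col])
    y = df[[duration_col, event_col]]

    # Standardize numeric features with training-set statistics only, so no
    # information from validation or test patients leaks into training.
    num_cols = X.select_dtypes("number").columns
    mean = X.loc[train_idx, num_cols].mean()
    std = X.loc[train_idx, num_cols].std().replace(0, 1)
    X = X.assign(**{c: (X[c] - mean[c]) / std[c] for c in num_cols})

    return {name: (X.loc[idx], y.loc[idx])
            for name, idx in (("train", train_idx), ("val", val_idx), ("test", test_idx))}
```

Usage, with a hypothetical file name: `parts = split_survival_table(pd.read_csv("my_patients.csv"))`, then `X_train, y_train = parts["train"]`. Categorical (text) columns are passed through unchanged and still need to be encoded.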

Checkpoint file name is not unique to the run

By default, the checkpoint file name is always the same.
If you run multiple survtrace jobs concurrently (e.g. on different datasets), they write to the same checkpoint file, which can crash the runs or silently scramble the numerical results, even when an RNG seed is set.
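One way to avoid the collision is to build a checkpoint file name that is unique per run, combining a dataset tag, the process ID, and a random suffix. This is a sketch under the assumption that you can pass such a path to wherever your copy of the training code saves its checkpoint; we have not checked the exact Trainer option, and the helper name, directory, and `.pt` extension are hypothetical.

```python
import os
import uuid


def unique_checkpoint_path(directory="checkpoints", prefix="survtrace", tag=None):
    """Hypothetical helper: return a checkpoint path unique to this run.

    Combines an optional tag (e.g. the dataset name), the process ID, and a
    random suffix, so concurrent jobs never write to the same file.
    """
    os.makedirs(directory, exist_ok=True)
    parts = [prefix]
    if tag:
        parts.append(str(tag))
    parts += [str(os.getpid()), uuid.uuid4().hex[:8]]
    return os.path.join(directory, "_".join(parts) + ".pt")
```

For example, `unique_checkpoint_path(tag="metabric")` yields a name such as `checkpoints/survtrace_metabric_<pid>_<random>.pt`; logging this path alongside the results makes it easy to reload the right model later.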

Question about inverse propensity score loss

Hi Zifeng,

I read your paper on arXiv and became interested in the inverse propensity score (IPS) loss that you implemented for debiasing the competing events. However, I still have some questions about it and hope you can help me with them.

From the paper, I understand that the IPS weighting is trained to estimate the true distribution of the competing events. Based on your equation 20, this IPS weighting seems to be obtained from scratch using a separate model, rather than a downstream head on top of the latent representation.

However, I couldn't find this implementation in the repo. Could you let me know where you implemented this IPS loss? Sorry if the question is naive or the result of my own carelessness. I'm looking forward to hearing from you.

Best,
Shiang
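For reference, a generic inverse-propensity-score weighted loss looks like the sketch below: each sample's loss is reweighted by 1/p, where p is the estimated propensity of its observed outcome, so under-represented outcomes count more. This is the textbook self-normalized IPS estimator, not a reconstruction of the paper's equation 20 or of any code in this repo; the function name and the clipping threshold (used to keep the weights bounded) are our own assumptions.

```python
import numpy as np


def ips_weighted_loss(per_sample_loss, propensity, clip=0.05):
    """Self-normalized inverse-propensity-weighted average of per-sample losses.

    per_sample_loss: array of losses, one per sample.
    propensity: estimated probability of each sample's observed outcome, in (0, 1].
    clip: lower bound applied to propensities so weights stay bounded (assumed value).
    """
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    p = np.clip(np.asarray(propensity, dtype=float), clip, 1.0)
    weights = 1.0 / p
    # Dividing by the sum of weights (rather than n) gives the self-normalized
    # estimator, which has lower variance than the plain IPS average.
    return float(np.sum(weights * per_sample_loss) / np.sum(weights))
```

With equal propensities this reduces to the ordinary mean; with propensities [1.0, 0.25] the second sample counts four times as much as the first.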

Cannot run experiment with SEER

Hi,

Thanks for the great paper.
I am trying to reproduce your results, but I cannot run your notebook for the SEER experiment.
I got this error:

[Screenshot of the error attached.]
Could you help me look into it?

Best,
Hoang

Inquiry about how to visualize an attention map

Would you please share how you visualized the attention map?
You included an attention map in your paper.
How did you make this figure?
I am trying to produce a similar figure by trial and error.
How do you use the last attention layer?
I would appreciate it if you could share your method.
Best regards.
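A minimal sketch of one common approach, assuming you can obtain the last encoder layer's attention probabilities as an array of shape (batch, heads, n_tokens, n_tokens), for example by returning them from the model's forward pass; we have not checked whether survtrace exposes them directly. The idea is to average over patients and heads, then draw the result as a heatmap with matplotlib. Both function names are hypothetical.

```python
import numpy as np


def mean_attention_map(attn):
    """Average last-layer attention over samples and heads.

    attn: array of shape (batch, heads, n_tokens, n_tokens) whose rows sum to 1.
    Returns an (n_tokens, n_tokens) matrix whose rows still sum to 1.
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 4:
        raise ValueError("expected shape (batch, heads, n_tokens, n_tokens)")
    return attn.mean(axis=(0, 1))


def plot_attention_map(avg_attn, labels, path="attention_map.png"):
    """Save a heatmap of the averaged attention (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(avg_attn, cmap="viridis")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("attended-to feature")
    ax.set_ylabel("query feature")
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

For example, `plot_attention_map(mean_attention_map(attn), feature_names)` writes `attention_map.png`. Averaging over heads is only one choice; plotting individual heads can reveal patterns that the average hides.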

Account for sequential data in SEER

Hi,

I am curious how you transform the SEER data into sequential data. The SEER data are time-invariant (one row per patient), so I am wondering what exactly the model's input is for the SEER data.

Thanks!
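One common way tabular transformers turn a single time-invariant row into a sequence is to treat each feature as a token: a categorical feature is looked up in its own embedding table, and a numerical feature scales a learned vector and adds a learned bias. The NumPy sketch below illustrates that general technique, with fixed arrays standing in for learned parameters; it is an assumption about the idea, not a confirmed description of SurvTRACE's input layer, and the function name is hypothetical.

```python
import numpy as np


def row_to_tokens(cat_codes, num_values, cat_tables, num_weights, num_bias):
    """Turn one patient's row into a sequence of feature tokens.

    cat_codes: integer codes, one per categorical feature.
    num_values: floats, one per numerical feature.
    cat_tables: list of (n_categories_i, d) embedding tables, one per categorical feature.
    num_weights, num_bias: arrays of shape (n_numerical, d).
    Returns an array of shape (n_categorical + n_numerical, d): one token per feature.
    """
    # Categorical feature i becomes row cat_codes[i] of its embedding table.
    cat_tokens = [table[code] for table, code in zip(cat_tables, cat_codes)]
    # Numerical feature j becomes value_j * w_j + b_j.
    num_values = np.asarray(num_values, dtype=float)[:, None]
    num_tokens = num_values * num_weights + num_bias
    return np.vstack(cat_tokens + list(num_tokens))
```

The resulting token sequence, one row per feature, is what the self-attention layers then operate on, so attention weights relate features to each other rather than time steps.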
