Comments (13)

RyanWangZf avatar RyanWangZf commented on August 30, 2024

Hi there,
You can refer to
https://github.com/RyanWangZf/SurvTRACE/blob/main/data/seer_processed.csv
for the standard input format of the data.

Once your data is in that format, refer to
https://github.com/RyanWangZf/SurvTRACE/blob/main/survtrace/dataset.py

especially the branch under

elif data == "seer":

and set PATH_DATA, event_list, cols_categorical, cols_standardize, and config['num_event'] to fit your data.
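As a rough illustration of that step, the sketch below gathers the dataset-specific settings in one place and checks them against a data frame before training. The toy data, the `stage` and `age` columns, and the `check_settings` helper are hypothetical; only `duration`, `event`, and the setting names come from this thread, and the real `load_data` in `dataset.py` may organize things differently.

```python
import pandas as pd

# Hypothetical toy data; a real run would read a CSV from PATH_DATA.
df = pd.DataFrame({
    "duration": [90.0, 92.0, 100.0, 103.0],
    "event": [1, 0, 0, 1],
    "stage": ["I", "II", "II", "III"],   # categorical feature (made up)
    "age": [61.0, 55.0, 70.0, 48.0],     # numerical feature (made up)
})

# Settings named in the thread, filled in for the toy data.
settings = {
    "event_list": ["event"],             # assumed: one column per event type
    "cols_categorical": ["stage"],
    "cols_standardize": ["age"],
}
config = {"num_event": len(settings["event_list"])}


def check_settings(df, settings):
    """Return the columns named in the settings that are missing from df."""
    needed = (["duration"] + settings["event_list"]
              + settings["cols_categorical"] + settings["cols_standardize"])
    return [col for col in needed if col not in df.columns]


missing = check_settings(df, settings)
print("missing columns:", missing)
print("num_event:", config["num_event"])
```

A check like this catches a misspelled column name before any preprocessing starts.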

Also, since SurvTRACE is built on transformers, you will need a GPU (something like an RTX 3060) to train it efficiently.

Feel free to reach out if you have further questions. 😀

from survtrace.

Jwenyi avatar Jwenyi commented on August 30, 2024

Thanks for your response!! I'll try it :)

Jwenyi avatar Jwenyi commented on August 30, 2024

Hi Zifeng,
I've built my own SurvTRACE model following your suggestion, thanks!! However, I have a new question: how do I predict the prognosis of a patient/sample?
I used to train a Cox regression model (a simple statistical model) or XGBoost, which provide a predicted score (or a survival function value?) for each patient, so we could use these scores to stratify patients. So I wonder whether there is a way to get a prediction for each sample and output a dataframe or matrix containing these predictions. In other words, how can we use SurvTRACE to assign a predicted score to each patient?
P.S. I trained SurvTRACE without competing risks; each patient has only one event, "death or alive".

RyanWangZf avatar RyanWangZf commented on August 30, 2024

Hi, you can use these four functions to get the predicted hazard/risk/survival rate for patients.

See

def predict_hazard(self, input_ids, batch_size=None):

and the functions below it. They output the hazard/risk/survival rate at each discrete time point corresponding to the time horizons we set:

'horizons': [.25, .5, .75], # the discrete intervals are cut at 0%, 25%, 50%, 75%, 100%

They can be used like this:

surv = model.predict_surv(df_test, batch_size=val_batch_size)
risk = 1 - surv
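
To turn these outputs into one score per patient for stratification, something along these lines may work. This is only a sketch: the `surv` array below is a made-up stand-in for the output of `model.predict_surv`, assuming one row per patient and one column per discrete time point, and the patient IDs, the `scores` frame, and the median split are illustrative choices. If the model returns a tensor, it would need converting to NumPy first.

```python
import numpy as np
import pandas as pd

# Stand-in for model.predict_surv(df_test): survival probability per patient
# (rows) at each discrete time point (columns). Values are made up.
surv = np.array([
    [1.00, 0.95, 0.90, 0.80],
    [1.00, 0.80, 0.60, 0.35],
    [1.00, 0.90, 0.75, 0.55],
    [1.00, 0.70, 0.40, 0.20],
])
risk = 1 - surv

# Use the risk at the last time point as a single score per patient.
scores = pd.DataFrame({
    "patient": ["p1", "p2", "p3", "p4"],   # hypothetical IDs
    "risk_score": risk[:, -1],
})

# Split at the median into low- and high-risk groups.
cutoff = scores["risk_score"].median()
scores["group"] = np.where(scores["risk_score"] > cutoff, "high", "low")
print(scores)
```

Any column of `risk` could serve as the score; the last one is used here only because it reflects the longest horizon.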

For more details, please refer to the evaluation code in

class Evaluator:

Jwenyi avatar Jwenyi commented on August 30, 2024

Thanks a lot!! It's really helpful for me 😀

RyanWangZf avatar RyanWangZf commented on August 30, 2024

My pleasure~ Feel free to star our project if you find it helpful 😇

Jwenyi avatar Jwenyi commented on August 30, 2024

Surely!!

Jwenyi avatar Jwenyi commented on August 30, 2024

Hi Zifeng,
Sorry to bother you again, but I ran into a new problem while training SurvTRACE. 😂
When I run the function "load_data" from "dataset.py", it reports:
_"UserWarning: Got event/censoring at start time. Should be removed! It is set s.t. it has no contribution to loss. warnings.warn("""Got event/censoring at start time. Should be removed! It is set s.t. it has no contribution to loss.""_
It comes from this line:
y = labtrans.transform(*get_target(df)) # y = (discrete duration, event indicator)
However, I checked my input data and found no censoring or event at the start time, and I'm sure that no patient has "duration 0, event 1". Maybe the problem is caused by:

times = np.quantile(df["duration"][df["event"] == 1.0], horizons).tolist()
times
[389.500000125, 601.9999998, 1120.75]
In this code I can see that the time intervals have been set; however, I do have some patients whose "duration" is less than 389.5 and whose "event" is 1 (death). Does that cause the warning? If so, I noticed that even if I delete these patients, "times" will change as well, and there will be new patients who do not meet the condition.
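
For reference, the quantile step can be reproduced on the nine sample rows from the table in this comment. This is a small sketch using only those rows, not the full dataset, so the numbers differ from the `times` shown above. Because the cut points are quantiles of the event durations, roughly a quarter of the observed events fall below the first one by construction, and removing patients shifts the cut points.

```python
import numpy as np

horizons = [0.25, 0.5, 0.75]

# The nine shortest-duration patients listed in this comment.
duration = np.array([90, 92, 100, 100, 103, 108, 112, 120, 126], dtype=float)
event = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1])

# Quantiles are taken over patients with an observed event only.
times = np.quantile(duration[event == 1], horizons).tolist()
print(times)  # [96.5, 103.0, 114.5]

# Dropping the earliest death moves every cut point.
keep = duration != 90
times_after = np.quantile(duration[keep][event[keep] == 1], horizons).tolist()
print(times_after)
```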

How should I solve this problem? Or does it not affect the model's performance, so that it can be ignored? I am eagerly looking forward to your reply.

P.S. Part of my data is listed below, showing the patients with the shortest durations:

duration  event  AURKA.FGD6  AURKA.GABRP  CLDN9.IL27RA  DPYD.FANCI
90        1      0           1            1             0
92        0      0           0            0             0
100       0      0           0            1             1
100       0      0           1            1             1
103       1      1           1            0             0
108       0      1           1            0             1
112       0      1           1            1             1
120       0      0           1            1             0
126       1      1           1            0             0

Jwenyi avatar Jwenyi commented on August 30, 2024

P.S. I figured that maybe the patient with the shortest duration should not be a "death", so I also deleted that patient, but unfortunately I got the warning again. 😂

RyanWangZf avatar RyanWangZf commented on August 30, 2024

I checked the code where this warning is raised:

if idx_durations.min() == 0:
    warnings.warn("""Got event/censoring at start time. Should be removed! It is set s.t. it has no contribution to loss.""")
    t_frac[idx_durations == 0] = 0
    events[idx_durations == 0] = 0
idx_durations = idx_durations - 1
# get rid of -1
idx_durations[idx_durations < 0] = 0
return idx_durations.astype('int64'), events.astype('float32'), t_frac.astype('float32')

Before the operation on line 81, a patient with duration < 389.500000125 is actually assigned index 1 rather than 0. So this warning is raised because durations[idx_durations == 0] == 0.

Could you add a breakpoint there and print(durations[idx_durations == 0]) to show me the output?

Jwenyi avatar Jwenyi commented on August 30, 2024

Thanks Zifeng! I checked there and found the output is "92.00000018".

RyanWangZf avatar RyanWangZf commented on August 30, 2024

Do you mean there is only one output and it's not zero? That's weird 😂
I copied this transform code from pycox:
https://github.com/havakv/pycox/blob/d384d4f0ac89ddd8458daabfd3fe271ff26542e3/pycox/preprocessing/label_transforms.py#L150

I don't know what happened.

But if there is only one output, only that single sample will be dropped, and I guess it will not influence the results much 😇
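
One possible explanation, offered as a guess rather than something confirmed in this thread: if the first cut point is the minimum observed duration instead of 0, the shortest-duration sample lands in bin 0 even though its duration is not zero. The sketch below uses `np.searchsorted` as a stand-in for the library's binning; the `cuts` array (including the made-up maximum of 2000.0) and the rounding side are assumptions.

```python
import numpy as np

# Durations from the table in this thread, after dropping the 90-day patient.
durations = np.array([92, 100, 100, 103, 108, 112, 120, 126], dtype=float)

# Assumed cut points: the minimum duration, the quantiles reported earlier,
# and a made-up maximum follow-up time.
cuts = np.array([durations.min(), 389.5, 602.0, 1120.75, 2000.0])

# Round each duration up to the next cut point (assumed binning rule).
idx_durations = np.searchsorted(cuts, durations, side="left")

print(idx_durations)                   # bin index per patient
print(durations[idx_durations == 0])   # the debugging check suggested above
```

Under these assumptions exactly one sample, the one with the minimum duration, gets index 0, which would match the single non-zero value reported above.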

Jwenyi avatar Jwenyi commented on August 30, 2024

Thanks! I compared my input data with the processed data and found that the sample size barely changed. 😀
