I believe it would be a great addition to add an example with real data so people coul

Example,about ihmeuw-msca/curvefit

Comments (58)

saravkin commented on June 16, 2024 6

Thanks for the request -- we recognize the need for examples and will work on helping people understand use cases when we can.

from curvefit.

mpf commented on June 16, 2024 6

+1. An example would be helpful. It doesn't have to be a full-fledged example. A small contrived example to illustrate the workflow would be sufficient.

from curvefit.

jason-curtis commented on June 16, 2024 5

+1, AFAICT this is the only source code provided for the projections at https://covid19.healthdata.org/projections , which are being increasingly used throughout the country. Example usage would also provide increased transparency which is crucial to understanding and trusting the projections for the USA.

from curvefit.

ibm-cuyler commented on June 16, 2024 4

An example would be great. Country-level data seems to be the next step.
An example data set that we could apply the code to would be helpful, and give us a good sense for what data we'd need when applying the models to other countries.

from curvefit.

kheedanonymous commented on June 16, 2024 4

@alexander Did you use [email protected] COZ I CANT SEE IT


from curvefit.

exander77 commented on June 16, 2024 4

I am kind of stumped that authors can't release their complete workflow so we can verify and reuse it. This is tedious reverse engineering work.

from curvefit.

andrewcolemfd commented on June 16, 2024 3

Thanks to the IHME team. We appreciate what you all are doing. I agree with previous posters, further documentation would be invaluable in furthering our understanding of our own communities' needs.

from curvefit.

philippemiron commented on June 16, 2024 3

Hi,

I made this function that retrieves data from the Johns Hopkins GitHub data set for a selected country.

import pandas as pd

base = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/'
confirmed = 'time_series_covid19_confirmed_'
death = 'time_series_covid19_deaths_'
recovered = 'time_series_covid19_recovered_'

def data_country(selected_country, dataset='confirmed'):
    """Return (dates, counts) of the chosen dataset for a selected country."""

    # select the right file prefix; fail early on an unknown dataset name
    if dataset == 'confirmed':
        url = base + confirmed
    elif dataset == 'death':
        url = base + death
    elif dataset == 'recovered':
        url = base + recovered
    else:
        raise ValueError(f"unknown dataset: {dataset!r}")

    if selected_country != 'US':
        # numeric_only=True skips text columns such as Province/State
        df = pd.read_csv(url + 'global.csv').groupby('Country/Region').sum(numeric_only=True)
        df = df.drop(['Lat', 'Long'], axis=1)
        df = df.loc[selected_country]
    else:
        df = pd.read_csv(url + 'US.csv').groupby('Country_Region').sum(numeric_only=True)
        df = df.drop(['UID', 'code3', 'FIPS', 'Lat', 'Long_'], axis=1)
        if dataset == 'death':
            # only the US deaths file carries a Population column
            df = df.drop(['Population'], axis=1)
        df = df.sum()
    return df.index, df.values

You can call it for different countries:
date, count = data_country('US', 'confirmed') # or Canada, etc. for confirmed cases

You can also get the death or recovered counts by changing the second argument to 'death' or 'recovered'.

I believe their example is not fully completed but will share a Notebook this afternoon with real data.

Cheers.

from curvefit.

philippemiron commented on June 16, 2024 2

It's been two days now, and I don't understand how this is not priority #1. Once we are able to reproduce the results, I'm sure many people will help to improve the readability of the code and test each of the different components.

from curvefit.

dnola commented on June 16, 2024 2

Would really appreciate an example - I am trying to take this and apply it to some county level data, but can’t figure out how to use the code base.

Thank you!

from curvefit.

philippemiron commented on June 16, 2024 2

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

Also this is the model appendix... Does anyone understand how to calculate the covariates from the death count time series?

from curvefit.

exander77 commented on June 16, 2024 2

I would like to apply the model to the data from my country, would it be possible to supply some examples?

from curvefit.

jason-curtis commented on June 16, 2024 2

Huge credit to the folks working on this model and thanks very much for the increased focus on transparency and documentation.

Folks on this thread may also be interested in this other project I just came across, which is farther along in terms of documentation and also has open source access to all of their data sources and complete codebase: https://covidactnow.org/.

from curvefit.

emadubuko commented on June 16, 2024 1

Kudos to the IHME team. I have been comparing the daily statistics of actual data reported against your projections as published on https://covid19.healthdata.org/projections, and it has been quite close.
I would like to fit this model to my country's dataset and generate similar projections.
Can you possibly provide the input data variables for this model?
Thanks

from curvefit.

pfaris commented on June 16, 2024 1

Same for us – we’d like the data if possible. Thanks, Peter

from curvefit.

exander77 commented on June 16, 2024 1

@7ayushgupta This is great! Keep up the good work.

from curvefit.

philippemiron commented on June 16, 2024 1

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 × 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

Hi dhruvparamhans, could it be np.exp(-15) ~ 3.06e-07 rather than 1x10**(-15)?
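
A quick way to check the suggestion above is to compute exp(-15) and express it per million. This is only a sketch: reading the "-15" as a natural-log death rate is an assumption taken from this thread, not something the repository confirms, and the variable names are made up for illustration.

```python
import math

# Assumption: the "-15" is a threshold on the natural log of the death rate,
# not a power of ten.
log_threshold = -15.0
threshold = math.exp(log_threshold)

# Express the rate per million people to compare with the paper's 0.31.
per_million = threshold * 1e6

print(f"exp(-15)    = {threshold:.3e}")
print(f"per million = {per_million:.2f}")
```

Under that reading, the threshold agrees with the 0.31 per million quoted from the paper to two decimal places.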

from curvefit.

exander77 commented on June 16, 2024 1

I basically fed my own data:

measurement_value = [
        0, 3, 5, 8, 19, 26, 32, 38, 63, 94, 116, 141, 189, 298, 383, 450, 560, 765, 889, 1047,
        1161, 1287, 1472, 1763, 2022, 2395, 2657, 2817, 3001, 3308, 3589, 3858, 4190]
n_data       = len(measurement_value)

And then call predictions on the calculated model:

predictions = curve_model.predict(t=independent_var+beta_true)
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(predictions[1::])

from curvefit.

dnola commented on June 16, 2024 1

Also last question, has there or will there be any code or example released with respect to the simulation getting from death rate to hospital resource utilization?

from curvefit.

philippemiron commented on June 16, 2024 1

From their publication.

A covariate of days with expected exponential growth in the cumulative death rate was created using information on the number of days after the death rate exceeded 0.31 per million to the day when 4 different social distancing measures were mandated by local and national government: school closures, non-essential business closures including bars and restaurants, stay-at-home recommendations, and travel restrictions including public transport closures. Days with 1 measure were counted as 0.67 equivalents, days with 2 measures as 0.334 equivalents and with 3 or 4 measures as 0.

I think I get what they did, but haven't obtained similar results yet. If I understand correctly, here is an example:

timeline from day = 1 to 15.
The death rate exceeds 0.31 per million on day 2 (start counting after 2)
1 social distancing measure starting day 4 (days after 4 count as 0.67)
1 more social distancing measure on day 7 (days after 7 count as 0.334)
and 2 more measures on day 10 (days after 10 are counted as 0)

The covariate would be:
covariates = [0, 0, 1, 2, 2.66, 3.32, 3.98, 4.31, 4.64, 4.97, 4.97, 4.97, 4.97, 4.97, 4.97].

Here is a little code to generate this:

import numpy as np

# fictional data (days are 0-based indices here)
death_rate_over_threshold = 1
timeline_measure = {
  3: 1,
  6: 2,
  9: 4,
}

# 0 measure = 1, 1 measure = 2/3, 2 measures = 1/3, 3-4 measures = 0
day_count_as = [1, 0.66, 0.33, 0, 0]

# construct the covariates for the 15 days
covariates = np.zeros(15)
nb_measures = 0
for day in range(len(covariates)):
    # accumulate only once the death rate threshold has been crossed
    if day > death_rate_over_threshold:
        covariates[day] = covariates[day-1] + day_count_as[nb_measures]

    # adjust the number of social distancing measures in force from the next day on
    if day in timeline_measure:
        nb_measures = timeline_measure[day]
print(covariates)

ps: this is my best understanding so far !

from curvefit.

saravkin commented on June 16, 2024 1

Hi everyone!

We are working as fast as we can to support the analyses and update methodology. As we go, we are also picking up speed on documentation and examples. We expect to have an updated paper that documents major changes soon. Please keep checking the following websites:

Main projections: https://covid19.healthdata.org/projections
Updates and explanations of what's new: http://www.healthdata.org/covid/updates

We will post a link to the updated paper in the main README file when it is published; we expect that by end of day April 7th.

For specific locations and analyses please contact [email protected], so you can coordinate with the broader IHME team. The purpose of the repository is to share the program that is doing the estimation. The broader team at IHME processes the data, does age standardization, covariate definitions, and all analyses, which are then released online. The pipeline will be documented in the updated paper, and we will continue our work in documenting the CurveFit program.

from curvefit.

7ayushgupta commented on June 16, 2024 1

I've mailed them, but they gave an automated reply. I think they would be in a huge crunch to effectively help us out.

from curvefit.

thoo commented on June 16, 2024 1

Here is the GitHub link for covidactnow.org that @thatneat mentioned.

from curvefit.

mbannick commented on June 16, 2024 1

Hi @knrg07, thanks for your question. We recently documented the ModelPipeline class and each of the subclasses here: https://ihmeuw-msca.github.io/CurveFit/code/#model-pipelines. It doesn't provide an example yet (on the to-do list) but hopefully provides some more context. Data structure is the same as if you were passing it to CurveModel, it just provides another layer on top of that.

from curvefit.

mbannick commented on June 16, 2024 1

Thank you all very much for your patience! We appreciate your comments and interest in the IHME COVID-19 forecasting work. As you may have seen on the IHME website, IHME transitioned to using a different methodology (and code base) for COVID-19 forecasting over the last several months that is not based at all on the CurveFit approach (instead based on an SEIIR model). Our team was heavily involved with this transition and that is why it has taken so long for us to get back to you all.

This CurveFit repository is no longer active, but we have implemented a major refactor of the code base with the goal of making the code easier to understand. Please see the updated README for more details, and our updated documentation, in particular the user examples and developer docs if you are still interested in using this package for nonlinear mixed effects.

from curvefit.

ibm-cuyler commented on June 16, 2024

For sure - thank you very much to the IHME team. Great work, everybody.

from curvefit.

jason-curtis commented on June 16, 2024

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

from curvefit.

kheedanonymous commented on June 16, 2024

@philippemiron I would like to speak to you please ….my email is
@[email protected]

from curvefit.

dhruvparamhans commented on June 16, 2024

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

Also this is the model appendix... Does anyone understand how to calculate the covariates from the death count time series?

I have the same question. Been trying to understand it for some time to no avail.

from curvefit.

kheedanonymous commented on June 16, 2024

Hey alex would it be able if you contacted me @[email protected]


from curvefit.

7ayushgupta commented on June 16, 2024

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

from curvefit.

kheedanonymous commented on June 16, 2024

Hey alexander please text @[email protected]


from curvefit.

exander77 commented on June 16, 2024

@gits-png I sent you an email.

from curvefit.

exander77 commented on June 16, 2024

@gits-png Yes, also responded right now.


from curvefit.

dhruvparamhans commented on June 16, 2024

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 × 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

from curvefit.

philippemiron commented on June 16, 2024

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 × 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

Also, if you look at the data_frame in their example, 'data_group' is set to 'world' for every row, so I guess it is used to select the rows for a specific country or state.

from curvefit.

exander77 commented on June 16, 2024

Any ideas on how to input national data?

from curvefit.

kheedanonymous commented on June 16, 2024

Not yet but i will update you if i do


from curvefit.

exander77 commented on June 16, 2024

    independent_var  measurement_value  measurement_std  constant_one data_group
0              0.00                  0              0.1           1.0    czechia
1              0.15                  3              0.1           1.0    czechia
2              0.30                  5              0.1           1.0    czechia
3              0.45                  8              0.1           1.0    czechia
4              0.60                 19              0.1           1.0    czechia
5              0.75                 26              0.1           1.0    czechia
6              0.90                 32              0.1           1.0    czechia
7              1.05                 38              0.1           1.0    czechia
8              1.20                 63              0.1           1.0    czechia
9              1.35                 94              0.1           1.0    czechia
10             1.50                116              0.1           1.0    czechia
11             1.65                141              0.1           1.0    czechia
12             1.80                189              0.1           1.0    czechia
13             1.95                298              0.1           1.0    czechia
14             2.10                383              0.1           1.0    czechia
15             2.25                450              0.1           1.0    czechia
16             2.40                560              0.1           1.0    czechia
17             2.55                765              0.1           1.0    czechia
18             2.70                889              0.1           1.0    czechia
19             2.85               1047              0.1           1.0    czechia
20             3.00               1161              0.1           1.0    czechia
array([0.66666667, 1.        , 1.33333333])
array([[   2.25840555],
       [   2.69716202],
       [1765.66571494]])
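
For anyone trying to reproduce the table above, a frame with the same column layout can be built with pandas alone. This is a sketch based only on the printed output in this comment: the even spacing of independent_var over 0 to 3, the constant measurement_std of 0.1 and the single 'czechia' group are read off that table, and nothing here calls the curvefit package itself.

```python
import numpy as np
import pandas as pd

# First 21 cumulative counts, as posted earlier in the thread.
measurement_value = [0, 3, 5, 8, 19, 26, 32, 38, 63, 94, 116, 141, 189, 298,
                     383, 450, 560, 765, 889, 1047, 1161]
n_data = len(measurement_value)

df = pd.DataFrame({
    # Time axis rescaled to the interval [0, 3], one step per observation.
    "independent_var": np.linspace(0.0, 3.0, n_data),
    "measurement_value": measurement_value,
    # Scalars are broadcast to every row.
    "measurement_std": 0.1,
    "constant_one": 1.0,
    "data_group": "czechia",
})

print(df.head())
```

If data_group really is a per-location label, as guessed earlier in the thread, a second location would presumably be appended as extra rows with a different label; that part is untested here.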

from curvefit.

exander77 commented on June 16, 2024

I tried to change the measurement data to the Czech Republic ones, it gave me some prediction, but it fails if I feed it more data than the original 21.

from curvefit.

kheedanonymous commented on June 16, 2024

Actually dude am not a professional at coding but amma try harder .... Thats learning right


from curvefit.

kheedanonymous commented on June 16, 2024

Which text editor are you using ?


from curvefit.

exander77 commented on June 16, 2024

I am using vim, but you can edit python in whichever editor you like. (Actually I do not recommend vim. :D)

from curvefit.

kheedanonymous commented on June 16, 2024

Have you tried sublime


from curvefit.

exander77 commented on June 16, 2024

I fed it with CZ data and nothing much sane:

    independent_var  measurement_value  measurement_std  constant_one data_group
0           0.00000                  0              0.1           1.0    czechia
1           0.09375                  3              0.1           1.0    czechia
2           0.18750                  5              0.1           1.0    czechia
3           0.28125                  8              0.1           1.0    czechia
4           0.37500                 19              0.1           1.0    czechia
5           0.46875                 26              0.1           1.0    czechia
6           0.56250                 32              0.1           1.0    czechia
7           0.65625                 38              0.1           1.0    czechia
8           0.75000                 63              0.1           1.0    czechia
9           0.84375                 94              0.1           1.0    czechia
10          0.93750                116              0.1           1.0    czechia
11          1.03125                141              0.1           1.0    czechia
12          1.12500                189              0.1           1.0    czechia
13          1.21875                298              0.1           1.0    czechia
14          1.31250                383              0.1           1.0    czechia
15          1.40625                450              0.1           1.0    czechia
16          1.50000                560              0.1           1.0    czechia
17          1.59375                765              0.1           1.0    czechia
18          1.68750                889              0.1           1.0    czechia
19          1.78125               1047              0.1           1.0    czechia
20          1.87500               1161              0.1           1.0    czechia
21          1.96875               1287              0.1           1.0    czechia
22          2.06250               1472              0.1           1.0    czechia
23          2.15625               1763              0.1           1.0    czechia
24          2.25000               2022              0.1           1.0    czechia
25          2.34375               2395              0.1           1.0    czechia
26          2.43750               2657              0.1           1.0    czechia
27          2.53125               2817              0.1           1.0    czechia
28          2.62500               3001              0.1           1.0    czechia
29          2.71875               3308              0.1           1.0    czechia
30          2.81250               3589              0.1           1.0    czechia
31          2.90625               3858              0.1           1.0    czechia
32          3.00000               4190              0.1           1.0    czechia
alpha: [2.20389855]
beta: [2.47338824]
p: [5352.08537721]

I am not sure what the params alpha, beta, and p are. Is 5352 the prediction? That is disappointing so far. I tried tweaking those params to understand them, but no luck.

I would expect a prediction like 4500 or so.

from curvefit.

exander77 commented on June 16, 2024

I managed to do some predictions by calling predict on the continuation of independent_var series:

    independent_var  measurement_value  measurement_std  constant_one data_group
0           0.00000                  0              0.1           1.0    czechia
1           0.09375                  3              0.1           1.0    czechia
2           0.18750                  5              0.1           1.0    czechia
3           0.28125                  8              0.1           1.0    czechia
4           0.37500                 19              0.1           1.0    czechia
5           0.46875                 26              0.1           1.0    czechia
6           0.56250                 32              0.1           1.0    czechia
7           0.65625                 38              0.1           1.0    czechia
8           0.75000                 63              0.1           1.0    czechia
9           0.84375                 94              0.1           1.0    czechia
10          0.93750                116              0.1           1.0    czechia
11          1.03125                141              0.1           1.0    czechia
12          1.12500                189              0.1           1.0    czechia
13          1.21875                298              0.1           1.0    czechia
14          1.31250                383              0.1           1.0    czechia
15          1.40625                450              0.1           1.0    czechia
16          1.50000                560              0.1           1.0    czechia
17          1.59375                765              0.1           1.0    czechia
18          1.68750                889              0.1           1.0    czechia
19          1.78125               1047              0.1           1.0    czechia
20          1.87500               1161              0.1           1.0    czechia
21          1.96875               1287              0.1           1.0    czechia
22          2.06250               1472              0.1           1.0    czechia
23          2.15625               1763              0.1           1.0    czechia
24          2.25000               2022              0.1           1.0    czechia
25          2.34375               2395              0.1           1.0    czechia
26          2.43750               2657              0.1           1.0    czechia
27          2.53125               2817              0.1           1.0    czechia
28          2.62500               3001              0.1           1.0    czechia
29          2.71875               3308              0.1           1.0    czechia
30          2.81250               3589              0.1           1.0    czechia
31          2.90625               3858              0.1           1.0    czechia
32          3.00000               4190              0.1           1.0    czechia
array([0.66666667, 1.        , 1.33333333])
alpha: [2.20389855]
beta: [2.47338824]
p: [5352.08537721]
array([4265.23591878, 4433.28700621, 4580.05696837, 4706.79430815,
       4815.1652726 , 4907.05709468, 4984.42281497, 5049.169184  ,
       5103.08314254, 5147.78958555, 5184.73256054, 5215.17278241,
       5240.19564488, 5260.72531385, 5277.54175749, 5291.29860217,
       5302.54048831, 5311.71916453, 5319.20794429, 5325.31440039,
       5330.29132715, 5334.34608776, 5337.64850755, 5340.33748911,
       5342.52652345, 5344.30825965, 5345.7582799 , 5346.9382086 ,
       5347.89826693, 5348.67936767, 5349.31483043, 5349.83178419])

from curvefit.

dnola commented on June 16, 2024

How are we applying the covariates here? It looks like in the example we are just feeding in a constant one.

Additionally, how are we linking between data groups once we have those covariates determined per group?

Am I correct in thinking that we can just set that “constant one” field for each data group to the “time from threshold to social distancing” feature for that particular group?

Last, should “t” always be relative to the first detected death? Or the first death rate past a threshold? I.e. if we wanted to look at another country, would it be as simple as adding another set of rows starting from t=0 for that location for a new data group? Or does the timing across all series need to be aligned?

I.e., if we wanted to look at both China and USA, would we start both from t=0 at the time of their first cases, or does t start from the first case in China, so that the first case in the US would start at some much later t?

from curvefit.

dnola commented on June 16, 2024

@exander77 With respect to the parameters alpha, beta, and p, those are the three parameters of a logistic curve. The p you are asking about is the “carrying capacity” of the logistic model, i.e. the maximum. That isn’t your prediction; it is the level your predictions will ultimately taper off at.

(Alpha is the growth rate and beta determines the inflection point)

That said, if we don’t figure out the covariate linking as well as the errors from fixed and random effects we lose what makes this approach unique, and are basically just doing a simple logistic regression like you could get out of the box in sklearn. So we should work on that next
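
To make the roles of the three parameters concrete, here is a small numeric sketch using the fitted values exander77 posted. The functional form p / (1 + exp(-alpha * (t - beta))) is an assumption on my part, although it does reproduce the first forecast value in that output; the function name and the time step of 3/32 are read off the posted table rather than taken from the package.

```python
import math

def logistic(t, alpha, beta, p):
    """Logistic curve: p is the plateau, beta the inflection time, alpha the growth rate."""
    return p / (1.0 + math.exp(-alpha * (t - beta)))

# Fitted values printed earlier in the thread.
alpha, beta, p = 2.20389855, 2.47338824, 5352.08537721

# The posted independent_var ran from 0 to 3 in 32 steps.
step = 3.0 / 32

first_forecast = logistic(3.0 + step, alpha, beta, p)  # one step past the data
at_inflection = logistic(beta, alpha, beta, p)         # exactly half of p
far_future = logistic(30.0, alpha, beta, p)            # approaches p

print(first_forecast, at_inflection, far_future)
```

The first value lines up with the first entry of the posted prediction array, and the far-future value is the plateau p, which is why 5352 is the ceiling rather than the next-day forecast.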

from curvefit.

7ayushgupta commented on June 16, 2024

We did the same, but could not obtain good predictions.

They presumably used a covariate model for Wuhan as well. Do you know anything about that?

from curvefit.

philippemiron commented on June 16, 2024

Sadly that's where I am right now.

from curvefit.

thewanderer41 commented on June 16, 2024

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is on the order of 10^{-7}.

The number 1e-15 may have been chosen because it is just larger than machine epsilon for 64-bit floats (about 2.2e-16), i.e. the smallest relative difference that double precision can resolve near 1.0. It is not the smallest number a machine can represent (64-bit floats go down to roughly 1e-308), but differences below that relative precision are lost to rounding.
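The floating-point limits mentioned above can be checked directly with Python's standard library. This is only a sanity check of the numbers, not something taken from the curvefit code.

```python
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next representable float
tiny = sys.float_info.min      # smallest positive normalized 64-bit float

print(eps)    # about 2.22e-16
print(tiny)   # about 2.23e-308

# 1e-15 is above machine epsilon, so adding it to 1.0 is still detectable.
print(1.0 + 1e-15 != 1.0)

# Values far smaller than 1e-15 can still be stored on their own.
print(1e-300 > 0.0)
```

So 1e-15 sits just above the relative precision of doubles, but it is nowhere near the smallest value a 64-bit float can hold.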

from curvefit.

kheedanonymous commented on June 16, 2024

Hold up... so this engineering you guys are doing... the program already exists??

On Sun, Apr 5, 2020, 13:51 Alexander Weps wrote: I am kind of stumped that authors can't release their complete workflow so we can verify and reuse it. This is tedious reverse engineering work.

from curvefit.

HiroakiMachida commented on June 16, 2024

I got the data by following @philippemiron and ran main.py from @7ayushgupta.
It still doesn't give an appropriate prediction.

https://github.com/HiroakiMachida/CurveFit/blob/master/main.py

(base) Hiroaki-no-MacBook:CurveFit hiroakimachida$ python main.py
0     Japan
      ...  
74    Japan
Name: State/UnionTerritory, Length: 75, dtype: object
Model pipeline setting up...
Model setup. Running fit...
Model fitted. Saving model...
Model saved.
Running PV for Japan
//anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py:938: UserWarning: You are merging on int and float columns where the float values are not equal to their int representation
  'representation', UserWarning)
[0.5        0.92250345 0.99777404 0.99999006 0.99999999 1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.        ]

from curvefit.

7ayushgupta commented on June 16, 2024

@HiroakiMachida @philippemiron @thewanderer41 we can discuss and find out a solution based on the understanding of the code that we've got. Any suitable time and platform would be good for me.
Let's do it urgently, and get some predictions.

from curvefit.

dhruvparamhans commented on June 16, 2024

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is on the order of 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point, but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

Hi dhruvparamhans, would it be np.exp(-15) ~ 3.06e-07 and not 1x10**(-15)?

I think you are quite right. Now I feel quite stupid. The notation in the paper didn't help things. I remember reading 1e-15.
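The arithmetic behind this correction is easy to verify. The sketch below reads the "-15" as a log-scale value, which is the interpretation suggested in the exchange above rather than something confirmed by the authors; the variable names are made up for illustration.

```python
import math

threshold = math.exp(-15)     # "-15" read as a value on the log scale
paper_value = 0.31 / 1e6      # 0.31 deaths per million, written as a rate

print(threshold)
print(paper_value)
print(math.log(paper_value))  # close to -15
```

The two values agree to within about 1.5%, whereas 1e-15 is roughly eight orders of magnitude smaller than the 0.31 per million quoted from the paper.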

from curvefit.

philippemiron commented on June 16, 2024

@HiroakiMachida @philippemiron @thewanderer41 we can discuss and find out a solution based on the understanding of the code that we've got. Any suitable time and platform would be good for me.
Let's do it urgently, and get some predictions.

You can send me an email, see my profile, but I guess it's better to contact them directly...

from curvefit.

knrg07 commented on June 16, 2024

Is any _pipeline example possible? Or is access to the Wuhan data really necessary? I've tried feeding it the example/simulated dataset, but I understand I need more information. Could an example be provided on that front, or at least an idea of what the data structure should look like?

from curvefit.

7ayushgupta commented on June 16, 2024

@mbannick @saravkin The paper mentions that we can contact the authors for a list of data citations. Can you possibly help us with that?
If not, can you just tell us which countries you included when fitting the curves, and what the preprocessing steps are?

from curvefit.
