richardli / summer



SAE Unit/area Models and Methods for Estimation in R

Home Page: https://richardli.github.io/SUMMER/

R 99.15% TeX 0.85%

bayesian-inference r-package small-area-estimation space-time

summer's People


nmmarquez kant liuyanguu paigejo chacalle peteragao

summer's Issues

fitINLA2 requires data frame with no survey variable

When data from multiple surveys are combined, we need to make sure that the $cluster column values are unique across surveys. It's probably best to have users submit a data.frame with one extra column, $surveyyear, so that unique cluster IDs can be created internally in a new cluster column and we can work out how to combine across surveys.
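A minimal base-R sketch of the idea, assuming a combined data.frame with the `cluster` and `surveyyear` columns described above. The helper `make_unique_cluster` and the output column `cluster.unique` are hypothetical names for illustration, not part of SUMMER.

```r
# Hypothetical helper (not part of SUMMER): build cluster IDs that are unique
# across surveys by combining the survey year with the original cluster ID.
make_unique_cluster <- function(df, cluster = "cluster", survey = "surveyyear") {
  stopifnot(all(c(cluster, survey) %in% names(df)))
  key <- paste(df[[survey]], df[[cluster]], sep = "_")
  # Map each distinct survey-cluster pair to an integer ID, in order of appearance
  df$cluster.unique <- as.integer(factor(key, levels = unique(key)))
  df
}

# Toy data: cluster 1 appears in both the 2003 and 2008 surveys
combined <- data.frame(
  cluster    = c(1, 1, 2, 1, 2),
  surveyyear = c(2003, 2003, 2003, 2008, 2008)
)
combined <- make_unique_cluster(combined)
combined$cluster.unique  # 1 1 2 3 4
```

Cluster 1 of the 2003 survey and cluster 1 of the 2008 survey now receive different IDs.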

allow user to input column name for strata in `getDirect`

getDirect assumes that a strata column, if present, is named "strata", but this isn't documented anywhere (you only find it by reading the code). It would be nice to either add an input argument strataVar (similar to the other column-name parameters) or at least note this in the documentation.
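Until such an argument exists, a user-side workaround is to copy the stratification column into a column literally named "strata" before calling getDirect. This base-R sketch does only that; the helper `add_strata_column` and the use of `v022` as the source column are illustrative assumptions, and getDirect itself is not called here.

```r
# Hypothetical helper: getDirect (per this issue) looks for a column named
# "strata", so copy the user's stratification column under that name.
add_strata_column <- function(births, strataVar) {
  if (!strataVar %in% names(births)) {
    stop("column '", strataVar, "' not found in data")
  }
  births$strata <- births[[strataVar]]
  births
}

# Toy data; v022 is assumed to hold the sampling strata
births <- data.frame(v022 = c("urban_A", "rural_A", "urban_B"), died = c(0, 1, 0))
births <- add_strata_column(births, "v022")
names(births)
```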

mapPlot function produces an error caused by gpclibPermitStatus() if certain packages are missing

https://github.com/bryandmartin/SUMMER/blob/cd53e5d8d549434ea2b29cba9b0fbfa957fc8a98/R/mapPlot.R#L100

Error message: Error in maptools::unionSpatialPolygons: isTRUE(gpclibPermitStatus()) is not TRUE

Solved by installing gpclib from source according to this discussion

if (!require(gpclib)) install.packages("gpclib", type="source")

Alternatively, according to the same discussion, installing rgeos may be enough.
It also seems that mapproj is required somewhere in mapPlot.
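A small base-R check can report which of the packages mentioned in this thread are missing before mapPlot is called. The package list is taken from the discussion above; whether all three are strictly required by mapPlot is an assumption, and `missing_pkgs` is a hypothetical helper.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# requireNamespace() returns FALSE (without error) for uninstalled packages.
missing_pkgs <- function(pkgs = c("gpclib", "rgeos", "mapproj")) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_pkgs()
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
}
```

Using requireNamespace() rather than require() checks availability without attaching the packages to the search path.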

(Discovered these while using packrat to package the SUMMER app.)
Great package,
Thank you. :)

fitINLA2: strata vs urban/rural

Hi, throughout SUMMER we refer to strata as the stratification variable, possibly v024 x v025. But we may need a separate argument for urban/rural in the formula, since the strata used for the direct estimates will be different from the one used for the fixed effect in the fitINLA2 formula.
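To make the distinction concrete, the base-R sketch below builds both variables from DHS-style columns: design strata as the region-by-residence interaction, and a separate urban/rural indicator that could enter a model formula as a fixed effect. The toy values and the column names `strata` and `urban` are illustrative assumptions.

```r
# Toy DHS-style data: v024 = region, v025 = urban/rural residence
dhs <- data.frame(
  v024 = c("North", "North", "South", "South"),
  v025 = c("urban", "rural", "urban", "urban")
)

# Design strata for direct estimates: one level per observed region x residence
dhs$strata <- interaction(dhs$v024, dhs$v025, sep = ".", drop = TRUE)

# Separate urban/rural indicator for a fixed effect in the model formula
dhs$urban <- as.integer(dhs$v025 == "urban")

levels(dhs$strata)
```

The strata variable has one level per combination actually observed, whereas `urban` has only two values, so a single argument cannot serve both roles.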

fitINLA() function, fitted model

Hi, I fitted a spatio-temporal smoothing model for under-five mortality rates with the SUMMER package, specifically with the fitINLA function. The related references (Mercer et al., Li et al.) show that the fitted yearly model is a spatio-temporal one with structured and unstructured spatial effects (ICAR, iid), structured and unstructured temporal effects (RW2, iid), and a spatio-temporal interaction based on Knorr-Held. However, when I fitted and summarized the model, the unstructured random effect was not included, and most of the effects are listed as "generic", which means an iid model for all of them. Moreover, I tried modifying the function, but the latent model objects iid.new and rw.model are hidden and R doesn't recognize them.

Thanks

Error when running spatial model using smoothSurvey

Hi,

I got this error message when running the code below:

smoothed <- smoothSurvey(data = data, geo = districts, Amat = mat,
  responseType = "binary", responseVar = "OW", strataVar = NULL, weightVar = NULL,
  regionVar = "district", clusterVar = NULL, CI = 0.95)
summary(smoothed)

Strata not defined. Ignoring sample design
cluster not specified. Ignoring sample design
Error in fit$marginals.linear.predictor[[i]] : subscript out of bounds

There was no problem when running the weighted and smoothed estimates with the code below.

svysmoothed <- smoothSurvey(data = data, geo = districts, Amat = mat,
  responseType = "binary", responseVar = "OW", strataVar = "PSU", weightVar = "Weight_Final",
  regionVar = "district", clusterVar = "~1", CI = 0.95)
summary(svysmoothed)

I really appreciate your help. Thank you.

Error message in fitINLA() when estimating the Random Walk 2 random effects on the yearly scale

Dear Dr. Z. Richard Li!

While trying to smooth the direct estimates using the fitINLA() function with Random Walk 2 random effects on the yearly scale, I get an error message.

fit2 <- fitINLA(data = data, geo = NULL, Amat = NULL, year_label = years.all,
  year_range = c(1985, 2019), rw = 2, is.yearly = TRUE, m = 5, type.st = 4)

returns:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 42, 72

INLA Repo in Vignette

The INLA repository in the vignette was not updated in v0.2.1. It should match the DESCRIPTION file.

Unclear error in getBirths

Hi Richard, I'm getting the following error from getBirths, and it's unclear to me what causes it. Can you please take a look? I'll email you the data file.
Thanks,
Katie

data.raw <- readRDS("~/Desktop/data.raw.rds")
data <- SUMMER::getBirths(
  data = data.raw,
  surveyyear = 1996,
  year.cut = seq(1970, 1997, 1),
  variables = intersect(c("caseid", "v001", "v002", "v004", "v005", "v021", "v022", "v023", "v024",
                          "v025", "v101", "v102", "v139", "bidx", "v012"), names(data.raw)),
  strata = intersect(c("v022", "v024", "v025", "v101", "v102"), names(data.raw)),
  compact = FALSE
)
#> Children with age at least 24 months are assumed to have recorded age truncated to full years. 
#> Recorded age + 5 months is used to adjust for the truncation for ages >= 24 and are multiples of 12.
#> Error in `$<-.data.frame`(`*tmp*`, "age", value = "0"): replacement has 1 row, data has 0

Created on 2024-01-25 with reprex v2.1.0
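The final error comes from base R's `$<-.data.frame`, which refuses to assign a length-one value to a data frame with zero rows. One plausible (unconfirmed) reading is that an intermediate subset inside getBirths ended up empty for this data. The sketch below only reproduces the base-R behaviour and adds a hypothetical guard, `safe_add_age`; it does not reflect getBirths internals.

```r
# Reproduce the base-R error: assigning a scalar to a zero-row data frame
empty <- data.frame(bidx = integer(0))
err <- tryCatch({
  empty$age <- "0"
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Hypothetical guard: check for an empty subset before adding a column
safe_add_age <- function(df, value = "0") {
  if (nrow(df) == 0) {
    warning("no rows left after subsetting; nothing to assign")
    return(df)
  }
  df$age <- value
  df
}
```

In an English locale the captured message matches the one in the reprex above.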

Issue with getSmoothed() when data used comes from aggregateSurvey()

Dear Z. Richard Li,

I am working with 4 DHS surveys conducted in Nigeria (2003, 2008, 2013, 2018).
Using getDirectList(), I obtain Horvitz-Thompson (HT) estimates for each period, region and survey, and I combine them with aggregateSurvey(). Running fitINLA() on this output works fine. However, running getSmoothed() on the fitted object fails with the error message:
"Error in mod$marginals.lincomb.derived[[index]] :
attempt to select less than one element in get1index"
In addition, aggregateSurvey() swaps the lower and upper bounds (at least in my case).

Code
data_multi <- getDirectList(births = data[[2]], years = years, regionVar = "region",
  timeVar = "time", clusterVar = "~clustid + id", ageVar = "age",
  weightsVar = "weights", geo.recode = NULL)
est <- aggregateSurvey(data_multi)
fit1 <- fitINLA(data = est, geo = NULL, Amat = NULL, year_label = years,
  year_range = c(1989, 2018), rw = 2, m = 5, is.yearly = TRUE)
out1 <- getSmoothed(fit1)
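Until the bound order is fixed, a defensive step could swap the bounds wherever lower > upper. This assumes the aggregated output stores its interval in columns named `lower` and `upper`; those names, the toy values, and the helper `fix_bounds` are assumptions for illustration.

```r
# Hypothetical helper: swap lower/upper in rows where they are inverted
fix_bounds <- function(df, lower = "lower", upper = "upper") {
  flip <- !is.na(df[[lower]]) & !is.na(df[[upper]]) & df[[lower]] > df[[upper]]
  tmp <- df[[lower]][flip]
  df[[lower]][flip] <- df[[upper]][flip]
  df[[upper]][flip] <- tmp
  df
}

# Toy estimates: the first row has inverted bounds
toy <- data.frame(mean = c(0.10, 0.08), lower = c(0.12, 0.06), upper = c(0.08, 0.10))
fixed <- fix_bounds(toy)
fixed
```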

TODO: Function documentation

Every exported function should have an example; functions without one should not be exported.

Travis Note: Line-width

* checking Rd line widths ... NOTE
Rd file 'mapPlot.Rd':
\examples lines wider than 100 characters:
mapPlot(countryname = "Uganda", results = results_rw2, geo = geo, countrysum = data, inlamod = inla_model)
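A quick base-R check can flag example lines over the 100-character limit quoted in the NOTE, and the last two strings show one way to wrap the offending call. The helper `wide_lines` is hypothetical; in practice `lines` would come from readLines() on the .Rd file.

```r
# Hypothetical helper: indices of lines wider than `width` characters
wide_lines <- function(lines, width = 100) {
  which(nchar(lines, type = "width") > width)
}

example_lines <- c(
  'mapPlot(countryname = "Uganda", results = results_rw2, geo = geo, countrysum = data, inlamod = inla_model)',
  # The same call wrapped over two lines, each under 100 characters
  'mapPlot(countryname = "Uganda", results = results_rw2, geo = geo,',
  '        countrysum = data, inlamod = inla_model)'
)
wide_lines(example_lines)  # 1
```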

Hatching using ggplot

The current hatchPlot function is not very flexible (e.g., it is difficult to transform scales and color gradients). It would be great to implement it in ggplot. I'm keeping this thread open in case anyone wants (and has the time and curiosity!) to help. I'll post related material here too.

For an illustration, see https://imaddowzimet.github.io/crosshatch/, which implements hatching but does not seem easy to adapt for changing the hatching density or showing hatching in legends.

Issue with the fitGeneric function

Hi there, I would like to use the SUMMER package with the Nigeria DHS 2018 data to generate district-level small area estimates (SAE) of malaria parasitemia. When I run fitGeneric() to obtain smoothed estimates without weights,

smoothed <- fitGeneric(data = pfpr_df_2, geo = LGA, Amat = mat, responseType = "binary",
  responseVar = "hml32", strataVar = NULL, weightVar = NULL, regionVar = "LGA",
  clusterVar = NULL, CI = 0.95)

I get the error

Error in fitGeneric(data = pfpr_df_2, geo = admin1shp, Amat = mat, responseType = "binary", :
Exist regions in data but not in the Amat.

However, when I check the two datasets with the code below (which I think is what the function does, though I'm not sure), the count of unmatched LGA names comes out as 7933. The actual number of LGAs is 653 in pfpr_df_2
and 774 in mat. I am using the same name format in both. I am not sure how to fix this and would appreciate any assistance.

sum(!pfpr_df_2$LGA %in% colnames(mat))
[1] 7933
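Note that `sum(!pfpr_df_2$LGA %in% colnames(mat))` counts rows (observations) whose LGA is unmatched, not distinct LGA names, which would explain a number larger than the number of LGAs. Listing the distinct unmatched names is usually more informative. The sketch below does that in base R; the helper `unmatched_regions` and the toy names are hypothetical, and trimming whitespace is only one possible normalization.

```r
# Hypothetical helper: distinct region names present in the data but
# missing from the column names of the adjacency matrix.
unmatched_regions <- function(regions, Amat) {
  if (is.null(colnames(Amat))) stop("Amat has no column names")
  setdiff(unique(as.character(regions)), colnames(Amat))
}

# Toy example with a trailing space and a case difference
Amat <- matrix(0, 2, 2, dimnames = list(c("Ikeja", "Epe"), c("Ikeja", "Epe")))
regions <- c("Ikeja", "Ikeja ", "epe", "Epe")

unmatched_regions(regions, Amat)          # "Ikeja " "epe"
unmatched_regions(trimws(regions), Amat)  # "epe"
```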

Error message "Exist regions in data but not in the Amat" when using the BRFSS data

svysmoothed.year <- smoothSurvey(data = BRFSS, geo = KingCounty, Amat = mat,
  responseType = "binary", responseVar = "diab2", strataVar = "strata", weightVar = "rwt_llcp",
  regionVar = "hracode", clusterVar = "~1", timeVar = "year", time.model = "rw1",
  type.st = 1)

Error in smoothSurvey(data = BRFSS, geo = KingCounty, Amat = mat, responseType = "binary", :
Exist regions in data but not in the Amat.

In the DESCRIPTION file, should INLA be listed under "Imports" rather than "Suggests"?

@richardli

fitINLA output years?

The years variable in the fitINLA output is character, whereas the years variable from getDirect is a factor (which is typically used to extract the year levels without typing them out, e.g., years = levels(birth$years) in the vignettes).

It may be worth making both factors. Is the consistency worth the risk of using factors?
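One concrete consideration in this trade-off: factor levels keep the intended chronological order of period labels, while character labels only order correctly if they happen to sort correctly as text. The base-R sketch below uses five-year labels like those in the vignette; the shuffled input is illustrative.

```r
# Period labels in chronological order (five-year groups as in the vignette)
periods <- c("85-89", "90-94", "95-99", "00-04", "05-09", "10-14")

set.seed(1)
years_chr <- sample(periods)                     # character, arbitrary order
years_fac <- factor(years_chr, levels = periods) # factor with explicit order

levels(years_fac)        # chronological order preserved
sort(unique(years_chr))  # text sort puts "00-04" first
as.character(years_fac)  # converting back to plain labels
```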

Cannot run this demo code: error when setting "is.yearly = TRUE"

## Not run:

years <- levels(DemoData[[1]]$time)

## obtain direct estimates
data_multi <- getDirectList(births = DemoData, years = years,
  regionVar = "region", timeVar = "time", clusterVar = "~clustid+id",
  ageVar = "age", weightsVar = "weights", geo.recode = NULL)
data <- aggregateSurvey(data_multi)

## national model
years.all <- c(years, "15-19")
fit1 <- smoothDirect(data = data, Amat = NULL,
  year_label = years.all, year_range = c(1985, 2019),
  time.model = "rw2",
  is.yearly = FALSE, m = 5)
out1 <- getSmoothed(fit1)
plot(out1, is.subnational = FALSE)

## subnational model
fit2 <- smoothDirect(data = data, Amat = DemoMap$Amat,
  year_label = years.all, year_range = c(1985, 2019),
  time.model = "rw2", is.yearly = TRUE, m = 5, type.st = 4)
out2 <- getSmoothed(fit2)
plot(out2, is.subnational = TRUE)

BUG: slope.fixed.output undefined for time.model=rw

SUMMER/R/fitINLA2.R

Line 1244 in 92d0073

    
out <- list(model = formula, fit = fit, family = family, Amat = Amat,
            newdata = exdat, time = seq(0, N - 1), area = seq(0, region_count - 1),
            time.area = time.area, survey.table = survey.table, is.yearly = FALSE,
            type.st = type.st, year_label = year_label, age.groups = age.groups,
            age.groups.new = age.groups.new, age.n = age.n, age.rw.group = age.rw.group,
            age.strata.fixed.group = age.strata.fixed.group, strata.base = strata.base,
            rw = rw, ar = ar, strata.time.effect = strata.time.effect, priors = priors,
            year_range = NA, Amat = Amat, has.Amat = TRUE, is.temporal = is.temporal,
            covariate.names = covariate.names, slope.fixed.output = slope.fixed.output,
            control.fixed = control.fixed, msg = msg)

When time.model is "rw1" or "rw2", the slope.fixed.output object is never defined by the smoothCluster function, so we get this error:

Error in smoothCluster(data = counts.all, Amat = DemoMap$Amat, family = "betabinomial",  : 
  object 'slope.fixed.output' not found

The error does not occur when we use time.model = "ar1".

I found this issue when running the benchmarking example code (https://github.com/richardli/SUMMER/blob/master/R/benchmark.R#L33):

fit.bb  <- smoothCluster(data = counts.all, Amat = DemoMap$Amat, 
                         family = "betabinomial",
                         year_label = periods, 
                         survey.effect = TRUE,
                         linear.trend = TRUE,
                         time.model = "ar1")

Thanks!
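A common R pattern for this kind of bug is to give the object a default value before the branch that may or may not create it, so the final list() call never refers to an undefined name. The function below is a hypothetical stand-in that only mirrors the reported behaviour (the object being created for "ar1" only); it is not the actual smoothCluster code, and the string "slope" is a placeholder for the real fixed-effect summary.

```r
# Hypothetical stand-in for the end of a fitting function
build_output <- function(time.model = c("rw1", "rw2", "ar1")) {
  time.model <- match.arg(time.model)
  slope.fixed.output <- NULL  # default, so list() below never fails
  if (time.model == "ar1") {
    slope.fixed.output <- "slope"  # placeholder for the real summary
  }
  list(time.model = time.model, slope.fixed.output = slope.fixed.output)
}

str(build_output("rw2"))
str(build_output("ar1"))
```

With the default in place, "rw1" and "rw2" fits return slope.fixed.output = NULL instead of failing.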

plot add CI

Currently, added CI columns need to have names different from the CI columns in the dataset being plotted (because of the merge). It could be good to rename them internally to avoid that.
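The clash comes from base R's merge(), which appends ".x"/".y" suffixes to columns that share a name. The sketch below shows the effect and one way an internal rename could avoid it; the data frames and the "added." prefix are hypothetical choices, not how the plot function currently works.

```r
plot_df  <- data.frame(region = c("A", "B"), lower = c(0.1, 0.2), upper = c(0.3, 0.4))
added_ci <- data.frame(region = c("A", "B"), lower = c(0.15, 0.25), upper = c(0.35, 0.45))

# Without renaming, shared column names get suffixes
names(merge(plot_df, added_ci, by = "region"))
# "region" "lower.x" "upper.x" "lower.y" "upper.y"

# Hypothetical internal rename of the added CI columns before merging
ci_cols <- setdiff(names(added_ci), "region")
names(added_ci)[names(added_ci) %in% ci_cols] <- paste0("added.", ci_cols)
merged_ci <- merge(plot_df, added_ci, by = "region")
names(merged_ci)
# "region" "lower" "upper" "added.lower" "added.upper"
```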

Projection of Mortality U5MR

The NFSH data contain birth records up to 2020. But after running the program, it gives projections from 2015 onward, whereas I want projections beyond 2020. How can I get that? Please advise.

vignette error

Aloha!

I am currently running through the vignette, and I am running into the following error:


fit2 <- fitINLA(data = data, geo = NULL, Amat = NULL, year_names = years.all,
  year_range = c(1985, 2019), priors = priors, rw = 2,
  is.yearly = TRUE, m = 5)

Error in exists("my.cache", envir = envir, mode = "list") : 
  use of NULL environment is defunct

I am running R 3.5.1 on Windows.

Error when running a spatio-temporal model using smoothSurvey

smooth.year <- smoothSurvey(data = COD_ANC_comb_data, geo = geo,
  Amat = Amat, responseType = "binary", responseVar = "anc_timing",
  strataVar = "residence", weightVar = "sample_weight", regionVar = "region",
  clusterVar = "~DHSCLUST+caseid", timeVar = "year", time.model = "rw2",
  CI = 0.95, formula = formula1, type.st = 4, nest = TRUE)

I get the error below when I run the code above. Please help.
Error in fit$marginals.linear.predictor[[i]] : subscript out of bounds

region variable not defined

Dear Richard!
The getDirectList() function fails to take the regionVar argument and returns:
Error in getDirect(births = births[[1]], years = years, regionVar = regionVar, : region variable not defined, and no v101 or v024!

Originally posted by @Awugchew in #15 (comment)

CRAN OS X Old Release Error

ERROR: this R is version 3.3.2, package 'SUMMER' requires R >= 3.4.2

This can be fixed by lowering the R version requirement in DESCRIPTION.

No option to set nest=TRUE in the svydesign in fitGeneric

I'm using data from a two-stage cluster sample and have some duplicate IDs between clusters. To fix this with svydesign(), I would just set nest = TRUE, but there is no option to do that in fitGeneric().

As an aside, is there a reason it is preferable to recreate the svydesign object in the fitGeneric() call, rather than having the option to reference an existing svydesign object?

> fitGeneric(data = khi_base, geo = UCmap, Amat = mat, responseType = "binary", 
+            responseVar = "novac", strataVar = "mc_104", weightVar = "dsgnwt", regionVar = "mc_104", 
+            clusterVar = "~mc_105", CI = 0.95)

Error in svydesign.default(ids = stats::formula(clusterVar), weights = ~weights0,  : 
  Clusters not nested in strata at top level; you may want nest=TRUE.
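Clusters are nested in strata only if each cluster ID occurs within a single stratum; svydesign()'s nest = TRUE relabels cluster IDs within strata to enforce this. The base-R sketch below checks nesting and builds stratum-prefixed IDs as a manual equivalent that could then serve as the cluster variable (passing it via clusterVar is an untested assumption). The helper `clusters_nested` and the toy data are hypothetical.

```r
# Hypothetical helper: TRUE if every cluster ID appears in exactly one stratum
clusters_nested <- function(strata, cluster) {
  n_strata <- tapply(strata, cluster, function(s) length(unique(s)))
  all(n_strata == 1)
}

# Toy design: cluster IDs 1 and 2 are reused in both strata
design_df <- data.frame(stratum = c("s1", "s1", "s2", "s2"), cluster = c(1, 2, 1, 2))
clusters_nested(design_df$stratum, design_df$cluster)         # FALSE

# Manual equivalent of nest = TRUE: make IDs unique within strata
design_df$cluster_nested <- paste(design_df$stratum, design_df$cluster, sep = "_")
clusters_nested(design_df$stratum, design_df$cluster_nested)  # TRUE
```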

Error in vignette

Hi, I followed the tutorial in the vignette and noticed that the newformula shown below was commented out. I tried to use it anyway, since it uses penalized complexity (PC) priors and I wanted to see how the results differed from those with the default priors. However, Amat in "graph = Amat" was not defined earlier, so I replaced it with "mat", which was used earlier to define the adjacency matrix, but my results differed (see attachment). The code below is from the vignette.

newformula <- "f(region.struct, model = 'bym2', graph = Amat, constr = TRUE, scale.model = TRUE, hyper = list(phi = list(prior = 'pc', param = c(0.5, 2/3), initial = -3), prec = list(prior = 'pc.prec', param = c(0.2/0.31, 0.01), initial = 5)))"
svysmooth.2 <- fitSpace(data = BRFSS, geo = KingCounty, Amat = mat, family = "binomial",
  responseVar = "diab2", strataVar = "strata", weightVar = "rwt_llcp",
  regionVar = "hracode", clusterVar = "~1", hyper = NULL, CI = 0.95,
  newformula = newformula)

Error in f(region.struct, model = "bym2", graph = Amat, constr = TRUE, :
  object 'Amat' not found
In addition: Warning message:
In inla.model.properties.generic(inla.trim.family(model), (mm[names(mm) == :
  Model 'bym2' in section 'latent' is marked as 'experimental'; changes may appear at any time.
  Use this model with extra care!!! Further warnings are disabled.

I replaced graph = Amat with graph = mat and it ran. The problem is that my results were different with the PC prior.

svysmooth.2 <- fitSpace(data = BRFSS, geo = KingCounty, Amat = mat, family = "binomial",
  responseVar = "diab2", strataVar = "strata", weightVar = "rwt_llcp",
  regionVar = "hracode", clusterVar = "~1", hyper = NULL, CI = 0.95,
  newformula = newformula)

CRAN Solaris Warning

See https://stat.ethz.ch/pipermail/r-devel/2014-December/070252.html

This may be fixed by changing the vignette encoding.

mapPlot check for region names match when merging

Currently mapPlot just merges by region name. That works, but it is not informative when errors show up. However, it should not require all regions in the data to also be in the map (since there may be a region = "All", as is usual when plotting direct estimates), nor all map regions to be in the data (since some could be missing). Maybe make sure that all map regions are kept when merging.
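A base-R sketch of the proposed behaviour: keep every map region when merging (all.x = TRUE), ignore a national "All" row, and warn about data regions with no match in the map. The helper `merge_for_map`, the `region`/`est` column names, and the toy values are assumptions for illustration, not mapPlot's current code.

```r
# Hypothetical helper: left-join results onto map regions, with diagnostics
merge_for_map <- function(map_regions, results, ignore = "All") {
  unmatched <- setdiff(setdiff(unique(results$region), map_regions), ignore)
  if (length(unmatched) > 0) {
    warning("regions in results but not in map: ", paste(unmatched, collapse = ", "))
  }
  # all.x = TRUE keeps map regions that have no estimate (est becomes NA)
  merge(data.frame(region = map_regions), results, by = "region", all.x = TRUE)
}

map_regions <- c("central", "eastern", "northern", "western")
est_df <- data.frame(region = c("All", "central", "eastern", "Nrthern"),
                     est = c(0.10, 0.08, 0.12, 0.15))
merged_map <- suppressWarnings(merge_for_map(map_regions, est_df))
merged_map
```

Calling merge_for_map(map_regions, est_df) without suppressWarnings() reports the misspelled "Nrthern", while "All" is ignored and all four map regions are kept.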