Coder Social home page Coder Social logo

tbates / umx Goto Github PK

View Code? Open in Web Editor NEW
44.0 7.0 17.0 39.51 MB

Making Structural Equation Modeling (SEM) in R quick & powerful

Home Page: https://tbates.github.io/

R 100.00%
statistics psychology r sem structural-equation-modeling behavior-genetics twin-models cran umx tutorials

umx's Introduction

umx

Codecov test coverage Github commits cran version Monthly Downloads Total Downloads Rdoc DOI License

Road map, and Tutorials (let me know what you'd like, or perhaps a book?)

umx is a package designed to make structural equation modeling easier, from building, to modifying and reporting.

citation("umx")

You should cite: Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. DOI:10.1017/thg.2019.2

umx includes high-level functions for complex models such as multi-group twin models, as well as graphical model output.

Install it from CRAN:

install.packages("umx")
library(umx)
?umx

Most functions have extensive and practical examples (even figures for the twin models): so USE THE HELP :-).

See what is on offer with '?umx'. There are online tutorials at tbates.github.io.

umx stands for "user" OpenMx functions. It provides over 100 functions, but most importantly:

  1. umxRAM that makes path-based SEM in R straightforward, with umxSummary and plot for table and graphical display of your models. It can also interpret basic lavaan if you get a script in that language.
  2. A suite of twin modelling functions, such as umxACE.

These are supported by many low-level functions automating activities such as parameter labels, start values etc., as well as helping with data-wrangling, journal-ready presentation (try umxAPA() among other tasks.

Some highlights include:

  1. Building Path Models
    • umxRAM() # Take umxPaths + data data = run and return a model, along with a plot and umxSummary
    • umxPath() # write paths with human-readable language like var = , mean = cov = , fixedAt=. Quickly define a variance and mean ('v.m. = ') and more.
  2. Reporting output
    • umxSummary(model) # Nice summary table, in markdown or browser. Designed for journal reporting (Χ², p, CFI, TLI, & RMSEA). Optionally show path loadings
    • plot(model, std=TRUE, digits = 3, ...) # Graphical model in your browser! or edit in programs like OmniGraffle
    • parameters(m1, "below", .1, pattern="_to_")) # A powerful assistant to get labels and values from a model (e.g. all 'to' params, below .1 in value)
    • residuals(m1, supp=.1) # Show residual covariances filtered for magnitude
  3. Modify models
    • umxModify(model, update = ) *# Modify and re-run a model. You can add objects, drop or add paths, including by wildcard label matching), re-name the model, and even return the comparison. All in 1 line *
  4. Twin modeling!
    • umxACE # Twin ACE modeling with aplomb paths are labeled! Works with plot() and umxSummary!
    • umxCP, umxIP, umxGxE, umxCP, umxGxEbiv, umxSexLim
    • umxACE
  5. Easy-to-remember options
    • umx_set_cores()
    • umx_set_optimizer()
  6. Many more miscellaneous helpers e.g.
    • umx_time(model1, model2) reports and compares run times in a compact programmable format (also "start" and "stop" a timer)
    • umxHetcor(data, use = "pairwise.complete.obs") # Compute appropriate pair-wise correlations for mixed data types.
    • Dozens more: Check out the "family links" in ?umx and in any help file!

Code and requests welcome via Github. Tell your friends! Publish good science :-)

For thrill-seekers and collaborators only: the bleeding-edge development version is here:

install.packages("devtools")
library("devtools")
install_github("tbates/umx")
library("umx")
?umx

umx's People

Contributors

bwiernik avatar jpritikin avatar khusmann avatar lf-araujo avatar psychelzh avatar tbates avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

umx's Issues

Repair install.OpenMx() on Windows

Per this thread. Loading umx into R's workspace will necessarily also load in OpenMx, but Windows will forbid OpenMx from being overwritten on disk.
install.OpenMx() needs to condition its behavior on whether or not it is running under Windows. Consider the definition of omxGetNPSOL():

function() {
	if (imxHasNPSOL()) {
		message(paste("NPSOL is available.",
			      "You have already installed the non-CRAN version of OpenMx."))
		return()
	}
	if(.Platform$OS.type=="windows"){
		message(
			paste("Windows users should either restart R or run\n",
						"detach('package:OpenMx',unload=TRUE)\n",
						", and then run\n",
						"source('http://openmx.ssri.psu.edu/getOpenMx.R')\n")
		)
		return()
	}
    if(version$major < 3) {
        message(paste0("You are using R 2.15 or earlier.  ",
            "OpenMx 2.0 and higher do not support versions ",
            "of R before 3.0, so I'm fetching OpenMx 1.4 instead.\n",
            "Getting OpenMx 1.4 from http://openmx.ssri.psu.edu/."))
        source("http://openmx.ssri.psu.edu/getOpenMx1.4.R")
    } else {
        message("Getting OpenMx from http://openmx.ssri.psu.edu/.")
        source("http://openmx.ssri.psu.edu/getOpenMx.R")
    }
}

Add a means table print out to umxSummary

It would be useful for users to be able to get a table of means (just as they can request genetic correlations with showRg= TRUE now)

for twins, where we now print:

a1 c1 e1
wt1 -0.83 . -0.56

We could do:

a1 c1 e1 mean
wt1 -0.83 . -0.56 75.3

Enable the selVars argument to pull the correct columns from twinData to save user doing this?

Currently, the user specifies the selVars (or provides the base names to selVars + a separator and umxACE etc. build the twin names) AND provide mzData and dzData already trimmed so that only the needed columns are provided.

This is redundant, as we could let the user simply create generic mz and dz data frames and use selVars to pull the correct data for them on a model by model basis.

This might aid teaching, as the examples would no longer need the earlier lines selecting out the relevant columns.

start values for umxCP

The start values for umxCP models are hard coded at values (0.7): OK for z-scored variables, but unlikely to be Kosher for a wide set of inputs: they were just chosen to not die on the runway. (old code below).

umx 3.0 code should be smarter. Any thoughts on choosing good starts for the different free elements of the model?

# Latent common factor
if(correlatedA){
	a_cp_matrix = umxMatrix("a_cp", "Lower", nFac, nFac, free = TRUE, values = .7, jiggle = .05) 
} else {
	a_cp_matrix = umxMatrix("a_cp", "Diag", nFac, nFac, free = TRUE, values = .7, jiggle = .05)
}

a_cp  = .7   # Latent common factor) 
c_cp  = 0    # latent common factor Common environmental path coefficients
e_cp =  .7   #  latent common factor Unique environmental path coefficients
cp_loadings = .6  # loadings on latent phenotype

ALSO: should the variances of the factors in umxCP be bounded at zero?

umxModify: Add autoRun argument

I'd like to use the regex functionality in umxModify to make a series of changes to a model. I'd like to not have the optimizer run each time I make one change. Could the autorun argument be added to this function with default=TRUE?

Multiple enhancements to Improve umx_make_TwinData

umx_make_TwinData can be made much more flexible, as a 1-line go-to solution for simulated twin data.

In the first instance it should:

  1. Allow returning threshold data ( nThresh = )
  2. Take variance input: big A, C, and E
  3. If any two of these are given, make the third up to equal total var = 1
  4. Allow user to set seed for repeatability
  5. Set the varNames in the returned dataframes
  6. Be smart/flexible regarding input
  7. Have better docs and a complete set of examples

umxSexLim features and umxSummary()

hi all,
Correlated Factors multi-variate umxSexLim() is taking shape.

m1 = umxSexLim("sexlim", selDVs, mzmData, dzmData, mzfData, dzfData, dzoData, A_or_C = "A"))

Any thoughts on how to plot?

umxSummary will return something like below -thoughts?

Standardized A, C, & E by Sex

ssc sil caf tri bic
Am 0.77 0.73 0.70 0.75 0.67
Cm 0.01 0.08 0.15 0.06 0.10
Em 0.21 0.19 0.15 0.19 0.23
Af 0.78 0.77 0.43 0.70 0.65
Cf 0.02 0.07 0.41 0.11 0.15
Ef 0.20 0.16 0.16 0.19 0.20

Perhaps layout factor correlations f in upper tri, m in lower?

Factor Correlations

(note: Females on lower triangle, Males on upper-triangle)

A Factor Correlations

sscAm silAm cafAm triAm bicAm
ssc 1.00 0.72 0.33 0.56 0.56
sil 0.67 1.00 0.34 0.59 0.52
caf 0.27 0.34 1.00 0.55 0.21
tri 0.52 0.64 0.58 1.00 0.61
bic 0.57 0.57 0.37 0.65 1.00

C Factor Correlations

sscCm silCm cafCm triCm bicCm
ssc 1.00 0.31 0.16 -0.11 0.03
sil 0.31 1.00 0.24 0.17 0.64
caf 0.16 0.24 1.00 0.35 0.46
tri -0.11 0.17 0.35 1.00 0.74
bic 0.03 0.64 0.46 0.74 1.00

E Factor Correlations

sscEm silEm cafEm triEm bicEm
ssc 1.00 0.41 0.28 0.41 0.32
sil 0.55 1.00 0.33 0.41 0.37
caf 0.33 0.36 1.00 0.37 0.42
tri 0.37 0.39 0.28 1.00 0.41
bic 0.37 0.43 0.33 0.46 1.00

enhance umx_make_TwinData to work with raw correlations as well as AA, CC, EE

umx_make_TwinData is already pretty slick in simulating tiwn data based on any 2 of AA, CC, and EE, gxE, etc.

A handy enhancement would be to support MZr and DZr as parameters, so this would work:

require(umx)
tmp = umx_make_TwinData(10000, MZr = .86, DZr= .60, varNames= "IQ")
mzData = tmp[[1]]
dzData = tmp[[2]]
umxAPA(mzData); umxAPA(dzData)

m1 = umxACE(selDVs = "IQ", dzData = dzData, mzData = mzData, sep = "_T")

plot(m1)
|   |   a1|   c1|   e1|
|:--|----:|----:|----:|
|IQ | 0.72| 0.58| 0.37|

string syntax for umxRAM

Current slowness in prototyping is the typing out of long variable names, and the visual interference of commas and = signs in umxPath commands.

Graphviz has nice feature of allowing short names. We could do this if, for instance the user provided manifests, we could then let them refer to these as simply m1, m2 etc in path statements.
This might make umxPath calls easy enough, that string input is not worth the hassle.

If we have string input, it should include equate in string

  1. a1 == a2

  2. a1 + a2 + a3 == 3

    umxRAM("play", manifests = c("extraversion", "neuroticism", "conscientiousness", "depression"),
    sem = "v2*v3->v4"
    )

Add umxACEv (variance components version of ACE model)

Will need

  • new umxACEv function
    • what bounds etc. will help make this function stay sane?
  • new umxSummary.ACEv S3 method
  • new plot.ACEv S3 method
  • support 'umxReduce'

Should we include support for reporting path coefficients? (or just big ACE?)

allow umxMatrices and mxMatrices in umxRAM models

make this work (h/t @bwiernik)

require(umx)
data(demoOneFactor)
myData = mxData(cov(demoOneFactor), type = "cov", numObs = 500)
m1 <- umxRAM("OneFactor", data = myData,
	mxMatrix(name = "v", "Full", nrow = 5, ncol = 1, free = TRUE)
)
# Error in .hasSlot(x, name) : invalid type or length for slot name

better start values in umxCP models

The start values for umxCP models are hard coded at values (0.7) unlikely to be Kosher for a wide set of inputs: they were just chosen to not die on the runway. (old code below)

umx 3.0 code should be smarter. Any thoughts on choosing good starts for the different free elements of the model?

# Latent common factor
if(correlatedA){
	a_cp_matrix = umxMatrix("a_cp", "Lower", nFac, nFac, free = TRUE, values = .7, jiggle = .05) 
} else {
	a_cp_matrix = umxMatrix("a_cp", "Diag", nFac, nFac, free = TRUE, values = .7, jiggle = .05)
}

Start values for umxACE etc.

Sometimes (with variables varying widely in scale? or having a variable with a large variance?), and an optimizer other than NPSOL, umxACE will fail

  1. Establish some example models with this behavior
  2. Diagnose the problem (divergence in scale, or just one being big?)
  3. Either, alter user by inspecting the data OR fix my starts algorithm (pretty easy)

Add WLS to twin models

  1. Use the same type = c("Auto", "FIML", "WLS", ...) parameter used in umxRAM
    • Auto = treat as FIML if raw and ML if cov
  2. Remove failed "censored-data" mode
  3. Check model has no def vars.
  4. Swap data object to mxDataWLS.
  5. Swap fit function to mxFitFunctionWLS.
  6. Apply to
  • umxACE
  • umxCP
  • umxIP
  1. [] Create tobit mode for censored continuous data

enhance umx_aggregate

Allow this umx_aggregate to take more than one left-hand argument (like doBy::doBy)
so umx_aggregate(rbind(mpg, qsec) ~ cyl, data = mtcars) would work (like this does now:

umx_aggregate(cbind(mpg, qsec) ~ cyl, data = mtcars)
umx_aggregate(mpg ~ cyl, data = mtcars)

|cyl        |mpg          |
|:----------|:------------|
|4 (n = 11) |26.66 (4.51) |
|6 (n = 7)  |19.74 (1.45) |
|8 (n = 14) |15.1 (2.56)  |
  1. More built-in summary options
    1. currently have a nice mean and sd built-in. what else to add?
  2. Additional/more flexible formatting of output
    1. transpose table?

add fixed (definition variable) covariates to umxACE

Currently, users wanting to use covariates are encouraged to use umx_residualize on their data. This doesn't work for ordinal variables (it's not good turn sex from a binary to a continuous-bimodal distribution...) and also it's nice to have the means in the model, and to retain the raw data.

v 2.0 of umx should support covariates, report how many rows were lost, have the means and covariate betas printed in summary,

standardize umxACEv (direct variance model)

the standardization code for umxACEv includes this additional initial step of converting each of A, C, and E to correlation matrices:

A = cov2cor(abs(A)) * sign(A)
C = cov2cor(abs(C)) * sign(C)
E = cov2cor(abs(E)) * sign(E)

When given as input to the conventional code below, for a univariate model, this results in A C and E each having unit variance, which isn't right.

What should it be?

Vtot = A + C + E;  # Total variance
I  = diag(nVar);  # nVar Identity matrix
InvSD = solve(sqrt(I * Vtot))
model$top$A$values = InvSD %&% A; # Standardized variance components

Add df adjustments to umxSummary, other functions

A common practice in personality SEM analyses is to first estimate a measurement model, then fix these parameters before estimating a path model. With this procedure, the fit statistics for the final model are computed using an incorrect DF parameter--they need to be adjusted downward by the number of fixed parameters that were freely estimated in the measurement model. Currently, I've been doing these adjustments by hand--would it be possible to add a df.adjust parameter to umxSummary and other functions to streamline this process?

umxObjectiveFIML to make the common 2-line call into one

umxObjectiveFIML would wrap the expectation and fit function (used to work this way in OpenMx 1). It was convenient for many default uses).

instead of
mxExpectationNormal(covariance, means, dimnames = NA, thresholds = NA, threshnames = dimnames)
mxFitFunctionML(vector = FALSE)

You would say:

umxObjectiveFIML(covariance, means, dimnames = NA, thresholds = NA, threshnames = dimnames, vector = FALSE)

Would that be useful, or confusing when making models?

Might do same for other functions

  • mxAlgebraObjective
  • mxFIMLObjective
  • mxLISRELObjective
  • mxRowObjective
  • mxRObjective
  • mxRAMObjective
  • mxMLObjective

Referencing for Variance components approach

As raised in #38, we need references for the new/old direct variance components estimate approach.

We need to source refs for

  1. historical refs for the variance components approach: Martin?
  2. Ref comparing this to the 2*(MZ-DZ) = 2* (1-.2) = h^2 >1 problem. How same, how different
    3 historical refs for the "symmetric unstable/not-positive definite" ∴ → Cholesky decomposition
  3. historical refs for why the cholesky decomposition is not saturated/restricts solution space
  4. ref for C rejected too often
  5. ref for not following χ² distribution.
  6. ref about how this already occurs for genetic correlations, which can exceed 1
  7. Refs for what edge-case solutions mean
    1. Clear statement of whether >1 values etc. mean the model is wrong, or have an interpretation as i, and what this is

better start values in RAM model

The code for asymmetric start values is currently as below.
The hard coded .9 is OK for only a small subset of models. A better smart start would be: sqrt(.2*Variance)/nFactors

# ======================================================
# = Put modest starts into the asymmetric (one headed) =
# ======================================================
Arows = nrow(obj$matrices$A$free)
Acols = ncol(obj$matrices$A$free)
if(onlyTouchZeros) {
	freePaths = (obj$matrices$A$free[1:Arows, 1:Acols] == TRUE) & obj$matrices$A$values[1:Arows, 1:Acols] == 0
} else {
	freePaths = (obj$matrices$A$free[1:Arows, 1:Acols] == TRUE)			
}

obj$A@values[1:Arows, 1:Acols][freePaths] = .9
return(obj)

enhance plot to work with super/sub models

#45 ended with umxSummary outputting nice tables for nested models.

plot should be able to just draw each sub as it would on its own, and then do a smart concatenatation of the dot files before drawing/writing them.

Should be doable.

Specifying a mixture model?

Thank you for creating (and documenting so well) such an excellent package. I'm currently trying to specify a mixture model (specifically, for Latent Profile Analysis or Latent Class Analysis). After surveying the available options in R, it looks like OpenMx may generally be the best way to carry it out, and umx seems like the best way to use OpenMx.

I read some of the OpenMx documentation on how to specify mixtures / classes (i.e., here, and after reviewing the help (and reading through the specification for multiple group models, which is close but if I understand correctly a bit different from discovering the classes or mixture components, am wondering if you could point me in the right direction for whether - and if so how - you could specify this using umx... functions.

warn users if data variances are too small

this version is available on github via
install_github("tbates/umx")

It runs fine for univariate, but a bivariate example is failing with bad starting values.

The un-run model can be gotten by saying:

m1 = umxACEv(selDVs = c("ht", "bmi"), dzData = dzData, mzData = mzData, sep="", autoRun = F)

Here's the code to show the failure:

require(umx); data(twinData)
mzData <- twinData[twinData$zygosity %in% "MZFF", ]
dzData <- twinData[twinData$zygosity %in% "DZFF", ]

# 1 var works
m1 = umxACEv(selDVs =  "ht", dzData = dzData, mzData = mzData, sep="")

# 2, bivariate doesn't get off the ground
m1 = umxACEv(selDVs = c("ht", "bmi"), dzData = dzData, mzData = mzData, sep="")

1: The continuous part of the model implied covariance (loc2) is not positive definite in data 'DZ.data' row 5. Detail:
# covariance =  matrix(c(    # 3x3
#  0.00427128, 0.3, 0.00213564
# , 0.3, 0.941062, 0.15
# , 0.00213564, 0.15, 0.00427128), byrow=TRUE, nrow=3, ncol=3)
#
# )
# In addition: Warning message:
# In model 'ACE' Optimizer returned a non-zero status code 10. Starting values are not feasible. Consider mxTryHard()

allow variance components as output in tables and plots (A instead of a)

re this comment

We could switch so plots and tables output A, C, E instead of a, c, and e

What's normal (or at least, what convention do we want to encourage?

Show A, C, E by default? Allow a choice?

As it stands, the default CIs included in umxACE are a, c, e. Can switch them to A, C, E if that's the right thing to do, will break some existing output.

Another thing is that we can use SEs for a (also on the todo list, as MV profile CIs take forever.

These don't exist for big ACE- so would need to use mxSE() to compute. No great hassle.

FYI, I've noticed odd behavior in profile CIs, where dropping the parameter has a p value of, say, .5 but the CI on the parameter doesn't include zero. Also lots of failures in computing CIs, mostly on C.

formatting for plots output by umx

This is a typical plot output by umx (umxACE)
It uses the variable name for twin 1 as the variable name, and prints to digits= precision

Is it OK as is, or should we, by default

  1. Strip the twin identifer where known. so IQ_T1 --> IQ
  2. For compactness and ease of reading, strip the "0." So "0.77" --> "77"

default graph output

teach plotCP to count above 9

Would avoid this :-)

cp with more than 10 manifests

issue is regex on digit not allowing for more than 1 digit in each row or column: so this fails to match "a_r10c10"

Might take the opportunity too, to recode the plot() family to traverse the matrices, not the labels at the same time. Doing this in simplex plot now, and it was actually more straightforward.

Documentation of umxACEv

Top of it says: "umxACEv supports modeling with the ACE Cholesky model, a core model in behavior genetics (Neale and Cardon, 1996)."

But I thought it was doing the direct symmetric type of modeling? BTW Neale & Cardon was '92, yeah, I'm that old...

Set how zero cells are printed in the output of umxACE

In #15, @mcneale requested the ability to set how zero cells are printed in the output of umxACE etc.

Turns out previous me solved this request back in 2012 :-)
The default is "." for clarity, but you can set zero.print="whatever you want")

e.g.:

require(umx)
data(twinData) # ?twinData set from Australian twins.
# Pick the variables
selDVs = c("bmi1", "bmi2")
mzData <- twinData[twinData$zygosity %in%  "MZFF", ]
dzData <- twinData[twinData$zygosity %in% "DZFF", ]
m1 = umxACE(selDVs = selDVs, dzData = dzData, mzData = mzData)

-2 × log(Likelihood)
'log Lik.' 9659.215 (df=4)

Standardized solution

a1 c1 e1
bmi1 0.86 . 0.51
umxSummaryACE(m1, std= FALSE, zero.print="mike")

Raw solution

a1 c1 e1
bmi1 0.84 mike 0.49

[minor] Allow "sep" as a synonym for "suffix" in twin models

The separator in twin variable names (e.g. the "_T" in var1_T1 is called suffix in umxACE, umxCP, umxIP etc. Might aid consistency and R expectations if sep was allowed?

e.g.
m1 = umxCP(selDVs = genItems, dzData = dzData, mzData = mzData, sep = "_T") # proposed m1 = umxCP(selDVs = genItems, dzData = dzData, mzData = mzData, suffix = "_T") # existing

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.