The ours from derekbeaton

Error in cat.mcd.find.sample.R, line 51

Hi,

We found an error in the cat.mcd.find.sample.R file, in line 51.

Original line:
final.configs <- unique(unique.min.configs[1:min(nrow(unique.min.configs), perc.cut),])
When only one row is selected, this returns a vector, but the step immediately following it (final.dets) seems to be expecting a matrix.

We were able to get around this problem by adding drop = FALSE to the subsetting call:
final.configs <- unique(unique.min.configs[1:min(nrow(unique.min.configs), perc.cut), , drop = FALSE])

Should I push this up?
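
For reference, a minimal reproduction of the behaviour described above. The toy matrix and the value of perc.cut are made up for illustration; only the subsetting pattern comes from the report.

```r
# Toy stand-in for unique.min.configs: a single candidate configuration (one row).
unique.min.configs <- matrix(c(3, 7, 9, 12), nrow = 1)
perc.cut <- 5  # illustrative value only

rows <- 1:min(nrow(unique.min.configs), perc.cut)

# Default subsetting drops the single row to a plain vector.
dropped <- unique(unique.min.configs[rows, ])

# Setting the drop argument to FALSE keeps the matrix structure.
kept <- unique(unique.min.configs[rows, , drop = FALSE])

is.matrix(dropped)
nrow(dropped)
dim(kept)
```

With more than one row selected, both forms return a matrix, which is probably why the problem only shows up occasionally.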

nrow(final.configs) fails when final.configs is not a matrix

See, e.g., when the CatMCD example is run with num.subsets = 5.

This is because final.configs is a vector and not a matrix. I think at this point if we have a final.configs that is only a vector, we should just return it as is without the final concentration (c-) step.
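
A rough sketch of the kind of guard suggested above. The helper name skip_final_c_step is hypothetical and not part of the package; the point is only that nrow() returns NULL for a plain vector, so the check has to be on the object type rather than on the row count.

```r
# Hypothetical helper: TRUE when final.configs has collapsed to a plain vector,
# in which case it would be returned as is, without the final c-step.
skip_final_c_step <- function(final.configs) {
  !is.matrix(final.configs)
}

single <- c(3, 7, 9, 12)                         # a one-row subset after dropping
several <- rbind(c(3, 7, 9, 12), c(1, 2, 5, 8))  # the usual matrix case

nrow(single)               # NULL, so any comparison on it has length zero
skip_final_c_step(single)
skip_final_c_step(several)
```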

Categorical data: Wide table report

A function to create a wide table from outlier information and the contributions information.

rename cat. & gen.

gen. should actually become the "core", with cat., ord., and mixed. as prefixes for mcd(). Then it all gets passed into that gen.mcd core.

Continuous data: Wide table report

A function to create a wide table from outlier information and the contributions information.

Error in svd: infinite/missing

The ghost of zero variance has re-appeared:

Error in svd(x, nu = nu, nv = nv) : infinite or missing values in 'x'

This happens when a column or row has identical values. This needs to be caught (try/catch) and handled somehow... it might reflect that there is a robust group or not. Not sure yet. The easiest way to handle it is to skip over it, but to limit the number of times it is skipped over in succession.
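
One possible shape for the "skip, but not too many times in a row" idea, as a sketch only. The names safe_svd, run_with_skip_limit and max_consecutive_skips are hypothetical, and the list of candidate matrices stands in for whatever the search loop actually iterates over.

```r
# Hypothetical wrapper: return the svd() result, or NULL if svd() errors.
safe_svd <- function(x) {
  tryCatch(svd(x), error = function(e) NULL)
}

# Hypothetical loop: skip failing candidates, but stop after too many in a row.
run_with_skip_limit <- function(candidates, max_consecutive_skips = 3) {
  results <- list()
  consecutive_skips <- 0
  for (x in candidates) {
    res <- safe_svd(x)
    if (is.null(res)) {
      consecutive_skips <- consecutive_skips + 1
      if (consecutive_skips > max_consecutive_skips) {
        stop("svd() failed on too many consecutive candidates")
      }
      next
    }
    consecutive_skips <- 0  # a success resets the counter
    results[[length(results) + 1]] <- res
  }
  results
}

good <- matrix(c(2, 0, 0, 1), 2, 2)
bad <- matrix(c(1, NaN, 0, 1), 2, 2)  # NaN as produced by scaling a constant column
out <- run_with_skip_limit(list(good, bad, good))
length(out)
```

Returning NULL hides the error message, so a real version would probably want to record why each candidate was skipped.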

Data types & transformations vignette to be made fancier

As per our email. @derekbeaton

make.data.nominal - blows up sometimes

See recent example of header data.

Small spelling mistake in function name (continuous_corrmax) causing error in Outliers

One of the function names in the OuRS package was misspelled and causing an error in the Outliers app, namely the function supposed to be named continuous_corrmax was written as "continous_corrmax" (the 'u' was missing) in the OuRS package. Should I just change the name in the Outliers app?

In files: corrmaxs.R, continous_corrmax.Rd, NAMESPACE

@derekbeaton

score correlations

I'm not so sure those are correct anymore... because of arbitrary flips.
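
To illustrate the concern: singular vectors are only defined up to sign, so a correlation between two sets of scores can change sign for reasons unrelated to the data. A sketch of one way to align signs before correlating follows; align_signs is a hypothetical helper and the data are random.

```r
# Hypothetical helper: flip columns of `target` so that each one points the
# same way as the matching column of `reference`.
align_signs <- function(reference, target) {
  # Column-wise inner products; a negative value means the component is flipped.
  flips <- sign(colSums(reference * target))
  flips[flips == 0] <- 1
  sweep(target, 2, flips, "*")
}

set.seed(1)
X <- matrix(rnorm(40), 10, 4)
scores <- svd(X)$u[, 1:2]
flipped <- scores %*% diag(c(1, -1))  # same solution, second component flipped

cor(scores[, 2], flipped[, 2])                       # negative only because of the flip
cor(scores[, 2], align_signs(scores, flipped)[, 2])  # flip removed
```

Comparing absolute correlations would be a simpler alternative, at the cost of losing the sign altogether.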

Remove the forced rownames/colnames from data transforms

They seem to be unnecessary and were put there for convenience (laziness) for the variable maps. I can make it so that if they do not exist, then the variable maps will use the indices instead of the names.
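
A sketch of the fallback described above, assuming the variable maps only need some label per row or column. The helper labels_or_indices is hypothetical.

```r
# Hypothetical helper: use existing names, otherwise fall back to indices.
labels_or_indices <- function(DATA, margin = 2) {
  nms <- if (margin == 1) rownames(DATA) else colnames(DATA)
  if (is.null(nms)) seq_len(dim(DATA)[margin]) else nms
}

unnamed <- matrix(0, nrow = 2, ncol = 3)
named <- matrix(0, nrow = 2, ncol = 3, dimnames = list(NULL, c("a", "b", "c")))

labels_or_indices(unnamed)
labels_or_indices(named)
labels_or_indices(named, margin = 1)
```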

ca_preproc should be able to handle NAs

GSVD has some stupid tests/conditions

The gsvd() function needs an update to correctly allow for the ignoring of RW and LW.

As of now I think it's the slightly older code. I need to pull the latest code from the GSVD package. Eventually OuRS will depend on the GSVD package.

thermometer.coding needs to be from 0

For the scale to work correctly, it should be from 0 where the minimum is set to the 0.
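
A sketch of what "from 0" could look like. This is not the package's thermometer.coding(); the function name, the "+"/"-" column suffixes and the two-column layout are assumptions made for illustration.

```r
# Hypothetical sketch: shift each column so its minimum becomes 0, scale by
# the range, and pair every "+" column with its complement.
thermometer_sketch <- function(DATA,
                               mins = apply(DATA, 2, min, na.rm = TRUE),
                               maxs = apply(DATA, 2, max, na.rm = TRUE)) {
  shifted <- sweep(DATA, 2, mins, "-")         # minimum is now 0
  plus <- sweep(shifted, 2, maxs - mins, "/")  # runs from 0 to 1
  minus <- 1 - plus                            # complement, runs from 1 to 0
  labels <- colnames(DATA)
  if (is.null(labels)) labels <- seq_len(ncol(DATA))
  colnames(plus) <- paste0(labels, "+")
  colnames(minus) <- paste0(labels, "-")
  cbind(plus, minus)
}

DATA <- cbind(q1 = c(1, 3, 5), q2 = c(2, 3, 4))
coded <- thermometer_sketch(DATA)
coded
```

A column whose minimum equals its maximum would divide by zero here, so a real version would need to handle that case.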

Continuous data: Long table report

A function to create a long table from outlier information and the contributions information.

Categorical data: Long table report

A function to create a long table from outlier information and the contributions information.

Two Fold needs comparable output to MCD

There are some missing pieces, so there needs to be a wrapper around the two.fold.* functions so that all the same kinds of inputs as for *.mcd() go in and similar results come out (except OD, obviously).

drop & bring back bootstrap approach

drop it from the core package (for now) and then bring it back as a utility. I think we may want to include an "inference_utils.R" for these and other approaches

h.alpha.n: based on columns or rank?

I think this function should change so that p is based on the available rank, i.e., length($d), instead of ncol(DATA). That's because the columns of DATA are inherently collinear for the generalized case. So it should be based on the span of the subspace, not the total number of columns.
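
A small sketch of the difference. The h_alpha_n() below is written from memory of the formula used by the function of the same name in robustbase, so treat it as an assumption; effective_rank is a hypothetical helper, and the toy DATA is a disjunctive coding of two three-level variables.

```r
# Size of the h-subset; formula assumed to match robustbase's h.alpha.n().
h_alpha_n <- function(alpha, n, p) {
  n2 <- (n + p + 1) %/% 2
  floor(2 * n2 - n + 2 * (n - n2) * alpha)
}

# Hypothetical helper: count singular values that are not numerically zero.
effective_rank <- function(d, tol = sqrt(.Machine$double.eps)) {
  sum(d > tol * max(d))
}

# Two categorical variables with three levels each, 27 rows in total.
design <- expand.grid(a = 1:3, b = 1:3)
design <- design[rep(seq_len(nrow(design)), times = 3), ]
DATA <- cbind(outer(design$a, 1:3, "==") * 1, outer(design$b, 1:3, "==") * 1)

p_cols <- ncol(DATA)                     # number of columns
p_rank <- effective_rank(svd(DATA)$d)    # smaller, because the columns are collinear

h_alpha_n(0.5, nrow(DATA), p_cols)
h_alpha_n(0.5, nrow(DATA), p_rank)
```

The rank that matters in practice depends on the preprocessing (centering removes a further dimension), so length($d) from the actual decomposition is the safer source.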

there are a lot of redundancies

not only in the code between e.g., find_sample and c_step, but also here in the issues & projects!

eliminate those when possible

Thermometer coding needs to change

The better/more flexible one exists in GPLS. It should be ported over ASAP.

disjunctive_coding fails when one of the columns is all NA

I found this error when I tried running the disjunctive_coding function on a data set and all the elements of one of the columns were NA.

Error in `[<-`(`*tmp*`, which(DATA[, i] == unique_no_na[j]), j, value = 1) : 
  subscript out of bounds

Error happens in this line since unique_no_na is a logical(0):
mini.mat[which(DATA[, i] == unique_no_na[j]), j] <- 1

Maybe this is fine to leave in if we don't intend to allow NA columns, but I didn't see any mention of it in the docs so I thought I would bring it up.
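
A guess at the mechanism, as a sketch: if the loop over levels is written as 1:length(unique_no_na), an empty unique_no_na gives the sequence 1, 0, so the body still runs with j = 1 against a matrix with zero columns. The loop itself is not shown in the report, so this is an assumption, and code_one_column is a hypothetical stand-in for the per-column part of disjunctive_coding.

```r
# 1:length() counts down to 0 on an empty vector; seq_along() yields nothing.
empty <- logical(0)
1:length(empty)
seq_along(empty)

# Hypothetical per-column coder that tolerates an all-NA column.
code_one_column <- function(x) {
  unique_no_na <- unique(x[!is.na(x)])
  mini.mat <- matrix(0, nrow = length(x), ncol = length(unique_no_na))
  for (j in seq_along(unique_no_na)) {
    mini.mat[which(x == unique_no_na[j]), j] <- 1
  }
  mini.mat
}

dim(code_one_column(c(NA, NA, NA)))  # zero columns, no error
code_one_column(c("x", "y", "x"))
```

Whether an all-NA column should silently contribute zero columns or raise a clearer error is a separate decision for the docs.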

@derekbeaton

Add in correction factors, e.g., the constant scaling and/or the consistency factors. Those are not necessary for now but we used those (by proxy) when we used the robust covariance matrix for the resampling.
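
For the consistency factor, a sketch of the form commonly given for the MCD scatter estimate under multivariate normality. Whether this carries over unchanged to the generalized (e.g., categorical) case is an open question here, and mcd_consistency_factor is a hypothetical name.

```r
# Hypothetical helper: alpha is the fraction of rows kept in the subset,
# p is the dimensionality of the data.
mcd_consistency_factor <- function(alpha, p) {
  alpha / pchisq(qchisq(alpha, df = p), df = p + 2)
}

mcd_consistency_factor(0.75, p = 5)  # inflates the subset covariance
mcd_consistency_factor(1, p = 5)     # no correction when all rows are kept
```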

bring in design for split-half

there are two possible designs for split-half: resampling within a given row factor or resampling constrained to.

The first is to take splits from within. The second is to split the whole sample, but take entire portions of rows together (e.g., fMRI).
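
The two designs could be sketched roughly as follows. Both helper names are hypothetical, and the grouping variables are toy data.

```r
# Design 1 (hypothetical helper): take half of the rows within each level of a
# row factor.
split_within <- function(row_factor) {
  half_a <- logical(length(row_factor))
  for (idx in split(seq_along(row_factor), row_factor)) {
    # Indexing into idx avoids sample()'s special case for a single number.
    picked <- idx[sample.int(length(idx), size = floor(length(idx) / 2))]
    half_a[picked] <- TRUE
  }
  half_a
}

# Design 2 (hypothetical helper): split the whole sample, but keep all rows of
# a block (e.g., one subject's scans) in the same half.
split_by_block <- function(block_id) {
  blocks <- unique(block_id)
  blocks_a <- blocks[sample.int(length(blocks), size = floor(length(blocks) / 2))]
  block_id %in% blocks_a
}

set.seed(42)
group <- rep(c("A", "B"), times = c(6, 4))
within_half <- split_within(group)
table(group, within_half)

subject <- rep(1:4, each = 3)
block_half <- split_by_block(subject)
table(subject, block_half)
```

Both return a logical vector marking the first half; the second half is its negation.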
