
raamana / confounds


Conquering confounds and covariates: methods, library and guidance

Home Page: https://raamana.github.io/confounds

License: Apache License 2.0

Makefile 1.65% Python 98.35%
confound covariates machine-learning cross-validation scikit-learn statistics regression classification neuroimaging neuroscience

confounds's Introduction

Conquering confounds and covariates in machine learning


News

Vision / Goals

The high-level goal of this package is to develop a high-quality library to conquer confounds and covariates in ML applications. By conquering, we mean methods and tools to

  1. visualize and establish the presence of confounds (e.g. quantifying confound-to-target relationships),
  2. offer solutions to handle them appropriately via correction or removal etc, and
  3. analyze the effect of the deconfounding methods in the processed data (e.g. ability to check if they worked at all, or if they introduced new or unwanted biases etc).
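As a rough illustration of goal 1, the strength of a confound-to-target relationship can be gauged with something as simple as a correlation coefficient. The sketch below uses only NumPy and synthetic data; the function name `confound_target_correlation` is hypothetical and is not part of the confounds API.

```python
import numpy as np

def confound_target_correlation(confound, target):
    """Pearson correlation between a continuous confound and a continuous target.

    Hypothetical helper for illustration only (not a confounds function).
    """
    confound = np.asarray(confound, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.corrcoef(confound, target)[0, 1])

rng = np.random.default_rng(0)
age = rng.normal(50, 10, size=500)               # synthetic confound
score = 0.5 * age + rng.normal(0, 5, size=500)   # target partly driven by age
noise = rng.normal(0, 1, size=500)               # variable unrelated to the target

r_confounded = confound_target_correlation(age, score)
r_unrelated = confound_target_correlation(noise, score)
```

A large `r_confounded` next to a near-zero `r_unrelated` is the kind of evidence that would suggest the confound needs handling before modelling.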

Documentation

https://raamana.github.io/confounds

Methods

Available:

  • Residualize (e.g. via regression)
  • Augment (include confounds as predictors)
  • Some utils
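The two available strategies can be sketched with plain scikit-learn and NumPy. This is a minimal illustration of the ideas on synthetic data, assuming a linear confound-to-feature relationship; it does not use or reproduce the library's own `Residualize` or `Augment` classes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_train, n_test = 200, 50
C_train = rng.normal(size=(n_train, 1))   # synthetic confound
C_test = rng.normal(size=(n_test, 1))
# Features = confound effect + independent signal
X_train = 2.0 * C_train + rng.normal(size=(n_train, 3))
X_test = 2.0 * C_test + rng.normal(size=(n_test, 3))

# Residualize: learn the confound -> feature mapping on training data only,
# then subtract the predicted confound contribution from both sets.
reg = LinearRegression().fit(C_train, X_train)
X_train_resid = X_train - reg.predict(C_train)
X_test_resid = X_test - reg.predict(C_test)

# Augment: append the confounds as extra predictor columns.
X_train_aug = np.hstack([X_train, C_train])
X_test_aug = np.hstack([X_test, C_test])
```

Fitting the regression on the training set alone keeps information from the test set out of the deconfounding step.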

To be added:

  • Harmonize (correct batch effects via rescaling or normalization etc)
  • Stratify (sub- or re-sampling procedures to minimize confounding)
  • Full set of utilities (Goals 1 and 3)
  • reweight (based on propensity scores as in IPW, or based on confounds)
  • estimate propensity scores
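Since propensity estimation and reweighting are still planned, the following is only a generic sketch of inverse probability weighting (IPW) with scikit-learn, not the library's design. It assumes a binary group label and a single continuous confound, both synthetic; the clipping bounds are an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(size=(n, 1))                        # synthetic confound
p_true = 1.0 / (1.0 + np.exp(-1.5 * age[:, 0]))      # group membership depends on age
group = rng.binomial(1, p_true)

# Estimate propensity scores: P(group == 1 | confound)
model = LogisticRegression().fit(age, group)
propensity = np.clip(model.predict_proba(age)[:, 1], 0.01, 0.99)

# IPW: each sample is weighted by the inverse probability of its observed group
weights = np.where(group == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

# Compare the confound imbalance between groups before and after weighting
raw_gap = age[group == 1, 0].mean() - age[group == 0, 0].mean()
weighted_gap = (np.average(age[group == 1, 0], weights=weights[group == 1])
                - np.average(age[group == 0, 0], weights=weights[group == 0]))
```

Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`.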


Resources

Useful resources related to the problem of confounding (papers, presentations, lectures) can be found here: https://github.com/raamana/confounds/blob/master/docs/references_confounds.rst

Citation

If you found any part of confounds useful in your research, directly or indirectly, I'd appreciate it if you could cite the following:

  • Pradeep Reddy Raamana (2020), "Conquering confounds and covariates in machine learning with the python library confounds", Version 0.1.1, Zenodo. http://doi.org/10.5281/zenodo.3701528

Contributors are most welcome.

Your contributions of all kinds will be greatly appreciated. Learn how to contribute to this repo here.

All contributors making non-trivial contributions will

  • be publicly and clearly acknowledged on the authors page
  • become an author on the [software] paper, to be published soon, when it's ready.

confounds's People

Contributors

dinga92, jameschapman19, jrasero, raamana


confounds's Issues

better validation of inputs to Deconfounders

Issue #19 is a reminder that users can be confused because the code makes the second argument to .fit() and .transform() optional (y=None). The only reason we have y=None is to try to follow sklearn conventions and pass their tests, but given that we can't pass them anyway, we should tighten this up and make it an error not to supply the second [necessary] input argument.
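One possible shape for such a check is sketched below. The helper name `validate_deconfounder_inputs` and the error messages are hypothetical and are not taken from the confounds code base.

```python
import numpy as np

def validate_deconfounder_inputs(X, confounds):
    """Hypothetical helper: require confounds and check that sample counts match."""
    if confounds is None:
        raise ValueError(
            "confounds must be supplied as the second argument; got None."
        )
    X = np.asarray(X)
    confounds = np.asarray(confounds)
    if confounds.ndim == 1:
        # Treat a 1-D vector as a single confound column
        confounds = confounds.reshape(-1, 1)
    if X.shape[0] != confounds.shape[0]:
        raise ValueError(
            f"X has {X.shape[0]} samples but confounds has {confounds.shape[0]}."
        )
    return X, confounds
```

Calling this at the top of both `fit` and `transform` would turn a silently ignored `y=None` into an immediate, descriptive error.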

cc @jrasero @jameschapman19

drop-in replacements for cross_val_predict and cross_val_score etc

Pradeep,

could something like this be of interest for the library?

The idea would be to create a class that would do fit and predict including deconfounding and the use of the estimator in an encapsulated way.

Below is a skeleton example. This would only deconfound the input data.

cross_val_predict and cross_val_score functions could as well be implemented.

from sklearn.base import clone

class SklearnWrapper():

    def __init__(self,
                 deconfounder,
                 estimator):

        self.deconfounder = deconfounder
        self.estimator = estimator

    def fit(self,
            input_data,
            target_data,
            confounders,
            sample_weight=None):

        # clone input arguments
        deconfounder = clone(self.deconfounder)
        estimator = clone(self.estimator)

        # Deconfound input data
        deconf_input = deconfounder.fit_transform(input_data, confounders)
        self.deconfounder_ = deconfounder

        # Fit deconfounded input data
        estimator.fit(deconf_input, target_data, sample_weight=sample_weight)
        self.estimator_ = estimator

        return self

    def predict(self,
                input_data,
                confounders):

        deconf_input = self.deconfounder_.transform(input_data, confounders)

        return self.estimator_.predict(deconf_input)
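A drop-in style `cross_val_predict` along these lines might look like the sketch below. The function name `cross_val_predict_deconfounded` is hypothetical, and a plain `LinearRegression` stands in for the library's deconfounder so that the example runs without the confounds package installed.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold

def cross_val_predict_deconfounded(estimator, X, y, confounds, n_splits=5):
    """Hypothetical helper: out-of-fold predictions with per-fold residualization."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.asarray(confounds, dtype=float).reshape(len(X), -1)
    predictions = np.empty_like(y)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in folds.split(X):
        # Learn the confound -> feature mapping on the training fold only
        deconf = LinearRegression().fit(C[train], X[train])
        X_train = X[train] - deconf.predict(C[train])
        X_test = X[test] - deconf.predict(C[test])
        # Fit a fresh copy of the estimator on the deconfounded training fold
        model = clone(estimator).fit(X_train, y[train])
        predictions[test] = model.predict(X_test)
    return predictions

# Synthetic usage example
rng = np.random.default_rng(0)
C = rng.normal(size=(120, 1))
X = C + rng.normal(size=(120, 4))
y = X[:, 0] - C[:, 0] + rng.normal(scale=0.1, size=120)
preds = cross_val_predict_deconfounded(Ridge(alpha=1.0), X, y, C)
```

Refitting the deconfounder inside every fold is the point of the encapsulation: the test fold never influences the confound model.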

Performance score stratified by confound

utils.score_stratified_by_confound()

Helper to summarize the performance score (accuracy, MSE, MAE etc) for each
level or variant of confound. This is helpful to assess any bias towards a
particular value when confounds are categorical (such as site or gender). So
if the MSE (of target) for Females is much lower compared to Males, then it
may indicate a potential bias of the model towards Females (due to imbalance in
size?)
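A minimal sketch of what such a helper could compute is given below. The name `score_by_confound_level`, its signature and the default metric are assumptions for illustration; they do not describe the actual utils function.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def score_by_confound_level(y_true, y_pred, confound, metric=mse):
    """Hypothetical helper: evaluate `metric` separately for each confound level."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confound = np.asarray(confound)
    scores = {}
    for level in np.unique(confound).tolist():
        mask = confound == level          # samples belonging to this level
        scores[level] = metric(y_true[mask], y_pred[mask])
    return scores

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 2.0, 6.0])
sex = np.array(["F", "F", "M", "M"])
scores = score_by_confound_level(y_true, y_pred, sex)
# scores == {"F": 0.0, "M": 2.5}
```

Here the error is concentrated entirely in one level, which is the kind of imbalance the summary is meant to expose.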

Error fitting Residualize

  • confounds version: 0.1.1
  • Python version: 3.9.7
  • Operating System: macOS 11.6

Description

I tried to run the example code with some dummy data, but get an error when I try to fit Residualize

What I Did

# Using the diabetes dataset as an example
import numpy as np
from sklearn import datasets

df = datasets.load_diabetes(as_frame=True)['data']
X = df[['bmi', 'age', 's1']].values # some predictors
y = df['s6'].values # the outcome variable
c = df['sex'].values # a confound - does not matter which

# Splitting into a training and a test set
from sklearn.model_selection import train_test_split

train_ind, test_ind = train_test_split(np.arange(0, len(y)), test_size=0.2)
train_X = X[train_ind, :]
train_y = y[train_ind]
train_C = c[train_ind]

test_X = X[test_ind, :]
test_y = y[test_ind]
test_C = c[test_ind]

# Fitting Residualize to remove the confound
from confounds import Residualize

resid = Residualize()
resid.fit(train_X, train_C)
deconf_train_X = resid.transform(train_X, train_C)

Error message:

TypeError: check_is_fitted() takes from 1 to 2 positional arguments but 3 were given
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/m0/mddm8pfx1vs3q52qvgx4mpxw0000gp/T/ipykernel_27134/3595471338.py in <module>
      1 resid = Residualize()
      2 resid.fit(train_X, train_C)
----> 3 deconf_train_X = resid.transform(train_X, train_C)

/opt/anaconda3/envs/brain_shadows/lib/python3.9/site-packages/confounds/base.py in transform(self, X, y)
    186         """Placeholder to pass sklearn conventions"""
    187 
--> 188         return self._transform(X, y)
    189 
    190 

/opt/anaconda3/envs/brain_shadows/lib/python3.9/site-packages/confounds/base.py in _transform(self, test_features, test_confounds)
    192         """Actual deconfounding of the test features"""
    193 
--> 194         check_is_fitted(self, 'model_', 'n_features_')
    195         test_features = check_array(test_features, accept_sparse=True)
    196 

TypeError: check_is_fitted() takes from 1 to 2 positional arguments but 3 were given

Comment

It looks like there is some incompatibility, but I'm not sure what package is causing the error. Any help would be greatly appreciated!
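The traceback is consistent with a newer scikit-learn, where `check_is_fitted` takes the attribute names as a single `attributes` argument and the remaining options are keyword-only. That is an assumption about the environment, since the report does not state the installed scikit-learn version. Under that assumption, passing the names as one list avoids the `TypeError`; `TinyEstimator` below is a hypothetical stand-in, not the library's class.

```python
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class TinyEstimator(BaseEstimator):
    """Hypothetical stand-in used only to demonstrate the call."""

    def fit(self, X, y=None):
        self.model_ = "fitted"
        self.n_features_ = len(X[0])
        return self

fitted = TinyEstimator().fit([[1.0, 2.0]])

# Attribute names go in one list (or tuple), not as separate positional arguments
check_is_fitted(fitted, ['model_', 'n_features_'])

# An unfitted instance still raises the usual NotFittedError
try:
    check_is_fitted(TinyEstimator(), ['model_', 'n_features_'])
    raised_not_fitted = False
except NotFittedError:
    raised_not_fitted = True
```

Until the call in `confounds/base.py` is changed accordingly, installing an older scikit-learn release that still accepts the positional form may serve as a workaround.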
