uds-helms / beclear

Correction of batch effects in DNA methylation data

Home Page: https://bioconductor.org/packages/release/bioc/html/BEclear.html

License: GNU General Public License v3.0

Languages: R 90.67%, TeX 7.11%, C++ 2.22%
Topics: bioconductor-package dna-methylation rpackage missing-values batch-effects methylation missing-data latent-factor-model stochastic-gradient-descent

beclear's People

Contributors

dtenenba, hpages, jwokaty, livia-rasp, nturaga, sonali-bioc, vobencha

Forkers

krferrier

beclear's Issues

Simplify gradient descent function

Simplify the gdepoch and dlossp functions so that only gdepoch is used and it operates on the whole matrix instead of iterating over each cell.
This should also improve performance.

Do this after issue #10
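Operating on the whole matrix usually means computing every residual and both gradients in one pass, then updating L and R together. The sketch below illustrates that idea in plain Python; the package itself is written in R. The squared-error loss with an L2 penalty, the learning rate `gamma`, the penalty `lam` and the name `gd_epoch` are assumptions for illustration, not BEclear's actual code.

```python
# Minimal sketch, not BEclear's implementation: one full-batch gradient
# descent epoch for a latent factor model D ~ L x R. Assumptions: missing
# entries are None, the loss is the squared error over observed entries
# plus lam * (|L|^2 + |R|^2), and gamma is a hypothetical learning rate.
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a x b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def gd_epoch(d: List[List[Optional[float]]], l: Matrix, r: Matrix,
             gamma: float = 0.01, lam: float = 0.0) -> Tuple[Matrix, Matrix]:
    """Update L and R once, using gradients for the whole matrix at once."""
    pred = matmul(l, r)
    # Residual matrix E = D - L x R; unobserved cells contribute nothing.
    err = [[0.0 if d[i][j] is None else d[i][j] - pred[i][j]
            for j in range(len(d[0]))]
           for i in range(len(d))]
    # dLoss/dL = -2 (E x R^T) + 2 lam L and dLoss/dR = -2 (L^T x E) + 2 lam R
    e_rt = matmul(err, transpose(r))
    lt_e = matmul(transpose(l), err)
    # Both updates use the old L and R (simultaneous update).
    new_l = [[l[i][k] + 2 * gamma * (e_rt[i][k] - lam * l[i][k])
              for k in range(len(l[0]))]
             for i in range(len(l))]
    new_r = [[r[k][j] + 2 * gamma * (lt_e[k][j] - lam * r[k][j])
              for j in range(len(r[0]))]
             for k in range(len(r))]
    return new_l, new_r
```

Each epoch does two matrix products in place of one update per cell. In R, the same structure would map onto `%*%` and `t()`.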

[FEATURE] Bias modelling

Is your feature request related to a problem? Please describe.
At the moment the LFM has to account for all the variation in the data.
Adding a bias term that accounts for sample- and feature-specific effects could, however, improve the data imputation, since the LFM would then only need to account for the interactions between samples and features.

Describe the solution you'd like
As described by Koren et al.
The biases could either simply be the row and column means or also be trained during the GD.

Additional context
It could also be interesting to return the biases, both as feedback for the user and for further analyses.
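The simpler of the two options, biases taken from means, can be sketched as a baseline in the spirit of Koren et al.: d_ij ≈ mu + b_i + c_j, with the LFM fitted only to the residuals. The unregularized means, the layout (rows = features, columns = samples) and the names `fit_biases` and `residuals` are assumptions for illustration.

```python
# Minimal sketch of a baseline (bias) model: d_ij ~ mu + b_i + c_j, and the
# LFM would only model what remains. Biases are plain means here, not trained
# by GD, and unregularized (assumptions). Rows are features, columns are
# samples; None marks a missing value.
from typing import List, Optional, Tuple

Data = List[List[Optional[float]]]


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def fit_biases(d: Data) -> Tuple[float, List[float], List[float]]:
    """Return the global mean, feature (row) biases and sample (column) biases."""
    mu = mean([x for row in d for x in row if x is not None])
    row_bias = [mean([x - mu for x in row if x is not None]) for row in d]
    col_bias = [mean([d[i][j] - mu - row_bias[i]
                      for i in range(len(d)) if d[i][j] is not None])
                for j in range(len(d[0]))]
    return mu, row_bias, col_bias


def residuals(d: Data, mu: float, row_bias: List[float],
              col_bias: List[float]) -> Data:
    """What is left for the LFM to explain after removing the biases."""
    return [[None if x is None else x - (mu + row_bias[i] + col_bias[j])
             for j, x in enumerate(row)]
            for i, row in enumerate(d)]
```

Returning `mu`, `row_bias` and `col_bias` alongside the imputed matrix would also cover the wish to give the biases back to the user.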

Input of data

A past user asked by email how to use their own data as input.
We could provide an example in a vignette in which the data are read in from a file.

[FEATURE] After Merging Blocks Continue GD

Is your feature request related to a problem? Please describe.
One idea is to first split the overall matrix into small blocks and run GD on them, then merge them into larger blocks and continue the GD.

Describe alternatives you've considered
At the moment it is unclear whether this method is feasible, or whether the latent factors of the blocks can be merged easily at all.

Reuse error from the loss function

The error, i.e. the difference D - L*R, is already calculated in the loss function, but is then calculated again during the gradient descent.
Saving the error should save time. Implement this after issue #11.
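One way to reuse the error is a single function that computes the residual once and derives both the loss and the gradients from it. The sketch below assumes a squared-error loss over observed cells plus an L2 penalty `lam`. The name `loss_and_gradients` is hypothetical.

```python
# Minimal sketch (not BEclear's code): compute the residual E = D - L x R
# once per epoch and reuse it for both the loss and the gradients.
# Assumed loss: sum of squared residuals over observed cells
# + lam * (|L|^2 + |R|^2). None marks a missing value.
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def loss_and_gradients(d: List[List[Optional[float]]], l: Matrix, r: Matrix,
                       lam: float = 0.0) -> Tuple[float, Matrix, Matrix]:
    n, m, k = len(d), len(d[0]), len(r)
    # The residual is computed exactly once ...
    err = [[0.0 if d[i][j] is None
            else d[i][j] - sum(l[i][f] * r[f][j] for f in range(k))
            for j in range(m)]
           for i in range(n)]
    # ... reused for the loss ...
    penalty = (sum(x * x for row in l for x in row)
               + sum(x * x for row in r for x in row))
    loss = sum(e * e for row in err for e in row) + lam * penalty
    # ... and reused again for both gradients.
    grad_l = [[-2 * sum(err[i][j] * r[f][j] for j in range(m)) + 2 * lam * l[i][f]
               for f in range(k)]
              for i in range(n)]
    grad_r = [[-2 * sum(l[i][f] * err[i][j] for i in range(n)) + 2 * lam * r[f][j]
               for j in range(m)]
              for f in range(k)]
    return loss, grad_l, grad_r
```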

Use data.table in data imputation

Use data.table for the block matrices to improve performance, as a lot of the runtime seems to be lost on matrix accesses.

ks.test - not enough 'x' data

For some users, the calculation of the p-values returned the error:
ks.test - not enough 'x' data
This is most probably caused by NAs already present in their matrix.
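As far as we know, R's `ks.test` removes NAs from its input itself and raises this error when no observed values remain. That can happen, for example, when a feature is completely missing within one batch. The sketch below shows the guard in plain Python: drop missing values first, then skip the comparison rather than fail. It computes only the two-sample KS statistic D, not a p-value. The names and the `min_n` threshold are assumptions.

```python
# Minimal sketch: two-sample Kolmogorov-Smirnov statistic that drops missing
# values first and returns None instead of failing when a sample has too few
# observed values. It computes only the statistic D, not a p-value.
import math
from bisect import bisect_right
from typing import List, Optional


def observed(values: List[Optional[float]]) -> List[float]:
    """Sorted values with None and NaN removed."""
    return sorted(v for v in values if v is not None and not math.isnan(v))


def ks_statistic(x: List[Optional[float]], y: List[Optional[float]],
                 min_n: int = 1) -> Optional[float]:
    xs, ys = observed(x), observed(y)
    if len(xs) < min_n or len(ys) < min_n:
        return None  # not enough data: skip this feature instead of raising
    d = 0.0
    # The largest gap between the two ECDFs occurs at an observed value.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

In R, the equivalent guard would be to check `sum(!is.na(x))` for both samples before calling `ks.test` and to return `NA` for that feature otherwise.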

[FEATURE] Dixon Test for Outlier Detection

Is your feature request related to a problem? Please describe.
When looking at a group of batches and their BEscores, it can be of interest to find out which batches are outliers with regard to their BEscore.

Describe the solution you'd like
As described by Akulenko et al.

Describe alternatives you've considered
Maybe use another package for outlier detection.
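Dixon's Q test in its simplest form (the r10 ratio) divides the gap between the most extreme value and its nearest neighbour by the range of all values. It flags that value if Q exceeds a tabulated critical value. The sketch below takes the critical value as a parameter because it depends on the number of batches and the confidence level. Dixon defined other ratios for larger samples, and this sketch covers only r10. Whether this matches the variant used by Akulenko et al. is not confirmed here. The names `dixon_q` and `is_outlier` are hypothetical.

```python
# Minimal sketch of Dixon's Q test (the simple r10 ratio) for flagging one
# outlying BEscore. The critical value must come from a published Dixon
# table for the given sample size and confidence level.
from typing import List, Tuple


def dixon_q(values: List[float]) -> Tuple[float, float]:
    """Return the most extreme value and its Q = gap / range."""
    xs = sorted(values)
    if len(xs) < 3:
        raise ValueError("Dixon's Q test needs at least 3 values")
    spread = xs[-1] - xs[0]
    if spread == 0:
        return xs[-1], 0.0  # all values equal: nothing to flag
    q_low = (xs[1] - xs[0]) / spread
    q_high = (xs[-1] - xs[-2]) / spread
    return (xs[0], q_low) if q_low > q_high else (xs[-1], q_high)


def is_outlier(values: List[float], q_crit: float) -> Tuple[float, bool]:
    """Return the candidate value and whether its Q exceeds q_crit."""
    candidate, q = dixon_q(values)
    return candidate, q > q_crit
```

For example, `is_outlier([0.10, 0.11, 0.12, 0.13, 0.40], q_crit=0.710)` returns `(0.40, True)`: Q = 0.27 / 0.30 = 0.9. The value 0.710 is the commonly tabulated 95% critical value for five observations.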

Limit memory during calcMedian

calcMedians uses a lot of memory, about 5 times the size of the input data.
This is probably caused by temporary copies of the data matrix, which could probably be avoided by using data.table.

[FEATURE] Test For Convergence In Epochs

Is your feature request related to a problem? Please describe.
During the GD, exactly as many epochs as predefined are executed. Stopping the method once it has converged might, however, speed up the computation.

Describe the solution you'd like
Define a threshold for convergence and test for it in each epoch.

Additional context
Return the loss and some information about its convergence, so that the user gets confirmation of whether it converges at all.
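A convergence test per epoch can be as simple as comparing consecutive loss values against a threshold. The loop below also returns the loss history and a flag saying whether it converged, which would cover the feedback described above. The stopping rule (absolute change in loss below `tol`) and the name `run_gd` are assumptions. A relative change is another common choice.

```python
# Minimal sketch: run gradient descent until the absolute change in the loss
# falls below a threshold, and return the loss history plus a convergence
# flag. The stopping rule and names are assumptions for illustration.
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")


def run_gd(step: Callable[[P], P], loss: Callable[[P], float], params: P,
           max_epochs: int = 100, tol: float = 1e-6) -> Tuple[P, List[float], bool]:
    history = [loss(params)]
    for _ in range(max_epochs):
        params = step(params)
        history.append(loss(params))
        if abs(history[-2] - history[-1]) < tol:
            return params, history, True  # converged before max_epochs
    return params, history, False  # stopped at max_epochs without converging
```

As a usage example, with the toy loss `(x - 3) ** 2` and the step `x - 0.2 * (x - 3)`, the loop stops after a few dozen epochs with `x` close to 3. If it is capped at `max_epochs=2`, it reports `False`.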

Error in serialize(data, node$con, xdr = FALSE)

Error with a bigger dataset:

Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Error in unserialize(node$con) :
embedded nul in string: 'B\n\002\0\0\0\001\003\003\0'
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal

[FEATURE] Don't Save Temporary Results By Default

Is your feature request related to a problem? Please describe.
By default, the imputation in BEclear saves the solution for each block to disk. Afterwards, the solutions are loaded and merged again.
This, however, does not help much with memory consumption, and not saving them to disk could improve the run time.

Describe the solution you'd like
Make saving the temporary solutions optional.
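Making the saving optional could look like the sketch below. By default, block solutions stay in memory. Only when a directory is given are they written to disk and read back before merging, which mirrors the current behaviour. The names `impute_blocks`, `impute` and `save_dir` are hypothetical and not BEclear's API.

```python
# Minimal sketch (hypothetical names impute_blocks, save_dir): keep block
# solutions in memory by default and only write them to disk, and read them
# back for merging, when a directory is given.
import os
import pickle
from typing import Any, Callable, List, Optional


def impute_blocks(blocks: List[Any], impute: Callable[[Any], Any],
                  save_dir: Optional[str] = None) -> List[Any]:
    stored: List[Any] = []
    for idx, block in enumerate(blocks):
        solved = impute(block)
        if save_dir is None:
            stored.append(solved)  # default: no temporary files
        else:
            path = os.path.join(save_dir, f"block_{idx}.pkl")
            with open(path, "wb") as fh:
                pickle.dump(solved, fh)
            stored.append(path)  # only the file path is kept in memory
    if save_dir is None:
        return stored
    # Opt-in path: load the saved solutions again before merging.
    results = []
    for path in stored:
        with open(path, "rb") as fh:
            results.append(pickle.load(fh))
    return results
```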

[FEATURE] Replace for loops in calcSummary and calcScore functions

Is your feature request related to a problem? Please describe.
Both functions are currently implemented with for loops instead of apply functions or clever use of data.table features, which makes them unnecessarily slow.

Describe the solution you'd like
Replace the loops with straightforward use of data.table functions.

Describe alternatives you've considered
Using lapply instead.

[FEATURE] Treating Data-Sets Without Batches

Is your feature request related to a problem? Please describe.
Even though BEclear is primarily meant for detecting and correcting batch effects, it could be sensible to provide a way to work with data in which no batches are defined.

Describe the solution you'd like
Just test each sample against all other samples, i.e. treat each sample as a batch.

Describe alternatives you've considered
Define batches de novo, e.g. by clustering the samples.
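Treating each sample as its own batch amounts to a one-versus-rest comparison. For every sample and feature, the sample's value is compared with the values of all other samples. The sketch below uses the absolute difference to the median of the other samples as the comparison. That choice, the matrix layout (rows = features, columns = samples) and the name `one_vs_rest_median_diffs` are assumptions for illustration.

```python
# Minimal sketch: treat every sample as its own batch and, per feature,
# compare its value with the median of all other samples. Rows are features,
# columns are samples; None marks a missing value.
from statistics import median
from typing import Dict, List, Optional


def one_vs_rest_median_diffs(
        d: List[List[Optional[float]]]) -> Dict[int, List[Optional[float]]]:
    diffs: Dict[int, List[Optional[float]]] = {}
    for j in range(len(d[0])):
        per_feature: List[Optional[float]] = []
        for row in d:
            own = row[j]
            rest = [x for k, x in enumerate(row) if k != j and x is not None]
            # No comparison is possible if the sample or all others are missing.
            per_feature.append(None if own is None or not rest
                               else abs(own - median(rest)))
        diffs[j] = per_feature
    return diffs
```

Samples whose differences are large across many features would then be candidates for sample-specific effects, analogous to batches with a high BEscore.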
