plink.jl's People

Contributors

klkeys

plink.jl's Issues

Roadmap for future development

PLINK.jl implements some linear algebra facilities, but they are currently limited to the minimum that the IHT.jl package needs to perform penalized linear least squares regression. For the package to become more broadly useful, it needs some direction.

From an engineering standpoint, the best thing to do is to make BEDFile objects behave more like actual arrays. Some proposals include:

  • incorporating means and precisions into the BEDFile object as additional fields, which entails adding methods (e.g. addmeans!, addinvstds!) to add the requisite vectors to a BEDFile as well as constructor methods that can read the means and precisions from file.
  • reformatting xb! and xty! to instead overload A_mul_B! and At_mul_B! with methods targeting the BEDFile. Housing means and precisions in a BEDFile goes hand-in-glove with simplifying calls to xb! and xty!. The matrix-vector multiplication methods currently take means and precisions as optional keyword arguments; with the means and precisions stored in the BEDFile itself, a revamped A_mul_B! would relieve the user of the need to pass them in every method call.
  • in a similar vein, reformatting xb and xty to instead overload *().
  • add facilities for dense A_mul_B! to enable iterative linear solves via the conjugate gradient method (CG). Currently xb! assumes that the vector multiplicand is sparse. Given the computational burden associated with dense linear algebra with PLINK BED files, this operation should also have both a CPU and a GPU-accelerated variant.
  • add facilities for matrix-matrix multiplication in order to enable operations such as principal components analysis (PCA). This operation should also have both CPU and GPU variants.
  • sumsq should become an overloaded sumabs2 and should work on both row and column margins. If the need is present, then sumabs and related functions should be added.
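
As a rough illustration of the first few proposals, the sketch below shows a matrix-like object that stores its own column means and precisions, so that multiplication no longer needs them as arguments. It is written in Python purely for illustration and is not PLINK.jl code: the class name StandardizedMatrix, its fields, and its methods are all hypothetical, and the genotypes are held uncompressed for simplicity.

```python
from math import sqrt


class StandardizedMatrix:
    """Hypothetical stand-in for a BEDFile that carries its own column statistics."""

    def __init__(self, columns):
        # columns[j][i] is the genotype count (0, 1, or 2) of subject i at marker j
        self.columns = [list(col) for col in columns]
        self.n = len(self.columns[0])
        self.p = len(self.columns)
        self.means = [sum(col) / self.n for col in self.columns]
        # precision = 1 / standard deviation; constant columns get precision 0
        self.precs = []
        for col, m in zip(self.columns, self.means):
            var = sum((g - m) ** 2 for g in col) / self.n
            self.precs.append(1.0 / sqrt(var) if var > 0 else 0.0)

    def __matmul__(self, b):
        """Return X*b for the standardized matrix, skipping zero entries of b."""
        out = [0.0] * self.n
        for j, bj in enumerate(b):
            if bj == 0.0:
                continue  # a sparse b touches only a few columns
            m, s = self.means[j], self.precs[j]
            for i, g in enumerate(self.columns[j]):
                out[i] += (g - m) * s * bj
        return out

    def xty(self, y):
        """Return X'y for the standardized matrix."""
        return [
            s * sum((g - m) * yi for g, yi in zip(col, y))
            for col, m, s in zip(self.columns, self.means, self.precs)
        ]

    def sumsq(self, axis=0):
        """Sums of squared standardized entries: axis=0 per marker, axis=1 per subject."""
        if axis == 0:
            return [
                sum(((g - m) * s) ** 2 for g in col)
                for col, m, s in zip(self.columns, self.means, self.precs)
            ]
        out = [0.0] * self.n
        for col, m, s in zip(self.columns, self.means, self.precs):
            for i, g in enumerate(col):
                out[i] += ((g - m) * s) ** 2
        return out


x = StandardizedMatrix([[0, 1, 2], [2, 2, 0]])
xb = x @ [1.0, 0.0]            # no means or precisions passed by the caller
xty = x.xty([1.0, 1.0, 1.0])
```

The point of the design is only that the caller writes x @ b and x.xty(y) without threading the statistics through every call. A real implementation would work on the compressed data and would compute or read the statistics once.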

Issues pertaining to parallelism:

  • Julia v0.5 is slated to include native multithreading. The current parallel model exploits SharedArray parallelism, which distributes work across separate processes that share memory rather than across threads. Should PLINK.jl abandon SharedArray parallelism in favor of multithreading?
  • Test the possibility of splitting BED files over multiple GPUs. Modern massive BED files can easily contain 200,000 subjects and 6-8M markers, which will never fit as one file on a GPU.
  • Rigorously examine and minimize communication between cores in SharedArray operations. This matters especially for matrix-vector operations in which the compressed data and the vectors may require cores to share data at the boundaries of their local indices. A fully optimized parallel model with SharedArrays would ensure that each core operates completely independently on its portions of x.x, x.xt, and the vector multiplicand.
  • Currently xb! is not parallelized since it is optimized for sparse vector multiplicands, but that would change if support for dense vectors is added.
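
The idea of workers operating independently on their own portions of the data can be sketched as a partition of the markers into contiguous, disjoint column blocks, with each worker computing its own slice of X'y. The Python below is only an illustration of the partitioning, under the assumption of uncompressed columns. The function names are hypothetical, and Python threads stand in for whatever parallel model PLINK.jl would use, so no speedup is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def column_blocks(p, workers):
    """Split column indices 0..p-1 into at most `workers` contiguous, disjoint ranges."""
    size, extra = divmod(p, workers)
    blocks, start = [], 0
    for w in range(workers):
        stop = start + size + (1 if w < extra else 0)
        if stop > start:
            blocks.append(range(start, stop))
        start = stop
    return blocks


def xty_block(columns, y, block):
    """Compute the entries of X'y belonging to one block of columns."""
    return [sum(g * yi for g, yi in zip(columns[j], y)) for j in block]


def xty_parallel(columns, y, workers=2):
    """Compute X'y with each worker reading only its own columns plus the shared y."""
    blocks = column_blocks(len(columns), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves block order, so the pieces concatenate into the full result
        parts = pool.map(lambda blk: xty_block(columns, y, blk), blocks)
        return [value for part in parts for value in part]


columns = [[0, 1, 2], [2, 2, 0], [1, 1, 1]]
result = xty_parallel(columns, [1.0, 2.0, 3.0], workers=2)
```

Because each block writes to a distinct slice of the output and only reads y, no worker needs data owned by another. Splitting by whole columns also sidesteps boundary sharing in the PLINK 1 BED layout, where every marker starts on a fresh byte.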

Issues pertaining to compression:

  • PLINK.jl enables array indexing of BED files, but getindex currently requires ~1.5-2x the memory and compute time of normal array indexing. Because array indexing is a fundamental operation in the linear algebra routines, even small improvements in getindex could yield substantial reductions in compute time.
  • Decompression is not (yet) parallelized.
  • As of 22 May 2016, PLINK2 is slated to include a new compression standard for fractional dosages. Future decompression routines should account for the new standard.
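
To make the cost of indexing concrete, here is a minimal Python sketch of element lookup in the PLINK 1 BED layout: four genotypes per byte, two bits each, read from the low-order bits upward, with every marker starting on a new byte. The 2-bit codes follow the PLINK 1 format (00 homozygous first allele, 01 missing, 10 heterozygous, 11 homozygous second allele). Mapping them to counts of the second allele, the function names, and the omission of the 3-byte file header are assumptions of this sketch, not a description of PLINK.jl's getindex.

```python
# 2-bit code -> count of the second allele; None marks a missing genotype.
# This value mapping is an illustrative choice.
CODE_TO_COUNT = {0b00: 0, 0b01: None, 0b10: 1, 0b11: 2}
COUNT_TO_CODE = {count: code for code, count in CODE_TO_COUNT.items()}


def bytes_per_column(n):
    """Each marker (column) of n subjects occupies ceil(n / 4) bytes."""
    return (n + 3) // 4


def pack_columns(columns):
    """Pack columns of genotype counts into 2-bit codes (header bytes omitted)."""
    n = len(columns[0])
    stride = bytes_per_column(n)
    packed = bytearray(stride * len(columns))
    for j, col in enumerate(columns):
        for i, g in enumerate(col):
            packed[j * stride + i // 4] |= COUNT_TO_CODE[g] << (2 * (i % 4))
    return bytes(packed)


def getindex(packed, n, i, j):
    """Return the genotype of subject i at marker j (both 0-based)."""
    byte = packed[j * bytes_per_column(n) + i // 4]
    code = (byte >> (2 * (i % 4))) & 0b11  # shift the wanted pair down, mask it out
    return CODE_TO_COUNT[code]


columns = [[0, None, 1, 2, 1], [2, 2, 0, 0, None]]
packed = pack_columns(columns)
value = getindex(packed, 5, 2, 0)  # subject 2, marker 0
```

Each lookup costs an index computation, a shift, a mask, and a table lookup on top of the memory read, which is where the extra work relative to plain array indexing comes from.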

Issues pertaining to design, code base maintenance:

  • All algorithm parameters and defaults are currently manipulated with optional arguments to the function. The number of such arguments can grow embarrassingly large; see the GPU variant of xty! as an example. It may improve matters to build a PLINKOptions structure with all default arguments included whenever the module is loaded. All functions can then use the parameter defaults via PLINKOptions. Users can modify the entries in PLINKOptions as they see fit.
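
One possible shape for such an options structure is sketched below in Python. Everything in it is hypothetical: the field names, their defaults, and the xty function are invented for illustration and do not reflect PLINK.jl's actual parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PLINKOptions:
    # All field names and default values here are hypothetical.
    use_gpu: bool = False
    workers: int = 1
    block_size: int = 1024
    standardize: bool = True


# Built once when the module is loaded; functions fall back to it by default.
DEFAULT_OPTIONS = PLINKOptions()


def xty(columns, y, opts=DEFAULT_OPTIONS):
    """Compute X'y, reading every tunable from a single options object."""
    out = []
    for col in columns:
        if opts.standardize:
            n = len(col)
            m = sum(col) / n
            sd = (sum((g - m) ** 2 for g in col) / n) ** 0.5
            col = [(g - m) / sd if sd > 0 else 0.0 for g in col]
        out.append(sum(g * yi for g, yi in zip(col, y)))
    return out


# A user overrides one entry and leaves the rest at their defaults.
raw_options = replace(DEFAULT_OPTIONS, standardize=False)
raw = xty([[0, 1, 2]], [1.0, 1.0, 1.0], raw_options)
```

This sketch makes the options immutable and has users derive modified copies, which avoids hidden changes to shared global state. Letting users mutate a single global options object, as the proposal describes, is the other obvious design, and the function signatures would look the same either way.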
