
dtucomputestatisticsanddataanalysis / mbpls

28 stars · 2 watchers · 8 forks · 17 MB

(Multiblock) Partial Least Squares Regression for Python

Home Page: https://mbpls.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 97.08%, TeX 2.47%, Shell 0.44%
Topics: chemometrics, metabolomics, data-science, pattern-recognition, multivariate-statistics, subspace-learning, supervised-learning, data-fusion, data-integration, multivariate-analysis

mbpls's People

Contributors: b0nsaii, lvermue

Stargazers: 28 · Watchers: 2

mbpls's Issues

[JOSS review]: statement of need

The manuscript already includes hints of why the software is needed, but I believe that the JOSS review criteria require an explicit statement of need.

[JOSS review]: community guidelines

From the JOSS review criteria: "Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support?" Please add these items to your documentation. Consider using an issue template and adding a CONTRIBUTING.md file.

[JOSS review]: compatibility with scikit learn

Compatibility with scikit-learn should be confirmed by adding a test that uses check_estimator.

I think something like this should work:

def test_sklearn_compatibility():
    from sklearn.utils.estimator_checks import check_estimator
    from mbpls.mbpls import MBPLS
    check_estimator(MBPLS)
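One caveat, as an assumption on my side: scikit-learn 0.24 and later expect check_estimator to receive an estimator instance rather than a class, so on recent versions the test would likely need to pass MBPLS(). A minimal sketch under that assumption:

def test_sklearn_compatibility():
    from sklearn.utils.estimator_checks import check_estimator
    from mbpls.mbpls import MBPLS
    # Recent scikit-learn releases require an estimator instance, not a class
    check_estimator(MBPLS())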

Plot error

I am attempting to run the very simple MBPLS example from the docs page. I'm running the code exactly as shown below and getting an error from the mbpls.plot() function. Could this be a compatibility issue with the version of a dependency? Thanks!

import numpy as np
from mbpls.mbpls import MBPLS

num_samples = 40
num_features_x1 = 200
num_features_x2 = 250

# Generate two random data matrices X1 and X2 (two blocks)
x1 = np.random.rand(num_samples, num_features_x1)
x2 = np.random.rand(num_samples, num_features_x2)

# Generate random reference vector y
y = np.random.rand(num_samples, 1)

# Establish prediction model using 3 latent variables (components)
mbpls = MBPLS(n_components=3)
mbpls.fit([x1, x2], y)
y_pred = mbpls.predict([x1, x2])

# Use built-in plot method for exploratory analysis of multiblock PLS models
mbpls.plot(num_components=3)

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.10
  • Version of this software: 1.0.4
  • Versions of required Python packages:
    • Numpy: 1.24.1
    • Scipy: 1.9.3
    • Scikit-learn: 1.2.1
    • Pandas: 1.5.2

Additional information

Error message is as follows:


ValueError Traceback (most recent call last)
Cell In[41], line 2
1 # Use built-in plot method for exploratory analysis of multiblock pls models
----> 2 mbpls.plot(num_components=3)

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/mbpls/mbpls.py:1483, in MBPLS.plot(self, num_components)
1480 for block in range(self.num_blocks_):
1481 # Inverse transforming weights/loadings
1482 if self.standardize:
-> 1483 P_inv_trans.append(self.x_scalers_[block].inverse_transform(self.P_[block][:, comp]))
1484 else:
1485 P_inv_trans.append(self.P_[block][:, comp])

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/preprocessing/_data.py:1034, in StandardScaler.inverse_transform(self, X, copy)
1031 check_is_fitted(self)
1033 copy = copy if copy is not None else self.copy
-> 1034 X = check_array(
1035 X,
1036 accept_sparse="csr",
1037 copy=copy,
1038 dtype=FLOAT_DTYPES,
1039 force_all_finite="allow-nan",
1040 )
1042 if sparse.issparse(X):
1043 if self.with_mean:

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/utils/validation.py:902, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
900 # If input is 1D raise error
901 if array.ndim == 1:
--> 902 raise ValueError(
903 "Expected 2D array, got 1D array instead:\narray={}.\n"
904 "Reshape your data either using array.reshape(-1, 1) if "
905 "your data has a single feature or array.reshape(1, -1) "
906 "if it contains a single sample.".format(array)
907 )
909 if dtype_numeric and array.dtype.kind in "USV":
910 raise ValueError(
911 "dtype='numeric' is not compatible with arrays of bytes/strings."
912 "Convert your data to numeric values explicitly instead."
913 )

ValueError: Expected 2D array, got 1D array instead:
array=[ 0.16242773 -0.90119075  0.05693651 -1.39487254  0.52740615  1.53864419
 ...
 -0.15379111 -1.99536391 -1.2228214  -1.15689281  0.34514969  0.14252237
  1.35108987 -1.68321132].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
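In case it helps with triage: this looks like it could be caused by newer scikit-learn versions no longer accepting 1D arrays in StandardScaler.inverse_transform. A minimal workaround sketch, reusing the fitted mbpls model from the example above and the P_ / x_scalers_ attributes visible in the traceback (the fix inside the package itself may look different):

# Hedged workaround sketch: reshape the 1D loading vector to a 2D row
# before inverse-transforming, then flatten back to 1D.
block, comp = 0, 0                                    # illustrative indices
loadings_1d = mbpls.P_[block][:, comp]                # 1D vector, as in the traceback
scaler = mbpls.x_scalers_[block]                      # fitted StandardScaler for this block
loadings_unscaled = scaler.inverse_transform(loadings_1d.reshape(1, -1)).ravel()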

Variable importance in projection

Is your feature request related to a problem? Please describe.
Variable importance in projection (VIP) is a useful metric for PLS models that helps with understanding feature importance. I use the mbpls package a lot in my research, and it would be great to have a multiblock VIP attribute implemented.

Describe the solution you'd like
Implementation of VIP as an attribute of the mbpls class, so that VIP scores can be easily accessed after model fitting. A definition of VIP can be found in Mehmood et al. 2012 (https://doi.org/10.1016/j.chemolab.2012.07.010). That definition is for standard (single-block) PLS rather than multi-block; however, VIP should be extensible to MB-PLS by using the superscores.

Describe alternatives you've considered
I have attached Python code for an MB-PLS VIP function that I implemented myself. It uses attributes from the mbpls class (weights, scores, etc.) to calculate VIP, but it would be great if this could be implemented in the main package.

import numpy as np

def VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings):
    # stack the weights from all blocks 
    weights = np.vstack(x_weights)
    # normalise the weights
    weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
    # calculate product of sum of squares of superscores and y loadings
    sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    # p = number of variables - stack the loadings from all blocks
    p = np.vstack(x_loadings).shape[0]
    
    # VIP is a weighted sum of squares of PLS weights 
    vip_scores = np.sqrt(p * np.sum(sumsquares*(weights_norm**2), axis=1) / np.sum(sumsquares))
    return vip_scores
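For context, a hypothetical usage sketch with a fitted MBPLS model; the attribute names W_, Ts_, P_ and V_ (block weights, superscores, block loadings, response loadings) are assumptions based on the mbpls documentation and may not match the actual API:

import numpy as np
from mbpls.mbpls import MBPLS

# Hypothetical usage; the model attribute names below are assumptions
x1, x2 = np.random.rand(40, 200), np.random.rand(40, 250)
y = np.random.rand(40, 1)

model = MBPLS(n_components=3)
model.fit([x1, x2], y)
vip = VIP_multiBlock(x_weights=model.W_,       # per-block weight matrices (assumed)
                     x_superscores=model.Ts_,  # superscores (assumed)
                     x_loadings=model.P_,      # per-block loading matrices (assumed)
                     y_loadings=model.V_)      # response loadings (assumed)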

Computation of diff_t in the NIPALS algorithm

In line 855 of the MultiBlockPLS implementation in mbpls.py, you use:

diff_t = np.sum(superscores_old - superscores)

As far as I understand, the Euclidean metric should be used instead, i.e.:

diff_t = np.sum((superscores_old - superscores)**2)

or

diff_t = np.sum((superscores_old - superscores)**2)**0.5

Could you please clarify your use of this "metric" for the vectors, in case I am mistaken?
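For illustration only, a convergence check based on the Euclidean norm could look like this sketch (not the mbpls implementation; the tolerance is arbitrary):

import numpy as np

def superscores_converged(superscores_old, superscores, tol=1e-10):
    # Euclidean distance between consecutive superscore vectors
    diff_t = np.linalg.norm(superscores_old - superscores)
    return diff_t < tol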

Move the link to documentation higher up

GitHub allows you to post a website link at the top of the repo page. I would recommend posting the link to the online docs there. At the very least, I would move it from the end of the README to a more prominent location (e.g., the top of the README).

Test file name is confusing

I would have expected to find a file called test_mbpls.py with tests for the package. The file name test_Installation.py initially led me to think that it contains something that tests the installation process. This is definitely not a requirement for the JOSS review (I will mark required changes with "[JOSS review]"), but I would rename that file to test_mbpls.py.
