
dtucomputestatisticsanddataanalysis / mbpls

28 stars · 2 watchers · 8 forks · 17 MB

(Multiblock) Partial Least Squares Regression for Python

Home Page: https://mbpls.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 97.08%, TeX 2.47%, Shell 0.44%
Topics: chemometrics, metabolomics, data-science, pattern-recognition, multivariate-statistics, subspace-learning, supervised-learning, data-fusion, data-integration, multivariate-analysis

mbpls's People

Contributors: b0nsaii, lvermue

Stargazers: 28 · Watchers: 2

mbpls's Issues

[JOSS review]: statement of need

The manuscript already includes hints of why the software is needed, but I believe that the JOSS review criteria require an explicit statement of need.

[JOSS review]: community guidelines

From the JOSS review criteria: "Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support?" Please add these items to your documentation. Consider using an issue template and adding a CONTRIBUTING.md file.

[JOSS review]: compatibility with scikit learn

Compatibility with scikit-learn should be confirmed by adding a test that uses check_estimator.

I think something like this should work:

def test_sklearn_compatibility():
    from sklearn.utils.estimator_checks import check_estimator
    from mbpls.mbpls import MBPLS
    check_estimator(MBPLS)
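One caveat, as an assumption on my side: scikit-learn 0.24 and later expect check_estimator to receive an estimator instance rather than a class, so on recent versions the test would likely need to pass MBPLS(). A minimal sketch under that assumption:

def test_sklearn_compatibility():
    from sklearn.utils.estimator_checks import check_estimator
    from mbpls.mbpls import MBPLS
    # Recent scikit-learn releases require an estimator instance, not a class
    check_estimator(MBPLS())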

Plot error

I am attempting to run the very simple MBPLS example from the docs page. I'm running the code exactly as shown below and getting an error from the mbpls.plot() function. Could this be a compatibility issue with the version of a dependency? Thanks!

import numpy as np
from mbpls.mbpls import MBPLS

num_samples = 40
num_features_x1 = 200
num_features_x2 = 250

# Generate two random data matrices X1 and X2 (two blocks)
x1 = np.random.rand(num_samples, num_features_x1)
x2 = np.random.rand(num_samples, num_features_x2)

# Generate random reference vector y
y = np.random.rand(num_samples, 1)

# Establish prediction model using 3 latent variables (components)
mbpls = MBPLS(n_components=3)
mbpls.fit([x1, x2], y)
y_pred = mbpls.predict([x1, x2])

# Use built-in plot method for exploratory analysis of multiblock PLS models
mbpls.plot(num_components=3)

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.10
  • Version of this software: 1.0.4
  • Versions of required Python packages:
    • Numpy: 1.24.1
    • Scipy: 1.9.3
    • Scikit-learn: 1.2.1
    • Pandas: 1.5.2

Additional information

Error message is as follows:


ValueError Traceback (most recent call last)
Cell In[41], line 2
1 # Use built-in plot method for exploratory analysis of multiblock pls models
----> 2 mbpls.plot(num_components=3)

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/mbpls/mbpls.py:1483, in MBPLS.plot(self, num_components)
1480 for block in range(self.num_blocks_):
1481 # Inverse transforming weights/loadings
1482 if self.standardize:
-> 1483 P_inv_trans.append(self.x_scalers_[block].inverse_transform(self.P_[block][:, comp]))
1484 else:
1485 P_inv_trans.append(self.P_[block][:, comp])

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/preprocessing/_data.py:1034, in StandardScaler.inverse_transform(self, X, copy)
1031 check_is_fitted(self)
1033 copy = copy if copy is not None else self.copy
-> 1034 X = check_array(
1035 X,
1036 accept_sparse="csr",
1037 copy=copy,
1038 dtype=FLOAT_DTYPES,
1039 force_all_finite="allow-nan",
1040 )
1042 if sparse.issparse(X):
1043 if self.with_mean:

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/utils/validation.py:902, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
900 # If input is 1D raise error
901 if array.ndim == 1:
--> 902 raise ValueError(
903 "Expected 2D array, got 1D array instead:\narray={}.\n"
904 "Reshape your data either using array.reshape(-1, 1) if "
905 "your data has a single feature or array.reshape(1, -1) "
906 "if it contains a single sample.".format(array)
907 )
909 if dtype_numeric and array.dtype.kind in "USV":
910 raise ValueError(
911 "dtype='numeric' is not compatible with arrays of bytes/strings."
912 "Convert your data to numeric values explicitly instead."
913 )

ValueError: Expected 2D array, got 1D array instead:
array=[ 0.16242773 -0.90119075  0.05693651 -1.39487254  0.52740615  1.53864419
 ...
 -0.15379111 -1.99536391 -1.2228214  -1.15689281  0.34514969  0.14252237
  1.35108987 -1.68321132].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
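In case it helps with triage: this looks like it could be caused by newer scikit-learn versions no longer accepting 1D arrays in StandardScaler.inverse_transform. A minimal workaround sketch, reusing the fitted mbpls model from the example above and the P_ / x_scalers_ attributes visible in the traceback (the fix inside the package itself may look different):

# Hedged workaround sketch: reshape the 1D loading vector to a 2D row
# before inverse-transforming, then flatten back to 1D.
block, comp = 0, 0                                    # illustrative indices
loadings_1d = mbpls.P_[block][:, comp]                # 1D vector, as in the traceback
scaler = mbpls.x_scalers_[block]                      # fitted StandardScaler for this block
loadings_unscaled = scaler.inverse_transform(loadings_1d.reshape(1, -1)).ravel()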

Variable importance in projection

Is your feature request related to a problem? Please describe.
Variable importance in projection (VIP) is a useful metric for PLS models that helps with understanding feature importance. I use the mbpls package a lot in my research, and it would be great to have a multiblock VIP attribute implemented.

Describe the solution you'd like
Implementation of VIP as an attribute of the mbpls class, so that VIP scores can be easily accessed after model fitting. A definition of VIP can be found in Mehmood et al. 2012 (https://doi.org/10.1016/j.chemolab.2012.07.010). That definition is for standard (single-block) PLS rather than multi-block; however, VIP should be extensible to MB-PLS by using the superscores.

Describe alternatives you've considered
I have attached Python code for an MB-PLS VIP function that I implemented myself. It uses attributes from the mbpls class (weights, scores, etc.) to calculate VIP, but it would be great if this could be implemented in the main package.

import numpy as np

def VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings):
    # stack the weights from all blocks 
    weights = np.vstack(x_weights)
    # normalise the weights
    weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
    # calculate product of sum of squares of superscores and y loadings
    sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    # p = number of variables - stack the loadings from all blocks
    p = np.vstack(x_loadings).shape[0]
    
    # VIP is a weighted sum of squares of PLS weights 
    vip_scores = np.sqrt(p * np.sum(sumsquares*(weights_norm**2), axis=1) / np.sum(sumsquares))
    return vip_scores
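For context, a hypothetical usage sketch with a fitted MBPLS model; the attribute names W_, Ts_, P_ and V_ (block weights, superscores, block loadings, response loadings) are assumptions based on the mbpls documentation and may not match the actual API:

import numpy as np
from mbpls.mbpls import MBPLS

# Hypothetical usage; the model attribute names below are assumptions
x1, x2 = np.random.rand(40, 200), np.random.rand(40, 250)
y = np.random.rand(40, 1)

model = MBPLS(n_components=3)
model.fit([x1, x2], y)
vip = VIP_multiBlock(x_weights=model.W_,       # per-block weight matrices (assumed)
                     x_superscores=model.Ts_,  # superscores (assumed)
                     x_loadings=model.P_,      # per-block loading matrices (assumed)
                     y_loadings=model.V_)      # response loadings (assumed)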

Computation of diff_t in the NIPALS algorithm

In line 855 of the MultiBlockPLS implementation in mbpls.py, you use:

diff_t = np.sum(superscores_old - superscores)

As far as I understand, the Euclidean metric should be used instead, i.e.:

diff_t = np.sum((superscores_old - superscores)**2)

or

diff_t = np.sum((superscores_old - superscores)**2)**0.5

Could you please clarify your use of this "metric" for the vectors, in case I am mistaken?
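For illustration only, a convergence check based on the Euclidean norm could look like this sketch (not the mbpls implementation; the tolerance is arbitrary):

import numpy as np

def superscores_converged(superscores_old, superscores, tol=1e-10):
    # Euclidean distance between consecutive superscore vectors
    diff_t = np.linalg.norm(superscores_old - superscores)
    return diff_t < tol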

Move the link to documentation higher up

GitHub allows you to post a website link at the top of the repo page. I would recommend posting the link to the online docs there. At the very least, I would move it from the end of the README to a more prominent location (e.g., the top of the README).

Test file name is confusing

I would have expected to find a file called test_mbpls.py with tests for the package. The file name test_Installation.py initially led me to think that it contains something that tests the installation process. This is definitely not a requirement for the JOSS review (I will mark required changes with "[JOSS review]"), but I would rename that file to test_mbpls.py.
