dtucomputestatisticsanddataanalysis / mbpls

(Multiblock) Partial Least Squares Regression for Python
Home Page: https://mbpls.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
The manuscript already hints at why the software is needed, but I believe the JOSS review criteria require an explicit statement of need.
From the JOSS review criteria: "Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support?" Please add these items to your documentation. Consider using an issue template and adding a CONTRIBUTING.md file.
References should include DOIs where possible.
Compatibility with scikit-learn should be confirmed by adding a test that uses check_estimator.
I think something like this should work:

```python
def test_sklearn_compatibility():
    from sklearn.utils.estimator_checks import check_estimator
    from mbpls.mbpls import MBPLS

    # Recent scikit-learn versions expect an estimator instance, not the class
    check_estimator(MBPLS())
```
I am attempting to run through the very simple MBPLS example from the Docs page. I'm running the code exactly as shown below and getting an error from the mbpls.plot() function. Could this be a compatibility issue with the version of a dependency? Thanks!
```python
import numpy as np
from mbpls.mbpls import MBPLS

num_samples = 40
num_features_x1 = 200
num_features_x2 = 250

x1 = np.random.rand(num_samples, num_features_x1)
x2 = np.random.rand(num_samples, num_features_x2)
y = np.random.rand(num_samples, 1)

mbpls = MBPLS(n_components=3)
mbpls.fit([x1, x2], y)
y_pred = mbpls.predict([x1, x2])
mbpls.plot(num_components=3)
```
Error message is as follows:
```
ValueError                                Traceback (most recent call last)
Cell In[41], line 2
      1 # Use built-in plot method for exploratory analysis of multiblock pls models
----> 2 mbpls.plot(num_components=3)

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/mbpls/mbpls.py:1483, in MBPLS.plot(self, num_components)
   1480 for block in range(self.num_blocks_):
   1481     # Inverse transforming weights/loadings
   1482     if self.standardize:
-> 1483         P_inv_trans.append(self.x_scalers_[block].inverse_transform(self.P_[block][:, comp]))
   1484     else:
   1485         P_inv_trans.append(self.P_[block][:, comp])

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/preprocessing/_data.py:1034, in StandardScaler.inverse_transform(self, X, copy)
   1031 check_is_fitted(self)
   1033 copy = copy if copy is not None else self.copy
-> 1034 X = check_array(
   1035     X,
   1036     accept_sparse="csr",
   1037     copy=copy,
   1038     dtype=FLOAT_DTYPES,
   1039     force_all_finite="allow-nan",
   1040 )
   1042 if sparse.issparse(X):
   1043     if self.with_mean:

File ~/mambaforge/envs/py310_env/lib/python3.10/site-packages/sklearn/utils/validation.py:902, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    900 # If input is 1D raise error
    901 if array.ndim == 1:
--> 902     raise ValueError(
    903         "Expected 2D array, got 1D array instead:\narray={}.\n"
    904         "Reshape your data either using array.reshape(-1, 1) if "
    905         "your data has a single feature or array.reshape(1, -1) "
    906         "if it contains a single sample.".format(array)
    907     )
    909 if dtype_numeric and array.dtype.kind in "USV":
    910     raise ValueError(
    911         "dtype='numeric' is not compatible with arrays of bytes/strings."
    912         "Convert your data to numeric values explicitly instead."
    913     )

ValueError: Expected 2D array, got 1D array instead:
array=[ 0.16242773 -0.90119075  0.05693651 ... -1.68321132].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```
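This looks like a scikit-learn version incompatibility: recent releases of StandardScaler.inverse_transform validate their input with check_array and reject 1D arrays outright, which is exactly what the traceback shows. A minimal sketch of the failure mode and the reshape workaround, using a standalone StandardScaler rather than the mbpls internals (the `loading` array here is a hypothetical stand-in for a single loading column such as `self.P_[block][:, comp]`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
scaler = StandardScaler().fit(rng.random((40, 5)))  # fitted on 5 features

loading = rng.random(5)  # 1D per-feature vector, shape (5,)

# scaler.inverse_transform(loading) raises "Expected 2D array, got 1D array"
# in recent scikit-learn; reshaping to one sample of 5 features avoids it:
restored = scaler.inverse_transform(loading.reshape(1, -1)).ravel()
print(restored.shape)  # (5,)
```

So a fix inside mbpls.plot would presumably be to reshape the 1D loading vector to shape (1, n_features) before calling inverse_transform and flatten the result afterwards.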
Is your feature request related to a problem? Please describe.
Variable importance in projection (VIP) is a useful metric for PLS models to help understand feature importance. I use the mbpls package a lot in my research and it would be great for there to be a multiblock VIP attribute implemented.
Describe the solution you'd like
Implementation of VIP as an attribute of the mbpls class, so that after model fitting VIP scores can be easily accessed. A definition of VIP can be found in Mehmood et al 2012 (https://doi.org/10.1016/j.chemolab.2012.07.010.) This definition is for standard (single-block) PLS rather than multi-block, however VIP should technically be extensible to MB-PLS by using the superscores.
Describe alternatives you've considered
I have attached python code of the function for MB-PLS VIP which I have implemented myself. It uses attributes from the mbpls class (weights, scores etc) to calculate VIP. But it would be great if this could be implemented in the main package.
Describe alternatives you've considered
I have attached Python code for an MB-PLS VIP function which I implemented myself. It uses attributes from the MBPLS class (weights, scores, etc.) to calculate VIP. But it would be great if this could be implemented in the main package.
# stack the weights from all blocks
weights = np.vstack(x_weights)
# normalise the weights
weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
# calculate product of sum of squares of superscores and y loadings
sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
# p = number of variables - stack the loadings from all blocks
p = np.vstack(x_loadings).shape[0]
# VIP is a weighted sum of squares of PLS weights
vip_scores = np.sqrt(p * np.sum(sumsquares*(weights_norm**2), axis=1) / np.sum(sumsquares))
return vip_scores```
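For what it's worth, a self-contained sanity check of the function (repeated here so the snippet runs on its own), using random arrays whose shapes are my assumption of what the fitted mbpls attributes look like, not taken from the mbpls API: two blocks, 40 samples, 3 components. One useful invariant of this VIP definition is that the squared scores average to 1 over all variables, since each normalised weight column has unit norm.

```python
import numpy as np

def VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings):
    weights = np.vstack(x_weights)
    weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
    sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    p = np.vstack(x_loadings).shape[0]
    return np.sqrt(p * np.sum(sumsquares * weights_norm**2, axis=1) / np.sum(sumsquares))

rng = np.random.default_rng(0)
n, k, p1, p2 = 40, 3, 200, 250  # samples, components, features per block

x_weights = [rng.random((p1, k)), rng.random((p2, k))]    # per-block weights
x_loadings = [rng.random((p1, k)), rng.random((p2, k))]   # per-block loadings
x_superscores = rng.random((n, k))                        # superscores T
y_loadings = rng.random((1, k))                           # y-loadings

vip = VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings)
print(vip.shape)                         # (450,) — one score per variable
print(np.isclose(np.mean(vip**2), 1.0))  # True — mean squared VIP is 1
```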
In line 855 of the MBPLS implementation in mbpls.py you use:
`diff_t = np.sum(superscores_old - superscores)`
As far as I understand, the Euclidean metric should actually be used, i.e.:
`diff_t = np.sum((superscores_old - superscores)**2)`
or `diff_t = np.sum((superscores_old - superscores)**2)**0.5`
Could you clarify your use of this "metric" for the vectors, in case I am mistaken?
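To illustrate the concern (a minimal, hypothetical sketch, not taken from mbpls): with the signed sum, element-wise differences of opposite sign can cancel, so the convergence criterion can read zero even though the superscore vector has clearly changed, while the Euclidean norm cannot:

```python
import numpy as np

t_old = np.array([1.0, 2.0, 3.0])
t_new = np.array([0.0, 3.0, 3.0])  # differs from t_old in two entries

diff_signed = np.sum(t_old - t_new)            # (+1) + (-1) + 0 = 0.0
diff_eucl = np.sum((t_old - t_new)**2)**0.5    # sqrt(2), clearly nonzero

print(diff_signed, diff_eucl)
```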
GitHub allows you to post a website link at the top of the repo page. I would recommend posting the link to the online docs there. At the very least, I would move it from the end of the README to a more prominent location (e.g., the top of the README).
I would have expected to find a file called test_mbpls.py with tests for the package. The file name test_Installation.py initially led me to think that it tests the installation process. This is definitely not a requirement for the JOSS review (I will mark required changes with "[JOSS review]"), but I would rename that file to test_mbpls.py.