Contributors: mespadoto

proj-quant-eval's Issues

Can't reproduce the metrics results

I could not get the whole project working, so I took parts of it to measure my projections. I downloaded the .npy files for the datasets and used the same PCA from scikit-learn, but my results differ from yours for normalized stress and Shepard goodness. I looked through your code for hours to find the mistake, but I cannot find any, and I double-checked everything: the projection, the dataset, the projected result, and the code for the metrics are all the same, yet I get different results. How is this possible? Mathematically, the results should be the same, but they aren't.
Did you use the datasets uploaded to the website? Is the data changed somewhere, or the results?

from scipy import spatial, stats

def compute_distance_list(X):
    # Condensed vector of pairwise Euclidean distances
    return spatial.distance.pdist(X, 'euclidean')

def compute_distance_matrix(X):
    # Full square pairwise distance matrix
    D = spatial.distance.pdist(X, 'euclidean')
    return spatial.distance.squareform(D)

def metric_pq_shepard_diagram_correlation(X_high, X_low):
    # Spearman rank correlation between high- and low-dimensional distances
    D_high = compute_distance_list(X_high)
    D_low = compute_distance_list(X_low)
    return stats.spearmanr(D_high, D_low)[0]

I do exactly the same as above, but for the “bank” dataset I get 0.5329917367864474 instead of your 0.766496244905907, and for “cifar10” I get 0.7688356643971043 instead of your 0.884418301185305.
Did I miss something?

from sklearn.decomposition import PCA
pca = PCA(n_components=2, random_state=42)
result = pca.fit_transform(data)

This is my PCA.

data = np.load(dataRootPath + dataName + '/X.npy', mmap_mode='c')
label = np.load(dataRootPath + dataName + '/y.npy', mmap_mode='c')

This is how I load the data.
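Putting the pieces together, here is a minimal end-to-end sketch of what I run. I use synthetic data here so the snippet is self-contained; in my actual run, X comes from the downloaded X.npy file:

```python
import numpy as np
from scipy import spatial, stats
from sklearn.decomposition import PCA

def metric_pq_shepard_diagram_correlation(X_high, X_low):
    # Spearman rank correlation between high- and low-dimensional
    # pairwise Euclidean distances (Shepard goodness)
    D_high = spatial.distance.pdist(X_high, 'euclidean')
    D_low = spatial.distance.pdist(X_low, 'euclidean')
    return stats.spearmanr(D_high, D_low)[0]

# Synthetic stand-in for the real dataset (my actual run loads X.npy)
rng = np.random.RandomState(42)
X = rng.normal(size=(200, 10))

pca = PCA(n_components=2, random_state=42)
X_2d = pca.fit_transform(X)

r = metric_pq_shepard_diagram_correlation(X, X_2d)
print(r)  # a correlation in [-1, 1]
```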

Using metrics.py

Greetings,

I would like to use a fragment of your code on metrics.py.

I have the X and y datasets saved locally, and to use them in my code I call pandas.read_csv(). I wanted to use the quality-measure functions, and I was wondering if I could directly use the X.csv and y.csv from this link: https://mespadoto.github.io/proj-quant-eval/post/datasets/
I did not use the get_datasets.py to fetch them from the given link.

When I simply tried this:

bank_X = pd.read_csv('datasets/bank/X.csv')
bank_y = pd.read_csv('datasets/bank/y.csv')

test = metric_dc_neighborhood_hit_k_03(bank_X,bank_y)

It gave me this error:

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:179: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
return self._fit(X, y)
Traceback (most recent call last):
File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 203, in
test = metric_dc_neighborhood_hit_k_03(bank_X,bank_y)
File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 137, in metric_dc_neighborhood_hit_k_03
return metric_neighborhood_hit(X, y, 3)
File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 53, in metric_neighborhood_hit
return np.mean(np.mean((y[neighbors] == np.tile(y.reshape((-1, 1)), k)).astype('uint8'), axis=1))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in getitem
indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([ (0, 1221, 1745), (1, 1492, 642), (2, 139, 706),\n (3, 2057, 130), (4, 1219, 1470), (5, 1419, 20),\n (6, 1986, 1758), (7, 42, 1325), (8, 932, 328),\n (9, 759, 523),\n ...\n (2048, 138, 1941), (2049, 1040, 617), (2050, 1262, 1667),\n (2051, 1430, 1600), (2052, 1125, 2013), (2053, 712, 724),\n (2054, 1636, 1624), (2055, 1412, 546), (2056, 749, 169),\n (2057, 1817, 3)],\n dtype='object', length=2058)] are in the [columns]"

I would like to know how I can validate my dataset so I am able to use it against your functions.
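From the traceback, my guess is that the metric functions expect NumPy arrays, while pd.read_csv returns DataFrames: y[neighbors] then performs a column lookup instead of fancy indexing, which produces the KeyError, and the DataConversionWarning asks for y with shape (n_samples,). A sketch of the conversion I would try (the tiny DataFrames here stand in for the real X.csv and y.csv, and the 'label' column name is an assumption):

```python
import numpy as np
import pandas as pd

# Stand-ins for the real X.csv / y.csv from the datasets page
bank_X_df = pd.DataFrame(np.random.RandomState(0).normal(size=(10, 4)))
bank_y_df = pd.DataFrame({'label': [0, 1] * 5})

# Convert to NumPy: fancy indexing like y[neighbors] needs arrays,
# and scikit-learn wants y with shape (n_samples,), hence ravel()
bank_X = bank_X_df.to_numpy()
bank_y = bank_y_df.to_numpy().ravel()

print(bank_X.shape, bank_y.shape)  # (10, 4) (10,)
```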

I would also like to know what the id_run variable is.

Thank you in advance.

test scripts not working

Dear Mateus,

thank you for making your code available. Two days ago I enjoyed a talk by your supervisor Alex Telea. He mentioned that the code used to test different dimensionality reduction algorithms was publicly available and referred to this GitHub repository. I am interested in using your code on my microbiome (bacterial community) data to see how different algorithms perform.
Although your installation instructions are very clear, I was not able to get either test_wrappers.py or test_projections.py to complete successfully.

First, I used the following lines of code to install the software on our HPC.

mamba create --name dim-red -c conda-forge python=3.6 shogun=6.1.3 octave=4.2.1
conda activate dim-red
mamba install -c bioconda java-jdk=8.0
pip install umap-learn wget keras numpy pandas scikit-learn # also tried a mamba-install; same issues
# installation Multicore-TSNE as described
pip install tensorflow # based on a message after running test_projections.py

I combined installation of shogun and Octave in one call to prevent a clash over 'openssl'. This seems to work.

Upon running python test_wrappers.py I receive many (many!) errors generated by Java; please see the terminal output in output_test_wrappers.txt. I am not knowledgeable enough to know where to start troubleshooting. Any advice?

python test_projections.py gives me:

2020-11-18 10:10:26.852618: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /hpc/local/CentOS7/dla_mm/lib
2020-11-18 10:10:26.852723: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "test_projections.py", line 13, in <module>
    import projections
  File "/hpc/dla_mm/bogaert/wsteenhu/dim-red/code/01_data_collection/projections.py", line 14, in <module>
    import ae
  File "/hpc/dla_mm/bogaert/wsteenhu/dim-red/code/01_data_collection/ae.py", line 20, in <module>
    tf.set_random_seed(42)
AttributeError: module 'tensorflow' has no attribute 'set_random_seed'

Again, any advice on how to get this to work?
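In case it helps: tf.set_random_seed was removed from the top-level namespace in TensorFlow 2.x (the 1.x API lives under tf.compat.v1), and ae.py apparently targets TF 1.x, so a plain pip install tensorflow pulls in an incompatible 2.x release. One workaround I would try is a small compatibility shim; the sketch below exercises the dispatch logic with stub objects so it runs without TensorFlow installed:

```python
from types import SimpleNamespace

def set_seed_compat(tf_module, seed):
    # Prefer the TF 1.x top-level API; fall back to compat.v1 on TF 2.x
    if hasattr(tf_module, 'set_random_seed'):
        tf_module.set_random_seed(seed)
    else:
        tf_module.compat.v1.set_random_seed(seed)

calls = []

# Stub mimicking TensorFlow 1.x (top-level set_random_seed exists)
fake_tf1 = SimpleNamespace(set_random_seed=lambda s: calls.append(('v1', s)))
set_seed_compat(fake_tf1, 42)

# Stub mimicking TensorFlow 2.x (only compat.v1.set_random_seed exists)
fake_tf2 = SimpleNamespace(compat=SimpleNamespace(
    v1=SimpleNamespace(set_random_seed=lambda s: calls.append(('compat', s)))))
set_seed_compat(fake_tf2, 42)

print(calls)  # [('v1', 42), ('compat', 42)]
```

Alternatively, pinning an older release (for example pip install "tensorflow<2") should restore the original API, assuming the rest of the code also targets TF 1.x.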
