
Problems with scHPL training (scarches issue, 13 comments, closed)

YawnC commented on September 21, 2024
Problems with scHPL training


Comments (13)

lcmmichielsen commented on September 21, 2024

I think the error comes from a slight difference in the input to the train_tree function: train_tree takes the X matrix as input rather than the AnnData object (so source_adata.X instead of source_adata).

Hope this helps! If this doesn't solve the problem, could you provide the complete traceback and your input, so it's easier to debug?
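The suggested fix is to pass the expression matrix rather than the AnnData object. A minimal sketch of that distinction, assuming only that the object exposes its matrix as `.X` (as AnnData does); the helper `as_input_matrix` and the toy object are hypothetical and not part of scHPL or scArches.

```python
from types import SimpleNamespace


def as_input_matrix(obj):
    """Return what to pass to train_tree: the matrix, not the AnnData object.

    Hypothetical helper. If obj has an .X attribute (as an AnnData object does),
    return obj.X; otherwise assume obj already is the matrix.
    """
    return getattr(obj, "X", obj)


# Toy stand-in for an AnnData object holding 3 cells x 2 features.
source_adata = SimpleNamespace(X=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

X = as_input_matrix(source_adata)
print(len(X))  # 3, one row per cell
# With scHPL this would then be train_tree(X, labels, tree, ...) rather than
# train_tree(source_adata, labels, tree, ...).
```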


YawnC commented on September 21, 2024

Hi, thanks for replying.
I found that the previous problem came from my annotation. I made each annotation level at a different clustering resolution, so some cells in a lower level of the hierarchy crossed over and belonged to other clusters in the higher level. This made the tree 'dirty'. After re-annotating, the problem was solved.
However, there is a new problem: when I continued to the next step, updating the hierarchy with a new dataset, the following error appeared:
[screenshot of the error]
I tried different datasets and the same problem appeared.
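The first error was traced to annotation levels made at different clustering resolutions, so a finer label could end up under more than one coarser label. A minimal stdlib sketch of a consistency check, assuming the per-cell labels of two adjacent levels are available as plain lists (for example two `obs` columns); `find_multi_parent_labels` is a hypothetical helper, not part of scHPL. For more than two levels, it would be applied to each adjacent pair.

```python
from collections import defaultdict


def find_multi_parent_labels(coarse, fine):
    """Return fine labels whose cells are spread over more than one coarse label.

    coarse, fine: per-cell annotations at two hierarchy levels (same length).
    In a clean hierarchy every fine label has exactly one parent.
    """
    if len(coarse) != len(fine):
        raise ValueError("coarse and fine must have one entry per cell")
    parents = defaultdict(set)
    for parent, child in zip(coarse, fine):
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Toy example: one cell annotated 'CD4 T' at the fine level was clustered
# into 'Myeloid' at the coarse level, so 'CD4 T' has two parents.
coarse = ["T cell", "T cell", "Myeloid", "Myeloid"]
fine = ["CD4 T", "CD8 T", "CD4 T", "Monocyte"]
print(find_multi_parent_labels(coarse, fine))  # {'CD4 T': ['Myeloid', 'T cell']}
```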


YawnC commented on September 21, 2024

And here is the full report. Thank you again!
[screenshots of the full traceback]


lcmmichielsen commented on September 21, 2024

Hmm, interesting. The error occurs when the tree trained on data_2 (the query data) is used to predict the labels of data_1 (the reference data). During pca.transform(test_data), it seems that there are no cells in test_data, which causes the error. Is the query_latent the combined latent space of the reference and query? And if so, are the reference cells labelled exactly 'reference'? You can have a look at this notebook to see an example of how to concatenate the reference and query data.
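An empty test_data is what you get when the reference cells are selected by a batch label that matches nothing, for example because the latent space holds only query cells or because the label is spelled differently ('Reference' instead of 'reference'). A minimal sketch of a pre-flight check, assuming the combined object carries one batch label per cell and that reference cells are labelled exactly 'reference'; `check_batches` is hypothetical and not part of scHPL.

```python
from collections import Counter


def check_batches(batch_labels, reference="reference"):
    """Fail early if the reference or the query batch would select zero cells."""
    counts = Counter(batch_labels)
    if counts[reference] == 0:
        raise ValueError(
            f"no cells labelled {reference!r}; found batches {sorted(counts)}"
        )
    n_query = sum(n for label, n in counts.items() if label != reference)
    if n_query == 0:
        raise ValueError("only reference cells found: is this the combined latent?")
    return counts[reference], n_query


# A latent space holding only query cells: the reference subset would be empty.
try:
    check_batches(["query"] * 5)
except ValueError as err:
    print(err)

# Combined latent space with correctly named batches.
print(check_batches(["reference"] * 3 + ["query"] * 2))  # (3, 2)
```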


YawnC commented on September 21, 2024

Yes, I had walked through this notebook previously, and it worked well with a "one-level annotation" (as shown below) on exactly the same datasets.
[screenshot: one-level annotation]

However, when I switched to a "multi-level annotation tree" (shown below, following this notebook: https://github.com/lcmmichielsen/treeArches-reproducibility/blob/main/Figure2-HLCA%20healthy/Figure2%2C%20S9-S13.ipynb), this problem appeared.
[screenshot: multi-level annotation tree]


lcmmichielsen commented on September 21, 2024

What exactly do you mean by this problem? Is it the problem you mentioned before about zero samples in the test data? Or is that solved, and is your problem related to the figure you attached now?


YawnC commented on September 21, 2024

The "0 sample (0,30)" problem occurs with the "multi-level annotation tree", constructed as shown in this picture:
https://user-images.githubusercontent.com/118878017/270230240-2586f2dd-dd4e-4f54-bf13-fd5cf05231d8.png

What I mean is that when I drop the multi-level annotation structure above and use only the lowest level instead, the model trains perfectly, so I guess the problem may come from the hierarchy structure?


lcmmichielsen commented on September 21, 2024

Good to know. Did you check these two things I mentioned earlier:

  • Is the query_latent the combined latent space of the reference and query?
  • And if so, are the reference cells labelled exactly 'reference'?


YawnC commented on September 21, 2024

Oh, actually no: the query_latent contains only the query dataset, because I followed the GitHub reproducibility notebook. The reference label is 'reference', though.
I will try full_latent first and report back with the result. Thank you for your patience and generous help!


lcmmichielsen commented on September 21, 2024

Okay, let me know whether this helps!

Btw, in code block 14 of the notebook you mentioned (https://github.com/lcmmichielsen/treeArches-reproducibility/blob/main/Figure2-HLCA%20healthy/Figure2%2C%20S9-S13.ipynb), we also merge the reference (LCA) and query (emb_M) into one object before updating the hierarchy, so that is another example of how you could implement it for your dataset.
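Merging means one set of rows covering both datasets, with a per-cell batch label that distinguishes reference from query. A minimal stdlib sketch of that shape, assuming both datasets are already embedded in the same latent space; `merge_reference_query` is hypothetical. With real AnnData objects, one option (an assumption here, not necessarily what code block 14 does) is `anndata.concat([ref, query], label="batch", keys=["reference", "query"])`, which records the source of each cell in an `obs` column named `batch`.

```python
def merge_reference_query(ref_X, ref_labels, query_X, query_labels,
                          ref_key="reference", query_key="query"):
    """Stack reference and query cells into one dataset with a batch column.

    Hypothetical stand-in for merging two AnnData objects. Rows of ref_X and
    query_X are cells embedded in the same latent space.
    """
    if len(ref_X) != len(ref_labels) or len(query_X) != len(query_labels):
        raise ValueError("each dataset needs exactly one label per cell")
    if ref_X and query_X and len(ref_X[0]) != len(query_X[0]):
        raise ValueError("reference and query must share the latent dimensions")
    X = [list(row) for row in ref_X] + [list(row) for row in query_X]
    labels = list(ref_labels) + list(query_labels)
    batch = [ref_key] * len(ref_X) + [query_key] * len(query_X)
    return X, labels, batch


# Toy 2-D latent space: 2 reference cells and 1 query cell.
X, labels, batch = merge_reference_query(
    [[0.0, 1.0], [1.0, 0.0]], ["CD4 T", "Monocyte"],
    [[0.5, 0.5]], ["T cell"],
)
print(batch)  # ['reference', 'reference', 'query']
```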


YawnC commented on September 21, 2024

Hi, here is an update on my issue: I moved the Jupyter notebook into VS Code and picked out the package's learn.py as a subprocess. Here is the debugging result:
[screenshot: debugger variables]
The variables data_1, data_2 and trees seem fine, but the problem is still there:
[screenshot of the error]

I know it may be hard to figure out what is going on inside, since the data vary, so if it is too much trouble, just ignore my issue and close it. Thank you again!


lcmmichielsen commented on September 21, 2024

Hmm, this is weird. Your code now also crashes at a different spot, right? It used to crash at labels_1_pred = predict_labels(data_1, tree_2, threshold=rej_threshold), but now it's a step earlier, during tree = train_tree(data_1, labels_1, tree, classifier, dimred, useRE, FN, n_neighbors, dynamic_neighbors, distkNN), right?

Do the labels you input to the learn_tree function still correspond to the labels that were already in the hierarchy?

It's quite difficult to debug, so without a proper error traceback for this new problem and your input code, I'm afraid I cannot help you.
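One way to answer the question of whether the labels passed to learn_tree still match the hierarchy is to compare the set of reference labels against the set of node names. A minimal sketch, assuming the hierarchy has been written out as a nested dict of node names; this format and both helpers are hypothetical, and scHPL's own tree object is not used here (extracting node names from it is left out rather than guessing its API).

```python
def node_names(tree):
    """Collect every node name in a nested-dict hierarchy (hypothetical format)."""
    names = set()
    for name, children in tree.items():
        names.add(name)
        names |= node_names(children)
    return names


def labels_missing_from_tree(labels, tree):
    """Return reference labels that do not appear anywhere in the hierarchy."""
    known = node_names(tree)
    return sorted(set(labels) - known)


tree = {"root": {"T cell": {"CD4 T": {}, "CD8 T": {}}, "Myeloid": {"Monocyte": {}}}}
reference_labels = ["CD4 T", "CD8 T", "Monocyte", "CD4 T cell"]
print(labels_missing_from_tree(reference_labels, tree))  # ['CD4 T cell']
```

An empty result means every reference label already has a node; anything listed (here a renamed 'CD4 T cell') would not correspond to the existing hierarchy.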


YawnC commented on September 21, 2024

Thank you again! I will use the demo dataset instead.
