genes to include for query adata about scarches HOT 6 CLOSED

theislab commented on May 24, 2024

genes to include for query adata

from scarches.

Comments (6)

M0hammadL commented on May 24, 2024

Hi @notimenocall. The current ecosystem requires the gene names to be the same. Thus, the query genes should be subsetted to the ones in the reference, and any reference genes that are not available in the query should be set to zero. For example, if your reference has x_ref = [g1, g2, g3] and your query has x_query = [g1, g3], then x_query should be changed to x_query = [g1, 0, g3]. More importantly, x_query cannot include any gene name that is not in the reference data.
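As a rough illustration of the alignment described above, here is a minimal sketch using plain NumPy arrays rather than the scarches API. The helper name `align_query_to_reference` and the toy gene names and values are hypothetical, made up for this example; a real workflow would operate on AnnData objects.

```python
import numpy as np

def align_query_to_reference(x_query, query_genes, ref_genes):
    """Reorder query columns to the reference gene order.

    Genes missing from the query are filled with zeros; query genes
    that are absent from the reference are dropped.
    (Hypothetical helper, not part of scarches.)
    """
    x_query = np.asarray(x_query, dtype=float)
    column_of = {gene: j for j, gene in enumerate(query_genes)}
    aligned = np.zeros((x_query.shape[0], len(ref_genes)), dtype=float)
    for j, gene in enumerate(ref_genes):
        if gene in column_of:
            aligned[:, j] = x_query[:, column_of[gene]]
    return aligned

# Toy data: two cells, query measured g1, g3 and an extra gene g4.
ref_genes = ["g1", "g2", "g3"]
query_genes = ["g1", "g3", "g4"]
x_query = np.array([[5.0, 2.0, 7.0],
                    [1.0, 0.0, 3.0]])

x_aligned = align_query_to_reference(x_query, query_genes, ref_genes)
print(x_aligned)
```

The result has exactly the reference columns in the reference order: g2 is zero-filled because the query never measured it, and g4 is discarded because the reference does not contain it.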

notimenocall commented on May 24, 2024

Thanks for your reply and clarification. This is what I have been doing now.
I am just wondering whether this will create a bias, since the HVGs are not selected based on the query datasets.

M0hammadL commented on May 24, 2024

Yes, you are right. This might induce a bias. However, the assumption is that the reference data is rich enough to contain most of the signal needed for that organ. Incorporating new genes that were not in the reference might not make much sense if the reference never had that gene, and it will be hard if the reference had the gene but it did not make it into the final training data after feature selection. I will think about it and see if we can find an intermediate solution in later updates.

notimenocall commented on May 24, 2024

That makes sense. Alternatively, would it help or hurt to include more HVGs (> 5k?) when training the reference? Thank you.

M0hammadL commented on May 24, 2024

@notimenocall It depends on the organ or the complexity of the system you are working with. I would start with 2k and then go to 5k. For multi-tissue or multi-species analysis, more genes are needed.
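To make the "start small, then grow" idea concrete, here is a small sketch that ranks genes by plain variance on the reference and takes the top n. This is a simplified stand-in, not the HVG method used by scarches or scanpy; the function name `top_variable_genes`, the simulated matrix, and the sizes 5 and 10 (standing in for 2k and 5k) are all assumptions for illustration.

```python
import numpy as np

def top_variable_genes(x, gene_names, n_top):
    """Return the names of the n_top genes with the largest variance.

    Plain variance ranking, used here only as a simplified stand-in
    for a proper HVG selection method.
    """
    variances = np.asarray(x, dtype=float).var(axis=0)
    order = np.argsort(variances)[::-1][:n_top]
    return [gene_names[i] for i in order]

# Simulated reference: 200 cells, 50 genes with increasing spread.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
scales = np.linspace(0.1, 5.0, n_genes)
x_ref = rng.normal(size=(n_cells, n_genes)) * scales
gene_names = [f"g{i}" for i in range(n_genes)]

# Small gene set first, then a larger one (stand-ins for 2k and 5k).
small_set = top_variable_genes(x_ref, gene_names, n_top=5)
large_set = top_variable_genes(x_ref, gene_names, n_top=10)

print(len(small_set), len(large_set))
print(set(small_set) <= set(large_set))
```

Because both sets come from the same ranking, the smaller set is contained in the larger one, so moving from the small to the large setting only adds genes for the query to match against.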

notimenocall commented on May 24, 2024

Good point. Thank you.
