
hyfa's People

Contributors

chaitjo, gamazonlab, rvinas

hyfa's Issues

Questions about the normalized RNA-seq data processing

Hi, I have a couple of questions about the "Normalized bulk transcriptomics" processing described in your paper and the GTEx pipeline, quoted below (a sketch of my current understanding follows the list):

  1. Discard under-represented tissues (n = 5), namely bladder, cervix (ectocervix, endocervix), fallopian tube and kidney (medulla).
  2. Select set of overlapping protein-coding genes across all tissues.
  3. Discard donors with only one collected tissue (n = 4).
  4. Select genes on the basis of expression thresholds of ≥0.1 transcripts per kilobase million in ≥20% of samples and ≥6 reads (unnormalized) in ≥20% of samples.
  5. Normalize read counts across samples using the trimmed mean of M values method.
  6. Apply inverse normal transformation to the expression values for each gene.
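
To make my reading of the pipeline concrete, here is a minimal Python sketch of steps 4 and 6 on a merged genes x samples matrix. The variable names (tpm_df, counts_df, tmm_df) are illustrative, and I am treating step 5 (TMM) as done externally, e.g. with edgeR's calcNormFactors:

import numpy as np
import pandas as pd
from scipy.stats import norm, rankdata

def filter_genes(tpm: pd.DataFrame, counts: pd.DataFrame) -> pd.Index:
    # Step 4: keep genes with TPM >= 0.1 in >= 20% of samples and
    # >= 6 unnormalized reads in >= 20% of samples (genes x samples).
    frac_tpm = (tpm >= 0.1).mean(axis=1)
    frac_reads = (counts >= 6).mean(axis=1)
    return tpm.index[(frac_tpm >= 0.2) & (frac_reads >= 0.2)]

def inverse_normal_transform(values: np.ndarray) -> np.ndarray:
    # Step 6: rank-based inverse normal transform of one gene's
    # expression values across samples.
    ranks = rankdata(values)
    return norm.ppf((ranks - 0.5) / len(values))

# tpm_df, counts_df: hypothetical genes x samples DataFrames.
keep = filter_genes(tpm_df, counts_df)
# tmm_df: hypothetical TMM-normalized expression (step 5, computed
# externally); the transform is then applied gene by gene (step 6).
int_expr = np.apply_along_axis(inverse_normal_transform, 1, tmm_df.loc[keep].values)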

I would greatly appreciate any clarification you could provide on these matters:

  1. In step 1, what is the minimum number of samples a tissue must have to pass the "under-represented tissues" filter?
  2. In step 2, does that mean we should select the protein-coding genes that pass the filtering of steps 3 and 4 in all tissues?
  3. Should we filter genes and normalize expression separately for each tissue and then merge the results, or is it more appropriate to merge all tissue samples first and then perform gene filtering and expression normalization on the merged dataset?
  4. Is it acceptable for the dataset to contain overlapping tissue samples, for example, both Brain and Cerebral cortex expression data from the same individual?

Warm regards,
Mian

AssertionError when running "assert np.allclose(y_test_, y_test)"

Hi, when running the code that compares the different baselines in evaluate_GTEx_v8_normalised.ipynb on my own data, I got an AssertionError at line 142, as shown below:

AssertionError Traceback (most recent call last)
Cell In[32], line 142
140 y_test_pred = out['px_rate'].cpu().numpy() # torch.distributions.normal.Normal(loc=out['px_rate'], scale=out['px_r']).mean.cpu().numpy()
141 y_test_ = d.x_target.cpu().numpy()
--> 142 assert np.allclose(y_test_, y_test)
144 sample_scores = score_fn(y_test, y_test_pred, sample_corr=sample_corr)
146 # Append results

AssertionError:

The target array of aux_test_dataset is not consistent with the target array of the corresponding HypergraphDataset after converting aux_test_dataset into a HypergraphDataset.

After comparing d.target_dynamic['Participant ID'] with aux_test_dataset.adata_target.obs['Participant ID'], and also their expression arrays, I found that the order of 'Participant ID' and of the corresponding expression data has changed. It seems that d.target_dynamic['Participant ID'] is sorted numerically and alphabetically rather than kept in the same order as aux_test_dataset.adata_target.obs['Participant ID']. For example:
print(aux_test_dataset.adata_target.obs['Participant ID'].values)
I got:

['GS12' 'GW133' 'GZ137' 'GW142' 'CT146' 'LBJ18' 'XQN39' 'SLG43' 'QG44' 'XQN75' 'ZGJ176' 'XN9063' 'PZ140']

After the HypergraphDataset conversion and DataLoader:
aux_test_dataset = HypergraphDataset(adata[test_mask], obs_source={'Tissue': source_tissues}, obs_target={'Tissue': [tt]})
aux_test_loader = DataLoader(aux_test_dataset, batch_size=len(aux_test_dataset), collate_fn=collate_fn, shuffle=False)
d = next(iter(aux_test_loader))
print(d.target_dynamic['Participant ID'])
The result changed to:

['CT146' 'GS12' 'GW133' 'GW142' 'GZ137' 'LBJ18' 'PZ140' 'QG44' 'SLG43' 'XN9063' 'XQN39' 'XQN75' 'ZGJ176']

The same is true of the expression matrices (i.e. aux_test_dataset.adata_target.layers['x'].toarray() and d.x_target.cpu().numpy()).

How could I fix this error?
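
Would re-aligning the ground-truth rows to the batch's participant order be a valid workaround? A minimal sketch using the names from the snippets above, assuming y_test follows aux_test_dataset's original participant order and that Participant IDs are unique:

import numpy as np

orig_ids = aux_test_dataset.adata_target.obs['Participant ID'].values
batch_ids = np.asarray(d.target_dynamic['Participant ID'])
# For each participant in the batch, find its row in the original order.
order = np.array([np.where(orig_ids == pid)[0][0] for pid in batch_ids])
# Reorder the ground truth so its rows match d.x_target before comparing.
assert np.allclose(d.x_target.cpu().numpy(), y_test[order])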

Thanks in advance!

Mian

Installation Problem

I'm having trouble installing packages due to version conflicts. Can you suggest the right Python version for compatibility with the required packages?

The difference between testing dataset and validation dataset

Hi, I found that the GTEx bulk RNA-seq donors were divided into three parts (training, validation, and testing donors). I understand the roles of the training and validation subsets, fitting the Hypergraph model and checking its accuracy during training, respectively, but I cannot fully grasp the role of the testing dataset.
Could anyone elaborate on the specific purpose of the testing dataset and how it differs from the validation dataset? Could I simply split the data into training and validation and treat the validation dataset as the testing dataset? A sketch of my current understanding follows.
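
For reference, this is how I currently picture a donor-level three-way split (the fractions are illustrative, not necessarily those used in the paper):

import numpy as np

# all_donor_ids: hypothetical array of unique GTEx donor IDs.
rng = np.random.default_rng(0)
donors = rng.permutation(all_donor_ids)
n = len(donors)
train_donors = donors[:int(0.7 * n)]             # used to fit the model
val_donors = donors[int(0.7 * n):int(0.85 * n)]  # guides early stopping / model selection
test_donors = donors[int(0.85 * n):]             # held out, evaluated only once at the end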

Thanks in advance!
Mian
