Is it possible to get the importance score of the protein sequence? (evoprotgrad, closed)

anonimoustt commented on July 20, 2024

Is it possible to get the importance score of the protein sequence?

Comments (9)

pemami4911 commented on July 20, 2024

Hi, sorry for the delay in getting back to you!

The score of the original protein sequence (i.e., the wild type sequence specified via the wt_fasta or wt_protein arguments of the DirectedEvolution sampler class) is stored in the wt_score attribute of each expert. Each expert uses this wt_score to compute the relative score of a variant with respect to the wild type.

As for getting importance scores for each variant, the DirectedEvolution sampler returns both the list of variants and their corresponding scores as a tuple. As the demo notebook shows, when the output argument is set to "all", the scores tensor has shape [parallel_chains, steps], and it's up to you to decide whether to grab the last score for each chain (scores[:,-1]) or the best, etc.
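
A minimal sketch of the two reductions mentioned above, using a nested Python list as a stand-in for the scores returned when output is set to "all". The numbers are invented for illustration, and the helper names last_scores and best_scores are hypothetical, not part of EvoProtGrad. With the real scores tensor, scores[:,-1] performs the first reduction directly.

```python
# Stand-in for the scores returned when output = "all":
# one row per chain, one column per step (shape [parallel_chains, steps]).
# These numbers are invented for illustration.
scores = [
    [0.1, 0.4, 0.3],  # chain 0
    [0.2, 0.9, 0.5],  # chain 1
]


def last_scores(scores):
    """Score of the final variant in each chain (like scores[:, -1])."""
    return [chain[-1] for chain in scores]


def best_scores(scores):
    """(step index, score) of the highest-scoring variant in each chain."""
    result = []
    for chain in scores:
        best_step = max(range(len(chain)), key=lambda i: chain[i])
        result.append((best_step, chain[best_step]))
    return result


print(last_scores(scores))
print(best_scores(scores))
```

Keeping the step index alongside the best score makes it possible to look up the matching variant in the returned variants list.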

anonimoustt commented on July 20, 2024

It is not clear. Specifically, from the code
variants, scores = evo_prot_grad.DirectedEvolution(
    wt_protein = wildtype_sequence,
    output = 'best',          # return best, last, all variants
    experts = [expert],       # list of experts to compose
    parallel_chains = 2,      # number of parallel chains to run
    n_steps = 100,            # number of MCMC steps per chain
    max_mutations = -1,       # maximum number of mutations per variant
    preserved_regions = None, # List of regions (start,end) to preserve
    verbose = False           # print debug info to command line
)()

wtseq = ' '.join(wildtype_sequence.strip())

for v, s in zip(variants, scores):
    evo_prot_grad.common.utils.print_variant_in_color(v, wtseq)
    print(s)

If I set output = 'all', will I get the original sequence and its score along with the variants?

pemami4911 commented on July 20, 2024

No, scores will only contain a score for each variant, even if output is set to all. Here, all refers to returning the intermediate scores of the variants at each sampling step. In this example, scores would have shape [2,100] since parallel_chains = 2 and n_steps = 100.
If having the wildtype sequence's score returned alongside the scores of each variant is useful, I can add that.

anonimoustt commented on July 20, 2024

Hi,
Yes, it would be helpful if the score of the original sequence could be returned. I did not understand why scores would have shape [2,100]; the scores I see are floats. Does parallel_chains = 2 mean that the top two variants by score are returned? Would you please clarify?

Also, how is the score computed? For example, do you use a model such as ESM-2 to compute embeddings of the original sequence and its variants, and then compute their cosine similarity?

pemami4911 commented on July 20, 2024

I think it could help to spend a little time reading the documentation about what scores are in EvoProtGrad and how they are estimated (https://nrel.github.io/EvoProtGrad/getting_started/experts/#what-is-a-product-of-experts). The score in EvoProtGrad is an unnormalized log probability. In practice, however, we subtract the wild type sequence's log prob from the variant's log prob, so the score is actually a difference between log probs.
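
As a small worked illustration of that subtraction, here is a sketch with invented log prob values. The function name relative_score is hypothetical and is not the library's API; it only mirrors the arithmetic described above.

```python
def relative_score(variant_log_prob, wt_log_prob):
    """Variant score relative to the wild type: a difference of log probs."""
    return variant_log_prob - wt_log_prob


# Invented unnormalized log probs, for illustration only.
wt_log_prob = -12.0
variant_log_prob = -10.5

# Positive result: the variant is scored above the wild type.
print(relative_score(variant_log_prob, wt_log_prob))

# The wild type compared with itself.
print(relative_score(wt_log_prob, wt_log_prob))
```

Under this convention the wild type has a relative score of zero, so a positive score means the variant is rated above the wild type and a negative score means it is rated below. The score is not a measure of similarity to the wild type.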

The shape of the scores tensor will vary depending on what you set the argument output to. If output = best or output = last, that means for each of the parallel_chains Markov chains, either the best/last (respectively) variants will be returned. Hence, scores has shape [parallel_chains]. When output = all, this means every variant produced by each Markov chain at each step 1..n_steps will be returned, hence scores has shape [parallel_chains, n_steps]. This is useful when entire distributions of "good" variants are desired instead of just point estimates of "good" variants.
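
The shape rule above can be summarized in a short sketch. The helper expected_scores_shape is hypothetical and written only for illustration; it does not call EvoProtGrad and simply encodes the shapes described in this comment.

```python
def expected_scores_shape(output, parallel_chains, n_steps):
    """Shape of the scores for each output mode, as described above."""
    if output in ("best", "last"):
        # One score per Markov chain.
        return (parallel_chains,)
    if output == "all":
        # One score per chain per sampling step.
        return (parallel_chains, n_steps)
    raise ValueError(f"unknown output mode: {output!r}")


for mode in ("best", "last", "all"):
    print(mode, expected_scores_shape(mode, parallel_chains=2, n_steps=100))
```

With parallel_chains = 2 and n_steps = 100, "best" and "last" give two scores, one per chain, while "all" gives a 2 x 100 grid.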

anonimoustt commented on July 20, 2024

Thanks. EvoProtGrad is really interesting. I am working on kinase domain sequences (https://huggingface.co/datasets/waylandy/phosformer_curated/raw/main/curated/phosphosites_11mer_kinase_specific.tsv). EvoProtGrad might be an interesting tool for generating variants of a kinase sequence for analysis.

anonimoustt commented on July 20, 2024

Hi, one more query: can EvoProtGrad be used to detect a significant connection between two protein sequences? Say I have two sequences, protein 1 and protein 2, and I use EvoProtGrad to get the top 3 variants of each. If I then compute similarity scores between the variants, is it possible to get the relational significance of protein 1 and protein 2?

anonimoustt commented on July 20, 2024

Hi,

If parallel_chains = 5, I see 5 variants and their corresponding scores. Does a higher score mean the variant is closer to the original sequence?

pemami4911 commented on July 20, 2024

Accessing a particular expert's score for a variant sequence is now easier in v0.2 (https://github.com/NREL/EvoProtGrad/releases/tag/v0.2). You can now call get_model_output with an expert to get that expert's score; see https://nrel.github.io/EvoProtGrad/api/experts/ for details.
