Comments (9)

pemami4911 commented on July 20, 2024

Hi, sorry for the delay in getting back to you!

The score of the original protein sequence (i.e., the wild type sequence specified via the wt_fasta or wt_protein arguments of the DirectedEvolution sampler class) is stored in the wt_score attribute of each expert. Each expert uses this wt_score to compute the score of a variant relative to the wild type.

As for getting importance scores for each variant, the DirectedEvolution sampler returns both the list of variants and their corresponding scores as a tuple. As the demo notebook shows, when the output argument is set to "all", the scores tensor has shape [parallel_chains, n_steps], and it's up to you to decide whether to take the last score of each chain (scores[:,-1]), the best one, etc.
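For illustration, here is a minimal sketch of that post-processing step. It uses plain nested lists as a stand-in for the scores tensor, and the numbers are made up; it does not call EvoProtGrad itself.

```python
# Toy stand-in for the `scores` returned with output="all":
# one row per chain, one column per sampling step (values are made up).
scores = [
    [0.1, 0.4, 0.3, 0.9],   # chain 0
    [0.0, -0.2, 0.5, 0.2],  # chain 1
]

# Last score of each chain (the equivalent of scores[:, -1] on a tensor).
last_scores = [chain[-1] for chain in scores]

# Best score of each chain, and the step at which it was reached.
best_scores = [max(chain) for chain in scores]
best_steps = [chain.index(max(chain)) for chain in scores]

print(last_scores)  # [0.9, 0.2]
print(best_scores)  # [0.9, 0.5]
print(best_steps)   # [3, 2]
```

The best step index can be used to look up the matching variant in the same chain, assuming the variants are returned in the same chain-by-step order as the scores.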

from evoprotgrad.

anonimoustt commented on July 20, 2024

It is not clear. Specifically, from the code:

import evo_prot_grad

variants, scores = evo_prot_grad.DirectedEvolution(
    wt_protein = wildtype_sequence,
    output = 'best',           # return best, last, all variants
    experts = [expert],        # list of experts to compose
    parallel_chains = 2,       # number of parallel chains to run
    n_steps = 100,             # number of MCMC steps per chain
    max_mutations = -1,        # maximum number of mutations per variant
    preserved_regions = None,  # List of regions (start,end) to preserve
    verbose = False            # print debug info to command line
)()

# wild type with its residues separated by spaces
wtseq = ' '.join(wildtype_sequence.strip())

for v, s in zip(variants, scores):
    evo_prot_grad.common.utils.print_variant_in_color(v, wtseq)
    print(s)

If I set output = 'all', will I get the original sequence and its score along with the variants?

pemami4911 commented on July 20, 2024

No, scores will only contain a score for each variant, even if output is set to "all". Here, "all" means that the intermediate scores of the variants at every sampling step are returned. In this example, scores would have shape [2,100] since parallel_chains = 2 and n_steps = 100.
If having the wild type sequence's score returned alongside the scores of each variant is useful, I can add that.

anonimoustt commented on July 20, 2024

Hi,
Yes, it would be helpful if the score of the original sequence could be returned. I did not understand why scores would have shape [2,100]; I see the score as a single floating-point number. Does parallel_chains = 2 select the top two variants by score? Would you please clarify?

Also, how is the score computed? Is it based on embeddings: for example, do you compute the embeddings of the original sequence and its variants with the ESM-2 model and then take their cosine similarity?

pemami4911 commented on July 20, 2024

I think it could help to spend a little time reading the documentation about what scores are in EvoProtGrad and how they are estimated (https://nrel.github.io/EvoProtGrad/getting_started/experts/#what-is-a-product-of-experts). The score in EvoProtGrad is an unnormalized log probability. However, in practice we subtract the wild type sequence's log prob from the variant's log prob, so the score is actually a difference between log probs.
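As a rough sketch of what "difference between log probs" means in practice (the log probabilities below are invented, and relative_score is a hypothetical helper for illustration, not part of the EvoProtGrad API):

```python
def relative_score(variant_log_prob: float, wt_log_prob: float) -> float:
    """Score of a variant relative to the wild type: a difference of log probs."""
    return variant_log_prob - wt_log_prob


# Hypothetical unnormalized log probabilities from a single expert.
wt_log_prob = -12.0
variant_log_probs = {"variant_a": -10.5, "variant_b": -13.0}

relative = {
    name: relative_score(log_prob, wt_log_prob)
    for name, log_prob in variant_log_probs.items()
}
wt_relative = relative_score(wt_log_prob, wt_log_prob)

print(relative)     # {'variant_a': 1.5, 'variant_b': -1.0}
print(wt_relative)  # 0.0
```

Under this convention the wild type itself scores 0, a positive score means the expert assigns the variant a higher log probability than the wild type, and a negative score means a lower one. The score is not a similarity measure such as cosine similarity between embeddings.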

The shape of the scores tensor will vary depending on what you set the argument output to. If output = best or output = last, that means for each of the parallel_chains Markov chains, either the best/last (respectively) variants will be returned. Hence, scores has shape [parallel_chains]. When output = all, this means every variant produced by each Markov chain at each step 1..n_steps will be returned, hence scores has shape [parallel_chains, n_steps]. This is useful when entire distributions of "good" variants are desired instead of just point estimates of "good" variants.
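The shape rule above can be written down as a small sketch. Here expected_scores_shape is a hypothetical helper for illustration only; it is not a function provided by EvoProtGrad.

```python
def expected_scores_shape(output: str, parallel_chains: int, n_steps: int) -> tuple:
    """Shape of `scores` for a given `output` setting, per the description above."""
    if output in ("best", "last"):
        # One score per Markov chain.
        return (parallel_chains,)
    if output == "all":
        # One score per chain and per sampling step.
        return (parallel_chains, n_steps)
    raise ValueError(f"unknown output mode: {output!r}")


print(expected_scores_shape("best", 2, 100))  # (2,)
print(expected_scores_shape("last", 2, 100))  # (2,)
print(expected_scores_shape("all", 2, 100))   # (2, 100)
```

So with output = 'best' and parallel_chains = 2, the loop over zip(variants, scores) prints two floats, one per chain; parallel_chains sets the number of independent chains, not a top-k cutoff.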

anonimoustt commented on July 20, 2024

Thanks. EvoProtGrad is really interesting. I am working on kinase domain sequences (https://huggingface.co/datasets/waylandy/phosformer_curated/raw/main/curated/phosphosites_11mer_kinase_specific.tsv). EvoProtGrad might be an interesting tool for generating variants of a kinase sequence for analysis.

anonimoustt commented on July 20, 2024

Hi, one more query: can EvoProtGrad be used to detect a significant connection between two protein sequences? Let us say I have two sequences, protein 1 and protein 2. Using EvoProtGrad, I get the top 3 variants of protein 1 and the top 3 variants of protein 2. If I then compute similarity scores between the variants, is it possible to assess the relational significance of protein 1 and protein 2?

anonimoustt commented on July 20, 2024

Hi,

I see that if parallel_chains = 5, I get 5 variants and their corresponding scores. Does a higher score mean the variant is closer to the original sequence?

pemami4911 commented on July 20, 2024

Accessing a particular expert's score for a variant sequence is now easier in v0.2 (https://github.com/NREL/EvoProtGrad/releases/tag/v0.2). You can now call get_model_output with an expert to get that expert's score; see https://nrel.github.io/EvoProtGrad/api/experts/
