Comments (9)
Hi, sorry for the delay in getting back to you!
The score of the original protein sequence (i.e., the wild type sequence specified via the wt_fasta or wt_protein arguments of the DirectedEvolution sampler class) is stored in the wt_score attribute within each expert. Each expert uses this wt_score to compute the relative score of a variant with respect to the wild type.
As for getting importance scores for each variant, the DirectedEvolution sampler returns both the list of variants and their corresponding scores as a tuple. As shown in the demo notebook, when the output argument is set to "all", the scores tensor will have shape [parallel_chains, steps], and it's up to you to decide whether to grab the last score for each chain (scores[:,-1]) or the best, etc.
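As a minimal sketch of that choice, the snippet below uses a small hand-written NumPy array as a stand-in for the scores returned with output set to "all". The values are made up for illustration and are not real sampler output; only the [parallel_chains, steps] layout is taken from the description above.

```python
import numpy as np

# Placeholder scores with shape [parallel_chains, steps] = [2, 5].
# These numbers are invented; real values would come from the sampler.
scores = np.array([
    [0.1, 0.4, 0.3, 0.9, 0.7],
    [0.2, 0.1, 0.6, 0.5, 0.8],
])

last_scores = scores[:, -1]         # score at the final step of each chain
best_scores = scores.max(axis=1)    # highest score reached by each chain
best_steps = scores.argmax(axis=1)  # 0-based step at which that score occurred

print(last_scores)
print(best_scores)
print(best_steps)
```

The best_steps indices can then be used to look up the variant produced at that step of each chain.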
from evoprotgrad.
This is still not clear to me. Specifically, given the code
import evo_prot_grad

variants, scores = evo_prot_grad.DirectedEvolution(
    wt_protein = wildtype_sequence,
    output = 'best',          # return best, last, all variants
    experts = [expert],       # list of experts to compose
    parallel_chains = 2,      # number of parallel chains to run
    n_steps = 100,            # number of MCMC steps per chain
    max_mutations = -1,       # maximum number of mutations per variant
    preserved_regions = None, # List of regions (start,end) to preserve
    verbose = False           # print debug info to command line
)()
wtseq = ' '.join(wildtype_sequence.strip())
for v, s in zip(variants, scores):
    evo_prot_grad.common.utils.print_variant_in_color(v, wtseq)
    print(s)
if I set output = 'all', will I get the original sequence and its score along with the variants?
from evoprotgrad.
No, scores will only contain a score for each variant, even if output is set to all. Here, all refers to returning the intermediate scores of the variants at each sampling step. In this example, scores would have shape [2,100] since parallel_chains = 2 and n_steps = 100.
If having the wildtype sequence's score returned alongside the scores of each variant is useful, I can add that.
from evoprotgrad.
Hi,
Yes, it would be helpful if the score of the original sequence could be returned. I did not understand why scores would have shape [2,100]; I see the score as a single float. Does parallel_chains = 2 mean that the top two variants by score are returned? Would you please clarify?
Also, how is the score computed? For example, using an ESM-2 model, do you compute embeddings of the original sequence and its variants and then take their cosine similarity?
from evoprotgrad.
I think it could help to spend a little time reading the documentation about what scores are in EvoProtGrad and how they are estimated: https://nrel.github.io/EvoProtGrad/getting_started/experts/#what-is-a-product-of-experts. The score in EvoProtGrad is an unnormalized log probability. In practice, however, we subtract the wild type sequence's log prob from the variant's log prob, so the score is actually a difference between log probs.
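Written out, the description above amounts to the following. The notation is introduced here for illustration and is not taken from the documentation: \tilde{p} is the unnormalized probability, Z its unknown normalizing constant, x a variant, and x_wt the wild type.

```latex
\[
  s(x) \;=\; \log \tilde{p}(x) \;-\; \log \tilde{p}(x_{\mathrm{wt}})
\]
% With p(x) = \tilde{p}(x) / Z, the normalizing constant cancels in the difference:
\[
  \log p(x) - \log p(x_{\mathrm{wt}})
  \;=\; \bigl(\log \tilde{p}(x) - \log Z\bigr) - \bigl(\log \tilde{p}(x_{\mathrm{wt}}) - \log Z\bigr)
  \;=\; s(x)
\]
```

Under this reading, the wild type itself has score 0, and a positive score means the variant has a higher log probability than the wild type.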
The shape of the scores tensor will vary depending on what you set the argument output to. If output = best or output = last, that means for each of the parallel_chains Markov chains, either the best/last (respectively) variants will be returned. Hence, scores has shape [parallel_chains]. When output = all, this means every variant produced by each Markov chain at each step 1..n_steps will be returned, hence scores has shape [parallel_chains, n_steps]. This is useful when entire distributions of "good" variants are desired instead of just point estimates of "good" variants.
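One way to work with the full [parallel_chains, n_steps] output is to rank every (chain, step) pair and keep the top few. The sketch below uses randomly generated placeholder scores rather than real sampler output, and the cutoff k is a hypothetical choice.

```python
import numpy as np

# Placeholder for scores returned with output = all; random values, not real output.
rng = np.random.default_rng(0)
parallel_chains, n_steps = 2, 100
scores = rng.normal(size=(parallel_chains, n_steps))

k = 3  # hypothetical number of top-scoring samples to keep

# Rank all chain/step pairs by score, highest first, and keep the top k.
flat_order = np.argsort(scores, axis=None)[::-1][:k]
chain_idx, step_idx = np.unravel_index(flat_order, scores.shape)

for c, s in zip(chain_idx, step_idx):
    # Step indices here are 0-based, so step s corresponds to sampling step s + 1.
    print(f"chain {c}, step {s}: score {scores[c, s]:.3f}")
```

If variants is organized with the same chain and step layout (an assumption here), the same index pairs would select the matching sequences.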
from evoprotgrad.
Thanks. EvoProtGrad is really interesting. I am working on kinase domain sequences (https://huggingface.co/datasets/waylandy/phosformer_curated/raw/main/curated/phosphosites_11mer_kinase_specific.tsv). EvoProtGrad might be an interesting tool for generating variants of a kinase sequence for analysis.
from evoprotgrad.
Hi, one more question: can EvoProtGrad be used to detect a significant relationship between two protein sequences? Say I have two sequences, protein 1 and protein 2, and I use EvoProtGrad to get the top 3 variants of each. If I then compute similarity scores between the variants, is it possible to assess the relational significance of protein 1 and protein 2?
from evoprotgrad.
Hi,
I see that with parallel_chains = 5 I get 5 variants and their corresponding scores. Does a higher score mean the variant is closer to the original sequence?
from evoprotgrad.
Accessing a particular expert's score for a variant sequence is now easier in v0.2: https://github.com/NREL/EvoProtGrad/releases/tag/v0.2. You can now call get_model_output with an expert to get this particular expert's score: https://nrel.github.io/EvoProtGrad/api/experts/.
from evoprotgrad.