Coder Social home page Coder Social logo

Comments (11)

spond avatar spond commented on August 22, 2024

Dear @glitterseaworms,

Trust neither. There is a serious issue with the alignment itself: probably lack of homology. You can see this because many branch lengths in the file are infinite (the total tree length is > 10,000 subs/site!).

image

With such alignments, where sequences look so diverged as to be seemingly unrelated, all inference becomes very unstable and unreliable.

Would you share your input alignment here so that I could take a closer look?

Best,
Sergei

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Dang, my data is really giving me a hard time. This is such a bummer. I think my worms in general are just very divergent. What is a tree length subs/site value cut off that you would suggest is reasonable to trust results at?
I am having trouble attaching the alignments here, as it says file type not supported--can I please email them to you?
Thank you for your dedication and time.

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Hi Sergei, I think I may have solved the issue. It looks like if I reduce my tree and alignments to the family taxon level as opposed to representatives from the whole order that I had included, this alleviates the subs/site issue. For ATP6 gene for instance, I am now receiving a subs/site value of 9.956 as opposed to 7207.802 with the larger dataset. I will keep you posted as tests finish, but I think I will proceed by re-running everything with the reduced trees.

from hyphy.

spond avatar spond commented on August 22, 2024

Dear @glitterseaworms,

Thanks for the update! Could you attach the reduced ATP6 (as a .txt) file? I'll take a quick look to make sure nothing looks suspicious.

I took a look at the full (56 sequences) ATP6 file, and if you use raxml-ng to infer the ML tree, its total length is ~12 (~21 with codon models), which is not unreasonable. What have you been using to build the trees?

Here's what BUSTED-E on it looks like. I typically use it as a screen to look for oddities...

 hyphy busted --alignment ATP6_codon_aligned.fas --tree ATP6_codon_aligned.fas.raxml.bestTree --code Invertebrate-mtDNA --intermediate-fits ATP6.fit --error-sink Yes --output ATP6.BUSTED-E.json

....

* Log(L) = -22981.63, AIC-c = 46236.08 (135 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.001     |   67.171    |                                   |
|        Negative selection         |     0.018     |   32.479    |                                   |
|         Neutral evolution         |     1.000     |    0.257    |                                   |
|         Error absorption          |    100.000    |    0.094    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.079               |    53.394     |                                   |
|               0.725               |    24.993     |                                   |
|               3.593               |    21.613     |                                   |


### No evidence for episodic diversifying positive selection under the unconstrained model, skipping constrained model fitting
----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.5000**.

Use https://observablehq.com/@spond/busted to visualize the resulting JSON

image image image

Looks like you have a couple of species that are highly diverged, possible sequencing/assembly errors because there are things like 3x substitutions along a single branch (like the red branch in the tree), and runs of sequences where there are may be some homology issues (or they are just too diverged).

Best,
Sergei

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Hi Sergei,

Thanks so much for responding very thoroughly. I truly appreciate you and will be acknowledging you in my manuscript :-) This means a lot and has helped me better understand results and how to properly work with my dataset.
I use RAxML to build to input trees. Dang, not sure what is going on with the red branch, but that species sequence should hypothetically be correct. I do think my clade of interest is very divergent in general. I have attached the reduced ATP6 alignment. Please let me know if anything seems strange with it. All genes seem to run properly with the reduced dataset, so I am going to re-run all 13 genes with aBSREL, Busted and Meme using this reduced dataset instead.
Looking forward to hearing your thoughts on my alignment, and truly, thanks again.

Best,
Avery
ATP6_reduced_codon_aligned.txt

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Hello again,

I just wanted to follow up regarding the issue of different results when running replicates. I just finished running 4 replicates of aBSREL with my reduced data set. All subs/site values are values that can be trusted. However, for 6 of the 13 genes I am testing, I obtain variation in results across the replicates. I've attached the output files for these genes. If you have the ability, can you please see if this is normal? If so, my plan was to only report results that are the same across the 4 replicates, and dismiss other branches that show up.

Thanks again,
Avery
ATP6_aBSREL_Deep_rep0_output.txt
ATP6_aBSREL_Deep_rep1_output.txt
ATP6_aBSREL_Deep_rep2_output.txt
ATP6_aBSREL_Deep_rep3_output.txt
COX2_aBSREL_Deep_rep0_output.txt
COX2_aBSREL_Deep_rep1_output.txt
COX2_aBSREL_Deep_rep2_output.txt
COX2_aBSREL_Deep_rep3_output.txt
ND3_aBSREL_Deep_rep0_output.txt
ND3_aBSREL_Deep_rep1_output.txt
ND3_aBSREL_Deep_rep2_output.txt
ND3_aBSREL_Deep_rep3_output.txt
ND4_aBSREL_Deep_rep0_output.txt
ND4_aBSREL_Deep_rep1_output.txt
ND4_aBSREL_Deep_rep2_output.txt
ND4_aBSREL_Deep_rep3_output.txt
ND5_aBSREL_Deep_rep0_output.txt
ND5_aBSREL_Deep_rep1_output.txt
ND5_aBSREL_Deep_rep2_output.txt
ND5_aBSREL_Deep_rep3_output.txt
ND6_aBSREL_Deep_rep0_output.txt
ND6_aBSREL_Deep_rep1_output.txt
ND6_aBSREL_Deep_rep2_output.txt
ND6_aBSREL_Deep_rep3_output.txt

from hyphy.

spond avatar spond commented on August 22, 2024

Dear @glitterseaworms,

Could you attach the alignment & tree for one of the datasets where you are seeing discordant results (e.g. ATP6)?

Best,
Sergei

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Attached below:
4NT_14AA_Tree_48terminals_allloci_500mDeep.txt
ATP6_reduced_codon_aligned.txt

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Hi Sergei,

Not sure if that's affecting it, but my tree was a mess. I am going to try using the attached tree instead with no branch lengths and unrooted. I'll run 4 replicates and see if I have the same issue. I'll keep you posted.

Avery
4NT_14AA_Tree_48terminals_allloci_unrooted_noBL_deep.txt

from hyphy.

spond avatar spond commented on August 22, 2024

Dear @glitterseaworms,

Ah, it looks like the default optimization precision is insufficient for this dataset (probably still too divergent). If you see negative LRTs (especially <-1) in the testing table, this is an indication. Let me look into why this happening. A few days.

Best,
Sergei

from hyphy.

glitterseaworms avatar glitterseaworms commented on August 22, 2024

Thanks Sergei! In case this is useful for the investigation in some way, I tried running ATP6 with an unrooted input tree containing branch lengths and one without branch lengths previously specified. Both outputs are attached. As you can see, aBSREL analysis with branch lengths on input tree results in subs/site 9.955, but without branch lengths on input tree results in 1583.030. However, both outputs contain -LRTs in the testing tables.
Attached are the 2 different trees I used, and the 2 output files. Hopefully this helps. Thank you again!!
4NT_14AA_Tree_48terminals_allloci_unrooted_noBL_deep.txt
4NT_14AA_Tree_48terminals_allloci_unrooted_yesBL_deep.txt
ATP6_aBSREL_Deep_unrooted_noBL_rep0_output.txt
ATP6_aBSREL_Deep_unrooted_yesBL_rep0_output.txt

from hyphy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.