astral's People

Contributors

chaoszhang, esayyari, hyphaltip, maryamrabiee, smirarab, tandyw

astral's Issues

input newick

(from pranjal)

ASTRAL fails on inputs where a single taxon is contained in
parentheses. Technically these might not be legal newick files but a
better error message might be nice.

e.g.

$ cat /tmp/astral-test-2

(a, (b), (c, d))

$ java -jar ~/.local/lib/astral.4.10.12.jar -i /tmp/astral-test-2

================== ASTRAL =====================

This is ASTRAL version 4.10.12
Gene trees are treated as unrooted
1 trees read from /tmp/astral-test-2
All output trees will be arbitrarily rooted at a

======== Running the main analysis
Number of taxa: 4 (4 species)
Taxa: [a, b, c, d]
Taxon occupancy: {a=1, b=1, c=1, d=1}
Number of gene trees: 1
0 trees have missing taxa
Calculating quartet distance matrix (for completion of X)
Exception in thread "main" java.util.NoSuchElementException
at java.util.ArrayDeque.removeFirst(java.base@9-Ubuntu/ArrayDeque.java:264)
at java.util.ArrayDeque.pop(java.base@9-Ubuntu/ArrayDeque.java:499)
at phylonet.coalescent.SimilarityMatrix.populateByQuartetDistance(SimilarityMatrix.java:139)
at phylonet.coalescent.WQDataCollection.calculateDistances(WQDataCollection.java:398)
at phylonet.coalescent.WQDataCollection.computeTreePartitions(WQDataCollection.java:372)
at phylonet.coalescent.AbstractInference.setupSearchSpace(AbstractInference.java:288)
at phylonet.coalescent.AbstractInference.setup(AbstractInference.java:269)
at phylonet.coalescent.CommandLine.runOnOneInput(CommandLine.java:600)
at phylonet.coalescent.CommandLine.runInference(CommandLine.java:584)
at phylonet.coalescent.CommandLine.main(CommandLine.java:390)
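
A pre-check along these lines could flag such trees before ASTRAL is run. This is only a sketch: `find_unary_groups` is a hypothetical helper, not part of ASTRAL, and it assumes labels are unquoted and contain no parentheses or commas.

```python
def find_unary_groups(newick: str) -> list[int]:
    """Return the offsets of every '(' that encloses exactly one child."""
    unary = []
    stack = []  # one [offset of '(', commas seen at this depth] entry per open group
    for i, ch in enumerate(newick):
        if ch == "(":
            stack.append([i, 0])
        elif ch == "," and stack:
            stack[-1][1] += 1
        elif ch == ")" and stack:
            start, commas = stack.pop()
            if commas == 0:  # no comma at this depth means a single child
                unary.append(start)
    return unary


# The tree from the report above: the group "(b)" starts at offset 4.
print(find_unary_groups("(a, (b), (c, d))"))
```

An empty list means no single-child groups were found; any offsets returned point at the parentheses worth inspecting by hand.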

Raxml issue

Hi,

Every time I run either PASTA or SATe with the RAxML option, the invocation of RAxML fails in both. The error file says the following:

SATe ERROR: SATe is exiting because of an error:
SATe failed because one of the programs it tried to run failed.
The invocation that failed was:
"C:\Users\Peji\satewin-v2.2.7-2013Feb15\bin\raxmlp.exe" "-m" "GTRCAT" "-n" "default" "-q"
"C:\Users\Peji.sate\satejob\temphaplqj\init_tree\tempraxmlogtwgs\partition.txt" "-s"
"C:\Users\Peji.sate\satejob\temphaplqj\init_tree\tempraxmlogtwgs\input.phy" "-T" "4"

p.s. The invocation did not fail on the "extra Raxml search" option on PASTA, however, my laptop ran out of battery before the job was completed.

Please advise.

Best wishes,
Pejvak

Strains level phylogenomic

Hello, thank you for the development of ASTRAL, it is very helpful.

I just wonder whether ASTRAL can be applied to microbial phylogenomics at the strain level within the same genus, say Escherichia coli, given that I already have a set of gene trees constructed using RAxML. Most of the publications I have read that cite ASTRAL used it to infer phylogeny at the species level, for both eukaryotes and prokaryotes.

multi-copy genes

Hi,

Does ASTRAL accept multi-copy gene trees as input? I raise this question because single-copy genes are restricted in some datasets, especially in plants.

Bests,

Tao

-x in MP-Similarity branch

In the MP-Similarity branch, using -x gives:

Exception in thread "main" java.lang.RuntimeException: Not supported
        at phylonet.util.BitSet$ImmutableBitSet.clear(BitSet.java:67)
        at phylonet.coalescent.AbstractDataCollection.addAllPossibleSubClusters(AbstractDataCollection.java:53)
        at phylonet.coalescent.AbstractInference.setupSearchSpace(AbstractInference.java:326)
        at phylonet.coalescent.AbstractInference.setup(AbstractInference.java:299)
        at phylonet.coalescent.CommandLine.runOnOneInput(CommandLine.java:607)
        at phylonet.coalescent.CommandLine.runInference(CommandLine.java:598)
        at phylonet.coalescent.CommandLine.main(CommandLine.java:508)

multiind version not working with mappingfile

I have been trying to run the multiind version for ASTRAL, but it doesn't work with my mapping file for some reason. I have tried both ways suggested in the readme, but get the same error:

================== ASTRAL ===================== 

This is ASTRAL version 5.4.4
Gene trees are treated as unrooted
The input file is not in correct format
Any gene name can only map to one species

The top lines of mapping files with each of the formats look like this:
First suggested format:
Pl_aurea 10 KM192911-Pl_aurea KM155391-Pl_aurea KM17690-Pl_aurea KM82093-Pl_aurea KM117990-Pl_aurea KM20739-Pl_aurea KM120160-Pl_aurea KM137120-Pl_aurea KM164896-Pl_aurea KM126522-Pl_aurea

Second suggested format:
Pl_aurea:KM192911-Pl_aurea,KM155391-Pl_aurea,KM17690-Pl_aurea,KM82093-Pl_aurea,KM117990-Pl_aurea,KM20739-Pl_aurea,KM120160-Pl_aurea,KM137120-Pl_aurea,KM164896-Pl_aurea,KM126522-Pl_aurea

When I run it without the mapping file it works fine, so the problem is either in the settings or in the way I set up the mapping file.

Any ideas?
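
Since the error message says that a name can only map to one species, one thing that could be checked is whether any individual name occurs more than once in the mapping file. The sketch below reads both formats quoted above (`species:ind1,ind2` and `species N ind1 ind2 ...`). The helper names are hypothetical, and whether ASTRAL's own check is triggered by exactly this condition is an assumption.

```python
from collections import defaultdict


def parse_mapping_line(line: str) -> tuple[str, list[str]]:
    """Split one mapping line into (species, individual names)."""
    line = line.strip()
    if ":" in line:  # species:ind1,ind2,...
        species, rest = line.split(":", 1)
        individuals = [name.strip() for name in rest.split(",") if name.strip()]
    else:  # species N ind1 ind2 ... (the second field is the count)
        fields = line.split()
        species, individuals = fields[0], fields[2:]
    return species.strip(), individuals


def find_repeated_names(lines) -> dict[str, list[str]]:
    """Map every individual name listed more than once to the species listing it."""
    owners = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        species, individuals = parse_mapping_line(line)
        for name in individuals:
            owners[name].append(species)
    return {name: species for name, species in owners.items() if len(species) > 1}


# Hypothetical example: "a1" is listed under both species A and species B.
print(find_repeated_names(["A:a1,a2", "B 2 b1 a1"]))
```

An empty result would suggest that repeated names are not the cause and that the problem lies elsewhere, for example in separators or stray whitespace.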

Permissions issue in lib directory

This is a minor issue, but the jar files in the lib directory are not set to be readable by all users of a system. This caused an issue when I installed ASTRAL on a shared machine-- I could execute ASTRAL but another user could not.

run `-t 32` option generate an R script, doesn't make sense

Describe the bug
When I ran the command below (see To Reproduce), it generated an R script, freqQuadVisualization.R, and a freqQuad.csv file (which has 6 columns, as described in the tutorial). Looking at the R script shown below, a couple of things don't make sense to me:

  1. The R script requests a csv file called freqQuadCorrected.csv, but only freqQuad.csv was produced [I guess we can rename it].

  2. freqQuad.csv has only 6 columns, but freqQuadVisualization.R also requests columns V7, V8 and V9, which eventually causes the R script to fail.

Please explain:

  1. Should freqQuadCorrected.csv and freqQuad.csv be the same file?
  2. What do the extra V7, V8 and V9 columns in the md data frame of the R script mean? How should I prepare them so that freqQuadVisualization.R runs?

To Reproduce
java -jar ./Astral/astral.5.7.4.jar -i genetrees.tre -q species.tre -t 32 -o test32.tre

Version
This is ASTRAL version 5.7.4

Additional context
The R script:

#!/usr/bin/env Rscript
red='#d53e4f';orange='#1d91c0';blue='#41b6c4';colormap = c(red,orange,blue)
require(reshape2);require(ggplot2);
dirPath = '.'; filePath = paste(dirPath,'/freqQuadCorrected.csv',sep=''); md<-read.csv(filePath,header=F,sep='\t'); md$value = md$V5/md$V6;
a<-length(levels(as.factor(md$V7)))*3.7; b<-4; sizes <- c(a,b);
md$V8<-reorder(md$V8,-md$value)
ggplot(data=md)+aes(x=V8,y=value,fill=V9)+geom_bar(stat='identity',color=1,width=0.8,position='dodge')+theme_bw()+theme(axis.text.x=element_text(angle=90))+scale_fill_manual(values=colormap,name='Topology')+geom_hline(yintercept=1/3,size=0.4,linetype=2)+ylab('relative freq.')+facet_wrap(~V7,scales='free_x')+xlab('')
pdfFile = paste(dirPath,'/relativeFreq.pdf',sep=''); ggsave(pdfFile,width = sizes[1], height= sizes[2]);

Thanks for your help!

Miao

Application to calculate consensus tree of RRHS from SNP data

Hi,

this is not a real issue, but rather a question on the applicability of ASTRAL-II:

I wonder if it is sensible to apply the coalescence approach implemented in ASTRAL to generate a consensus topology for replicate maximum likelihood trees generated with the RRHS (repeated random haplotype samples) following a procedure similar to https://academic.oup.com/mbe/article/31/4/817/1100394?

The data is genome-wide LD-filtered and represents 1000 unique random samples of the same sites (variability emerging from heterozygous sites).

Any comment or suggestion is greatly appreciated.

Best,
Daniel

Branch lengths visualisation

Hi,

Thank you for a fantastic piece of software. I know this is discussed in the tutorial and in some issues, but I just cannot seem to visualise branch lengths.

I input trees to ASTRAL and then annotated the output tree as follows:

java -jar astral.5.7.4.jar -i ALLRAxML_bestTree.tre -o ALLTHEGF_bestTree.tre

#annotate each branch
java -jar astral.5.7.4.jar -q ALLTHEGF_bestTree.tre -i ALLRAxML_bestTree.tre -o Astral_Scores_test.tre -t 2

I obtained the following tree:

(A,(B,(C,((D,E)'[q1=0.6492709670753897;q2=0.17078776877555552;q3=0.17994126414898182;f1=48883.61111110609;f2=12858.611111111575;f3=13547.77777777684;pp1=1.0;pp2=0.0;pp3=0.0;QC=18;EN=75290.0]':0.642251644494381,((J,K)'[q1=0.8141459689201473;q2=0.07307012883516813;q3=0.11278390224464857;f1=61297.04999999789;f2=5501.449999999808;f3=8491.49999999959;pp1=1.0;pp2=0.0;pp3=0.0;QC=20;EN=75290.0]':1.2772704042843872,(I,(F,(G,H)'[q1=0.763622327002258;q2=0.14648525700624254;q3=0.08989241599149954;f1=57493.125;f2=11028.875;f3=6768.0;pp1=1.0;pp2=0.0;pp3=0.0;QC=8;EN=75290.0]':1.0368164294930535)'[q1=0.7495190027133111;q2=0.07991575432138566;q3=0.17056524296530742;f1=56431.28571428519;f2=6016.857142857126;f3=12841.857142857996;pp1=1.0;pp2=0.0;pp3=0.0;QC=14;EN=75290.0]':0.9788673697712466)'[q1=0.4727759330588509;q2=0.22122946827823983;q3=0.30599459866294726;f1=35595.30000000088;f2=16656.366666668677;f3=23038.3333333333;pp1=1.0;pp2=0.0;pp3=0.0;QC=30;EN=75290.0]':0.23465262815767224)'[q1=0.3715267631823497;q2=0.27663589675476813;q3=0.3518373400628786;f1=27972.249999999105;f2=20827.916666666493;f3=26489.83333333413;pp1=1.0;pp2=0.0;pp3=3.3430618804486773E-81;QC=48;EN=75290.0]':0.058988874772521314)'[q1=0.4203828529685242;q2=0.3360024571656051;q3=0.24361468986586746;f1=31650.625000000186;f2=25297.62499999841;f3=18341.75000000116;pp1=1.0;pp2=0.0;pp3=0.0;QC=24;EN=75290.0]':0.1399127436801601)'[q1=0.8231488245450923;q2=0.08465267631823616;q3=0.09219849913667154;f1=61974.875;f2=6373.5;f3=6941.625;pp1=1.0;pp2=0.0;pp3=0.0;QC=8;EN=75290.0]':1.3269197907699364));

But when trying to visualise it in FigTree I cannot seem to see the branch lengths:

visualisations

I read again and again the tutorial and the issues but I cannot get my head around it and a tiny bit of help would be much appreciated!

Thank you

Ludo
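
One thing that might be worth trying is to view a copy of the tree with the quoted `'[q1=...;EN=...]'` node labels removed, so that only the topology and the branch lengths remain. This is a sketch of that idea, not a documented ASTRAL or FigTree recommendation; `strip_annotations` is a hypothetical helper and assumes every annotation has the quoted, bracketed form shown in the tree above.

```python
import re

# Matches one quoted node label of the form '[key=value;key=value;...]'.
ANNOTATION = re.compile(r"'\[[^\]']*\]'")


def strip_annotations(newick: str) -> str:
    """Remove quoted '[...]' node labels, keeping topology and branch lengths."""
    return ANNOTATION.sub("", newick)


# Shortened, made-up example in the same shape as the annotated tree above.
print(strip_annotations("(A,((D,E)'[q1=0.65;q2=0.17;q3=0.18]':0.64,C));"))
```

The annotated file stays untouched, so the support values can still be read from it while the stripped copy is used for viewing branch lengths.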

Annotation formats are not standard

Branch annotations can be made into standard formats. Examples are:

  • Phyloch
  • ?

Also, annotations can be written out as .csv files.

Thanks
Siavash

very long running time

Hello,

we are trying to run Astral 4.10.12 for 70 taxa, 244 individual gene trees and 100 multilocus bootstrap replicates.
The program has been running for 30 hours and did only 7 bootstrapped trees...

The current stderr is

Number of gene trees: 244
244 trees have missing taxa
Calculating quartet distance matrix (for completion of X)
Species tree distances calculated ...
Will attempt to complete bipartitions from X before adding using a distance matrix.
Building set of clusters (X) from gene trees
Number of Default Clusters: 69733
calculating extra bipartitions to be added at level 1 ...
Number of Clusters after addition by distance: 70225
Adding to X using resolutions of greedy consensus ...

Is this normal?? Is there a way to improve it?

Our system is a 64 bit windows 7 with 64 GB RAM (32 GB is currently being used by Java/Astral) and Processor Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 2501 Mhz, 12 Core(s), 24 Logical Processor(s)

Thanks,

Nath & Nad

Search for the names of genes that correspond to quartet support

Hi Siavash,
I am glad to find that your software can solve my problem when I use the option -t 2. As explained in your protocol, I can get 5 sets of data from the results. My question is: how can I get the names of the genes that correspond to the three quartet support values (q1, q2, q3)? So far I can only get the number of genes (f1, f2, f3) corresponding to each quartet support. If the number of genes can be calculated, the genes must have been classified first, so the corresponding names could also be grouped.
I am eager to solve this matter and hope for good news. Thank you very much.
Best wishes.
Serene

newick trees with polytomies as input

Dear all, just a quick question:
can ASTRAL also handle (Newick) trees with polytomies where other branching patterns are present (with support values)? If so, do branch lengths play a role in ASTRAL's estimation of the species tree or not? (I am asking because if I use as input a tree in which I have previously contracted/collapsed clades due to low support values, it is actually a bit artificial, isn't it?)

  • Would you recommend it or rather not?

bootstrapping

When I run the bootstrapping, no support values are added to the input tree. The test file did not work either. I tried using "-o", and then I got "-r"+2 trees in the output. How do I get the best species tree with support values? Thanks.
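
As far as I recall from the ASTRAL README, the bootstrap output file lists the replicate trees first, then a greedy consensus of the replicates, and finally the main ASTRAL tree with support values, which would explain the "-r"+2 count. Under that assumption, the tree with support values is the last one in the file. A minimal sketch, with `last_tree` as a hypothetical helper:

```python
def last_tree(lines) -> str:
    """Return the last non-empty line, assumed to be the main tree with support."""
    trees = [line.strip() for line in lines if line.strip()]
    if not trees:
        raise ValueError("no trees found in the output")
    return trees[-1]


# Made-up stand-in for an output file: two replicates, a consensus, the main tree.
example = ["(a,(b,c));", "(a,(c,b));", "(a,(b,c));", "(a,(b,c)100);", ""]
print(last_tree(example))
```

With a real file, the lines would come from something like `open("out.tre")`, where `out.tre` stands for whatever name was passed to `-o`.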

does it give BS support when number of gene is low?

I edited or removed my previous issue. ASTRAL was not working properly because I had not put a new line after every ; in the file (I picked an old file from months ago to reanalyse); that was my fault.

fatal error by the Java Runtime Environment

Hi,

Just downloaded ASTRAL and ran java -D"java.library.path=lib/" -jar native_library_tester.jar

and got the following error:
hs_err_pid1249854.log

There are 2 threads used to run.
Native AVX library found.

A fatal error has been detected by the Java Runtime Environment:

SIGILL (0x4) at pc=0x00007f789ad7e34a, pid=1249854, tid=1249855

JRE version: OpenJDK Runtime Environment (11.0.9+11) (build 11.0.9+11-Ubuntu-0ubuntu1.20.04)
Java VM: OpenJDK 64-Bit Server VM (11.0.9+11-Ubuntu-0ubuntu1.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
Problematic frame:
C [libAstral.so+0x234a] Java_phylonet_coalescent_Polytree_00024PTNative_cppBatchCompute+0x10a

Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/evgeniy/soft/Astral/core.1249854)

An error report file with more information is saved as:
/home/evgeniy/soft/Astral/hs_err_pid1249854.log

If you would like to submit a bug report, please visit:
https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

Aborted (core dumped)

The hs_err_pid1249854.log is attached.

best,

Evgeniy

Branch length for root and terminal branch lengths upon rerooting in FigTree

First, thanks very much for the software. I'm not sure if the following should be considered an ASTRAL issue or a FigTree issue, but I am having an issue with what I think are incorrect terminal branch lengths upon rerooting an ASTRAL tree in FigTree. I think this is because of a missing branch length for the root.

I'm using ASTRAL 5.6.1 from the zip file in the repo. I've got a few thousand gene trees for three individuals per species for 7 species, and a single individual for a pair of sister outgroup species which I use to manually root the tree, so I am estimating terminal branch lengths for 7 of 9 species.

A simplified version (truncating branch lengths for readability) of the output tree from ASTRAL 5.6.1 is:

(Sp1:0.8,(Sp2:0.4,(Sp3:0.7,(((Sp4:1.1,Sp5:0.9)0.99:0.02,(Sp6:0.6,Sp7:0.7)1:0.03)1:0.1,(Sp8,Sp9)1:3.2)1:0.6)1:0.1));

Opening this in FigTree v1.4.3 gives the following (with branch lengths turned on):
astral5.6.1.output.tree.beforeRooting.pdf

The red branches in that tree lack branch lengths in the Newick and so are assigned a value of 1 by FigTree. This seems fine for Sp8 and Sp9, since those are the outgroups with a single representative sample and so terminal branch lengths were not estimated, which is clear from the README. However, the red branch of length 1 between Sp1 and all of the other species becomes problematic upon rerooting on the (brown) branch leading to Sp8 and Sp9:

astral5.6.1.output.tree.afterRooting.pdf

It's clear upon rooting in FigTree that the long brown branch of length 3.2 is now pulled in half by the rooting, with 1.6 on each side. However, the terminal branch of Sp1 is now 1.8 instead of 0.8--the artificial branch length of value 1 that FigTree introduced for the root earlier has been added to the terminal branch length of Sp1.

If I instead add a 0 to the final branch length, then I think the tree shows the correct terminal branch lengths upon rooting. That would make this Newick:
(Sp1:0.8,(Sp2:0.4,(Sp3:0.7,(((Sp4:1.1,Sp5:0.9)0.99:0.02,(Sp6:0.6,Sp7:0.7)1:0.03)1:0.1,(Sp8,Sp9)1:3.2)1:0.6)1:0.1):0);

Upon rooting that tree, I get the following:

astral5.6.1.output.tree.root0.pdf

Is that approach (manually adding a :0 before the final parenthesis) the recommended approach to obtaining accurate terminal branch lengths in a manually rooted tree?

Thanks very much!
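
For reference, the manual edit described above (inserting `:0` before the final parenthesis) can be scripted. This sketch only reproduces that edit and does not say whether it is the recommended approach. `add_zero_before_final_paren` is a hypothetical helper, and it only acts when the last child of the root is a clade with no label or length after its closing parenthesis.

```python
def add_zero_before_final_paren(newick: str) -> str:
    """Insert ':0' before the final ')' when the last root child is a bare clade."""
    text = newick.strip()
    end = text.rfind(")")
    # Only edit when the character before the final ')' is itself a ')',
    # i.e. the last child is a clade without a label or branch length.
    if end <= 0 or text[end - 1] != ")":
        return text
    return text[:end] + ":0" + text[end:]


tree = ("(Sp1:0.8,(Sp2:0.4,(Sp3:0.7,(((Sp4:1.1,Sp5:0.9)0.99:0.02,"
        "(Sp6:0.6,Sp7:0.7)1:0.03)1:0.1,(Sp8,Sp9)1:3.2)1:0.6)1:0.1));")
print(add_zero_before_final_paren(tree))
```

Applied to the simplified tree from this post, the function produces the same Newick string as the hand-edited version shown above.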

Running on a multi-individual datasets

Hi,

When running Astral on my datasets, I am creating a species file in the correct format and I am receiving an error saying that one of my individuals is not found in the gene trees. I've double checked that the individual is indeed in the gene trees yet I am still receiving the error. May I add that I have run this before with the same species file and had no issue. Do you know what may be causing this?

Thanks!

fail sample run, CLException

Describe the bug
Hi, I'm testing ASTRAL v5.15.0 under Windows 10 (PowerShell) using the zip file downloaded here.
I receive an error message when running the example file. This is beyond my knowledge, and I couldn't find anything helpful using Google.
I would appreciate it if you could take a look.

Exception in thread "main" org.jocl.CLException: CL_INVALID_BUILD_OPTIONS
at org.jocl.CL.clBuildProgram(CL.java:11252)
at phylonet.coalescent.TurnTaskToScores$GPUCall.buildKernel(TurnTaskToScores.java:383)
at phylonet.coalescent.TurnTaskToScores$GPUCall.initCL(TurnTaskToScores.java:327)
at phylonet.coalescent.TurnTaskToScores$GPUCall.<init>(TurnTaskToScores.java:310)
at phylonet.coalescent.TurnTaskToScores.<init>(TurnTaskToScores.java:87)
at phylonet.coalescent.WQInferenceConsumer.setupMisc(WQInferenceConsumer.java:912)
at phylonet.coalescent.AbstractInference.setup(AbstractInference.java:235)
at phylonet.coalescent.CommandLine.runOnOneInput(CommandLine.java:631)
at phylonet.coalescent.CommandLine.runInference(CommandLine.java:622)
at phylonet.coalescent.CommandLine.main(CommandLine.java:531)

To Reproduce
java -jar astral.5.7.3.jar -i test_data/song_primates.424.gene.tre

Log file
Send us the log file (standard error of ASTRAL)
1.log is the log file.
1.log

Version
v5.15.0

Additional context
I follow the suggestion here: https://github.com/smirarab/ASTRAL/blob/5.14.6/README.md#installation

Tried github and make.sh, but didn't work
ran:
java -D"java.library.path=lib/" -jar native_library_tester.jar

report:
Exception in thread "main" java.lang.NoClassDefFoundError: phylonet/coalescent/Logging
at phylonet.coalescent.NativeLibraryTester.main(NativeLibraryTester.java:22)

It seems like it has some problem with the phylonet?

Thank you very much for your attention.

Question: Do Parsing Errors cause ASTRAL to quit with a non-zero exit code?

Similar or related to #19 or #16

When I run ASTRAL-MP and there is a parsing error, e.g.

================== ASTRAL ===================== 

This is ASTRAL version 5.15.1
Using native AVX batch computing.
Gene trees are treated as unrooted
There are 8 threads used to run.
Exception in thread "main" java.lang.RuntimeException: Failed to Parse Tree number: 1
	at phylonet.coalescent.CommandLine.readInputTrees(CommandLine.java:750)
	at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:369)
	at phylonet.coalescent.CommandLine.main(CommandLine.java:517)
Caused by: phylonet.tree.io.ParseException: ')' expected
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:405)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
	at phylonet.tree.io.NewickReader.readTree(NewickReader.java:374)
	at phylonet.tree.io.NewickReader.readTree(NewickReader.java:95)
	at phylonet.coalescent.CommandLine.readInputTrees(CommandLine.java:717)
	... 2 more

the process does not seem to fail/exit automatically. This is true for me from the BASH shell (terminal program on Mac OS X), running it via the shell via the ! symbol in an IPython cell in a Jupyter notebook, or calling it via Python's os.system function -- the process just hangs indefinitely and does not exit by itself.

This is admittedly not a big deal, especially when ASTRAL is being used interactively. It is more inconvenient when ASTRAL is being run multiple times inside of a script (as I am currently doing), because when ASTRAL fails with a parse error the script keeps running indefinitely, and the only tipoff (if I don't actively monitor the contents of all of the log files piped from STDERR) that something is amiss is that the fans on my laptop don't start running.

Please let me know whether I should clarify anything because I feel like I did not succeed in describing the situation as clearly as I would have liked
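
Until the exit behaviour is clarified, a calling script can protect itself with a timeout. The sketch below uses Python's `subprocess.run`, which kills the child process when the timeout expires. `run_with_timeout` is a hypothetical wrapper, and the ASTRAL command line in the comment is a placeholder rather than the exact invocation used above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit code, stderr); the exit code is None on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point.
        return None, ""
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Stand-in command; in a real script this could be something like
    # ["java", "-jar", "astral.jar", "-i", "genes.tre", "-o", "species.tre"].
    code, err = run_with_timeout([sys.executable, "-c", "raise SystemExit(2)"], 60)
    if code is None:
        print("timed out: treat as failure")
    elif code != 0 or "Exception in thread" in err:
        print("failed with exit code", code)
```

Checking stderr for "Exception in thread" as well as the exit code is a belt-and-braces choice, since the question of which exit code ASTRAL returns is exactly what is open here.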

Bootstrap analyses large data memory error

Hello,
I am trying to run bootstrap analyses with ASTRAL on a large dataset (~14000 loci). Unfortunately, the run failed with an error message from Java that seems to be related to a memory issue.

To Reproduce
Here is the command used:

#Multi-locus bootstrapping (MLBS) (use 1000 uboot2 output from iqtree)

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -r 1000 -s 1984 -o Results/Astral_MLBS_1000.tre 2>out_Astral_MLBS_1000.log

#version with Gene+Site resampling

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 500 -s 1984 -o Results/Astral_MLBS_GeneSite_500.tre 2>out_Astral_MLBS_GeneSite_500.log

Log file
And the log file of out_Astral_MLBS_1000.log:
================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.BufferedReader.readLine(BufferedReader.java:358)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:728)
at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

log from out_Astral_MLBS_GeneSite_1000.log
================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
at java.lang.StringBuffer.append(StringBuffer.java:350)
at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at java.lang.String.replaceAll(String.java:2223)
at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:730)
at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Version
astral 5.7.7
Additional context
I tried to run it on an HPC cluster, requesting 10 cores x 50G memory (high-memory nodes). The input bootstrap trees are from iqtree2 (ufboot).
Astral analyses (LPP) using the same input worked correctly.

Thank you in advance for the help,

regards
nicolas
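
One thing that may be worth checking is the JVM heap limit. The traces show Java heap space running out while the input is being read, and the JVM's maximum heap is set with the standard `-Xmx` option rather than by the scheduler's memory request. The sketch below only builds such a command line; `astral_command` is a hypothetical helper, and the `40G` value and file names are placeholders, not recommendations.

```python
def astral_command(jar: str, max_heap: str, astral_args: list[str]) -> list[str]:
    """Build a java command line with an explicit maximum heap size (-Xmx)."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar, *astral_args]


cmd = astral_command(
    "astral.5.7.7.jar",
    "40G",  # placeholder; should stay below the memory granted to the job
    ["-i", "genetrees.trees", "-b", "bootstrap_paths.txt", "-r", "1000", "-o", "out.tre"],
)
print(" ".join(cmd))
```

The resulting list can be passed directly to `subprocess.run`, or the printed line pasted into a job script.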

Terminal branch lengths for downstream applications (comparative methods)

Hello,

Thank you for your very well documented and fast software! I realize that I'm covering some old ground with a question regarding terminal branch lengths, but I was wondering if you have any recommendations for a specific case.

I currently have a species tree, in which some of the terminal branches have no length because they are represented by a single individual. I understand why this is the case after reading https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#branch-length-and-support. Unfortunately, I would now like to look at coevolution of traits across this clade, while controlling for phylogeny, by using some form of phylogenetic independent contrasts. However, such an approach requires knowing the branch lengths between pairs of taxa. I can't imagine I'm the first to want to use an ASTRAL-generated tree for such a purpose, so I was wondering if you have any suggestions for dealing with the lack of terminal branch length information? Perhaps a method specifically tailored to an ASTRAL-like MSC tree, or some ad hoc solution to inserting branch lengths? Thanks in advance for your help, and please let me know if I can provide any more information.

Best,
Brock

quartet output when running ASTRAL with -q tree and -t 8 (or -t 2)

Dear Siavash,
I hope you are doing well overseas! A quick question regarding the quartet output q1, q2 and q3 when running e.g. -t 8 (or -t 2):
looking at the output (and then plotting it onto a tree with ete3),

  1. EXAMPLE: q1=0.7236040305984149;q2=0.1676789398878331;q3=0.1087170295137519
    -> does q1 mean that e.g. 72% of the quartets support that topology, etc.? Is that correct?
  2. Is q1 then the topology as displayed in the tree, and q2 and q3 the two alternative topologies? How can I find out exactly which quartet topologies q2 and q3 are? Or do I have to run -t 16 to get the quartet topologies from there?
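
For anyone inspecting these values programmatically, the annotation string can be split into numbers and sanity-checked. In the example above the three q values sum to 1, and in the -t 2 tree quoted in another issue on this page each q value equals the matching f value divided by EN (for instance f1=48883.61 / EN=75290.0 gives q1=0.649). The sketch below only checks that arithmetic; `parse_annotation` is a hypothetical helper, and the sketch does not say which topology q2 or q3 refers to.

```python
def parse_annotation(annotation: str) -> dict[str, float]:
    """Turn 'q1=0.72;q2=0.17;...' into a dict of floats."""
    text = annotation.strip().strip("'").strip("[]")
    values = {}
    for item in text.split(";"):
        if not item:
            continue
        key, value = item.split("=", 1)
        values[key] = float(value)
    return values


ann = parse_annotation(
    "q1=0.7236040305984149;q2=0.1676789398878331;q3=0.1087170295137519"
)
total = ann["q1"] + ann["q2"] + ann["q3"]
print(ann, "sum close to 1:", abs(total - 1.0) < 1e-9)
```

The helper also accepts the quoted, bracketed form used in the output tree, so the same check can be run on f1, f2, f3 and EN where those fields are present.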

Best, Karen

PS: I am not sure whether DiscoVista has a separate GitHub, but here is another question/idea: we tried to run it on a dataset (nucleotide level) in which only the 1st and 2nd codon positions are included (not the third). Is there an option to tell DiscoVista this? While trying things out we realized that DiscoVista always assumes that all 3 codon positions are included. If not, can this be implemented? (We also run e.g. datasets at the nt level with only the second position...)

MANY THANKS IN ADVANCE :)
Stay healthy all there!
Karen

error in freqQuadVisualization.R

Hi there,

I am running with the option -t 16 and trying to use the R script to plot the quad frequencies. It looks like there are a few errors in the script. The first is it is trying to load a file called "freqQuadCorrected.csv", whereas the output is called "freqQuad.csv".

I tried renaming the output file to "freqQuadCorrected.csv" and running the script, but to no avail:

Error in tapply(X = X, INDEX = x, FUN = FUN, ...) : arguments must have same length Calls: reorder -> reorder.default -> tapply Execution halted

I tried to decipher what the code was doing, but to no avail. Any help would be much appreciated!

Thanks so much and happy new year!

issue with output tree

I just updated to the most recent version and now the output tree cannot be read (either by FigTree or 'ape' in R).

I get this message "Error in .nodeDepthEdgelength(Ntip, Nnode, z$edge, Nedge, z$edge.length) :
NA/NaN/Inf in foreign function call (arg 6)"

and in FigTree either it won't open anything...or it's just a blank window (no tree).

Thanks!
Chris

Problem with species mapping file

Hi,

Hopefully this is just a simple mistake on my end and an easy fix. I am currently having problems running ASTRAL when incorporating a mapping text file to constrain multiple individuals of one species to be monophyletic. Everything runs fine without it, but once I incorporate the text file, the error says that the taxa specified are not in the gene trees, even though I can see them in the tree file. I have tried formatting my species map file both ways and have had no success.

Thanks!
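
A quick way to narrow this down is to compare the names in the mapping file against the leaf labels that actually occur in the gene trees, since invisible differences (trailing spaces, different separators) are easy to miss by eye. This is a rough sketch: both helpers are hypothetical, and the regular expression assumes unquoted leaf labels without spaces, parentheses, commas, colons or semicolons.

```python
import re

# A leaf label directly follows '(' or ','; labels after ')' are internal nodes.
LEAF = re.compile(r"[(,]\s*([^\s(),:;]+)")


def leaf_names(newick_trees) -> set[str]:
    """Collect every leaf label found in the given Newick strings."""
    names = set()
    for tree in newick_trees:
        names.update(LEAF.findall(tree))
    return names


def missing_from_trees(mapped_names, newick_trees) -> list[str]:
    """Return the mapped individual names that occur in none of the gene trees."""
    return sorted(set(mapped_names) - leaf_names(newick_trees))


# Made-up example: "ind3" is in the mapping but not in any tree.
trees = ["((ind1:0.1,ind2:0.2)0.9:0.3,out1);"]
print(missing_from_trees(["ind1", "ind2", "ind3"], trees))
```

Printing the missing names with `repr()` can also help reveal stray whitespace or unexpected characters.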

Is there a way of "arbitrarily" root at certain taxa?

When I am annotating I get this message

All output trees will be *arbitrarily* rooted at raton

I have no problem with manually rerooting the output tree (with another "raton"), but the rerooted tree lacks the quartet support and bootstrap support at nodes close to the new (good) root. How can I fix this? Thanks for your help.

Astral v.5.6.2 the resulting species tree has some branch lengths are negative

I used 91 gene trees to produce a species tree, which I want to use to compare concatenation and coalescence.
When I imported this species tree into R to convert it to an ultrametric tree, I got an error for this tree. Here it is:
default

I checked this tree and found no negative values. How can this be explained? Is it a bug in ASTRAL?

Segmentation fault (core dumped)

Describe the bug
Execution interrupted with the following message:
"Segmentation fault (core dumped)"

To Reproduce
java -Djava.library.path=/home/grcolli/Astral-MP/lib/ -jar astral.5.15.4.jar -i locus75loci.treefile -o out.tre 2>astral.log

Log file
Send us the log file (standard error of ASTRAL)

Version
astral.5.15.4.jar
astral.log


Help in interpreting output log and species tree

I ran Astral 5.6.2.
My input file was a set of 72 gene-trees from RAxML "bestTrees".
I ran just with the standard options -i -o and 2> to save the log.

I got the same value for the normalized score and the final normalized quartet score: 0.686484593837535
How do I interpret this value from a biological point of view?

When I open the Astral output tree in FigTree, it asks me to set a name for the values that label the nodes/branches of the tree. Are these values the CU that Astral estimates?

I have the value 1.0 for the majority of nodes, and 0.4-0.6 for a few nodes.
What could the lower values indicate from a biological point of view?
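
The ASTRAL log describes the normalized score as the "portion of input quartet trees satisfied", so a value of 0.686 would mean that roughly 69% of the quartet trees found in the gene trees also appear in the species tree. The sketch below reproduces that ratio from a complete-data log elsewhere on this page (the song_primates run with 14 taxa and 424 gene trees). It assumes every gene tree contains all taxa, so each contributes C(n, 4) quartet trees. The function names are hypothetical.

```python
from math import comb

def total_quartet_trees(n_taxa, n_gene_trees):
    """Quartet trees induced by n_gene_trees gene trees, each on all n_taxa leaves."""
    return comb(n_taxa, 4) * n_gene_trees

def normalized_quartet_score(quartet_score, n_taxa, n_gene_trees):
    """Fraction of gene-tree quartet trees that the species tree satisfies."""
    return quartet_score / total_quartet_trees(n_taxa, n_gene_trees)

# Values copied from the song_primates log on this page: 14 taxa, 424 gene trees.
print(total_quartet_trees(14, 424))               # 424424, as reported in that log
print(normalized_quartet_score(389734, 14, 424))  # about 0.918
```

With missing taxa the denominator is smaller than this formula gives, so the calculation is only an illustration of what the ratio measures.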

Missing genes per species

I have BUSCO Metazoa and BUSCO Mollusca gene sets for ~50 cephalopod species. The number of genes per species for most species is around 700 and 3500, respectively - but ranges from 550 - 950 and 2250 - 4500 across species.

In reading your documentation and papers, I understand it is generally better not to filter out genes that are missing some species, but I was wondering if that holds true even at this scale, as some species will have only around half of the genes used to build the species tree. I'm unsure if this would be an extreme case, or the norm, for working with ASTRAL.

Thank you!
Eric

java runtime error

Hi,

I am new to Astral and would like to run it using 902 gene trees. I have concatenated the best-tree output files generated by RAxML into one file containing all the trees, which I use as input for Astral.

The command I have used: java -jar astral.4.10.12.jar -i trees -o output

However, I get this error:

Exception in thread "main" java.lang.RuntimeException: Failed to Parse Tree number: 1
at phylonet.coalescent.CommandLine.readInputTrees(CommandLine.java:768)
at phylonet.coalescent.CommandLine.main(CommandLine.java:334)
Caused by: phylonet.tree.io.ParseException: ')' expected
at phylonet.tree.io.NewickReader.readNode(NewickReader.java:405)
at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
at phylonet.tree.io.NewickReader.readNode(NewickReader.java:399)
at phylonet.tree.io.NewickReader.readTree(NewickReader.java:374)
at phylonet.tree.io.NewickReader.readTree(NewickReader.java:95)
at phylonet.coalescent.CommandLine.readInputTrees(CommandLine.java:735)
... 1 more

I guess there is something wrong with the way I have made the tree file? The program works fine on the test data.

Many thanks
Michelle
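
A "')' expected" parse error on tree number 1 suggests looking at how the first tree in the combined file is delimited. The sketch below is a simple pre-check, not ASTRAL's parser. It assumes each tree ends with `;` and that labels contain no quoted parentheses or semicolons. The function name is hypothetical.

```python
def check_newick_text(text):
    """Return a list of (tree_number, problem) for trees that look malformed."""
    problems = []
    chunks = text.split(";")
    trailing = chunks.pop()  # whatever follows the last ';'
    for number, chunk in enumerate(chunks, start=1):
        depth = 0
        for ch in chunk:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    break  # a ')' closed more than was opened
        if depth > 0:
            problems.append((number, f"missing {depth} ')'"))
        elif depth < 0:
            problems.append((number, "unexpected ')'"))
    if trailing.strip():
        # Text after the last ';' is a tree that was never terminated.
        problems.append((len(chunks) + 1, "missing terminating ';'"))
    return problems
```

Running `check_newick_text(open("trees").read())` on the combined file returns an empty list when every tree balances and terminates. Any entries point to the tree numbers worth inspecting by hand.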

java.lang.OutOfMemoryError

Hi Siavash,
I got a problem like this:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at phylonet.coalescent.SimilarityMatrix.populateByQuartetDistance(SimilarityMatrix.java:269)
        at phylonet.coalescent.WQDataCollection.calculateDistances(WQDataCollection.java:799)
        at phylonet.coalescent.WQDataCollection.formSetX(WQDataCollection.java:585)
        at phylonet.coalescent.AbstractInference.setupSearchSpace(AbstractInference.java:300)
        at phylonet.coalescent.AbstractInference.setup(AbstractInference.java:278)
        at phylonet.coalescent.CommandLine.runOnOneInput(CommandLine.java:685)
        at phylonet.coalescent.CommandLine.runInference(CommandLine.java:658)
        at phylonet.coalescent.CommandLine.main(CommandLine.java:535)

My data seems to be big:


This is ASTRAL version 5.6.3
Gene trees are treated as unrooted
879 trees read from 800_ML_trees
All output trees will be *arbitrarily* rooted at Cmax

======== Running the main analysis
Number of taxa: 254762 (128 species)
.....
...
Number of gene trees: 879
879 trees have missing taxa

I tried to run java with -Xmx8G, but it did not work.
How much RAM do I need for such an analysis?

Thank you very much!

Best, Tao
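
The stack trace fails while filling a pairwise similarity matrix, and the log reports 254762 taxa for 128 species. A rough estimate of what a full taxon-by-taxon matrix would need can be sketched as follows. This assumes one 4-byte value per ordered pair of taxa, which is an assumption about the storage layout rather than something the log confirms. The function name is hypothetical.

```python
def pairwise_matrix_gib(n_taxa, bytes_per_entry=4):
    """Size in GiB of a dense n_taxa x n_taxa matrix (assumed layout)."""
    return n_taxa * n_taxa * bytes_per_entry / 2**30

print(round(pairwise_matrix_gib(254762), 1))  # about 241.8 GiB under this assumption
print(pairwise_matrix_gib(128) < 0.001)       # True: 128 taxa would be negligible
```

Under that assumption an 8 GB heap is far too small for 254762 taxa, while a matrix over 128 names would be tiny. It may therefore be worth checking why the taxon count in the log is so much larger than the species count.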

choose root or outgroup ?

Hi, is there any way to choose the root taxon myself instead of relying on the automatic choice made by the tool?

Questions on branch length output

Hello,

I don't really understand what you mean in the ASTRAL tutorial by: "Branch lengths are in coalescent units and are a direct measure of the amount of discordance in the gene trees".
Is your branch length in coalescent time units as described in "Gene tree discordance, phylogenetic inference and the multispecies coalescent", James H. Degnan, 2009? If so, how does it measure the amount of discordance in the gene trees?

Any help would be much appreciated :)

Thank you
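
The link between coalescent units and discordance can be illustrated with the standard multispecies coalescent result for a single internal branch: a gene tree quartet matches the species tree with probability 1 - (2/3)e^(-x), where x is the branch length in coalescent units. The sketch below evaluates that formula and its inverse. It is an illustration of the relationship only. ASTRAL's own estimator may differ in details such as how quartets around a branch are averaged. The function names are hypothetical.

```python
import math

def concordance_probability(x):
    """P(gene-tree quartet matches the species tree) for branch length x (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-x)

def branch_length_from_frequency(z):
    """Invert the formula: branch length implied by a concordant quartet frequency z."""
    if not 0.0 <= z < 1.0:
        raise ValueError("z must be in [0, 1)")
    # Frequencies at or below 1/3 carry no signal for the branch; clamp to zero.
    return max(0.0, -math.log(1.5 * (1.0 - z)))

for x in (0.0, 0.5, 2.0):
    print(x, concordance_probability(x))
```

A branch of length 0 gives a concordance of 1/3 (all three quartet topologies are equally likely), and longer branches push it toward 1. In that sense a short branch means high discordance.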

NullPointerException with multiind:5.0.2 and test_data/song_primates.424.gene.tre

java -jar astral.5.0.2.jar -i test_data/song_primates.424.gene.tre

results in a NullPointerException when writing the resulting tree

This is ASTRAL version 5.0.2
Gene trees are treated as unrooted
424 trees read from test_data/song_primates.424.gene.tre
All output trees will be arbitrarily rooted at Marmoset
Running the main analysis
Number of taxa: 14 (14 species)
Taxa: [Marmoset, Orangutan, Human, Chimpanzee, Gorilla, Macaque, Galago, Mouse_Lemur, Tree_Shrew, Rat, Tarsier, Rabbit, Horse, Sloth]
Taxon occupancy: {Human=424, Rat=424, Tarsier=424, Galago=424, Rabbit=424, Macaque=424, Sloth=424, Marmoset=424, Tree_Shrew=424, Chimpanzee=424, Mouse_Lemur=424, Horse=424, Orangutan=424, Gorilla=424}
Number of gene trees: 424
0 trees have missing taxa
Calculating quartet distance matrix (for completion of X)
Species tree distances calculated ...
Building set of clusters (X) from gene trees

Round 0 of individual sampling ...
taxon sample [Marmoset, Orangutan, Human, Chimpanzee, Gorilla, Macaque, Galago, Mouse_Lemur, Tree_Shrew, Rat, Tarsier, Rabbit, Horse, Sloth]
Number of clusters after simple addition from gene trees: 339
calculating extra bipartitions to be added at level 1 ...
Number of Clusters after addition by distance: 339
Adding to X using resolutions of greedy consensus ...
Threshold 0.0:
Threshold 0.01:
Threshold 0.02:
Threshold 0.05:
Threshold 0.1:
Threshold 0.2:
Threshold 0.3333333333333333:
polytomy of size 3; rounds with additions with at least 0.01 support: 0; clusters: 339
Number of Clusters after addition by greedy: 339

Number of Default Clusters: 339
partitions formed in 0.274 secs
Dynamic Programming starting after 0.274 secs
Using tree-based weight calculation.
Number of quartet trees in the gene trees: 424424
Size of largest cluster: 14
Total Number of elements weighted: 878
Normalized score (portion of input quartet trees satisfied): 0.9182656965675834
Final optimization score: 389734
Optimal tree inferred in 0.711 secs.
(Gorilla,((Human,Chimpanzee),(Orangutan,(Macaque,(Marmoset,(Tarsier,((Galago,Mouse_Lemur),((Horse,Sloth),(Tree_Shrew,(Rat,Rabbit))))))))));
Quartet score is: 389734
Normalized quartet score is: 0.9182656965675834
.
Exception in thread "main" java.lang.NullPointerException
at phylonet.coalescent.WQInference.scoreBranches(WQInference.java:415)
at phylonet.coalescent.WQInference.scoreSpeciesTreeWithGTLabels(WQInference.java:197)
at phylonet.coalescent.CommandLine.processSolution(CommandLine.java:636)
at phylonet.coalescent.CommandLine.runOnOneInput(CommandLine.java:614)
at phylonet.coalescent.CommandLine.runInference(CommandLine.java:592)
at phylonet.coalescent.CommandLine.main(CommandLine.java:398)

instead of

Normalized quartet score is: 0.9182656965675834
(Marmoset,((Tarsier,((Galago,Mouse_Lemur)1:2.4018020540034675,((Horse,Sloth)1:2.6056351956741435,(Tree_Shrew,(Rat,Rabbit)1:0.8024369743576611)0.92:0.08540838439417373)1:1.9852014529353605)1:0.655284855661726)1:4.260329699696363,(Macaque,(Orangutan,((Human,Chimpanzee)1:0.6408490436022495,Gorilla)1:2.3251916476229617)1:2.5898315305010144)1:2.7005416959060815));
ASTRAL finished in 1.233 secs

with astral-4.10.7

terminal localPP

It's hard for users to know where to find localPP for terminal branches for multi-ind data. Clarify and/or output those to the log file.
