rambaut / seq-gen
Sequence simulator
This package is the generic source-code version. It can be compiled and run on most UNIX/Linux/Mac OS X systems. It can also be compiled for Windows.

For Mac OS X a pre-compiled version is available from the website: http://tree.bio.ed.ac.uk/software/Seq-Gen/

There is a manual in HTML format in the doc/ directory of this package.

On most UNIX systems, to compile, type:

    cd source
    make

A binary called 'seq-gen' will be created in the same directory as this README file.

Any questions about Seq-Gen should be sent to: Andrew Rambaut <[email protected]>
...meaning subsequent dependent calls from the command will run when they shouldn't, e.g.
seq-gen -mGTR < MyTree > MySeqs && DoSomethingWith MySeqs
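The report implies that seq-gen can exit with status 0 after printing an error, so `&&` lets the next command run anyway. Until the exit status is fixed, a calling script can apply its own check. Below is a minimal Python sketch under that assumption. `seqgen_failed` and `run_checked` are hypothetical helper names. The rule "non-zero status, empty output, or a line starting with Error means failure" is a heuristic based on the error messages quoted on this page, not documented seq-gen behaviour.

```python
import subprocess
from typing import Sequence


def seqgen_failed(returncode: int, stdout: str, stderr: str) -> bool:
    """Heuristic failure check that does not rely on the exit status alone."""
    if returncode != 0:
        return True
    # A successful run should print at least one alignment.
    if not stdout.strip():
        return True
    # Assumption: seq-gen error messages begin with "Error",
    # e.g. "Error reading tree number 1: ." as quoted in the reports here.
    lines = (stdout + "\n" + stderr).splitlines()
    return any(line.startswith("Error") for line in lines)


def run_checked(args: Sequence[str], tree_path: str, exe: str = "seq-gen") -> str:
    """Run seq-gen with tree_path on stdin; raise if the heuristic detects a failure."""
    with open(tree_path) as fh:
        proc = subprocess.run([exe, *args], stdin=fh, capture_output=True, text=True)
    if seqgen_failed(proc.returncode, proc.stdout, proc.stderr):
        raise RuntimeError(f"seq-gen failed (exit status {proc.returncode}):\n{proc.stderr}")
    return proc.stdout
```

For example, `run_checked(["-mGTR"], "MyTree")` either returns the alignment text or raises. The script then stops instead of passing an empty `MySeqs` on to the next step.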
I'm not sure what was causing it, but I was getting a segfault when simulating sequences using the Seq-Gen version on the website (the segfault would occur in the "output" function). I tried compiling from the GitHub source with -g so I could dig deeper with gdb, but the GitHub version worked just fine. There must be some bug in the website version that was fixed on GitHub. Since the website is the link Google finds, it would be nice if the version there were updated (or if the website simply linked to the GitHub repo).
Hello, what is the license for this repository?
When supplying relaxed PHYLIP-format input sequences to act as ancestral sequences, Seq-Gen allows whitespace inside sequences. However, it crashes with "Tree is missing from end of sequence file" if there is whitespace between the end of the sequence and the end of the line. For example, "ACG T" is fine but "ACG T " is not.
(Biopython inserts a space every 10 bases when writing PHYLIP, including one at the end of the line if the number of bases is a multiple of 10.)
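Until the parser tolerates trailing whitespace, one workaround is to strip it from every line of the PHYLIP file before handing the file to Seq-Gen. Below is a minimal Python sketch; the function names are hypothetical. It removes whitespace only at line ends, so it keeps the interior spaces that Biopython inserts every 10 bases. According to the report, Seq-Gen accepts those interior spaces.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove whitespace at the end of each line; interior spaces are untouched."""
    cleaned = "\n".join(line.rstrip() for line in text.splitlines())
    # Preserve a final newline if the input had one.
    return cleaned + "\n" if text.endswith("\n") else cleaned


def clean_phylip_file(src: str, dst: str) -> None:
    """Write a copy of the PHYLIP file src to dst without trailing whitespace."""
    with open(src) as fin:
        text = fin.read()
    with open(dst, "w") as fout:
        fout.write(strip_trailing_whitespace(text))
```

The cleaned copy (`dst`) is then the file to pass to Seq-Gen in place of the Biopython output.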
Currently, only HKY, F84 and GTR are available for nucleotides. @rambaut
I'm getting the error "Error reading tree number 1: Closing bracket missing", but the Newick tree I'm feeding Seq-Gen seems like a valid tree. To make sure, I ran it through nw_distance to get branch lengths, and nw_distance parsed it just fine. Can you help me find my issue? Here are the contents of the file I'm trying:
1 900
N2 AGGGCCAACGTGGACTGCTTGCTATGGAGGGTGTGTCGACTCCGAGTTCACCGCCTAGTCGCTTTCCTTCCCAAGGTCAGCTTCACACAATCGGGCTATAGGGAATGCCGATTTAAAAGTGGCACGTACCGCACTGGACGCTTACTTGTCTACACATTCCACACGACAAAAGGACCGTGTTGCTTTAAACTTCGTCCAGTCGGCCCCCCGTGTCATGAAGCTCGCCAATCATTGGAGGCTCAGTCGTATGCCAGTGCTTTTCTAACTCTGGTAGCATCTCTGGGATTTATTCGCGCGCCGGATCTAGACATTACCCAAAATAACGCCCACCAGAAGTCAAGGCGCATTGAGCTGGCGTGCAATTGGTATGGCCCTTATTTTAGGCTTTTAGCGCTCAGCCTAAGAGGAGGGCGTACGGTAATATATCAGCGTGGCCAATGGGTCAATGATCTGGTGCAAGGTCCGGCGGCCCAATCTCCAGCCATGGGAAAGGCGGATCGTATGGGAATATATCGCCGATGTAGCACACAGGTGGGACGACGTAGAAGACAGGTTCGGGTGCATAGGTCCGTGAACGCACAGTGTGCACCATTACGAGTACCGCTACGCGGCGTCGTTAGATACAAACTTCCTAGGGGGCCCGAAGCCAAGGGCAGTAAGTCAAGACTAAACATCGCTACAGAAGTTCCGCTTAAGAATACTGTGACGACGACCACCCTCCTATCCCTACATGCGTCGTCGCGACTAAAGATAACGCAGCTACTACTCGGGTACATCTTGCCTGCCGTTAGAATTTTGGTACCGTACAATGCAAGCCTAGCTACCTACGAGAGAATTGGAACAGTCTTGCTACTATCAAGTGGCAAACAACAACCAAGCATTTGGTACGCTTCTTATCGG
1
(((((((N10094|53|3.387644573626598:0.000000,N1335|53|3.4648137547803013:0.385846)N10093|53|3.387644573626598:2.843726,(N10096|53|3.387644573626598:0.000000,(N10098|53|6.290898771186661:0.000000,N1550|53|6.5055189042327735:1.073101)N10097|53|6.290898771186661:14.516271)N10095|53|3.387644573626598:2.843726)N1329|53|2.818899456866464:0.028835,((N1338|53|3.9619077297964496:2.390328,N10086|53|3.4838422003374294:0.000000)N10085|53|3.4838422003374294:0.527812,(N10088|53|3.387644573626598:0.000000,((N1536|53|6.545552153111298:1.273267,N10090|53|6.290898771186661:0.000000)N10089|53|6.290898771186661:12.535181,(N1610|53|6.525077022435883:1.170891,N10092|53|6.290898771186661:0.000000)N10091|53|6.290898771186661:12.535181)N1337|53|3.78386254970119:1.981090)N10087|53|3.387644573626598:0.046824)N1330|53|3.3782797020259157:2.825736)N1328|53|2.8131325115826025:3.249158,((N10082|53|3.4838422003374294:0.000000,N1333|53|4.51547779079867:5.158178)N10081|53|3.4838422003374294:4.890211,(N10084|53|3.4838422003374294:0.000000,N1325|53|5.335201493819431:9.256796)N10083|53|3.4838422003374294:4.890211)N1324|53|2.5057999939750486:1.712496)N10|53|2.1633008480336615:0.918445,(((((N10138|76|7.136413093174793:0.000000,N739|76|7.20349385579528:0.335404)N10137|76|7.136413093174793:14.832736,(N725|76|6.440686534739932:4.398717,(N770|76|7.098571866123575:3.289427,N10140|76|6.440686534739932:0.000000)N10139|76|6.440686534739932:4.398717)N719|76|5.5609431071049045:6.955386)N704|76|4.169865967577531:1.007382,(N762|76|7.136413093174793:5.185240,(N10132|63|7.026520998197115:0.000000,(N10134|63|7.678036911951752:0.000000,(N10136|63|7.750248481326306:0.000000,N746|63|7.759424299384176:0.045879)N10135|63|7.750248481326306:0.361058)N10133|63|7.678036911951752:3.257580)N10131|63|7.026520998197115:4.635779)N735|76|6.099365107581146:10.654877)N18|76|3.968389650172391:6.202349,(N833|76|6.440686534739932:13.568654,N10142|76|3.7269556498970466:0.000000)N10141|76|3.7269556498970466:4.995179)N16|76|2.727919813583346:0
.975411,(((N10130|38|4.384954202491446:0.000000,N7703|38|7.692601176576037:16.538235)N10129|38|4.384954202491446:4.099248,(((N10124|38|4.710728536659602:0.000000,N7302|38|6.745072582849662:10.171720)N10123|38|4.710728536659602:3.871776,(N10126|38|4.710728536659602:0.000000,N7543|38|5.168418140802194:2.288448)N10125|38|4.710728536659602:3.871776)N35|38|3.936373278607399:1.326461,(N32|38|5.848390579996254:7.317182,N10128|38|4.384954202491446:0.000000)N10127|38|4.384954202491446:3.569366)N26|38|3.671081001320573:0.529882)N21|38|3.5651046749252604:1.548266,(((((((N10114|35|6.109433810752977:0.000000,(N7721|35|6.68314747809591:1.384597,N10116|35|6.406228150861876:0.000000)N10115|35|6.406228150861876:1.483972)N10113|35|6.109433810752977:0.793526,N7722|35|6.109433810752977:0.793526)N7717|35|5.950728531832838:2.905630,(N10112|35|6.406228150861876:0.000000,N7718|35|6.83715584702786:2.154638)N10111|35|6.406228150861876:5.183128)N7715|35|5.369602522730817:0.276969,N10110|35|5.314208700660342:0.000000)N10109|35|5.314208700660342:0.313356,(N10104|35|5.314208700660342:0.000000,((N10106|35|6.406228150861876:0.000000,N7725|35|7.415432613936449:5.046022)N10105|35|6.406228150861876:4.617694,(N7723|35|6.542907099066029:2.167366,N10108|35|6.109433810752977:0.000000)N10107|35|6.109433810752977:3.133722)N7716|35|5.482689326353514:0.842403)N10103|35|5.314208700660342:0.313356)N42|35|5.251537410866491:1.257687,(((N1313|79|7.394220575611178:3.002561,(N1309|79|7.898448763109277:5.303604,(N10120|79|7.394220575611178:0.000000,N1315|79|7.898448763109277:2.521141)N10119|79|7.394220575611178:2.782463)N1306|79|6.837728031727479:0.220098)N1304|79|6.793708375375154:4.812583,(N10122|79|7.394220575611178:0.000000,N1322|79|7.898448763109277:2.521141)N10121|79|7.394220575611178:7.815144)N41|79|5.831191817063941:2.217568,N10118|79|5.387678251235054:0.000000)N10117|79|5.387678251235054:1.938391)N34|35|5.0:7.804189,((N7261|38|7.760882722187589:0.341408,N10102|38|7.692601176576037:0.000000)N10101|38|7.69260
1176576037:20.276384,(N7338|38|4.384954202491446:2.374260,((N10100|38|4.710728536659602:0.000000,N7045|38|5.479073165297347:3.841723)N10099|38|4.710728536659602:0.829614,N7125|38|7.692601176576037:15.738977)N37|38|4.544805745499566:3.173517)N29|38|3.910102255157772:1.363889)N23|38|3.6373244314426554:0.990811)N22|38|3.4391622608511176:0.918554)N15|38|3.2554514626661493:3.613069)N13|76|2.5328376431325754:2.766129)N8|76|1.979611885156058:3.104836,(((N10080|76|3.7269556498970466:0.000000,N20|76|4.41755168315786:3.452980)N10079|76|3.7269556498970466:4.622154,N921|76|7.136413093174793:21.669442)N11|76|2.8025247719501962:4.061246,(N12|76|5.480171134108298:8.766077,N10078|76|3.7269556498970466:0.000000)N10077|76|3.7269556498970466:8.683401)N7|76|1.9902755489963695:3.158154)N3|76|1.3586447399943462:6.610871,((N10074|21|2.0361489799714976:0.000000,(N7755|21|4.546571082984857:0.325177,N10076|21|4.481535604876424:0.000000)N10075|21|4.481535604876424:12.226933)N10073|21|2.0361489799714976:8.236285,((((N10064|21|6.427776507669709:0.000000,N8423|21|7.801600945082707:6.869122)N10063|21|6.427776507669709:17.695347,((N10066|21|4.481535604876424:0.000000,(N8217|21|6.6360575289254715:1.041405,N10068|21|6.427776507669709:0.000000)N10067|21|6.427776507669709:9.731205)N10065|21|4.481535604876424:6.404303,(N10070|21|6.427776507669709:0.000000,N8142|21|6.574277590049945:0.732505)N10069|21|6.427776507669709:16.135508)N8109|21|3.2006749810166126:1.559840)N8104|21|2.8887070425746124:5.667660,(N10072|21|2.0361489799714976:0.000000,N8099|21|2.334175756485804:1.490134)N10071|21|2.0361489799714976:1.404870)N8098|21|1.7551749986922354:1.117354,(N10060|21|2.0361489799714976:0.000000,(N8147|21|5.723852597930773:6.211585,N10062|21|4.481535604876424:0.000000)N10061|21|4.481535604876424:12.226933)N10059|21|2.0361489799714976:2.522224)N6|21|1.531704112895537:5.714061)N4|21|0.38889192335194406:1.762107)N2|21|0.036470598824739756:0.182353;
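Because nw_distance accepts the tree, the problem may lie in a feature that Seq-Gen's own parser handles differently, rather than in a true syntax error. Seq-Gen's exact parsing rules are not covered here. Still, a quick diagnostic is to list the tree's potentially troublesome features: bracket balance, the terminating semicolon, nesting depth, internal node labels, line breaks, and characters outside a conservative set. Below is a minimal Python sketch; `newick_report` is a hypothetical helper. A flagged feature is a lead to test (for example, by removing it) rather than a confirmed cause. On the tree above (the part after the sequence block), it would flag internal node labels and '|' characters.

```python
import re


def newick_report(tree: str) -> dict:
    """Summarise features of a Newick string that a strict parser may reject."""
    depth = max_depth = 0
    balanced = True
    for ch in tree:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                balanced = False
    text = tree.strip()
    return {
        "balanced": balanced and depth == 0,
        "ends_with_semicolon": text.endswith(";"),
        "max_depth": max_depth,
        # A ')' followed directly by a name (not ',', ':', ';', whitespace or
        # another bracket) means an internal node carries a label.
        "internal_labels": re.search(r"\)[^,:;()\s]", text) is not None,
        "line_breaks": "\n" in text,
        # Anything outside letters, digits and basic Newick punctuation.
        "unusual_chars": sorted(set(re.findall(r"[^A-Za-z0-9_.:,;()\s\-]", text))),
    }
```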
For example:
300, 0.5;
400, 1.75;
300, 0.75;
How can I know that the rates are 0.5, 1.75 and 0.75 for the three trees?
Similar to #9, but I can't solve it with a regex. I downloaded the Nextstrain tree (Jan 4, 2021) for nCov and wanted to run TreeToReads.py with it (Newick attached below). However, the seq-gen step gives the closing bracket error. I have tried a variety of things, including renaming the taxa and resolving multifurcations:
perl -MBio::TreeIO -e '$tree=Bio::TreeIO->new(-file=>"nextstrain_ncov_global_tree.nwk")->next_tree; for($tree->get_nodes){$i++; if($_->is_Leaf){$_->id("TAXON$i");} else {$_->id("");} } print $tree->as_text("newick")."\n";' | gotree resolve > anonymized.nwk
And breaking apart long lines:
cat anonymized.nwk | perl -plane 's/(.{50,}?,)/\1\n/g' > tmp.nwk
This is my seq-gen command (I change the stdin file accordingly):
seq-gen -l768000 -n1 -mGTR -a5.0 -r0.25,0.82,0.15,0.27,2.99,1.00 -f0.299236590102,0.183687135874,0.196176253934,0.32090002009 -or < tmp.nwk
But nothing seems to help so far. Any ideas?
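For anyone doing the same preprocessing in Python rather than Perl, the line-breaking step in the report above can be ported with `re.sub`. The step is `perl -plane 's/(.{50,}?,)/\1\n/g'`. Each input line gets a newline after the first comma that follows at least 50 characters, and the search then continues from there. This only reproduces the reporter's workaround; it is not known to fix the closing bracket error. `break_long_lines` is a hypothetical name.

```python
import re

# Same pattern as the Perl one-liner: at least 50 characters (lazily), then a comma.
_BREAK_AFTER_COMMA = re.compile(r"(.{50,}?,)")


def break_long_lines(newick: str) -> str:
    """Insert a newline after the first comma following 50+ characters, line by line.

    Like `perl -pl`, each input line is processed on its own and written back
    with a trailing newline.
    """
    return "".join(
        _BREAK_AFTER_COMMA.sub(r"\1\n", line) + "\n" for line in newick.splitlines()
    )
```

Usage would be to read `anonymized.nwk`, pass its text through `break_long_lines`, and write the result to `tmp.nwk`.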
Hi,
We currently have this project packaged in Debian.
While I agree that every file containing code has a BSD license header, the absence of a LICENSE file means the data and documentation are considered non-free under the free software guidelines followed here.
Could you please add the same license in a LICENSE file and commit it?
That'd be great.
PS: Considering that PAML has also adopted a free software license, this project could do the same without conflicts, I suppose.
I am generating the ancestral sequences, which come with node labels. How are these node labels assigned? Where does each internal node occur in the tree?
It would be handy if the program could read the node labels in the tree; that way we would know which sequence belongs to which node.
Hi there,
It was announced that FASTA output is a new feature in version 1.3.4, but I can't figure out how to set the output parameters to use it. Also, I downloaded version 1.3.4, but it seems to be version 1.3.2x:
Seq-Gen-1.3.4$ seq-gen -h
Sequence Generator - seq-gen
Version 1.3.2x
Cheers,
It is not possible to echo a string into seq-gen to use as a tree:
$ echo "(A:0.1,B:0.1,C:0.1);" | /home/sam/ware/Seq-Gen.v1.3.3/source/seq-gen -mGTR
Sequence Generator - seq-gen
Version 1.3.2x
(c) Copyright, 1996-2004 Andrew Rambaut and Nick Grassly
Department of Zoology, University of Oxford
South Parks Road, Oxford OX1 3PS, U.K.
Error reading tree number 1: .
Meanwhile, the following works:
/home/sam/ware/Seq-Gen.v1.3.3/source/seq-gen -mGTR <<< "(A:0.1,B:0.1,C:0.1);"
From what I can tell, this could be caused by multiple calls of feof(stdin) in seq-gen.c or treefile.c?
I'm not really sure what the "first pass" of stdin does, as this is also a valid input that produces sequences:
/home/sam/ware/Seq-Gen.v1.3.3/source/seq-gen -mGTR
Sequence Generator - seq-gen
Version 1.3.2x
(c) Copyright, 1996-2004 Andrew Rambaut and Nick Grassly
Department of Zoology, University of Oxford
South Parks Road, Oxford OX1 3PS, U.K.
<CTRL-D>
(A:0.1,B:0.1,C:0.1);
<CTRL-D>
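The CTRL-D session above hints at one possible explanation, offered here only as a hypothesis. Suppose seq-gen reads stdin once (the "first pass"), rewinds it, and reads it again; it would then need a seekable input. A redirected regular file can be rewound, but a pipe (as created by `echo ... |`) cannot, so the second read would find nothing. On an interactive terminal, typing the tree after the first CTRL-D supplies fresh input for the second read. Below is a minimal Python sketch for checking whether a given stdin is seekable; `is_seekable` and `check_stdin.py` are hypothetical names.

```python
import os


def is_seekable(fd: int) -> bool:
    """Return True if lseek() works on this file descriptor.

    Regular files (including temporary files) are seekable; pipes are not,
    so lseek() on a pipe fails (ESPIPE) and raises OSError.
    """
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Report whether the stdin this script received can be rewound.
    print("stdin seekable:", is_seekable(0))
```

Saved as `check_stdin.py`, running `echo "(A:0.1,B:0.1,C:0.1);" | python3 check_stdin.py` should report False, while `python3 check_stdin.py < tree.nwk` should report True.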
For a long time, seq-gen was considered non-free due to the non-free license of its PAML parts.
The situation has now changed, and PAML is released under the GPL.
I wonder if you could update the outdated PAML code in seq-gen and thereby make it completely free?
This would enable me to include seq-gen in the main repository of Debian. Currently I cannot do this, as the old PAML code doesn't comply with the Debian Free Software Guidelines.
Hello,
I'm following a pipeline that uses Seq-Gen as a subprocess within a Python script, and it passes the tree as text with <<< instead of as a file name. I've seen in other issues that this is not a problem, but for some reason it is for me. I'm running Seq-Gen 1.3.4 installed from conda, but I also had the same problem with the latest version, compiled from source:
seq-gen -mGTR -q -a1000 -z1686042325.0655584 -l 1000 -f0.277,0.228,0.246,0.249 -r1,1.68369,1,1,1.91645,1 <<< "(B:0.3598052239,D:0.3425989485,(A:0.4731590178,C:0.4643218278):0.0822832145);"
Error reading tree number 1: .
Not sure whether I'm doing something wrong. I tried with a simpler command, used by @SamStudio8 on Issue #4, but I get the same error:
seq-gen -mGTR <<< "(A:0.1,B:0.1,C:0.1);"
Sequence Generator - seq-gen
Version 1.3.4
(c) Copyright, 1996-2017 Andrew Rambaut and Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
Originally developed at:
Department of Zoology, University of Oxford
Error reading tree number 1: .
However, saving the tree to a file and redirecting it into seq-gen works perfectly:
echo "(B:0.3598052239,D:0.3425989485,(A:0.4731590178,C:0.4643218278):0.0822832145);" > tree.nwk
seq-gen -mGTR -q -a1000 -z1686042325.0655584 -l 1000 -f0.277,0.228,0.246,0.249 -r1,1.68369,1,1,1.91645,1 < tree.nwk
# Sequences produced...
Thus, it does not seem to be a problem with the tree format (also, the pipeline I'm following uses the exact line I pasted in the first code block), but I'm at a loss as to what else I could test.
Many thanks.
-carlos
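Redirecting from a saved file works while the here-string does not. One workaround for a Python pipeline is therefore to write the tree to a temporary file and pass the open file as seq-gen's stdin. This replaces sending the text through `<<<` or through subprocess's `input=` argument, both of which can deliver it via a pipe. The approach is consistent with the idea that seq-gen needs a seekable stdin, but does not prove it. Also, bash 5.1 and later implement small here-strings with a pipe rather than a temporary file, which could explain why `<<<` works for some users and not others. Below is a minimal sketch; `run_seqgen` is a hypothetical helper, not part of the pipeline being followed.

```python
import subprocess
import tempfile
from typing import Sequence


def run_seqgen(tree: str, args: Sequence[str], exe: str = "seq-gen") -> str:
    """Run seq-gen with the tree on stdin, supplied from a real temporary file.

    A regular file is seekable, unlike the pipe created by `input=` or, in
    newer bash versions, by here-strings.
    """
    with tempfile.TemporaryFile(mode="w+") as fh:
        fh.write(tree.strip() + "\n")
        fh.flush()
        fh.seek(0)  # rewind so the child process starts reading at the tree
        proc = subprocess.run(
            [exe, *args], stdin=fh, capture_output=True, text=True, check=True
        )
    return proc.stdout
```

For example, `run_seqgen(tree, ["-mGTR", "-q", "-l", "1000"])` returns seq-gen's standard output and raises `subprocess.CalledProcessError` if seq-gen exits with a non-zero status.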