ubc-stat-ml / conifer Goto Github PK
View Code? Open in Web Editor NEWBayesian phylogenetic inference tools
Bayesian phylogenetic inference tools
It seems that SimplePhyloModel
now contains a series of experiments that make it hard to reference it as the base model. Would it be helpful to move this functionality into another class?
Running BlangSmallExample
results in the following warning:
"Warning: command ln returned non-zero code (1). Output so far:
ln: results/latest: No such file or directory"
Here the project's main directory is under directories with whitespace in their names; that's what's causing the problem.
The buildCommand
method in the binc
package (line 323) that ultimately handles executing commands splits the input string (args
) by white space regardless of it being escaped or not.
Two fixes that require minimal changes could be:
binc
by:(?<!\\\\)\\s+
\\s(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)
Results
class's refreshSoftlink
method (line 70).It might be beneficial to include some simple example data in order for the user to run SimplePhyloModel
after cloning the repo. The code currently assumes that the data resides in ~/Documents/.../*.fasta
and ~/Documents/.../*.nwk
.
When I use EJMLUtils.class to do eigendecomposition of the rate matrix when the dimension of the rate matrix is 20_20 but it is represented by 36 features, during mcmc iterations, we might encounter two types of errors: the eigendecomposition of the rate matrix represented by V_D*Vinverse is not close to the rate matrix itself. This problem relies on the conditional number of eigenvector matrix V. The second error is when calculating the stationary distribution, it can not find an eigenvalue close to 1. However, I find out the code works with the initial rate matrix when the weights for the 36 features are all 1. This rate matrix can be found in accordance.txt. It does not work during the mcmc iterations. (I will print the rate matrix that does not work later.)
To fix it, I use the eigendecomposition from Jama library. The new code that works is copied as below.
package conifer.ctmc;
import org.ejml.data.DenseMatrix64F;
import org.ejml.data.Eigenpair;
import org.ejml.factory.DecompositionFactory;
import org.ejml.factory.EigenDecomposition;
import org.ejml.ops.EigenOps;
import org.ejml.simple.SimpleMatrix;
import tutorialj.Tutorial;
import bayonet.distributions.Multinomial;
import bayonet.math.EJMLUtils;
import bayonet.math.NumericalUtils;
import bayonet.math.EJMLUtils.SimpleEigenDecomposition;
import java.io.*;
import java.util.*;
import org.jblas.ComplexDoubleMatrix;
import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;
import org.junit.Assert;
import org.junit.Test;
import Jama.EigenvalueDecomposition;
import Jama.Matrix;
// Added by Tingting
import org.jblas.Eigen;
/**
* A continuous time Markov chain. The main functionalities consists
* in computing a marginal transition probability and a stationary distibution
* (see below).
*
* This implementation is based on caching the eigendecomposition
* of the provided rate matrix.
*
* @author Alexandre Bouchard ([email protected])
*
*/
public class EigenCTMC implements CTMC
{
private static final double THRESHOLD = 1e-6;
private final SimpleEigenDecomposition eigenDecomp;
private final double [] stationaryDistribution;
private final double [][] rates;
/**
* Note: if the RateMatrix is changed in place,
* these changes will not be mirrored by this class.
*
* It should be recreated each time a likelihood
* calculation is performed.
* @param rateMatrix
*/
public EigenCTMC(double [][] rates)
{
RateMatrixUtils.checkValidRateMatrix(rates);
this.eigenDecomp = SimpleEigenDecomposition(rates);
this.stationaryDistribution = computeStationary();
this.rates = rates;
}
public static SimpleEigenDecomposition SimpleEigenDecomposition(double [][] rates)
{
Matrix ratesMatrix = new Matrix(rates);
EigenvalueDecomposition ed = new EigenvalueDecomposition(ratesMatrix);
Matrix V = ed.getV();
Matrix D =ed.getD();
Matrix Vinverse = V.inverse();
Matrix resultMatrix = V.times(D).times(V.inverse());
//check if result and rates are close enough
SimpleMatrix trueMatrix = new SimpleMatrix(rates);
SimpleMatrix calculatedMatrix = new SimpleMatrix(resultMatrix.getArray()) ;
if(EJMLUtils.isClose(trueMatrix, calculatedMatrix, THRESHOLD))
{
return new SimpleEigenDecomposition(V, D, Vinverse);
}else{
throw new RuntimeException();
}
}
public static class SimpleEigenDecomposition
{
public final Matrix V, D, Vinverse, calculatedRateMtx;
public SimpleEigenDecomposition(Matrix v, Matrix d,
Matrix vinverse)
{
V = v;
D = d;
Vinverse = vinverse;
calculatedRateMtx=V.times(D).times(V.inverse());
}
}
/**
* Compute and cache the stationary (called by the constructor)
* @return
*/
private double[] computeStationary()
{
Matrix marginal = new Matrix(marginalTransitionProbability(1.0));
Matrix exponentiatedTranspose = marginal.transpose();
double [] result = null;
final int dim = marginal.getColumnDimension();
EigenvalueDecomposition eigenDecomp = new EigenvalueDecomposition(exponentiatedTranspose);
// Find an eigen value equal to one (up to numerical precision)
// result will hold the corresponding eigenvalue
double [] realEigenvalues=eigenDecomp.getRealEigenvalues();
double [] imageEigenvalues=eigenDecomp.getImagEigenvalues();
double [][] eigenVectorTranspose = eigenDecomp.getV().transpose().getArray();
for (int i = 0; i < dim; i++)
if (NumericalUtils.isClose(0.0, imageEigenvalues[i], NumericalUtils.THRESHOLD)&&
NumericalUtils.isClose(1.0, realEigenvalues[i], NumericalUtils.THRESHOLD))
{
result = eigenVectorTranspose[i];
}
if (result == null)
throw new RuntimeException("Could not find an eigenvalue equal to one. Not a proper rate matrix?");
double sum = NumericalUtils.getNormalization(result);
if (sum < 0.0)
{
for (int i = 0; i < result.length; i++)
{
result[i] *= -1;
}
}
Multinomial.normalize(result);
return result;
}
/**
*
* Fill in ``marginalTransitionProbability()``.
*
* Use the diagonalization method covered in class, using
* the eigen-decomposition functionalities provided by EJML.
*/
@Tutorial(showSource = false, showLink = true)
public double [][] marginalTransitionProbability(double branchLength)
{
DoubleMatrix rateMtx = new DoubleMatrix(eigenDecomp.calculatedRateMtx.getArray());
rateMtx.muli(branchLength);
double [][] result = MatrixFunctions.expm(rateMtx).toArray2();
NumericalUtils.checkIsTransitionMatrix(result);
for (int row = 0; row < result.length; row++)
{
Multinomial.normalize(result[row]);
}
return result;
}
public double [] stationaryDistribution()
{
return stationaryDistribution;
}
@Override
public double[][] getRateMatrix()
{
return rates;
}
// public static void main(String [] args)
// {
// SimpleRateMatrix tryMtx = RateMatrices.accordance();
// SimpleMatrix trySimpleMtx =new SimpleMatrix(tryMtx.getRateMatrix());
// RateMatrixUtils.checkValidRateMatrix(tryMtx.getRateMatrix());
// EJMLUtils.simpleEigenDecomposition(trySimpleMtx);
// System.out.println("finished");
// }
}
With the new changes to bayonet's management of the CODA plots, conifer can no longer produce readable plots when there are many hyperparameters.
I will commit a fix to this in bayonet that also adds additional mcmc diagnostics. If we decide later on that all of these diagnostics are overkill (in terms of HD space), we can easily limit the number of diagnostic plots.
It seems that one of the test case is not terminating.
Fails on some executions, succeeds in others.
Hello,
I am experiencing the following problems.
Let's say I made a custom probes.txt containing 101 probes in a chromosome.
chr start stop name
1 45794966 45795119 MUTYH
1 45796176 45796239 MUTYH
1 45796842 45797016 MUTYH
1 45797080 45797238 MUTYH
1 45797321 45797531 MUTYH
....
Whe I run "conifer.py analyze ..", I got the following message.
conifer.py:157: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 101 but corresponding boolean dimension is 100
And when I run "conifer.py export ..", the resultant bed file contained only 100 rows and the last region in the probe.txt was omitted.
Could you tell me about how to resolve this problem?
Thanks
Please run gradle build
before pushing changes to repository
The error is:
Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/PairProteinModel.java:14: package briefj.tomove does not exist
import briefj.tomove.Results;
^
/Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/PairProteinModel.java:52: cannot find symbol
symbol : variable Results
location: class conifer.PairProteinModel
private final PrintWriter treeWriter = BriefIO.output(Results.getFileInResultFolder("trees.newick"));
^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/io/newick/NewickParser.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
2 errors
:compileJava FAILED
Keeping Tree 1 the same and using the following for tree 2, will result in the error:
FES_4.fasta as multiple alignment file
FES.ape.4.nwk as starting phylogeny
Note
These data files are downloaded as part of the build process and could be found under the src/main/resources/conifer/sampleInput
directory.
The Error
Exception in thread "main" java.lang.RuntimeException: DiscreteFactors should not have negative entries
at bayonet.marginal.DiscreteFactorGraph$DiscreteUnaryFactor.<init>(DiscreteFactorGraph.java:703)
at bayonet.marginal.DiscreteFactorGraph$DiscreteUnaryFactor.<init>(DiscreteFactorGraph.java:623)
at bayonet.marginal.DiscreteFactorGraph.pointwiseProduct(DiscreteFactorGraph.java:380)
at bayonet.marginal.DiscreteFactorGraph$1.pointwiseProduct(DiscreteFactorGraph.java:337)
at conifer.moves.SPRMove.execute(SPRMove.java:105)
at blang.mcmc.MoveSet.sweep(MoveSet.java:78)
at blang.MCMCAlgorithm.run(MCMCAlgorithm.java:37)
at blang.MCMCRunner.run(MCMCRunner.java:25)
at conifer.HierarchicalPhyloModel.main(HierarchicalPhyloModel.java:110)
The following standard conventions should be followed:
pairProteinModel
for example, into PairProteinModel
.final static
) should be in ALL_CAPS.ExpFamMixture.Accordance45()
for example).Please use the refactoring options in eclipse to correct these.
The class RealVectorMHProposal
appears both in conifer.moves
and blang.mcmc
. Eventually, should be migrated to blang.mcmc
. Note that the current implementation (Aug 26, 2014) of blang.mcmc.RealVectorMHProposal
seems currently incorrect though.
If SimplePhyloModel is run with AllBranchesScaling
move excluded, i.e.,
runner.factory.excludeNodeMove(AllBranchesScaling.class);
it will crash with the following error:
Exception in thread "main" java.lang.RuntimeException: Max number of transitions exceeded 1000000
at conifer.ctmc.EndPointSampler.sampleNTransitions(EndPointSampler.java:142)
at conifer.ctmc.EndPointSampler.sample(EndPointSampler.java:58)
at conifer.models.MultiCategorySubstitutionModel.samplePosteriorPaths(MultiCategorySubstitutionModel.java:130)
at conifer.models.MultiCategorySubstitutionModel.samplePosteriorPaths(MultiCategorySubstitutionModel.java:84)
at conifer.moves.PhyloHMCMove.execute(PhyloHMCMove.java:55)
at blang.mcmc.MoveSet.sweep(MoveSet.java:78)
at blang.MCMCAlgorithm.run(MCMCAlgorithm.java:37)
at blang.MCMCRunner.run(MCMCRunner.java:25)
at conifer.TestPhyloModel.main(TestPhyloModel.java:80)
The data files used may be found here:
Multiple Alignment File
The tree
To fix the topology but have the branch-length as latent variables, is the following correct?
Exclude the moves that propose a new topology, namely the following:
factory.excludeNodeMove((Class<? extends Move>) (Object) SingleNNI.class);
factory.excludeNodeMove(SPRMove.class);
Related to issue #22.
TestSimplePhyloModel fails. Unclear when this got broken (many recent
changes seem to have been pushed without running the test suite).
Many blocks contain nearly duplicate versions of large chunks code, making the project harder to maintain.
For example:
ExpFamMixture.proteinGTR()
vs ``ExpFamMixture.dnaGTR()`ExpFamMixture.simpleProteinPairStateIndexer()
vs dna/protein counterpartPull the common blocks into helper private function (static if possible).
Unless there is something wrong on my side, there appears to be missing a RunFacility
file.
In the future, please test code using gradle test
to avoid compilation issues.
Is it possible that the bayonet version dependency in gradle is not updated? For example, NumericalUtils.getNormalization
does not appear in bayonet-2.0.3.jar
but is called on line 57 of EigenCTMC.java
.
I do, however, see this function in the corresponding github repo for bayonet.
This question is in regards to a basic phylo model, found as a gist here.
It is basic in the sense that it is a striped down version of SimplePhyloModel
(ie. does not exclude nodes etc), and I am using it to test the rate matrix sampling of ExpFamMixture.
My question is two part:
(I) Although a kimura1980 model is specified, all of the rate matrix entries are treated as parameters; whereas, K1980 is a one-parameter family of evolutionary models;
(II) The statio parameters are <0 for some of the states. How is this possible / what am I missing?
@sohrabsa : I thought you might have experience with this?
I assume that K1980 is only a template that the initialization is built off?
[Aside: to successfully run the gist with readable coda plots you need bayonet 2.3.5, but this has not yet been pushed to Alex's maven repo. As a workaround, build bayonet 2.3.5 and add this library in eclipse]
For example, the following could not be parsed:
((A:1,(X:1,E:1):1):1,(C:1,B:2):1);
Various other problems (stack overflow when duplicate leaf names, bad error message too for empty leaves, etc)
Only source code should go in the git repository (this include non-java source code such as JSON specification of models). To share data, the best solution is to commit scripts that download and preprocess the data.
A few small files could potentially be added so that the user can run the code right away, but only for this tutorial-type purpose. These files should not be in the root of the repository, which we should try to keep clean as much as possible.
Is the following the correct way to stop a variable from being sampled?
To have both the topology and branch-length variables fixed, remove the priors over them, for instance in SimplePhyloModel.java
, remove the treePrior
and branchLengthHyperParameter
factors by removing the following lines:
@DefineFactor
NonClockTreePrior<RateParameterization> treePrior =
NonClockTreePrior
.on(likelihood.tree);
@DefineFactor
Exponential<Exponential.MeanParameterization> branchLengthHyperPrior =
Exponential
.on(treePrior.branchDistributionParameters.rate)
.withMean(10.0);
Commit 53a1ef3 breaks compatibility with Java 1.6. The feature breaking compatibility (System.lineSeparator()
) does not seem to be worth the break. I would suggest to wait that we decide that we need 1.8 before breaking compatibility with 1.6.
It appears that the ggplot-like plots fail for small chains (<10)
Two different errors:
I. For chain length = 2
Error in exists(name, env) : argument "env" is missing, with no default
Calls: ggmcmc ... ggplot_build -> scales_add_missing -> find_global -> exists
II. For chain length = 6
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 1, 0
Calls: ggmcmc ... ggs_compare_partial -> rbind -> cbind -> cbind -> data.frame
Simple sol'n: change first plotting interval to >10., that is change this line
This is not a fatal error as it does not break compilation or successful running of the mcmc algo. I have asked this as a question in case there are unintended consequences (I've tested locally without issues).
tl;dr: the issue can be fixed by
GLMExample.bl
to GLMExample.bl.BAD
git clone [email protected]:alexandrebouchard/conifer.git
cd conifer
./gradlew --stop
This triggers downloading gradle 4.2.1, which after unpacking fails with
FAILURE: Build failed with an exception.
* What went wrong:
Could not determine java version from '11.0.7'.
11.0.7 is my java version
$ java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)
I am able to run ./gradlew clean
, which now fails with
FAILURE: Build completed with 2 failures.
1: Task failed with an exception.
-----------
* Where:
Build file '/home/mbiron/opt/conifer/build.gradle' line: 110
* What went wrong:
A problem occurred evaluating root project 'conifer'.
> No such property: archiveTask for class: org.gradle.api.internal.artifacts.dsl.LazyPublishArtifact
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
==============================================================================
2: Task failed with an exception.
-----------
* What went wrong:
A problem occurred configuring root project 'conifer'.
> Failed to notify project evaluation listener.
> 'java.lang.String org.gradle.api.tasks.compile.CompileOptions.getBootClasspath()'
> 'java.lang.String org.gradle.api.tasks.compile.CompileOptions.getBootClasspath()'
build.gradle
in blangSDK and inits reposNow I can successfully run ./gradlew clean
. However, ./gradlew installDist
fails with
> Task :generateXtext
ERROR:Type mismatch: cannot convert from MultiCategorySubstitutionModel<ExpFamMixture> to SequenceAlignment (file:/home/mbiron/opt/conifer/src/main/java/conifer/GLMExample.bl line : 20 column : 5)
ERROR:Type mismatch: cannot convert from SequenceAlignment to MultiCategorySubstitutionModel<ExpFamMixture> (file:/home/mbiron/opt/conifer/src/main/java/conifer/GLMExample.bl line : 20 column : 24)
WARNING:The import 'conifer.io.TreeObservations' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 3 column : 1)
WARNING:The import 'conifer.moves.PhyloHMCMove' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 6 column : 1)
> Task :generateXtext FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':generateXtext'.
> Xtext validation failed, see build log for details.
It looks like GLMExample.bl
is attempting to do something which has now been forbidden in Java 11.
GLMExample.bl.BAD
Now installDist successfully completes Task :generateXtext
but fails at
> Task :startScripts FAILED
FAILURE: Build failed with an exception.
* What went wrong:
A problem was found with the configuration of task ':startScripts'.
> No value has been specified for property 'mainClassName'.
Now it works!
$ ./gradlew installDist
> Task :generateXtext
WARNING:The import 'conifer.io.TreeObservations' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 3 column : 1)
WARNING:The import 'conifer.moves.PhyloHMCMove' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 6 column : 1)
> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
BUILD SUCCESSFUL in 30s
5 actionable tasks: 5 executed
You can find the working build.gradle here (I had to add .txt to it so that github allowed it).
In NonClockTreePrior.generate()
(similar bug in legacy), trees with balanced topologies are over-generated (histories, not topologies are uniform).
Currently the jebl 2.0
jar file that is used for majority rule tree construction is downloaded upon building from a git
repository. It would be more elegant and less code in the gradle build file if it had it's own repository (e.g., Maven).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.