claczny / vizbin
Repository of our application for human-augmented binning
did changing the email notification work?
Hello Cedric,
I want to re-plot my clusters with new colors instead of the default blue. So I created a label file in which I left non-target sequences as 0 and used red, yellow, green, orange, black, pink, purple, cyan, and gold for the different clusters. However, in the visualisation there were many more red dots than I expected and some colors didn't show. If I didn't provide the label information, all sequences would be blue dots, but now it seems like red is the default color. Also, the yellow-green fusion region can be divided into 2 clusters by metabat (https://bitbucket.org/berkeleylab/metabat/wiki/Home).
May I know the available colors and markers in VizBin? I referred to the color set in ggplot2 but clearly some color names couldn't be used in VizBin.
Running VizBin in graphical mode initializes the $HOME/.vizbin directory.
However, this step seems to be missing when VizBin is used for the first time on a machine and via command-line only.
Not sure how best to solve this, but the initialization should probably happen automatically and print a message to the shell.
The message is likely to be missed if the subsequent steps progress quickly (the terminal will simply scroll down, thus making the initialization message disappear) but it is there in the log.
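A minimal sketch of such an automatic initialization, assuming that creating the directory is all that is needed (the thread does not say what else the GUI sets up). The class and method names are hypothetical and not part of VizBin; the demo `main` deliberately uses a temporary directory instead of the real home directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of VizBin: creates the settings directory on first use.
final class ConfigDirInit {

    /** Creates homeDir/.vizbin if it is missing; returns true only when it was newly created. */
    static boolean ensureConfigDir(Path homeDir) {
        Path configDir = homeDir.resolve(".vizbin");
        if (Files.isDirectory(configDir)) {
            return false; // already initialized, nothing to do
        }
        try {
            Files.createDirectories(configDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Print to the shell so command-line users see that initialization happened.
        System.err.println("Initialized VizBin settings directory: " + configDir);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Demo against a throwaway directory standing in for $HOME.
        Path fakeHome = Files.createTempDirectory("vizbin-home");
        ensureConfigDir(fakeHome);
    }
}
```

Calling this at the start of the command-line entry point as well as the GUI entry point would make both paths behave the same on a fresh machine.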
Currently, VizBin stores the FASTA sequences in memory. This typically works well for current (i.e., short read-based, say Illumina) metagenomic assembly results.
However, third-generation sequencing (e.g., PacBio, ONT) may result in numerous AND long sequences. While this is supposedly not a problem once the composition of the sequences (k-mer profile) is computed, storing numerous AND long sequences in memory for later export (of the selected bins) is problematic.
Hence, integrating compression of the FASTA sequences is encouraged. In theory, current metagenomic assembly-based results should also benefit from this through a reduced memory footprint. This in turn would allow using less memory, or it would enable larger datasets to be run with restricted resources. In that sense, the CLR-transformation, dimension reduction, etc. are not the bottlenecks but rather the import and temporary storage of the input sequences.
Which algorithm to use for the compression remains to be seen!
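To make the idea concrete, here is a sketch that keeps each sequence as a GZIP-compressed byte array and only inflates it on export. GZIP is merely a stand-in that ships with the JDK, not a recommendation for the final algorithm, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical holder: stores one sequence compressed, inflates it only when needed.
final class CompressedSequence {
    private final byte[] gzipped;

    CompressedSequence(String sequence) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(sequence.getBytes(StandardCharsets.US_ASCII));
            }
            this.gzipped = buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bytes held in memory for this sequence. */
    int compressedSize() {
        return gzipped.length;
    }

    /** Restores the original sequence, e.g. when exporting a selected bin. */
    String decompress() {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The k-mer profile would be computed once from the uncompressed string during import; after that only the compressed bytes need to stay in memory until export.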
It would be nice to have a way to run VizBin in "non-interactive" mode. That means that the user should be able to set the values of the individual fields (e.g., textfield_file) at the command line (e.g., java -jar -Dtextfield_file=/path/to/file.fasta) and then let the whole thing run through all steps without showing the GUI and without requiring a mouse-click on the "Start"-button to execute the run.
This is particularly useful when wanting to integrate VizBin into a pipeline (e.g., to visualize individual clusters as they are returned from an automated binning algorithm). Of course, then, the user would not immediately want to do any polygonal selection or such but rather have the points.txt file saved for later use.
This might be related to #11.
It would be nice if the user were to be able to access other bh_sne parameters such as "perplexity", etc.
-Shaman-
Dear Cedric,
I'm really excited to try VizBin, but I run into the following problem when performing the binning: The program fails to use the "Minimal contig length" option! This is very frustrating.
The "contigs.fa" file I'm using contains sequences of different length, from < 300 to > 200,000.
In the .log-file (see attachment), I get the following:
2017-11-22 16:51:31,001 WARN [AWT-EventQueue-0] (MainFrame.java:893) - Invalid minimal contig length value: 2 000. Using value: 1000
I can of course extract sequences above a certain length myself, but that would be a very slow process compared to specifying the length as a parameter, especially since I need to optimize the clustering by trying out different sequence lengths.
Do you have any suggestions for how to fix this?
Kind regards,
Even Sannes Riiser,
PhD candidate,
University of Oslo, Norway
Provide a "Menu" entry that allows to visualize a legend, probably in a separate window which then pops up.
Is there a way to adjust the learning rate parameter? Also, what is the default learning rate parameter? I checked the publication but I didn't see it in there.
We have realized that using VizBin on high-resolution screens (e.g., MacBook Pro with Retina display) can lead to artifacts in the visualization. These artifacts are purely visual and do not affect the functionality of VizBin.
We are currently working on fixing this issue. Should you experience such artifacts, please let us know so we can keep you posted about the fix as soon as it becomes available.
Stay tuned!
foo bar
When providing a labels file, the points get colored differentially as expected, but the legend is missing/wrong.
Best,
Cedric
Hi,
I am currently using VizBin on a French keyboard, and a space is automatically added between the thousands and hundreds digits, which prevents me from setting a minimal contig length higher than 1000. I tried manually adding a comma in place of the space but this only makes the input number smaller (example: 3,500 becomes 3,5). I also tried changing my keyboard from Azerty to Qwerty but with no luck...
I have included a screenshot of the setting which shows the space automatically generated. Any help would be greatly appreciated!
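The log line quoted earlier ("Invalid minimal contig length value: 2 000. Using value: 1000") suggests the field text is parsed with the grouping separator still in it. A possible workaround, sketched under that assumption, is to strip any whitespace-like grouping characters before parsing. The class is hypothetical and is not the code in MainFrame.java.

```java
// Hypothetical parser: tolerates locale-specific digit grouping such as "2 000".
final class ContigLengthParser {

    /**
     * Parses a minimal contig length, ignoring ordinary spaces, no-break spaces (U+00A0)
     * and narrow no-break spaces (U+202F). Returns the fallback for anything that is
     * still not a positive integer afterwards.
     */
    static int parse(String text, int fallback) {
        String digits = text.replaceAll("[\\s\\u00A0\\u202F]", "");
        try {
            int value = Integer.parseInt(digits);
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2 000", 1000));
    }
}
```

Commas are deliberately not stripped here, because in a French locale a comma is the decimal separator and silently dropping it would change the meaning of the input.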
Depending on the availability of additional information (such as coverage; length is trivially available), it would be good to have the functionality to respect this in the plotting.
Ideas include using point size, transparency, color, shape to reflect different things. Length could be point size, coverage could be transparency. Sequences with particular functions could be depicted as star-like shapes.
We will need to define how the user can/should provide this information and how the different information sources (coverage, function, etc.) will then get represented.
When performing a manual selection (or even once semi-automated clustering is integrated) and then exporting the selection, a screenshot should be created simultaneously for reproducibility reasons.
Currently, the user has to save a screenshot separately. While this is readily working, it is not particularly user-friendly.
On a dataset comprising around 70k sequences, I got the following error, despite using java -jar -Xmx3g VizBin-dist.jar
java.lang.OutOfMemoryError: Java heap space
at org.ejml.data.DenseMatrix64F.<init>(Unknown Source)
at org.ejml.alg.dense.decomposition.svd.SvdImplicitQrDecompose.getW(Unknown Source)
at org.ejml.alg.dense.decomposition.svd.SvdImplicitQrDecompose.getW(Unknown Source)
at lcsb.vizbin.service.utils.PrincipleComponentAnalysis.computeBasis(PrincipleComponentAnalysis.java:137)
at lcsb.vizbin.service.utils.DataSetUtils.computePca(DataSetUtils.java:260)
at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:153)
Suggested solution:
Implement a different way of computing the initial PCA-based dimension reduction. Maybe using something along the lines of http://math.nist.gov/javanumerics/jama/ and http://sujitpal.blogspot.com/2008/10/ir-math-in-java-cluster-visualization.html
[EDIT]
See also http://stackoverflow.com/questions/529457/performance-of-java-matrix-math-libraries for other alternatives, in particular JBLAS.
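One memory-lean direction, sketched here as an assumption rather than as what VizBin does: accumulate the d x d covariance matrix of the signatures row by row, so that the working memory scales with the number of columns (at most 4^5 = 1024 for a k-mer length of 5) instead of with the roughly 70k rows. The class name is hypothetical, and power iteration is used only to keep the example dependency-free. It yields just the leading component, so a real replacement would still need a symmetric eigen-solver from one of the libraries linked above to obtain all requested PCA columns.

```java
import java.util.Random;

// Hypothetical sketch: PCA via the small covariance matrix instead of an SVD of the full data.
final class CovariancePcaSketch {

    /** Column means of an n x d data matrix. */
    static double[] columnMeans(double[][] data) {
        int d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= data.length;
        }
        return mean;
    }

    /** Sample covariance (d x d), accumulated one row at a time. */
    static double[][] covariance(double[][] data, double[] mean) {
        int d = mean.length;
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int a = 0; a < d; a++) {
                double da = row[a] - mean[a];
                for (int b = a; b < d; b++) {
                    cov[a][b] += da * (row[b] - mean[b]);
                }
            }
        }
        for (int a = 0; a < d; a++) {
            for (int b = a; b < d; b++) {
                cov[a][b] /= (data.length - 1);
                cov[b][a] = cov[a][b]; // mirror the upper triangle
            }
        }
        return cov;
    }

    /** Leading eigenvector of a symmetric matrix by power iteration (unit length). */
    static double[] leadingEigenvector(double[][] cov, int iterations) {
        int d = cov.length;
        Random random = new Random(0); // fixed seed keeps the sketch reproducible
        double[] v = new double[d];
        for (int a = 0; a < d; a++) {
            v[a] = random.nextGaussian();
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            double norm = 0.0;
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    next[a] += cov[a][b] * v[b];
                }
                norm += next[a] * next[a];
            }
            norm = Math.sqrt(norm);
            if (norm == 0.0) {
                return v; // zero matrix: no direction to find
            }
            for (int a = 0; a < d; a++) {
                next[a] /= norm;
            }
            v = next;
        }
        return v;
    }
}
```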
Comparing PrincipleComponentAnalysisEJML.java of revision01 to PrincipleComponentAnalysis.java in rc01, there seems to be a superfluous samples field in revision01. This leads to storing the entire genomic signature information again and can thus cause problems with the Java VM heap size, basically ending in an Out-of-memory error, e.g. for -Xmx3g and around 88k points.
The samples field is currently not used in PrincipleComponentAnalysisEJML.java, as only double[] sampleToEigenSpace(double[] sampleData); is called but not double[] sampleToEigenSpace(int sample); which actually makes use of the samples field. Should we want to use this functionality, getting A[i] and adding the mean[] is probably better, as it saves quite a bit of memory.
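The suggestion of rebuilding a sample from A[i] and mean[] could look like the following sketch. It assumes that A holds the mean-centered data with one sample per row, which the thread does not state explicitly; the class and method names are hypothetical.

```java
// Hypothetical helper: recovers an original sample without keeping a second copy of the data.
final class CenteredSamples {

    /** Returns row i of the centered matrix with the column means added back. */
    static double[] originalSample(double[][] centered, double[] mean, int i) {
        double[] sample = new double[mean.length];
        for (int j = 0; j < mean.length; j++) {
            sample[j] = centered[i][j] + mean[j];
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] centered = { {-1.0, 1.0}, {1.0, -1.0} };
        double[] mean = {2.0, 3.0};
        System.out.println(java.util.Arrays.toString(originalSample(centered, mean, 0)));
    }
}
```

The result could then be passed to sampleToEigenSpace(double[] sampleData), so the index-based overload no longer needs its own samples field.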
VizBin supports the visualization of multiple datasets at the same time. However, the user may easily lose track of which visualization belongs to which dataset.
Hence, it would be nice to have the name of the dataset (à la basename input.fa) as the window title or such.
Moreover, it would be nice if VizBin would automatically suggest a filename for the export based on the input filename. This would help to minimize errors when working with multiple datasets.
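Both requests boil down to deriving names from the input path. A small sketch of that follows; the class, the method names and the ".binN.fa" naming pattern are purely illustrative assumptions.

```java
import java.nio.file.Paths;

// Hypothetical helper: derives a window title and an export filename from the input path.
final class DatasetNames {

    /** File name without its directory, like the shell's basename. */
    static String baseName(String inputPath) {
        return Paths.get(inputPath).getFileName().toString();
    }

    /** Suggests an export name such as "input.bin1.fa" for "/some/dir/input.fa". */
    static String suggestedExportName(String inputPath, int binNumber) {
        String name = baseName(inputPath);
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        return stem + ".bin" + binNumber + ".fa";
    }

    public static void main(String[] args) {
        System.out.println(baseName("data/input.fa"));
        System.out.println(suggestedExportName("data/input.fa", 1));
    }
}
```

The first value could be passed to JFrame.setTitle, the second to JFileChooser.setSelectedFile when the export dialog opens.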
Greetings
I used VizBin in the past, and it has never disappointed. It is a very good tool to visualize a spatial dispersion of the contigs, and very easy to install and use. However, I recently had to reformat my server, and when trying to reinstall VizBin as instructed in #15 , the ant build runs out of memory.
-verify-automatic-build:
-pre-pre-compile:
-pre-compile:
-copy-persistence-xml:
-compile-depend:
-do-compile:
[javac] Compiling 45 source files to /mnt/HDDStorage/jsequeira/VizBin/src/interface/VizBin/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 7
[javac] 1 warning
[javac]
[javac]
[javac] The system is out of resources.
[javac] Consult the following stack trace for details.
[javac] java.lang.OutOfMemoryError: Java heap space
[javac] at java.base/java.util.TimeZone.clone(TimeZone.java:753)
[javac] at java.base/sun.util.calendar.ZoneInfo.clone(ZoneInfo.java:639)
[javac] at java.base/java.util.TimeZone.getDefault(TimeZone.java:642)
[javac] at java.base/java.time.ZoneId.systemDefault(ZoneId.java:272)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipUtils.dosToJavaTime(ZipUtils.java:122)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.cen(ZipFileSystem.java:1960)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:1947)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getEntry(ZipFileSystem.java:1334)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:314)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipPath.getAttributes(ZipPath.java:727)
[javac] at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:293)
[javac] at java.base/java.nio.file.Files.readAttributes(Files.java:1763)
[javac] at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
[javac] at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
[javac] at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
[javac] at java.base/java.nio.file.Files.walkFileTree(Files.java:2760)
[javac] at jdk.compiler/com.sun.tools.javac.file.JavacFileManager$ArchiveContainer.<init>(JavacFileManager.java:520)
[javac] at jdk.compiler/com.sun.tools.javac.file.JavacFileManager.getContainer(JavacFileManager.java:316)
[javac] at jdk.compiler/com.sun.tools.javac.file.JavacFileManager.list(JavacFileManager.java:712)
[javac] at jdk.compiler/com.sun.tools.javac.code.ClassFinder.list(ClassFinder.java:734)
[javac] at jdk.compiler/com.sun.tools.javac.code.ClassFinder.scanUserPaths(ClassFinder.java:678)
[javac] at jdk.compiler/com.sun.tools.javac.code.ClassFinder.fillIn(ClassFinder.java:548)
[javac] at jdk.compiler/com.sun.tools.javac.code.ClassFinder.complete(ClassFinder.java:299)
[javac] at jdk.compiler/com.sun.tools.javac.code.ClassFinder$$Lambda$170/0x000000080018a040.complete(Unknown Source)
[javac] at jdk.compiler/com.sun.tools.javac.code.Symbol.complete(Symbol.java:642)
[javac] at jdk.compiler/com.sun.tools.javac.code.Symbol$PackageSymbol.members(Symbol.java:1131)
[javac] at jdk.compiler/com.sun.tools.javac.code.Symtab.listPackageModules(Symtab.java:832)
[javac] at jdk.compiler/com.sun.tools.javac.comp.Enter.visitTopLevel(Enter.java:345)
[javac] at jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:529)
[javac] at jdk.compiler/com.sun.tools.javac.comp.Enter.classEnter(Enter.java:286)
[javac] at jdk.compiler/com.sun.tools.javac.comp.Enter.classEnter(Enter.java:301)
[javac] at jdk.compiler/com.sun.tools.javac.comp.Enter.complete(Enter.java:576)
BUILD FAILED
/path/to/VizBin/src/interface/VizBin/nbproject/build-impl.xml:920: The following error occurred while executing this line:
/path/to/VizBin/src/interface/VizBin/nbproject/build-impl.xml:260: Compile failed; see the compiler error output for details.
Total time: 583 minutes 13 seconds
Since there have been no commits ever since I got this to work, I am wondering what could cause this problem. Has this been encountered when testing? I really would like to have this solved before integrating VizBin in my omics pipeline.
Thank you for your attention!
When I take a fasta file that contains sequences that are below the length threshold (e.g., 1 knt) and want to use a matching labels file, I get the following error:
2014-09-25 16:56:35,995 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Fitting performed in 0.00 seconds.
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Wrote the 60079 x 2 data matrix successfully!
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE:
2014-09-25 16:56:36,525 DEBUG [Thread-0] (ProcessInput.java:88) - Points created.
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
at java.util.ArrayList.rangeCheck(ArrayList.java:638)
at java.util.ArrayList.get(ArrayList.java:414)
at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)
2014-09-25 16:56:36,759 DEBUG [Thread-0] (ProcessInput.java:88) - Error! Check the logs.
2014-09-25 16:56:36,761 ERROR [Thread-0] (ProcessInput.java:250) - Index: 60079, Size: 60079
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
at java.util.ArrayList.rangeCheck(ArrayList.java:638)
at java.util.ArrayList.get(ArrayList.java:414)
at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)
When I do the length filtering before and include a matching labels file, VizBin runs through.
Probably that's a bug in connecting the labels with the sequences where the labels are not properly matched to the length-selected sequences.
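If that diagnosis is right, one fix would be to filter the labels in lockstep with the sequences, so that both lists keep the same length and order. The sketch below assumes one label per input sequence in file order and an inclusive length threshold; neither assumption is confirmed by the thread, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the labels of sequences that pass the length filter.
final class LabelFilter {

    static List<String> labelsForKeptSequences(List<String> sequences,
                                                List<String> labels,
                                                int minLength) {
        if (sequences.size() != labels.size()) {
            throw new IllegalArgumentException("Expected one label per sequence, got "
                    + labels.size() + " labels for " + sequences.size() + " sequences");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sequences.size(); i++) {
            // Apply the same threshold that is used for the sequences themselves.
            if (sequences.get(i).length() >= minLength) {
                kept.add(labels.get(i));
            }
        }
        return kept;
    }
}
```

With this, the number of labels always matches the number of points written to points.txt, which should avoid the index running one past the end of the list.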
Tried the new combo box on DaVis_testdat.fa, DaVis_testdat.length.ann, and DaVis_testdata.points.txt but got giant shapes no matter whether I chose Yes or No. Trying it without a points file did not work either.
Just downloaded the latest version from GitHub:
https://github.com/claczny/VizBin
when running the setup I got these errors:
Downloads/VizBin/setupUbuntu.sh
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libgsl-dev' instead of 'libgsl0-dev'
Package openjdk-7-jre is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
Package libgsl0ldbl is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
libgsl2 libgsl2:i386
Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'openjdk-7-jre' has no installation candidate
E: Package 'libgsl0ldbl' has no installation candidate
E: Package 'openjdk-7-jdk' has no installation candidate
real 0m0.399s
user 0m0.376s
sys 0m0.012s
fatal: destination path 'osxcross' already exists and is not an empty directory.
Downloads/VizBin/setupUbuntu.sh: line 14: cd: tarballs: No such file or directory
Downloads/VizBin/setupUbuntu.sh: line 18: ./build.sh: No such file or directory
real 0m0.000s
user 0m0.000s
sys 0m0.000s
Downloads/VizBin/setupUbuntu.sh: line 19: ./build_gcc.sh: No such file or directory
real 0m0.000s
user 0m0.000s
sys 0m0.000s
mkdir: cannot create directory ‘boost’: File exists
--2019-12-03 12:06:39--  http://downloads.sourceforge.net/project/boost/boost/1.55.0/boost_1_55_0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fboost%2Ffiles%2Fboost%2F1.55.0%2F&ts=1402397648&use_mirror=softlayer-ams
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 216.105.38.13
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|216.105.38.13|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-12-03 12:06:39 ERROR 403: Forbidden.
fatal: destination path 'mxe' already exists and is not an empty directory.
make: *** No rule to make target 'gcc'. Stop.
real 0m0.003s
user 0m0.000s
sys 0m0.000s
I already have openjdk-8 (jdk and jre) installed and cannot downgrade to openjdk-7 because I am using Linux Mint Sarah 18
My (Mint) system:
4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Any tips?
When zooming in deeply and doing a polygonal selection, it seems that the decision whether a point belongs to a cluster (contains()) becomes problematic:
The polygon clearly contains more than 2 sequences.
This seems to happen only when zooming in very deeply so it may not be a frequent problem. Nevertheless, this needs to be checked/fixed.
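The thread does not identify the cause. One thing worth checking, purely as an assumption, is whether the membership test runs on integer screen coordinates (java.awt.Polygon stores int coordinates), where distinct points can collapse onto the same pixel. A sketch of doing the test in data coordinates with double precision follows; the class is hypothetical and the flat x, y array mirrors how the JFreeChart demo further down collects polygon vertices.

```java
import java.awt.geom.Path2D;

// Hypothetical helper: polygon membership tested in data space rather than pixel space.
final class DataSpaceSelection {

    /** Builds a closed polygon from a flat array of x, y pairs in data coordinates. */
    static Path2D.Double polygon(double[] xy) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(xy[0], xy[1]);
        for (int i = 2; i + 1 < xy.length; i += 2) {
            path.lineTo(xy[i], xy[i + 1]);
        }
        path.closePath();
        return path;
    }

    /** Counts the points (rows of x, y) that lie inside the polygon. */
    static int countInside(Path2D polygon, double[][] points) {
        int inside = 0;
        for (double[] p : points) {
            if (polygon.contains(p[0], p[1])) {
                inside++;
            }
        }
        return inside;
    }
}
```

Because the vertices and the points stay in the same coordinate system, the result does not depend on the current zoom level.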
Currently, the results of the dimensionality reduction are stored in a temporary folder, the log is stored in the home-folder, and the sequence-input in some arbitrary folder. To improve reproducibility and ease-of-use, functionality to load/save a run should be provided.
For saving, the entire log of the current session, the 2D coordinates (points.txt), a copy of the sequences passing the threshold, and the associated (if available) annotation should be stored in a (via a dialog) user-specified folder. Loading should take the sequence file, the 2D coordinates and the annotation and display the resulting embedding in an according visualization without recomputing the embedding.
Hi claczny- thanks for creating such an interesting tool. I'm very curious to give it a try. I have tried downloading both the stand alone via the "Download app" on your homepage as well as cloned your github repository and am running into trouble.
Double-clicking the .jar does nothing; java -jar VisBin-dist.jar produces
Exception in thread "main" java.lang.UnsupportedClassVersionError: lu/uni/lcsb/vizbin/Main : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: lu.uni.lcsb.vizbin.Main. Program will exit.
Any advice would be great.
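For context, class-file major version 51 corresponds to Java 7, so this error typically means the java command on the PATH belongs to an older runtime (java -version shows which one). The sketch below reads the version from the first eight bytes of a .class file; the class name is hypothetical, and the major minus 44 mapping holds for Java 5 and later.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: inspects the header of a compiled .class file.
final class ClassFileVersion {

    /** Returns the major version stored in bytes 6 and 7 of a class-file header. */
    static int majorVersion(byte[] header) {
        ByteBuffer buffer = ByteBuffer.wrap(header); // big-endian by default
        if (header.length < 8 || buffer.getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a Java class file");
        }
        return buffer.getShort(6) & 0xFFFF;
    }

    /** Minimum Java release able to load that major version (valid for major >= 49). */
    static int requiredJavaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
        System.out.println("Needs Java " + requiredJavaRelease(majorVersion(header)));
    }
}
```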
Hi!
This error appears when uploading my contigs fasta file and an annotation file with label and gc columns. I am attaching the annotation file.
2016-05-10 16:08:38,792 ERROR Thread-0 - Unknonw column header: 67
lcsb.vizbin.service.InvalidMetaFileException: Unknonw column header: 67
at lcsb.vizbin.service.DataSetFactory.createDataSetFromFastaFile(DataSetFactory.java:321)
at lcsb.vizbin.service.DataSetFactory.createDataSetFromFastaFile(DataSetFactory.java:212)
at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:134)
2016-05-10 16:08:38,795 DEBUG Thread-0 - Error! Check the logs.
annotation_for_VizBin_gc.csv.txt
When running the program without the annotation file, it runs correctly but warns about:
Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Help is very welcome!
I get the following message when running VizBin (revision01) on a Windows or Linux machine (w/ Java 7):
2014-12-01 08:39:08,619 DEBUG [Thread-0] (ProcessInput.java:88) - Running PCA... (Mtj)
Dec 01, 2014 8:39:09 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Dec 01, 2014 8:39:09 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
2014-12-01 08:39:14,801 DEBUG [Thread-0] (DataSetUtils.java:249) - DONE: Computed the new basis.
2014-12-01 08:39:15,127 DEBUG [Thread-0] (DataSetUtils.java:256) - DONE: Projected from sample to eigen space.
The program returns a visualization, so apparently something is running and creating the initial projection. However, this behavior is not optimal.
I also came across the following posts but neither of them provided a clear solution for me, for now.
fommil/matrix-toolkits-java#38
fommil/matrix-toolkits-java#50
Below will be some snapshots/ideas that I found to be important to note down before the merge.
JFreeChart appears to be solving many of the other issues (e.g., #3 or #1).
Here is our first attempt to use a scatter plot in combination with the ability to draw a polygon:
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Point;
import java.awt.Stroke;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import javax.swing.JFrame;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.ChartRenderingInfo;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.chart.plot.XYPlot;
import org.jfree.chart.renderer.xy.XYLineAndShapeRenderer;
import org.jfree.data.xy.XYDataset;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;
import org.jfree.ui.RectangleEdge;
// NEW
import org.jfree.chart.annotations.XYPolygonAnnotation;
import org.jfree.chart.axis.ValueAxis;
import java.util.ArrayList;

public class MouseMarkerDemo extends JFrame {
    /**
     *
     */
    private static final long serialVersionUID = 1L;

    public MouseMarkerDemo(String title) {
        super(title);
        JPanel chartPanel = createDemoPanel();
        chartPanel.setPreferredSize(new java.awt.Dimension(500, 270));
        setContentPane(chartPanel);
    }

    private final static class MouseMarker extends MouseAdapter{
        private ArrayList<Double> edges = new ArrayList<Double>();
        private XYPolygonAnnotation a1;
        private final XYPlot plot;
        private final JFreeChart chart;
        private final ChartPanel panel;

        public MouseMarker(ChartPanel panel) {
            this.panel = panel;
            this.chart = panel.getChart();
            this.plot = chart.getXYPlot();
            this.plot.setDomainGridlinesVisible(false);
            this.plot.setRangeGridlinesVisible(false);
            this.plot.setBackgroundPaint(Color.white);
        }

        private void clearMarker(){
            if( a1 != null)
            {
                plot.removeAnnotation(a1);
            }
        }

        private void updateMarker(){
            Stroke stroke1 = new BasicStroke(2.0f);
            if (a1 != null)
            {
                plot.removeAnnotation(a1);
            }
            Double[] edgesArr = new Double[edges.size()];
            edgesArr = edges.toArray(edgesArr);
            double[] tempArray = new double[edges.size()];
            int i = 0;
            for(Double d : edges) {
                tempArray[i] = (double) d;
                i++;
            }
            a1 = new XYPolygonAnnotation(tempArray, stroke1, Color.red, new Color(200, 200, 200, 130));
            plot.addAnnotation(a1);
        }

        public void mouseClicked(MouseEvent e) {
            if(SwingUtilities.isMiddleMouseButton(e)){
                int selectedValue = JOptionPane.showConfirmDialog(null,"Click 'Yes' to export current selection.\nClick 'No' to continue selecting.\nClick 'Cancel' to discard current selection.", "What to do with current selection?", JOptionPane.YES_NO_CANCEL_OPTION);
                switch(selectedValue)
                {
                    case JOptionPane.YES_OPTION: System.out.println("YES");
                        break;
                    case JOptionPane.NO_OPTION: System.out.println("NO");
                        break;
                    default: edges.clear();
                        clearMarker();
                        break;
                }
            }
        }

        @Override
        public void mouseReleased(MouseEvent e) {
            if(SwingUtilities.isLeftMouseButton(e)){
                // Motivated by http://www.jfree.org/phpBB2/viewtopic.php?p=54140
                int mouseX = e.getX();
                int mouseY = e.getY();
                // DEBUG
                //System.out.println("x = " + mouseX + ", y = " + mouseY);
                Point2D p = panel.translateScreenToJava2D(
                        new Point(mouseX, mouseY));
                XYPlot plot = (XYPlot) chart.getPlot();
                ChartRenderingInfo info = panel.getChartRenderingInfo();
                Rectangle2D dataArea = info.getPlotInfo().getDataArea();
                ValueAxis domainAxis = plot.getDomainAxis();
                RectangleEdge domainAxisEdge = plot.getDomainAxisEdge();
                ValueAxis rangeAxis = plot.getRangeAxis();
                RectangleEdge rangeAxisEdge = plot.getRangeAxisEdge();
                double chartX = domainAxis.java2DToValue(p.getX(), dataArea,
                        domainAxisEdge);
                double chartY = rangeAxis.java2DToValue(p.getY(), dataArea,
                        rangeAxisEdge);
                // DEBUG
                //System.out.println("Chart: x = " + chartX + ", y = " + chartY);
                edges.add(chartX);
                edges.add(chartY);
                System.out.println(edges.size());
                updateMarker();
            }
        }
    }

    private static XYDataset createDataset() {
        XYSeriesCollection dataset = new XYSeriesCollection();
        /** A constant for the number of items in the sample dataset. */
        int COUNT = 5000;
        XYSeries dataXYSeries = new XYSeries("Data");
        for (int i = 0; i < COUNT; i++) {
            float x = (float) i;
            float y = (float) Math.random() * COUNT;
            dataXYSeries.add(x, y);
        }
        dataset.addSeries(dataXYSeries);
        return dataset;
    }

    private static JFreeChart createChart(XYDataset dataset) {
        JFreeChart chart = ChartFactory.createScatterPlot(
                "Mouse Marker",
                "X",
                "Y",
                dataset,
                PlotOrientation.VERTICAL,
                true,
                true,
                false
        );
        XYPlot plot = (XYPlot) chart.getPlot();
        plot.setDomainPannable(true);
        plot.setRangePannable(true);
        XYLineAndShapeRenderer renderer
                = (XYLineAndShapeRenderer) plot.getRenderer();
        renderer.setBaseShapesVisible(true);
        renderer.setBaseShapesFilled(true);
        return chart;
    }

    public static JPanel createDemoPanel() {
        final JFreeChart chart = createChart(createDataset());
        final ChartPanel panel = new ChartPanel(chart);
        panel.setRangeZoomable(true);
        panel.setDomainZoomable(true);
        panel.setMouseWheelEnabled(true);
        panel.addMouseListener(new MouseMarker(panel));
        return panel;
    }

    public static void main(String[] args) {
        MouseMarkerDemo demo = new MouseMarkerDemo("JFreeChart: MouseMarkerDemo.java");
        demo.pack();
        demo.setVisible(true);
    }
}
And the resulting plot looks like this
Since the edges of the polygon seem to have the right coordinates relative to the actual points, it should be feasible to integrate it with our current solution for selecting points contained within a polygon.
Annotation file
After maxbin, the binned files were concatenated into one to be an input file for Vizbin. Headers being contig ids. (eg >k123, >k99, etc)
Annotation file was made starting with labels as the first line and then 1-11 as categorical variables. But after multiple tries, it still gives just blue and red color. Can you please help me figure it out ?
Thank you very much
When launching VizBin from the command line, e.g.,
time java -jar VizBin-dist.jar -i EqualSet01.fa -o EqualSet01.coords
I get the following output in the terminal:
bmp00223:Downloads cedric.laczny$ java -jar VizBin-dist.jar
log4j:ERROR setFile(null,false) call failed.
java.io.FileNotFoundException: /logs/lcsb-vizbin.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:117)
at lu.uni.lcsb.vizbin.settings.Settings.<clinit>(Settings.java:21)
at lu.uni.lcsb.vizbin.Main.main(Main.java:43)
Console usage:
usage: myapp [-a <number>] [-c <cut-off>] [-h] -i <input-file> [-k
<length>] -o <output-file> [-p <number>] [-t <number>]
-a,--pca <number> number of PCA columns [default=50]
-c,--cut-off <cut-off> minimum conting length [default=1000]
-h,--help print this help menu
-i,--input <input-file> input file in fasta format
-k,--k-mer <length> k-mer length [default=5]
-o,--output <output-file> output file containing coordinates
-p,--perplexity <number> perplexity parameter [default=30.0]
-t,--thread <number> number of threads [default=1]
2020-01-09 15:03:37,214 DEBUG [main] (MainFrame.java:87) - Init of Main application frame
2020-01-09 15:03:37,288 DEBUG [main] (MainFrame.java:92) - Property changed: defaultCloseOperation. Old: 1 New: 3
2020-01-09 15:03:38,513 DEBUG [main] (MainFrame.java:92) - Property changed: font. Old: null New: java.awt.Font[family=Lucida Grande,name=Lucida Grande,style=plain,size=13]
.
.
.
Interestingly, it still runs through:
2020-01-09 14:33:38,038 DEBUG [Thread-0] (ProcessInput.java:69) - Points created.
2020-01-09 14:33:38,038 DEBUG [Thread-0] (ObjectWithProperties.java:33) - Property changed: POINTS_FILE. Old: null New: /var/folders/mv/zv136yrd1_ng4d2w899qj4g0lgzc4c/T/map8586245099513520896/points.txt
2020-01-09 14:33:38,039 DEBUG [AWT-EventQueue-0] (ProcessInput.java:85) - [PROGRESS BAR] Points created. (35)
2020-01-09 14:33:38,107 DEBUG [Thread-0] (ProcessInput.java:69) - Done.
2020-01-09 14:33:38,107 DEBUG [AWT-EventQueue-0] (ProcessInput.java:85) - [PROGRESS BAR] Done. (100)
2020-01-09 14:33:38,124 DEBUG [Thread-0] (ObjectWithProperties.java:33) - Property changed: FINISHED. Old: false New: true
However, this error message is confusing and should not appear.
Best,
Cedric
I want to write a Python wrapper that calls the very fast implementation in https://github.com/claczny/VizBin/blob/master/src/backend/bh_tsne/ptsne.cpp in the backend and then loads the embeddings into pandas or numpy.
My main question is how to call the ptsne.cpp script in an OSX terminal?
It looks like the input data just needs to be the parameters on each line and then the PCA components for each row, with columns separated by spaces. Apologies if this is out-of-scope but I can post my code (if it works) on here if anyone would find it useful.
Suggested new issue via #15
I'd like Vizbin to recognize masked sequences, i.e. ignore small letters. This would be useful to ignore e.g. 16S regions or other regions that obscure kmer profiles.
Usually, the user would supply the already masked sequence, but if you're mega cool, you could include a module that recognizes highly conserved/structural regions and does the masking internally.
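A sketch of how soft-masked (lowercase) regions could be skipped while counting k-mers: any k-mer that touches a character other than uppercase A, C, G or T is simply not counted. This illustrates the request only and is not VizBin's signature code; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter: ignores every k-mer that overlaps a masked (lowercase) position.
final class MaskedKmerCounter {

    static Map<String, Integer> count(String sequence, int k) {
        Map<String, Integer> counts = new HashMap<>();
        int run = 0; // length of the current stretch of unmasked A/C/G/T
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            boolean unmasked = c == 'A' || c == 'C' || c == 'G' || c == 'T';
            run = unmasked ? run + 1 : 0;
            if (run >= k) {
                // The last k characters are all unmasked, so this k-mer counts.
                String kmer = sequence.substring(i - k + 1, i + 1);
                counts.merge(kmer, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("ACGTacgtACGT", 2));
    }
}
```

Resetting the run length at every masked character also prevents k-mers from bridging two unmasked regions across a masked one.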
The wiki needs an update.
Howdy there,
I am using VizBin to visualize and manually bin mags for host associated bacterial populations. I have a single .fasta file that contains scaffolds corresponding to samples. In other words, multiple sample .fasta files were combined into a single .fasta file so that I can bin genomes across all samples of interest. My goal here was to create an annotation file to reflect this. Each sample, or label (#1-9), corresponding to a color. Any color, really doesn't matter. I created the annotation file from the combined fasta file in efforts to maintain the scaffold order and find other interesting properties such as length and gc content.
The issue I am having is that the annotation file is working, I think, but the labels are not being read. To explain further, it appears that the size of each point is changing based on length. That is helpful somewhat.
In the future I would like to add a reference genome of my "bacteria of interest" as a marker to aid my binning (receive more complete bins and avoid contamination as much as possible). I have yet to add this to my annotation file since the labels aren't being read.
Beyond that -- I have MANY scaffolds that I am inputting into VizBin (>4mil). I have set minimum contig length to 2Kb or 3Kb. The annotation file and fasta contain the same number of entries prior to input into VizBin. The minimum contig length does toss out plenty of scaffolds, but not so much that only one label is left.
Below is the head and tails of both my annotation file and the .fasta file.
A) head annotation file
label,length,gc
1,134077,45.42
1,87175,45.16
1,65686,45.71
1,52865,45.92
1,44948,34.86
1,42530,45.30
1,42475,46.38
1,40293,45.94
1,29404,48.00
B) tail annotation file
9,200,56.50
9,200,55.50
9,200,60.00
9,200,53.00
9,200,52.50
9,200,53.00
9,200,42.00
9,200,42.50
9,200,34.00
9,200,57.00
C) head fasta file
D0_SEK2_2_scaffold_1_c1
CAATCGATACGACCCCGGAGAGCGGCTTTTGCTAAAACTCGAGCAGTTTCTTGAAAACTT
GCTTCTGATATGAAACTTTGAGTATTTAGAGATGCTTTCGTTATTCCCAATAAGATGGCT
CGATAACAGATCGCTTCTTCCAAAGCACGCCCTGTTCGTTCCGCCCGCAACAATTCAATT
AGTTCTCCGGGTGAAAAAACATTAGACATTCTATCTTCTGAAACCAACACTTTTGATGTT
ATTTGACGCACAATAATCTCTATATGTCGATTATGAATCTGCACTCCCTGAGATCGATAA
ACTTTTTGGATCTTATTAACCAAAGAGATACGACTTTGCACTATAGTTAGCTCAGCACCA
ATCAAGAATCCCCAAGGAATTCCAAGAATTTTTGCTATACGCTCGTTCCAACCCTCAATC
CTCTTTTCTAGGTTCATCGATATTGAATCAATCGAACGAACTTCTAACACTTGTTCCACT
TTTGGAAGACCTTGCGTTATATCTCCAGATCTCTATTTTTCATATATAAATGTAACTAAC
D) tail fasta file
W_SEK2_D15_scaffold_211474_c1
ATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTCATTACACTTTATA
TGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTC
ATTACACTTTATATGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTT
GCAATAAACCTTCATTACAC
W_SEK2_D15_scaffold_113313_c1
CAGCAGACCGTGATGTCTTACGCCTGTGTTGCCCTCTACCGCTATGCGGTTGGTAAGCCA
GTGCCAGGGTTCGACCCAACGGCTATGCAGGGAGCGTTCCGAGTGAAGAAGCAGAAGTTC
ACCGGACAAGCCGGAGCCTAATTAGCGCCTAGGGCCACTCCGCGAACGAGAGCCTTCTGG
AAGTTCAGGTAAATGAACAC
note: D0_SEK2_2 is a sample name that I replaced with 1 in the label column of the annotation file. I thought potentially this software wasn't reading the labels correctly due to the underscores or the combination of letters and numbers. However, when it is just numbers, I am failing to get anything.
Any help would be great,
Jonathan
Post Doc, UCSD
The user can provide pretty much any file as input, even if it is not a FASTA file. This should be checked more stringently and communicated to the user.
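One possible shape for such a check is sketched below. This is only an illustration under stated assumptions: the class name, the method name, and the messages are hypothetical, and the accepted alphabet (IUPAC nucleotide codes plus the gap character) is an assumption about what VizBin should tolerate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Sketch: a stricter plausibility check for nucleotide FASTA input. */
final class FastaCheck {

    /**
     * Returns null if the input looks like nucleotide FASTA, otherwise a
     * human-readable message describing the first problem found.
     */
    static String firstProblem(Reader input) {
        BufferedReader reader = new BufferedReader(input);
        int lineNumber = 0;
        boolean seenHeader = false;
        boolean recordHasSequence = true; // true so the first header is accepted
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.charAt(0) == '>') {
                    if (!recordHasSequence) {
                        return "Line " + lineNumber + ": previous record has no sequence.";
                    }
                    seenHeader = true;
                    recordHasSequence = false;
                } else if (!seenHeader) {
                    return "Line " + lineNumber + ": expected a header line starting with '>'.";
                } else if (!line.matches("[ACGTURYKMSWBDHVNacgturykmswbdhvn-]+")) {
                    return "Line " + lineNumber + ": unexpected characters in sequence.";
                } else {
                    recordHasSequence = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!seenHeader) {
            return "No FASTA records found.";
        }
        return recordHasSequence ? null : "Last record has no sequence.";
    }
}
```

The returned message could be shown in a dialog in graphical mode and written to the log in command-line mode, so that the user learns why the input was rejected.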
PrincipalComponentsAnalysisMtj.java includes the following fields:
double[][] A_init;
DenseMatrix A;
DenseMatrix Y;
and A is initialized by A_init. A_init is no longer used afterwards and should be cleared from memory. Moreover, DenseMatrix extends AbstractDenseMatrix, and AbstractDenseMatrix implements public void set(int row, int column, double value), which could then be used to directly populate the matrices A and Y, thus saving one copy of the data.
Since A is modified during the singular value decomposition, we need to create and keep Y.
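The idea of filling both matrices cell by cell, without an intermediate double[][], can be sketched as follows. To keep the example self-contained, FlatMatrix is a hypothetical stand-in that only mimics the set(row, column, value) method mentioned above; it is not the MTJ class. CellSource and fillBoth are likewise illustrative names.

```java
/** Sketch: fill two matrices directly from a value source, without a temporary double[][]. */
final class DirectFillSketch {

    /** Hypothetical stand-in for a dense matrix that exposes set(row, column, value). */
    static final class FlatMatrix {
        final int rows;
        final int cols;
        final double[] data; // column-major storage

        FlatMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
        }

        void set(int row, int column, double value) {
            data[row + column * rows] = value;
        }

        double get(int row, int column) {
            return data[row + column * rows];
        }
    }

    /** Supplies one value at a time, so no complete array copy is needed. */
    interface CellSource {
        double valueAt(int row, int column);
    }

    /** Returns {A, Y}: A may later be overwritten by the SVD, Y stays untouched. */
    static FlatMatrix[] fillBoth(int rows, int cols, CellSource source) {
        FlatMatrix a = new FlatMatrix(rows, cols);
        FlatMatrix y = new FlatMatrix(rows, cols);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double v = source.valueAt(r, c);
                a.set(r, c, v);
                y.set(r, c, v);
            }
        }
        return new FlatMatrix[] {a, y};
    }
}
```

With this pattern, only two copies of the data exist at any time (A and Y) instead of three (A_init, A, and Y).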
It would be good to have an option that allows exporting the high-dimensional data to which the CLR-transformation was applied (512D -> CLR-512D -> 50D -> 2D; so basically the second step of these four). This should be a menu-like entry, e.g., under "Debug".
It should not be in the "Advanced options" as this is more of a debug feature rather than something the general user would use.
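For reference, the second step of that pipeline can be sketched as below, using the standard definition of the centered log-ratio: clr(x)_i = ln(x_i) - mean_j ln(x_j). The class and method names are hypothetical, and the pseudocount (needed to avoid ln(0) for k-mers that never occur) is an assumption; the value VizBin actually uses may differ.

```java
/** Sketch: centered log-ratio (CLR) transform of one k-mer count vector. */
final class ClrSketch {

    /**
     * Computes clr(x)_i = ln(x_i + p) - mean_j ln(x_j + p),
     * where p is a pseudocount that keeps zero counts finite.
     */
    static double[] clr(double[] counts, double pseudocount) {
        double[] logs = new double[counts.length];
        double sum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            logs[i] = Math.log(counts[i] + pseudocount);
            sum += logs[i];
        }
        double mean = sum / counts.length;
        for (int i = 0; i < logs.length; i++) {
            logs[i] -= mean;
        }
        return logs;
    }

    /** Formats one transformed vector as a comma-separated line for export. */
    static String toCsvLine(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }
}
```

An export option could write one such line per sequence. By construction, the values in each line sum to zero (up to floating-point error), which is a quick sanity check for the exported file.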
VizBin has evolved organically over time and has multiple contributing authors. The underlying Java code could thus benefit from a rewrite, at least by the increased use of existing Java libraries (that have been tested exhaustively over time). Accordingly, some of the parts may be substituted by existing Java libraries instead of currently used custom-written code.
This issue will keep track of libraries that appear to be interesting and related points.
Operation | Suggestions |
---|---|
Input/Output of fasta files | http://biojava.org/wiki/BioJava:CookBook:Core:FastaReadWrite |
Compression/Decompression of fasta sequences to save memory during runtime | https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html -> Seems applicable to individual strings -> foreach seq: compute kmer frequency; seq = compress(seq) |
Feel free to extend/modify the list.
Best,
Cedric
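A minimal sketch of the per-sequence compression idea from the table, using java.util.zip.Deflater and Inflater from the standard library. The class and method names are hypothetical, and the choice of BEST_SPEED and ASCII encoding are assumptions, not something VizBin prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch: keep each sequence compressed in memory, decompress only on export. */
final class SequenceCompressor {

    static byte[] compress(String sequence) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(sequence.getBytes(StandardCharsets.US_ASCII));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native memory
        return out.toByteArray();
    }

    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

Following the note in the table, the k-mer frequencies would be computed first, and only then would the sequence string be replaced by its compressed form; decompress would be called when selected bins are exported.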
Currently, the DataSetFactory normalizes the GC and coverage values of a Sequence with respect to the maximum in the DataSet. However, this should not be done in this way. The annotations that are provided for every sequence should not be modified per Sequence but rather only when needed, e.g., when preparing the plotting in ClusterPanel.
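The proposed separation can be sketched as follows: raw annotation values stay immutable, and the normalized values are derived on demand. Seq is a hypothetical, simplified stand-in for VizBin's Sequence class, and the method name is illustrative only.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch: keep raw annotations untouched and normalize only when plotting. */
final class LazyNormalization {

    /** Hypothetical, simplified stand-in for a sequence record with raw values. */
    static final class Seq {
        final double gc;
        final double coverage;

        Seq(double gc, double coverage) {
            this.gc = gc;
            this.coverage = coverage;
        }
    }

    /**
     * Returns value / max(value) for every sequence, computed on demand.
     * The input records are not modified.
     */
    static double[] normalized(List<Seq> dataSet, ToDoubleFunction<Seq> property) {
        double max = 0.0;
        for (Seq s : dataSet) {
            max = Math.max(max, property.applyAsDouble(s));
        }
        double[] result = new double[dataSet.size()];
        for (int i = 0; i < result.length; i++) {
            double raw = property.applyAsDouble(dataSet.get(i));
            result[i] = max > 0.0 ? raw / max : 0.0;
        }
        return result;
    }
}
```

A plotting component could then call, for example, normalized(dataSet, s -> s.gc) right before drawing, while exports and other consumers continue to see the values exactly as the user provided them.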
Hi, I am very new to this software. I am asking because when I tried to add the annotation, there was an error and I couldn't locate the log file on my MacBook.
Also, I wanna know if I can just label the 1st sequence in my file? I made a quite simple annotation file with only two lines:
isMarker
1
Could this annotation file cause the error? I am not a man with lots of coding experience. So I decided to not fill the series of 0 after the 1.
Many thanks
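If, as the other reports here suggest, the annotation file is expected to contain one line per sequence in FASTA order (this is an assumption, not confirmed in this thread), the remaining zeros can be generated rather than typed by hand. A minimal sketch with hypothetical names:

```java
import java.util.Set;

/** Sketch: build an isMarker annotation with one line per sequence. */
final class MarkerAnnotation {

    /**
     * Returns the annotation text: a header line followed by "1" for each
     * zero-based sequence index in markerIndices and "0" for all others.
     */
    static String build(int sequenceCount, Set<Integer> markerIndices) {
        StringBuilder sb = new StringBuilder("isMarker\n");
        for (int i = 0; i < sequenceCount; i++) {
            sb.append(markerIndices.contains(i) ? '1' : '0').append('\n');
        }
        return sb.toString();
    }
}
```

Marking only the first sequence then amounts to passing the sequence count of the FASTA file together with a set that contains just the index 0.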
to be removed
When visualizing sequences that are derived from a larger collection (say, 20) of labels, the symbols for "circle" and for "rectangle" are not consistent between the legend and the main plotting window.
It appears that in the main plotting window, they are indeed correctly displayed but in the legend, the circles appear as small rectangles.
Overall, this is not consistent and expected to not be intuitive.