claczny / vizbin

Repository of our application for human-augmented binning

Makefile 1.38% Shell 0.46% C++ 29.52% Java 68.65%

binning bioinformatics java machine-learning metagenomics visualisation

vizbin's Issues

this is a test issue

did changing the email notification work?

how to define the color in vizbin?

Hello Cedric,
I want to re-plot my clusters with new colors instead of the default blue. So I created a label file in which I left the non-target sequences at 0 and used red, yellow, green, orange, black, pink, purple, cyan, and gold for the different clusters. However, in the visualisation there were many more red dots than I expected, and some colors did not show at all. If I don't provide the label information, all sequences are blue dots, but now it seems as if red is the default color. Also, the yellow-green fusion region can be divided into 2 clusters by MetaBAT (https://bitbucket.org/berkeleylab/metabat/wiki/Home).

May I know the available colors and markers in VizBin? I referred to the color set in ggplot2, but clearly some color names can't be used in VizBin.

Many thanks

$HOME/.vizbin folder is not initialized when run for the first time on a machine and via the command-line only

Running VizBin in graphical mode initializes the $HOME/.vizbin directory.
However, this step seems to be missing when VizBin is used for the first time on a machine and via command-line only.

Not sure how best to solve this, but the initialization should probably happen automatically and print a message to the shell.
The message is likely to be missed if the subsequent steps progress quickly (the terminal will simply scroll down, making the initialization message disappear), but it is there in the log.
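A minimal sketch of what such an automatic initialization could look like. It assumes the settings directory is simply `.vizbin` below the user's home and that only the directory itself has to be created; the class and method names are hypothetical and not taken from the VizBin sources.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of VizBin): creates the settings directory on first use. */
final class SettingsDirInitializer {

    /**
     * Ensures that <home>/.vizbin exists.
     *
     * @return true if the directory had to be created, false if it already existed
     */
    static boolean ensureSettingsDir(Path home) throws IOException {
        Path dir = home.resolve(".vizbin");
        if (Files.isDirectory(dir)) {
            return false;
        }
        Files.createDirectories(dir);
        // Written to stderr so it does not mix with regular program output
        System.err.println("Initialized settings directory: " + dir);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory so the real home directory is not touched
        Path fakeHome = Files.createTempDirectory("vizbin-home");
        System.out.println("created on first call:  " + ensureSettingsDir(fakeHome));
        System.out.println("created on second call: " + ensureSettingsDir(fakeHome));
    }
}
```

Calling the helper at the start of both the GUI and the command-line entry point would make the two modes behave the same; the return value lets the caller decide whether to log the message as well.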

Integrate compression of FASTA sequences

Currently, VizBin stores the FASTA sequences in memory. This typically works well for current (i.e., short read-based, say Illumina) metagenomic assembly results.
However, third-generation sequencing (e.g., PacBio, ONT) may result in numerous AND long sequences. While this is supposedly not a problem once the composition of the sequences (k-mer profile) is computed, storing numerous AND long sequences in memory for later export (of the selected bins) is problematic.

Hence, integration of compression of the FASTA sequences is encouraged. Theoretically, current metagenomic assembly-based results should also benefit from this through a reduced memory footprint. This in turn would allow using less memory, or it would enable larger datasets to be run with restricted resources. In that sense, the CLR-transformation, dimension reduction, etc. are not the bottlenecks, but rather the import and temporary storage of the input sequences.

Which algorithm to use for the compression remains to be seen!
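As one possible starting point, here is a sketch that keeps each sequence gzip-compressed in memory and only decompresses it on export. It uses the JDK's java.util.zip classes; the class name `CompressedSequence` is hypothetical, and gzip is just one candidate (a 2-bit nucleotide encoding would be another). How much is gained on real contigs would have to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical holder (not part of VizBin) for one gzip-compressed sequence. */
final class CompressedSequence {
    private final byte[] gz;

    CompressedSequence(String sequence) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the trailer into the buffer
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(sequence.getBytes(StandardCharsets.US_ASCII));
        }
        this.gz = buffer.toByteArray();
    }

    /** Restores the original sequence, e.g. when a selected bin is exported. */
    String decompress() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Number of bytes held in memory for this sequence. */
    int compressedSize() {
        return gz.length;
    }
}
```

Very short sequences can end up larger than the original because of the gzip header, so a length cut-off below which sequences stay uncompressed may be worth considering.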

Allowing to specify input files at the command-line

It would be nice to have a way to run VizBin in "non-interactive" mode. That means that the user should be able to set the values of the individual fields (e.g., textfield_file) at the command line (e.g., java -jar -Dtextfield_file=/path/to/file.fasta) and then let the whole thing run through all steps without showing the GUI and without requiring a mouse click on the "Start" button to execute the run.

This is particularly useful when integrating VizBin into a pipeline (e.g., to visualize individual clusters as they are returned from an automated binning algorithm). Of course, the user would then not immediately want to do any polygonal selection or the like, but would rather have the points.txt file saved for later use.

This might be related to #11.

Access to more bh_sne parameters

It would be nice if the user were to be able to access other bh_sne parameters such as "perplexity", etc.

-Shaman-

"Minimal contig length" not working

Dear Cedric,

I'm really excited to try VizBin, but I run into the following problem when performing the binning: The program fails to use the "Minimal contig length" option! This is very frustrating.

The "contigs.fa" file I'm using contains sequences of different length, from < 300 to > 200,000.

In the .log-file (see attachment), I get the following:

2017-11-22 16:51:31,001 WARN [AWT-EventQueue-0] (MainFrame.java:893) - Invalid minimal contig length value: 2 000. Using value: 1000

I can of course extract sequences above a certain length myself, but that would be a very slow process compared to specifying the length as a parameter, especially since I need to optimize the clustering by trying out different sequence lengths.

Do you have any suggestions for how to fix this?

Kind regards,

Even Sannes Riiser,
PhD candidate,
University of Oslo, Norway

log.txt

Provide legend

Provide a "Menu" entry that opens a legend, probably in a separate pop-up window.

Display learning rate parameter

Is there a way to adjust the learning rate parameter? Also, what is the default learning rate parameter? I checked the publication but I didn't see it in there.

High-resolution screens lead to artifacts in the visualization windows

We have realized that using VizBin on high-resolution screens (e.g., MacBook Pro with Retina display) can lead to artifacts in the visualization. These artifacts are purely visual and do not affect the functionality of VizBin.

We are currently working on fixing this issue. Should you experience such artifacts, please let us know so we can keep you posted about the fix as soon as it becomes available.

Stay tuned!

Test issue: email

foo bar

Improve handling of labels

When providing a labels file, the points get colored differentially as expected, but the legend is missing/wrong.

Best,

Cedric

Issue with space between thousands separators

Hi,

I am currently using VizBin with a French keyboard layout, and a space is automatically inserted between the thousands and the hundreds, which prevents me from setting a minimal contig length higher than 1000. I tried manually typing a comma in place of the space, but this only makes the input number smaller (example: 3,500 becomes 3,5). I also tried changing my keyboard from AZERTY to QWERTY, but with no luck...

I have included a screenshot of the setting which shows the space automatically generated. Any help would be greatly appreciated!
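The warning quoted in the "Minimal contig length" report ("Invalid minimal contig length value: 2 000") suggests that the grouping separator inserted under a French locale ends up in the string that is parsed. A sketch of a more tolerant parser follows; it assumes the field content reaches the parser as plain text, and the class and method names are hypothetical. It drops every kind of space (including the no-break spaces U+00A0 and U+202F commonly used as grouping separators) before parsing and falls back to a default otherwise.

```java
/** Hypothetical helper (not part of VizBin) for reading the minimal contig length. */
final class ContigLengthParser {

    /**
     * Parses a positive integer, ignoring any spaces used as thousands separators.
     *
     * @param text     raw content of the text field, e.g. "2 000"
     * @param fallback value returned when the text is not a positive integer
     */
    static int parseMinLength(String text, int fallback) {
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // isSpaceChar covers ' ', U+00A0 and U+202F; isWhitespace covers tabs and line breaks
            if (Character.isSpaceChar(c) || Character.isWhitespace(c)) {
                continue;
            }
            cleaned.append(c);
        }
        try {
            int value = Integer.parseInt(cleaned.toString());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            // Commas, dots or letters are deliberately not guessed at
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMinLength("2\u00A0000", 1000));
    }
}
```

Commas and dots are left alone on purpose: whether "3,500" means 3500 or 3.5 depends on the locale, so rejecting it is safer than guessing.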

Allow plotting of additional information, e.g., sequence length or sequence coverage

Depending on the availability of additional information (such as coverage; length is trivially available), it would be good to have the functionality to respect this in the plotting.
Ideas include using point size, transparency, color, shape to reflect different things. Length could be point size, coverage could be transparency. Sequences with particular functions could be depicted as star-like shapes.
We will need to define how the user can/should provide this information and how the different information sources (coverage, function, etc.) will then get represented.
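To make the idea concrete, here is a sketch of one possible mapping: sequence length to point size on a log scale, and coverage to transparency. All names, the size range, and the minimum alpha of 40 are assumptions chosen for illustration; nothing here reflects how VizBin currently draws points.

```java
/** Hypothetical mapping (not part of VizBin) from sequence attributes to drawing attributes. */
final class PointStyle {

    /**
     * Maps a sequence length to a point size on a logarithmic scale.
     * Assumes minLength >= 1; lengths outside [minLength, maxLength] are clamped.
     */
    static double sizeFor(long length, long minLength, long maxLength,
                          double minSize, double maxSize) {
        if (maxLength <= minLength) {
            return minSize;
        }
        long clamped = Math.max(minLength, Math.min(maxLength, length));
        double t = (Math.log(clamped) - Math.log(minLength))
                 / (Math.log(maxLength) - Math.log(minLength));
        return minSize + t * (maxSize - minSize);
    }

    /**
     * Maps coverage to an alpha value in [40, 255]; the lower bound keeps
     * low-coverage points visible.
     */
    static int alphaFor(double coverage, double maxCoverage) {
        double t = maxCoverage > 0
                ? Math.max(0.0, Math.min(1.0, coverage / maxCoverage))
                : 1.0;
        return 40 + (int) Math.round(215 * t);
    }

    public static void main(String[] args) {
        System.out.println(sizeFor(10_000, 1_000, 100_000, 2.0, 10.0));
        System.out.println(alphaFor(25.0, 100.0));
    }
}
```

The alpha value can be passed directly as the fourth argument of `new java.awt.Color(r, g, b, alpha)`.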

Automatically save a screenshot when exporting a selection

When performing a manual selection (or even once semi-automated clustering is integrated) and then exporting the selection, a screenshot should be created simultaneously for reproducibility reasons.
Currently, the user has to save a screenshot separately. While this is readily working, it is not particularly user-friendly.

Getting OutOfMemory error despite bigger heap size

On a dataset comprising around 70k sequences, I got the following error, despite using java -jar -Xmx3g VizBin-dist.jar

java.lang.OutOfMemoryError: Java heap space
    at org.ejml.data.DenseMatrix64F.<init>(Unknown Source)
    at org.ejml.alg.dense.decomposition.svd.SvdImplicitQrDecompose.getW(Unknown Source)
    at org.ejml.alg.dense.decomposition.svd.SvdImplicitQrDecompose.getW(Unknown Source)
    at lcsb.vizbin.service.utils.PrincipleComponentAnalysis.computeBasis(PrincipleComponentAnalysis.java:137)
    at lcsb.vizbin.service.utils.DataSetUtils.computePca(DataSetUtils.java:260)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:153)

Suggested solution:
Implement a different way of computing the initial PCA-based dimension reduction. Maybe using something along the lines of http://math.nist.gov/javanumerics/jama/ and http://sujitpal.blogspot.com/2008/10/ir-math-in-java-cluster-visualization.html

[EDIT]
See also http://stackoverflow.com/questions/529457/performance-of-java-matrix-math-libraries for other alternatives, in particular JBLAS.
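A sketch of one memory-friendly alternative, under the assumption that the memory pressure comes from decomposing the full n x d data matrix: accumulate the d x d covariance matrix row by row (d being the number of k-mer features, which is small compared to n) and extract principal components from that. The example uses plain power iteration and only returns the first component; further components would need deflation or a proper symmetric eigen-solver from one of the libraries above. All names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch (not VizBin code): PCA via the d x d covariance matrix. */
final class CovariancePca {

    /** Covariance of the rows of data (n x d, n >= 2); needs O(d^2) extra memory only. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int a = 0; a < d; a++) {
                double da = row[a] - mean[a];
                for (int b = a; b < d; b++) {
                    cov[a][b] += da * (row[b] - mean[b]) / (n - 1);
                }
            }
        }
        // Mirror the upper triangle into the lower one
        for (int a = 0; a < d; a++) {
            for (int b = 0; b < a; b++) {
                cov[a][b] = cov[b][a];
            }
        }
        return cov;
    }

    /**
     * Power iteration for the leading eigenvector of a symmetric matrix.
     * The uniform start vector fails only if it is orthogonal to that eigenvector.
     */
    static double[] leadingEigenvector(double[][] m, int iterations) {
        int d = m.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    next[a] += m[a][b] * v[b];
                }
            }
            double norm = 0.0;
            for (double x : next) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            if (norm == 0.0) {
                return v;
            }
            for (int a = 0; a < d; a++) {
                v[a] = next[a] / norm;
            }
        }
        return v;
    }
}
```

Because the rows are consumed one at a time, the data matrix does not have to be copied, and the covariance matrix for, e.g., 1024 features occupies about 8 MB regardless of the number of sequences.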

PrincipleComponentAnalysisEJML.java seems to have a superfluous samples field

Comparing PrincipleComponentAnalysisEJML.java of revision01 to PrincipleComponentAnalysis.java in rc01, the former seems to have a superfluous samples field. This leads to storing the entire genomic signature information again and can thus cause problems with the Java VM heap size, basically ending in an out-of-memory error, e.g., for -Xmx3g and around 88k points.

The samples field is currently not used in PrincipleComponentAnalysisEJML.java as only double[] sampleToEigenSpace(double[] sampleData); is called but not double[] sampleToEigenSpace(int sample); which actually makes use of the samples field. Should we want to use this functionality, probably getting A[i] and adding the mean[] is better as it saves quite a bit of memory then.

Displaying/Using the name of the loaded sequence file (e.g., in the window title)

VizBin supports the visualization of multiple datasets at the same time. However, the user may easily lose track of which visualization belongs to which dataset.

Hence, it would be nice to have the name of the dataset (à la basename input.fa) as the window title or such.

Moreover, it would be nice if VizBin would automatically suggest a filename for the export based on the input filename. This would help to minimize errors when working with multiple datasets.
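Both suggestions boil down to deriving names from the input path. A small sketch, in which the "VizBin - " title prefix and the ".selectionN.fa" naming pattern are assumptions made up for illustration, as are the class and method names:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of VizBin): derives display and export names from the input path. */
final class DatasetNames {

    /** File name without directories, e.g. "input.fa"; empty if the path has no file name. */
    static String fileName(String inputPath) {
        Path name = Paths.get(inputPath).getFileName();
        return name == null ? "" : name.toString();
    }

    /** File name without its last extension, e.g. "input". */
    static String baseName(String inputPath) {
        String file = fileName(inputPath);
        int dot = file.lastIndexOf('.');
        // dot > 0 keeps hidden files such as ".fasta" intact
        return dot > 0 ? file.substring(0, dot) : file;
    }

    /** Title for the visualization window. */
    static String windowTitle(String inputPath) {
        return "VizBin - " + fileName(inputPath);
    }

    /** Suggested file name for the n-th exported selection of this dataset. */
    static String suggestedExportName(String inputPath, int selectionNumber) {
        return baseName(inputPath) + ".selection" + selectionNumber + ".fa";
    }

    public static void main(String[] args) {
        System.out.println(windowTitle("data/run1/input.fa"));
        System.out.println(suggestedExportName("data/run1/input.fa", 1));
    }
}
```

The title would be applied with `JFrame.setTitle`, and the suggested name could be preselected in the export dialog via `JFileChooser.setSelectedFile`.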

Ant build of devel branch fails

Greetings

I used VizBin in the past, and it has never disappointed. It is a very good tool to visualize the spatial dispersion of contigs, and very easy to install and use. However, I recently had to reformat my server, and when trying to reinstall VizBin as instructed in #15, the ant build runs out of memory.

-verify-automatic-build:

-pre-pre-compile:

-pre-compile:

-copy-persistence-xml:

-compile-depend:

-do-compile:
    [javac] Compiling 45 source files to /mnt/HDDStorage/jsequeira/VizBin/src/interface/VizBin/build/classes
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 7

    [javac] 1 warning
    [javac]
    [javac]
    [javac] The system is out of resources.
    [javac] Consult the following stack trace for details.
    [javac] java.lang.OutOfMemoryError: Java heap space
    [javac]     at java.base/java.util.TimeZone.clone(TimeZone.java:753)
    [javac]     at java.base/sun.util.calendar.ZoneInfo.clone(ZoneInfo.java:639)
    [javac]     at java.base/java.util.TimeZone.getDefault(TimeZone.java:642)
    [javac]     at java.base/java.time.ZoneId.systemDefault(ZoneId.java:272)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipUtils.dosToJavaTime(ZipUtils.java:122)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.cen(ZipFileSystem.java:1960)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:1947)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getEntry(ZipFileSystem.java:1334)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:314)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipPath.getAttributes(ZipPath.java:727)
    [javac]     at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:293)
    [javac]     at java.base/java.nio.file.Files.readAttributes(Files.java:1763)
    [javac]     at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
    [javac]     at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
    [javac]     at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
    [javac]     at java.base/java.nio.file.Files.walkFileTree(Files.java:2760)
    [javac]     at jdk.compiler/com.sun.tools.javac.file.JavacFileManager$ArchiveContainer.<init>(JavacFileManager.java:520)
    [javac]     at jdk.compiler/com.sun.tools.javac.file.JavacFileManager.getContainer(JavacFileManager.java:316)
    [javac]     at jdk.compiler/com.sun.tools.javac.file.JavacFileManager.list(JavacFileManager.java:712)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.ClassFinder.list(ClassFinder.java:734)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.ClassFinder.scanUserPaths(ClassFinder.java:678)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.ClassFinder.fillIn(ClassFinder.java:548)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.ClassFinder.complete(ClassFinder.java:299)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.ClassFinder$$Lambda$170/0x000000080018a040.complete(Unknown Source)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.Symbol.complete(Symbol.java:642)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.Symbol$PackageSymbol.members(Symbol.java:1131)
    [javac]     at jdk.compiler/com.sun.tools.javac.code.Symtab.listPackageModules(Symtab.java:832)
    [javac]     at jdk.compiler/com.sun.tools.javac.comp.Enter.visitTopLevel(Enter.java:345)
    [javac]     at jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:529)
    [javac]     at jdk.compiler/com.sun.tools.javac.comp.Enter.classEnter(Enter.java:286)
    [javac]     at jdk.compiler/com.sun.tools.javac.comp.Enter.classEnter(Enter.java:301)
    [javac]     at jdk.compiler/com.sun.tools.javac.comp.Enter.complete(Enter.java:576)

BUILD FAILED
/path/to/VizBin/src/interface/VizBin/nbproject/build-impl.xml:920: The following error occurred while executing this line:
/path/to/VizBin/src/interface/VizBin/nbproject/build-impl.xml:260: Compile failed; see the compiler error output for details.

Total time: 583 minutes 13 seconds

Since there have been no commits ever since I got this to work, I am wondering what could cause this problem. Has this been encountered when testing? I really would like to have this solved before integrating VizBin in my omics pipeline.

Thank you for your attention!

Using labels in combination with length filter seems to cause problems

When I take a fasta file that contains sequences that are below the length threshold (e.g., 1 knt) and want to use a matching labels file, I get the following error:

2014-09-25 16:56:35,995 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Fitting performed in 0.00 seconds.
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE: Wrote the 60079 x 2 data matrix successfully!
2014-09-25 16:56:36,426 DEBUG [TSNERunner] (DataSetUtils.java:318) - TSNE:
2014-09-25 16:56:36,525 DEBUG [Thread-0] (ProcessInput.java:88) - Points created.
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)
2014-09-25 16:56:36,759 DEBUG [Thread-0] (ProcessInput.java:88) - Error! Check the logs.
2014-09-25 16:56:36,761 ERROR [Thread-0] (ProcessInput.java:250) - Index: 60079, Size: 60079
java.lang.IndexOutOfBoundsException: Index: 60079, Size: 60079
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromPointFile(DataSetFactory.java:75)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:184)

When I do the length filtering before and include a matching labels file, VizBin runs through.
Probably that's a bug in connecting the labels with the sequences where the labels are not properly matched to the length-selected sequences.
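If that suspicion is right, the fix is to apply the length filter to sequences and labels together, so that both lists keep the same indexing. A sketch under that assumption (one label per input sequence, in input order; all names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not VizBin code): filters sequences and their labels in lockstep. */
final class LengthFilter {

    /** Sequences that passed the filter, with the label of each at the same index. */
    static final class Filtered {
        final List<String> sequences = new ArrayList<>();
        final List<String> labels = new ArrayList<>();
    }

    static Filtered filter(List<String> sequences, List<String> labels, int minLength) {
        if (sequences.size() != labels.size()) {
            throw new IllegalArgumentException(
                    "Expected one label per sequence, got " + labels.size()
                    + " labels for " + sequences.size() + " sequences");
        }
        Filtered out = new Filtered();
        for (int i = 0; i < sequences.size(); i++) {
            // Keep or drop the label together with its sequence
            if (sequences.get(i).length() >= minLength) {
                out.sequences.add(sequences.get(i));
                out.labels.add(labels.get(i));
            }
        }
        return out;
    }
}
```

With this, the number of labels always equals the number of points that come back from the embedding, so an index such as 60079 in a list of size 60079 can no longer be requested.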

"Log coverage/length" combo-box choice not respected

Tried the new combo box on DaVis_testdat.fa, DaVis_testdat.length.ann, and DaVis_testdata.points.txt but got giant shapes no matter whether I chose Yes or No. Trying it without a points file did not work either.

Dependencies errors

Just downloaded the latest version from GitHub:

https://github.com/claczny/VizBin

when running the setup I got these errors:

Downloads/VizBin/setupUbuntu.sh
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libgsl-dev' instead of 'libgsl0-dev'
Package openjdk-7-jre is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

Package libgsl0ldbl is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
libgsl2 libgsl2:i386

Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'openjdk-7-jre' has no installation candidate
E: Package 'libgsl0ldbl' has no installation candidate
E: Package 'openjdk-7-jdk' has no installation candidate

real 0m0.399s
user 0m0.376s
sys 0m0.012s
fatal: destination path 'osxcross' already exists and is not an empty directory.
Downloads/VizBin/setupUbuntu.sh: line 14: cd: tarballs: No such file or directory
Downloads/VizBin/setupUbuntu.sh: line 18: ./build.sh: No such file or directory

real 0m0.000s
user 0m0.000s
sys 0m0.000s
Downloads/VizBin/setupUbuntu.sh: line 19: ./build_gcc.sh: No such file or directory

real 0m0.000s
user 0m0.000s
sys 0m0.000s
mkdir: cannot create directory ‘boost’: File exists
--2019-12-03 12:06:39--  http://downloads.sourceforge.net/project/boost/boost/1.55.0/boost_1_55_0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fboost%2Ffiles%2Fboost%2F1.55.0%2F&ts=1402397648&use_mirror=softlayer-ams
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 216.105.38.13
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|216.105.38.13|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-12-03 12:06:39 ERROR 403: Forbidden.

fatal: destination path 'mxe' already exists and is not an empty directory.
make: *** No rule to make target 'gcc'. Stop.

real 0m0.003s
user 0m0.000s
sys 0m0.000s

I already have openjdk-8 (JDK and JRE) installed and cannot downgrade to openjdk-7 because I am using Linux Mint 18 (Sarah).

My (Mint) system:

4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64
x86_64 x86_64 GNU/Linux

Any tips ?

Problems in polygonal selection with very deep zooming

When zooming in deeply and doing a polygonal selection it seems that the decision if a point belongs to a cluster (contains()) becomes problematic:
The polygon clearly contains more than 2 sequences.

This seems to happen only when zooming in very deeply so it may not be a frequent problem. Nevertheless, this needs to be checked/fixed.
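The cause has not been confirmed. One plausible explanation is that the test is done in screen coordinates: java.awt.Polygon stores integer pixel coordinates, so points that are distinct in data space can collapse onto the same pixels. The sketch below performs the containment test in data coordinates with java.awt.geom.Path2D.Double instead; the class and method names are hypothetical.

```java
import java.awt.geom.Path2D;

/** Hypothetical sketch (not VizBin code): polygon selection in data coordinates. */
final class PolygonSelection {

    /** Builds a closed polygon from a flat array x0, y0, x1, y1, ... of data coordinates. */
    static Path2D.Double toPath(double[] xy) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(xy[0], xy[1]);
        for (int i = 2; i + 1 < xy.length; i += 2) {
            path.lineTo(xy[i], xy[i + 1]);
        }
        path.closePath();
        return path;
    }

    /** Counts the points (each given as {x, y}) that lie inside the polygon. */
    static int countInside(Path2D polygon, double[][] points) {
        int inside = 0;
        for (double[] p : points) {
            if (polygon.contains(p[0], p[1])) {
                inside++;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // A square with a side length of 1e-6, far below one pixel at normal zoom
        Path2D.Double square = toPath(new double[] {
            0.5, 0.5, 0.500001, 0.5, 0.500001, 0.500001, 0.5, 0.500001});
        double[][] points = {
            {0.5000002, 0.5000003}, {0.5000005, 0.5000009}, {0.6, 0.6}};
        System.out.println(countInside(square, points));
    }
}
```

The flat coordinate array has the same layout as the one used for the polygon annotation in the JFreeChart attempt, so the clicked vertices could be reused without converting back to pixels.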

Star shape outline gets incompletely saved to SVG

When saving a plot that includes Star shapes as SVG, the Star shape outline does not get saved correctly. It appears as if the path is not finished. Exporting to PNG works fine; see attachments.

Provide functionality to "Save/Load" a run

Currently, the results of the dimensionality reduction are stored in a temporary folder, the log is stored in the home-folder, and the sequence-input in some arbitrary folder. To improve reproducibility and ease-of-use, functionality to load/save a run should be provided.

For saving, the entire log of the current session, the 2D coordinates (points.txt), a copy of the sequences passing the threshold, and the associated (if available) annotation should be stored in a (via a dialog) user-specified folder. Loading should take the sequence file, the 2D coordinates and the annotation and display the resulting embedding in an according visualization without recomputing the embedding.

Error on install

Hi claczny- thanks for creating such an interesting tool. I'm very curious to give it a try. I have tried downloading both the stand alone via the "Download app" on your homepage as well as cloned your github repository and am running into trouble.

Double-clicking the .jar does nothing; java -jar VisBin-dist.jar produces

Exception in thread "main" java.lang.UnsupportedClassVersionError: lu/uni/lcsb/vizbin/Main : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: lu.uni.lcsb.vizbin.Main. Program will exit.

Any advice would be great.
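For context: class-file major version 51 corresponds to Java 7, so this error means the JVM that was started is older than Java 7; checking `java -version` and running the jar with Java 7 or newer should resolve it. The sketch below shows how the version can be read from a class-file header and translated into a Java release (for Java 5 and later, release = major version - 44). The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads the class-file version that a .class file was compiled for. */
final class ClassVersion {

    /** Returns the major version stored in the class-file header. */
    static int majorVersion(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file (bad magic number)");
        }
        in.readUnsignedShort();        // minor version, not needed here
        return in.readUnsignedShort(); // major version
    }

    /** Maps a major version to a Java release; valid for Java 5 (major 49) and later. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Header of a class compiled for major.minor version 51.0
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println("major " + major + " -> Java " + javaRelease(major));
    }
}
```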

Unknonw column header: 67

Hi!

This error appears when loading my contigs FASTA file and an annotation file with label and gc columns. I am attaching the annotation file.

2016-05-10 16:08:38,792 ERROR Thread-0 - Unknonw column header: 67
lcsb.vizbin.service.InvalidMetaFileException: Unknonw column header: 67
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromFastaFile(DataSetFactory.java:321)
    at lcsb.vizbin.service.DataSetFactory.createDataSetFromFastaFile(DataSetFactory.java:212)
    at lu.uni.lcsb.vizbin.ProcessInput$2.run(ProcessInput.java:134)
2016-05-10 16:08:38,795 DEBUG Thread-0 - Error! Check the logs.
annotation_for_VizBin_gc.csv.txt

When running the program without the annotation file, it runs correctly but prints this warning:

Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK

Help is very welcome!

MTJ-related loading warnings on Windows and Linux

I get the following message when running VizBin (revision01) on a Windows or Linux machine (w/ Java 7):

2014-12-01 08:39:08,619 DEBUG [Thread-0] (ProcessInput.java:88) - Running PCA... (Mtj)
Dec 01, 2014 8:39:09 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Dec 01, 2014 8:39:09 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
2014-12-01 08:39:14,801 DEBUG [Thread-0] (DataSetUtils.java:249) - DONE: Computed the new basis.
2014-12-01 08:39:15,127 DEBUG [Thread-0] (DataSetUtils.java:256) - DONE: Projected from sample to eigen space.

The program returns a visualization, so apparently something is running and creating the initial projection. However, this behavior is not optimal.

I also came across the following posts but neither of them provided a clear solution for me, for now.
fommil/matrix-toolkits-java#38
fommil/matrix-toolkits-java#50

Why does the confirmation for number of kmers NOT always pop-up?

Below is the attached screenshot. Also, it seems that the number of k-mers shown is not equal to the theoretical maximum: k-mers = total length + 1 - K (k-mer length).
Can you please check it? I can upload my test file if required.
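One thing worth checking before comparing numbers: the formula above holds for a single sequence. For a file with several sequences, k-mers do not span sequence boundaries, so the upper bound is the sum of (length - k + 1) over all sequences that are at least k long. The sketch below computes that bound; whether VizBin additionally skips k-mers (e.g., those containing N) or sequences below the length cut-off is not covered and would lower the count further. The class name is hypothetical.

```java
/** Hypothetical helper: upper bound on the number of k-mers in a multi-sequence file. */
final class KmerCount {

    /**
     * Sums (length - k + 1) over all sequences; sequences shorter than k contribute nothing.
     *
     * @param lengths lengths of the individual sequences
     * @param k       k-mer length
     */
    static long maxKmers(int[] lengths, int k) {
        long total = 0;
        for (int length : lengths) {
            total += Math.max(0, length - k + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] lengths = {10, 3, 7};
        // Per-sequence bound versus the single-sequence formula applied to the total length
        System.out.println(maxKmers(lengths, 5));
        System.out.println((10 + 3 + 7) + 1 - 5);
    }
}
```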

Thanks for making VizBin.

Some snapshots from doing testing before merging `revision01` to `master`

Below will be some snapshots/ideas that I found to be important to note down before the merge.

Integration of JFreeChart

JFreeChart appears to solve many of the other issues (e.g., #3 or #1).
Here is our first attempt to use a scatter plot in combination with the ability to draw a polygon:

import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Point;
import java.awt.Stroke;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

import javax.swing.JFrame;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.ChartRenderingInfo;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.chart.plot.XYPlot;
import org.jfree.chart.renderer.xy.XYLineAndShapeRenderer;
import org.jfree.data.xy.XYDataset;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;
import org.jfree.ui.RectangleEdge;
// NEW
import org.jfree.chart.annotations.XYPolygonAnnotation;
import org.jfree.chart.axis.ValueAxis;

import java.util.ArrayList; 


public class MouseMarkerDemo extends JFrame {


    /**
     * 
     */
    private static final long serialVersionUID = 1L;



    public MouseMarkerDemo(String title) {
        super(title);
        JPanel chartPanel = createDemoPanel();
        chartPanel.setPreferredSize(new java.awt.Dimension(500, 270));
        setContentPane(chartPanel);
    }


    private final static class  MouseMarker extends MouseAdapter{
        private ArrayList<Double> edges = new ArrayList<Double>();
        private XYPolygonAnnotation a1;
        private final XYPlot plot;
        private final JFreeChart chart;
        private final ChartPanel panel;


        public MouseMarker(ChartPanel panel) {
            this.panel = panel;
            this.chart = panel.getChart();
            this.plot = chart.getXYPlot();
            this.plot.setDomainGridlinesVisible(false);
            this.plot.setRangeGridlinesVisible(false);
            this.plot.setBackgroundPaint(Color.white);

        }

        private void clearMarker(){
            if( a1 != null)
            {
                plot.removeAnnotation(a1);
            }
        }
        private void updateMarker(){
            Stroke stroke1 = new BasicStroke(2.0f);
            // Drop the previous polygon before drawing the updated one
            clearMarker();
            // XYPolygonAnnotation expects a flat array: x0, y0, x1, y1, ...
            double[] tempArray = new double[edges.size()];
            int i = 0;
            for(Double d : edges) {
              tempArray[i] = d;
              i++;
            }
            a1 = new XYPolygonAnnotation(tempArray, stroke1, Color.red, new Color(200, 200, 200, 130));
            plot.addAnnotation(a1);
        }

        @Override
        public void mouseClicked(MouseEvent e) {
            if(SwingUtilities.isMiddleMouseButton(e)){
                int selectedValue = JOptionPane.showConfirmDialog(null,"Click 'Yes' to export current selection.\nClick 'No' to continue selecting.\nClick 'Cancel' to discard current selection.", "What to do with current selection?", JOptionPane.YES_NO_CANCEL_OPTION);
                switch(selectedValue)
                {
                case JOptionPane.YES_OPTION:    System.out.println("YES");
                                    break;
                case JOptionPane.NO_OPTION: System.out.println("NO");
                                break;
                default:    edges.clear();
                            clearMarker();
                            break;
                }
            }
        }

        @Override
        public void mouseReleased(MouseEvent e) {
            if(SwingUtilities.isLeftMouseButton(e)){
                // Motivated by http://www.jfree.org/phpBB2/viewtopic.php?p=54140
                int mouseX = e.getX();
                int mouseY = e.getY();
                // DEBUG
                //System.out.println("x = " + mouseX + ", y = " + mouseY);       
                Point2D p = panel.translateScreenToJava2D(
                        new Point(mouseX, mouseY));
                XYPlot plot = (XYPlot) chart.getPlot();
                ChartRenderingInfo info = panel.getChartRenderingInfo();
                Rectangle2D dataArea = info.getPlotInfo().getDataArea();

                ValueAxis domainAxis = plot.getDomainAxis();
                RectangleEdge domainAxisEdge = plot.getDomainAxisEdge();
                ValueAxis rangeAxis = plot.getRangeAxis();
                RectangleEdge rangeAxisEdge = plot.getRangeAxisEdge();
                double chartX = domainAxis.java2DToValue(p.getX(), dataArea,
                        domainAxisEdge);
                double chartY = rangeAxis.java2DToValue(p.getY(), dataArea,
                        rangeAxisEdge);
                // DEBUG
                //System.out.println("Chart: x = " + chartX + ", y = " + chartY);
                edges.add(chartX);
                edges.add(chartY);

                System.out.println(edges.size());
                updateMarker();
            }
        }
    }

    private static XYDataset createDataset() {
        XYSeriesCollection dataset = new XYSeriesCollection();
        /** A constant for the number of items in the sample dataset. */
        int COUNT = 5000;

        XYSeries dataXYSeries = new XYSeries("Data");
        for (int i = 0; i < COUNT; i++) {
            float x = (float) i;
            float y = (float) Math.random() * COUNT;
            dataXYSeries.add(x, y);
        }
        dataset.addSeries(dataXYSeries);
        return dataset;

    }


    private static JFreeChart createChart(XYDataset dataset) {

        JFreeChart chart = ChartFactory.createScatterPlot(
            "Mouse Marker",
            "X",
            "Y",
            dataset,
            PlotOrientation.VERTICAL,
            true,
            true,
            false
        );
        XYPlot plot = (XYPlot) chart.getPlot();
        plot.setDomainPannable(true);
        plot.setRangePannable(true);
        XYLineAndShapeRenderer renderer
                = (XYLineAndShapeRenderer) plot.getRenderer();
        renderer.setBaseShapesVisible(true);
        renderer.setBaseShapesFilled(true);
        return chart;
    }

    public static JPanel createDemoPanel() {
        final JFreeChart chart = createChart(createDataset());
        final ChartPanel panel = new ChartPanel(chart);
        panel.setRangeZoomable(true);
        panel.setDomainZoomable(true);
        panel.setMouseWheelEnabled(true);
        panel.addMouseListener(new MouseMarker(panel));
        return panel;
    }



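    public static void main(String[] args) {
        // Build and show the UI on the Swing event dispatch thread
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                MouseMarkerDemo demo = new MouseMarkerDemo("JFreeChart: MouseMarkerDemo.java");
                // Exit the JVM when the demo window is closed
                demo.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                demo.pack();
                demo.setVisible(true);
            }
        });
    }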

}

And the resulting plot looks like this

Since the vertices of the polygon seem to have the right coordinates relative to the actual points, it should be feasible to integrate it with our current solution for selecting points contained within a polygon.

Annotation file

After MaxBin, the binned files were concatenated into one file to serve as input for VizBin, with the contig IDs as headers (e.g., >k123, >k99, etc.).
The annotation file was made with "label" as the first line, followed by the values 1-11 as categorical variables. But after multiple tries, it still gives just blue and red colors. Can you please help me figure it out?

Thank you very much

Error message when launching VizBin on the command line; but it finishes successfully

When launching VizBin from the command line, e.g.,

time java -jar VizBin-dist.jar -i EqualSet01.fa -o EqualSet01.coords

I get the following output in the terminal:

bmp00223:Downloads cedric.laczny$ java -jar VizBin-dist.jar
log4j:ERROR setFile(null,false) call failed.
java.io.FileNotFoundException: /logs/lcsb-vizbin.log (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.log4j.Logger.getLogger(Logger.java:117)
	at lu.uni.lcsb.vizbin.settings.Settings.<clinit>(Settings.java:21)
	at lu.uni.lcsb.vizbin.Main.main(Main.java:43)
Console usage:

usage: myapp [-a <number>] [-c <cut-off>] [-h] -i <input-file> [-k
       <length>] -o <output-file> [-p <number>] [-t <number>]
 -a,--pca <number>           number of PCA columns [default=50]
 -c,--cut-off <cut-off>      minimum conting length [default=1000]
 -h,--help                   print this help menu
 -i,--input <input-file>     input file in fasta format
 -k,--k-mer <length>         k-mer length [default=5]
 -o,--output <output-file>   output file containing coordinates
 -p,--perplexity <number>    perplexity parameter [default=30.0]
 -t,--thread <number>        number of threads [default=1]
2020-01-09 15:03:37,214 DEBUG [main] (MainFrame.java:87) - Init of Main application frame
2020-01-09 15:03:37,288 DEBUG [main] (MainFrame.java:92) - Property changed: defaultCloseOperation. Old: 1 New: 3
2020-01-09 15:03:38,513 DEBUG [main] (MainFrame.java:92) - Property changed: font. Old: null New: java.awt.Font[family=Lucida Grande,name=Lucida Grande,style=plain,size=13]

.
.
.

Interestingly, it still runs through:

2020-01-09 14:33:38,038 DEBUG [Thread-0] (ProcessInput.java:69) - Points created.
2020-01-09 14:33:38,038 DEBUG [Thread-0] (ObjectWithProperties.java:33) - Property changed: POINTS_FILE. Old: null New: /var/folders/mv/zv136yrd1_ng4d2w899qj4g0lgzc4c/T/map8586245099513520896/points.txt
2020-01-09 14:33:38,039 DEBUG [AWT-EventQueue-0] (ProcessInput.java:85) - [PROGRESS BAR] Points created. (35)
2020-01-09 14:33:38,107 DEBUG [Thread-0] (ProcessInput.java:69) - Done.
2020-01-09 14:33:38,107 DEBUG [AWT-EventQueue-0] (ProcessInput.java:85) - [PROGRESS BAR] Done. (100)
2020-01-09 14:33:38,124 DEBUG [Thread-0] (ObjectWithProperties.java:33) - Property changed: FINISHED. Old: false New: true

However, this error message is confusing and should not appear.

Best,

Cedric
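The path /logs/lcsb-vizbin.log in the stack trace suggests that the log directory is resolved before any base directory is set, although that reading is an assumption. One possible mitigation, sketched below, is to create the log directory explicitly before the logger is configured. The class name, the method name, and the .vizbin/logs layout are hypothetical and not taken from the VizBin sources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LogDirectorySetup {

    /** Creates <baseDir>/.vizbin/logs if needed and returns the log file path inside it. */
    static Path prepareLogFile(Path baseDir, String fileName) {
        Path logDir = baseDir.resolve(".vizbin").resolve("logs");
        try {
            Files.createDirectories(logDir); // no-op when the directory already exists
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create log directory " + logDir, e);
        }
        return logDir.resolve(fileName);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the user's home directory in this demo.
        Path base = Files.createTempDirectory("vizbin-demo");
        System.out.println("Logging to " + prepareLogFile(base, "lcsb-vizbin.log"));
    }
}
```

In a real application the base directory would presumably be System.getProperty("user.home"), and the returned path would have to be handed to the logging configuration before the first logger is created.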

Question: How to call ptsne.cpp from the commandline? (Working on a Python wrapper)

I want to write a Python wrapper that calls the very fast implementation in https://github.com/claczny/VizBin/blob/master/src/backend/bh_tsne/ptsne.cpp in the backend and then load the embeddings into pandas or numpy.

My main question is how to call the ptsne.cpp script in an OSX terminal?

It looks like the input data just needs to be the parameters, one per line, followed by the PCA components for each row, with columns separated by spaces. Apologies if this is out of scope, but I can post my code here (if it works) if anyone would find it useful.

Feature request: sequence masking

I'd like Vizbin to recognize masked sequences, i.e. ignore small letters. This would be useful to ignore e.g. 16S regions or other regions that obscure kmer profiles.

Usually, the user would supply the already masked sequence, but if you're mega cool, you could include a module that recognizes highly conserved/structural regions and does the masking internally.
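A minimal sketch of what soft-mask-aware k-mer counting could look like, assuming masked regions are written in lowercase. Any window that touches a lowercase letter (or any character other than uppercase A, C, G, T, such as N) is skipped. The class name is hypothetical, and the sketch does not merge reverse complements or reproduce VizBin's actual k-mer counting.

```java
import java.util.HashMap;
import java.util.Map;

final class MaskedKmerCounter {

    /** Counts k-mers consisting only of uppercase A, C, G, T. */
    static Map<String, Integer> count(String sequence, int k) {
        Map<String, Integer> counts = new HashMap<>();
        int validRun = 0; // consecutive unmasked bases ending at position i
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            boolean unmasked = c == 'A' || c == 'C' || c == 'G' || c == 'T';
            validRun = unmasked ? validRun + 1 : 0;
            if (validRun >= k) {
                // The last k characters are all unmasked, so this window counts.
                counts.merge(sequence.substring(i - k + 1, i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The lowercase "acgt" block is ignored, and no k-mer spans its borders.
        System.out.println(count("ACGTacgtACGT", 2));
    }
}
```

Tracking the length of the current unmasked run avoids re-scanning each window, so the cost stays linear in the sequence length.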

Update the wiki

The wiki needs an update.

Annotation file: Label (in numeric form) not displaying default color options

Howdy there,

I am using VizBin to visualize and manually bin MAGs for host-associated bacterial populations. I have a single .fasta file that contains scaffolds corresponding to samples. In other words, multiple sample .fasta files were combined into a single .fasta file so that I can bin genomes across all samples of interest. My goal here was to create an annotation file to reflect this, with each sample, or label (#1-9), corresponding to a color. Any color; it really doesn't matter. I created the annotation file from the combined fasta file in an effort to maintain the scaffold order and to record other interesting properties such as length and GC content.

The issue I am having is that the annotation file is working, I think, but the labels are not being read. To explain further, it appears that the size of each point is changing based on length. That is helpful somewhat.

In the future I would like to add a reference genome of my "bacteria of interest" as a marker to aid my binning (receive more complete bins and avoid contamination as much as possible). I have yet to add this to my annotation file since the labels aren't being read.

Beyond that -- I have MANY scaffolds that I am inputting into VizBin (>4mil). I have set minimum contig length to 2Kb or 3Kb. The annotation file and fasta contain the same number of entries prior to input into VizBin. The minimum contig length does toss out plenty of scaffolds, but not so much that only one label is left.

Below is the head and tails of both my annotation file and the .fasta file.

A) head annotation file
label,length,gc
1,134077,45.42
1,87175,45.16
1,65686,45.71
1,52865,45.92
1,44948,34.86
1,42530,45.30
1,42475,46.38
1,40293,45.94
1,29404,48.00

B) tail annotation file
9,200,56.50
9,200,55.50
9,200,60.00
9,200,53.00
9,200,52.50
9,200,53.00
9,200,42.00
9,200,42.50
9,200,34.00
9,200,57.00

C) head fasta file

>D0_SEK2_2_scaffold_1_c1
CAATCGATACGACCCCGGAGAGCGGCTTTTGCTAAAACTCGAGCAGTTTCTTGAAAACTT
GCTTCTGATATGAAACTTTGAGTATTTAGAGATGCTTTCGTTATTCCCAATAAGATGGCT
CGATAACAGATCGCTTCTTCCAAAGCACGCCCTGTTCGTTCCGCCCGCAACAATTCAATT
AGTTCTCCGGGTGAAAAAACATTAGACATTCTATCTTCTGAAACCAACACTTTTGATGTT
ATTTGACGCACAATAATCTCTATATGTCGATTATGAATCTGCACTCCCTGAGATCGATAA
ACTTTTTGGATCTTATTAACCAAAGAGATACGACTTTGCACTATAGTTAGCTCAGCACCA
ATCAAGAATCCCCAAGGAATTCCAAGAATTTTTGCTATACGCTCGTTCCAACCCTCAATC
CTCTTTTCTAGGTTCATCGATATTGAATCAATCGAACGAACTTCTAACACTTGTTCCACT
TTTGGAAGACCTTGCGTTATATCTCCAGATCTCTATTTTTCATATATAAATGTAACTAAC

D) tail fasta file

>W_SEK2_D15_scaffold_211474_c1
ATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTCATTACACTTTATA
TGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTC
ATTACACTTTATATGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTT
GCAATAAACCTTCATTACAC
>W_SEK2_D15_scaffold_113313_c1
CAGCAGACCGTGATGTCTTACGCCTGTGTTGCCCTCTACCGCTATGCGGTTGGTAAGCCA
GTGCCAGGGTTCGACCCAACGGCTATGCAGGGAGCGTTCCGAGTGAAGAAGCAGAAGTTC
ACCGGACAAGCCGGAGCCTAATTAGCGCCTAGGGCCACTCCGCGAACGAGAGCCTTCTGG
AAGTTCAGGTAAATGAACAC

note: D0_SEK2_2 is a sample name that I replaced with 1 in the label column of the annotation file. I thought potentially this software wasn't reading the labels correctly due to the underscores or the combination of letters and numbers. However, when it is just numbers, I am failing to get anything.

Any help would be great,

Jonathan
Post Doc, UCSD

Integrate/improve input-verification

The user can provide pretty much any file as input, even if it is not a FASTA file. This should be checked more stringently and communicated to the user.
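A minimal sketch of such a check, assuming nucleotide FASTA with IUPAC codes and treating blank lines as harmless. The class name and the error messages are hypothetical; the exact rules VizBin should enforce are not specified in this issue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;

final class FastaCheck {

    // IUPAC nucleotide codes; anything else in a sequence line is reported.
    private static final String ALLOWED = "ACGTUNRYKMSWBDHV";

    /** Returns a message for the first problem found, or empty if the input looks like FASTA. */
    static Optional<String> validate(Reader input) {
        boolean sawHeader = false;
        boolean recordHasSequence = false;
        int lineNo = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    if (sawHeader && !recordHasSequence) {
                        return Optional.of("line " + lineNo + ": previous record has no sequence");
                    }
                    sawHeader = true;
                    recordHasSequence = false;
                } else {
                    if (!sawHeader) {
                        return Optional.of("line " + lineNo + ": sequence data before the first '>' header");
                    }
                    for (int i = 0; i < line.length(); i++) {
                        if (ALLOWED.indexOf(Character.toUpperCase(line.charAt(i))) < 0) {
                            return Optional.of("line " + lineNo + ": unexpected character '" + line.charAt(i) + "'");
                        }
                    }
                    recordHasSequence = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!sawHeader) {
            return Optional.of("no '>' header found");
        }
        if (!recordHasSequence) {
            return Optional.of("last record has no sequence");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(new StringReader(">k123\nACGT\n")));
        System.out.println(validate(new StringReader("not a fasta file")));
    }
}
```

The returned message includes the line number so that it can be shown to the user directly, for example in a dialog or on the command line.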

Unexpected end of ZLIB input stream

Hello, I was gonna open a newly saved workspace but then this happened:

I didn't encounter any problem to open a saved workspace last night though.
Also, I found that the workspace was smaller when using kmer=5 than when using kmer=4.

Improving efficiency of PrincipalComponentsAnalysisMtj.java

PrincipalComponentsAnalysisMtj.java includes the following fields:

double[][]  A_init;
DenseMatrix A;
DenseMatrix Y;

and A is initialized from A_init. A_init is not used afterwards and should be released from memory. Moreover, DenseMatrix extends AbstractDenseMatrix, which implements public void set(int row, int column, double value); this could be used to populate the matrices A and Y directly, thus saving one copy of the data.
Since A is modified during the singular value decomposition, we need to create and keep Y.

Integrate option to "export" CLR-transformed frequencies

It would be good to have an option for exporting the high-dimensional data to which the CLR transformation has been applied (512D -> CLR-512D -> 50D -> 2D; so basically the second of these four stages). This should be a menu entry, for example under "Debug".
It should not be in the "Advanced options", as this is more of a debug feature than something the general user would use.
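For reference, the centered log-ratio (CLR) transformation maps each component to its logarithm minus the mean of all logarithms. The sketch below is a minimal illustration with a hypothetical class name. The pseudocount parameter is an assumption added here to avoid log(0) for k-mers with zero counts; whether and how VizBin handles zeros is not stated in this issue.

```java
final class ClrTransform {

    /**
     * Centered log-ratio transform: clr(x)_i = ln(x_i + p) - mean_j ln(x_j + p),
     * where p is the pseudocount (an assumption of this sketch).
     */
    static double[] clr(double[] counts, double pseudocount) {
        double[] result = new double[counts.length];
        double meanLog = 0.0;
        for (int i = 0; i < counts.length; i++) {
            result[i] = Math.log(counts[i] + pseudocount);
            meanLog += result[i];
        }
        meanLog /= counts.length;
        for (int i = 0; i < result.length; i++) {
            result[i] -= meanLog; // center, so the components sum to zero
        }
        return result;
    }

    public static void main(String[] args) {
        double[] transformed = clr(new double[] {3, 0, 12, 5}, 1.0);
        for (double v : transformed) {
            System.out.println(v);
        }
    }
}
```

An export option would then write one such vector per sequence, for example as one tab-separated row each.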

Using more existing Java libraries

VizBin has evolved organically over time and has multiple contributing authors. The underlying Java code could thus benefit from a rewrite, at least by the increased use of existing Java libraries (that have been tested exhaustively over time). Accordingly, some of the parts may be substituted by existing Java libraries instead of currently used custom-written code.

This issue will keep track of libraries that appear to be interesting and related points.

Operation	Suggestions
Input/Output of fasta files	http://biojava.org/wiki/BioJava:CookBook:Core:FastaReadWrite
Compression/Decompression of fasta sequences to save memory during runtime	https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html -> Seems applicable to individual strings -> foreach seq: compute kmer frequency; seq = compress(seq)

Feel free to extend/modify the list.
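The Deflater suggestion from the list can be sketched as follows, assuming each sequence is compressed on its own after its k-mer frequencies have been computed. The class name is hypothetical; only java.util.zip.Deflater and java.util.zip.Inflater from the standard library are used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class SequenceCompressor {

    /** Compresses a sequence string into a byte array. */
    static byte[] compress(String sequence) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(sequence.getBytes(StandardCharsets.US_ASCII));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end(); // release native memory held by the deflater
        return out.toByteArray();
    }

    /** Restores the original sequence from the output of compress(). */
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or invalid compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Invalid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String seq = "ACGTTGCAACGTTGCAACGTTGCAACGTTGCAACGTTGCAACGTTGCA";
        byte[] packed = compress(seq);
        System.out.println(seq.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(seq));
    }
}
```

Whether this actually saves memory depends on the sequence lengths: very short sequences can grow when compressed, so a length threshold may be worth considering.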

Best,

Cedric

Refactor code to set maxGC, maxCoverage (etc.) for the DataSet and not for the Sequence

Currently, the DataSetFactory normalizes the GC and coverage values of a Sequence with respect to the maximum in the DataSet. However, this should not be done in this way. The annotations that are provided for every sequence should not be modified per Sequence but rather only when needed, e.g., when preparing the plotting in ClusterPanel.
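One way to picture the proposed refactoring is sketched below: the raw annotation values stay untouched on each sequence, the data set keeps the maxima, and normalized values are computed only on request (for example, while preparing a plot). All class, field, and method names here are hypothetical and do not mirror the existing DataSet, Sequence, or DataSetFactory classes.

```java
import java.util.ArrayList;
import java.util.List;

final class AnnotatedDataSet {

    /** Holds the raw, unmodified annotation values of one sequence. */
    static final class Entry {
        final String id;
        final double gc;
        final double coverage;

        Entry(String id, double gc, double coverage) {
            this.id = id;
            this.gc = gc;
            this.coverage = coverage;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private double maxGc = 0.0;
    private double maxCoverage = 0.0;

    /** Adds an entry and updates the data-set-level maxima. */
    void add(Entry entry) {
        entries.add(entry);
        maxGc = Math.max(maxGc, entry.gc);
        maxCoverage = Math.max(maxCoverage, entry.coverage);
    }

    /** Normalized GC in [0, 1], computed on demand, e.g. when plotting. */
    double normalizedGc(Entry entry) {
        return maxGc > 0.0 ? entry.gc / maxGc : 0.0;
    }

    /** Normalized coverage in [0, 1], computed on demand. */
    double normalizedCoverage(Entry entry) {
        return maxCoverage > 0.0 ? entry.coverage / maxCoverage : 0.0;
    }

    public static void main(String[] args) {
        AnnotatedDataSet dataSet = new AnnotatedDataSet();
        Entry first = new Entry("k123", 20.0, 5.0);
        dataSet.add(first);
        dataSet.add(new Entry("k99", 40.0, 10.0));
        System.out.println(first.gc + " -> " + dataSet.normalizedGc(first)); // prints 20.0 -> 0.5
    }
}
```

With this split, adding or removing sequences only changes the stored maxima, and the per-sequence annotations remain exactly as provided by the user.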

where to find the log file?

Hi, I am very new to this software. I am asking because when I tried to add the annotation, there was an error and I couldn't locate the log file on my MacBook.

Also I wanna know if I can just label the 1st sequence in my file? I made a quite simple annotation file with only two lines :
isMarker
1

Could this annotation file cause the error? I am not a man with lots of coding experience. So I decided to not fill the series of 0 after the 1.

Many thanks

test

to be removed

Circles and rectangles in legend not consistent with main plotting window

When visualizing sequences that are derived from a larger collection (say, 20) of labels, the symbols for "circle" and for "rectangle" are not consistent between the legend and the main plotting window.

It appears that in the main plotting window, they are indeed correctly displayed but in the legend, the circles appear as small rectangles.

Overall, this is not consistent and expected to not be intuitive.
