micans / mcl

MCL, the Markov Cluster algorithm, also known as Markov Clustering, is a method and program for clustering weighted or simple networks, a.k.a. graphs.

Home Page: https://micans.org/mcl

License: Other

Makefile 0.45% M4 0.81% Shell 7.41% Perl 5.46% PostScript 11.61% R 0.70% C 73.55%
clustering markov-clustering network-analysis network-clustering graph-clustering markov-cluster-algorithm mcl

mcl's Introduction

Installation and MCL versions
Applications and bioinformatics
Quick pointers
RCL, fast multi-resolution consensus clustering
Status and plans

MCL

Visualisation of MCL

MCL, short for Markov CLustering or the Markov CLuster algorithm, is a method for clustering weighted or simple networks, a.k.a. graphs. This source code also includes other network-related programs, one of which is RCL (restricted contingency linkage) for fast multi-resolution consensus clustering (see below). If you use this software, please cite:

van Dongen, Stijn. Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications 30(1), pp. 121-141, 2008. https://doi.org/10.1137/040608635

The algorithm was conceived in 1998 and first published in a technical report that same year. A PhD thesis and three more technical reports followed in 2000. The paper above is the result of a long-winded review process that started in 2000 and lay dormant for a long time, for reasons not entirely untypical within the realms of scientific publishing. A lot more (too much) information and documentation is available at micans.org/mcl .

This MCL implementation is fast, threaded, and uses sparse matrices. It runs on a single machine and can use multiple CPUs. On capable hardware it can cluster networks with millions of nodes and billions of edges within hours.

Installation and MCL versions

Releases 14-137 and 22-282 of MCL are available on Bioconda and many flavours of Linux and BSD, including Debian, Ubuntu and OpenBSD. Release 14-137 is a fine version; this MCL implementation has not noticeably changed over the past decade, so for using the clustering program mcl it does not matter which of these versions you have. See Status and plans below for more detail and for why the software is still being developed nonetheless. Many thanks to Joost van Baal, Kusalananda, Andreas Tille and other maintainers who package(d) MCL for Debian and other Linux and BSD releases. Example package installation commands:

conda install bioconda::mcl

apt-get install mcl           # Debian, Ubuntu

Installing MCL software without a package manager requires a compilable source tree. The code in this repository requires processing by autotools to produce such a tree. Hence, to use MCL software this repository is not the right source unless you are interested in development. This code additionally needs the C library in the github repository micans/cimfomfa (previously cimfomfa was included within the mcl distribution). The build procedure has been changed accordingly and coordinates installation of both cimfomfa and mcl.

To install the current MCL release from micans.org/mcl, use the script install-this-mcl.sh. On Linux and macOS (provided development tools are installed on macOS), pasting the following lines into a terminal (or saving them to a file and sourcing it) will install MCL.

mkdir installmcl
cd installmcl
wget https://raw.githubusercontent.com/micans/mcl/main/install-this-mcl.sh -O install-this-mcl.sh
chmod u+x install-this-mcl.sh
./install-this-mcl.sh
mcl --version        # test install

By default programs are installed in $HOME/local/bin and the multi-resolution consensus clustering program rcl is enabled (see below). Edit the file install-this-mcl.sh before executing it if you want to make changes.

MCL's build environment was created by Debian developer Joost van Baal - many thanks Joost!

The current release is 22-282, which has no open issues relating to mcl. You'll need this release if you want to experiment with RCL (consensus clustering integrating results for different granularities / inflation values / resolution values). The release also fixes some mcxarray issues.

Applications and bioinformatics

MCL has been used a lot in the field of bioinformatics, starting with the TribeMCL method published by Enright, van Dongen and Ouzounis. For bioinformatic applications, please additionally cite:

Enright A.J., Van Dongen S., Ouzounis C.A. An efficient algorithm for large-scale detection of protein families, Nucleic Acids Research 30(7):1575-1584 (2002). https://pubmed.ncbi.nlm.nih.gov/11917018/

Quick pointers

The quickest way to try out MCL is to provide it with a file that has three tab-separated columns, where each line is of the form LABEL1<tab>LABEL2<tab>SIMILARITY-VALUE. Such a line represents an edge from LABEL1 to LABEL2 with weight SIMILARITY-VALUE. Values should quantify similarities between objects. Examples are measures of overlap (e.g. Jaccard index), a correlation coefficient such as Pearson or Spearman, or a negative log E-value. In network clustering / community detection, edge weights should be higher if objects are more similar. In contrast, pair-wise relationships in classical feature-space clustering algorithms are nearly always distances. Beware of this dichotomy. If the file is called MYFILE you can run MCL like this:

mcl MYFILE --abc -I 2.0
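As an illustration of the three-column input described above, here is a minimal Python sketch that turns per-object feature sets into similarity lines using the Jaccard index. The function names and the example data (geneA, geneB, geneC) are hypothetical and not part of MCL; the sketch only produces text in the LABEL1<tab>LABEL2<tab>SIMILARITY-VALUE layout.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def abc_lines(features):
    """Return tab-separated 'LABEL1<tab>LABEL2<tab>SIMILARITY' lines.

    `features` maps a label to a set of features (hypothetical input).
    Pairs with zero similarity are skipped, so they produce no edge.
    """
    lines = []
    for x, y in combinations(sorted(features), 2):
        sim = jaccard(features[x], features[y])
        if sim > 0:
            lines.append(f"{x}\t{y}\t{sim:.4f}")
    return lines


# Made-up example data: higher values mean more similar objects.
features = {"geneA": {1, 2, 3}, "geneB": {2, 3, 4}, "geneC": {9}}
print("\n".join(abc_lines(features)))
```

Writing the returned lines to a file such as MYFILE would give input of the kind the command above expects.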

Output will be in the file out.MYFILE.I20, where each line is a cluster written as a list of labels. It is recommended to try a few different inflation values (the -I parameter), e.g.

mcl MYFILE --abc -I 1.4
mcl MYFILE --abc -I 2.0
mcl MYFILE --abc -I 3.0
mcl MYFILE --abc -I 5.0
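The runs above can also be scripted. The following Python sketch builds one command per inflation value and parses a cluster file into lists of labels. It is a sketch under assumptions: the helper names are hypothetical, output names are set explicitly with -o (chosen here to mirror the out.MYFILE.I20 pattern), and cluster lines are assumed to be tab-separated.

```python
def mcl_commands(infile, inflations):
    """Build one mcl argument list per inflation value.

    The output name is passed explicitly with -o; the naming scheme
    below is this sketch's own choice, modelled on out.MYFILE.I20.
    """
    commands = []
    for inflation in inflations:
        tag = round(inflation * 10)  # e.g. 1.4 -> 14
        out = f"out.{infile}.I{tag}"
        commands.append(
            ["mcl", infile, "--abc", "-I", f"{inflation:.1f}", "-o", out]
        )
    return commands


def read_clusters(text):
    """Parse cluster output: one cluster per line, labels split on tabs."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


for cmd in mcl_commands("MYFILE", [1.4, 2.0, 3.0, 5.0]):
    print(" ".join(cmd))
```

With mcl installed, each argument list could be handed to subprocess.run, and the resulting files read back with read_clusters to compare cluster counts and sizes across inflation values.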

How you construct the network is important. Some recipes can be found here, and some Frequently Answered Questions here (the latter is a bit over the top). For large data the 'abc' format just described becomes very slow to load. Use these instructions on the recipe page to convert 'abc' format to a binary format that is orders of magnitude faster to load.

RCL, fast multi-resolution consensus clustering

RCL, (f)or Restricted Contingency Linkage, is a fast and parameterless method for integrating multiple flat clusterings at different levels of resolution. There is no requirement on these clusterings; they can be made by any method or combination of methods, for example by Leiden with different resolution parameters, or by MCL with different inflation values. The implementation provided here in the RCL directory just requires the input to be in the native mcl matrix format. For Seurat results this is facilitated by the scripts rcl/srt2tab.sh (this establishes a mapping from labels to indexes) and rcl/srt2cls.sh (this translates a Seurat <LABEL><CLSID> file to mcl matrix format).

This preprint quite extensively describes RCL, including application of RCL to a large-scale single-cell kidney data set of 27k cells.

Status and plans

The program MCL has been very stable, or nearly unchanging, for well over 15 years now. The last speed optimisations happened in 2010. Note that the underlying MCL algorithm is very simple and consists of the alternation of two sparse matrix operations: regular matrix multiplication, and element-wise matrix power-raising plus scaling. Hence, there is little to optimise beyond the pruning/approximation scheme where some of the smallest elements are set to zero. I aim to do some development in its sibling programs, including improving those that implement (currently somewhat inelegant) mini-formats such as mcx alter and mcxsubs.
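The alternation described above can be sketched in a few lines of Python. This is a dense, toy-sized illustration and not the sparse, threaded implementation in this repository. The self-loop weight of 1, the pruning threshold, the convergence test and the way clusters are read off the final matrix are all assumptions made for the sketch.

```python
def normalize_columns(m):
    """Rescale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def expand(m):
    """Expansion: regular matrix multiplication of m with itself."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power, prune=1e-12):
    """Inflation: element-wise power-raising, then column scaling.

    Entries at or below `prune` are zeroed first, a crude stand-in for a
    pruning scheme (the threshold is a hypothetical choice).
    """
    raised = [[x ** power if x > prune else 0.0 for x in row] for row in m]
    return normalize_columns(raised)


def mcl_sketch(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a small symmetric weight matrix; returns sorted index lists."""
    n = len(weights)
    m = [[float(x) for x in row] for row in weights]
    for i in range(n):
        m[i][i] = 1.0  # assumed self-loop weight
    normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Rows that still carry mass act as attractors; the columns they
    # attract form one cluster. Identical sets are merged.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Two triangles joined by one weak edge (made-up example graph).
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl_sketch(graph))
```

Raising the inflation value makes the power-raising step sharper, which tends to produce more and smaller clusters.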

RCL (see above) was recently added, so some focus will be to improve/extend its implementation, as well as support and documentation.

A second area of development, tied to the first, will be low-level fast loading, filtering and subsetting of large networks and matrices. I have found occasional use for this in past projects, and under certain conditions an mcl-edge recipe, if possible, will be a few times faster than a recipe using Python pandas or R. Challenges of this type are sometimes a reason to expand mcl-edge's capabilities.

Another reason for new releases is that new compilers and compiler settings have unearthed two or three blemishes in the code base that needed fixing.

If you have questions, suggestions, or problems please open an issue or discussion.

mcl's People

Contributors

mbertagna, micans, ryandesign

mcl's Issues

Question about memory consumption

Hi,
First, thanks for providing such a good tool. I am now using orthofinder to infer single-copy genes, but one of the MCL steps takes too much memory, so I decided to run this step on a big-memory node with 1T of memory. I would like to know whether 1T of memory is enough. My matrix file in mci format has dimensions 31965032x31965032, and the file size is about 1.5T. Here is my mcl -z output.

$ mcl -z

[mcl] cell size: 8
[mcl] cell contents: int and float
[mcl] largest index allowed: 2147483647
[mcl] smallest index allowed: 0
Prune number                               10000        [-P n]
Selection number                            1100        [-S n]
Recovery number                             1400        [-R n]
Recovery percentage                           90        [-pct n]
warn-pct                                      10        [-warn-pct k]
warn-factor                                 1000        [-warn-factor k]
dumpstem                                                [-dump-stem str]
Initial loop length                            0        [-l n]
Main loop length                           10000        [-L n]
Initial inflation                              2.0      [-i f]
Main inflation                                 2.0      [-I f]

I would appreciate your answers to my question.

Thanks in advance!
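A rough way to reason about the numbers in this question is sketched below. It assumes an upper bound of 2 * cell size * K * N bytes, where N is the number of nodes and K is the larger of the selection and recovery numbers shown by mcl -z. That formula is an assumption of this sketch and should be checked against the MCL documentation; it also ignores the cost of loading the input itself.

```python
def ram_upper_bound_bytes(n_nodes, cell_size=8, selection=1100, recovery=1400):
    """Assumed bound: two matrices, each with at most K cells per column.

    Defaults are taken from the `mcl -z` output quoted above; the formula
    itself is an assumption, not a statement about MCL's internals.
    """
    k = max(selection, recovery)
    return 2 * cell_size * k * n_nodes


bound = ram_upper_bound_bytes(31_965_032)
print(f"{bound} bytes, about {bound / 1e9:.0f} GB")
```

Under this assumption, lowering the selection and recovery numbers would lower the bound proportionally, at the cost of a coarser approximation.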

compile on Apple M-series ARM64 processors

I would like to use MCL in combination with orthofinder on my MacBook Pro.
I tried to compile it, but the configuration fails on a test of gcc.
During configuration it doesn't seem to detect the actual architecture: arm64

~/Downloads/mcl % bash install-this-mcl.sh
--2024-06-04 21:48:39-- http://micans.org/mcl/src/mcl-22-282.tar.gz
Resolving micans.org (micans.org)... 2a00:1098:82::6:1, 46.235.227.111
Connecting to micans.org (micans.org)|2a00:1098:82::6:1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2748263 (2,6M) [application/x-gzip]
Saving to: ‘mcl-22-282.tar.gz’

mcl-22-282.tar.gz 100%[====================================================================>] 2,62M 6,79MB/s in 0,4s

2024-06-04 21:48:39 (6,79 MB/s) - ‘mcl-22-282.tar.gz’ saved [2748263/2748263]

--2024-06-04 21:48:39-- http://micans.org/mcl/src/cimfomfa-22-273.tar.gz
Resolving micans.org (micans.org)... 2a00:1098:82::6:1, 46.235.227.111
Connecting to micans.org (micans.org)|2a00:1098:82::6:1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 455547 (445K) [application/x-gzip]
Saving to: ‘cimfomfa-22-273.tar.gz’

cimfomfa-22-273.tar.gz 100%[====================================================================>] 444,87K --.-KB/s in 0,1s

2024-06-04 21:48:40 (3,60 MB/s) - ‘cimfomfa-22-273.tar.gz’ saved [455547/455547]

checking for a BSD-compatible install... /opt/local/bin/ginstall -c
checking whether build environment is sane... yes
/Users/strom/Downloads/mcl/cimfomfa-22-273/missing: Unknown `--is-lightweight' option
Try `/Users/strom/Downloads/mcl/cimfomfa-22-273/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... /opt/local/bin/gmkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-apple-darwin23.5.0
checking host system type... x86_64-apple-darwin23.5.0
checking how to print strings... printf
checking whether make supports the include directive... yes (GNU style)
checking for gcc... /opt/homebrew/bin/gcc
checking whether the C compiler works... no
configure: error: in `/Users/strom/Downloads/mcl/cimfomfa-22-273':
configure: error: C compiler cannot create executables
See `config.log' for more details

Two issues: "nx.to_scipy_sparse_matrix" does not exist and the hyperparameter "inflation" is not working?

Hi MCL Community,

I found two issues:

  1. In the sample code, nx.to_scipy_sparse_matrix doesn't exist in networkx 3.0. The documentation needs to be updated to use nx.to_scipy_sparse_array.
  2. I tried the doc: https://markov-clustering.readthedocs.io/en/latest/readme.html and found that different values of the inflation hyperparameter (1.5 to 2.6) gave the same results, and that the modularity values are different from those on the doc page.

Also, inflation=1.1 and inflation=2.5 gave the same clustering outcome. I think there is a bug. Can you check? Thanks.

installation of mcl-22-282.tar.gz with fatal error: tingea/types.h: No such file or directory

make[2]: Entering directory `/home/badrazat/mcl-22-282/src/clew'
gcc -DHAVE_CONFIG_H -I. -I../..  -I../../src -I../..   -g -O2 -pthread -MT scan.o -MD -MP -MF .deps/scan.Tpo -c -o scan.o scan.c
In file included from ../../src/impala/matrix.h:42:0,
                 from scan.h:61,
                 from scan.c:16:
../../src/impala/ivp.h:16:10: fatal error: tingea/types.h: No such file or directory
 #include "tingea/types.h"
          ^~~~~~~~~~~~~~~~

make install for the previous version mcl-14-137 is successful.

Introducing new node to existing network?

Hi,

Thanks so much for developing this amazing method. I have used mcl on my very large protein networks (~10 million proteins) and it works pretty well. I have two questions:

  1. I want to expand to a larger protein set; however, the .abc file (~2T) exceeds my system RAM, which seems to kill the maxload process.

  2. Is there a way to add new proteins into the existing network? If so, I can solve the first problem and potentially skip the bulk of mcl, which took days to run.

Any input would be greatly appreciated.
Best, Menghan

Cannot open out stream Error

Hello,

I'm getting a strange error message from mcl. The code and output can be found below:

$ mcl mclInput --abc -I 1.5 -o mclOutput
.........[mcl] new tab created
[mcl] pid 15159
ite ------------------- chaos time hom(avg,lo,hi) m-ie m-ex i-ex fmv
1 ................... 25.41 0.12 0.99/0.25/5.16 1.46 1.42 1.42 3
2 ................... 43.22 0.17 0.90/0.19/3.67 1.35 1.08 1.53 5
3 ................... 46.47 0.18 0.84/0.08/3.49 1.29 0.95 1.46 5
4 ................... 39.57 0.16 0.80/0.10/2.66 1.22 0.93 1.35 5
5 ................... 44.13 0.14 0.77/0.10/2.91 1.14 0.91 1.23 4
6 ................... 27.14 0.12 0.75/0.13/1.73 1.08 0.88 1.08 3
7 ................... 16.20 0.10 0.73/0.09/1.57 1.07 0.86 0.93 3
8 ................... 13.40 0.08 0.71/0.25/1.28 1.03 0.88 0.82 2
9 ................... 11.88 0.07 0.71/0.21/1.09 1.01 0.86 0.71 1
10 ................... 8.79 0.04 0.73/0.32/1.18 1.00 0.83 0.58 0
11 ................... 5.15 0.04 0.76/0.35/1.06 1.00 0.85 0.49 0
12 ................... 5.97 0.03 0.81/0.35/1.00 1.00 0.82 0.40 0
13 ................... 4.66 0.02 0.86/0.43/1.00 1.00 0.78 0.31 0
14 ................... 2.88 0.02 0.90/0.41/1.08 1.00 0.78 0.24 0
15 ................... 2.43 0.02 0.93/0.43/1.00 1.00 0.78 0.19 0
16 ................... 2.29 0.02 0.95/0.51/1.00 1.00 0.80 0.15 0
17 ................... 2.04 0.02 0.97/0.56/1.00 1.00 0.85 0.13 0
18 ................... 1.68 0.02 0.98/0.63/1.00 1.00 0.88 0.11 0
19 ................... 0.90 0.01 0.98/0.53/1.00 1.00 0.90 0.10 0
20 ................... 0.99 0.01 0.99/0.48/1.00 1.00 0.93 0.10 0
21 ................... 0.88 0.01 0.99/0.60/1.00 1.00 0.95 0.09 0
22 ................... 0.43 0.01 1.00/0.73/1.00 1.00 0.97 0.09 0
23 ................... 0.38 0.01 1.00/0.75/1.00 1.00 0.98 0.09 0
24 ................... 0.35 0.01 1.00/0.76/1.00 1.00 0.99 0.09 0
25 ................... 0.38 0.01 1.00/0.76/1.00 1.00 0.98 0.08 0
26 ................... 0.34 0.01 1.00/0.76/1.00 1.00 1.00 0.08 0
27 ................... 0.25 0.01 1.00/0.77/1.00 1.00 0.99 0.08 0
28 ................... 0.25 0.01 1.00/0.76/1.00 1.00 1.00 0.08 0
29 ................... 0.21 0.01 1.00/0.79/1.00 1.00 1.00 0.08 0
30 ................... 0.12 0.01 1.00/0.88/1.00 1.00 1.00 0.08 0
31 ................... 0.04 0.01 1.00/0.96/1.00 1.00 1.00 0.08 0
32 ................... 0.01 0.01 1.00/0.99/1.00 1.00 1.00 0.08 0
33 ................... 0.00 0.01 1.00/1.00/1.00 1.00 1.00 0.08 0
34 ................... 0.00 0.01 1.00/1.00/1.00 1.00 1.00 0.08 0
[mcl] jury pruning marks: <99,99,99>, out of 100
[mcl] jury pruning synopsis: <99.0 or perfect> (cf -scheme, -do log)
[mcl parlour] cannot open out stream
[mcl parlour] trying to fall back to default <out.mcl>
___ [mcxIOopen] w stream <out.mcl> cannae be opened

Does anyone know what causes this kind of error? I've used this command with those arguments many times in the past and never had issues. Thank you.

tag/release version

Hi!
It would be great if you could tag and/or release the versions, so that users know which version they are downloading.
Thanks a lot

adding proteins to existing clusters

Hi,
I have created clusters from blast output with a subset of the proteins I am working with (all-versus-all blastp from ~800 genomes). Now I need to add more proteins (I expanded to ~2000 genomes, including the original 800) to the existing clusters (and of course add more clusters if needed) using new blast outputs. I have created two blast outputs: the proteins of the ~1200 extra genomes versus all proteins of the ~2000 genomes, as well as the ~800 original genomes against the extra ~1200. The three blast outputs can of course be combined into one huge file and formatted to the abc format. But my question is: can I keep adding to the original clustering file, or do I need to start all over? Thanks for your time.

Future single cell applications

Many thanks, I read your RCL preprint with great interest. If there are any plans to add more wrappers, could there be a wrapper for RCL implementation in scanpy too?

mcxload won't work with sif format

mcxload won't work with sif files. This used to work just fine in older mcl versions. In the newer ones, a simple file like this:

A -> B:0.4
A -> C:0.5

will result in this error:

___ [mcxload] symmetric mode not compatible with multi-column input formats

when running this command:

mcxload -sif file.sif --expect-values --stream-mirror -write-tab file.tab -o file.mci

It works with abc format, though.

It seems commit 6fd2b6b introduced that behavior, but the changelog says:

*  Fixed bug in mcxload -etc / -235 and others. It is no longer possible
   to combine these with --stream-mirror. Use -ri instead.

As it doesn't say anything about the sif format, I cannot tell whether this behavior with sif files is intentional or not. If I use -ri max it says:

___ [mcxload] two domain mode precludes all symmetric tab options

and I cannot guess how to tell mcxload that there is a single domain. The example in the mcxload man page suggests this command should be possible:

mcxload --stream-mirror -sif data3.txt -o data3.mci -write-tab data3.tab

But that, again, results in the first error reported here.

'SIGALRM' undeclared when installing on Windows

Hello. I'm using cygwin on Windows to install MCL.

The first problem I encountered was 'checking build system type... autofoo/config.guess: unable to guess system type'. I solved it by replacing 'autofoo/config.guess' by a new one from https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD.

After configuring I tried to run 'make install' command and got the following:

proc.c: In function 'mclSigCatch':
proc.c:65:18: error: 'SIGALRM' undeclared (first use in this function)
{ if (sig == SIGALRM)
^
proc.c:65:18: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [Makefile:239: proc.o] Error 1
make[2]: Leaving directory '/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/mcl'
make[1]: *** [Makefile:233: install-recursive] Error 1
make[1]: Leaving directory '/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src'
make: *** [Makefile:279: install-recursive] Error 1

How can I solve this problem?

Multiple definitions when installing on Windows via Cygwin

It configured fine, but during 'make install' I got a multiple-definition error. As it is quite long, I'll copy only part of it:

gcc  -g -O2  -lm  -o clmformat.exe clmformat.o report.o ../mcl/libmcl.a ../clew/libclew.a  ../gryphon/libgryphon.a ../impala/libimpala.a ../../util/libutil.a -lm
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:27: multiple definition of `mclx_n_thread_g'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:27: first defined here
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:25: multiple definition of `nu_magic'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:25: first defined here
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:24: multiple definition of `nu_diff_zip'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:24: first defined here
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:23: multiple definition of `nu_diff_sl'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:23: first defined here
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:22: multiple definition of `nu_diff_can'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:22: first defined here
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: report.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:21: multiple definition of `nu_meet_zip'; clmformat.o:/cygdrive/c/Users/raish/Downloads/22.08.2022 Работа/mcl-14-137/src/shcl/../../src/impala/iface.h:21: first defined here

mcl version 22-282 failing to build under archlinux

I am attempting to package mcl version 22-282 for Archlinux using the suggested source tarballs from the mcl homepage. While building with the suggested commands, the build process stops with the following error output:

$ ./configure --prefix /usr
$ make
make  all-recursive
make[1]: Entering directory '/build/mcl/src/mcl-22-282'
Making all in .
make[2]: Entering directory '/build/mcl/src/mcl-22-282'
make[2]: Leaving directory '/build/mcl/src/mcl-22-282'
Making all in img
make[2]: Entering directory '/build/mcl/src/mcl-22-282/img'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/build/mcl/src/mcl-22-282/img'
Making all in graphs
make[2]: Entering directory '/build/mcl/src/mcl-22-282/graphs'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/build/mcl/src/mcl-22-282/graphs'
Making all in doc
make[2]: Entering directory '/build/mcl/src/mcl-22-282/doc'
zoem -d roff -i mcl.azm -o mcl.1
___ [\dofile#2] failed to open file <pud/man.zmm>
___ [zoem] unwound on error/exception
___ [\dofile#2] error (9) occurred while reading file <mcx.zmm>
___ [zoem] unwound on error/exception
make[2]: *** [Makefile:689: mcl.1] Error 9
make[2]: Leaving directory '/build/mcl/src/mcl-22-282/doc'
make[1]: *** [Makefile:383: all-recursive] Error 1
make[1]: Leaving directory '/build/mcl/src/mcl-22-282'
make: *** [Makefile:324: all] Error 2

I have both cimfomfa and zoem installed.

clm info output question

Greetings,

Not an issue, but a question about the output from clm info: I understand that 'eff' is efficiency, 'mf' is mass fraction, and 'af' is area fraction, but I'm not sure what 'mod' is short for. Based on the values, I infer that this might be 'modularity', but is this correct? I couldn't find a description of 'mod' in the clm info documentation online. Apologies if I missed it somewhere. Below is an example of the output:

eff=0.69641 mod=0.99225 mf=0.99317 af=0.00044 src=out.seq.mci.I12 ncl=33952 max=2016 ctr=162.3 avg=10.9 min=1 DGI=2016 TWI=1067 TWL=99 sgl=19635 qrt=26608

I'm using mcl version 22-282.

Regards,
Josh

Inconsistent result with identical graphs

Hi, thanks for MCL, we have been using it for years! However, recently I discovered something weird: when I change the order of the input (leaving the similarity and connected nodes identical; thus, an identical graph), I get different output.

An example graph A:

1236 84764 252155497.24
1236 14234 221358383.20
1236 47276 252155497.24
1236 155554 590941952.65
1236 147494 590941952.65
1236 30514 244140625.00
1236 68751 252155497.24
1236 137873 27118504.64
1236 52557 252155497.24
14234 52557 228746602.32
14234 155554 543087837.45
14234 137873 27118504.64
14234 68751 228746602.32
14234 147494 543087837.45
14234 47276 228746602.32
14234 30514 221358383.20
14234 84764 228746602.32
30514 84764 252155497.24
30514 155554 590941952.65
30514 47276 252155497.24
30514 137873 27118504.64
30514 147494 590941952.65
30514 52557 252155497.24
30514 68751 252155497.24
47276 155554 580015194.63
47276 84764 244140625.00
47276 147494 580015194.63
47276 52557 244140625.00
47276 137873 26284829.57
47276 68751 244140625.00
52557 68751 244140625.00
52557 147494 580015194.63
52557 137873 26284829.57
52557 84764 244140625.00
52557 155554 580015194.63
68751 137873 26284829.57
68751 147494 580015194.63
68751 155554 580015194.63
68751 84764 244140625.00
84764 147494 580015194.63
84764 137873 26284829.57
84764 155554 580015194.63
137873 147494 5698635.25
137873 155554 5698635.25
147494 155554 244140625.00

And an example graph B:

1236 137873 27118504.64
1236 14234 221358383.20
1236 147494 590941952.65
1236 155554 590941952.65
1236 30514 244140625.00
1236 47276 252155497.24
1236 52557 252155497.24
1236 68751 252155497.24
1236 84764 252155497.24
137873 147494 5698635.25
137873 155554 5698635.25
14234 137873 27118504.64
14234 147494 543087837.45
14234 155554 543087837.45
14234 30514 221358383.20
14234 47276 228746602.32
14234 52557 228746602.32
14234 68751 228746602.32
14234 84764 228746602.32
147494 155554 244140625.00
30514 137873 27118504.64
30514 147494 590941952.65
30514 155554 590941952.65
30514 47276 252155497.24
30514 52557 252155497.24
30514 68751 252155497.24
30514 84764 252155497.24
47276 137873 26284829.57
47276 147494 580015194.63
47276 155554 580015194.63
47276 52557 244140625.00
47276 68751 244140625.00
47276 84764 244140625.00
52557 137873 26284829.57
52557 147494 580015194.63
52557 155554 580015194.63
52557 68751 244140625.00
52557 84764 244140625.00
68751 137873 26284829.57
68751 147494 580015194.63
68751 155554 580015194.63
68751 84764 244140625.00
84764 137873 26284829.57
84764 147494 580015194.63
84764 155554 580015194.63

I checked that both are identical using awk '$1<$2{print $1,$2,$3; next;} {print $2,$1,$3;}' ${graph} | sort. However, these are the results when running MCL (mcl ${graph} --abc -I 8.4 -o ${output}):

Output for graph A:

1236	84764	14234	47276	155554	30514	68751	137873	52557
147494

Whereas the output for graph B is:

1236	137873	14234	147494	30514	47276	52557	68751	84764
155554

Is this expected? I was under the impression that the ABC format specified an undirected graph and thus should have identical output when changing the order of the input.

Edit: I forgot to mention the sort in the awk command above.
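The equivalence check described in this report (orient each edge, then sort) can also be sketched in Python. The function name is hypothetical, and the sketch assumes whitespace-separated three-column lines as in the two listings above. It only confirms that two inputs contain the same undirected weighted edges; it does not explain why the clusterings differ.

```python
def canonical_edges(text):
    """Return a sorted list of (label1, label2, weight) with label1 <= label2.

    Labels and weights are compared as strings, so both inputs must write
    identical weights identically (e.g. 0.50 and 0.5 would differ).
    """
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        a, b, weight = line.split()
        first, second = sorted((a, b))
        edges.append((first, second, weight))
    return sorted(edges)


graph_a = "1236 84764 252155497.24\n14234 1236 221358383.20\n"
graph_b = "1236 14234 221358383.20\n84764 1236 252155497.24\n"
print(canonical_edges(graph_a) == canonical_edges(graph_b))
```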

12T data to process

Many thanks for your work! I have almost 12T of data that needs to be clustered with the Markov Cluster algorithm. Do you have any suggestions?

[mclIO full] reading <data.mci>
.......................................
[mclIO] read native binary 14393317x14393317 matrix with 1788352905 entries
[mcl] pid 4019741
ite ------------------- chaos time hom(avg,lo,hi) m-ie m-ex i-ex fmv
1
___> Vector with idx [115], maxval [0.000133] and [1185220] entries
-> initially reduced to [2] entries with combined mass [0.000250].
-> Consider increasing the -P value and increasing the -S value.
-> (before rescaling) Finished with [1400] entries and [0.044286] mass.
ls

___> Vector with idx [635], maxval [0.000139] and [1215076] entries
-> initially reduced to [2] entries with combined mass [0.000270].
-> Consider increasing the -P value and increasing the -S value.
-> (before rescaling) Finished with [1400] entries and [0.067914] mass.
killed

unable to download software

Hi, I'm attempting to install using the install-this-mcl.sh script today and am getting a "forbidden" message when it attempts to download the software.
