wyp1125 / mcscanx

MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!

Home Page: http://chibba.pgml.uga.edu/mcscan2/

C 0.01% C++ 0.28% Java 0.27% Perl 0.03% Makefile 0.01% CAP CDS 99.39% Raku 0.01%
c-plus-plus java perl dynamic-programming visualization

mcscanx's People

Contributors

benjaminschwessinger, huizhejin, jingping, martin-g, tanghaibao, wyp1125, xtan

mcscanx's Issues

Standardize input file format to bed

The gff format is confusing (which is my fault, since at the time there was no bed format). The bed format holds the same data in a different column order and uses 0-based start coordinates. It is defined at the link below, and is especially useful since there are a ton of tools that can handle bed files, most notably BEDTools.

http://genome.ucsc.edu/FAQ/FAQformat.html#format1

You can change from gff format to bed through:

awk '{print $1"\t"$3"\t"$4"\t"$2}' grape.gff >grape.bed
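The same conversion can be sketched in Python. This is a minimal sketch, assuming the MCScanX gff columns are chromosome, gene, start, end (as the awk command implies) and that the gff start is 1-based, so it is shifted by one to give a 0-based BED start; the awk one-liner above copies the coordinates unchanged. The function name `mcscanx_gff_to_bed` and the sample row are made up for illustration.

```python
def mcscanx_gff_to_bed(lines):
    """Convert MCScanX-style gff rows (chr, gene, start, end) to BED tuples.

    Assumption: gff starts are 1-based inclusive, so start - 1 gives the
    0-based BED start; the end coordinate is left unchanged.
    """
    bed_rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, gene, start, end = line.split()[:4]
        bed_rows.append((chrom, int(start) - 1, int(end), gene))
    return bed_rows


if __name__ == "__main__":
    # Illustrative row only; real input would come from a file such as grape.gff.
    example = ["vv1\tgene0001\t11295\t17871"]
    for chrom, start, end, gene in mcscanx_gff_to_bed(example):
        print(f"{chrom}\t{start}\t{end}\t{gene}")
```

If the gff coordinates are already 0-based, drop the `- 1`.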

File format gff or bed and what format?

I am a PhD student doing a collaborative project between Trinity College Dublin and UCLondon and am using MCScanX. I am just getting to know the tool and have been trying to use the test data available in the package (at_vv.gff, at_vv.blast, etc.), and I am running into a few problems.

Some of the information online says to use .gff and some says .bed, and the documentation is also inconsistent about the format of these files. Could you tell me exactly how the input files for MCScan should be formatted, and whether it is advisable to have just the two files (the .bed/.gff and the .blast) in a single folder when running the ./MCScan command?

I hope you can help me out.

Increase the default length of species ID (from gff/bed)

@tanghaibao @wyp1125 Hi Haibao, hi Yupeng,

Regards! Sorry to bother you. I was wondering how to modify the code so that MCScanX accepts more letters in the species ID. At the moment the software takes a two-letter short name as the species ID. In my analyses I sometimes have many genomes from the same genus or family, so it is a pain to come up with unique two-letter IDs for them. It would be very handy if the software could accept, say, 4 or 5 letters as the species ID.
It seems to me that I should adjust the function "read_gff" in "read_data.cc" (and maybe other places as well), but I know little about C. Thanks very much!
https://github.com/wyp1125/MCScanX/blob/master/read_data.cc
Best, Tao

request for your help to understand the synteny plot

Hi,

I am new to comparative genomics and I am going to ask a very stupid question. I am so sorry for that. I am working on two vertebrates and have assembled genomes for two closely related species. When I performed a synteny analysis using MCScanX and plotted it, I could see an inverted synteny block for Gg_chr5:
[image]

Does this mean that this is an assembly error, since all the other chromosomes are linear and nice, or could it be a biological discovery? If it is an error, how can I find the coordinates at which I should change the orientation of this scaffold?

I am again so sorry for this stupid question and will be grateful if you could advise me.

regards
Amit

error: ‘chdir’ was not declared in this scope !

jitendra@jitendra-UNLOCK-INSTALL[MCScanX] make []
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
msa.cc: In function ‘void msa_main(const char*)’:
msa.cc:289:22: error: ‘chdir’ was not declared in this scope
if (chdir(html_fn)<0)
^
makefile:2: recipe for target 'mcscanx' failed
make: *** [mcscanx] Error 1

when using MCScanX_h 0 homologous pairs imported (316812 discarded)

Dear Yupeng,

when I run MCScanX_h data/at_gm_pt_vv, where at_gm_pt_vv.homology only contains two columns (I deleted column 3 manually), I get the following output:

Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (316812 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to ./at_gm_pt_vv.collinearity [1.861 seconds elapsed]
Writing multiple syntenic blocks to HTML files
at1.html
at2.html
...

Actually, I am expecting that I get similar results compared to the run where I have the third column with numerical values since in the manual (http://chibba.pgml.uga.edu/mcscan2/documentation/manual.pdf) it is described that you can pass a .homology file with two or three columns to MCScanX_h.

Family_circle_plotter...

Hi, @wyp1125
When I try to run family_circle_plotter I get the circular plot, but the collinear genes are not highlighted in red. I would be very grateful if you could suggest a fix for this issue.

MCscanX BLASTALL ERROR

@wyp1125 @XTan Hi, I hope you are well. I need your help; I am a new user of MCScanX. I have successfully installed MCScanX and then installed ncbi-blast+. Now I am confused about the "at.blast" file: my output "at.blast" is not similar to the "at.blast" file in your example data. Kindly tell me where I am going wrong. To generate the database I am using the protein sequences of Arabidopsis thaliana, and as the query I am using the same protein sequences. If I use two genomes, such as Arabidopsis and grape, should I use the Arabidopsis protein sequences as the query and the grape protein sequences as the database? Kindly guide me; I will be very thankful.

missing homologs

Hi there,
I found missing homologs in my results, even though they show 100% identity in my blastp results.
I tried different parameters and checked all my commands and found nothing wrong, and this issue occurred more than 5 times in my overall results.
Please help me solve this:

`
1 V.g1576 tig00000012_np1212.g1452
1 V.g1577 tig00000012_np1212.g1453
1 V.g1578 tig00000012_np1212.g1454
1 V.g1579 | |
1 V.g1580 | |
1 V.g1581 tig00000012_np1212.g1457
1 V.g1582 tig00000012_np1212.g1458
1 V.g1583 tig00000012_np1212.g1459

tig00000012_np1212.g1452 V.g1576 100.000 121 0 0 1 121 1 121 6.80e-86 243
tig00000012_np1212.g1453 V.g1577 100.000 129 0 0 1 129 1 129 1.98e-93 263
tig00000012_np1212.g1454 V.g1578 100.000 420 0 0 1 420 1 420 0.0 868
tig00000012_np1212.g1455 V.g1579 100.000 528 0 0 1 528 1 528 0.0 1078
tig00000012_np1212.g1456 V.g1580 100.000 530 0 0 1 530 1 530 0.0 1072
tig00000012_np1212.g1457 V.g1581 100.000 442 0 0 1 442 1 442 0.0 906
tig00000012_np1212.g1458 V.g1582 100.000 250 0 0 1 250 1 250 0.0 513
tig00000012_np1212.g1459 V.g1583 100.000 218 0 0 1 218 1 218 3.29e-157 432`

I used these commands:
blastp -query JYF80.pp.fa -db Sc2.pp -out 80_Sc2.blast -evalue 1e-5 -num_threads 16 -outfmt 6 -num_alignments 5
MCScanX 80_Sc2 -e 1e-5 -s 10

Newick format

Hi, I followed the MCScanX tutorial and got stuck at this command:

java family_tree_plotter -t tree_file -s synteny_file -o output_PNG_file

How to create a tree in Newick format?

Thank you in advance

0 matches imported (0 discarded). No genes detected.

Hi Yupeng,
Thanks for making MCScanX available. I'm trying to run the tool on my concatenated blast results and "gff" but I keep getting 0 matches imported and 0 discarded. Am I missing something trivial? My gene ids in blast and "gff" are matching. For my "gff" I've renamed the chromosome column to the "sp#" format as instructed in the manual. I'm not getting any errors either. Tips?

> Reading BLAST file and pre-processing
> Generating BLAST list
> 0 matches imported (0 discarded)
> 0 pairwise comparisons
> 0 alignments generated
> Pairwise collinear blocks written to data/mcscanx/input/.collinearity [0.004 seconds elapsed]
> Writing multiple syntenic blocks to HTML files
> Done! [0.006 seconds elapsed]

.collinearity :

> ############### Parameters ###############
> # MATCH_SCORE: 50
> # MATCH_SIZE: 5
> # GAP_PENALTY: -1
> # OVERLAP_WINDOW: 5
> # E_VALUE: 1e-05
> # MAX GAPS: 25
> ############### Statistics ###############
> # Number of collinear genes: 0, Percentage: -nan
> # Number of all genes: 0
> ##########################################

Thank you for your time 😄
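One way to narrow down a "0 matches imported" result is to count how many BLAST hits have both gene IDs present in the gff. The sketch below is only a diagnostic under stated assumptions, not MCScanX's own import logic: it assumes whitespace- or tab-delimited files, BLAST tabular output with the query and subject IDs in columns 1 and 2, and the gff gene ID in column 2. The function names and sample rows are hypothetical.

```python
def gff_gene_ids(gff_lines):
    """Collect gene IDs from column 2 of an MCScanX-style gff."""
    ids = set()
    for line in gff_lines:
        fields = line.split()
        if len(fields) >= 4:
            ids.add(fields[1])
    return ids


def blast_id_report(blast_lines, gff_ids):
    """Return (total hits, hits with both IDs in the gff, IDs missing from the gff)."""
    total = 0
    both_known = 0
    unknown = set()
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        query, subject = fields[0], fields[1]
        if query in gff_ids and subject in gff_ids:
            both_known += 1
        unknown.update(g for g in (query, subject) if g not in gff_ids)
    return total, both_known, unknown


if __name__ == "__main__":
    # Illustrative rows only; real input would be read from the .gff and .blast files.
    gff = ["sp1\tgeneA\t100\t900", "sp1\tgeneB\t1500\t2500"]
    blast = [
        "geneA\tgeneB\t98.0\t300\t5\t0\t1\t300\t1\t300\t1e-50\t500",
        "geneA\tgeneC.1\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-30\t300",
    ]
    total, ok, unknown = blast_id_report(blast, gff_gene_ids(gff))
    print(f"{ok}/{total} hits have both IDs in the gff; missing IDs: {sorted(unknown)}")
```

A count of zero usable hits would point to an ID mismatch (for example transcript suffixes) or to the files not being found under the given prefix.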

Question on Duplicate gene classifier

Hello,

You mention in the manual that the duplicate_gene_classifier tool should not be used on data from multiple genomes. I could not find the reason in your paper. Is it compute time or something else? Can you clarify?

Thanks
Abhijit

Define single entry point for various actions

This is just a suggestion, to simplify usage. For example, typing samtools will give you

Usage:   samtools <command> [options]

Command: view        SAM<->BAM conversion
         sort        sort alignment file
         pileup      generate pileup output
         mpileup     multi-way pileup
         faidx       index/extract FASTA

I suggest grouping each command item under the same mcscanx command, then dispatch to function calls. Much easier to use this way.
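The dispatch pattern suggested here can be illustrated with a short Python sketch using `argparse` subcommands. It only demonstrates the idea of one entry point routing to per-command functions; MCScanX itself is C++, and the subcommand names `scan` and `classify` and the handler functions are hypothetical.

```python
import argparse


def run_scan(args):
    # Placeholder handler: a real tool would call the collinearity scan here.
    return f"would scan for collinear blocks using prefix {args.prefix}"


def run_classify(args):
    # Placeholder handler: a real tool would call the duplicate classifier here.
    return f"would classify duplicate genes using prefix {args.prefix}"


def build_parser():
    parser = argparse.ArgumentParser(prog="mcscanx")
    subparsers = parser.add_subparsers(dest="command", required=True)

    scan = subparsers.add_parser("scan", help="detect collinear blocks")
    scan.add_argument("prefix", help="path prefix of the .gff/.blast pair")
    scan.set_defaults(func=run_scan)

    classify = subparsers.add_parser("classify", help="classify duplicate genes")
    classify.add_argument("prefix", help="path prefix of the .gff/.blast pair")
    classify.set_defaults(func=run_classify)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)  # dispatch to the handler chosen by the subcommand


if __name__ == "__main__":
    print(main())
```

Running the script with no arguments prints a usage message listing the available commands, much like the samtools example.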

collinearity.pl

(base) appledeMacBook-Pro-4:MCScanX fatima$ perl add_ka_and_ks_to_collinearity.pl -i ../data/at.collinearity -d ../data/at.cds -o at.collinearity.kaks

Can't open perl script "add_ka_and_ks_to_collinearity.pl": No such file or directory

I am facing this problem. Can anybody guide me on how to solve it?

Question regarding blast

Hi, thank you for developing MCScanX. I wonder if it is possible to use the output of blastn instead of blastp for the analysis. I have a huge genome and am encountering many errors (memory and time) when running the blastp jobs. I tried to parallelize the analyses, but it was not efficient.
Thanks

Restarting with -a option

Hi, thanks for developing MCScanX! I've found it of great use.

I have already run one analysis using the -a option to stop after writing the .collinearity file, but now I realize that I could benefit from the .html files that are created downstream.

Is there a way to restart MCScanX from the .collinearity file and only create the html files? Or do I need to redo the entire analysis to create the .html files?

Thanks!
Justin

No result after run commands

Greetings, I'm trying to use MCScanX_h. I've prepared the necessary files following the manual, yet neither my data nor the example data is working.

My gff, covering 5 species, is formatted following the "CH# gene start end" layout:
Ta1 TA20_000001 40390 41754
...

My .homology file was obtained by running OrthoFinder and extracting the pairwise data for each species, as follows, without the optional third column:
TH179_000002 TH3844_011373
...

Reading other issues on git, the suggested solutions of using tab-delimited files and moving them to the program folder didn't resolve it. The example data also returns the same empty output.

"using example data"
/home/h.paulocampiteli/MCScanX-master/MCScanX /home/h.paulocampiteli/MCScanX-master/data/
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/data/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.000 seconds elapsed]

"using my own on another folder"
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /storage4/h.paulocampiteli/synteny/mscscan_analysis/
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /storage4/h.paulocampiteli/synteny/mscscan_analysis/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage

"using my data on the MCscan folder"
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /home/h.paulocampiteli/MCScanX-master/MCScanX
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/MCScanX.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage
Done! [0.001 seconds elapsed]

I could not find any other response regarding this problem. Does anyone know what sorcery I must perform to get the program to work?

Thanks in advance

How to generate xyz.blast file for MCScanX using the target nucleotide sequences and genome?

Hi, Yupeng

I have successfully installed MCScanX.
The xyz.blast file was generated by command
blastall -i sim.cluster.fasta -d DSim_pilon.fasta -p blastn -e 1e-10 -b 5 -v 5 -m 8 -a 15 -o xyz.blast

The DSim_pilon.fasta was the database generated by command
formatdb -i DSim_pilon.fasta -p F

The gff file looks like this
tig00000001|arrow|pilon tig00000001|arrow|pilon:3569293-3582540(-) 3569293 3582540
tig00002065|arrow|pilon tig00002065|arrow|pilon:11837-22762(-) 11837 22762

but after running the command "MCScanX xyz", there was no alignment result in the xyz.collinearity file, and the html file was also wrong.
[image]

What mistake did I make when generating the .blast file?
Thank you so much!

Kai

collinearity file does not have any collinear genes

Hi,

I am trying to compare two snake genomes to identify collinear genes, but I am currently having some issues with MCScanX. Even though I have generated gff and blast files with exactly the same gene IDs in both, the output collinearity file is empty.

############### Parameters ###############
# MATCH_SCORE: 50
# MATCH_SIZE: 2
# GAP_PENALTY: -1
# OVERLAP_WINDOW: 3
# E_VALUE: 1e-10
# MAX GAPS: 15
############### Statistics ###############
# Number of collinear genes: 0, Percentage: -nan
# Number of all genes: 0
##########################################
Can you please suggest how I can fix this?
My blast and gff input files have the following content:

$ head nn_nk.blast
nn14A57224 nn14A57224 100.000 10294 0 0 1 10294 1 10294 0.0 18565
nn14A57224 nn14A57224 95.876 291 10 2 7740 8028 5213 4923 1.09e-127 464
nn14A57224 nn14A57224 95.876 291 10 2 4923 5213 8028 7740 1.09e-127 464
nn14A57224 nn14A57224 76.336 524 78 13 2697 3184 1623 2136 6.46e-99 369
nn14A57224 nn14A57224 76.336 524 78 13 1623 2136 2697 3184 6.46e-99 369
nn14A57224 nn14A57224 84.615 312 31 7 5818 6120 1833 2136 1.62e-87 331
nn14A57224 nn14A57224 84.615 312 31 7 1833 2136 5818 6120 1.62e-87 331
nn14A57224 nn14A57224 80.872 298 51 4 5819 6111 2879 3175 1.63e-68 268
nn14A57224 nn14A57224 80.872 298 51 4 2879 3175 5819 6111 1.63e-68 268
nn14A57224 nn14A57224 89.130 138 13 2 5677 5813 1997 2133 2.75e-40 174

$ head nn_nk.gff
nn14 nn14A57224 29426 39720
nn14 nn14A57225 41412 54382
nn14 nn14A57226 58913 65705
nn14 nn14A57227 70249 82715
nn14 nn14A57228 83915 85686
nn14 nn14A57229 93024 239710
nn14 nn14A57230 248031 267724
nn14 nn14A57231 267964 268794
nn14 nn14A57232 293986 419804
nn14 nn14A57233 422076 434387

Collinear block coordinates

Hi,
I am not sure if I am missing something, but is it possible to extract the coordinates of entire blocks? That is, instead of the genes in a block, we would have start (start position of the 1st gene) and stop (end position of the last gene) coordinates in the corresponding contig/scaffold.

For example, given the collinearity file:

`##########################################

## Alignment 0: score=8871.0 e_value=0 N=188 at1&at1 plus

0- 0: AT1G17240 AT1G72300 0
0- 1: AT1G17290 AT1G72330 0
0- 2: AT1G17310 AT1G72350 5e-41
0- 3: AT1G17350 AT1G72420 2e-113
0- 4: AT1G17380 AT1G72450 7e-63
0- 5: AT1G17400 AT1G72490 2e-82
0- 6: AT1G17420 AT1G72520 0
......
0-183: AT1G22270 AT1G78190 6e-45
0-184: AT1G22280 AT1G78200 2e-107
0-185: AT1G22300 AT1G78220 1e-72
0-186: AT1G22330 AT1G78260 1e-63
0-187: AT1G22340 AT1G78270 3e-174

## Alignment 1: score=2935.0 e_value=2.7e-245 N=64 at1&at1 plus

1- 0: AT1G10640 AT1G60590 0
...`

Could this be converted into an output file like .gff or .bed with the block start-stop positions for both columns? Something like:

at1-block0(1st column) start (1st gene coordinates AT1G17240) stop(last gene AT1G22340 final pos.) at1-block0(2nd column) start (1st gene coordinates AT1G72300) stop(last gene AT1G78270)

Extracting the first coordinate (from the 1st gene) for each block (and column) seems easy, but I find it hard to incorporate the last gene.

Thank you
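A possible approach, sketched in Python under stated assumptions: read gene positions from the gff (chromosome, gene, start, end), group the gene pairs of the collinearity file by alignment number, and take the minimum start and maximum end over all genes in each column. Using min/max over the whole block, rather than the first and last gene, also covers "minus" blocks where the second column runs in reverse order. The function names are hypothetical, the regular expressions are an assumption based on the excerpt above, and the sample data is invented.

```python
import re

# Header lines look like "## Alignment 0: score=..." (leading '#' optional here).
ALIGN_RE = re.compile(r"^#*\s*Alignment\s+(\d+):")
# Pair lines look like "0- 0: GENE_A GENE_B e_value".
PAIR_RE = re.compile(r"^\s*\d+-\s*\d+:\s+(\S+)\s+(\S+)")


def load_gff(gff_lines):
    """Map gene ID -> (chromosome, start, end) from an MCScanX-style gff."""
    positions = {}
    for line in gff_lines:
        fields = line.split()
        if len(fields) >= 4:
            positions[fields[1]] = (fields[0], int(fields[2]), int(fields[3]))
    return positions


def block_spans(collinearity_lines, positions):
    """Return one (block, chrA, startA, endA, chrB, startB, endB) tuple per block."""
    blocks = {}
    current = None
    for line in collinearity_lines:
        header = ALIGN_RE.match(line)
        if header:
            current = int(header.group(1))
            blocks[current] = ([], [])
            continue
        pair = PAIR_RE.match(line)
        if pair and current is not None:
            blocks[current][0].append(pair.group(1))
            blocks[current][1].append(pair.group(2))

    spans = []
    for block_id in sorted(blocks):
        row = [block_id]
        for genes in blocks[block_id]:
            coords = [positions[g] for g in genes if g in positions]
            if not coords:
                break  # genes missing from the gff; skip this block
            row.append(coords[0][0])
            row.append(min(min(c[1], c[2]) for c in coords))
            row.append(max(max(c[1], c[2]) for c in coords))
        else:
            spans.append(tuple(row))
    return spans


if __name__ == "__main__":
    gff = ["at1\tG1\t100\t200", "at1\tG2\t300\t400",
           "at1\tG3\t900\t1000", "at1\tG4\t700\t800"]
    col = ["## Alignment 0: score=100.0 e_value=0 N=2 at1&at1 minus",
           "  0-  0:\tG1\tG3\t0",
           "  0-  1:\tG2\tG4\t1e-5"]
    for span in block_spans(col, load_gff(gff)):
        print("\t".join(str(x) for x in span))
```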

circle_plotter.java:232: error: cannot find symbol

Hi, I was compiling the code downloaded from the MCScanX home page, but it failed while building the programs under the 'downstream_analyses' directory. Here is the message:

javac -g circle_plotter.java
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: variable Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: variable Cubic
location: class circle_plotter
6 errors
make: *** [circle_plotter.class] Error 1

Could you help to solve this?
Thank you very much!

Best,
ZC

The problem with drawing with the JAVA MCscanX

Hello @BenjaminSchwessinger @jannafierst
Recently I read an article (https://onlinelibrary.wiley.com/doi/10.1111/jse.12850) and have been conducting research on LTR retrotransposons.
In this article, the authors made the following image using MCscanX. The image seems to have been drawn by the Python version of MCscanX, but the method seems to be the old MCscanX method.

[image]

Method:
"2.6 Syntenic LTR retrotransposons analysis
We firstly extracted the coding sequences of all LTR retrotransposons from three
chromosome-level genomes, PN40024, V. ripara and V. amurensis genomes. Then
the coding sequences were translated into amino acid sequences by using TBtools
v1.098 (Chen et al., 2020). Next, multiple sequence alignment inter- and intra-species was performed using blastp of BLAST+ v2.10.1 with ‘-evalue 1e-5,-outfmt 6’ (Camacho et al., 2009), and finally the alignment results were entered
into MCScanX (Wang et al., 2012) with default parameters for syntenic analysis.
The downstream script ‘duplicate_gene_classifier’ was used to classify origins
duplicate LTR retrotransposons into tandem, proximal, dispersed, segmental or
singleton. The number of intra-species syntenic LTR retrotransposons and the
number of inter-species syntenic LTR retrotransposons was dissected by the script
‘dissect_multiple_alignment’. We clustered the LTR retrotransposons shared
among these three genomes according to their syntenic LTR retrotransposons
within each genome (results from MCScanX) using Yifan Hu multilevel layout
algorithm of Gephi v0.9.2 (https://gephi.org/)."

I used the method used by the authors of this article and got something like this:

[image: dual_synteny]

What is the problem here? How can I make figures like those of the Python MCscanX while using the Java MCscanX?

help for old version of MCScanX for parameter U

Dr Wang:
In the early versions of MCScanX, the parameter "u" designated the intergenic distance (default value: 10 kb). This may be appropriate for angiosperms, but we recently sequenced a fern genome and found an average intergenic distance of about 100 kb. With the updated version of MCScanX, the proportion of synteny blocks was extremely small. We would like to know what the default was for the previous "U" parameter, and we would appreciate it if you could provide us with the old version that has the parameter setting for U.
Thanks for your help.
Wei

got error while installing MCScanX on Ubuntu 14.04

I ran make to install MCScanX and got stuck on this problem. Please help.
I am using a Core i7 and Ubuntu 14.04.

g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
g++ struct.cc mcscan_h.cc read_homology.cc out_homology.cc dagchainer.cc msa.cc permutation.cc -o MCScanX_h
g++ struct.cc dup_classifier.cc read_data.cc out_utils.cc dagchainer.cc cls.cc permutation.cc -o duplicate_gene_classifier
g++ dissect_multiple_alignment.cc -o downstream_analyses/dissect_multiple_alignment
dissect_multiple_alignment.cc: In function ‘int main(int, char**)’:
dissect_multiple_alignment.cc:252:44: error: ‘getopt’ was not declared in this scope
while ((c = getopt(argc, argv, "g:c:o:")) != -1)
^
dissect_multiple_alignment.cc:257:32: error: ‘optarg’ was not declared in this scope
sprintf(gpath,"%s",optarg);
^
dissect_multiple_alignment.cc:269:17: error: ‘optopt’ was not declared in this scope
if (optopt!='g' || optopt!='c' || optopt!='o')
^
make: *** [mcscanx] Error 1

Cluster enabled version of MCScanX

Hello Dr. Yupeng Wang,

I am using the MCScanX algorithm for computing synteny among 48 genomes. I have a total of 1.36 million protein sequences and a 4.3G blast file. Since this takes a long time, I was wondering if there is a cluster or MPI enabled version of the tool to speed up the process.

Thanks
Abhijit

Mac installation MCScanX Error

appledeMacBook-Pro-3:MCScanX Abdullah$ make
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
struct.cc:48:5: error: use of undeclared identifier 'exit'; did you mean
'_exit'?
exit(1);
^~~~
_exit
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/unistd.h:429:7: note:
'_exit' declared here
void _exit(int) __dead2;
^
1 error generated.
make: *** [mcscanx] Error 1

family_circle_plotter error: cannot find symbol

I have not seen this answered elsewhere.

Hello, when I run family_circle_plotter I get this error output and I can't find information on the web to solve my problem. I would be grateful for any advice.

family_circle_plotter.java:303: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic

Am I missing a required library or something?

Thanks very much in advance

Wrong bed like file

[image]
Hi, I am using MCScanX; thanks for your contribution to this marvelous software.
However, I found a small problem: the column order shown might be wrong, which would mislead users.
Shouldn't the gene name be in the second column rather than the last column?

Issues installing MCScanX

Hi,
I downloaded the zip file from GitHub and followed the instructions.
After the make command I received this error:

g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
make: g++: No such file or directory
make: *** [makefile:2: mcscanx] Error 127

Also, I tried to download the zip file from http://chibba.pgml.uga.edu/mcscan2/
and got a 404 Not Found page.
What do I need to do?
thanks in advance
Alon

Error in family_tree_plotter

Hi
I want to display segmental and tandem duplications in a gene family tree. I ran the following commands :

javac family_tree_plotter.java
java family_tree_plotter -t ../../mcscanx_input/tree.nwk -s ../../mcscanx_input/rice.collinearity  –d ../../mcscanx_input/rice.tandem -o tandem_segmental_genes_tree.png

I get this error:

Exception in thread "main" java.lang.NullPointerException
        at family_tree_plotter.compute(family_tree_plotter.java:263)
        at family_tree_plotter.paint(family_tree_plotter.java:299)
        at family_tree_plotter.main(family_tree_plotter.java:397)

Please help me resolve this error. I am running it on Ubuntu in WSL. All other MCScanX commands I ran were successful except this one.

Thank You

Example DATA not working

I have tried to use the sample data and it doesn't seem to work? Perhaps I am doing something wrong but I seem not to be able to get any collinearity.

Please fix the bug.

compilation error

msa.cc: In function ‘void msa_main(const char*)’:
msa.cc:289:22: error: ‘chdir’ was not declared in this scope
if (chdir(html_fn)<0)
^
make: *** [mcscanx] Error 1

Solution

http://ubuntuforums.org/showthread.php?t=2205151

if you are building on 64-bit you may need to add

#include <unistd.h>

to msa.h, dissect_multiple_alignment.h, and detect_collinear_tandem_arrays.h

MCScanX alignment e-value

Hey, I got a quick question about the e-value cutoff for collinear genes.

For example, here is a list of syntenic blocks. All the genes have very low e-values from blast. How do you calculate the score and e-value for the syntenic blocks?

Best,
Li

## Alignment 2: score=150.0 e_value=0.057 N=3 HZ1&JO11 minus

2- 0: HZgenemark-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.107 JOYER156C 3e-111
2- 1: HZsnap_masked-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.62 JOYER155C 4e-172
2- 2: HZgenemark-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.130 JOYER154W 1e-47
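For the score part of the question, a rough sketch is possible under an assumption: that a block's score is MATCH_SCORE per gene pair plus GAP_PENALTY per gap, with the defaults of 50 and -1 that appear in the .collinearity headers quoted elsewhere on this page. That reading is consistent with the block above (N=3, score=150.0, which would mean no gaps), but it has not been checked against the MCScanX source, and it says nothing about how the block e-value is computed. The function name `block_score` is hypothetical.

```python
def block_score(num_pairs, num_gaps, match_score=50, gap_penalty=-1):
    """Score of a collinear block, assuming score = pairs * MATCH_SCORE + gaps * GAP_PENALTY.

    The formula and defaults are assumptions for illustration, not taken
    from the MCScanX source code.
    """
    return float(num_pairs * match_score + num_gaps * gap_penalty)


if __name__ == "__main__":
    # Three gene pairs and no gaps, matching N=3 in the block shown above.
    print(block_score(3, 0))
```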

Installing error

I got the following error while trying to install MCScanX and am not sure where the problem comes from. I hope someone can point it out. Thanks!
yh362@mocha:~/tools/MCScanX$ make
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
g++ struct.cc mcscan_h.cc read_homology.cc out_homology.cc dagchainer.cc msa.cc permutation.cc -o MCScanX_h
g++ struct.cc dup_classifier.cc read_data.cc out_utils.cc dagchainer.cc cls.cc permutation.cc -o duplicate_gene_classifier
g++ dissect_multiple_alignment.cc -o downstream_analyses/dissect_multiple_alignment
g++ detect_collinear_tandem_arrays.cc -o downstream_analyses/detect_collinear_tandem_arrays
cd downstream_analyses/ && make
make[1]: Entering directory '/data/home/yh362/tools/MCScanX/downstream_analyses'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/data/home/yh362/tools/MCScanX/downstream_analyses'

MCScanX truncated results.

Dear developer and users, I am facing an issue while running MCScanX on two very closely related Drosophila species. I used blastp to compare the proteins of each genome against itself and against the other genome using "blastp -query -db -evalue 1e-10 -max_target_seqs 5 -outfmt 6", and concatenated all 4 outputs using cat. I also concatenated the .gff files from the two genomes and ran MCScanX. The run completes without any errors, but the collinearity file is incomplete, with only 3 alignments. I have attached the output to this query. Kindly suggest where I am going wrong. I expect a high degree of collinearity between the genomes, as they are sibling species.
I also tried running blastall with the same protein files, only to find a collinearity file with just 1 alignment. My inputs are in the following formats.

$ head Dnas_Dalb_mcscanX.blast
XP_034100989.1 XP_034100989.1 100.000 18460 0 0 1 18460 1 18460 0.0 36057
XP_034100989.1 XP_034104103.1 27.285 1444 753 60 114 1461 195 1437 8.20e-67 255
XP_034100989.1 XP_034104103.1 27.795 1324 716 59 229 1461 148 1322 9.48e-64 245
XP_034100989.1 XP_034104103.1 29.673 1102 575 63 110 1172 420 1360 4.33e-60 233
XP_034100989.1 XP_034104103.1 26.980 745 362 32 90 697 738 1437 5.19e-36 153

$ head Dnas_Dalb_mcscanX.gff
NC_047628.1 XP_034099297.1 24519288 24535641
NC_047628.1 XP_034100555.1 24538132 24554148
NC_047628.1 XP_034100562.1 24623105 24626154
NC_047628.1 XP_034099816.1 24628513 24629322
NC_047628.1 XP_034097326.1 24633653 24637715
collinearity_file.txt

Looking forward for a positive response.

thank you

missing information about how to specify information of the third column in the homology file in MCScanX_h

Dear Yupeng,

in the manual (http://chibba.pgml.uga.edu/mcscan2/documentation/manual.pdf) you state that the user can specify whether higher or lower values are preferred when using MCScanX_h ("When the third column is used, users need to specify whether higher or lower values are preferred.").

However, the manual does not say how to specify this. Could you please state explicitly in the manual how this preference is set and how it can be changed? Also, could you please describe in more detail how these values are interpreted?

Question on missing homologous pairs in the .collinearity file

Hello,

I have a question regarding the default behavior of MCScanX.
As the manual suggested, I first ran blastp and prepared the .gff and .blast files that were used as input for MCScanX. In the .blast file, I noticed gene pairs with alignment e-values as low as 1e-110. However, these matched gene pairs did not appear in the .collinearity file after I ran MCScanX. I have read through the manual and the paper, but still couldn't find any explanation for that. Can anyone please clarify?

Thanks in advance,
Mandy,

MCScanX Error - Can't plot the graphs (Downstream analysis)

Hi,
I am trying to use the MCScanX tool (the latest release) to plot circular displays of syntenic genomic blocks. I am using CentOS 7 with this Java version:

openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

When I tried running $ java circle_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file
[tt@localhost downstream_analyses]$ java circle_plotter -g ../MADS/MADS.gff -s ../MADS/MADS.collinearity -c circle.ctl -o MADS.png
I got the following error message:

Exception in thread "main" java.lang.NullPointerException
        at circle_plotter.paint(circle_plotter.java:142)
        at circle_plotter.main(circle_plotter.java:262)

Then I tried another drawing tool dual_synteny_plotter and I still got the error message:

[tt@localhost downstream_analyses]$ java dual_synteny_plotter -g ../MADS/MADS.gff -s ../MADS/MADS.collinearity -c dual_synteny.ctl -o dual_synteny.PNG
Exception in thread "main" java.lang.NullPointerException
        at dual_synteny_plotter.paint(dual_synteny_plotter.java:148)
        at dual_synteny_plotter.main(dual_synteny_plotter.java:265)

Do you know how to solve it? All data are stored in my repo.
Many thanks in advance.

MCScanX_h

Hello, your software is very good. I saw in your article that pairwise homologs produced by OrthoMCL can be used as the input file for MCScanX_h. Can the input only be generated by OrthoMCL, or can it also be generated by OrthoFinder? Do the entries have to be in pairs? If a gene has multiple homologs, is each pair written as a separate line?
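If orthogroups from OrthoFinder are the starting point, one way to get a two-column pair list is to expand each orthogroup into all gene pairs, one pair per line. The sketch below assumes an Orthogroups.tsv layout with a header row, the orthogroup ID in the first column, and one column per species holding comma-separated gene IDs. Whether all-against-all pairs within an orthogroup are the right input for MCScanX_h is not confirmed anywhere on this page, and the function name and sample rows are hypothetical.

```python
from itertools import combinations


def orthogroups_to_pairs(tsv_lines):
    """Expand orthogroup rows into all gene pairs within each orthogroup.

    Assumed layout: header row, then "OG_ID<TAB>genes_sp1<TAB>genes_sp2...",
    where each species cell holds comma-separated gene IDs (possibly empty).
    """
    pairs = []
    for index, line in enumerate(tsv_lines):
        if index == 0:
            continue  # skip the header row
        cells = line.rstrip("\n").split("\t")[1:]
        genes = [g.strip() for cell in cells for g in cell.split(",") if g.strip()]
        pairs.extend(combinations(genes, 2))
    return pairs


if __name__ == "__main__":
    # Invented example rows.
    rows = ["Orthogroup\tspA\tspB", "OG0000001\ta1, a2\tb1"]
    for gene_a, gene_b in orthogroups_to_pairs(rows):
        print(f"{gene_a}\t{gene_b}")  # tab-delimited, one pair per line
```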

Epoch

How should the value of the epoch number be specified for MC-ScanX runs?
Will it change depending on the taxonomic / phylogenetic distribution of species being analyzed: within genus vs across genera vs across families etc.? Thanks in advance

format of input bed file

Hello,

I am preparing the inputs for MCScanX and found this inconsistency:
you say you want a .bed file, but the format given in the README is different:
The xyz.gff file holds gene positions, following a tab-delimited format:
"sp&chr_NO gene starting_position ending_position"
whereas on the page you link to for the bed format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1), the coordinates are in columns 2 and 3, not 3 and 4.

Also, is it OK to have other columns in the bed file, such as the following?

lm1     2617    2650    Lmu01_1T0000010.1:three_prime_utr       .       -       maker   three_prime_UTR .       ID=Lmu01_1T0000010.1:three_prime_utr;Parent=Lmu01_1T0000010.1
lm1     2617    2679    Lmu01_1T0000010.1:exon:6        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:6;Parent=Lmu01_1T0000010.1
lm1     2617    6339    Lmu01_1G0000010 .       -       maker   gene    .       ID=Lmu01_1G0000010;Name=Lmu01_1G0000010;Alias=scaffold1-snap-gene-0.16
lm1     2617    6339    Lmu01_1T0000010.1       .       -       maker   mRNA    .       ID=Lmu01_1T0000010.1;Parent=Lmu01_1G0000010;Name=Lmu01_1T0000010.1;Alias=scaffold1-snap-gene-0.16-mRNA-1;_AED=0.32;_eAED=0.32;_QI=0|0.5|0.4|0.6|0.25|0.4|5|33|124
lm1     2650    2679    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:exon:5        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:5;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:exon:4        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:4;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:cds   .       -       maker   CDS     1       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:exon:3        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:3;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:exon:2        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:2;Parent=Lmu01_1T0000010.1
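
For reference, a minimal sketch of reducing rows like the ones above to the four-column layout quoted from the README ("sp&chr_NO gene starting_position ending_position"). The column positions (chromosome, start, end, name, then feature type in the eighth column) are read off the sample rows and are an assumption about this particular file; to_mcscanx_gff is a hypothetical helper, not part of MCScanX.

```python
def to_mcscanx_gff(lines, feature_type="gene"):
    """Keep only rows of one feature type and emit chr, gene, start, end."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: chr, start, end, name, score, strand, source, type, ...
        if len(fields) < 8 or fields[7] != feature_type:
            continue
        chrom, start, end, name = fields[0], fields[1], fields[2], fields[3]
        out.append(f"{chrom}\t{name}\t{start}\t{end}")
    return out
```

Only the gene rows survive, so exon, CDS, UTR and mRNA records do not end up as separate entries in the positions file.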

One last thing: it is not clear which genome is the "reference" and which is the query, in the BLAST step and in general. If I have a draft "cabbage" assembly and annotation to align to Arabidopsis, do I keep Arabidopsis as the database? (I guess so.) And is the .bed file from the cabbage annotation? That is not clear to me, but it is my guess. Is the draft genome scaffold length taken into account anywhere? For example, if I have genes on only half of a scaffold, how is it drawn?
Thanks,

Dario

cannot open cds file

(base) padanas-MBP:downstream_analyses padana$ perl add_ka_and_ks_to_collinearity.pl -i ./data/crst-psr.collinearity -d ./data/crstpsr.cds -o kkkkkk
Cannot open cds_file!
I am facing this error.
Can you guide me?
My CDS file header:

EVM0023101.1
The header has a .1 suffix; is that the cause of this error?
I am interested in finding Ka/Ks for orthologs of 2 species. I have merged the CDS of both species into a single CDS file.
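
The message "Cannot open cds_file!" suggests that the path passed to -d could not be opened, which would point to the file location rather than to the .1 suffix, although that is only a guess from the wording. A minimal sketch for checking both possibilities follows: it verifies that the files exist and that every gene named in the collinearity file has a FASTA record with the same ID. The collinearity line layout and the helper names are assumptions.

```python
import os


def fasta_ids(path):
    """Return the first word of every FASTA header line."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def collinearity_ids(path):
    """Return gene IDs from columns 2 and 3 of non-comment lines (assumed layout)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = [f.strip() for f in line.split("\t")]
            if len(fields) >= 3:
                ids.update(fields[1:3])
    return ids


def diagnose(cds_path, collinearity_path):
    """List problems that could explain a failure to use the CDS file."""
    problems = [f"not an existing file: {p}"
                for p in (cds_path, collinearity_path) if not os.path.isfile(p)]
    if problems:
        return problems
    missing = sorted(collinearity_ids(collinearity_path) - fasta_ids(cds_path))
    if missing:
        problems.append("genes without a CDS record: " + ", ".join(missing[:10]))
    return problems
```

An empty list from diagnose("./data/crstpsr.cds", "./data/crst-psr.collinearity") would mean neither of these two checks explains the error.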

Circle_plotter prints half circle with higher number of chromosomes

Hi,

First of all, thanks for developing such a nice tool.

I'm having an issue when trying to create circle plots for a larger number of chromosomes. For the complete set of chromosomes, I get the following "half"-printed circle:

[image: mm_lf circle 23]

This was generated with the following command:

java circle_plotter -g mm_lf.gff -s mm_lf.collinearity -c mm_lf.ctl -o mm_lf.circle.png

The CTL file looked like this:

800     //plot width and height (in pixels)
mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8,mm9,mm10,mm11,mm13,mm14,mm15,mm16,mm18,lf1,lf2,lf3,lf4,lf5,lf6,lf7,lf8,lf9,lf10,lf12,lf14,lf17  //chromosomes in the circle
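
A small sanity check that can be run on a control file like the one above is sketched below: it parses the two lines (stripping the "//" comments) and reports any chromosome name that never appears in the first column of the GFF file. This does not address the rendering itself; it only rules out a name mismatch. The function names are hypothetical, and the two-line layout is taken from the example shown here.

```python
def parse_circle_ctl(lines):
    """Return (plot_size, chromosome_list) from a circle_plotter control file."""
    # Strip trailing "//" comments and drop empty lines.
    values = [line.split("//", 1)[0].strip() for line in lines]
    values = [v for v in values if v]
    size = int(values[0])
    chromosomes = [c.strip() for c in values[1].split(",") if c.strip()]
    return size, chromosomes


def unknown_chromosomes(chromosomes, gff_lines):
    """Return chromosomes from the control file that are absent from the GFF."""
    known = {line.split()[0] for line in gff_lines if line.strip()}
    return [c for c in chromosomes if c not in known]
```

The length of the returned chromosome list also gives a quick count of how many chromosomes the plot is being asked to draw.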

Please find attached the COLLINEARITY, GFF and CTL files:
mm_lf.gff.gz
mm_lf.collinearity.gz
mm_lf.ctl.gz

I've noticed that if I decrease the number of chromosomes to 24 or 23 (depending on the chromosomes removed) the circle_plotter script works just fine:

[image: mm_lf reduced circle 05]

I was wondering whether you have experienced a similar issue and, if so, what a possible workaround might be.

Best wishes,

Docker container and working example data

Hi,

I am really excited to try out this set of tools, and want to do it through Docker. However, I cannot get the example data to work, I guess because I have installed it badly (seems others have this issue too). Does anyone have a working Docker container to run the programs?

blastall: command not found

I have installed MCScanX, but the blastall command is not available even though BLAST+ is installed on my machine. Is there an alternative to blastall?
Also, running plain blastp with the same options gives an error about the e-value option, which is not recognized by the BLAST+ package.
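
For context, blastall belongs to the legacy BLAST package; BLAST+ replaces it with separate programs (blastp, makeblastdb) and renames the options, so -e becomes -evalue, -m 8 becomes -outfmt 6, -i becomes -query, -d becomes -db and -o becomes -out. A minimal sketch that builds the BLAST+ command lines is given below. The e-value cutoff of 1e-10 and the limit of 5 hits are placeholder assumptions, and the helper names are hypothetical.

```python
import shlex


def makeblastdb_command(fasta, database):
    """Build the BLAST+ command that formats a protein database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", database]


def blastp_plus_command(query, database, output, evalue="1e-10", max_hits=5):
    """Build a BLAST+ blastp command that writes tabular (outfmt 6) output."""
    return [
        "blastp",
        "-query", query,
        "-db", database,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        "-outfmt", "6",
        "-out", output,
    ]


def as_shell(command):
    """Render an argument list as a single shell-safe command line."""
    return shlex.join(command)
```

Calling as_shell(blastp_plus_command("proteins.fa", "mydb", "xyz.blast")) gives a line that can be pasted into a terminal, or the list can be passed to subprocess.run directly.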

detect_collinear_tandem_arrays segmentation fault

Hello,
I am not able to get the downstream_analyses executable detect_collinear_tandem_arrays to run properly. I downloaded MCScanX from the MCScanX homepage (Dec 2017), and have compiled on two different systems (Mac OS High Sierra 10.13.1 and CentOS 6.6 Linux). In both cases, MCScanX and MCScanX_h seem to run correctly, but I get the following error with detect_collinear_tandem_arrays:

Reading BLAST file and pre-processing
Generating BLAST list
38556 matches imported (6746 discarded)
Detecting tandem arrays...
Segmentation fault: 11

Is there a fix for this problem, or any thoughts as to why it is happening? I have also tried three different datasets; each returns the same error.

One thing I have discovered so far that seems important: the vector alltandempair is not being populated, at least before the sort call in compute_tandem() in detect_collinear_tandem_arrays.cc (and hence in the corresponding detect_collinear_tandem_arrays executable).

Latest: as far as I can tell, the segmentation fault is returned when there are no collinear tandem arrays identified, so it looks like this is a feature of the data I am using and not a bug in the script.

Thanks,

Matt
