
wyp1125 / mcscanx


MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!

Home Page: http://chibba.pgml.uga.edu/mcscan2/

License: BSD 2-Clause "Simplified" License


mcscanx's Introduction

MCScanX

The MCScanX package has two major components: a modified version of the MCScan algorithm, which makes MCScan more convenient to use and displays multiple alignments of syntenic blocks more clearly, and a variety of downstream analysis tools for conducting different biological analyses based on the synteny data generated by the modified MCScan algorithm.

All programs are run from the command line on Linux or Mac OS. Usage and help information is built into the programs; to display it, run a program without any options:

$ ./program_name

[Figure: MCScanX flow chart]

All code is copiable, distributable, modifiable, and usable without any restrictions. Contact: Yupeng Wang, [email protected]; Xu Tan, [email protected]

To install, put MCScanX.zip into a directory and run:

$ unzip MCScanX.zip
$ cd MCScanX
$ make

Main programs (in the main folder)

  • MCScanX
  • MCScanX_h
  • Duplicate_gene_classifier

Downstream analysis programs (in the downstream_analyses folder)

  • Tool 1. detect_syntenic_tandem_arrays
  • Tool 2. dissect_multiple_alignment
  • Tool 3. dot_plotter.java
  • Tool 4. dual_synteny_plotter.java
  • Tool 5. circle_plotter.java
  • Tool 6. bar_plotter.java
  • Tool 7. add_ka_and_ks_to_synteny.pl
  • Tool 8. group_collinear_genes.pl
  • Tool 9. detect_collinearity_within_gene_families.pl
  • Tool 10. family_circle_plotter.java
  • Tool 11. family_tree_plotter.java
  • Tool 12. origin_enrichment_analysis.pl

MCScanX implements a modified MCScan algorithm: it detects syntenic blocks and progressively aligns multiple syntenic blocks against reference genomes (PIVOT).

  • Usage

MCScanX reads two input files: xyz.blast and xyz.gff. The xyz.blast file is the direct BLASTP output in m8 (tabular) format, for example:

AT1G50920   AT1G50920    100.00  671     0       0       1       671     1       671     0.0     1316

Here is a typical parameter setting for generating the xyz.blast file:

$ blastall -i query_file -d database -p blastp -e 1e-10 -b 5 -v 5 -m 8 -o xyz.blast

The xyz.bed file holds gene positions, following a tab-delimited format:

chr#    starting_position       ending_position gene

Note: chr# consists of a two-letter short name for the species as a prefix, followed by the chromosome number (#). For example, the second chromosome of Arabidopsis thaliana is denoted at2. The bed format is defined at http://genome.ucsc.edu/FAQ/FAQformat.html#format1 and is especially useful because many tools can handle bed files, most notably BEDTools. The xyz.bed file can be generated by parsing the .gff3 file released by the sequencing initiative. Each gene may appear only once in the .bed file. When comparing multiple genomes, concatenate all inter- and intra-species m8 BLAST output into the xyz.blast file, and concatenate the gene positions of all species into the xyz.bed file.
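The parsing step described above can be sketched in Python. This is a minimal sketch, assuming a standard nine-column GFF3 file in which gene features carry an ID= attribute. The function name gff3_to_positions, the chrom_map argument and the toy gene name geneA are illustrative and are not part of MCScanX. It emits the column order given above (chromosome, start, end, gene) and skips repeated gene IDs.

```python
def gff3_to_positions(gff3_lines, chrom_map):
    """Yield (chrom, start, end, gene_id) for each gene feature.

    chrom_map (hypothetical helper input) maps GFF3 sequence IDs such as
    "Chr2" to the short names used here, such as "at2". Sequences that
    are missing from the map are skipped.
    """
    seen = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or GFF3 directive/comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep gene features only
        if cols[0] not in chrom_map:
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID")
        if gene_id is None or gene_id in seen:
            continue  # each gene may appear only once
        seen.add(gene_id)
        yield chrom_map[cols[0]], int(cols[3]), int(cols[4]), gene_id


# Toy input (made-up coordinates and IDs).
example = [
    "##gff-version 3",
    "Chr2\tsrc\tgene\t1025\t2810\t.\t+\t.\tID=geneA;Note=example",
    "Chr2\tsrc\tmRNA\t1025\t2810\t.\t+\t.\tID=geneA.1;Parent=geneA",
]
rows = list(gff3_to_positions(example, {"Chr2": "at2"}))
for row in rows:
    print("\t".join(map(str, row)))
```

Only the gene line survives in this toy input; the mRNA line is ignored because its feature type is not gene.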

For MCScanX to generate more reasonable results, restrict the number of BLASTP hits per gene to roughly the top 5. When xyz.blast and xyz.bed are ready, put them in the same folder. Then run:

$ ./MCScanX dir/xyz
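
If an existing BLAST result was produced without a hit limit, the top-5 advice above can be applied afterwards. The following is a minimal sketch, assuming a tab-delimited 12-column m8 file in which the hits of each query are listed from best to worst. The function name keep_top_hits is illustrative and is not part of MCScanX, and keeping only the first line per query-subject pair is a design choice of this sketch.

```python
def keep_top_hits(m8_lines, max_hits=5):
    """Return m8 lines limited to the first max_hits subjects per query.

    Assumes BLAST lists the hits of each query from best to worst, so
    file order stands in for rank. Only the first line of each
    query-subject pair is kept.
    """
    subjects_seen = {}  # query -> set of subjects already kept
    kept = []
    for line in m8_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a 12-column m8 record
        query, subject = cols[0], cols[1]
        seen = subjects_seen.setdefault(query, set())
        if subject in seen or len(seen) >= max_hits:
            continue
        seen.add(subject)
        kept.append(line)
    return kept


# Toy input: one query with seven distinct subjects and one repeated pair.
toy = [
    f"q1\ts{i}\t90.0\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200"
    for i in range(1, 8)
]
toy.insert(1, toy[0])
filtered = keep_top_hits(toy)
print(len(filtered))
```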
  • Output

Running MCScanX produces one text file, xyz.synteny, containing pairwise synteny blocks as follows:

## Alignment 0: score=9171.0 e_value=0 N=187 at1&at1 plus
  0-  0:        AT1G17240       AT1G72300       0
  0-  1:        AT1G17290       AT1G72330       0
  ...
  0-185:        AT1G22330       AT1G78260       1e-63
  0-186:        AT1G22340       AT1G78270       3e-174
##Alignment 1: score=5084.0 e_value=5.6e-251 N=106 at1&at1 plus

MCScanX also writes one directory, xyz.html, containing HTML files that display the multiple alignment of syntenic blocks against each chromosome. The HTML files must be viewed in a web browser. In each HTML file, the first column shows the number of syntenic blocks at each gene locus, the second column shows the genes in PIVOT (the reference chromosome), with tandem genes marked in red, and the remaining columns show the aligned syntenic blocks, where only matching genes are displayed.
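For downstream scripting, the block listing shown above can be read into a simple structure. The following is a minimal sketch, assuming the header and gene-pair layout of the sample output above. The function name parse_synteny and the dictionary keys are illustrative and are not part of MCScanX.

```python
import re

# Header such as "## Alignment 0: score=9171.0 e_value=0 N=187 at1&at1 plus"
HEADER = re.compile(
    r"^##\s*Alignment\s+(\d+):\s+score=(\S+)\s+e_value=(\S+)"
    r"\s+N=(\d+)\s+(\S+)\s+(\S+)"
)


def parse_synteny(lines):
    """Return a list of blocks, each with its header fields and gene pairs."""
    blocks = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            blocks.append({
                "id": int(match.group(1)),
                "score": float(match.group(2)),
                "e_value": float(match.group(3)),
                "n": int(match.group(4)),
                "chromosomes": match.group(5),
                "orientation": match.group(6),
                "pairs": [],
            })
        elif blocks and ":" in line and not line.startswith("#"):
            # Pair line such as "  0-  1:  AT1G17290  AT1G72330  0"
            fields = line.split(":", 1)[1].split()
            if len(fields) >= 3:
                blocks[-1]["pairs"].append(
                    (fields[0], fields[1], float(fields[2]))
                )
    return blocks


sample = [
    "## Alignment 0: score=9171.0 e_value=0 N=187 at1&at1 plus",
    "  0-  0:        AT1G17240       AT1G72300       0",
    "  0-  1:        AT1G17290       AT1G72330       0",
    "  ...",
    "  0-185:        AT1G22330       AT1G78260       1e-63",
    "  0-186:        AT1G22340       AT1G78270       3e-174",
    "##Alignment 1: score=5084.0 e_value=5.6e-251 N=106 at1&at1 plus",
]
blocks = parse_synteny(sample)
print(len(blocks), len(blocks[0]["pairs"]))
```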

  • MCScanX parameters (for advanced users)

[Usage]:

        ./MCScanX prefix_fn [options]

-k  MATCH_SCORE, final score=MATCH_SCORE+NUM_GAPS*GAP_PENALTY
    (default: 50)
-g  GAP_PENALTY, gap penalty (default: -1)
-s  MATCH_SIZE, number of genes required to call synteny
    (default: 5)
-e  E_VALUE, alignment significance (default: 1e-05)
-u  UNIT_DIST, average intergenic distance (default: 10000)
-m  MAX_GAPS, maximum gaps(one gap=UNIT_DIST) allowed (default: 20)
-a  only builds the pairwise blocks (.synteny file)
-b  patterns of syntenic blocks. 0:intra- and inter-species (default); 1:intra-species; 2:inter-species
-h  print this help page
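
The scoring options above can be illustrated with a small calculation. This is a sketch under a stated assumption: MATCH_SCORE is awarded once per matched gene pair in a block and GAP_PENALTY once per gap, which is one reading of the help text and is not taken from the MCScanX source code. The function name block_score and the example numbers are hypothetical.

```python
def block_score(num_matches, num_gaps, match_score=50, gap_penalty=-1):
    """Score of a block under the assumed per-match, per-gap reading.

    Defaults mirror the -k and -g defaults listed in the help text.
    """
    return num_matches * match_score + num_gaps * gap_penalty


# Hypothetical block: 5 matched gene pairs (the -s default) and 3 gaps.
score = block_score(5, 3)
print(score)
```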

The BLASTP input of MCScanX can be replaced by a tab-delimited file containing more reliable pairwise homologous relationships. In this case, use MCScanX_h instead. Running MCScanX_h is very similar to running MCScanX, except that the xyz.blast file is replaced by an xyz.homology file. At the bottom of the screen output, statistics on the numbers and percentages of collinear homolog pairs are shown.

Duplicate_gene_classifier, which incorporates the MCScanX algorithm, classifies the origins of the duplicate genes of ONE genome into whole-genome/segmental (match genes in syntenic blocks), tandem (consecutive repeats), proximal (in a nearby chromosomal region but not adjacent) or dispersed (modes other than segmental, tandem and proximal) duplications.

  • Usage:

    $ ./duplicate_gene_classifier  dir/xyz
    

The input of duplicate_gene_classifier is the same as that of MCScanX, with one additional option that defines the maximum distance (in number of genes) between two proximal duplicates.

  • Output

The output is a text file named xyz.gene_type, written to the same directory as the input files. It contains origin information for all the genes in the xyz.gff file in a tab-delimited format:

Gene    gene_type(0/1/2/3/4)

Note: 0, 1, 2, 3 and 4 stand for singleton, dispersed, proximal, tandem and segmental, respectively. The program is designed for a single genome; applying it to data from multiple genomes is not meaningful.
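
The numeric codes above can be tallied with a few lines of Python. This is a minimal sketch, assuming the two-column layout described above. The function name count_gene_types and the toy gene names are illustrative and are not part of MCScanX.

```python
from collections import Counter

# Code-to-label mapping as listed in the note above.
GENE_TYPES = {
    0: "singleton",
    1: "dispersed",
    2: "proximal",
    3: "tandem",
    4: "segmental",
}


def count_gene_types(lines):
    """Count genes per duplication type in gene / gene_type lines."""
    counts = Counter()
    for line in lines:
        cols = line.split()
        if len(cols) != 2 or not cols[1].isdigit():
            continue  # skip headers and malformed lines
        code = int(cols[1])
        if code in GENE_TYPES:
            counts[GENE_TYPES[code]] += 1
    return counts


toy_types = ["Gene\tgene_type", "g1\t0", "g2\t4", "g3\t4", "g4\t3"]
counts = count_gene_types(toy_types)
print(dict(counts))
```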

Tandem duplications often complicate synteny detection. To enhance the power of synteny detection, the MCScan algorithms use the gene with the best BLASTP hit to represent a tandem array. detect_syntenic_tandem_arrays (Tool 1) transforms match genes in syntenic blocks into tandem arrays if tandem duplications exist there.

  • Usage:

    $ ./detect_syntenic_tandem_arrays -g gff_file -b blast_file -s synteny_file -o output_file
    
  • Output

The path of output_file should be specified by the user. If any gene of a syntenic pair is located in a tandem array, the syntenic pair will be written into the output_file.

dissect_multiple_alignment (Tool 2) dissects the number of syntenic blocks at each gene locus of the reference genome(s) into the number of intra-species syntenic blocks and the number of inter-species syntenic blocks.

  • Usage:

    $ ./dissect_multiple_alignment -g gff_file -s synteny_file -o output_file
    
  • Output

The path of output_file should be specified by the user. The first and second columns of output_file show the chromosomes and genes in reference genome(s). The 3rd, 4th and 5th columns show the numbers of intra-species syntenic blocks, inter-species syntenic blocks and outgroup species respectively.

dot_plotter.java (Tool 3) is a Java program that generates a dot plot for all the syntenic blocks on two sets of chromosomes given by the user. Note that a JDK is needed to run the Java programs.

  • Usage:

    $ java dot_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the dot.ctl file:

800     //dimension (in pixels) of x axis
800     //dimension (in pixels) of y axis
sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10        //chromosomes in x axis
os1,os2,os3,os4,os5,os6,os7,os8,os9,os10,os11,os12      //chromosomes in y axis

Note that no space is allowed between adjacent chromosome IDs.
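
A control file with this layout can also be generated programmatically, which makes it easy to enforce the no-space rule. This is a minimal sketch, assuming the four-line layout of the dot.ctl example above. The function name make_dot_ctl is illustrative and is not part of MCScanX.

```python
def make_dot_ctl(width, height, x_chroms, y_chroms):
    """Return the text of a four-line control file for dot_plotter.

    Chromosome IDs are joined with commas only, because no space is
    allowed between adjacent IDs.
    """
    for name in [*x_chroms, *y_chroms]:
        if not name or "," in name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid chromosome ID: {name!r}")
    return (
        f"{width}\t//dimension (in pixels) of x axis\n"
        f"{height}\t//dimension (in pixels) of y axis\n"
        f"{','.join(x_chroms)}\t//chromosomes in x axis\n"
        f"{','.join(y_chroms)}\t//chromosomes in y axis\n"
    )


ctl_text = make_dot_ctl(800, 800, ["sb1", "sb2"], ["os1", "os2", "os3"])
print(ctl_text, end="")
```

The returned text can be written to a .ctl file and passed to the plotter with the -c option.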

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each dot is a syntenic gene pair between the two sets of chromosomes. Different colors of dots, generated randomly, represent different syntenic blocks.

dual_synteny_plotter.java (Tool 4) is a Java program that generates a dual synteny plot, which links all the synteny blocks between two sets of chromosomes using straight lines.

  • Usage:

    $ java dual_synteny_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the column.ctl file:

200     //plot width (in pixels)
800     //plot height (in pixels)
sb1,sb2 //chromosomes in the left column
os1,os2,os3     //chromosomes in the right column

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each line links a pair of syntenic genes between the two sets of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

circle_plotter.java (Tool 5) is a Java program that generates a circular plot, which links all the syntenic blocks with curved lines between and within the chromosome set given by the user.

  • Usage:

    $ java circle_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the circle.ctl file:

800     //plot width and height (in pixels)
sb1,sb2,os1,os2,os3     //chromosomes in the circle

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each curved line links a pair of syntenic genes between or within the given set of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

bar_plotter.java (Tool 6) is a Java program that generates a bar plot displaying chromosome rearrangement between the reference and target chromosome sets given by the user.

  • Usage:

    $ java bar_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the bar.ctl file:

800     //dimension (in pixels) of x axis
800     //dimension (in pixels) of y axis
sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10        //reference chromosomes
os1,os2,os3,os4,os5,os6,os7,os8,os9,os10,os11,os12      //target chromosomes

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each curved line links a pair of syntenic genes between or within the given set of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

Tool 7 (add_ka_and_ks_to_synteny.pl) calculates the Ka and Ks values of each syntenic gene pair shown in the MCScanX output (.synteny file). BioPerl is needed to run this program.

  • Usage:

    $ perl add_kaks_to_synteny.pl -i synteny_file -d cds_file -o output_file
    

The input is an xyz.synteny file generated by MCScanX and a coding sequence file of the corresponding gene set in FASTA format.

  • Output

Users should specify the path of the output file. The output file is a modified version of the xyz.synteny file in which each line contains a syntenic gene pair and its Ka and Ks values.

group_collinear_genes.pl (Tool 8) groups genes by connecting collinear genes until no gene in a group has a collinear gene outside the group. This analysis can be used to construct gene families based on syntenic relationships.

  • Usage:

    $ perl group_collinear_genes.pl -i synteny_file -o output_file
    

The input is an xyz.synteny file generated by MCScanX.

  • Output

The output file lists each group on one line in a tab-delimited format. Note that the first (largest) group usually contains many more genes than the other groups and should be regarded as non-informative.
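
The grouping rule described above amounts to finding connected components in a graph whose edges are collinear gene pairs. The following is a minimal sketch of that idea using a union-find structure. It is not the Perl script's actual implementation, and the function name group_collinear_genes and the toy gene names are illustrative.

```python
def group_collinear_genes(pairs):
    """Group genes so that no gene has a collinear partner outside its group.

    pairs is an iterable of (gene_a, gene_b) collinear gene pairs.
    Returns groups as sorted lists, largest group first.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        root = gene
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited gene at the root.
        while parent[gene] != root:
            parent[gene], gene = root, parent[gene]
        return root

    for gene_a, gene_b in pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for gene in parent:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda members: (-len(members), members),
    )


toy_pairs = [("a", "b"), ("b", "c"), ("d", "e")]
groups = group_collinear_genes(toy_pairs)
for members in groups:
    print("\t".join(members))
```

Because a and c are both collinear with b, all three end up in one group even though a and c are not directly paired.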

detect_collinearity_within_gene_families.pl (Tool 9) detects collinear gene pairs within gene families.

  • Usage

The input is an xyz.synteny file generated by MCScanX and a gene family file in tab-delimited format with the gene family name in the first column:

Gene_family_1   gene1   gene2   gene3   ...     genex
Gene_family_2   gene1   gene2   gene3   ...     genex
  • Output

The output file gives the syntenic pairs of the given gene families in tab-delimited format:

Gene_family_1   gene_pair1      gene_pair2      ...     gene_pairx
Gene_family_2   gene_pair1      gene_pair2
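
The filtering step described above can be sketched as follows: a syntenic pair is reported for a family when both of its genes belong to that family. This is a minimal sketch rather than the Perl script's actual implementation. The function name collinear_pairs_in_families and the toy data are illustrative.

```python
def collinear_pairs_in_families(families, pairs):
    """Map each family name to the collinear pairs fully inside it.

    families maps a family name to its member genes; pairs is a list
    of (gene_a, gene_b) tuples taken from the synteny file.
    """
    result = {}
    for name, members in families.items():
        member_set = set(members)
        result[name] = [
            (a, b) for a, b in pairs if a in member_set and b in member_set
        ]
    return result


toy_families = {
    "Gene_family_1": ["g1", "g2", "g3"],
    "Gene_family_2": ["g4", "g5"],
}
toy_synteny_pairs = [("g1", "g2"), ("g2", "g4"), ("g4", "g5")]
family_pairs = collinear_pairs_in_families(toy_families, toy_synteny_pairs)
for name, found in family_pairs.items():
    print(name, found)
```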

family_circle_plotter.java (Tool 10) is a Java program that generates a circular plot, which links all syntenic genes within a gene family with red curved lines and places the gene family synteny on a genomic synteny background.

  • Usage:

    $ java family_circle_plotter -g gff_file -s synteny_file -c control_file -f gene_family_file -o output_jpeg_file
    

The input files include a .gff file containing all gene positions, a .synteny file generated by MCScanX, a control file (.ctl) containing the plot size and chromosome IDs and a gene family file containing only one gene family with the aforementioned format. The control file can be easily made by modifying the family.ctl file:

800     //plot width and height (in pixels)
at1, at2, at3, at4, at5 //chromosomes in the circle

Note: users can input just the chromosomes of interest into the family.ctl file. This will generate a circular plot within the given chromosomes set.

  • Output

Output is an image file which can be viewed with any image viewer. Each red curved line links a pair of syntenic genes within the given gene family. The grey lines represent the genomic synteny background.

family_tree_plotter.java (Tool 11) is a Java program that generates a gene family tree on which syntenic gene pairs and tandem gene groups are linked with red and blue curves, respectively.

  • Usage:

    $ javac family_tree_plotter.java (compile the first time it is used)
    $ java family_tree_plotter -t tree_file -s synteny_file -o output_PNG_file (show syntenic gene pairs only)
    $ java family_tree_plotter -t tree_file -s synteny_file -d tandem_pair_file -o output_PNG_file (show both tandem and syntenic gene pairs)
    

The input files include a .synteny file generated by MCScanX and a tree file for the gene family in newick format (bracket tree).

Users can set the plot width, plot height and font size with the following options: -x plot_width -y plot_height -f font_size

  • Output

The output is an image file (PNG format) which can be viewed with an image viewer.

Note: this program aims to show an overview of synteny and tandem relationships for a gene family. Branch lengths are disregarded, so the plotted lengths do not reflect the true values.

origin_enrichment_analysis.pl (Tool 12) identifies potential enrichment of duplicate gene origins for the input gene families according to the result of Duplicate_gene_classifier.

  • Usage:

    $ perl origin_enrichment_analysis.pl -i gene_family_file -d gene_origin_file  -o output_file
    

This Perl program takes a gene family file in the same format as described above and the gene origin file generated by Duplicate_gene_classifier.

  • Output

The output lists the p-values of the different origins for the given gene families.
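
The text above does not state which statistical test produces these p-values. As an illustration only, the following sketch uses a one-sided hypergeometric test, a common choice for enrichment analysis. That choice is an assumption and may differ from what the Perl script does. The function name enrichment_p_value and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, population_hits, sample, sample_hits):
    """One-sided hypergeometric p-value, P(X >= sample_hits).

    population: number of genes in the genome
    population_hits: genes in the genome with the origin of interest
    sample: number of genes in the family
    sample_hits: genes in the family with the origin of interest
    """
    upper = min(sample, population_hits)
    total = comb(population, sample)
    tail = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes, 4 of them tandem; a family of 3 genes
# of which 2 are tandem.
p_value = enrichment_p_value(10, 4, 3, 2)
print(round(p_value, 4))
```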

mcscanx's People

Contributors

wyp1125, jingping, huizhejin, martin-g, tanghaibao, benjaminschwessinger, xtan, xiaodli

mcscanx's Issues

Cluster enabled version of MCScanX

Hello Dr. Yupeng Wang,

I am using the MCScanX algorithm for computing synteny among 48 genomes. I have a total of 1.36 million protein sequences and a 4.3G blast file. Since this takes a long time, I was wondering if there is a cluster or MPI enabled version of the tool to speed up the process.

Thanks
Abhijit

The problem with drawing with the JAVA MCscanX

Hello @BenjaminSchwessinger @jannafierst
Recently I have read the literature (https://onlinelibrary.wiley.com/doi/10.1111/jse.12850) and conducted research on LTR- retrotransposon.
In this article, the author made the following image using MCscanX. The image seems to be drawn by the Python version of MCscanX, but the method seems to be the old MCscanX method.

[Image: figure from the cited article]

Method:
"2.6 Syntenic LTR retrotransposons analysis
We firstly extracted the coding sequences of all LTR retrotransposons from three
chromosome-level genomes, PN40024, V. ripara and V. amurensis genomes. Then
the coding sequences were translated into amino acid sequences by using TBtools
v1.098 (Chen et al., 2020). Next, multiple sequence alignment inter- and intra-species was performed using blastp of BLAST+ v2.10.1 with ‘-evalue 1e-5,-outfmt 6’ (Camacho et al., 2009), and finally the alignment results were entered
into MCScanX (Wang et al., 2012) with default parameters for syntenic analysis.
The downstream script ‘duplicate_gene_classifier’ was used to classify origins
duplicate LTR retrotransposons into tandem, proximal, dispersed, segmental or
singleton. The number of intra-species syntenic LTR retrotransposons and the
number of inter-species syntenic LTR retrotransposons was dissected by the script
‘dissect_multiple_alignment’. We clustered the LTR retrotransposons shared
among these three genomes according to their syntenic LTR retrotransposons
within each genome (results from MCScanX) using Yifan Hu multilevel layout
algorithm of Gephi v0.9.2 (https://gephi.org/)."

I used the method used by the authors of this article and got something like this:

[Image: dual_synteny]

What's the matter, please? How can I make figures of MCscanX in Python while using MCscanX in Java?

format of input bed file

hello,

I am preparing the inputs for MCScanX and I find this incongruence:
you say you want a .bed file, but the format you give in the readme is different:
The xyz.gff file holds gene positions, following a tab-delimited format:
"sp&chr_NO gene starting_position ending_position"
while in the page you point at for the bed format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) the coordinates are in col 2 and 3, not 3 and 4.

Also, is it OK to have other columns in the bed file, like

lm1     2617    2650    Lmu01_1T0000010.1:three_prime_utr       .       -       maker   three_prime_UTR .       ID=Lmu01_1T0000010.1:three_prime_utr;Parent=Lmu01_1T0000010.1
lm1     2617    2679    Lmu01_1T0000010.1:exon:6        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:6;Parent=Lmu01_1T0000010.1
lm1     2617    6339    Lmu01_1G0000010 .       -       maker   gene    .       ID=Lmu01_1G0000010;Name=Lmu01_1G0000010;Alias=scaffold1-snap-gene-0.16
lm1     2617    6339    Lmu01_1T0000010.1       .       -       maker   mRNA    .       ID=Lmu01_1T0000010.1;Parent=Lmu01_1G0000010;Name=Lmu01_1T0000010.1;Alias=scaffold1-snap-gene-0.16-mRNA-1;_AED=0.32;_eAED=0.32;_QI=0|0.5|0.4|0.6|0.25|0.4|5|33|124
lm1     2650    2679    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:cds   .       -       maker   CDS     2       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     2801    2855    Lmu01_1T0000010.1:exon:5        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:5;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3657    3694    Lmu01_1T0000010.1:exon:4        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:4;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:cds   .       -       maker   CDS     1       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     3799    4049    Lmu01_1T0000010.1:exon:3        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:3;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:cds   .       -       maker   CDS     0       ID=Lmu01_1T0000010.1:cds;Parent=Lmu01_1T0000010.1
lm1     6334    6339    Lmu01_1T0000010.1:exon:2        .       -       maker   exon    .       ID=Lmu01_1T0000010.1:exon:2;Parent=Lmu01_1T0000010.1

One last thing: it is not clear which is the "reference" and which is the query in the blast, and in general: if I have a draft "cabbage" assembly and annotation to align to arabidopsis, do I keep arabidopsis as the database? (I guess so.) And the .bed file, is it from the cabbage annotation? This is not clear to me, but it is what I guess. Is the draft genome scaffold length taken into account somewhere? E.g. if I have genes on just half of a scaffold, how does it get drawn?
Thanks,

Dario

Installing error

I got the following output while trying to install MCScanX. I am not sure where the problem comes from. I hope someone can point it out. Thanks!
yh362@mocha:~/tools/MCScanX$ make
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
g++ struct.cc mcscan_h.cc read_homology.cc out_homology.cc dagchainer.cc msa.cc permutation.cc -o MCScanX_h
g++ struct.cc dup_classifier.cc read_data.cc out_utils.cc dagchainer.cc cls.cc permutation.cc -o duplicate_gene_classifier
g++ dissect_multiple_alignment.cc -o downstream_analyses/dissect_multiple_alignment
g++ detect_collinear_tandem_arrays.cc -o downstream_analyses/detect_collinear_tandem_arrays
cd downstream_analyses/ && make
make[1]: Entering directory '/data/home/yh362/tools/MCScanX/downstream_analyses'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/data/home/yh362/tools/MCScanX/downstream_analyses'

Circle_plotter prints half circle with higher number of chromosomes

Hi,

First of all, thanks for developing such a nice tool.

I'm having an issue when trying to create circle plots for a higher number of chromosomes. For the complete set of chromosomes, I get the following "half" printed circle:

[Image: mm_lf circle 23]

This was generated with the following command:

java circle_plotter -g mm_lf.gff -s mm_lf.collinearity -c mm_lf.ctl -o mm_lf.circle.png

The CTL file looked like this:

800     //plot width and height (in pixels)
mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8,mm9,mm10,mm11,mm13,mm14,mm15,mm16,mm18,lf1,lf2,lf3,lf4,lf5,lf6,lf7,lf8,lf9,lf10,lf12,lf14,lf17  //chromosomes in the circle

Please find attached the COLLINEARITY, GFF and CTL files:
mm_lf.gff.gz
mm_lf.collinearity.gz
mm_lf.ctl.gz

I've noticed that if I decrease the number of chromosomes to 24 or 23 (depending on the chromosomes removed) the circle_plotter script works just fine:

[Image: mm_lf reduced circle 05]

I was wondering if you have experienced a similar issue and if yes, what could be a possible workaround.

Best wishes,

Issue installing MCScanX

Hi,
I downloaded the zip file from GitHub and followed the instructions.
After the make command I received this error:

g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
make: g++: No such file or directory
make: *** [makefile:2: mcscanx] Error 127

Also, I tried to download the zip file from http://chibba.pgml.uga.edu/mcscan2/
and got a 404 Not Found page.
What do I need to do?
Thanks in advance,
Alon

How to generate the xyz.blast file for MCScanX using target nucleotide sequences and a genome?

Hi, Yupeng

I have successfully installed MCScanX.
The xyz.blast file was generated by command
blastall -i sim.cluster.fasta -d DSim_pilon.fasta -p blastn -e 1e-10 -b 5 -v 5 -m 8 -a 15 -o xyz.blast

The DSim_pilon.fasta was the database generated by command
formatdb -i DSim_pilon.fasta -p F

The gff file like this
tig00000001|arrow|pilon tig00000001|arrow|pilon:3569293-3582540(-) 3569293 3582540
tig00002065|arrow|pilon tig00002065|arrow|pilon:11837-22762(-) 11837 22762

But after running the command MCScanX xyz, there was no alignment result in the xyz.collinearity file, and the HTML file was also wrong.
[Image]

What mistake did I make when generating the .blast file?
Thank you so much!

Kai

MCScanX truncated results.

Dear developer and users, I am facing an issue while running MCScanX between two very closely related Drosophila species. I used blastp to blast proteins of each genome with itself and the target genome using " blastp -query -db -evalue 1e-10 -max_target_seqs 5 -outfmt 6 " and concatenated all 4 outputs generated using cat cmd. I concatenated the .gff from two genomes and ran MCScanX. The run completes without any errors but the collinearity file is incomplete with only 3 alignments. I have attached the output with this query. Kindly suggest where am I going wrong. I expect a high degree of collinearity between the genomes as they are sibling species. My inputs are in following formats.
I tried running blastall with the same protein files only to find the collinearity file with only 1 alignment.

$ head Dnas_Dalb_mcscanX.blast
XP_034100989.1 XP_034100989.1 100.000 18460 0 0 1 18460 1 18460 0.0 36057
XP_034100989.1 XP_034104103.1 27.285 1444 753 60 114 1461 195 1437 8.20e-67 255
XP_034100989.1 XP_034104103.1 27.795 1324 716 59 229 1461 148 1322 9.48e-64 245
XP_034100989.1 XP_034104103.1 29.673 1102 575 63 110 1172 420 1360 4.33e-60 233
XP_034100989.1 XP_034104103.1 26.980 745 362 32 90 697 738 1437 5.19e-36 153

$ head Dnas_Dalb_mcscanX.gff
NC_047628.1 XP_034099297.1 24519288 24535641
NC_047628.1 XP_034100555.1 24538132 24554148
NC_047628.1 XP_034100562.1 24623105 24626154
NC_047628.1 XP_034099816.1 24628513 24629322
NC_047628.1 XP_034097326.1 24633653 24637715
collinearity_file.txt

Looking forward for a positive response.

thank you

Mac installation MCScanX Error

appledeMacBook-Pro-3:MCScanX Abdullah$ make
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
struct.cc:48:5: error: use of undeclared identifier 'exit'; did you mean
'_exit'?
exit(1);
^~~~
_exit
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/unistd.h:429:7: note:
'_exit' declared here
void _exit(int) __dead2;
^
1 error generated.
make: *** [mcscanx] Error 1

File format gff or bed and what format?

I am a PhD student doing a collaborative project between Trinity College Dublin and UCLondon and am using MCScanX. I am just getting to know the tool and have been trying to use the test data available in the package ( at_vv.gff, at_vv.blast etc.) and am running into a few problems.

Some of the information online says to use .gff and others say .bed, and the format of these files is also conflicting in the documentation. I was wondering if you could tell me how exactly the files needed for MCScan to run should be formatted and is it advisable to have just two (the .bed/.gff and .blast) in a single folder when running ./MCScan command.

I hope you can help me out.

Define single entry point for various actions

This is just a suggestion, to simplify usage. For example, typing samtools will give you

Usage:   samtools <command> [options]

Command: view        SAM<->BAM conversion
         sort        sort alignment file
         pileup      generate pileup output
         mpileup     multi-way pileup
         faidx       index/extract FASTA

I suggest grouping each command item under the same mcscanx command, then dispatch to function calls. Much easier to use this way.

blastall: command not found

I have installed MCScanX, but the blastall command is not available even though BLAST+ is installed on my device. Is there an alternative to blastall?
Also, running the normal blastp gives an e-value error, as the option is not recognized by the BLAST+ package.

error: ‘chdir’ was not declared in this scope !

jitendra@jitendra-UNLOCK-INSTALL[MCScanX] make []
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
msa.cc: In function ‘void msa_main(const char*)’:
msa.cc:289:22: error: ‘chdir’ was not declared in this scope
if (chdir(html_fn)<0)
^
makefile:2: recipe for target 'mcscanx' failed
make: *** [mcscanx] Error 1

Request for your help to understand the synteny plot

Hi,

I am new to comparative genomics and I am going to ask a very stupid question. I am so sorry for that. I am working on two vertebrates and assembled genomes for two closely related species. When I performed synteny analysis using MCScanX and plotted the result, I could see an inverted synteny block for Gg_chr5.
[Image]

Does this mean that this is an assembly error, since all the other chromosomes are linear and nice, or could it be a biological discovery? If this is an error, how can I find the coordinates at which I could change the orientation of this scaffold?

I am again so sorry for this stupid question and will be grateful if you could advise me.

regards
Amit

Increase the default length of species ID (from gff/bed)

@tanghaibao @wyp1125 Hi Haibao,hi Yupeng,

Regards! Sorry to bother you. I was wondering how to modify the code so that MCScanX accepts more letters as the species ID. At the moment the software takes a two-letter short name as the species ID. In my analyses I sometimes have many genomes from the same genus or family, so it is a pain to come up with unique two-letter IDs for these genomes. It would be very handy if the software could accept, say, 4 or 5 letters as the species ID.
It seems to me that I should adjust the function "read_gff" in "read_data.cc" (and maybe somewhere else as well), but I know little about C. Thanks very much!
https://github.com/wyp1125/MCScanX/blob/master/read_data.cc
Best, Tao

Standardize input file format to bed

The gff format is confusing (which is my fault, since at the time there was no bed format). The bed format only has the data in different order, and 0-based. The bed format is defined here, and is especially useful since there are a ton of tools that can handle bed files, most notably BEDTOOLS.

http://genome.ucsc.edu/FAQ/FAQformat.html#format1

You can change from gff format to bed through:

awk '{print $1"\t"$3"\t"$4"\t"$2}' grape.gff >grape.bed
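The awk command above only reorders the columns; it leaves the start coordinate unchanged. A minimal Python sketch that also shifts the start to 0-based could look like the following. It assumes the MCScanX-style gff has four whitespace-separated columns (chromosome, gene, start, end) with 1-based inclusive coordinates; the function names `gff_line_to_bed` and `convert` are made up for this example.

```python
def gff_line_to_bed(line):
    """Convert one MCScanX-style gff line (chr, gene, start, end) to a
    bed line (chr, start, end, gene).

    Assumption: input coordinates are 1-based and inclusive.
    """
    chrom, gene, start, end = line.split()
    # bed start is 0-based; bed end is exclusive, so the 1-based end is kept.
    return "\t".join([chrom, str(int(start) - 1), end, gene])


def convert(gff_path, bed_path):
    """Convert a whole file, skipping blank lines."""
    with open(gff_path) as src, open(bed_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(gff_line_to_bed(line) + "\n")
```

For example, `convert("grape.gff", "grape.bed")` would write the bed file next to the input, reusing the file names from the awk command.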

Wrong bed-like file

image
Hi, I was using MCScanX; thanks for your contribution to this marvelous software.
But I found a small problem: the order of the columns might be mistaken, which would mislead users.
Shouldn't the gene name be in the second column rather than the last column?

got error while installing MCScanX on Ubuntu 14.04

I ran make to install MCScanX and got stuck on this problem. Please help.
I am using a Core i7 and Ubuntu 14.04.

g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
g++ struct.cc mcscan_h.cc read_homology.cc out_homology.cc dagchainer.cc msa.cc permutation.cc -o MCScanX_h
g++ struct.cc dup_classifier.cc read_data.cc out_utils.cc dagchainer.cc cls.cc permutation.cc -o duplicate_gene_classifier
g++ dissect_multiple_alignment.cc -o downstream_analyses/dissect_multiple_alignment
dissect_multiple_alignment.cc: In function ‘int main(int, char**)’:
dissect_multiple_alignment.cc:252:44: error: ‘getopt’ was not declared in this scope
while ((c = getopt(argc, argv, "g:c:o:")) != -1)
^
dissect_multiple_alignment.cc:257:32: error: ‘optarg’ was not declared in this scope
sprintf(gpath,"%s",optarg);
^
dissect_multiple_alignment.cc:269:17: error: ‘optopt’ was not declared in this scope
if (optopt!='g' || optopt!='c' || optopt!='o')
^
make: *** [mcscanx] Error 1

family_circle_plotter error: cannot find symbol

I have not seen this answered elsewhere.

Hello, when I run family_circle_plotter I get this error output, and I can't find information on the web to solve my problem. I would be grateful for any advice.

family_circle_plotter.java:303: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic

Am I missing a required library or something?

Thanks very much in advance

Question regarding blast

Hi, thank you for developing MCScanX. I wonder if it is possible to use the output of blastn instead of blastp for the analysis. I have a huge genome and am encountering many errors (memory and time) when running blastp jobs. I tried to parallelize the analyses, but it was not efficient.
Thanks

Docker container and working example data

Hi,

I am really excited to try out this set of tools, and want to do it through Docker. However, I cannot get the example data to work, I guess because I have installed it badly (seems others have this issue too). Does anyone have a working Docker container to run the programs?

Epoch

How should value of epoch number be specified for MC-ScanX runs?
Will it change depending on the taxonomic / phylogenetic distribution of species being analyzed: within genus vs across genera vs across families etc.? Thanks in advance

collinearity file does not have any collinear genes

Hi,

I am trying to compare two snake genomes to identify collinear genes, but I am currently having some issues with MCScanX. Even though I have generated gff and blast files with exactly the same gene IDs in both, the output collinearity file is empty.

############### Parameters ###############

MATCH_SCORE: 50

MATCH_SIZE: 2

GAP_PENALTY: -1

OVERLAP_WINDOW: 3

E_VALUE: 1e-10

MAX GAPS: 15

############### Statistics ###############

Number of collinear genes: 0, Percentage: -nan

Number of all genes: 0

##########################################
Can you please suggest how I can fix this?
My blast and gff input files have the following content:

$ head nn_nk.blast
nn14A57224 nn14A57224 100.000 10294 0 0 1 10294 1 10294 0.0 18565
nn14A57224 nn14A57224 95.876 291 10 2 7740 8028 5213 4923 1.09e-127 464
nn14A57224 nn14A57224 95.876 291 10 2 4923 5213 8028 7740 1.09e-127 464
nn14A57224 nn14A57224 76.336 524 78 13 2697 3184 1623 2136 6.46e-99 369
nn14A57224 nn14A57224 76.336 524 78 13 1623 2136 2697 3184 6.46e-99 369
nn14A57224 nn14A57224 84.615 312 31 7 5818 6120 1833 2136 1.62e-87 331
nn14A57224 nn14A57224 84.615 312 31 7 1833 2136 5818 6120 1.62e-87 331
nn14A57224 nn14A57224 80.872 298 51 4 5819 6111 2879 3175 1.63e-68 268
nn14A57224 nn14A57224 80.872 298 51 4 2879 3175 5819 6111 1.63e-68 268
nn14A57224 nn14A57224 89.130 138 13 2 5677 5813 1997 2133 2.75e-40 174

$ head nn_nk.gff
nn14 nn14A57224 29426 39720
nn14 nn14A57225 41412 54382
nn14 nn14A57226 58913 65705
nn14 nn14A57227 70249 82715
nn14 nn14A57228 83915 85686
nn14 nn14A57229 93024 239710
nn14 nn14A57230 248031 267724
nn14 nn14A57231 267964 268794
nn14 nn14A57232 293986 419804
nn14 nn14A57233 422076 434387
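The report above does not say why zero genes were read, so the following is only a diagnostic sketch under stated assumptions: that the gff should have four tab-separated columns, that the BLAST table should have twelve tab-separated columns, and that a hit only helps if it links two different genes that are both listed in the gff. In the excerpt shown, every BLAST line pairs nn14A57224 with itself. The function name `check_inputs` and the report keys are hypothetical.

```python
def check_inputs(gff_lines, blast_lines):
    """Count common formatting problems in MCScanX-style input lines.

    Assumptions (not taken from the MCScanX source): gff lines have four
    tab-separated fields, BLAST lines have twelve, and only hits between
    two different genes present in the gff are usable.
    """
    genes = set()
    report = {"gff_not_tabbed": 0, "blast_not_tabbed": 0,
              "self_hits": 0, "unknown_gene": 0, "usable_pairs": 0}

    for line in gff_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if len(line.split("\t")) != 4:
            report["gff_not_tabbed"] += 1   # e.g. spaces instead of tabs
        fields = line.split()
        if len(fields) >= 2:
            genes.add(fields[1])            # gene ID is the second column

    for line in blast_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if len(line.split("\t")) != 12:
            report["blast_not_tabbed"] += 1
        fields = line.split()
        if len(fields) < 2:
            continue
        query, subject = fields[0], fields[1]
        if query == subject:
            report["self_hits"] += 1
        elif query not in genes or subject not in genes:
            report["unknown_gene"] += 1
        else:
            report["usable_pairs"] += 1
    return report
```

Calling it as `check_inputs(open("nn_nk.gff"), open("nn_nk.blast"))` returns a dictionary of counts; a non-zero `gff_not_tabbed` or a zero `usable_pairs` would be the first things to look at.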

Example DATA not working

I have tried to use the sample data and it doesn't seem to work? Perhaps I am doing something wrong but I seem not to be able to get any collinearity.

MCScanX Error - Can't plot the graphs (Downstream analysis)

Hi,
I am trying to use the MCScanX tool (the latest release) for plotting circular displays of syntenic genomic blocks. I am using CentOS 7 with Java version:

openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

When I tried running $ java circle_plotter -g gff_file -s collinearity_file -c control_file -o output_PNG_file
[tt@localhost downstream_analyses]$ java circle_plotter -g ../MADS/MADS.gff -s ../MADS/MADS.collinearity -c circle.ctl -o MADS.png
I got the following error message:

Exception in thread "main" java.lang.NullPointerException
        at circle_plotter.paint(circle_plotter.java:142)
        at circle_plotter.main(circle_plotter.java:262)

Then I tried another drawing tool dual_synteny_plotter and I still got the error message:

[tt@localhost downstream_analyses]$ java dual_synteny_plotter -g ../MADS/MADS.gff -s ../MADS/MADS.collinearity -c dual_synteny.ctl -o dual_synteny.PNG
Exception in thread "main" java.lang.NullPointerException
        at dual_synteny_plotter.paint(dual_synteny_plotter.java:148)
        at dual_synteny_plotter.main(dual_synteny_plotter.java:265)

Do you know how to solve it? All data are stored in my repo.
Many thanks in advance.

missing information about how to specify information of the third column in the homology file in MCScanX_h

Dear Yupeng,

in the manual (http://chibba.pgml.uga.edu/mcscan2/documentation/manual.pdf) you state that the user can specify whether higher or lower values are preferred when using MCScanX_h ("When the third column is used, users need to specify whether higher or lower values are preferred.").

However, the manual does not say how to specify this. Could you please mention explicitly in the manual how this setting can be changed? Also, could you please describe in more detail how these values are interpreted?

cannot open cds file

(base) padanas-MBP:downstream_analyses padana$ perl add_ka_and_ks_to_collinearity.pl -i ./data/crst-psr.collinearity -d ./data/crstpsr.cds -o kkkkkk
Cannot open cds_file!
I am facing this error.
Can you guide me?
My CDS file header:

EVM0023101.1
Is this error due to the header having .1?
I am interested in finding the Ka/Ks of orthologs of 2 species. I have merged the CDS sequences and have a single CDS file.

circle_plotter.java:232: error: cannot find symbol

Hi, I was compiling the code downloaded from the MCScanX home page, but it failed when building the programs under the 'downstream_analyses' directory. Here is the message:

javac -g circle_plotter.java
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:232: error: cannot find symbol
Cubic xSpline = new Cubic(Cubic.BEZIER, GX);
^
symbol: variable Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: class Cubic
location: class circle_plotter
circle_plotter.java:233: error: cannot find symbol
Cubic ySpline = new Cubic(Cubic.BEZIER, GY);
^
symbol: variable Cubic
location: class circle_plotter
6 errors
make: *** [circle_plotter.class] Error 1

Could you help to solve this?
Thank you very much!

Best,
ZC

Question on missing homologous pairs in the .collinearity file

Hello,

I have a question regarding the default implementation of MCScanX,
As the manual suggested, I first ran blastp and obtained the .gff and .blast files that were used as input for MCScanX. In the .blast file, I noticed gene pairs with alignment e-values as low as 1e-110. However, these matched gene pairs were not shown in the .collinearity file after I ran MCScanX. I have read through the manual and the paper, but still couldn't find any explanation for that. Can anyone please clarify?

Thanks in advance,
Mandy,

MCScanX alignment e-value

Hey, I got a quick question about the e-value cutoff for collinear genes.

For example, here is a list of syntenic blocks. All the genes have very low e-values from blast. How do you calculate the score and e-value for the syntenic blocks?

Best,
Li

Alignment 2: score=150.0 e_value=0.057 N=3 HZ1&JO11 minus

2- 0: HZgenemark-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.107 JOYER156C 3e-111
2- 1: HZsnap_masked-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.62 JOYER155C 4e-172
2- 2: HZgenemark-gi|1002316191|dbj|BCGQ01000010.1|-processed-gene-1.130 JOYER154W 1e-47
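A rough way to read the score on the alignment line, assuming that each anchor gene pair contributes MATCH_SCORE (50 by default) and each gap adds GAP_PENALTY (-1 by default), as the parameter names in the .collinearity header suggest. This is a sketch of that assumption, not the program's exact algorithm, and it does not cover how the block e_value is derived. The function names are hypothetical.

```python
def block_score(n_pairs, n_gaps, match_score=50, gap_penalty=-1):
    """Score of a collinear block under the assumed additive model:
    one match_score per anchor pair plus one gap_penalty per gap."""
    return float(n_pairs * match_score + n_gaps * gap_penalty)


def implied_gaps(score, n_pairs, match_score=50, gap_penalty=-1):
    """Number of gaps that the reported score would imply under the
    same assumed model."""
    return round((score - n_pairs * match_score) / gap_penalty)


# The block above reports N=3 and score=150.0.
print(block_score(3, 0))       # 3 pairs, no gaps
print(implied_gaps(150.0, 3))  # gaps implied by the reported score
```

Under this reading, the three-gene block above is consistent with three anchors and no gaps.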

Family_circle_plotter...

Hi, @wyp1125
When I try to run family_circle_plotter, I get the circular plot, but the collinear genes are not highlighted in red. I would be very grateful if you could advise on this issue.

0 matches imported (0 discarded). No genes detected.

Hi Yupeng,
Thanks for making MCScanX available. I'm trying to run the tool on my concatenated blast results and "gff" but I keep getting 0 matches imported and 0 discarded. Am I missing something trivial? My gene ids in blast and "gff" are matching. For my "gff" I've renamed the chromosome column to the "sp#" format as instructed in the manual. I'm not getting any errors either. Tips?

> Reading BLAST file and pre-processing
> Generating BLAST list
> 0 matches imported (0 discarded)
> 0 pairwise comparisons
> 0 alignments generated
> Pairwise collinear blocks written to data/mcscanx/input/.collinearity [0.004 seconds elapsed]
> Writing multiple syntenic blocks to HTML files
> Done! [0.006 seconds elapsed]

.collinearity :

> ############### Parameters ###############
> # MATCH_SCORE: 50
> # MATCH_SIZE: 5
> # GAP_PENALTY: -1
> # OVERLAP_WINDOW: 5
> # E_VALUE: 1e-05
> # MAX GAPS: 25
> ############### Statistics ###############
> # Number of collinear genes: 0, Percentage: -nan
> # Number of all genes: 0
> ##########################################

Thank you for your time 😄

collinearity.pl

(base) appledeMacBook-Pro-4:MCScanX fatima$ perl add_ka_and_ks_to_collinearity.pl -i ../data/at.collinearity -d ../data/at.cds -o at.collinearity.kaks

Can't open perl script "add_ka_and_ks_to_collinearity.pl": No such file or directory

I am facing this problem. Could anybody guide me on how to solve it?

Newick format

Hi, I followed the MCScanX tutorial and got stuck at this command:

java family_tree_plotter -t tree_file -s synteny_file -o output_PNG_file

How to create a tree in Newick format?
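For orientation, a Newick tree is plain text: nested parentheses group sibling branches, commas separate them, and a semicolon ends the tree. In practice a tree-building program normally writes this file. The sketch below only illustrates the notation by serializing a nested Python tuple; the gene names are placeholders, and whether family_tree_plotter expects branch lengths or other details is not something this example settles.

```python
def to_newick(node):
    """Serialize a nested tuple/list of leaf names into Newick text
    (without the terminating semicolon)."""
    if isinstance(node, str):
        return node                      # a leaf is just its name
    children = ",".join(to_newick(child) for child in node)
    return "(" + children + ")"


# Placeholder tree: geneA and geneB are sisters, geneC is the outgroup.
tree = (("geneA", "geneB"), "geneC")
newick = to_newick(tree) + ";"
print(newick)  # ((geneA,geneB),geneC);

# Writing it to a file that could be passed as the tree_file argument:
with open("tree.nwk", "w") as handle:
    handle.write(newick + "\n")
```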

Thank you in advance

MCScanX_h

Hello, your software is very good. I saw in your article that paired homologs identified by OrthoMCL are used as the input file of MCScanX_h. I would like to ask whether this input can only be generated by OrthoMCL, or whether it can also be generated by OrthoFinder. Do the entries have to be in pairs? If a gene has multiple homologs, is each one written as a separate line?
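If the group-based output of an orthology tool has to be turned into pairs, one possible approach is to list every pair of genes within each group, one pair per line. The sketch below assumes that the .homology file takes two tab-separated gene IDs per line (other reports on this page describe two or three columns) and that all-against-all pairs within a group are acceptable; neither point is confirmed here. The function name `orthogroup_to_pairs` is hypothetical.

```python
from itertools import combinations


def orthogroup_to_pairs(genes):
    """Return every unordered gene pair within one group as
    tab-separated lines, so a gene with several homologs appears
    on several lines."""
    unique = sorted(set(genes))          # drop duplicates, fix the order
    return [f"{a}\t{b}" for a, b in combinations(unique, 2)]


# Placeholder group of three genes.
for line in orthogroup_to_pairs(["geneB", "geneA", "geneC"]):
    print(line)
```

A group of n genes yields n*(n-1)/2 lines, so very large groups can inflate the file considerably.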

MCscanX BLASTALL ERROR

@wyp1125 @XTan Hi, I hope you are well. I need your help; I am a new user of MCScanX. I have successfully installed MCScanX and then installed ncbi-blast+. Now I am confused about the "at.blast" file: my output "at.blast" does not look like the "at.blast" file in your example data. Kindly tell me where I went wrong. To generate the database I used the protein sequences of Arabidopsis thaliana, and as the query I used the same protein sequences. If I use two genomes, such as Arabidopsis and grape, should I use the Arabidopsis protein sequences as the query and the grape protein sequences as the database? Kindly guide me; I would be very thankful.

detect_collinear_tandem_arrays segmentation fault

Hello,
I am not able to get the downstream_analyses executable detect_collinear_tandem_arrays to run properly. I downloaded MCScanX from the MCScanX homepage (Dec 2017), and have compiled on two different systems (Mac OS High Sierra 10.13.1 and CentOS 6.6 Linux). In both cases, MCScanX and MCScanX_h seem to run correctly, but I get the following error with detect_collinear_tandem_arrays:

Reading BLAST file and pre-processing
Generating BLAST list
38556 matches imported (6746 discarded)
Detecting tandem arrays...
Segmentation fault: 11

Is there a fix for this problem, or any thoughts as to why it is happening? I have also tried on three different datasets, each returns the same error.

One issue I have discovered so far that also seems important is that the vector alltandempair is not populating, at least prior to the sort command in the compute_tandem() module of detect_collinear_tandem_arrays.cc and so the corresponding detect_collinear_tandem_arrays executable.

Latest: as far as I can tell, the segmentation fault is returned when there are no collinear tandem arrays identified, so it looks like this is a feature of the data I am using and not a bug in the script.

Thanks,

Matt

missing homologs

Hi, there,
I found missing homologs in my results, even though they show 100% identity in my blastp results.
I tried different parameters and checked all my commands, and found nothing wrong. This issue occurred more than 5 times in my overall results.
Please help me solve this:

`
1 V.g1576 tig00000012_np1212.g1452
1 V.g1577 tig00000012_np1212.g1453
1 V.g1578 tig00000012_np1212.g1454
1 V.g1579 | |
1 V.g1580 | |
1 V.g1581 tig00000012_np1212.g1457
1 V.g1582 tig00000012_np1212.g1458
1 V.g1583 tig00000012_np1212.g1459

tig00000012_np1212.g1452 V.g1576 100.000 121 0 0 1 121 1 121 6.80e-86 243
tig00000012_np1212.g1453 V.g1577 100.000 129 0 0 1 129 1 129 1.98e-93 263
tig00000012_np1212.g1454 V.g1578 100.000 420 0 0 1 420 1 420 0.0 868
tig00000012_np1212.g1455 V.g1579 100.000 528 0 0 1 528 1 528 0.0 1078
tig00000012_np1212.g1456 V.g1580 100.000 530 0 0 1 530 1 530 0.0 1072
tig00000012_np1212.g1457 V.g1581 100.000 442 0 0 1 442 1 442 0.0 906
tig00000012_np1212.g1458 V.g1582 100.000 250 0 0 1 250 1 250 0.0 513
tig00000012_np1212.g1459 V.g1583 100.000 218 0 0 1 218 1 218 3.29e-157 432`

I used these commands:
blastp -query JYF80.pp.fa -db Sc2.pp -out 80_Sc2.blast -evalue 1e-5 -num_threads 16 -outfmt 6 -num_alignments 5
MCScanX 80_Sc2 -e 1e-5 -s 10

Question on Duplicate gene classifier

Hello,

You mention in the manual that the duplicate_gene_classifier tool should not be used on data from multiple genomes. I could not find the reason in your paper. Is it compute time or something else? Can you clarify?

Thanks
Abhijit

No result after run commands

Greetings, I'm trying to use MCScanX_h. I've prepared the necessary files following the manual, yet neither my data nor the example data is working.

My gff, covering 5 species, is edited following the layout CH# gene start end:
Ta1 TA20_000001 40390 41754
...

My .homology file was obtained by running OrthoFinder and extracting the pairwise data as follows for each species, without the optional third column:
TH179_000002 TH3844_011373
...

Reading other issues on GitHub, the suggested solutions of using tab-delimited files and moving them to the program folder didn't resolve it. The example data also returns the same empty output.

"using example data"
/home/h.paulocampiteli/MCScanX-master/MCScanX /home/h.paulocampiteli/MCScanX-master/data/
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/data/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.000 seconds elapsed]

"using my own on another folder"
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /storage4/h.paulocampiteli/synteny/mscscan_analysis/
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /storage4/h.paulocampiteli/synteny/mscscan_analysis/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage

"using my data on the MCscan folder
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /home/h.paulocampiteli/MCScanX-master/MCScanX
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/MCScanX.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage
Done! [0.001 seconds elapsed]

I could not find any other response regarding this problem. Does anyone know what sorcery I must perform to get the program to work?

Thanks in advance

Error in family_tree_plotter

Hi
I want to display segmental and tandem duplications in a gene family tree. I ran the following commands :

javac family_tree_plotter.java
java family_tree_plotter -t ../../mcscanx_input/tree.nwk -s ../../mcscanx_input/rice.collinearity  –d ../../mcscanx_input/rice.tandem -o tandem_segmental_genes_tree.png

I get this error:

Exception in thread "main" java.lang.NullPointerException
        at family_tree_plotter.compute(family_tree_plotter.java:263)
        at family_tree_plotter.paint(family_tree_plotter.java:299)
        at family_tree_plotter.main(family_tree_plotter.java:397)

Please help me resolve this error. I am running it on Ubuntu in WSL. All other MCScanX commands I ran were successful except this one.

Thank You

when using MCScanX_h 0 homologous pairs imported (316812 discarded)

Dear Yupeng,

when I run MCScanX_h data/at_gm_pt_vv, where at_gm_pt_vv.homology only contains two columns (I deleted column 3 manually), I get the following output:

Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (316812 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to ./at_gm_pt_vv.collinearity [1.861 seconds elapsed]
Writing multiple syntenic blocks to HTML files
at1.html
at2.html
...

Actually, I am expecting that I get similar results compared to the run where I have the third column with numerical values since in the manual (http://chibba.pgml.uga.edu/mcscan2/documentation/manual.pdf) it is described that you can pass a .homology file with two or three columns to MCScanX_h.

Restarting with -a option

Hi, thanks for developing MCScanX! I've found it of great use.

I have already run one analysis using the -a option to stop after writing the .collinearity file, but I have now realized that I could benefit from the .html files that are created downstream.

Is there a way to restart MCScanX from the .collinearity file and only create the .html files? Or do I need to redo the entire analysis to create the .html files?

Thanks!
Justin

Please fix the bug.

compilation error

" msa.cc: In function ‘void msa_main(const char*)’:
msa.cc:289:22: error: ‘chdir’ was not declared in this scope
if (chdir(html_fn)<0)
^
make: *** [mcscanx] Error 1" 
"""

Solution

http://ubuntuforums.org/showthread.php?t=2205151

If you are building on a 64-bit system, you may need to add

#include <unistd.h>

to msa.h, dissect_multiple_alignment.h, and detect_collinear_tandem_arrays.h

help for old version of MCScanX for parameter U

Dr Wang:
In the early versions of MCScanX, the parameter "u" designated the intergenic distance (default value: 10 kb). This may be appropriate for angiosperms, but we recently sequenced a fern genome and found the average intergenic distance to be about 100 kb. When we used the updated version of MCScanX, the proportion of synteny blocks was extremely small. We would like to know the default value of the previous 'U' parameter. We would appreciate it if you could provide us with the old version that has the parameter setting for U.
Thanks for your help.
Wei

Collinear block coordinates

Hi,
I am not sure if I am missing something, but could it be possible to extract the entire block coordinates? So instead of the genes in that block, we have a start (1st gene start position) and stop (last gene final position) coordinates in that corresponding contig/scaffold.

Ideally, if we have the collinearity file:

`##########################################

Alignment 0: score=8871.0 e_value=0 N=188 at1&at1 plus

0- 0: AT1G17240 AT1G72300 0
0- 1: AT1G17290 AT1G72330 0
0- 2: AT1G17310 AT1G72350 5e-41
0- 3: AT1G17350 AT1G72420 2e-113
0- 4: AT1G17380 AT1G72450 7e-63
0- 5: AT1G17400 AT1G72490 2e-82
0- 6: AT1G17420 AT1G72520 0
......
0-183: AT1G22270 AT1G78190 6e-45
0-184: AT1G22280 AT1G78200 2e-107
0-185: AT1G22300 AT1G78220 1e-72
0-186: AT1G22330 AT1G78260 1e-63
0-187: AT1G22340 AT1G78270 3e-174

Alignment 1: score=2935.0 e_value=2.7e-245 N=64 at1&at1 plus

1- 0: AT1G10640 AT1G60590 0
...`

If we can convert it to an output file like .gff or .bed with the block start-stop positions, and including both columns. Something like:

at1-block0(1st column) start (1st gene coordinates AT1G17240) stop(last gene AT1G22340 final pos.) at1-block0(2nd column) start (1st gene coordinates AT1G72300) stop(last gene AT1G78270)

Extracting the first coordinate (coming from the 1st gene) for each block (and column) seems easy but I find it hard to implement the last gene into it.

Thank you
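One way this could be scripted is sketched below. It assumes that each block starts with a line containing "Alignment <n>:" and ends in "plus" or "minus", that gene lines look like "0- 0: geneA geneB e-value", and that the gff has the columns chromosome, gene, start, end. Instead of relying on the first and last gene, it takes the smallest start and largest end in each column, so blocks in the minus orientation are handled too. The function names `read_gff` and `block_spans` are hypothetical.

```python
import re

# Header such as "## Alignment 0: score=8871.0 e_value=0 N=188 at1&at1 plus"
HEADER = re.compile(r"Alignment\s+(\d+):.*\s(plus|minus)\s*$")


def read_gff(lines):
    """Map gene ID -> (chromosome, start, end) from chr/gene/start/end lines."""
    pos = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            pos[fields[1]] = (fields[0], int(fields[2]), int(fields[3]))
    return pos


def block_spans(collinearity_lines, pos):
    """Return one tuple per block:
    (block, orientation, chrA, startA, endA, chrB, startB, endB)."""
    blocks = {}
    current = None
    for line in collinearity_lines:
        header = HEADER.search(line)
        if header:
            current = int(header.group(1))
            blocks[current] = {"orient": header.group(2), "a": [], "b": []}
            continue
        if current is None or ":" not in line:
            continue
        # Everything after the first colon: geneA, geneB, e-value.
        fields = line.split(":", 1)[1].split()
        if len(fields) >= 2 and fields[0] in pos and fields[1] in pos:
            blocks[current]["a"].append(pos[fields[0]])
            blocks[current]["b"].append(pos[fields[1]])

    rows = []
    for block_id in sorted(blocks):
        block = blocks[block_id]
        if not block["a"]:
            continue                      # no gene of this block found in gff
        row = [block_id, block["orient"]]
        for side in ("a", "b"):
            genes = block[side]
            row += [genes[0][0],
                    min(g[1] for g in genes),
                    max(g[2] for g in genes)]
        rows.append(tuple(row))
    return rows
```

With real files this would be called as `block_spans(open("at.collinearity"), read_gff(open("at.gff")))` (file names are placeholders), and each returned tuple can be written out as one tab-separated line.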
