When MAFFT is used to align sequences stored in sample-specific FASTA files, it should be able to align the corresponding chromosomes across those files, so that the output is a series of chromosome-specific alignments, one FASTA file per chromosome.
In the case of two samples from organisms with two chromosomes, the input and output would be as follows:
Input:
sample1.fasta would contain chr1, chr2
sample2.fasta would also contain chr1, chr2
Output:
chr1.fasta would contain chr1_sample1, chr1_sample2
chr2.fasta would contain chr2_sample1, chr2_sample2
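Before running MAFFT, an intermediate step is needed to transpose the chromosomes from the sample-specific FASTA files into chromosome-specific FASTA files. This can be done with itertools.izip (Python 2) or zip (Python 3) together with Bio.SeqIO.parse(), which returns an iterator over the records in a file. Because zipping pairs records by position, this assumes every sample file lists its chromosomes in the same order.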
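A minimal sketch of that transposition step, in Python 3, might look like the following. To keep it runnable without extra dependencies it uses a small hypothetical helper, read_fasta, as a stand-in for Bio.SeqIO.parse(path, "fasta"); the function name transpose_fastas and the chrN_sampleN header format are likewise assumptions based on the example above, not a fixed convention.

```python
import os


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file.

    Hypothetical stand-in for Bio.SeqIO.parse(path, "fasta").
    """
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the record name.
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def transpose_fastas(sample_paths, out_dir):
    """Write one FASTA per chromosome, each holding that chromosome from every sample."""
    # Sample label = file name without directory or extension (an assumption).
    samples = [os.path.splitext(os.path.basename(p))[0] for p in sample_paths]
    iterators = [read_fasta(p) for p in sample_paths]
    written = []
    # zip() advances all sample iterators in lockstep, one chromosome per step.
    # It stops at the shortest input, so extra records in a longer file are ignored.
    for records in zip(*iterators):
        names = {name for name, _ in records}
        if len(names) != 1:
            raise ValueError("chromosome order differs between samples: %s" % sorted(names))
        chrom = names.pop()
        out_path = os.path.join(out_dir, chrom + ".fasta")
        with open(out_path, "w") as out:
            for sample, (_, seq) in zip(samples, records):
                out.write(">%s_%s\n%s\n" % (chrom, sample, seq))
        written.append(out_path)
    return written
```

Calling transpose_fastas(["sample1.fasta", "sample2.fasta"], ".") on the example input would be expected to write chr1.fasta and chr2.fasta, each of which could then be passed to MAFFT separately, for example with mafft --auto chr1.fasta > chr1_aligned.fasta (the output file name here is arbitrary).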