At present, Pfam IDs are annotated with phage function (gene type) information, e.g. Phage Regulation, Phage Capsid.
See the file pfam_function.tsv for the complete list of annotated families.
Families without a phage function annotation are labeled "Other".
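A minimal Python sketch of this lookup follows. It assumes, without confirmation from this README, that pfam_function.tsv is tab-separated with the family accession in the first column and the phage function in the second; the helper names load_pfam_functions and gene_type are hypothetical.

```python
import csv
import io


def load_pfam_functions(handle):
    """Read accession -> phage function pairs from a two-column TSV.

    Assumed (hypothetical) layout: accession<TAB>function, no header row.
    Rows with fewer than two fields are ignored.
    """
    functions = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            functions[row[0].strip()] = row[1].strip()
    return functions


def gene_type(accession, functions):
    """Return the phage function for an accession, or "Other" if unannotated."""
    return functions.get(accession, "Other")


if __name__ == "__main__":
    # In-memory stand-in for pfam_function.tsv (example data, not real annotations).
    example = io.StringIO("pfam1234\tPhage Capsid\npfam5678\tPhage Regulation\n")
    table = load_pfam_functions(example)
    print(gene_type("pfam1234", table))
    print(gene_type("pfam9999", table))
```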
To create figures showing the gene type annotation of CDD hits for several samples, run:
Rscript GeneTypes.R input_file.tsv
The script creates two figures: Gene_types_counts.pdf and Gene_types_proportions.pdf.
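The two figures likely show the same per-sample tallies in two ways: raw counts of hits per gene type, and each gene type's share of the sample total. The Python sketch below illustrates that normalisation only; it is not taken from GeneTypes.R, and gene_type_proportions is a hypothetical name.

```python
from collections import Counter


def gene_type_proportions(counts):
    """Convert per-sample gene type counts into proportions of each sample's total.

    counts maps sample label -> {gene type: number of CDD hits} (assumed shape).
    A sample with no hits maps to an empty dict instead of dividing by zero.
    """
    proportions = {}
    for sample, per_type in counts.items():
        total = sum(per_type.values())
        proportions[sample] = (
            {gtype: n / total for gtype, n in per_type.items()} if total else {}
        )
    return proportions


if __name__ == "__main__":
    # Example tallies for two hypothetical samples.
    counts = {
        "label1": Counter({"Phage Capsid": 3, "Other": 1}),
        "label2": Counter({"Phage Regulation": 2, "Other": 2}),
    }
    for sample, shares in gene_type_proportions(counts).items():
        print(sample, shares)
```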
The input file (input_file.tsv) has a header line followed by one line per sample:
files sample_names
file_name1 label1
file_name2 label2
file_name3 label3
where each file_name* is a tab-separated file with 2 columns, the contig name and the CDD accession hit (see the file DNAmod_sample_cdd.tsv as an example), and label is the name used for that file on the figures. The file_name* files can be produced by the ContigAnnotation script.
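For illustration, here is a Python sketch that reads the sample sheet and then each per-sample hits file. It assumes tab separation throughout and no header row in the hits files; read_sample_sheet and read_contig_hits are hypothetical names, only loosely analogous to read_contig_cdd.

```python
import csv
import io


def read_sample_sheet(handle):
    """Return (file_name, label) pairs from the sample sheet.

    Assumes tab separation and the header "files<TAB>sample_names".
    Rows with fewer than two fields (e.g. blank lines) are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    if [h.strip() for h in header[:2]] != ["files", "sample_names"]:
        raise ValueError(f"unexpected header: {header}")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def read_contig_hits(handle):
    """Read (contig, CDD accession) pairs from a per-sample hits file.

    Assumes two tab-separated columns and no header row. A contig may
    appear on several rows if it has several CDD hits.
    """
    return [
        (row[0], row[1])
        for row in csv.reader(handle, delimiter="\t")
        if len(row) >= 2
    ]


if __name__ == "__main__":
    sheet = io.StringIO("files\tsample_names\nfile_name1\tlabel1\n")
    hits_files = {"file_name1": "contig_1\tpfam1234\ncontig_1\tpfam5678\n"}
    for file_name, label in read_sample_sheet(sheet):
        # In real use, open(file_name) would replace this in-memory text.
        hits = read_contig_hits(io.StringIO(hits_files[file_name]))
        print(label, hits)
```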
To add CDD descriptions, create a CSV file whose first column is the CDD accession (e.g. pfam1234) and run:
cdd_description_addition.py file_with_no_description.csv file_with_description.csv
This creates file_with_description.csv, whose last column is the CDD title.
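The core of the description step can be sketched in Python as follows. This is not the code of cdd_description_addition.py: add_descriptions is hypothetical, and the accession-to-title mapping is assumed to come from some CDD metadata source that this README does not specify. A header row, if present, would simply get an empty title.

```python
import csv
import io


def add_descriptions(rows, titles, column=0):
    """Append the CDD title for each row's accession as a new last column.

    rows: iterable of CSV rows (lists of strings).
    titles: accession -> title mapping (hypothetical source).
    column: 0-based index of the accession column.
    Accessions without a known title get an empty string.
    """
    return [row + [titles.get(row[column], "")] for row in rows]


if __name__ == "__main__":
    src = io.StringIO("pfam1234,contig_1\npfam5678,contig_2\n")
    titles = {"pfam1234": "placeholder title for pfam1234"}  # example data only
    out = io.StringIO()
    csv.writer(out).writerows(add_descriptions(csv.reader(src), titles))
    print(out.getvalue())
```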
If the CDD accession is not in the first column, pass the index of the accession column (1-based) as an extra last argument:
cdd_description_addition.py file_with_no_description.csv file_with_description.csv 2
Here the CDD accession is in column 2 of file_with_no_description.csv.
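One way to accept the optional 1-based column argument is with argparse, converting it to a 0-based index internally. This is an illustrative sketch: the real script's argument handling is not documented here, and parse_args and column_index are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Parse: input.csv output.csv [column], where column is 1-based (default 1)."""
    parser = argparse.ArgumentParser(
        description="Append CDD titles to a CSV (illustrative sketch)."
    )
    parser.add_argument("input_csv")
    parser.add_argument("output_csv")
    parser.add_argument(
        "column",
        nargs="?",
        type=int,
        default=1,
        help="1-based index of the column holding the CDD accession (default: 1)",
    )
    args = parser.parse_args(argv)
    if args.column < 1:
        parser.error("column index must be 1 or greater")  # exits with status 2
    # Convert the user-facing 1-based index to Python's 0-based indexing.
    args.column_index = args.column - 1
    return args


if __name__ == "__main__":
    # Mirrors the example command above, passed explicitly instead of via sys.argv.
    args = parse_args(["file_with_no_description.csv", "file_with_description.csv", "2"])
    print(args.input_csv, args.output_csv, args.column_index)
```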
Read data:
read_contig_cdd
read_cdd_annotation
Process data:
add_annotataion_to_cdd
cdd_annotation_counts
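These functions are only listed by name, so their signatures are not documented here. As a rough Python analogue of the processing steps (attach gene types to hits, then count them per sample), the sketch below uses hypothetical names and data shapes; it counts hits rather than distinct contigs, which is also an assumption.

```python
from collections import Counter


def annotate_hits(hits, functions):
    """Attach a gene type to each (contig, accession) hit; unannotated -> "Other"."""
    return [
        (contig, accession, functions.get(accession, "Other"))
        for contig, accession in hits
    ]


def count_gene_types(samples, functions):
    """Count annotated hits per gene type for each sample.

    samples maps sample label -> list of (contig, accession) hits (assumed shape).
    Returns sample label -> Counter({gene type: number of hits}).
    """
    return {
        label: Counter(gtype for _, _, gtype in annotate_hits(hits, functions))
        for label, hits in samples.items()
    }


if __name__ == "__main__":
    functions = {"pfam1234": "Phage Capsid"}  # example annotation only
    samples = {
        "label1": [("contig_1", "pfam1234"), ("contig_2", "pfam9999")],
        "label2": [("contig_3", "pfam1234")],
    }
    print(count_gene_types(samples, functions))
```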