schlosslab / hannigan_crcvirome_mbio_2018

Investigating the gut virus communities associated with colon cancer.

License: MIT License

Shell 0.53% R 1.57% Makefile 0.26% Python 0.05% Perl 0.42% TeX 0.03% PostScript 97.14%

cancer-virome microbiome phages reproducible-paper

hannigan_crcvirome_mbio_2018's Issues

Add mean summary stats

Add simple mean and std error bar graphs of contamination levels to compare to previous studies. Simple as that.
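
A minimal sketch of the summary statistic behind such a bar graph, assuming the contamination levels are available as a plain list of numbers per group. The function name and the example values are hypothetical; plotting is left out.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical contamination percentages for one group of samples.
example_group = [2.0, 4.0, 4.0, 6.0]
group_mean, group_sem = mean_and_sem(example_group)
print(group_mean, group_sem)
```

The mean gives the bar height and the standard error gives the half-length of the error bar.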

Final submission to sequencing core.

Once we get the samples pooled, we need to submit them to the sequencing core to be run on the HiSeq4000 platform.

Get authorization for sequence library kits

1: 24-sample NexteraXT library kit
1: 96-sample NexteraXT library kit

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)
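
One way to avoid relying on `-max_target_seqs 1` for best-hit selection is to request more hits and choose the best one per query afterwards. The sketch below assumes BLAST was run with the default tabular format (`-outfmt 6`), where the query ID, subject ID, and bit score are columns 1, 2, and 12. The function name is hypothetical and this is not necessarily how the repository handles it.

```python
def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST tabular (-outfmt 6) lines."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # A strictly greater score is required, so the first hit wins on ties.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

A typical use would be `best_hits(open("hits.tsv"))`, where `hits.tsv` is a placeholder file name.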

Run virome metagenome library through Quant-IT

Quantify the DNA for all of the samples in the virome metagenome library (second library that Charlie prepped) using the Quant-IT system.

Samples need to be run in duplicate to account for potential variation in the system.

Once this is done we can pool the samples and ship them off for sequencing.

Abundance length correction.

Need a section to correct for alignment length.
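
A rough sketch of what this correction could look like, assuming it means dividing each contig's hit count by the length of the contig being aligned to, so that long contigs do not look more abundant simply because they offer more positions to hit. The function and parameter names are hypothetical.

```python
def length_corrected_abundance(hit_counts, contig_lengths, per_bases=1000):
    """Scale each contig's hit count to hits per `per_bases` bases of contig."""
    corrected = {}
    for contig, count in hit_counts.items():
        length = contig_lengths[contig]
        if length <= 0:
            raise ValueError(f"contig {contig!r} has non-positive length")
        corrected[contig] = count * per_bases / length
    return corrected


# Two hypothetical contigs with equal raw counts but different lengths.
print(length_corrected_abundance({"a": 100, "b": 100}, {"a": 1000, "b": 2000}))
```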

Evaluate impact of sequencing depth on model performance.

It looks like the impact of sequencing depth on the model performance might be fairly significant. To get at this, I would like to run some curves plotting model performance vs sequencing depth. I suspect that might be important here.

Cluster ORFs into OPFs

Get those ORFs clustered into OPFs.

QC: Rename files to MG format

Right now they have the sequencing core names.

Get virome sequences from core

I need to download the virome sequences from the core facility.

Fix rarefaction of samples going into random forest

I need to fix the rarefaction going into the random forest model. Plain and simple.
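
For reference, a minimal sketch of rarefying one sample to a fixed depth by drawing reads without replacement. It assumes the sample is a `{feature: count}` dictionary; the function name and seed handling are hypothetical, and this is not necessarily how the repository does it.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} table to exactly `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand to one entry per read, then draw `depth` reads without replacement.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    return dict(Counter(drawn))
```

Expanding every read into a list is simple but memory-hungry, so very deep samples would need a different sampling approach. Samples below the chosen depth raise an error here and would have to be dropped before modelling.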

Everything after QC needs to be redone

Too "back-of-the-envelope". It really needs to be redone after QC.

Call ORFs

Extract ORFs from the contigs using Prodigal.

Run human decontamination (deconseq)

Add python module to CONCOCT.

Need concoct to run with:

module load python/2.7.10
module load concoct/0.4.0

Create figure for sample processing (bench top).

QC: Quality trim with FASTX

You know the drill.

Create figure for bioinformatics processing (I hate text on a poster).

Rerun contig clustering with whole metagenome samples as well as virome.

Problem is that it is running out of memory and dying.

Create predictive model using 16S Zackular data.

Get started on the pred model using the 16S data for now.

Rerun files impacted by contig cat error.

The last time I ran the contig cat, there was a problem causing a bunch of reads to mix together. Probably due to a parallel run.

I fixed it but the files need to be remade.

Get virome library in NAS `runs` directory

Permission issues are preventing me from doing this now.

Get replacement NexteraXT kit

It seemed like there was something wrong with the first NexteraXT kit that Charlie used (for the whole metagenome prep). We need to contact Illumina and get them to replace that kit since they ain't cheap.

Run metagenome library through Quant-IT

Quantify the DNA for the metagenome library prep using the Quant-IT plate quantification protocol.

Cluster contigs

CONCOCT time.

QC: Trim adapters from reads

Time for cutadapt.

Collect virome mapping file

This run needs a mapping file which I can get from my SQL database.

No 16S analysis for Zackular data.

This is a pretty vague issue, but essentially I have the data up on Axiom so now I just have to go through with the main analysis workflow.

I should be able to follow the standard Mothur SOP.

Beta diversity between cancer states.

Alpha diversity is done, but I still need to do beta diversity.
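
The issue does not name a metric. As one common choice, here is a sketch of Bray-Curtis dissimilarity between two samples, assuming each sample is a `{feature: abundance}` dictionary. The function name is hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum of |a - b| over sum of (a + b) across features."""
    features = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(f, 0) - sample_b.get(f, 0)) for f in features)
    denominator = sum(sample_a.get(f, 0) + sample_b.get(f, 0) for f in features)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Identical samples give 0; samples sharing no features give 1.
print(bray_curtis({"x": 1, "y": 3}, {"x": 3, "y": 1}))
```

Computing this for every pair of samples gives the distance matrix that can then be compared between cancer states.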

Rerun bacterial metagenome library prep

The original bacterial metagenome prep did not work, which I think might have been due to a faulty kit. In addition to getting a replacement kit from Illumina, we are going to re-prepare this library using a new kit.

This is a rerun of the first kit that Charlie prepped.

Create plot of lysogenic states over cancer progression.

Standard protocol at this point. Use the rep sequences to find integrase genes, similarity to bacterial genomes, and similarity to aclame genes.

Prepare poster outline including introduction & methods.

Add information for sequencing depth of each of the samples.

Get sequencing depth of the samples after quality control.

Download 16S samples for predictive modeling processing.

Add Zackular data download into workflow

Mothur server is down right now so I have to wait for it to be up before I can start this.

Update Sequencing Database

I realized I don't have the sample information officially recorded in my master SQL sequencing database, so I need to go back and get that taken care of.

Rerun contig abundance with both virome and whole metagenome.

There seem to be some problems with the alignments for making relative abundance tables.

Fix bacteria IDs to be same as sample IDs.

The bacterial sample IDs need to be fixed to match up with my standard notation so that they can be easily merged with the viral and metagenomic samples.

Get whole metagenome sequences from the core

I need to get the whole metagenome sequences from the core after they run it through.

Classify contigs by blasting only the longest reads (rep sequences). Avoid short because they are less informative.
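
A small sketch of picking representative sequences, assuming "rep sequence" means the longest contig in each contig cluster. The function name and both input mappings are hypothetical.

```python
def longest_per_cluster(cluster_of_contig, sequences):
    """Return {cluster: contig ID of its longest sequence}; the first seen wins ties."""
    representatives = {}
    for contig, cluster in cluster_of_contig.items():
        current = representatives.get(cluster)
        if current is None or len(sequences[contig]) > len(sequences[current]):
            representatives[cluster] = contig
    return representatives
```

Only the returned contigs would then be written out and searched, which keeps short, less informative sequences out of the classification step.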

Pool virome library

After the Quant-IT is done, we can calculate the volumes required to pool the samples. Once this is done we can submit the library.
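
The volume calculation can be sketched as follows, assuming equal-mass pooling and duplicate Quant-IT readings in ng/µL for each sample. The function name, the input layout, and the target mass are hypothetical, and the actual protocol may pool by molarity instead.

```python
def pooling_volumes(duplicate_readings, target_ng):
    """Return {sample: volume in µL} so each sample contributes `target_ng` of DNA."""
    volumes = {}
    for sample, (first, second) in duplicate_readings.items():
        mean_conc = (first + second) / 2  # ng/µL, averaged over the duplicate readings
        if mean_conc <= 0:
            raise ValueError(f"sample {sample!r} has no measurable DNA")
        volumes[sample] = target_ng / mean_conc
    return volumes


# Hypothetical duplicate readings for two samples, pooling 10 ng of each.
print(pooling_volumes({"s1": (1.0, 3.0), "s2": (4.0, 4.0)}, target_ng=10))
```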

Calculate OPF abundance per sample.

Get the number of sequences that map to the open reading frames.

I think that using the Diamond local aligner will be the best bet.

Plot changes in degree centrality of fusobacterium OTU over time.

This means that I need to identify the bacterial OGU(s) that is(are) fusobacteria, and then plot the average degree centrality for the three disease categories.
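
As a sketch of the underlying calculation, assuming each network is available as a list of undirected edges: degree centrality here is a node's number of distinct neighbours divided by the number of other nodes in the network. The function name and node labels are hypothetical.

```python
def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list, ignoring self-loops."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop adds no neighbour
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical network in which "fuso" is connected to every other node.
network = [("fuso", "a"), ("fuso", "b"), ("a", "b"), ("c", "fuso")]
print(degree_centrality(network)["fuso"])
```

Averaging the fusobacteria node's value across the networks in each disease category gives the three points to plot.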

Change repo name

Change the name of the repo to conform to Schloss lab standards.

Fix contig fasta problem with wrapped sequence ids.

Some of the sequence IDs are wrapped onto the end of the sequence before them, so I need to make sure I parse these. Ugh.
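
A possible fix, assuming the problem looks like `ACGT>contig_2` on a single line: split any line wherever a `>` appears after its first character. The function name is hypothetical, and header text that legitimately contains `>` would also be split, so this is only a sketch.

```python
def unwrap_fasta_headers(lines):
    """Split lines where a FASTA header (starting with '>') is glued onto preceding text."""
    fixed = []
    for line in lines:
        line = line.rstrip("\n")
        position = line.find(">", 1)  # look for '>' anywhere except the first character
        while position != -1:
            fixed.append(line[:position])
            line = line[position:]
            position = line.find(">", 1)
        fixed.append(line)
    return fixed


print(unwrap_fasta_headers([">c1", "ACGT>c2", "GGCC"]))
```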

Perform BioA Trace on NexteraXT Metagenome Library

Charlie is going to run the BioAnalyzer trace on the library after he has finished the library prep itself.