schlosslab / hannigan_crcvirome_mbio_2018
Investigating the gut virus communities associated with colon cancer.
License: MIT License
Pool the metagenome library samples.
Add simple mean and std error bar graphs of contamination levels to compare to previous studies. Simple as that.
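A minimal sketch of the summary behind such a bar graph, assuming the contamination levels are available as plain lists of numbers per group. The group names and values below are hypothetical placeholders, not data from this project.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.mean(values)
    # Standard error = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical contamination levels, for illustration only
contamination = {
    "this_study": [2.0, 4.0, 6.0],
    "previous_study": [1.0, 1.0, 4.0],
}
summary = {group: mean_and_sem(vals) for group, vals in contamination.items()}
```

The resulting means and standard errors can be passed to any plotting tool as bar heights and error bar lengths.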
Once we get the samples pooled, we need to submit them to the sequencing core to be run on the HiSeq4000 platform.
1: 24-sample NexteraXT library kit
1: 96-sample NexteraXT library kit
Hi there,
This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:
Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.
If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.
Thank you!
-- Arman (armish/blast-patrol)
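One way to avoid depending on -max_target_seqs for picking a single hit is to keep more hits per query and select the best one afterwards. The sketch below assumes the default 12-column BLAST tabular output (-outfmt 6), where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score. The function name is hypothetical and this is not the workflow used in this repository.

```python
def best_hits(blast_lines):
    """Map each query ID to its (subject ID, bit score) with the highest bit score.

    Assumes default BLAST tabular output (-outfmt 6, 12 tab-separated columns).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        bitscore = float(fields[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best
```

Ties are resolved in favor of the hit seen first, which may or may not be the desired behavior.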
Quantify the DNA for all of the samples in the virome metagenome library (second library that Charlie prepped) using the Quant-IT system.
Samples need to be run in duplicate to account for potential variation in the system.
Once this is done we can pool the samples and ship them off for sequencing.
Need a section to correct for alignment length.
It looks like the impact of sequencing depth on the model performance might be fairly significant. To get at this, I would like to run some curves plotting model performance vs sequencing depth. I suspect that might be important here.
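A rough sketch of how the depth curve could be set up: subsample each sample's counts without replacement to a series of depths, then rerun the model at each depth. The rarefy function and the example counts are hypothetical, and the model evaluation step is left out because it depends on the existing pipeline.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} table to `depth` reads without replacement."""
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    rarefied = {feature: 0 for feature in counts}
    for feature in picked:
        rarefied[feature] += 1
    return rarefied


# Hypothetical counts for one sample, rarefied to several depths
counts = {"otu_a": 5, "otu_b": 3, "otu_c": 0}
by_depth = {depth: rarefy(counts, depth) for depth in (2, 4, 8)}
```

Each entry of by_depth would then feed one point on a performance-versus-depth curve.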
Get those ORFs clustered into OPFs.
Right now they have the sequencing core names.
I need to download the virome sequences from the core facility.
I need to fix the rarefaction going into the random forest model. Plain and simple.
Too "back-of-the-envelope". It really needs to be redone after QC.
Extracts ORFs from the contigs using Prodigal.
Need concoct to run with:
module load python/2.7.10
module load concoct/0.4.0
You know the drill.
Problem is that it is running out of memory and dying.
Get started on the pred model using the 16S data for now.
The last time I ran the contig cat, there was a problem causing a bunch of reads to mix together. Probably due to a parallel run.
I fixed it but the files need to be remade.
Permission issues are preventing me from doing this now.
It seemed like there was something wrong with the first NexteraXT kit that Charlie used (for the whole metagenome prep). We need to contact Illumina and get them to replace that kit since they ain't cheap.
Quantify the DNA for the metagenome library prep using the Quant-IT plate quantification protocol.
CONCOCT time.
Time for cutadapt.
This run needs a mapping file which I can get from my SQL database.
This is a pretty vague issue, but essentially I have the data up on Axiom so now I just have to go through with the main analysis workflow.
I should be able to follow the standard Mothur SOP.
Alpha diversity is done, but I still need to do beta diversity.
The original bacterial metagenome prep did not work, which I think might have been due to a faulty kit. In addition to getting a replacement kit from Illumina, we are going to re-prepare this library using a new kit.
This is a rerun of the first kit that Charlie prepped.
Standard protocol at this point. Use the representative sequences to find integrase genes, similarity to bacterial genomes, and similarity to ACLAME genes.
Get sequencing depth of the samples after quality control.
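A minimal sketch for counting reads per sample, assuming the quality-controlled reads are in uncompressed FASTQ files with exactly four lines per record. The function name is hypothetical.

```python
def count_fastq_reads(lines):
    """Count records in a FASTQ source, assuming four lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of four; file may be truncated")
    return n_lines // 4
```

The function accepts any iterable of lines, so an open file handle can be passed in directly.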
Mothur server is down right now so I have to wait for it to be up before I can start this.
I realized I don't have the sample information officially recorded in my master SQL sequencing database, so I need to go back and get that taken care of.
There seem to be some problems with the alignments for making relative abundance tables.
The bacterial sample IDs need to be fixed to match up with my standard notation so that they can be easily merged with the viral and metagenomic samples.
I need to get the whole-metagenome sequences from the core after they run them through.
After the Quant-IT is done, we can calculate the volumes required to pool the samples. Once this is done we can submit the library.
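The volume calculation could look something like the sketch below. It assumes equal-mass pooling (the same number of nanograms from every sample), concentrations in ng/µL, and duplicate Quant-IT readings that are averaged. The function name, the target mass, and the example readings are all hypothetical.

```python
def pooling_volumes(duplicate_readings, target_ng):
    """Return the volume (in µL) of each sample needed to contribute `target_ng`.

    `duplicate_readings` maps sample ID -> (reading_1, reading_2) in ng/µL.
    """
    volumes = {}
    for sample, (first, second) in duplicate_readings.items():
        concentration = (first + second) / 2.0  # mean of the duplicate readings
        if concentration <= 0:
            raise ValueError(f"non-positive concentration for sample {sample}")
        volumes[sample] = target_ng / concentration
    return volumes


# Hypothetical readings, for illustration only
readings = {"sample_1": (2.0, 4.0), "sample_2": (1.0, 1.0)}
volumes = pooling_volumes(readings, target_ng=6.0)
```

Samples whose required volume exceeds what is available would need to be handled separately.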
Get the number of sequences that map to the open reading frames.
I think that using the Diamond local aligner will be the best bet.
This means that I need to identify the bacterial OGU(s) classified as Fusobacteria, and then plot the average degree centrality for the three disease categories.
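A sketch of the centrality calculation, assuming one undirected network per disease category, supplied as a list of (node, node) edges. Degree centrality is taken here as a node's degree divided by the number of other nodes in the network. The node names and data layout are hypothetical.

```python
def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def mean_centrality(edges, nodes_of_interest):
    """Average degree centrality over the nodes of interest present in the network."""
    centrality = degree_centrality(edges)
    values = [centrality[node] for node in nodes_of_interest if node in centrality]
    return sum(values) / len(values) if values else 0.0
```

Calling mean_centrality once per disease category, with the Fusobacteria OGU IDs as nodes_of_interest, gives the three values to plot.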
Change the name of the repo to conform to Schloss lab standards.
Some of the sequence IDs are wrapped up with the sequence before them, so I need to make sure I parse these. Ugh.
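The exact form of the damage is not described here, so the following is only a sketch under one assumption: a FASTA header (starting with ">") has been appended to the end of the previous sequence line, and the header runs to the end of that line. The function name is hypothetical.

```python
def split_embedded_headers(lines):
    """Move FASTA headers that were fused onto a sequence line onto their own line.

    Assumes a fused line looks like 'ACGT>next_id', with the header running
    to the end of the line.
    """
    fixed = []
    for line in lines:
        line = line.rstrip("\n")
        if ">" in line and not line.startswith(">"):
            sequence, header = line.split(">", 1)
            if sequence:
                fixed.append(sequence)
            fixed.append(">" + header)
        else:
            fixed.append(line)
    return fixed
```

If the files are damaged in a different way, the splitting rule would need to change accordingly.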
Charlie is going to be running the BioAnalyzer trace on the library after he finishes the library prep itself.
yes, that.
Use megahit to assemble contigs from the individual samples.