qbic-pipelines / rnadeseq
Differential gene expression analysis and pathway analysis of RNAseq data
License: MIT License
see here
For the pipeline to run properly, files need to be named after the sample QBiC code:
QXXXXNNNNN_whateveryoulike.ext
Otherwise the MultiQC report and raw count tables will contain the wrong sample names! This is a prerequisite for the pipeline.
Add this to docs and make everybody aware!
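A check like the following could catch wrongly named files before the pipeline starts. It is only a sketch: it assumes a QBiC code is the letter Q followed by nine uppercase letters or digits (as in QMFCE006AD), and the function name `sample_code` is made up for illustration.

```python
import re

# Assumption: a QBiC code is "Q" plus nine uppercase alphanumerics,
# followed by "_", a free-form part, and a file extension.
QBIC_FILENAME = re.compile(r"^(Q[A-Z0-9]{9})_.+\.[^.]+$")


def sample_code(filename):
    """Return the leading QBiC code, or None if the name does not match."""
    match = QBIC_FILENAME.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ["QMFCE006AD_reads.fastq.gz", "sample1.fastq.gz"]:
        print(name, "->", sample_code(name))
```

Files for which the function returns None could be reported to the user instead of silently producing wrong sample names.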
Hi there,
When providing genes that are not differentially expressed in --genelist, the following error occurs:
The following object is masked from ‘package:S4Vectors’:
space
The following object is masked from ‘package:stats’:
lowess
Registering fonts with R
Attaching package: ‘limma’
The following object is masked from ‘package:DESeq2’:
plotMA
The following object is masked from ‘package:BiocGenerics’:
plotMA
Exiting.
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Warning messages:
1: In data.frame(count = cnts + pc, group = as.integer(group)) :
NAs introduced by coercion
2: In data.frame(count = cnts + pc, group = as.integer(group)) :
NAs introduced by coercion
3: In data.frame(count = cnts + pc, group = as.integer(group)) :
NAs introduced by coercion
Error in counts(dds, normalized = normalized, replaced = replaced)[gene, :
subscript out of bounds
Calls: plotCounts
Execution halted
This is caused in DESeq2.R starting line 377 with plotCounts().
It would be great to :
Thanks a lot!
Laurence
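One possible guard, assuming the error arises because a requested ID is not among the row names being indexed: split the requested genes into those that can be plotted and those that cannot, then plot only the former and warn about the latter. The pipeline script is in R; this Python sketch only illustrates the logic, and `split_requested_genes` is a made-up name.

```python
def split_requested_genes(requested, available):
    """Split requested gene IDs into (present, missing) relative to the
    IDs available for plotting, preserving the requested order."""
    available_set = set(available)
    present = [gene for gene in requested if gene in available_set]
    missing = [gene for gene in requested if gene not in available_set]
    return present, missing


if __name__ == "__main__":
    present, missing = split_requested_genes(
        ["ENSG01", "ENSG02", "ENSG99"], ["ENSG01", "ENSG02", "ENSG03"]
    )
    print("plot:", present)
    print("skip with a warning:", missing)
```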
Hi there,
I am currently running the pipeline with treatment (control, 25% and 50% of the drug) as the condition. No pathways are found between 25% and 50% of the drug intake:
[1] "DE_contrast_condition_treatment_50.CSF_vs_25.CSF"
[1] "Number of genes in query:"
[1] 997
[1] "Number of pathways found:"
integer(0)
which causes the pipeline to fail at pathway_analysis.R:
##############################################################################
Pathview is an open source software package distributed under GNU General
Public License version 3 (GPLv3). Details of GPLv3 is available at
http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
formally cite the original Pathview paper (not just mention it) in publications
or products. For details, do citation("pathview") within R.
The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
license agreement (details at http://www.kegg.jp/kegg/legal.html).
##############################################################################
No results to show
I think it's the condition here that needs to be checked
rnadeseq/bin/pathway_analysis.R
Line 204 in abe7427
I opened an issue just as a reminder, maybe it'll be a small task I can do at the hackathon :)
Best,
Laurence
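The kind of check meant here could look like the sketch below: if a contrast yields no pathways, report that and skip the plotting step instead of failing. This is only an illustration in Python of logic that would live in the R script, and `summarize_pathways` is a hypothetical helper.

```python
def summarize_pathways(contrast, pathways):
    """Return a status message; an empty result is reported, not treated as an error."""
    if not pathways:
        return f"{contrast}: no enriched pathways found, skipping pathway plots"
    return f"{contrast}: {len(pathways)} pathway(s) found"


if __name__ == "__main__":
    print(summarize_pathways("DE_contrast_condition_treatment_50.CSF_vs_25.CSF", []))
```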
One per line is good, but that it has to be ENSEMBL genes wasn't clear to me ;-)
When adding a KEGG pathway with --kegg_blacklist, the pipeline runs but ignores the parameter. When I hardcoded the pathway here it worked, so the append function here is causing the issue.
Just creating a small issue so I have it noted somewhere, as soon as I have 5 minutes to test it out, I will fix this small bug :)
Thank you!
Laurence
The parameter name is misleading: we are supposed to attach the offer, not the quote.
Hi there,
I am running the pipeline with 2 different conditions (treatment and patient) and have provided a contrast list:
factor numerator denominator
condition_treatment_type A B
referring only to treatment.
However, in my boxplots, my data is plotted according to condition and patient on the x axis.
Would it be logical to also plot them depending on the contrast list?
Best,
Laurence
The pathway analysis results table does not provide the gene names of the DE genes found in the pathway.
By default without a threshold, but with the option to apply a threshold via a parameter in the Nextflow pipeline.
The normalized counts table header names should match qbiccode + secondary name.
Otherwise downstream pathway analysis does not work.
Correct headers in DESeq2 script.
In DESeq2/results/plots the .png pictures have no legend, i.e. the sample names are not shown properly. In the corresponding .pdf files they are shown, but I fail to include these files in the report (i.e., they don't get displayed).
The offer needs to be added as input file, as it is linked to in the last paragraph of the report (Summary and outlook).
All our reports should contain this at the end, thus please add it to the template (there might be some small revisions in the future, but...)
"The results for all work packages, as described in the quote (give link to quote) can be found in this report. Further support for this project will be restricted to the results presented in this report (e.g. requests to update/manipulate figures and tables).
For further analysis (e.g. the re-analysis of the dataset) we will generate a new quote containing cost estimates."
Solutions for that:
Hi there,
Just a small issue regarding the volcano plots: they are capped on the x-axis (logFC) from -5 to 5, which sometimes leaves out genes with a higher logFC:
rnadeseq/assets/RNAseq_report.Rmd
Line 435 in abe7427
I can have a look as it's a minor fix, I just want to keep a written trail.
Thanks and best,
Laurence
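One way to avoid the fixed cap would be to derive symmetric axis limits from the data, keeping -5 to 5 as a minimum range. The report itself is written in R; the following Python sketch only shows the arithmetic, and `volcano_xlim` is a made-up name.

```python
import math


def volcano_xlim(log_fold_changes, minimum=5.0):
    """Return symmetric (low, high) x-axis limits that cover every finite logFC,
    never narrower than +/- minimum, rounded up to a whole number."""
    finite = [abs(v) for v in log_fold_changes if v is not None and math.isfinite(v)]
    bound = math.ceil(max([minimum] + finite))
    return (-bound, bound)


if __name__ == "__main__":
    print(volcano_xlim([-7.3, 2.1, 4.0]))
    print(volcano_xlim([1.0, -2.0]))
```

Missing or non-finite values are ignored, so an empty or all-NA contrast falls back to the minimum range.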
I used nf-core/rnaseq without STAR but with salmon. Also, I used a rather unusual .gff for bacterial with very limited information.
I could identify three problems:
With --fc_group_features "transcript_id" when running nf-core/rnaseq, the header looked like this:
transcript_id | QMFCE006AD
instead of
Geneid | gene_name | QBICK031A9Aligned.sortedByCoord.out.bam
I solved the problem by converting the CSV to TSV, rounding floats to integers, and changing the header to
Ensembl_ID | gene_name | QMFCE006AD
I am not sure the header is really required that way.
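The workaround described above can be sketched roughly as follows. Assumptions: the input is a comma-separated table whose first column holds the feature ID and whose remaining columns hold per-sample counts, and the gene_name column is simply filled with the feature ID because no gene names are available. `convert_counts` is a made-up name.

```python
import csv
import io


def convert_counts(csv_text):
    """Convert a CSV count table to TSV with integer counts and the
    header 'Ensembl_ID, gene_name, <samples...>'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Ensembl_ID", "gene_name"] + samples)
    for row in reader:
        feature = row[0]
        # Python's round() rounds exact halves to the nearest even integer.
        counts = [str(round(float(value))) for value in row[1:]]
        # Assumption: reuse the feature ID as gene_name.
        writer.writerow([feature, feature] + counts)
    return out.getvalue()


if __name__ == "__main__":
    print(convert_counts("transcript_id,QMFCE006AD\ntx1,10.6\ntx2,3.2\n"))
```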
remove from DESeq2
Rather, print the versions of all tools used at the end of the Rmarkdown report, e.g. print sessionInfo() at the end of the report.
If gene list is not provided as input, then the report does not show it.
How to name and explain better the filter columns of the final table.
The currently used program for pathway analysis, gprofiler, can't handle bacteria in general and applies only to isolates, i.e. a single species as opposed to metatranscriptomics.
All this software fits into the existing container without conflicts.
--rawcounts
--rawcounts
, --species
--metadata
.--remove_rRNA
& --save_nonrRNA_reads
), required for meta-pathway analysis
This would be a major increase in code / parameters and output.
Pathway abundance (only metatranscriptome) would be the first step to implement, followed by addition of pathway expression analysis (RNA & DNA measures).
edit: added section "three independent analysis"
edit2: nf-core/rnaseq v1.4 pre-processing is only valid for environmental samples! For host-microbiome studies the host sequences have to be removed too!
It would be nice to have this file (a list containing only the DE genes), together with final_list_DESeq2.tsv, as output of the DESeq.v2.7.R script.
complete path: DESeq2/zips/DESeq2/results/final/DE_list_DESeq2.tsv
I would like to discuss this, though, before making a pull request.