
tidymultiqc's Introduction

TidyMultiqc

TidyMultiqc provides the means to convert ‘multiqc_data.json’ files, produced by the wonderful ‘MultiQC’ tool, into tidy data frames for downstream analysis in R.

Please visit the pkgdown website for a comprehensive tutorial and function documentation.

Installation

The latest stable version can be installed from CRAN:

install.packages("TidyMultiqc")

You can also install the development version of TidyMultiqc from GitHub with:

# install.packages("devtools")
devtools::install_github("multimeric/TidyMultiqc")

Example

TidyMultiqc::load_multiqc(multiqc_data_path)
#> # A tibble: 6 × 165
#>   metadata.sample_id general.total_reads general.mapped_reads general.percentag…
#>   <chr>                            <dbl>                <dbl>              <dbl>
#> 1 P4107_1003                   868204107            847562410               97.6
#> 2 P4107_1004                  1002828927            985115356               98.2
#> 3 P4107_1005                   974955793            955921317               98.0
#> 4 P4107_1002                   865975844            847067526               97.8
#> 5 P4107_1006                   912383669            894970438               98.1
#> 6 P4107_1001                   772071557            751147332               97.3
#> # … with 161 more variables: general.median_coverage <int>,
#> #   general.median_insert_size <int>, general.avg_gc <dbl>,
#> #   general.1_x_pc <dbl>, general.5_x_pc <dbl>, general.10_x_pc <dbl>,
#> #   general.30_x_pc <dbl>, general.50_x_pc <dbl>, general.genome <chr>,
#> #   general.number_of_variants_before_filter <dbl>,
#> #   general.number_of_known_variants_brie_non_empty_id <dbl>,
#> #   general.number_of_known_variants_brie_non_empty_id_percent <dbl>, …

tidymultiqc's Issues

How to extract the list of the plots present in multiqc_data.json

Hello,
I would like to know if there is a way to list the plot names present in a report from multiqc_data.json.
The reason is that I am trying to extract adapter info from the "fastqc_adapter_content_plot", and I have a case where this plot does not exist in the report because I got "No samples found with any adapter contamination > 0.1%" in the actual MultiQC report (HTML).
So I need a way to test whether a plot is present in my report.
multiqc_data.json.txt
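
One way to check for a plot before extracting it is to read the JSON directly and look at the plot identifiers. The following is a minimal sketch, not TidyMultiqc's own API. It assumes that plots are stored under a top-level `report_plot_data` key, which may differ between MultiQC versions; `plot_names()` and `has_plot()` are hypothetical helper names, and the inline JSON is a stand-in for a real report.

```r
library(jsonlite)

# Return the plot identifiers stored in a parsed MultiQC report.
# Assumption: plots live under the top-level "report_plot_data" key.
plot_names <- function(report) {
  names(report[["report_plot_data"]])
}

# Check whether a given plot is present before trying to extract it.
has_plot <- function(report, plot_id) {
  plot_id %in% plot_names(report)
}

# Tiny stand-in for a real multiqc_data.json, so the sketch runs on its own.
demo <- fromJSON(
  '{"report_plot_data": {"fastqc_per_base_sequence_quality_plot": {}, "picard_rna_coverage": {}}}',
  simplifyVector = FALSE
)

plot_names(demo)
has_plot(demo, "fastqc_adapter_content_plot")
```

For a real report, `demo` would be replaced by something like `jsonlite::read_json("multiqc_data.json")`, and the extraction step would only run when `has_plot()` returns TRUE.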

Read multiqc zip file

Right now, TidyMultiqc reads the MultiQC JSON file; would it be possible to also read the ZIP file (by unzipping the JSON file and reading the JSON file under the hood)?
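
Until something like this is built in, the unzip step can be done by hand. This is a rough sketch using only base R's `utils::unzip()`. The helper name `extract_multiqc_json()` is hypothetical, and the internal layout of the ZIP is an assumption: the code only requires that the archive contains exactly one entry named `multiqc_data.json`.

```r
# Locate multiqc_data.json inside a ZIP archive and extract it to a temp dir.
# Returns the path of the extracted file.
# Assumption: the archive holds exactly one entry named multiqc_data.json.
extract_multiqc_json <- function(zip_path, exdir = tempfile("multiqc_")) {
  entries <- utils::unzip(zip_path, list = TRUE)$Name
  hit <- entries[basename(entries) == "multiqc_data.json"]
  if (length(hit) != 1) {
    stop("Expected exactly one multiqc_data.json in ", zip_path)
  }
  utils::unzip(zip_path, files = hit, exdir = exdir)
  file.path(exdir, hit)
}

# Usage (the ZIP file name is a placeholder):
# json_path <- extract_multiqc_json("multiqc_data.zip")
# TidyMultiqc::load_multiqc(json_path)
```

Extracting into a fresh temporary directory keeps the original archive untouched and avoids overwriting any existing `multiqc_data.json` in the working directory.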

Make a readme

  • Use Rmarkdown
  • Pick simple sections from the vignette

How to extract the complete data.frame from plot data.

Dear Mr Milton

I have been trying to extract plot data from multiqc.
I couldn't understand how to extract at once all the values for x and y, so I did the following:

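library(data.table)   # as.data.table(), setnames(), melt()
library(TidyMultiqc)  # load_multiqc(), extract_xy, summary_extract_df

gene_cov <- as.data.table(load_multiqc(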
    paste0(MQC_DATA, "multiqc_data.json"),
    sections = "plots",
    plot_opts = list(picard_rna_coverage = list(
      extractor = extract_xy,
      summary = list(
        `0` = purrr::partial(summary_extract_df, row_select = x == 0),
        `1` = purrr::partial(summary_extract_df, row_select = x == 1),
        `2` = purrr::partial(summary_extract_df, row_select = x == 2),
        `3` = purrr::partial(summary_extract_df, row_select = x == 3),
        `4` = purrr::partial(summary_extract_df, row_select = x == 4),
        `5` = purrr::partial(summary_extract_df, row_select = x == 5),
        `6` = purrr::partial(summary_extract_df, row_select = x == 6),
        `7` = purrr::partial(summary_extract_df, row_select = x == 7),
        `8` = purrr::partial(summary_extract_df, row_select = x == 8),
        `9` = purrr::partial(summary_extract_df, row_select = x == 9),
        `10` = purrr::partial(summary_extract_df, row_select = x == 10)
      )
    ))
  ))

At this point gene_cov looks like this:

metadata.sample_id	plot.picard_rna_coverage.0	plot.picard_rna_coverage.1	plot.picard_rna_coverage.2	plot.picard_rna_coverage.3 ...
sample1	0.307203	0.352601	0.428728	0.506128	...
sample2	0.337342	0.397810	0.482380	0.562006	...
sample3	0.315192	0.366014	0.437372	0.503733	...
sample4	0.326004	0.382353	0.466560	0.544932	...
... 

Then I did this:

names(gene_cov)[-1] = gsub("plot.picard_rna_coverage","x",names(gene_cov)[-1])
setnames(gene_cov, "metadata.sample_id","Sample")

gene_cov <- melt(gene_cov, id.vars = "Sample", variable.name = "x", value.name = "y")
gene_cov <- gene_cov[, x := as.integer(gsub("x\\.","",x))]

At this point gene_cov looks like this:

Sample	x	y
sample1	0	0.307203
sample2	0	0.337342
sample3	0	0.315192
sample4	0	0.326004
sample1	1	0.352601
sample2	1	0.397810
sample3	1	0.366014
sample4	1	0.382353
sample1	2	0.428728
sample2	2	0.482380
sample3	2	0.437372
sample4	2	0.466560
...

Could you please tell me which command extracts all the data from a MultiQC plot at once, so that I don't have to write out the list of all the coordinates?

Best regards

Nicolas Blavet
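
The wide-to-long reshaping step described above can also be written with tidyr. This sketch does not answer the extraction question itself: it only shows the reshape, on a small hand-made table that copies the column naming shown in the post. The use of tidyr here is a substitution for the data.table calls, not something the original post used.

```r
library(tidyr)

# Small stand-in for the wide table returned after extraction.
wide <- data.frame(
  metadata.sample_id = c("sample1", "sample2"),
  plot.picard_rna_coverage.0 = c(0.307203, 0.337342),
  plot.picard_rna_coverage.1 = c(0.352601, 0.397810),
  check.names = FALSE
)

# One row per (sample, x) pair; x is recovered from the column suffix.
long <- pivot_longer(
  wide,
  cols = starts_with("plot.picard_rna_coverage."),
  names_to = "x",
  names_prefix = "plot\\.picard_rna_coverage\\.",
  names_transform = list(x = as.integer),
  values_to = "y"
)

long
```

Because the columns are selected by prefix, the reshape does not need to know in advance how many x positions were extracted.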

Support plot recreation

It seems like a common use case to want to recreate MultiQC plots in R, for example to make a customized Rmarkdown report. It might be possible to add a new function that recreates a plot using R for this purpose.

Bug when `plots` vector is not provided

load_multiqc(
  paths = system.file("extdata", "wgs/multiqc_data.json", package = "TidyMultiqc"),
  sections = "plot"
)

gives

<tibble_error_assign_incompatible_size/tibble_error/rlang_error/error/condition>
Error: Assigned data `maybe_zap(base_case(x[[nm]], y[[i]]))` must be compatible with existing data.
x Existing data has 101 rows.
x Assigned data has 202 rows.
i Only vectors of size 1 are recycled.
Backtrace:

NaN in MultiQC JSON files

Hello Michael,

Thanks for making this R package.

In my MultiQC JSON files, some samples have NaN for some results. For example, a sample with very little STAR mapping was not carried forward in the analysis and so has no RSeQC metrics.

Currently the function cannot handle such NaNs, but it would be great if it could. Ideally these NaNs would be converted to NA, i.e. R's missing value.

Thanks,
-Mohammed.
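
As a possible workaround in the meantime, the JSON text can be cleaned before parsing. This is only a sketch of the idea, not how TidyMultiqc handles it. The helper name `sanitise_json()` is hypothetical, and the regex is a simplification: it replaces bare `NaN` and `Infinity` tokens with `null`, but cannot tell a bare token apart from the same text inside a string value.

```r
library(jsonlite)

# Replace bare NaN / Infinity tokens, which are not valid JSON, with null.
# Limitation: a plain regex cannot tell tokens from text inside string values.
sanitise_json <- function(txt) {
  gsub(
    "(?<=[,:\\[\\s])-?(?:NaN|Infinity)(?=\\s*[,\\]}])",
    "null",
    txt,
    perl = TRUE
  )
}

# Stand-in for the contents of a multiqc_data.json with a NaN value.
raw <- '{"s1": {"mapped_failed_pct": NaN, "total": 10}, "s2": {"mapped_failed_pct": 1.5, "total": 12}}'
parsed <- fromJSON(sanitise_json(raw), simplifyVector = FALSE)

# JSON null arrives as NULL; map it to NA so it behaves as a missing value.
pct <- vapply(
  parsed,
  function(s) {
    v <- s$mapped_failed_pct
    if (is.null(v)) NA_real_ else as.numeric(v)
  },
  numeric(1)
)

pct
```

For a real file, the text would come from something like `paste(readLines("multiqc_data.json"), collapse = "\n")`, and the cleaned text could be written to a temporary file before loading it.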

lexical error when parsing multiqc json from nf-co.re/rnaseq

I am getting an error when trying to use this package with the output from the nf-co.re/rnaseq pipeline.

> multiqc = TidyMultiqc::load_multiqc(
+   paths = "multiqc_data.json"
+ )

Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid char in json text.
                  "mapped_failed_pct": NaN,                 "paired in
                     (right here) ------^

Session Info

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TidyMultiqc_1.0.0           SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.1       
 [5] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        
 [9] MatrixGenerics_1.6.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] pillar_1.6.4           compiler_4.1.2         XVector_0.34.0         zlibbioc_1.40.0        bitops_1.0-7          
 [6] tools_4.1.2            jsonlite_1.7.2         lattice_0.20-45        lifecycle_1.0.1        tibble_3.1.6          
[11] gtable_0.3.0           pkgconfig_2.0.3        rlang_0.4.12           Matrix_1.3-4           DelayedArray_0.20.0   
[16] DBI_1.1.1              cli_3.1.0              rstudioapi_0.13        GenomeInfoDbData_1.2.7 dplyr_1.0.7           
[21] generics_0.1.1         vctrs_0.3.8            grid_4.1.2             tidyselect_1.1.1       glue_1.5.1            
[26] R6_2.5.1               fansi_0.5.0            ggplot2_3.3.5          purrr_0.3.4            magrittr_2.0.1        
[31] scales_1.1.1           ellipsis_0.3.2         assertthat_0.2.1       colorspace_2.0-2       utf8_1.2.2            
[36] RCurl_1.98-1.5         munsell_0.5.0          crayon_1.4.2  
