mcanouil / nacho
NAnostring quality Control dasHbOard
Home Page: https://m.canouil.dev/NACHO/
License: GNU General Public License v3.0
Prepare for release:
git pull
usethis::use_github_links()
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::revdep_check(num_workers = 4)
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)
Prepare for release (2020-01-05):
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Importing many samples from different runs results in the following error message:
nacho_data <- load_rcc(
  data_directory = params$path_rccs,
  ssheet_csv = params$path_rcc_samplesheet,
  id_colname = "FILENAME"
)
[NACHO] Importing RCC files.
|=======================================================================================================================================================================================================================================================================================|100% ~0 s remaining
[NACHO] Performing QC and formatting data.
[NACHO] Computing normalisation factors using "GEO" method.
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 6, 10
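One possible cause that can be ruled out quickly is a sample identifier that occurs more than once when several runs are combined. The base R sketch below only illustrates such a check. It assumes, without confirmation from the report, that the shared keys come from repeated identifiers, and the sample sheet and file names are made up.

```r
# Hypothetical sample sheet: the same RCC file name appears in two runs.
ssheet <- data.frame(
  FILENAME = c("run1_01.RCC", "run1_02.RCC", "run2_01.RCC", "run1_02.RCC"),
  RUN = c("run1", "run1", "run2", "run2"),
  stringsAsFactors = FALSE
)

# Flag every identifier that occurs more than once (scanning in both
# directions so that the first occurrence is flagged as well).
is_dup <- duplicated(ssheet$FILENAME) | duplicated(ssheet$FILENAME, fromLast = TRUE)
duplicated_ids <- unique(ssheet$FILENAME[is_dup])

if (length(duplicated_ids) > 0) {
  message("Duplicated identifiers: ", paste(duplicated_ids, collapse = ", "))
}
```

If duplicates show up, making the identifier unique (for example by prefixing the run name) would be one way to test whether they are the source of the error.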
A "template function", similar to the trans_ functions from ggplot2 and scales, could be implemented to let users supply their own normalisation methods (for those not yet built in).
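To make the idea concrete, here is a minimal sketch of what accepting a user-supplied method could look like. It is purely illustrative: compute_factors() and its arguments are hypothetical and not part of NACHO, and the scaling rule is a simple stand-in rather than the package's actual normalisation.

```r
# Geometric mean, used here as the stand-in "built-in" method.
geometric_mean <- function(x) exp(mean(log(x)))

# Hypothetical helper: `method` is either the name of a built-in method
# or a user-supplied function returning one summary value per sample.
compute_factors <- function(counts, method = "GEO") {
  summary_fun <- if (is.function(method)) {
    method
  } else {
    switch(method, GEO = geometric_mean, stop("Unknown method: ", method))
  }
  per_sample <- apply(counts, 2, summary_fun)
  # Scale every sample towards the average across samples.
  mean(per_sample) / per_sample
}

# Toy matrix of control counts (rows: probes, columns: samples).
counts <- matrix(
  c(10, 100, 1000, 20, 200, 2000),
  ncol = 2,
  dimnames = list(NULL, c("S1", "S2"))
)

factors_geo <- compute_factors(counts, method = "GEO")
factors_median <- compute_factors(counts, method = stats::median)
```

Passing a function instead of a name keeps the built-in methods as the default while leaving room for methods that are not shipped with the package.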
Add normalisations from https://cran.r-project.org/package=ruv
Allow load_rcc to accept a named vector giving the path to each RCC file along with the samples' labels?
This means that no outside information can be provided.
Currently, visualise() includes many nested 'if else' statements, among other things.
Together with externalising the plotting functions (#4), the shiny app could be simplified to improve control over input requirements.
This would make it possible to customise the plots, for example colouring by group, treatment, timepoint, etc. rather than by CartridgeID.
Add some examples of the newly implemented functions:
render()
autoplot()
Plus an example of how to use print() within an R Markdown document.
Create functions to call inside visualise() and render() (#2) for all QC plots.
This will make package maintenance easier and changes to the plots more reliable.
Note: shinyMeta might appear in future versions.
Hi,
I have a question about formatting the input files. I'm using the load_rcc() function and am not clear on the required structure and contents of the sample sheet. I have all my RCC files together in one folder, and that seems to be right. For the sample sheet, I currently import a CSV with just a single column of RCC file names and "IDFILE" as the header. Looking at the GSE74821 example, I see many more columns. What is the minimum information needed in the sample sheet? As it is, when I run load_rcc() with my one-column sample sheet, I get the following error message:
#> Error in switch(EXPR = class(ssheet_csv), data.frame = ssheet_csv, character = utils::read.csv(file = ssheet_csv, : EXPR must be a length 1 vector
Here's what I did:
mytargets <- read_csv("mytargets.csv")
#> Error in read_csv("mytargets.csv"): could not find function "read_csv"
mydata <- load_rcc(
  data_directory = "pathname", # Where the data is
  ssheet_csv = mytargets, # This is just a list of file names under column name "IDFILE"
  id_colname = "IDFILE", # Name of the column that contains the unique identifiers
  housekeeping_genes = NULL, # Not sure where this fits in. Would this list of housekeeping genes be in the sample sheet somehow?
  housekeeping_predict = TRUE, # Whether or not to predict the housekeeping genes
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating how many principal components should be computed.
)
Thank you very much!
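The error message above shows load_rcc() dispatching on class(ssheet_csv) with switch(), which requires a single class name. The sketch below illustrates why an object with several classes, such as the tibble that readr::read_csv() returns, would trigger "EXPR must be a length 1 vector". It builds a stand-in object by hand so that no extra package is needed, and the file names are made up. It does not address which columns the sample sheet must contain.

```r
# Stand-in for what readr::read_csv() returns: a data frame whose class
# attribute has three elements (built by hand to avoid extra packages).
mytargets <- structure(
  data.frame(IDFILE = c("sample_01.RCC", "sample_02.RCC"), stringsAsFactors = FALSE),
  class = c("tbl_df", "tbl", "data.frame")
)
length(class(mytargets)) # 3, so switch(EXPR = class(...)) cannot dispatch

# Reproduce the error message in isolation.
err <- tryCatch(
  switch(EXPR = class(mytargets), data.frame = "ok"),
  error = function(e) conditionMessage(e)
)

# Converting to a plain data.frame leaves a single class.
mytargets_df <- as.data.frame(mytargets)
class(mytargets_df)
```

According to the same error message, a character value is read with utils::read.csv(), so passing the path to the CSV file may be an alternative to converting the object. The separate "could not find function" error for read_csv only indicates that readr was not loaded when the reprex was rendered.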
Prepare for release:
devtools::check()
devtools::check_win_devel()
rhub::check_for_cran()
cran-comments.md
Submit to CRAN (7th of October):
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_dev_version()
Check again (CRAN checks failed for system requirements) on the 8th of October:
devtools::check()
devtools::check_win_devel()
rhub::check() (all CRAN platforms)
cran-comments.md
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_dev_version()
Fix CRAN checks notes produced by the new render_nacho()
* checking R code for possible problems ... NOTE
print_nacho: no visible binding for global variable ‘CartridgeID’
print_nacho: no visible binding for global variable ‘CodeClass’
print_nacho: no visible binding for global variable ‘Name’
print_nacho: no visible binding for global variable ‘Count’
print_nacho: no visible binding for global variable ‘.’
print_nacho: no visible binding for global variable ‘MC’
print_nacho: no visible binding for global variable ‘BD’
print_nacho: no visible binding for global variable ‘MedC’
print_nacho: no visible binding for global variable ‘X.PC’
print_nacho: no visible binding for global variable ‘Y.PC’
print_nacho: no visible binding for global variable ‘Proportion of
Variance’
print_nacho: no visible binding for global variable
‘ProportionofVariance’
print_nacho: no visible binding for global variable ‘Negative_factor’
print_nacho: no visible binding for global variable ‘Positive_factor’
print_nacho: no visible binding for global variable ‘House_factor’
print_nacho: no visible binding for global variable ‘Count_Norm’
print_nacho: no visible binding for global variable ‘Status’
Undefined global functions or variables:
. BD CartridgeID CodeClass Count Count_Norm House_factor MC MedC Name
Negative_factor Positive_factor Proportion of Variance
ProportionofVariance Status X.PC Y.PC
NACHO still guesses housekeeping genes even when a list of housekeeping genes is provided.
library(GEOquery)
gse <- getGEO("GSE70970")
targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"),
  exdir = paste0(tempdir(), "/GSE70970/Data")
)
# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
#> [NACHO] Importing RCC files.
#> [NACHO] Performing QC and formatting data.
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#> - hsa-miR-103
#> - hsa-let-7e
#> - hsa-miR-1260
#> - hsa-miR-500+hsa-miR-501-5p
#> - hsa-miR-1274b
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Normalising data using "GEO" method with housekeeping genes.
#> [NACHO] Returning a list.
#> $ access : character
#> $ housekeeping_genes : character
#> $ housekeeping_predict: logical
#> $ housekeeping_norm : logical
#> $ normalisation_method: character
#> $ remove_outliers : logical
#> $ n_comp : numeric
#> $ data_directory : character
#> $ pc_sum : data.frame
#> $ nacho : data.frame
#> $ outliers_thresholds : list
#> $ raw_counts : data.frame
#> $ normalised_counts : data.frame
unlink(paste0(tempdir(), "/GSE70970"), recursive = TRUE)
my_housekeeping <- GSE70970_sum[["housekeeping_genes"]][-c(1, 2)]
GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  housekeeping_genes = my_housekeeping,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO",
  remove_outliers = TRUE
)
#> [NACHO] Normalising "GSE70970_sum" with new value for parameters:
#> - housekeeping_genes = TRUE
#> - remove_outliers = TRUE
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#> - hsa-let-7e
#> - hsa-miR-1260
#> - hsa-miR-1274b
#> - hsa-miR-103
#> - hsa-miR-16
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Returning a list.
#> $ access : character
#> $ housekeeping_genes : character
#> $ housekeeping_predict: logical
#> $ housekeeping_norm : logical
#> $ normalisation_method: character
#> $ remove_outliers : logical
#> $ n_comp : numeric
#> $ data_directory : character
#> $ pc_sum : data.frame
#> $ nacho : data.frame
#> $ outliers_thresholds : list
#> $ raw_counts : data.frame
#> $ normalised_counts : data.frame
GSE70970_sum[["housekeeping_genes"]]
#> [1] "hsa-miR-103" "hsa-let-7e"
#> [3] "hsa-miR-1260" "hsa-miR-500+hsa-miR-501-5p"
#> [5] "hsa-miR-1274b"
my_housekeeping
#> [1] "hsa-miR-1260" "hsa-miR-500+hsa-miR-501-5p"
#> [3] "hsa-miR-1274b"
GSE70970_norm[["housekeeping_genes"]]
#> [1] "hsa-let-7e" "hsa-miR-1260" "hsa-miR-1274b" "hsa-miR-103"
#> [5] "hsa-miR-16"
👋🏽 I maintain the cran checks badges. Please change to the new cran checks badge URL (e.g., https://badges.cranchecks.info/worst/dplyr.svg). Old badges (e.g., https://cranchecks.info/badges/worst/dplyr) will be unavailable as of Jan 1st 2023.
Move from the tidyverse to the data.table framework.
Thanks for creating this tool / method! The Shiny App / general functionality is really an improvement and works pretty well. However, I'd like to export the data (e.g. normalized counts) to a separate TSV/CSV file for downstream analyses, which is currently not very straightforward.
It would be very nice to see an export functionality - either in the Shiny App or (also good) in the R package in general. Couldn't find anything unfortunately...
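Until a dedicated export function exists, the counts can be written out with base R, since the object returned by the package is a list that contains data frames. The sketch below uses a made-up stand-in list rather than a real NACHO result. The element name "nacho" and the columns "Name" and "Count_Norm" appear in outputs elsewhere on this page, but their exact content, and the presence of an "IDFILE" column, are assumptions here.

```r
# Stand-in for a NACHO result: a list holding a data.frame named "nacho"
# with a normalised count column (values are made up).
nacho_result <- list(
  nacho = data.frame(
    IDFILE = c("s1.RCC", "s1.RCC", "s2.RCC", "s2.RCC"),
    Name = c("geneA", "geneB", "geneA", "geneB"),
    Count_Norm = c(12.5, 340, 15, 298),
    stringsAsFactors = FALSE
  )
)

# Write the columns of interest to a CSV file for downstream analyses.
out_file <- file.path(tempdir(), "normalised_counts.csv")
utils::write.csv(
  x = nacho_result[["nacho"]][, c("IDFILE", "Name", "Count_Norm")],
  file = out_file,
  row.names = FALSE
)
```

For a tab-separated file, utils::write.table() with sep = "\t" would be the equivalent call.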
Figures and details about the quality control and normalisation of NanoString datasets are only available through visualise().
A function to render a comprehensive report of what has been done during the normalisation and QC process within the app (or with default parameters) might be useful.
Hi!
The documentation for normalize() suggests that the function creates a list of normalized counts etc. However, I only get a NACHO object back (which is fine, but different from what the documentation led me to expect). Maybe this is a relic?
The QC threshold slider in visualise() is wrongly updated in "Binding Density" when switching between "MAX/FLEX" and "SPRINT" instrument measures.
library(NACHO)
data(GSE74821)
visualise(GSE74821)
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.
Hey NACHO team, when reading in my RCC files, each one contains the internal positive and negative controls, labeled POS and NEG. How does this affect the prediction of housekeeping genes?
Hi,
Thanks for this great tool.
We were trying to work out how to use NACHO with a single cartridge (12 samples). The manual says: "Each sample in the plots can be coloured based on either technical specifications which are included in the RCC files or based on specifications of your own choosing, though these specifications need to be included in the samplesheet." I added one column of sample annotations to the samplesheet. The dynamic report was generated with Cartridge ID as the grouping variable. When I changed the grouping variable to ID, it showed the numbers 1-12 rather than the sample annotations I had given in the samplesheet.
Could you please help me show the sample annotations rather than the numbers 1-12 for the samples?
Thanks,
Athul
Add citation file to cite the package using the upcoming article from Bioinformatics.
I am trying to use NACHO on our NanoString data, and I am having extreme trouble reading in the RCC files. Our data structure is as follows: I have a master folder, RCC, with a subfolder for each seq run. Each subfolder has 12 samples, the number on each cartridge flow cell. I want to loop through each subfolder and have load_rcc read the files. However, list.files() does not work with load_rcc: it returns warnings saying that load_rcc cannot find the file paths. When I put all RCC files into one folder, load_rcc works fine. Any help would be appreciated.
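Since the report says that a single flat folder works, one workaround is to collect the files from all run subfolders into one folder before calling load_rcc. The base R sketch below shows this on a toy layout with empty placeholder files. The folder names, file names and the RUN column are made up, and whether such a two-column sample sheet is sufficient for load_rcc is an assumption.

```r
# Build a toy layout: RCC/run1 and RCC/run2, each holding empty ".RCC" files.
master_dir <- file.path(tempdir(), "RCC")
for (run in c("run1", "run2")) {
  dir.create(file.path(master_dir, run), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(master_dir, run, paste0(run, "_sample", 1:3, ".RCC")))
}

# Find every RCC file below the master folder, whatever the subfolder.
rcc_files <- list.files(
  path = master_dir,
  pattern = "\\.RCC$",
  recursive = TRUE,
  full.names = TRUE
)

# Copy them into one flat folder that can be used as `data_directory`.
flat_dir <- file.path(tempdir(), "RCC_flat")
dir.create(flat_dir, showWarnings = FALSE)
copied <- file.copy(from = rcc_files, to = flat_dir, overwrite = TRUE)

# Minimal sample sheet: one row per file, keeping the run as an annotation.
ssheet <- data.frame(
  IDFILE = basename(rcc_files),
  RUN = basename(dirname(rcc_files)),
  stringsAsFactors = FALSE
)
```

This only works if file names are unique across runs. Otherwise files would overwrite each other in the flat folder and would need to be renamed first, for example by prefixing the run name.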
Prepare for release:
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
When reading PlexSet RCC files, which include 8 samples named Endogenous*s with * from 1 to 8, the Code_Summary column is not unnested properly, i.e., it is a two-level list-column instead of a simple list-column.
library(NACHO)
.data <- summarise(
  data_directory = paste(params[["data_directory"]], "RCC", sep = "/"),
  ssheet_csv = paste(output_directory, "sample_sheet.csv", sep = "/"),
  id_colname = "IDFILE",
  housekeeping_genes = NULL,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GLM",
  n_comp = 10
)
# > Error in `[[<-.data.frame`(`*tmp*`, "CodeClass", value = character(0)) : replacement has 0 rows, data has 5760
This issue might cause other issues later for normalisation and visualisation.
Prepare for release:
devtools::check()
devtools::check_win_devel()
rhub::check_for_cran()
rhub::check_on_macos()
Submit to CRAN:
usethis::use_version('major')
cran-comments.md
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
Submit to CRAN:
usethis::use_version('major')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Prepare for release:
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
urlchecker::url_check()
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
If I follow the example in the vignette, I encounter this error:
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)
Attaching package: 'NACHO'
The following object is masked from 'package:BiocGenerics':
normalize
GSE70970_sum <- summarise(
cols must be length 1 (the number of rows), not 3
Hey guys,
This looks like an awesome tool! I was wondering if NACHO supports PlexSet analysis, where each well/lane is multiplexed with up to 8 samples? If so, it looks like the load_rcc function requires each sample to be a separate file, so should we demultiplex each well/lane?
Thanks in advance for your help!
Yan
Thanks for your amazing package.
@mcanouil
getwd()
library(NACHO)
setwd("/home/zhou/raid/TOPACIO_Oncopanel/TOPACIO_Nanostring/mRNA")
plexset_nacho <- load_rcc(
  "/home/zhou/raid/TOPACIO_Oncopanel/TOPACIO_Nanostring/mRNA",
  ssheet_csv = ssheet_csv,
  id_colname = "id_colname"
)
However, there are errors.
[NACHO] Importing RCC files.
|=================================================|100% ~0 s remaining
[NACHO] Performing QC and formatting data.
Error: invalid first argument
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
invalid first argument
Backtrace:
1. NACHO::load_rcc(...)
12. base::.handleSimpleError(...)
13. tidyselect:::h(simpleError(msg, call))
Run `rlang::last_trace()` to see the full context.
Hi, I observed differences in the values in QC plots such as the BD plot on repeated execution. See the code below, which was executed multiple times.
library(NACHO)
# load RCC files
nacho_data <- load_rcc(data_directory = input_rcc_path,
ssheet_csv = input_samplesheet,
id_colname = "RCC_FILE_NAME")
plot_bd <- autoplot(
object = nacho_data,
x = "BD",
colour = "CartridgeID",
size = 0.5,
show_legend = TRUE
)
I guess this might be due to the use of position_jitter(width = 0.25) in auto_plot (line 328 in 7784ecd).
The height parameter (which seems to be NULL) seems to have an effect and shifts the data points vertically as well.
This difference can be observed executing the code below.
library(ggplot2)
jitter <- position_jitter(width = 0.1)
ggplot(mtcars, aes(am, vs)) +
geom_point(position = jitter)
jitter2 <- position_jitter(width = 0.1, height = 0.0)
ggplot(mtcars, aes(am, vs)) +
geom_point(position = jitter2)
I guess this is not expected. Please let me know if you need more information.
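Both effects can be controlled through arguments of ggplot2's position_jitter(). When height is left out, it defaults to a fraction of the data resolution rather than zero, and without a seed the offsets are redrawn on every render. The sketch below is one possible way to get reproducible, horizontal-only jitter on the same mtcars example. It is not a description of how NACHO's plotting code is or will be written, and the seed value is arbitrary.

```r
library(ggplot2)

# Horizontal jitter only, with a fixed seed so that repeated renders
# place the points at the same positions.
fixed_jitter <- position_jitter(width = 0.1, height = 0, seed = 42)

p <- ggplot(mtcars, aes(am, vs)) +
  geom_point(position = fixed_jitter)

# layer_data() returns the computed positions; building the plot twice
# gives the same jittered x values, and y is left untouched.
first <- layer_data(p)
second <- layer_data(p)
identical(first$x, second$x)
```

Calling set.seed() before rendering is an alternative when the position object cannot be changed.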
Hi,
I tried to run NACHO as described in the supplementary file. The data from GEO was downloaded and Samplesheet.csv was generated. But when I use the summarize function, it shows "Error: Column cols must be length 1 (the number of rows), not 3". The reprex() shows a different error.
Thanks in advance.
Athul
nacho <- summarize(
  data_directory = paste0(getwd(), "/GSE70970"),
  ssheet_csv = paste0(getwd(), "/GSE70970/Samplesheet.csv"),
  id_colname = "IDFILE",
  housekeeping_predict = TRUE,
  normalisation_method = "GEO",
  n_comp = 10
)
#> Error in summarize(data_directory = paste0(getwd(), "/GSE70970"), ssheet_csv = paste0(getwd(), : could not find function "summarize"
Update calls to tidyr functions for tidyr v1.0.0.
library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
#> [NACHO] Importing RCC files.
#> Warning: unnest() has a new interface. See ?unnest for details.
#> Try `df %>% unnest(c(Header, Sample_Attributes, Lane_Attributes))`, with `mutate()` if needed
#> Warning: The `.drop` argument of `unnest()` is deprecated as of tidyr 1.0.0.
#> All list-columns are now preserved.
#> This warning is displayed once per session.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#> Warning: unnest() has a new interface. See ?unnest for details.
#> Try `df %>% unnest(c(Header, Sample_Attributes, Lane_Attributes))`, with `mutate()` if needed
Hi,
Thank you for your efforts again. Was the positive control F (POS_F) included in positive control linearity calculation?
Based on the code here it seems like all positive probes were used. Nanostring suggests excluding POS_F from this calculation since it is below the limit of detection. Here is the related part in nanostring documentation:
"Note that because POS_F has a known concentration of 0.125 fM, which is considered below the limit of detection of the system, it should be excluded from this calculation (although you will see that POS_F counts are significantly higher than the negative control counts in most cases)."
Thank you!
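For illustration, the exclusion described in the quoted documentation could look like the base R sketch below. The counts are made up, the concentrations for POS_A to POS_E are the nominal values as I understand them (only the 0.125 fM of POS_F is stated above), and the log2 scale and the use of lm() are assumptions, not a description of NACHO's code.

```r
# Positive controls with nominal concentrations (fM); counts are made up.
positive <- data.frame(
  Name = paste0("POS_", LETTERS[1:6]),
  Concentration = c(128, 32, 8, 2, 0.5, 0.125),
  Count = c(25000, 6500, 1600, 410, 105, 40)
)

# Drop POS_F (0.125 fM) before assessing linearity.
positive_used <- positive[positive$Name != "POS_F", ]

# Linearity as the R^2 of a straight-line fit on the log2 scale.
fit <- stats::lm(log2(Count) ~ log2(Concentration), data = positive_used)
r_squared <- summary(fit)$r.squared
```

Running the same fit on all six rows shows how much the lowest-concentration probe affects the metric for a given dataset.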
Issue raised in #19
Hi
I get a crash every time when I try to visualise the data after "renormalisation" even with the example data:
Error in visualise(nacho_norm) :
[NACHO] Mandatory fields are missing in "nacho_norm"!
"load_rcc()" and/or "normalise()" must be called before "visualise()".
Here is the code that I copied and pasted. I have this issue with R 3.6.1/3.6.2 on Linux and Mac.
library(GEOquery)
library(NACHO)
# Import data from GEO
gse <- GEOquery::getGEO(GEO = "GSE74821")
targets <- Biobase::pData(Biobase::phenoData(gse[[1]]))
GEOquery::getGEOSuppFiles(GEO = "GSE74821", baseDir = tempdir())
utils::untar(
  tarfile = file.path(tempdir(), "GSE74821", "GSE74821_RAW.tar"),
  exdir = file.path(tempdir(), "GSE74821")
)
targets$IDFILE <- list.files(
  path = file.path(tempdir(), "GSE74821"),
  pattern = ".RCC.gz$"
)
targets[] <- lapply(X = targets, FUN = iconv, from = "latin1", to = "ASCII")
utils::write.csv(
  x = targets,
  file = file.path(tempdir(), "GSE74821", "Samplesheet.csv")
)
# Read RCC files and format
nacho <- load_rcc(
  data_directory = file.path(tempdir(), "GSE74821"),
  ssheet_csv = file.path(tempdir(), "GSE74821", "Samplesheet.csv"),
  id_colname = "IDFILE"
)
nacho_norm <- normalise(
  nacho_object = nacho,
  normalisation_method = "GLM",
  remove_outliers = TRUE
)
visualise(nacho_norm)
Does NACHO provide background normalization, via background subtraction or thresholding? I only saw the normalization methods available when running load_rcc and normalize(), and these do not seem to include such an approach.
Hi,
Thank you for the great tool. I have a question about the calculation of the positive control linearity QC. Based on NACHO's QC plot output and NanoString's guide, it should be the "R^2 value". When I looked at the function that calculates this metric, it actually calculates Pearson's correlation coefficient here. I just wanted to ask whether you are aware of this, or am I missing something?
Thank you!
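The two quantities are closely related: for a straight-line fit with an intercept, R^2 equals the square of Pearson's correlation coefficient. Reporting r itself therefore gives a different (larger, when 0 < r < 1) number than R^2. The short sketch below checks this on made-up values and does not use NACHO's code.

```r
# Made-up paired values, roughly but not exactly linear.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.5)

pearson_r <- stats::cor(x, y)
r_squared <- summary(stats::lm(y ~ x))$r.squared

# R^2 of the simple linear regression equals the squared correlation.
all.equal(pearson_r^2, r_squared)
```

Whether a QC threshold is meant for r or for R^2 therefore matters, because the same data yield different values for the two.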