hadexversum / HaDeX
Analysis and Visualisation of Hydrogen/Deuterium Exchange Mass Spectrometry Data
Home Page: http://mslab-ibb.pl/shiny/HaDeX/
Additional personalization: the user can choose the colors of the plot and change the labels if needed.
It is not the default - it is enabled by an action button.
The primary code is hard to read due to messily named variables (that's on me). Once we reach an agreement on the glossary project, the variables should be renamed for consistency.
The manipulation of the data to produce data frames for later use should be put into package functions.
The reactives that should contain a function call instead of data processing:
Test data should be generated for the tests - in the current version, the data from the plots (available for download from the app) should be kept in separate csv files.
Assumptions about the task:
Add a parameter "length" to the function reconstruct_sequence to manually correct the sequence length read from the experimental data. This functionality is already implemented in the app.
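A minimal sketch of how the "length" parameter could behave (the function name and the convention of filling uncovered positions with "x" are assumptions; the released implementation may differ):

```r
# Sketch only -- not the released API. If `length` is supplied, it overrides
# the protein length inferred from max(End), and the tail is padded with "x"
# exactly like any other uncovered region.
reconstruct_sequence_sketch <- function(dat, length = NULL) {
  protein_length <- if (is.null(length)) max(dat[["End"]]) else length
  position <- rep("x", protein_length)
  for (i in seq_len(nrow(dat))) {
    residues <- strsplit(dat[["Sequence"]][i], "")[[1]]
    position[dat[["Start"]][i]:dat[["End"]][i]] <- residues
  }
  paste0(position, collapse = "")
}
```

For example, with peptides covering positions 1-3 and 5-6 of an 8-residue protein, `reconstruct_sequence_sketch(dat, length = 8)` would pad the two missing positions with "x".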
As we rework the application, changes in the UI should be made:
fractional as an option
All of these changes should also be included in the report!
Confidence limits and t-tests from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3164548/ Note that this article comes with its own error propagation method.
An article for the documentation about our methods of data visualization.
It should include the description, construction, and examples of:
I have added a download button for one of our plots: https://github.com/hadexversum/HaDeX/blob/master/inst/HaDeX/server.R#L889-L901. This is also related to moving the part of the code responsible for plot generation into the plots directory. Would you mind dealing with the other plots?
S3 objects and methods (print, summary) for
Before our intended semiparametric test implementation:
there should be an option in the app to show the Houde test for all of the time points (currently it is calculated only for the selected time point), as the semiparametric test uses all time points as well.
Add all equations along with exemplary code and plots. The vignette should render to PDF.
Woods plot computations and plotting functions should be moved into the package.
The parametrization of the label size is useful and allows the production of readable descriptions.
Every text input for a label (axis/title) should be accompanied by a select input for the size of that object. The size of the legend is the same as for the x axis.
At the end of all changes, the report template should be updated to include all new features.
Change the state name from the GUI and propagate the change everywhere.
We're seeing these errors when testing HaDeX against the soon-to-be-released version of dplyr (1.0.0):
[master*] 126.1 MiB ❯ revdepcheck::revdep_details(, "HaDeX")
══ Reverse dependency check ═══════════════════════════════════════ HaDeX 1.1 ══
Status: BROKEN
── Still failing
✖ checking dependencies in R code ... NOTE
── Newly failing
✖ checking examples ... ERROR
✖ checking tests ...
── Before ──────────────────────────────────────────────────────────────────────
❯ checking dependencies in R code ... NOTE
Namespaces in Imports field not imported from:
‘DT’ ‘gsubfn’ ‘stringr’
All declared Imports should be used.
0 errors ✔ | 0 warnings ✔ | 1 note ✖
── After ───────────────────────────────────────────────────────────────────────
❯ checking examples ... ERROR
Running examples in ‘HaDeX-Ex.R’ failed
The error most likely occurred in:
> ### Name: quality_control
> ### Title: Experiment quality control
> ### Aliases: quality_control
>
> ### ** Examples
>
> # load example data
> dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))
>
> # calculate mean uncertainty
> (result <- quality_control(dat = dat,
+ state_first = "CD160",
+ state_second = "CD160_HVEM",
+ chosen_time = 1,
+ in_time = 0.001))
Error in `[.data.table`(dat, "Exposure") :
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Calls: quality_control -> unique -> [ -> [.data.table
Execution halted
❯ checking tests ...
See below...
❯ checking dependencies in R code ... NOTE
Namespaces in Imports field not imported from:
‘DT’ ‘gsubfn’ ‘stringr’
All declared Imports should be used.
── Test failures ───────────────────────────────────────────────── testthat ────
> library(testthat)
> library(HaDeX)
>
> test_check("HaDeX")
── 1. Error: class is right ───────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
1. testthat::expect_is(...)
4. HaDeX::quality_control(...)
7. data.table:::`[.data.table`(dat, "Exposure")
── 2. Error: size is right ────────────────────────────────────────────────────
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Backtrace:
1. testthat::expect_equal(...)
6. HaDeX::quality_control(...)
9. data.table:::`[.data.table`(dat, "Exposure")
══ testthat results ═══════════════════════════════════════════════════════════
[ OK: 16 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 2 ]
1. Error: class is right
2. Error: size is right
Error: testthat unit tests failed
Execution halted
2 errors ✖ | 0 warnings ✔ | 1 note ✖
There should be elements:
The same for the butterfly differential plot.
Add tables as tabs for the charts (nested tabs). The tables should be created using the DT template.
The reworking of the application and the package must be accompanied by major changes in the documentation.
The changes should cover:
I suggest more short articles than one long vignette.
Right now fileInput accepts only csv files. Should we extend it to xlsx?
The calculation of the mass (aggregation of measurements for different charge values) should be available as a separate function, not only as a part of calculating state deuteration.
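A rough sketch of what such a standalone mass function could look like. The column names (Center for the measured m/z, z for the charge, Inten for the intensity) and the intensity-weighted aggregation are assumptions based on typical DynamX exports, not the package's internal implementation:

```r
# Sketch, not the released API. Assumed columns: Center (measured m/z),
# z (charge state), Inten (signal intensity).
PROTON_MASS <- 1.00727647

calculate_peptide_mass <- function(dat) {
  # neutral peptide mass recovered from each charge state
  mass <- dat[["z"]] * (dat[["Center"]] - PROTON_MASS)
  # intensity-weighted mean across the charge states of the same peptide
  weighted.mean(mass, w = dat[["Inten"]])
}
```

Exposing this separately would let users inspect the aggregated masses directly, before any deuteration-level arithmetic is applied.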
For consistency, small changes should be made:
I want to keep the generate_* functions internal and expose only the necessary ones. Some of the already existing ones should not be internal.
Short log entries to check if the tool is used (not by us).
My proposition: when, file size, maybe IP?, whether a report was generated.
There should be a possibility to download the deuterium uptake curves of all the peptides (one uptake curve per peptide, with biological states), as they are usually included in the supplement of a publication.
There should be two options: download a zipped folder with all the plots separately or one file with all the plots plotted in a grid. Each plot should have in the title the peptide sequence and its position in the protein sequence.
This is a request from MD.
There should be a comparison of data measured in repetitions of the experiment.
The comparison should be performed for a specific time point, as the measurements are repeated n times in each time point and can be treated as somehow separate experiments.
This comparison should be implemented as a function and be available in the GUI, as it allows spotting differences between replicates and possibly disqualifying a replicate.
For the comparison plot: show all states (the user can choose which states to see simultaneously via checkboxes; default: all).
For the differential plot: the user chooses two states to compare.
Uncertainty - treat each time point (aside from t_0) as t_100. Returns a data.frame where each row represents a single time point and its mean uncertainty.
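A minimal sketch of the described return shape, under assumptions: the input has an Exposure column (the time point) and an err column (per-measurement uncertainty), and t_0 is identified as the minimal exposure. Names are hypothetical:

```r
# Sketch only. Drops the t_0 rows (minimal Exposure) and returns a data.frame
# with one row per remaining time point and its mean uncertainty.
mean_uncertainty_by_time <- function(dat) {
  dat <- dat[dat[["Exposure"]] != min(dat[["Exposure"]]), ]
  aggregate(err ~ Exposure, data = dat, FUN = mean)
}
```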
As an option - include back-exchange effects in the experimental calculations as proposed in "Recommendations for ... HDX-MS experiments" by Rand and others.
The results should be similar to the theoretical calculations, but it would be nice to compare them.
The code for reconstructing the covered protein sequence from the DynamX file should be available as a function.
There should be:
A package function to check what kind of file was uploaded by the user and transform it if needed (calculate and aggregate).
DynamX can produce different files - HaDeX should be able to work with all of them.
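One possible dispatch for such a check, assuming the file types can be told apart by the columns present (a per-charge "cluster" export versus an already aggregated one). The detection columns and the aggregation keys are assumptions, not the actual DynamX schema handling in HaDeX:

```r
# Rough sketch; column names used for detection are assumptions.
# A per-charge export is recognised by its z/Center columns and aggregated;
# an already aggregated file is passed through unchanged.
normalize_hdx_input <- function(dat) {
  if (all(c("z", "Center") %in% names(dat))) {
    # per-charge data: compute neutral mass, then aggregate to one value
    # per Sequence/State/Exposure combination
    dat[["mass"]] <- dat[["z"]] * (dat[["Center"]] - 1.00727647)
    aggregate(mass ~ Sequence + State + Exposure, data = dat, FUN = mean)
  } else {
    dat
  }
}
```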
Possible CSS templates?
the supplement says:
After obtaining the mass of the peptide, we can compute the deuteration level depending on the chosen maximum deuteration level. The maximum deuteration can be computed in two different ways: either as theoretical (where the maximum deuteration depends on the theoretical deuteration levels) or as experimental (where the maximum deuteration is assumed to be equal to the deuteration measured at the last time point).
Experimental deuteration level
The experimental deuteration level is computed as the deuteration level of the peptide from a protein in a specific state after incubation time t, compared to the deuteration level measured at the start of the incubation (t_0). It yields a value for the chosen state and chosen time t.
It would be nice to have a formula explaining how exactly we go from average mass to deuteration, just for the sake of clarity.
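As I understand the description above, the relationship could be sketched as follows (for clarity only; the function name and exact form are not the package API):

```r
# Sketch only. Symbols follow the supplement text, not the package internals.
#   m_t   -- average peptide mass after incubation time t
#   m_0   -- average mass at t_0 (start of incubation)
#   m_max -- mass at maximum deuteration: theoretical, or measured at the
#            last time point in the experimental variant
deuteration_level <- function(m_t, m_0, m_max = NULL) {
  uptake <- m_t - m_0                  # absolute deuterium uptake [Da]
  if (is.null(m_max)) return(uptake)
  uptake / (m_max - m_0)               # fractional deuteration level
}
```

So a peptide whose average mass rose by 1 Da, out of a maximum possible rise of 2 Da, would have a fractional deuteration level of 0.5.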
Add package functions for the replicate analysis (the Replicates tab) from the application.
Amino acid coverage comparison from two IAO files, in two colors.
Add a basic description of how the theoretical calculations work (theoretical ins and outs). This information is included in the vignette, but we're not sure whether a potential user will check it out or just get discouraged. How can we do it cleverly?
All of the existing functions (except for the deprecated ones) should be re-written with data.table and stringi instead of dplyr and stringr. This is also an opportunity to brush up the documentation of each function and provide complete test coverage (aim for the green).
The current summary of the situation can be found in the document.
This issue also covers updates of tests.
.rmd template for html reports
First of all, thank you very much for your package. It has been really cool to see an R package for HDX data.
I have just started using it, so I would like to thank you all.
I have one or two comments.
Disclaimer: I am not a programmer. I just know some R and do HDX, and I was using your software to plot some HDX data. Since I like it, I wanted to share these ideas with you (I guess you probably already thought of them, but just in case).
In order to process the kinetics for all the peptides directly, I wrote this:
library(HaDeX)
library(dplyr)
library(purrr)
library(tidyr)
dat <- read_hdx(system.file(package = "HaDeX", "HaDeX/data/KD_180110_CD160_HVEM.csv"))
# check the states and proteins in the experiment; this is only for me, to list the different states and proteins
States <- unique(dat[["State"]])
Proteins <- unique(dat[["Protein"]])
Exposure <- unique(dat[["Exposure"]])
kin_state <- function(state, protein, time_in, time_out, start, end, sequence, data) { # this function calculates the kinetics for a list of peptides
  peptide_state <- unique(data[["State"]][data[["Sequence"]] == sequence]) # states in which this peptide was measured
  if (!state %in% peptide_state) { # this filters out the peptides that are not present in the indicated state
    print(peptide_state)
    print(paste(as.character(sequence), "does not belong to", as.character(state)))
    kin <- NA
  } else {
    kin <- calculate_kinetics(data, # calculates the kinetics for the peptide in the given state
                              protein = protein,
                              state = state,
                              sequence = sequence,
                              start = start,
                              end = end,
                              time_in = time_in,
                              time_out = time_out)
  }
  return(kin)
}
so that then you can have a tidy df as follows:
kin_data <- dat %>%
filter(!Sequence == "") %>%
mutate("Seq_peptide" = Sequence) %>%
group_by(Sequence) %>%
nest() %>%
mutate("kin_State1" = map(.x = data,
.f = ~kin_state(state = States[1],
protein = Proteins[1],
time_in = 0,
time_out = 1500,
start = .x$Start[1],
end = .x$End[1],
sequence = .x$Seq_peptide[1],
data = dat)),
"kin_State2" = map(.x = data,
.f = ~kin_state(state = States[2],
protein = Proteins[1],
time_in = 0,
time_out = 1500,
start = .x$Start[1],
end = .x$End[1],
sequence = .x$Seq_peptide[1],
data = dat))
)
and now, this is the main reason why I wrote. Regarding plot_kinetics, I think that it would be a good idea to have color/fill = State as well (I did it on my PC), but I think it would be useful to have it out of the box, so that you can easily do things like:
bind_rows(kin_data$kin_State1[1:8], kin_data$kin_State2[1:8]) %>%
plot_kinetics_State(theoretical = FALSE,
relative = FALSE) +
facet_grid( ~ Sequence)
and then you have many peptides in panels in one plot, with the same colors for easy comparison.
Thank you very much again.
Best,
Alonso