
guillawme / rfret

Analyze FRET Binding Data with R

Home Page: https://guillawme.github.io/rfret

License: Other

R 100.00%
fret fluorescence spectroscopy binding-assay biochemistry biophysics binding-constant r data-analysis stoichiometry

rfret's Introduction

No longer maintained

I no longer have time to maintain this package. I won't add new functionality, I won't fix bugs, I won't review or merge contributions. Use at your own risk, fork if you want.

Making this package taught me a lot. Most importantly, I realized that I tried to make it too broad in scope. A more generic non-linear curve-fitting package would still be a useful addition to the R package ecosystem (at the time of this writing), but it ought to be a lot narrower in scope and hence a lot simpler in design.

rfret: Analyze FRET Binding Data with R


This R package allows you to analyze FRET binding data and produce this kind of binding curve figure:

[Figure: binding curve]

Given raw fluorescence data from a FRET binding experiment, you can:

  1. plot all channels (donor, acceptor, FRET) to visually inspect raw data and find possible outliers;
  2. average fluorescence values of technical replicates of the same experiment;
  3. correct FRET signal by subtracting signal from a blank experiment;
  4. guess initial values for the parameters of the binding model equation (kd, signal_min, signal_max);
  5. fit a binding model equation to the data;
  6. report the value of Kd;
  7. plot the corrected FRET signal and the binding curve obtained by fitting the data.
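The steps above can be chained as a pipeline. This is a sketch only: the function names are taken from elsewhere on this page (fret_format_data, inspect_raw_data, fret_average_replicates, fret_correct_signal, fit_binding_model, make_figure), but check the vignette for the exact signatures in the current version.

```r
library(rfret)

# Sketch only; signatures may differ between package versions.
raw <- fret_format_data("my_experiment.csv")    # read and map column names
inspect_raw_data(raw)                           # 1. plot all channels
avg <- fret_average_replicates(raw)             # 2. average technical replicates
corrected <- fret_correct_signal(avg)           # 3. subtract blank signal
fit <- fit_binding_model(corrected,             # 4-6. guess, fit, report Kd
                         binding_model = "quadratic",
                         probe_concentration = 5)
make_figure(fit)                                # 7. plot data and fitted curve
```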

This package allows batch processing and analysis of any number of datasets at a time. It can also process and analyze fluorescence polarization or anisotropy binding data. Support for fluorescence quenching data is also planned.

Installation

First, install the devtools package, if not already present on your system:

install.packages("devtools")

You can then install rfret from GitHub, right from within R:

devtools::install_github("guillawme/rfret", build_vignettes = TRUE)

Usage

You can access a detailed tutorial using the following commands:

library(rfret)
vignette("analyzing_fret_data")

rfret's People

Contributors

dododas, guillawme, tstas


rfret's Issues

Add support for Job plots

This simply requires the following:

  • inspect_raw_data should return all plots with linear x scales;
  • a new function is needed to plot the corrected FRET signal with vertical lines at the theoretical 1:1, 1:2 and 2:1 ratios, and with a linear x scale;
  • a new function is needed to provide a best guess of the stoichiometry based on the corrected signal.

Make a graphical interface

This sort of defeats the purpose of having a command-line tool, but can be valuable if it's still simpler to use than a spreadsheet program.

Ideally, the user would simply drop a (properly formatted) data file, indicate the names of the blank and titration data series, and immediately get the result table and final figure.

Shiny seems well suited for that.

Need a function to make a replicates figure (data points with error bars, and fit curve)

It would be nice to make fit_binding_model() able to take any number of datasets, average these datasets, fit against the average data (or all data?), and plot points with error bars.

It should at least:

  • check how many datasets it receives;
  • check that the concentration column matches across datasets (otherwise, it seems difficult to handle without ending up with meaningless average fluorescence values), and throw an error if this is not the case.
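The two checks above can be sketched in base R, assuming each dataset is a data frame with a concentration column (all names here are illustrative, not the package's API):

```r
# Validate a list of datasets before averaging and fitting.
check_datasets <- function(datasets) {
  if (length(datasets) < 1) stop("At least one dataset is required.")
  ref <- datasets[[1]]$concentration
  # Compare every other dataset's concentration column to the first one.
  same <- vapply(
    datasets[-1],
    function(d) isTRUE(all.equal(d$concentration, ref)),
    logical(1)
  )
  if (!all(same)) {
    stop("Concentration columns differ across datasets; ",
         "averaging fluorescence values would be meaningless.")
  }
  invisible(length(datasets))
}
```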

Error with using fit_binding_model function

Hi there, first of all thanks for making this package. I have a few queries while processing the example data file (fret_binding_data).

I got the corrected output from fret_correct_signal() and then ran fit_binding_model() and got this error:

fit_binding_model('./my_results/input_corrected.csv', binding_model = "quadratic", probe_concentration = 5)
Error in UseMethod("group_by") :
  no applicable method for 'group_by_' applied to an object of class "character"

Can you please tell me if I made a mistake while running it?

Thanks,
Maddy
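The error message suggests a character string (the file path) was passed where fit_binding_model() likely expects the data frame returned by fret_correct_signal(). A possible fix, assuming that is the case:

```r
# Assumption: fit_binding_model() expects a data frame, not a file path.
# Either pass the object returned by fret_correct_signal() directly...
corrected <- fret_correct_signal(my_formatted_data)
fit_binding_model(corrected, binding_model = "quadratic",
                  probe_concentration = 5)

# ...or, if the corrected data were written to CSV, read them back first:
corrected <- read.csv("./my_results/input_corrected.csv")
fit_binding_model(corrected, binding_model = "quadratic",
                  probe_concentration = 5)
```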

fit_binding_model can only use one binding model equation

As of commit 181d27d, fit_binding_model() can only be used with one equation (the quadratic binding equation).

Ideally, this function should be able to take any binding model equation (like a homodimerization model, for example) and set of initial parameters, and return parameter values from the fit and a plot.

Available equations should be defined in a dedicated file and accessed by their names, which would make it easy to add new binding model equations.
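A registry of equations kept in one file and looked up by name could be sketched as follows (the equation set and names are illustrative):

```r
# Named list of binding model equations, defined in a single file.
binding_models <- list(
  hyperbolic = function(conc, kd, signal_min, signal_max) {
    signal_min + (signal_max - signal_min) * conc / (kd + conc)
  },
  hill = function(conc, kd, n, signal_min, signal_max) {
    signal_min + (signal_max - signal_min) * conc^n / (kd^n + conc^n)
  }
)

# Look up a model by name, with an informative error for unknown names.
get_binding_model <- function(name) {
  if (!name %in% names(binding_models)) {
    stop("Unknown binding model: ", name, ". Available models: ",
         paste(names(binding_models), collapse = ", "))
  }
  binding_models[[name]]
}
```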

Write more documentation as vignettes

Latest vignette todo list:

  • Tutorials for a basic use case:
    • FRET: inspection of raw data, detection of outliers, averaging, correction, curve fitting, figure making.
    • FRET: Job plot.
    • FP: processing and analysis.
  • Using different equations.
  • Derivation of the equations:
    • Hyperbolic
    • Quadratic
    • Hill
  • FRET experimental setup.
  • FRET Job plot experimental setup.
  • FP experimental setup.
  • Physical principle of FRET (maybe simply list relevant references).
  • Physical principle of FP (maybe simply list relevant references).
  • How to extend the package:
    • Add a new equation.
    • Add a new processing pipeline.

Fix make_figure

The function make_figure was designed against the first prototype of this package and cannot properly handle the output of fit_binding_model. This needs to be fixed.

Related to #19.

Add +/- 10% shaded area in donor plot

In inspect_raw_data(), figure out a way to calculate the average fluorescence of the donor channel and plot a horizontal line + shaded area representing +/- 10% of this average value.

This will give a useful indication of the pipetting accuracy and precision.

Missing logic in FP processing

As it is now, processing FP data with format_data(data_type = "FP") will fail if the input datasets don't contain the polarization, anisotropy and intensity columns.

There should be some logic checking that these columns are present, and if not we should generate them using fp_calculate_pola_aniso_int() (and this function can then become internal). This must be done after mapping internal column names to the actual names designated by the metadata file, because fp_calculate_pola_aniso_int() relies on these internal names.

Overview of this logic:

  • set column names according to metadata files;
  • check for polarization, anisotropy and intensity columns;
  • if not present, calculate them with fp_calculate_pola_aniso_int().
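The overview above can be sketched in base R, substituting the standard polarization, anisotropy and intensity definitions (from the parallel and perpendicular intensities) for fp_calculate_pola_aniso_int(); column names are illustrative:

```r
# Add polarization/anisotropy/intensity columns only when they are missing.
ensure_fp_columns <- function(data) {
  needed <- c("polarization", "anisotropy", "intensity")
  if (all(needed %in% names(data))) return(data)
  # Standard definitions from parallel/perpendicular intensities.
  data$polarization <- (data$parallel - data$perpendicular) /
                       (data$parallel + data$perpendicular)
  data$anisotropy   <- (data$parallel - data$perpendicular) /
                       (data$parallel + 2 * data$perpendicular)
  data$intensity    <- data$parallel + 2 * data$perpendicular
  data
}
```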

Eliminate code duplication

There is currently a fair amount of code duplication, and stabilizing the interface requires getting rid of this duplication. The goal would be to have generic functions for most tasks, internally calling specialized functions based on the value of a parameter. This way, extending the package would simply require adding a specialized function and a parameter caught by a switch statement in the generic function.
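The generic-plus-switch pattern could look like this (function names are illustrative):

```r
# Generic function dispatching to specialized functions on `data_type`;
# a new experiment type only needs one new function and one new branch.
average_replicates <- function(data, data_type = c("fret", "fp")) {
  data_type <- match.arg(data_type)
  switch(data_type,
         fret = fret_average_replicates(data),
         fp   = fp_average_replicates(data))
}
```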

This refactoring started in #33 and fd7f1a6

More things to refactor:

  • *_average_replicates()
  • *_inspect_raw_data()

Add support for fluorescence quenching data

This should be straightforward, and only requires studying the data file format to adapt the existing code. See fp_inspect_one_dataset(), fp_format_one_dataset() and related functions for a template.

Write a generic parameter guessing function

Not sure this is even possible, because different binding model equations likely have different sets of parameters.

As of commit 0cdadc0, there is only one specialized function to guess the parameters of the quadratic binding equation. It would be nice to have instead a generic function able to guess initial values of parameters to be fitted against experimental data using any binding model equation.

How to set the binding curve baseline?

Because the few datasets I analyzed so far gave negative FRET values after correction by correct_fret_signal(), I added the following line to shift the entire curve upwards such that the minimal value becomes 0:

fret_corr <- fret_corr + abs(min(fret_corr))

This is a problem: if the function encounters a dataset that, for some reason, doesn't give negative values, the curve will still be shifted up even though it is unnecessary. This operation can easily be made conditional by checking for negative values first.
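A sketch of the conditional shift, applied to the vector of corrected FRET values:

```r
# Shift the curve up only when negative values are actually present.
shift_baseline <- function(fret_corr) {
  if (any(fret_corr < 0, na.rm = TRUE)) {
    fret_corr <- fret_corr + abs(min(fret_corr, na.rm = TRUE))
  }
  fret_corr
}
```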

In case there is no negative value, should the curve be shifted down such that the lowest value becomes zero?

This is not critical, because the absolute vertical position of the curve does not influence the determination of Kd, but the bigger question is: how to give the user an accurate idea of the background before a vertically shifted curve is produced? Maybe in a step before correction?

Improve pre-processing functions

The following should be carried out by pre-processing functions:

  • Replicate averaging: test for the presence of replicates (nlevels(as.factor(.data$replicates)) > 1) and, if so, average them properly.
  • Metadata checking: check that all expected columns are present, stop with an informative warning if not.
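Both checks can be sketched in base R (column names are illustrative):

```r
# Check metadata columns, then average replicates if more than one exists.
preprocess_checks <- function(data, expected_cols) {
  missing_cols <- setdiff(expected_cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  if (nlevels(as.factor(data$replicates)) > 1) {
    # Average fluorescence over replicates at each concentration.
    data <- aggregate(fluorescence ~ concentration, data = data, FUN = mean)
  }
  data
}
```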

Improve fit_binding_model

It would be very useful to have a way to specify constraints on the parameters to be fitted, for example, it makes sense to:

  • restrain all parameters to positive numbers (negative numbers are physically impossible),
  • restrain fret_max to be greater than fret_min (also a physically realistic assumption).

It would also be very useful to have a way to choose between keeping a parameter fixed or letting it vary in the fitting procedure. This is especially useful with the quadratic equation, to let the fitting procedure determine the donor concentration (if the value determined by the fitting procedure matches the intended value, this is a convincing control for the quality of the dataset under study).
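With its default algorithm nls() ignores bounds, but algorithm = "port" accepts lower and upper; a parameter can effectively be kept fixed by giving it equal bounds. A minimal sketch on simulated hyperbolic data (variable names are illustrative; the fret_max > fret_min constraint would still need a reparameterization, e.g. fitting fret_max as fret_min plus a positive offset):

```r
set.seed(42)
conc <- c(0.1, 0.5, 1, 5, 10, 50, 100)
fret <- 0.05 + 0.9 * conc / (8 + conc) + rnorm(length(conc), sd = 0.005)
d <- data.frame(conc = conc, fret = fret)

# Bounded fit: all parameters restrained to positive values.
fit <- nls(fret ~ fret_min + (fret_max - fret_min) * conc / (kd + conc),
           data = d,
           start = list(kd = 10, fret_min = 0.1, fret_max = 1),
           lower = c(kd = 0, fret_min = 0, fret_max = 0),
           algorithm = "port")
coef(fit)[["kd"]]  # close to the simulated kd of 8
```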

Write publication

This package should be presented in a publication, which should explain mostly the why and how.

Ideal target journal is probably the R Journal, but I will consider other open access alternatives.

Make a simulation function

It would be useful to have a binding curve simulation function.

Such a function should take a kd value (or a vector of several values), a probe_concentration value (or a vector of several values), a hill_coeff value (or a vector of several values), and the minimum and maximum values of a concentration series, and should output a simulated binding curve (or several binding curves on the same plot, if a vector of values was provided for any parameter).

This would be a very useful tool for experimental planning.
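Such a function could be sketched with the Hill equation (parameter names follow this page; a log-spaced concentration series and a long-format output, one row per point per kd, are assumptions):

```r
# Simulate binding curves; a vector of kd values yields one curve each.
simulate_binding <- function(kd, conc_min, conc_max, n_points = 100,
                             hill_coeff = 1, signal_min = 0, signal_max = 1) {
  conc <- 10^seq(log10(conc_min), log10(conc_max), length.out = n_points)
  curves <- lapply(kd, function(k) {
    data.frame(kd = k, concentration = conc,
               signal = signal_min + (signal_max - signal_min) *
                        conc^hill_coeff / (k^hill_coeff + conc^hill_coeff))
  })
  do.call(rbind, curves)
}

# Two curves on the same concentration series, ready for plotting.
curves <- simulate_binding(kd = c(1, 10), conc_min = 0.01, conc_max = 100)
```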

Weird bugs in fret_correct_signal

  1. Depending on input files, weird bugs can happen in fret_correct_signal.

This is because of R's vector recycling rules. Specifically, the number of points in the titration series must be a multiple of the number of points making up the donor_only control (otherwise, the latter cannot be recycled an integer number of times to match the length of the former). Potential solutions:

  • average donor_only points if length(donor_only) < length(titration), so that the single resulting number can be recycled against any number of points in the titration series;
  • enforce length(donor_only) == length(titration), which is the most correct way to set up the experiment anyway.

Enforcing a particular setup is not nice to the user. On the other hand, silent averaging can lead to great confusion, therefore averaging should be explicitly announced with message().

  2. If NA values are present in the raw dataset, doing math with them will propagate them and make the entire corrected dataset full of NAs. This could be checked for, to make sure NAs are dropped properly before doing any math.
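The length handling for the first point can be sketched as follows (names are illustrative):

```r
# Match the donor_only control to the titration series length,
# announcing any averaging explicitly rather than recycling silently.
match_donor_only <- function(donor_only, titration) {
  if (length(donor_only) == length(titration)) return(donor_only)
  message("donor_only and titration series have different lengths; ",
          "averaging donor_only to a single value.")
  rep(mean(donor_only), length(titration))
}
```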

Use package here

The here package looks like it could make certain things easier, especially in unit tests, and possibly in make_default_metadata().

Isolate input file check code

There are many redundant lines of code in fret_format_data and fp_format_data. These lines can probably be isolated in an internal function check_input.

Not enough shapes to plot more than 6 data series

In inspect_raw_data(), shapes are currently used to differentiate data points from different data series (blank, titration, technical replicates). The maximal number of shapes is 6, which allows plotting only three replicates of a complete experiment: 3x (blank + titration).

This limitation can be avoided using colors, but this introduces a potential problem for color blind users. Maybe there is a convenient color palette out there.

Make the package work with variable donor

In the current form, the package only works with data from an experiment with a constant concentration of donor and a titration series of acceptor. It would be useful to make it work the other way as well (i.e. with a constant concentration of acceptor and a titration series of donor), because an experiment can be carried out both ways.

This would involve:

  • in inspect_raw_data: some code to detect which of the donor and acceptor channel is constant and which is a titration series (this can probably be achieved by fitting lines to the raw fluorescence data, and comparing slopes: the constant data should yield a null slope, and the titration series should yield a positive slope);
  • in correct_fret_signal: making the current code more generic, to apply the proper correction either on a single value (average of four replicates of the measurement on the constant channel) or on the titrations series.
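The channel detection in the first point can be sketched by fitting a line to each channel and comparing relative slopes (the threshold is illustrative):

```r
# Identify which channel is titrated: the constant channel should have a
# near-zero slope against the measurement index, the titrated one should not.
find_titration_channel <- function(donor, acceptor, tol = 0.05) {
  idx <- seq_along(donor)
  slope <- function(y) unname(coef(lm(y ~ idx))[2])
  rel_slope <- function(y) abs(slope(y)) / mean(y)
  if (rel_slope(donor) < tol && rel_slope(acceptor) >= tol) "acceptor"
  else if (rel_slope(acceptor) < tol && rel_slope(donor) >= tol) "donor"
  else stop("Could not unambiguously identify the titrated channel.")
}
```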

Important raw data points are not plotted

Not really a bug, more of a design problem.

The inspect_raw_data() function won't plot data points where raw_data$concentration is NA, which is a problem because the calculation of the donor bleed-through (in correct_fret_signal()) relies on such data points.
On the other hand, setting raw_data$concentration to 0 for these points is easy to do when preparing the data file, but impractical to handle properly in the code...

I encountered a case where obvious outliers existed in these data points (including a point where detection saturated), and they were not detected by inspection of the control plots (because not plotted).
These points were used to calculate the donor bleed through (Xd), which ultimately gave a Kd = 26.72 +/- 9.98.
Excluding them in the downstream analysis did not change the Kd significantly, giving Kd = 26.86 +/- 10.04.

In some cases, depending on the values of the "good" points and outliers, keeping outliers might significantly alter Kd (?).

Anyway, it would be good to offer a possibility to spot outliers even outside of the titration series, so the user has an opportunity to exclude these points if they decide so.

guess_quadratic_parameters is too simplistic

I have at least one dataset (from March 17th) that yields a guess of kd = 250 using guess_quadratic_parameters(), whereas the real value is 26. This seems to be a problem for nls(), which cannot fit properly when the first guess is too far from the actual value.

Error from nls():

NaNs produced

Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model

7. numericDeriv(form[[3L]], names(ind), env)
6. getRHS()
5. assign("rhs", getRHS(), envir = thisEnv)
4. assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)), envir = thisEnv)
3. (function (newPars) { setPars(newPars) assign("resid", .swts * (lhs - assign("rhs", getRHS(), envir = thisEnv)), ...
2. stats::nls(formula = equation, data = fret_corrected, start = parameters) at fit_binding_model.R#44
1. fit_binding_model(dat_Mar17_corr, donor_concentration = 10, params_Mar17)

I don't know how to solve this problem. There is no easier guess for kd than taking the concentration at half-maximum FRET signal. Maybe the call to nls() can be modified to use a more robust algorithm? Maybe we need to use nls2 instead?

Practical examples for using nls2:
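A minimal sketch of the nls2 brute-force approach (assuming the hyperbolic model for brevity; the quadratic model works the same way): nls2 evaluates every row of a starting grid, returns the best one, and regular nls() can then refine it.

```r
library(nls2)

# Simulated hyperbolic data with kd = 26.
conc <- c(0.5, 1, 5, 10, 50, 100, 500)
fret <- 0.1 + 0.8 * conc / (26 + conc)
d <- data.frame(conc = conc, fret = fret)

# Coarse grid of candidate starting values, spanning orders of magnitude.
grid <- expand.grid(kd = c(1, 10, 100, 1000),
                    signal_min = c(0, 0.5),
                    signal_max = c(0.5, 1))
rough <- nls2(fret ~ signal_min + (signal_max - signal_min) * conc / (kd + conc),
              data = d, start = grid, algorithm = "brute-force")

# Refine the best grid point with a regular nls() fit.
fit <- nls(fret ~ signal_min + (signal_max - signal_min) * conc / (kd + conc),
           data = d, start = coef(rough))
```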

Examples must use a provided dataset

The "examples" section of function documentation must show code that evaluates without error (no dummy variable names, etc.), otherwise devtools::check() issues an error.

This is related to #9 because it also needs an example dataset.

Migrate from base to tidyverse

This is mostly polishing.

All occurrences of lapply and mapply could be rewritten using map functions from the purrr package.

Add processing of FP data

Fluorescence polarization data are easier to process (only replicate averaging, no signal correction), but the column names are different. It should be straightforward to process them the same way FRET data are currently processed (any number of files).

The processing code should detect polarization or anisotropy or both, and apply the replicate averaging on both.

The user should be given the choice to use polarization or anisotropy data (if both are present in the file) in the curve fitting procedure.

average_technical_replicates can only handle 2 replicates

The average_technical_replicates() function only handles 2 replicates, as of commit d33eabc. Ideally, it should be able to handle an arbitrary number of replicates.

This probably involves a function of the apply family, since we would have to iterate over a vector of arbitrary (and unpredictable) length containing as many description words as there are technical replicates in the experiment.

Submit to CRAN

This will be submitted to CRAN eventually, when the current issues are addressed, the interface is stable, and the documentation is complete.

batch_process doesn't check filesystem paths

The function batch_process doesn't check anything on the filesystem paths it works with. I am not sure how it performs when provided with an invalid path, or whether the current implementation is portable.

Improve inspect_raw_data

It would be useful to make inspect_raw_data work from the output of format_data, instead of on raw files. This way, only format_data would depend on the format of actual raw files, and all other functions can rely on a stable format output by format_data.
