vincentalcazer / stataid

StatAid is a Shiny app built in an R package to guide researchers and clinicians through data analysis

License: Other

R 99.31% HTML 0.07% TeX 0.62%

data-analysis r statistics life-sciences

stataid's Introduction

Welcome to StatAid

StatAid is free, open-source software provided as an R package that lets clinicians and researchers perform statistical analyses through an intuitive graphical interface. It is developed in R using the Shiny package. Golem is used for package building and deployment.

The software guides the users through the steps of a good data analysis, including multiple features such as:

Exploratory data analysis: distribution, count, missing-values and outliers check
Descriptive analysis, simple comparative analysis and publication ready ‘table 1’ output
Publication-ready graph customization
Paired data analysis (matched case-control studies, repeated measures)
Univariate analysis and models for continuous and categorical outcomes: correlation, linear and logistic regression
Univariate analysis and models for time-dependent outcomes: Kaplan-Meier curves and Cox regression
Multivariate analysis and models for continuous, categorical and time-dependent outcomes
ROC curves

Getting started

Online version

StatAid has a ready-to-use online version available at https://vincentalcazer.shinyapps.io/StatAid/.

Local version

You can install the development version from GitHub either by cloning the repository or directly by downloading the package in R:

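# Install the remotes helper package (only needed once)
install.packages("remotes")
# Install the development version of StatAid from GitHub
remotes::install_github("VincentAlcazer/StatAid")

# Launch the Shiny app locally
StatAid::run_app()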

Quick-start user guide

If you are not familiar with StatAid or just want an overview of what it can do, check StatAid's quick-start user guide.

Citing StatAid

If you found StatAid useful and used it for your research, please cite the paper published in the Journal of Open Source Software.

Troubleshooting and contribution

All troubleshooting and contributions can be found on the GitHub page.

Bug report

If you encounter any problem with the software or find a bug, please report it on GitHub:

Create a new issue on the GitHub page
Try to describe the problem/bug with reproducible steps

Feature request

To request a new feature or an enhancement of an existing one:

Create a new issue on the GitHub page
Briefly describe the research question you want to answer and the type of data you have
If possible: provide pictures of the graph you would like to make or references from the paper you saw the analysis in.

Contribution proposal

Contributions to new features or code enhancement are welcomed by creating a new pull request.

stataid's Issues

JOSS Review

Hello, Thanks for submitting this to JOSS and inviting me to review. I found the software very useful. Very good work. Congratulations. I am adding a few comments for this below:

I am not able to see the page correctly: there is an overlap between the mobile/drag-and-drop menu on the right and the table area in the Data Loading tab (I am using a Windows system). Any thoughts about this? Can the table be placed a little lower?
I tried to run this app without any data upload: it shows me this error for all combinations of dropdown items selected when running the Descriptive Table analysis in the Data Exploration tab. Am I selecting the wrong options?
Let me know if I am doing anything wrong here: when I try to run GAM and LOESS for two numeric variables, I get this error.
Please add Community Guidelines to the README.md file: stating how to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support.
I see that the description and walk-through are well written inside the application interface. It would be very helpful for users outside medical/clinical research to be able to use this package for their own research. A descriptive README.md containing example usage and functionality documentation would be very helpful, and I strongly suggest writing one in your repository. Also, as a general comment: would it be possible to add a description of the columns of the pre-loaded dataset, to better understand how the package works?
Is there any other existing software similar to StatAid? It would be useful to add a comparison here.
You can also list the academic papers in the API Documentation/ README.md where StatAid has been cited. As stated in the paper.md: "It has already been used by multiple researchers and students for their PhD or medical doctors for their thesis."
I think the good practice of writing cleaner, consistent code can be implemented here using these wonderful resources (I too found them during my paper review): limiting lines to 80 or 120 characters and keeping spacing consistent, see http://adv-r.had.co.nz/Style.html and https://devguide.ropensci.org/building.html#code-style.
I would also suggest running a check with devtools::check(). I came across three warnings. To solve those, you can add some of the build-ignore files into this file. There is also an import-from-namespaces issue/warning, which can probably be solved using this.
A minor addition in the installation here would be more clear:

# install.packages("remotes")
# remotes::install_github("VincentAlcazer/StatAid")
# StatAid::run_app()

I would recommend adding more software usage to the paper.md file.
I would also suggest adding a few more automated tests beyond the ones recommended by golem. Please let me know if they are already there and I missed them.
I downloaded the data sets and found that the IRIS data set was not giving the desired results: I was not able to select any numeric column for regression and correlation, or to run lm/gam/loess. Are there any other checks to be done before entering new data into the software, other than those mentioned in the software's Introduction tab?

Let me know if you have any further questions.
Thanks again,
Stay Safe and Regards
Adithi

JOSS Review

Hi @VincentAlcazer,

Thank you for your submission to JOSS and for inviting me to review. I think StatAid is a great app, and I wish I had come across it sooner!!

Here are my comments and suggestions:

Regarding the app

Introduction
a. The text bleeds off of the grey content background (noticed both on Chrome and Safari browsers):

Data loading
In addition to the sample dataset already provided, I was able to explore the app using this penguin dataset as well. It uploaded and worked well.

a. Consider emphasizing the sentence describing the file types supported by your app, so it stands out for new users: e.g. make the "...your dataframe should be saved as a tab (.txt/.tsv), comma or semicolon (.csv) delimited file" bold or italic or add some sort of emphasis on this key point!

b. Regarding the example dataset: In the Readme or vignette/tutorial, consider giving a slightly more detailed explanation of each variable. Also, consider your target audience....if it's worldwide beyond France, you should think about changing the variable name "Sexe" to "Sex".

c. Enable horizontal scrolling in the table view because columns are otherwise cut off to the right:

They remain cut off even when I zoom out a lot:
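
Regarding the supported file types in point a, here is a minimal base-R sketch of how a tab-, comma- or semicolon-delimited file could be read. The helper name `read_delimited` and the delimiter-guessing heuristic are assumptions made for illustration and are not StatAid's actual loading code.

```r
# Hypothetical helper (not part of StatAid): guess the delimiter from the
# header line, then read the file with base R.
read_delimited <- function(path) {
  header <- readLines(path, n = 1, warn = FALSE)
  candidates <- c("\t", ";", ",")
  # Count how often each candidate delimiter occurs in the header line
  counts <- vapply(
    candidates,
    function(d) lengths(regmatches(header, gregexpr(d, header, fixed = TRUE))),
    integer(1)
  )
  sep <- candidates[which.max(counts)]
  # Assumption: semicolon-delimited files use a decimal comma (as read.csv2 does)
  dec <- if (sep == ";") "," else "."
  read.table(path, header = TRUE, sep = sep, dec = dec,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Usage: write a small semicolon-delimited file and read it back
tmp <- tempfile(fileext = ".csv")
writeLines(c("Age;Sex;Weight", "34;F;61,5", "51;M;80,2"), tmp)
df <- read_delimited(tmp)
str(df)
```

Checking the result with `str()` before analysis helps confirm that numeric columns were not read as text.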

Data exploration
a. In "Categorical variables" exploration, there is a very useful "Summary table". It's showing the breakdown by only one variable, however, and not both variables plotted (for e.g. no FAB in the screenshot below):

b. In the Descriptive table, I think it's great and I appreciate you put in a short explanation of the methods used. Suggestions for the methods paragraph, below, are:

Methods : Categorical variables are expressed as n (%) and compared with the Chi-squared test or its non-parametric alternative Fisher's test with simulated p-values. Numerical variables are expressed as mean (standard-derivation) or median [IQR] and compared with else Welch's t-test (or its non-parametric alternative Wilcoxon's rank-sum test) or ANOVA (or its non-parametric alternative Kruskal-Wallis test) where appropriate. Variables with >80% missing are removed from the analysis. P-values are not adjusted.

Change standard-derivation to standard deviation, IQR to interquartile range, and else to either above.

c. In custom graphs, the text might be overlaid on the graph itself...so possibly have an option to increase Y-axis limits? I tried increasing the Y axis limits, but didn't see a change in the plot:
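
For readers who want to look up the tests named in the methods paragraph under point b, the following base-R sketch runs each of them on the built-in `mtcars` data. The choice of dataset and variables is an assumption for illustration; it does not show how StatAid calls these functions internally.

```r
# Illustration on the built-in mtcars data (not the StatAid example dataset)
dat <- transform(mtcars, am = factor(am), cyl = factor(cyl))

# Categorical vs categorical: Chi-squared test, or Fisher's test with
# simulated p-values
tab <- table(dat$am, dat$cyl)
p_chisq <- suppressWarnings(chisq.test(tab))$p.value
set.seed(1)  # simulated p-values are random, so fix the seed
p_fisher <- fisher.test(tab, simulate.p.value = TRUE, B = 2000)$p.value

# Numerical vs two groups: Welch's t-test (the t.test default) or
# Wilcoxon's rank-sum test
p_welch <- t.test(mpg ~ am, data = dat)$p.value
p_wilcoxon <- suppressWarnings(wilcox.test(mpg ~ am, data = dat))$p.value

# Numerical vs more than two groups: ANOVA or Kruskal-Wallis test
p_anova <- summary(aov(mpg ~ cyl, data = dat))[[1]][["Pr(>F)"]][1]
p_kruskal <- kruskal.test(mpg ~ cyl, data = dat)$p.value

round(c(chisq = p_chisq, fisher = p_fisher, welch = p_welch,
        wilcoxon = p_wilcoxon, anova = p_anova, kruskal = p_kruskal), 4)
```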

Paired-data analysis section
a. In the data format, you state: "each row correspond to an observation (sample/patient), each selected variable/column to a different timepoint of the same variable (e.g. Concentration_day0, concentration_day15, concentration_day30...)"

The example dataset you provided doesn't have columns corresponding to the names above. I know users will have their own datasets with paired/time-dependent columns, but for the sake of a good explanation, your sample dataset should have columns which can be used in this case, and whose column names are identifiable in the example you give above. Doing so will be much more intuitive for new users.

Univariate analysis
a. I could run the linear model, but not GAM or LOESS in the Continuous Univariate analysis section:

The error I get is:
```
Warning: Error in eval: object 'Age' not found
```
I'll check to see if there's an outdated package at my end causing this issue. If so, others might come across this issue as well.
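
For comparison, here is a minimal sketch of fitting the same three models outside the app. In R, an "object not found" error during model fitting generally means a variable in the formula could not be located in the supplied data or the calling environment; whether that is the cause inside StatAid is an assumption. The simulated data and the column names `Age` and `Score` are hypothetical.

```r
# Simulated data; the column names Age and Score are hypothetical
set.seed(42)
df <- data.frame(Age = runif(100, 20, 80))
df$Score <- 50 + 0.3 * df$Age + rnorm(100, sd = 5)

# Build the formula from column names, as an app with drop-down menus might
f <- reformulate("Age", response = "Score")

# Passing `data = df` explicitly lets each fitting function find `Age`
fit_lm <- lm(f, data = df)
fit_loess <- loess(f, data = df)

# mgcv is a recommended package; gam() needs a smooth term such as s(Age)
if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  fit_gam <- gam(Score ~ s(Age), data = df)
}

# Predictions from the LOESS fit at a few ages
pred <- predict(fit_loess, newdata = data.frame(Age = c(30, 50, 70)))
pred
```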

Points on the paper, documentation, and tests
a. Your statement of need says "The package provides all the tools necessary to perform a complete data analysis in an intuitive ready-to-use graphical interface. Any user with no coding skill or no software with paid-license available can easily perform all the steps of a good statistical analysis, from data-exploration/quality check to multivariate modeling."

I think this is an excellent application that provides essential and core statistical analytical tools, however, I don't think it provides "all the tools for a complete data analysis". I'm sure you'll agree that there are many different types of analyses that suit different situations (and have various assumptions), for e.g. mixed-effects models might in many cases be more appropriate than standard linear regression models in health data analyses. It's great that you later specify that users can request new features (further adding emphasis to the point above that not all tools are included) - I think this evolving aspect, in combination with the existing features make it a solid package. I therefore recommend you to update your Statement of need to reflect your package's use better.

While I see how useful this application can be (and believe it's been used as you say), I also agree with Adithi regarding prior use of this software: if you state that it's been used by researchers and doctors, it helps to have a citation, else it's not verifiable.

Btw, regarding other apps that do similar things, I came across other Shiny apps that provide various aspects of statistical analysis and visualization, for e.g. https://pkgs.rsquaredacademy.com/, but your app is unique in that it incorporates various steps of the analytical workflow in one package itself.

b. In both the paper and README, under the features section, there is the point "Paired data analysis (case-control studies, repeated measures)". Case control studies are a type of observational study and it's not necessary that paired data analysis will always be used for this kind of study (for e.g., even logistic regression can be used, depending on the type of case-control study, matched pairs vs not, etc).

c. One important thing about the analyses you've included: there are no references about functions from various packages (or base R) that are used for statistical analyses, nor any references about the statistical methods themselves. The Contact/About page has the full package list though, and it would be a good place to add summarized information about specific functions used to perform various tests: e.g. glm(), coxph(), gam(), cor.test(), and any others you used. Your users can then look these functions up for further references and also get an idea of default function options and any other pertinent information. This could be crucial for people considering using your package for their research and publications.

d. In line with Adithi's suggestion, I recommend adding more examples to your documentation. Beyond making it easier for users to grasp what your package is doing and enabling them to see what's possible, it is also a requirement for JOSS submission that we have to check for. The addition of a short description of the dataset helped immensely, and adding more examples to the README would be useful while you work on your detailed tutorial on the side.

e. Regarding tests, you've got three tests pertaining to the app itself (as seen in test-golem-recommended.R and when devtools::test() is run). There aren't any tests for the functions in Functions.R themselves. It's recommended to have tests for functions in your package, especially as it starts adding more features and becomes more complex. They'll also lend further credibility to your package. A useful resource for this would be https://mastering-shiny.org/scaling-testing.html. I suggest adding a few more tests, as also recommended by Adithi, while understanding that it's an iterative process which will evolve even after the review is finished.
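
As an illustration of point e, a unit test for a helper function could look like the sketch below, assuming the testthat package is installed. The function `format_mean_sd` is a hypothetical stand-in; it is not taken from Functions.R.

```r
library(testthat)

# Hypothetical helper standing in for a function from Functions.R:
# formats a numeric vector as "mean (sd)", dropping missing values
format_mean_sd <- function(x, digits = 1) {
  x <- x[!is.na(x)]
  paste0(formatC(mean(x), format = "f", digits = digits),
         " (", formatC(sd(x), format = "f", digits = digits), ")")
}

test_that("format_mean_sd formats mean and standard deviation", {
  expect_equal(format_mean_sd(c(1, 2, 3)), "2.0 (1.0)")
})

test_that("format_mean_sd ignores missing values", {
  expect_equal(format_mean_sd(c(1, 2, 3, NA)), format_mean_sd(c(1, 2, 3)))
})
```

In a package, such a file would typically live under tests/testthat/ and be run with devtools::test().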

I've made minor typographical corrections to your app interface, documentation and paper, which you can see via my pull request. It was easier to do it like that than mention them separately.

Some suggestions which you could keep in mind as your package evolves. You don't need to incorporate them now, but it's food for thought:

a. X axis labels can go outside of plot margins if the labels are really large (for e.g. with the penguin dataset). Ideally the labels won't be so big, but just bringing this to your notice.

b. Using color palettes suitable for the colorblind people amongst us. Right now the plots will appear as below to people with Deuteranopia (red-green colorblindness, which is the more common form):

It's difficult to choose a colorblind friendly palette when there are many categories, but as seen in the topmost image, a palette could be chosen such that it is more easily distinguishable when there are fewer categories.

c. A way to export the tables and graphs would be great.

d. The figure options menu on the right overlaps with the plots depending on the zoom level:
At 100% zoom:

At 90% zoom:
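
On suggestion b (colorblind-friendly palettes), one option is the Okabe-Ito palette that ships with base R (version 4.0.0 or later). The sketch below uses base graphics and the built-in `iris` data; how StatAid builds its plots is not shown here, so the plotting code is an assumption for illustration.

```r
# Okabe-Ito palette, built into base R (>= 4.0.0)
okabe_ito <- palette.colors(n = 8, palette = "Okabe-Ito")

# Base graphics example on the built-in iris data
plot(iris$Sepal.Length, iris$Petal.Length,
     col = okabe_ito[as.integer(iris$Species) + 1],  # skip black (first colour)
     pch = 19, xlab = "Sepal length", ylab = "Petal length")
legend("topleft", legend = levels(iris$Species),
       col = okabe_ito[2:4], pch = 19)

# With ggplot2 (assuming it is installed), the same colours can be supplied via
# scale_colour_manual(values = unname(okabe_ito[2:4])), or the viridis scales
# such as scale_colour_viridis_d() can be used instead.
```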

Please let me know if you have any questions on my comments above. I think this app is really good, and I can definitely see people using it to explore and work with their data.

Merci,
Nistara

Survival estimates at time points on KM curves

Report survival estimates at given time points on KM curves, e.g. 2- and 5-year OS and PFS, with 95% CI and p-values.
Thank you!
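
Until such a feature exists in the app, estimates of this kind can be obtained directly in R with the survival package. This is a minimal sketch on the built-in `lung` dataset, not StatAid code; because follow-up in `lung` is shorter than 5 years, the time points used here are 1 and 2 years.

```r
library(survival)  # recommended package shipped with R

# Kaplan-Meier fit by sex on the built-in lung dataset (time in days)
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival estimates with 95% CI at 1 and 2 years
est <- summary(fit, times = c(365, 730))
km_table <- data.frame(
  group    = est$strata,
  time     = est$time,
  survival = round(est$surv, 3),
  lower95  = round(est$lower, 3),
  upper95  = round(est$upper, 3)
)
km_table

# Log-rank test comparing the two curves
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
p_logrank <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
p_logrank
```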
