diyabc / diyabcgui

User interface to DIYABC/AbcRanger

Home Page: https://diyabc.github.io/

License: Other

R 99.95% Shell 0.05%

population-genetics approximate-bayesian-computation

diyabcgui's Introduction

Graphical User Interface for DIYABC-RF software

Disclaimer: DIYABC-RF GUI is in its final development stage. You may still encounter a few bugs. If you run into a problem, please check the open issues or file a new one.

We provide a graphical user interface (GUI) for the DIYABC-RF software [1], called DIYABC-RF GUI.

Note: DIYABC-RF GUI replaces the old interface DIYABC V2.1, which is no longer maintained.

Please check the project website for additional information and detailed documentation.

Availability

DIYABC-RF GUI is available as a standalone application, or as a Shiny web app implemented in the diyabcGUI R package.

You can either install and run the standalone app, or install the diyabcGUI R package and run DIYABC-RF GUI as a standard Shiny app (see below).

DIYABC-RF GUI provides a set of tools implementing Approximate Bayesian Computation (ABC) combined with supervised machine learning based on Random Forests (RF), for model choice and parameter inference in the context of population genetics analysis.

DIYABC-RF GUI (and the package diyabcGUI) is a user-friendly interface to the command-line programs diyabc and abcranger, which are the building blocks of the DIYABC-RF pipeline.

Authorship and licensing

The DIYABC-RF GUI software is edited by the DIYABC-RF Core team.

DIYABC-RF Core team: François-David Collin, Ghislain Durif, Louis Raynal, Mathieu Gautier, Renaud Vitalis, Eric Lombaert, Jean-Michel Marin, Arnaud Estoup

The Windows DIYABC-RF GUI standalone app is based on DesktopDeployR by Wyming Lee Pang (https://github.com/wleepang/DesktopDeployR).

See the dedicated file for detailed copyright and licensing information.

Using and citing DIYABC-RF

If you use the DIYABC-RF software suite (GUI or CLI) in your study, please consider citing [1].

Installation

Requirements

zip program

Standalone app

For Windows users:

1. Download the latest release of DIYABC-RF GUI from https://github.com/diyabc/diyabcGUI/releases/latest and unzip DIYABC-RF_GUI_<latest_version>.zip.
2. To launch DIYABC-RF GUI, run DIYABC-RF_GUI (or DIYABC-RF_GUI.bat) in the extracted directory, either by double-clicking it or from a terminal. You can also create a shortcut to it by right-clicking on it.
3. A new tab opens in your web browser, where you can use DIYABC-RF GUI as a web app.

Important: when you are done, quit the app with the dedicated button; otherwise some related background processes will remain active. Repeat steps 2 and 3 to launch the application again.

A log file for DIYABC-RF GUI is available in your user-specific directory for temporary files, generally C:\Users\<username>\AppData\Local\Temp\DIYABC-RF_GUI_<timestamp>_<random_number>/.

If you want to work on several DIYABC-RF projects at once, open several instances of DIYABC-RF GUI at the same time (i.e. repeat steps 2 and 3 for each project).

At the moment, the standalone app is not available for Linux and MacOS users. Nonetheless, Linux and MacOS users can install the diyabcGUI package (see below) and run DIYABC-RF GUI as a standard Shiny app.

Note: if the standalone app is unstable on your system, we recommend installing and using the Shiny app available in the diyabcGUI R package (see below).

R package installation

Install devtools package (if not installed on your system)

install.packages("devtools")

Note: if you encounter any issue when installing devtools, please see the section "Potential issue with devtools" below.

Install diyabcGUI package

devtools::install_github(
    "diyabc/diyabcGUI",
    subdir = "R-pkg"
)

After installing the package for the first time, you need to download the required binary files (e.g. the diyabc and abcranger command-line tools) by running

library(diyabcGUI)
diyabcGUI::dl_all_latest_bin()

Note: you can run this command from time to time to update the required binary files in case new versions were released.

Launch the interface

library(diyabcGUI)
diyabcGUI::diyabc()

The function diyabc() launches DIYABC-RF GUI as a standard Shiny web app, which you can use either in your web browser or in the RStudio Shiny app viewer.

To run multiple instances of DIYABC-RF GUI simultaneously, e.g. to manage and run several projects at once, call the function diyabc() several times from R (this is not possible from RStudio).

Potential issue with devtools

You may encounter issues when installing devtools. If so, please check the official devtools page.

Following devtools recommendations, make sure you have a working development environment.

Windows: Install Rtools.
Mac: Install Xcode from the Mac App Store.
Linux: Install a compiler and various development libraries (details vary across different flavors of Linux).

For Ubuntu users, a guide to installing the devtools requirements is available (users of other Linux distributions may still find it useful).

Shiny server installation

As a Shiny app, DIYABC-RF GUI can be installed and run from a Shiny server. To do so on a Unix system (please adapt for a Windows server), you need to:

Install the diyabcGUI package on your system, see above.
Manage the file access rights so that the Shiny server has access to the R package installation directory.
Create a symbolic link to the directory given by the R command system.file("application", package = "diyabcGUI") inside the site_dir folder configured in /etc/shiny-server/shiny-server.conf (by default /srv/shiny-server), e.g.:

ln -s /path/to/R_LIBS/diyabcGUI/application /srv/shiny-server/diyabc

DIYABC-RF GUI is now available on your server at https://my.shiny.server.address/diyabc

Standalone build (for developers)

Please see the dedicated directory for instructions on building the standalone app.

Reference

[1] Collin F-D, Durif G, Raynal L, Gautier M, Vitalis R, Lombaert E, Marin J-M, Estoup A (2021). Extending Approximate Bayesian Computation with Supervised Machine Learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest. Molecular Ecology Resources, Wiley/Blackwell, 21(8), pp. 2598–2613. doi:10.1111/1755-0998.13413, hal-03229207

diyabcgui's Issues

Windows standalone build

1. Run build_standalone.R (based on R package electricShine)
2. If step 1 does not work, try running build_standalone_windows.R (based on R package RInno)

Allow user to provide existing project as zip

When setting up an existing project, allow the user to provide either individual config files or a zip archive of the existing project

[project setup] Load and parse datafile in a background process

To avoid the interface becoming unresponsive while loading and parsing large SNP data files

Problem when reading tab-delimited SNP data files when sex is unknown

Potential source of the problem:

tab delimitation
unknown sex (value = 9)
other?

Following error:
"Issue with data file.
Issue with IndSeq SNP data file content: 'SEX' column should contain only 'F' for female or 'M' for male (see manual)."
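
As a starting point for the fix, the check could be sketched as follows in R. This is only an illustration under assumptions: the helper name check_sex_column() and the example table are hypothetical, and treating '9' as a valid "unknown sex" code is taken from this issue, not from the current package behavior.

```r
# Hypothetical helper: validate a SEX column, optionally accepting "9" (unknown).
check_sex_column <- function(sex, allow_unknown = TRUE) {
    allowed <- c("F", "M", if (allow_unknown) "9")
    sex <- trimws(as.character(sex))
    bad_rows <- which(!(sex %in% allowed))
    list(valid = length(bad_rows) == 0L, bad_rows = bad_rows)
}

# Made-up tab-delimited content, for illustration only.
example_txt <- "IND\tSEX\tPOP\tA\nind1\tF\tP1\t0\nind2\t9\tP1\t2\nind3\tM\tP2\t1"

# sep = "" splits on any whitespace, so tabs and spaces are handled alike.
# colClasses = "character" prevents R from converting "F" to the logical FALSE.
example <- read.table(text = example_txt, header = TRUE, sep = "",
                      colClasses = "character")

check_sex_column(example$SEX)
```

Reading every column as character is a deliberate choice here: with automatic type conversion, a SEX column made only of 'F' values could be read as logical.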

Write package vignette

Possible vignettes:

full manual
output figure generation

[RF] For MSS data list of possible parameters to estimate is missing in RF section in parameter estimation mode

With microsatellites and/or sequences, in the random forest section, the list of parameters does not seem to be given when running a parameter estimation (it displays "missing parameter"). It still works, though: one can estimate whichever parameter one wants. Only this reminder is not displayed (whereas it does appear in my tests with SNPs).
However, this is not a trivial issue, because it is impossible to find the actual names of the mutational parameters from the standalone app. One has to dig into the header to find them, which many people will not know how to do. I suspect, by the way, that the mutational parameters are the reason it does not work for microsatellites/sequences while it works for SNPs.

Fix default value for group priors

Problem with default value for group priors + add note on some of the priors + hide some mutation model if not used.

Download only diyabc/abcranger binaries related to the user OS

When running dl_latest_bin() or dl_all_latest_bin(), the binaries are downloaded for every OS, including Windows, MacOS and Linux.

A better behavior would be to only download the binary related to the current OS where the command is run.
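
A minimal sketch of that behavior in R, under stated assumptions: the helper names current_os() and select_os_assets() and the asset file names are hypothetical, and the real release assets may follow a different naming scheme. Only Sys.info() is standard R.

```r
# Hypothetical helper: map the value reported by Sys.info() to an OS tag.
current_os <- function(sysname = Sys.info()[["sysname"]]) {
    switch(sysname,
        Windows = "windows",
        Linux = "linux",
        Darwin = "macos",
        stop("unsupported operating system: ", sysname)
    )
}

# Hypothetical helper: keep only the assets whose name contains the OS tag.
select_os_assets <- function(assets, os = current_os()) {
    assets[grepl(os, assets, ignore.case = TRUE)]
}

# Made-up asset names, for illustration only.
example_assets <- c(
    "diyabc-RF-linux-v1.0",
    "diyabc-RF-macos-v1.0",
    "diyabc-RF-windows-v1.0.exe"
)

select_os_assets(example_assets, os = "linux")
```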

[training set simu] Hide sequence mutation model parameters if not used in model

TODO: quick fix, add a help note

Add information about summary statistics

Add following information in the training set simulation panel:

For SNP loci (both IndSeq and PoolSeq SNPs)

WARNING! ALL SUMMARY STATISTICS IMPLEMENTED IN THE PROGRAM WILL BE COMPUTED AND INCLUDED IN THE TRAINING DATASET

For both IndSeq and PoolSeq SNP loci, the following set of summary statistics has been implemented.

1. Proportion of monomorphic loci for each population, as well as for each pair and triplet of populations (ML1p, ML2p, ML3p)

Mean and variance (over loci) values are computed for all subsequent summary statistics.
2. Heterozygosity for each population (HW) and for each pair of populations (HB)
3. FST-related statistics for each population (FST1), for each pair (FST2), triplet (FST3) and quadruplet (FST4) of populations, and over all populations (FSTall, when the dataset includes more than four populations)
4. Patterson’s f-statistics for each triplet (f3-statistics; F3) and quadruplet (f4-statistics; F4) of populations
5. Nei’s distance (NEI) for each pair of populations
6. Maximum likelihood coefficient of admixture (AML) computed for each triplet of populations.

For microsatellite loci

WARNING! ALL SUMMARY STATISTICS IMPLEMENTED IN THE PROGRAM WILL BE COMPUTED AND INCLUDED IN THE TRAINING DATASET

For microsatellite loci, the following set of summary statistics has been implemented.

Single sample statistics:
1. mean number of alleles across loci (NAL)
2. mean gene diversity across loci (HET)
3. mean allele size variance across loci (VAR)
4. mean M index across loci (MGW)

Two sample statistics:
1. mean number of alleles across loci (two samples) (N2P)
2. mean gene diversity across loci (two samples) (H2P)
3. mean allele size variance across loci (two samples) (V2P)
4. F_{ST} between two samples (FST)
5. mean index of classification (two samples) (LIK)
6. shared allele distance between two samples (DAS)
7. (δμ)² distance between two samples (DM2)

Three sample statistics:
1. Maximum likelihood coefficient of admixture (AML)

For DNA sequence loci

WARNING! ALL SUMMARY STATISTICS IMPLEMENTED IN THE PROGRAM WILL BE COMPUTED AND INCLUDED IN THE TRAINING DATASET

For DNA sequence loci, the following set of summary statistics has been implemented.

Single sample statistics:
1. number of distinct haplotypes (NHA)
2. number of segregating sites (NSS)
3. mean pairwise difference (MPD)
4. variance of the number of pairwise differences (VPD)
5. Tajima’s D statistics (DTA)
6. Number of private segregating sites (PSS)
7. Mean of the numbers of the rarest nucleotide at segregating sites (MNS)
8. Variance of the numbers of the rarest nucleotide at segregating sites (VNS)

Two sample statistics:
1. number of distinct haplotypes in the pooled sample (NH2)
2. number of segregating sites in the pooled sample (NS2)
3. mean of within sample pairwise differences (MP2)
4. mean of between sample pairwise differences (MPB)
5. F_{ST} between two samples (HST)

Three sample statistics:
1. Maximum likelihood coefficient of admixture (SML)

Errors encountered when install diyabc standalone on windows

Hi developers,
I tried to install the diyabc standalone app from RStudio (R version 4.0.3) using the "build_standalone.R" script provided in diyabcGUI-1.0.3.zip. However, the installation process failed with the following error:

Finshed: Installing your Shiny package into electricShine framework
Error in system.file("extdata", "icon", package = my_package_name, lib.loc = library_path) :
'package' must be of length 1

However, the package is listed in the package catalogue, so I tried running diyabcGUI::diyabc(). The GUI came up, but after I chose the data file it disappeared with the following error:

Loading required package: shiny
Listening on http://127.0.0.1:3375
[1] "project directory: C:\Users\admin\AppData\Local\Temp\RtmpIfo76m\diyabc56d876745c24"
Warning: Error in stri_c: can't find 'content'
50: stri_c
49: str_c
47: check_mss_data_file
46: check_data_file
45:
2: shiny::runApp
1: diyabcGUI::diyabc

Could you help me figure out the problem? I am planning to analyze my mss data with your work.

My system is Windows 10 x64, 32 GB RAM, 6 cores.

Thank you very much in advance~

Siran

P.S. I also found DIYabc v2.1.0 and v2.0.4 can't run anymore.

[training set simu][MSS] Add notes on microsat/sequence mutation model parameter setting

TODO like in the old interface:

Add specific abcranger output prefix to allow different runs in a single project

Include the parameter name or the candidate models in the output prefix of each abcranger run.

abcranger option to do so:

  -o, --output arg        Prefix output (modelchoice_out or estimparam_out by
                          default)

Interest: run multiple parameter estimation or multiple model choice procedure in a single project

[project setup] Fix data info for PoolSeq data set

Use following output for data info in case of PoolSeq data

Number of population pools : 4
Total number of loci = 30000
MRC=5 (forget MAF)
Number of loci available with MRC >= 5: XXXXXXXXXXXX (from diyabc output)

Same for section Loci description

Recall under '<n_loci> ': Number of loci available with MRC >= 5: XXXXXXXXXXXX

[training set simu][SNP mode] Missing possibility to manage multiple groups of SNP IndSeq loci

In the interface, it is not possible to add and manage multiple groups of SNP IndSeq loci (especially for different types of chromosomes)

[datagen] Refactor "synthetic data generation" module

This module is currently disabled and should be redesigned (following the same framework as the training set simulation module).

Disable diyabc/abcranger log file cleaning

Purpose: users will be able to provide log file in case of bug

[historical model display] Issue when drawing historical model with multiple merge at same time

Example:

N1 N2 N3
0 sample 1
0 sample 2
0 sample 3
t merge 1 3
t merge 1 2

Fix license file

Binary files are not stored at the same place

Add a confirmation request when resetting a project

To avoid unintentional loss of work

[RF][graphical output] Add prior distribution (in dashed line) to posterior graph output

To be discussed

Graphical output generation v1.0.7

Graphical output: modelchoice_out_graph_lda.png is not computed (section 7.3 of the user manual)

[RF] Add a spinner when checking input in RF submodule

To be sure that something is happening

Bottom of body page background (white) is covered by sidebar background (black)

Example with bottom of "Diyabc-rf pipeline page":

Issue with shinydashboard? c.f. rstudio/shinydashboard#349

Windows build script asks for a CRAN mirror at first start

On a freshly installed Windows 10 machine where R has been installed but never launched:

franc@DESKTOP-QHPLHB6 C:\Users\Franc\Documents\dev\diyabcGUI>"c:\Program Files\R\R-4.0.2\bin\x64\R.exe" --no-save < build_standalone_windows.R

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # generate standalone interface for Windows
>
> # requirement
> install.packages("devtools")
Error in contrib.url(repos, "source") :
  trying to use CRAN without setting a mirror
Calls: install.packages -> contrib.url
Execution halted
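
The error above occurs because install.packages() is called in a non-interactive session where no CRAN mirror is configured. One possible fix, sketched here in R, is to set a mirror before installing. The helper name ensure_cran_mirror() and the default mirror https://cloud.r-project.org are choices made for this illustration, not part of the build script.

```r
# Hypothetical helper: set a CRAN mirror only if none is configured yet.
ensure_cran_mirror <- function(default = "https://cloud.r-project.org") {
    repos <- getOption("repos")
    cran <- if ("CRAN" %in% names(repos)) repos[["CRAN"]] else NA_character_
    # "@CRAN@" is the placeholder R uses when no mirror has been chosen.
    if (is.na(cran) || identical(cran, "@CRAN@")) {
        repos["CRAN"] <- default
        options(repos = repos)
    }
    invisible(getOption("repos"))
}
```

Calling ensure_cran_mirror() at the top of the build script, before install.packages("devtools"), should avoid the prompt. Passing repos = "https://cloud.r-project.org" directly to install.packages() is an alternative.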

[general] Fix check for binary file at startup

Fix file R-pkg/R/zzz.R

[project setup] Parallel IndSeq SNP data parsing on Windows

See https://www.r-bloggers.com/parallelsugar-an-implementation-of-mclapply-for-windows/

Bug in 2_historical_model_display.R?

Hi,

Should 'rev_tree_d' be 'rev_tree_df' at line 260?

Yulong

Automatic download/update of diyabc/abcranger latest release

Write a script to automatically download latest release of diyabc and abcranger

Linux standalone app build not available at the moment

Problem with electricShine

chasemc/electricShine#294

TODO: write a fix for electricShine

[RF][training set simu] Graphical output generation

Generate figure output, c.f. section 7.3 of the manual

[general] Improve console output

Replace print() calls to the console with message()

Problem: issue when printing an environment

Scenario numbering

In the training set simulation sub-module, in the "historical scenario definition" panel, when adding new scenarios and then removing some, the numbering still accounts for the deleted scenarios.

Fix doc pages available from inside the interface

In particular inst/help/hist_model_description.md

[prefs] Add a button in pref to download latest version of diyabc/abcranger binaries

Clicking this button would call dl_all_latest_bin()

[prefs] Add a button in preference to update diyabc/abcranger binaries

A button to trigger diyabcGUI::dl_all_latest_bin()

[RF] Allow to use abcranger without the original data set file if reftableRF and statobsRF files are provided

At the moment, the data set file is always mandatory for any project. Allow user to use abcranger i.e. the "RF-analysis" submodule without requiring to provide the data file if all required files are provided (headerRF.txt, reftableRF.bin and statobsRF.txt)

MacOS standalone app build

ToDo

check possible issue with electricShine

[param estim mode] Allow users to estimate multiple parameters

C.f. diyabc/abcranger#63

Users could estimate multiple parameters by specifying them with r;N1;NA;tb.

ToDo: manage output to avoid result files to be overwritten.
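
One way to keep result files from being overwritten is to derive a distinct output prefix per parameter and pass it to abcranger through its -o option. The R sketch below only illustrates the idea: the helper names split_parameters() and output_prefix() are hypothetical, and the base name estimparam_out is the default prefix reported in the abcranger help.

```r
# Hypothetical helper: split a "r;N1;NA;tb" style specification into names.
split_parameters <- function(spec) {
    params <- trimws(strsplit(spec, ";", fixed = TRUE)[[1]])
    unique(params[nzchar(params)])
}

# Hypothetical helper: build a file-system friendly prefix for one parameter.
output_prefix <- function(param, base = "estimparam_out") {
    paste0(base, "_", gsub("[^A-Za-z0-9]+", "-", param))
}

params <- split_parameters("r;N1;NA;tb")
prefixes <- vapply(params, output_prefix, character(1), USE.NAMES = FALSE)
prefixes
```

Note that "NA" is kept as a character string here (a parameter name), not as R's missing value.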

[training set simu] Add possibility to modify existing project

When using "existing project" and uploading existing files, it should be possible to edit the configurations (scenario, priors, etc.)

Check for full coalescence in historical scenario

When defining a scenario, a check for full coalescence should be done.

Example of non coalescent scenario which is currently not detected:

N1 N2 N3 N4
0 sample 1
0 sample 2
0 sample 3
0 sample 4
t1 merge 1 2
t2 merge 3 4
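
A possible check, sketched in R under simplifying assumptions: events are processed in the order they are written (time parameters and their priors are ignored), and only sample, merge and split events change the set of populations. The function names remaining_lineages() and is_fully_coalescent() are hypothetical, not existing package functions.

```r
# Hypothetical helper: populations still present after applying all events.
remaining_lineages <- function(scenario) {
    lines <- trimws(strsplit(scenario, "\n", fixed = TRUE)[[1]])
    lines <- lines[nzchar(lines)]
    active <- character(0)
    # The first line lists effective population sizes; events start at line 2.
    for (line in lines[-1]) {
        tok <- strsplit(line, "[[:space:]]+")[[1]]
        if (length(tok) < 3) next
        event <- tolower(tok[2])
        if (event == "sample") {
            active <- union(active, tok[3])
        } else if (event == "merge" && length(tok) >= 4) {
            # "t merge a b": going back in time, population b joins population a.
            active <- union(setdiff(active, tok[4]), tok[3])
        } else if (event == "split" && length(tok) >= 5) {
            # "t split a b c r": going back in time, a is replaced by b and c.
            active <- union(setdiff(active, tok[3]), tok[4:5])
        }
        # Other events (e.g. varNe) leave the set of populations unchanged.
    }
    active
}

# A scenario coalesces fully when a single population remains at the end.
is_fully_coalescent <- function(scenario) {
    length(remaining_lineages(scenario)) == 1L
}

example_scenario <- paste(
    "N1 N2 N3 N4", "0 sample 1", "0 sample 2", "0 sample 3", "0 sample 4",
    "t1 merge 1 2", "t2 merge 3 4",
    sep = "\n"
)
remaining_lineages(example_scenario)
```

On the example above, populations 1 and 3 remain unmerged, so the scenario would be flagged; adding a final merge of 3 into 1 makes it pass.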

[historical model display] Historical model drawing generates unreadable tree

Example:

Ne Nw Nlou Ngh Ne Nw
0 sample 1
0 sample 2
0 sample 3
tlou-DBlou VarNe 3 NFlou
tlou split 3 5 6 r
tge merge 1 5
tgw merge 2 6
tn merge 4 1
tn merge 4 2

[project admin] Log message when saving project

The zip command can take some time, and some feedback would reassure users

[training set simu] Freeze with complex scenarii (in particular merge to ghost pop)

I ran various tests with microsatellite data, including real analyses for which I already had results, so that I could make comparisons. Overall I saw no problems other than those mentioned previously.

However, I ran into a problem when I moved on to a slightly more complex analysis. Still on a microsatellite dataset (attached) containing 7 populations, the program froze when I tried the following scenario (note that this scenario works fine with the latest programs under Linux):

N1 NKo NKa NJ12 N2 N3 N4 Na
0 sample 1
0 sample 2
0 sample 3
0 sample 4
0 sample 5
0 sample 6
0 sample 7
tJ12-DBJ12 varNe 4 NJ12B
tJ12 VarNe 4 NgJ12
tgJ12 merge 8 4
tKa-DBKa varNe 3 NKaB
tKa VarNe 3 NgKa
tgKa merge 8 3
tKo-DBKo varNe 2 NKoB
tKo VarNe 2 NgKo
tgKo merge 8 2
t1 merge 8 1
t2 merge 8 5
t3 merge 8 6
t4 merge 8 7
ta VarNe 8 Naold

Edit1: no problem with R package on Linux, c.f. 86c39c2
Edit2: but simulations failed (because need of conditions over time parameters)

This scenario makes recurrent use of a ghost population (pop 8). I ran various tests, and it seems to me that merging into this ghost population several times is what causes the problem. If I do a single merge, it works. For example:

N1 NKo NKa NJ12 N2 N3 N4 Na
0 sample 1
0 sample 2
0 sample 3
0 sample 4
0 sample 5
0 sample 6
0 sample 7
tJ12-DBJ12 varNe 4 NJ12B
tJ12 merge 3 4
tKa-DBKa varNe 3 NKaB
tKa merge 2 3
tKo-DBKo varNe 2 NKoB
tKo merge 1 2
t1 merge 5 1
t2 merge 6 5
t3 merge 7 6
t4 merge 8 7
ta VarNe 8 Naold

As soon as I merge a second population into pop 8, it crashes.

That's all.

I will try to dig into other aspects, in particular by working with microsatellite/sequence datasets and sequence-only datasets. Then I will run tests on SNPs. But that will probably be after the summer break.

I can't find any bug in the manual, I don't understand if this is a memory issue or what is wrong with the program.
I would highly appreciate any advice or suggestions,
Best regards
Alice

[general] Modify management of project name and sub-projects

Setup main project name (mandatory and not optional as now)
Setup training set simulation sub-project name with a corresponding sub-directory
Setup analysis name for each analysis (to avoid over-writing) with a corresponding sub-directory
