bakaburg1 / baysren

BaySREn. An R package to automate citation collection and screening in systematic reviews, based on Bayesian active machine learning.

Home Page: https://arxiv.org/abs/2202.10033

License: Other

R 100.00%

baysren's Issues

Add a "get more results" session, integrating citation and similar articles to positive articles.

Once a session is fully labelled, relevant papers missed by the original search may be recovered starting from the set of relevant records:

- The papers cited by the relevant records. These can be collected manually by creating a .bib file, through packages like RefManageR, which can access Crossref, or directly through the Crossref API (e.g. https://api.crossref.org/works/10.1098/rstb.1999.0425), which lists references. However, the required data is not always complete: the abstract and the DOI may be missing from the citations.
- Related papers on PubMed, via RefManageR::GetPubMedRelated().
- More ideas?

The collected papers may represent a new session, to be labelled in the usual way.
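
As a rough illustration of the Crossref route, the sketch below extracts the cited references from an already parsed "works" response. It assumes the response was parsed into nested lists (for example with jsonlite::fromJSON(url, simplifyVector = FALSE)) and that references sit under `message$reference` with optional `DOI` and `article-title` fields. The function name `extract_crossref_refs()` and the mock record are hypothetical and not part of the package.

```r
# Minimal sketch: collect DOI and title of the references listed in a parsed
# Crossref "works" response. Missing fields become NA instead of failing.
extract_crossref_refs <- function(work) {
  refs <- work$message$reference
  if (length(refs) == 0) {
    return(data.frame(doi = character(0), title = character(0),
                      stringsAsFactors = FALSE))
  }
  # Return a single field of a reference, or NA when it is absent
  pick <- function(ref, field) {
    val <- ref[[field]]
    if (is.null(val)) NA_character_ else as.character(val)
  }
  data.frame(
    doi = vapply(refs, pick, character(1), field = "DOI"),
    title = vapply(refs, pick, character(1), field = "article-title"),
    stringsAsFactors = FALSE
  )
}

# Mock response (invented values) standing in for a real API call
mock <- list(message = list(reference = list(
  list(key = "ref1", DOI = "10.1000/example.1",
       `article-title` = "First cited paper"),
  list(key = "ref2", unstructured = "A citation without DOI or title")
)))
refs <- extract_crossref_refs(mock)
```

References with a missing DOI end up as `NA` rows, which makes the incompleteness mentioned above explicit and easy to filter before building a new session.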

Implement reproducibility in `enrich_annotation_file()`

A global random seed won't work: some models, like BART or brms, need a seed to be passed directly, and the seed also needs to change across models when the model ensemble is used.

The idea is to generate a new seed, starting from an original one, in every iteration of the ensemble.
The problem with this approach alone is that the seeds would be the same for each run of enrich_annotation_file().

So an even more general solution is needed.
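
One possible direction, sketched below under assumptions, is to derive the per-model seeds from both a base seed and a run counter, so that every run of enrich_annotation_file() gets different but reproducible seeds. The function `derive_seeds()` and its `run` argument are hypothetical; how the run index would be obtained (for example from the number of existing annotation files) is left open.

```r
# Minimal sketch: deterministic seeds for each model of the ensemble, which
# change with the run index but are identical for the same (base_seed, run).
derive_seeds <- function(base_seed, run, n_models) {
  # Save the caller's RNG state so that deriving seeds has no side effects
  has_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_state <- if (has_state) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (has_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })

  set.seed(base_seed)
  # Draw seeds for all runs up to the current one, then keep the last block
  seeds <- sample.int(.Machine$integer.max, run * n_models)
  seeds[seq.int((run - 1) * n_models + 1, run * n_models)]
}

seeds_run1 <- derive_seeds(base_seed = 42, run = 1, n_models = 3)
seeds_run2 <- derive_seeds(base_seed = 42, run = 2, n_models = 3)
```

Each derived seed could then be passed directly to the model call that requires it, instead of relying on a global set.seed().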

Implement number formatting in tables

The code is already implemented in the manuscript repo and should be ported from there.

In "Results" files, the formatting of the "New labels" indicator is wrong when no new labels are present

Probably a formatting error

Decrease dependency on external packages

At the moment the package imports 28 packages, which makes it vulnerable to changes in any of them.
As many packages as possible should be replaced by ad-hoc internal functions.

Implement consistent record limit in all search methods

At the moment only search_pubmed and search_ieee have an argument to limit the number of downloaded records, and they use different defaults (numeric() in the first and NULL in the second).
The same limit should also be implemented in search_wos.
Similarly, perform_search_session needs this argument so that it can be passed down to the search functions, but it's not clear to me whether it should be applied to API searches only or also when importing already downloaded records.
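
A shared helper could make the defaults consistent. The sketch below is only an illustration: `normalize_record_limit()` and `apply_record_limit()` are hypothetical names, and it assumes that both NULL and a zero-length value mean "no limit".

```r
# Minimal sketch: map every "no limit" convention to Inf and validate the rest
normalize_record_limit <- function(limit = NULL) {
  if (is.null(limit) || length(limit) == 0) return(Inf)
  limit <- suppressWarnings(as.numeric(limit[1]))
  if (is.na(limit) || limit < 0) {
    stop("The record limit must be a single non-negative number.")
  }
  limit
}

# Keep at most `limit` records from a data frame of results
apply_record_limit <- function(records, limit = NULL) {
  n <- min(nrow(records), normalize_record_limit(limit))
  utils::head(records, n)
}

records <- data.frame(ID = paste0("rec", 1:5), stringsAsFactors = FALSE)
limited <- apply_record_limit(records, 3)
unlimited <- apply_record_limit(records, numeric())
```

With such a helper, each search function could accept the same argument and default, and perform_search_session would only need to forward it.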

Remove duplicated IDs for the same record

For some reason, the ID field keeps collecting duplicated IDs. I need to understand where those are generated and fix the problem. I suspect this happens in create_annotation_file when importing previous session data.
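
Until the source of the duplication is found, the field could be cleaned after the fact. The sketch below assumes, purely for illustration, that multiple IDs are stored in one string separated by "; "; the real separator and the helper name `dedupe_ids()` are assumptions.

```r
# Minimal sketch: remove repeated IDs inside each element of an ID column,
# keeping the order of first appearance. The separator is an assumption.
dedupe_ids <- function(ids, sep = "; ") {
  vapply(strsplit(ids, sep, fixed = TRUE), function(parts) {
    # Preserve missing values instead of turning them into the string "NA"
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    paste(unique(trimws(parts)), collapse = sep)
  }, character(1))
}

cleaned <- dedupe_ids(c("a; b; a", NA, "c"))
```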

In "Result" files, the "Records to review" indicator shows "incorrect input"

Probably an error in the formatting functions

Manage the search functions returning no results for a given query

The function fails if the query returns no results, with message:

```
Error in UseMethod("transmute") : 
  no applicable method for 'transmute' applied to an object of class "NULL"
```

Should be easy to fix.
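
One way to handle this, sketched under assumptions, is to convert a NULL result into an empty data frame with the expected columns before any further processing. The helper name `or_empty_records()` and the column names are hypothetical.

```r
# Minimal sketch: return a zero-row data frame with the expected columns when
# a search yields nothing, so that downstream steps receive a data frame.
or_empty_records <- function(res, cols) {
  if (is.null(res) || NROW(res) == 0) {
    empty <- as.data.frame(
      matrix(character(0), nrow = 0, ncol = length(cols)),
      stringsAsFactors = FALSE
    )
    names(empty) <- cols
    return(empty)
  }
  res
}

no_results <- or_empty_records(NULL, cols = c("ID", "Title", "Abstract"))
some_results <- or_empty_records(data.frame(ID = "rec1"), cols = "ID")
```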

Change create_annotation_file to create_annotation_data

The function does not create a file, so the current name is a misnomer.

Using `search_ieee()` with webscraping produces harmless warnings

Although the function works, the following output is produced:

```
Unhandled promise error: argument is of length zero
Unhandled promise error: Invalid InterceptionId.. code: -32602
```

Understand the cause or suppress the warnings if not solvable.

`perform_search_session` fails if no `year_query` is used

```r
res <- perform_search_session(
  query = '(systematic review) AND ((heart failure) AND (COPD))',
  year_query = NULL, ## <- this line triggers the error!
  actions = c("API", "parsed"),
  sources = c("Pubmed"),
  session_name = "Session1",
  query_name = "Query1",
  records_folder = "Records",
  overwrite = TRUE,
  skip_on_failure = FALSE,
  journal = "Session_journal.csv"
)
```

returns:

```
Error in data.frame(Session_ID = session_name, Query_ID = query_name, : 
  arguments imply differing number of rows: 1, 0
```
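
The message suggests that a zero-length value reaches data.frame() when `year_query` is not set, although the exact cause inside the package is not confirmed here. A possible guard, sketched below, replaces empty values with NA before the row is built; the helper `na_if_empty()` and the column name `Year_query` are hypothetical.

```r
# Minimal sketch: turn NULL or zero-length values into NA so that every
# column passed to data.frame() has exactly one element.
na_if_empty <- function(x) {
  if (length(x) == 0) NA_character_ else as.character(x)
}

year_query <- NULL
journal_row <- data.frame(
  Session_ID = "Session1",
  Query_ID = "Query1",
  Year_query = na_if_empty(year_query),
  stringsAsFactors = FALSE
)
```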

Manage installation of cmdstanr or use rstanarm in the results' analysis.

estimate_performance() relies on a brms model built with cmdstan.
Therefore, a user needs to install both cmdstanr and cmdstan to make it work.

Either list cmdstanr and brms under Suggests and manage their installation, or switch to rstanarm if possible, which would also avoid compiling the Stan program at every run.
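
If the Suggests route is chosen, the function would need to check for the optional packages at run time. The sketch below shows one common pattern based on requireNamespace(); the helper name `check_suggested()` is hypothetical.

```r
# Minimal sketch: stop with an informative message when optional packages
# needed by a function are not installed.
check_suggested <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!installed]
  if (length(absent) > 0) {
    stop(
      "This function requires the following packages, which are not installed: ",
      paste(absent, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# "stats" ships with R, so this check passes
ok <- check_suggested("stats")
```

A call such as check_suggested(c("brms", "cmdstanr")) at the top of estimate_performance() would then fail early with a clear message.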

Add export/import of .bib files

Useful to interact with other bibliography tools.

Can be built manually or wrapping RefManageR or bibtex packages.

For import, the functionality can be implemented in the import_data() function.
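
For the manual route, export could look roughly like the sketch below. The column names (`ID`, `Title`, `Authors`, `Year`) and the function `records_to_bibtex()` are assumptions for illustration, and no escaping of special BibTeX characters is attempted.

```r
# Minimal sketch: build BibTeX @article entries from a data frame of records.
# Column names are assumptions; special characters are not escaped.
records_to_bibtex <- function(records) {
  entries <- sprintf(
    "@article{%s,\n  title = {%s},\n  author = {%s},\n  year = {%s}\n}",
    records$ID, records$Title, records$Authors, records$Year
  )
  paste(entries, collapse = "\n\n")
}

# Mock records (invented values)
records <- data.frame(
  ID = c("rec1", "rec2"),
  Title = c("First paper", "Second paper"),
  Authors = c("Doe, J.", "Roe, R. and Poe, E."),
  Year = c(2019, 2021),
  stringsAsFactors = FALSE
)
bib <- records_to_bibtex(records)
# writeLines(bib, "export.bib") would save the entries to a file
```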

In the "Results" files, some indicators need fixing

There are duplicated indicators, such as "New labels"/"New_labels", and there is "Total_labeled", whose name needs fixing.
