
bakaburg1 / baysren


BaySREn. An R package to automate citation collection and screening in systematic reviews, based on Bayesian active machine learning.

Home Page: https://arxiv.org/abs/2202.10033

License: Other

R 100.00%

baysren's People

Contributors

bakaburg1

baysren's Issues

Add a "get more results" session that collects the papers cited by, and the articles similar to, the positive records.

Once a session is fully labelled, papers that the search may have missed can be retrieved starting from the set of relevant records:

  • The papers cited by the relevant records, collected either manually by creating a .bib file, through packages like RefManageR, which can access Crossref, or directly through the Crossref API (e.g. https://api.crossref.org/works/10.1098/rstb.1999.0425), which lists the references of a work. The required data is not always complete, though: the abstract may be missing, and so may the DOI of a cited work.
  • Related papers on PubMed via RefManageR::GetPubMedRelated().
  • More ideas?

The collected papers may represent a new session, to be labelled in the usual way.
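
As a rough illustration of the Crossref route, the sketch below turns the `reference` list of a parsed Crossref "works" response into a data frame, filling missing fields with NA. The function name `extract_references()` is hypothetical and not part of the package, and the mock response (including its DOI) is invented for the example. The field names (`key`, `DOI`, `article-title`, `year`) follow the Crossref reference format.

```r
# Hypothetical helper: flatten the references of a parsed Crossref "works"
# message into a data frame. Missing fields (e.g. no DOI) become NA.
extract_references <- function(message) {
  refs <- message[["reference"]]
  pick <- function(ref, field) {
    value <- ref[[field]]
    if (is.null(value)) NA_character_ else as.character(value)[1]
  }
  data.frame(
    key   = vapply(refs, pick, character(1), field = "key"),
    doi   = vapply(refs, pick, character(1), field = "DOI"),
    title = vapply(refs, pick, character(1), field = "article-title"),
    year  = vapply(refs, pick, character(1), field = "year"),
    stringsAsFactors = FALSE
  )
}

# Mock of the "message" element of a Crossref response (invented data).
# With network access, something like
#   jsonlite::fromJSON(url, simplifyVector = FALSE)$message
# could provide the real structure.
mock_message <- list(reference = list(
  list(key = "ref1", DOI = "10.1000/example.1",
       `article-title` = "A cited paper", year = "1998"),
  list(key = "ref2", unstructured = "Some book, 1995")
))

refs <- extract_references(mock_message)
```

References without a DOI, like the second mock entry, would still need another lookup before they could enter a new session.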

Implement reproducibility in `enrich_annotation_file()`

A global random seed won't work: some models, such as BART or brms, need a seed passed to them directly, and the seed also needs to change between the models of an ensemble.

The idea is to derive a new seed from the original one at every iteration of the ensemble.
The problem with this approach alone is that the seeds would be the same in every run of enrich_annotation_file().

So a more general solution is needed.
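
One possible direction, sketched here under assumptions, is to derive the per-model seeds from both the original seed and a run counter. The helper `derive_seeds()` and its `run_id` argument are hypothetical, and how the package would track the run number is left open.

```r
# Hypothetical helper: derive one seed per ensemble model from a base seed
# and a run counter. The same (base_seed, run_id) pair always yields the
# same seeds, while a different run_id yields a different block of seeds.
derive_seeds <- function(base_seed, run_id, n_models) {
  # Save and restore the caller's RNG state so the helper has no side effects.
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })

  set.seed(base_seed)
  # Draw enough distinct seeds to cover all runs up to run_id,
  # then keep only the block belonging to this run.
  all_seeds <- sample.int(.Machine$integer.max, run_id * n_models)
  tail(all_seeds, n_models)
}

seeds_run1 <- derive_seeds(base_seed = 42, run_id = 1, n_models = 3)
seeds_run2 <- derive_seeds(base_seed = 42, run_id = 2, n_models = 3)
```

Each element could then be passed directly to the model that needs an explicit seed.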

Implement consistent record limit in all search methods

At the moment only search_pubmed and search_ieee have an argument to limit the number of downloaded records, and their defaults differ (numeric() in the first, NULL in the second).
The same limit should also be implemented in search_wos.
Similarly, perform_search_session needs this argument so that it can pass it on to the search functions, but it's not clear to me whether it should apply to API searches only or also when importing already downloaded records.
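
A shared helper could map the different "no limit" defaults to one representation before each search function uses it. The sketch below is only an assumption about how that might look; `normalize_record_limit()` is a hypothetical name, not an existing function of the package.

```r
# Hypothetical helper: treat NULL, numeric() and NA as "no limit" (Inf)
# and validate any explicit limit.
normalize_record_limit <- function(record_limit = NULL) {
  if (is.null(record_limit) || length(record_limit) == 0 ||
      all(is.na(record_limit))) {
    return(Inf)
  }
  stopifnot(
    is.numeric(record_limit),
    length(record_limit) == 1,
    record_limit > 0
  )
  record_limit
}

limit_default_pubmed <- normalize_record_limit(numeric())  # current search_pubmed default
limit_default_ieee   <- normalize_record_limit(NULL)       # current search_ieee default
limit_explicit       <- normalize_record_limit(500)
```

With this, both existing defaults resolve to the same value, so perform_search_session could pass a single argument through unchanged.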

Remove duplicated IDs for the same record

For some reason, the ID field keeps collecting duplicated IDs. I need to understand where those are generated and fix the problem. I suspect this happens in create_annotation_file when importing previous session data.
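
Until the source of the duplicates is found, a clean-up step could remove them after the fact. The sketch below assumes that the ID field stores several IDs in one string joined by "; ". That separator is a guess, and `dedupe_ids()` is a hypothetical helper.

```r
# Hypothetical helper: remove repeated IDs inside each delimited ID string,
# keeping the first occurrence and the original order.
# Assumption: IDs are joined by "; " within a single field.
dedupe_ids <- function(ids, sep = "; ") {
  vapply(strsplit(ids, sep, fixed = TRUE), function(parts) {
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    parts <- trimws(parts)
    paste(unique(parts[nzchar(parts)]), collapse = sep)
  }, character(1))
}

cleaned <- dedupe_ids(c("PMID:1; WOS:9; PMID:1", NA, "PMID:2"))
```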

`perform_search_session` fails if no `year_query` is used

res <- perform_search_session(
  query = '(systematic review) AND ((heart failure) AND (COPD))',
  year_query = NULL, ## <- this line triggers the error!
  actions = c("API", "parsed"),
  sources = c("Pubmed"),
  session_name = "Session1",
  query_name = "Query1",
  records_folder = "Records",
  overwrite = TRUE,
  skip_on_failure = FALSE,
  journal = "Session_journal.csv"
)

returns:

Error in data.frame(Session_ID = session_name, Query_ID = query_name, : 
arguments imply differing number of rows: 1, 0
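
The message is what data.frame() reports when one column has length one and another has length zero, which would fit an empty year_query reaching the journal row as a zero-length value. The sketch below reproduces the error in isolation and shows one possible guard. The `Year_query` column name and the `na_if_empty()` helper are assumptions, not taken from the package source.

```r
# Reproduce the error outside the package: a zero-length column next to a
# length-one column makes data.frame() fail.
failing <- tryCatch(
  data.frame(Session_ID = "Session1", Year_query = character(0)),
  error = function(e) conditionMessage(e)
)

# Hypothetical guard: replace NULL or zero-length values with NA so that
# every column has exactly one value.
na_if_empty <- function(x) if (length(x) == 0) NA else x

journal_row <- data.frame(
  Session_ID = "Session1",
  Query_ID   = "Query1",
  Year_query = na_if_empty(NULL)
)
```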

Manage installation of cmdstanr or use rstanarm in the results' analysis.

estimate_performance() relies on a brms model fitted through CmdStan.
A user would therefore need to install both cmdstanr and CmdStan to make it work.

Either add cmdstanr and brms as suggested packages (the Suggests field) and manage the installation, or switch to rstanarm if possible, which would also avoid compiling the Stan program at every run.
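
If the packages are moved to Suggests, the function would need to check for them at run time. A minimal sketch of such a check, using base R's requireNamespace(), could look like the following. The helper `missing_packages()` is hypothetical, and the wording of the message is only an example.

```r
# Hypothetical helper: return the names of the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example of how estimate_performance() might guard its optional dependencies.
check_performance_deps <- function(pkgs = c("brms", "cmdstanr")) {
  missing <- missing_packages(pkgs)
  if (length(missing) > 0) {
    stop(
      "Missing packages: ", paste(missing, collapse = ", "),
      ". Install them, then run cmdstanr::install_cmdstan() to set up CmdStan.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# "stats" ships with R; the second name is deliberately not a real package.
not_installed <- missing_packages(c("stats", "notARealPackage123"))
```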

Add export/import of .bib files

Useful for interacting with other bibliography tools.

It can be built manually or by wrapping the RefManageR or bibtex packages.

For import, the functionality can be implemented in the import_data() function.
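
For the manual route, export mostly amounts to formatting each record as a BibTeX entry. The sketch below shows the idea in base R. `record_to_bibtex()` is a hypothetical helper, the field names and the sample record are invented, and a real implementation would also need to escape special characters and choose the entry type.

```r
# Hypothetical helper: format one record as a BibTeX @article entry.
# `fields` is a named character vector; empty or NA fields are skipped.
record_to_bibtex <- function(key, fields) {
  fields <- fields[!is.na(fields) & nzchar(fields)]
  body <- sprintf("  %s = {%s}", names(fields), fields)
  paste0("@article{", key, ",\n", paste(body, collapse = ",\n"), "\n}")
}

# Invented sample record; the missing DOI is left out of the entry.
entry <- record_to_bibtex(
  key = "smith2020",
  fields = c(title = "A title", author = "Smith, Jane", year = "2020", doi = NA)
)

# writeLines(entry, "records.bib") would then write the entry to a .bib file.
```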
