hrdag / verdata

A tool for the use and analysis of the data on the armed conflict in Colombia resulting from the joint JEP-CEV-HRDAG project.

License: GNU General Public License v2.0
Hi @thegargiulian and @pamadoa, I've updated the README, please change anything as you see fit.
@thegargiulian I added a draft of the estimates_exist function, but feel free to change it completely if you need to.
Under the installation instructions, it might be better to replace

```r
install.packages("devtools")
```

with

```r
if (!require("devtools")) install.packages("devtools")
```

to only install devtools if it's not already installed. This could reduce unnecessary re-installations.
This issue is with reference to the JOSS review here openjournals/joss-reviews#5844
Hi @thegargiulian, while trying to figure out the issue with mse, I thought it would help make the package a little more user friendly to provide a list of the stratifications used for the different estimations in the methodological report. For example, the estimates by year for desaparicion used a stratification by year and is_forced_dis. I don't think this is obvious to everybody: someone might just filter by is_conflict and is_forced_dis and then stratify by year, which would mean they wouldn't find the estimates, mse would take too long to run, and they would end up with different results. Or in secuestro, the stratification for the estimates by year was by year and sex.
Some people may want to use their own stratification, but for someone who wants to save time or just replicate the results, providing this would be helpful.
What do you think? And, if you agree, where do you think we should put it?
Thanks!
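For concreteness, one lightweight way to ship this would be a small exported lookup table mapping each violation type to the stratification used for the published by-year estimates. This is only a sketch: the table name is hypothetical, and only the two stratifications mentioned above are filled in; the rest would need to be confirmed against the methodological report.

```r
# hypothetical lookup table of the stratifications used for the by-year
# estimates; only the two cases mentioned above are filled in
estimation_strata <- data.frame(
  violation   = c("desaparicion", "secuestro"),
  strata_vars = c("year, is_forced_dis", "year, sex"),
  stringsAsFactors = FALSE
)

# a user replicating the published secuestro estimates could then check:
subset(estimation_strata, violation == "secuestro")$strata_vars
#> [1] "year, sex"
```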
The README tells users that the data dictionary can be found in inst/docs; however, R users who are not package developers are not familiar with these files. The data dictionary should be made easy to access and interact with, preferably from within R.
Consider exporting the data dictionary as a data.frame (see R Packages, Ch. 7).
At minimum, it would be helpful to demonstrate to users how to locate or load the file itself. For example:
```r
verdata_dict_fp <- system.file(
  "docs/diccionario-variables-replicas.xlsx",
  package = "verdata"
)
readxl::read_excel(verdata_dict_fp)
#> # A tibble: 55 × 4
#>    `Nombre variable` Tipo     `Detalle variable`           `Categorías variable`
#>    <chr>             <chr>    <chr>                        <chr>
#>  1 replica           caracter Número de réplica            "R1 - R100"
#>  2 match_group_id    caracter Identificador único          <NA>
#>  3 in_1              numérico Variable dicotómica que tom… "1 \r\n0"
#>  4 in_2              numérico Variable dicotómica que tom… "1 \r\n0"
#>  5 in_3              numérico Variable dicotómica que tom… "1 \r\n0"
#>  6 in_4              numérico Variable dicotómica que tom… "1 \r\n0"
#>  7 in_5              numérico Variable dicotómica que tom… "1 \r\n0"
#>  8 in_6              numérico Variable dicotómica que tom… "1 \r\n0"
#>  9 in_7              numérico Variable dicotómica que tom… "1 \r\n0"
#> 10 in_8              numérico Variable dicotómica que tom… "1 \r\n0"
#> # ℹ 45 more rows
```
Perhaps even add an example that copies the dictionary out of the installed package, e.g. file.copy(verdata_dict_fp, ".").
Hi @pamadoa!
I saw that there are a couple of places in summary_observed.R, combine_replicates.R, and test-summary-observed.R where you use &&.
Some of the latest versions of R don't like the use of && and || on vectors (explained in this Stack Overflow post). Do you think you could rewrite the affected lines of code to remove the && symbols, likely replacing them with a single & and probably also making use of all(...)?
Thank you!
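For reference, the difference can be sketched in a few lines; in recent R versions (4.3+), && on a vector of length greater than one is an error rather than a warning:

```r
x <- c(TRUE, TRUE, FALSE)

# elementwise AND: fine on vectors, returns a vector
x & TRUE
#> [1]  TRUE  TRUE FALSE

# scalar AND: errors in R >= 4.3 with
# "'length = 3' in coercion to 'logical(1)'"
# x && TRUE

# the usual rewrite when a single TRUE/FALSE is needed
all(x & TRUE)
#> [1] FALSE
```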
For any examples that can be run, replace \dontrun with \donttest. \dontrun should only be used in instances where the example really can't be run by the user; \donttest can handle the examples that take more than 5 seconds to run.
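A minimal sketch of what the roxygen block for a slow-but-runnable example might look like (the call inside is illustrative, not verdata's actual example):

```r
#' @examples
#' \donttest{
#' # slow (> 5 s) but runnable by the user, so \donttest rather than \dontrun
#' estimates <- mse(stratum_data, "desaparicion")
#' }
```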
Provide a better title for combine_estimates(); right now it is the name of the function, which makes for odd reading in the help page. I suspect this can be changed to "Combine Estimates". A silly but important change :)
Hi @thegargiulian, I have been working on figuring out why it takes so long to run mse for strata that should already have estimates. I worried about the changes we made to the data for DANE, or the code we were using in the examples, so I created a temporary branch in the CO-examples repository for all the testing. This code summarizes what I have tried.
First, I downloaded the data directly from DANE and the estimates from the Commission site and used them for these tests.
Then, I used the same code we are using in the examples for the stratification and tested the lookup_estimates function to see if it could find the estimates. It can. However, whenever I run the mse function it takes far too long, even though I know the estimates exist. So I created a short_mse function, which is basically mse, except that if it doesn't find the estimates it doesn't run lcmcr and just returns a tibble containing NA. When I run this, it takes one or two minutes and returns the results of the estimation, not the NA tibble; so again, the estimates are there, and lookup_estimates can find them.
I can't figure out why mse takes so long. Is it possible that it is trying to run lcmcr even though the estimates exist? I checked the if_else in mse and it seems fine to me, so I don't know what the issue is.
There is no example associated with the estimates_exist() function.
Note that these tests use data from extdata, which is not included in the GitHub repository. As such, the tests are not reproducible.
Our beta tester noted that some of the parameter combinations can be confusing and might result in undesired results (like NaNs). Is there a way to simplify these arguments?
In the README, could you please provide information on how to contribute to this package and how to report issues, bugs, errors, and other problems with the package?
This issue is with reference to the JOSS review here openjournals/joss-reviews#5844
Additional function reference titles that are currently just the function name and should instead be in title case:
title | function |
---|---|
proportions_imputed | proportions_imputed |
proportions_observed | proportions_observed |
run_lcmcr | run_lcmcr |
Responding to an issue raised in the beta test, confirm_files (and read_replicates) should just take a vector from the user containing the numbers of the replicates they'd like to read. It's fine to note that we're imposing structure on the file naming convention to achieve this.
An auxiliary function that, given stratum data, returns the stratum identifier and a column that is TRUE if the stratum has already been estimated and FALSE if it has not.
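A minimal sketch of such a helper, under the assumption that pre-computed estimates are stored as one file per stratum (the file layout and naming here are invented for illustration; in verdata the check would go through lookup_estimates()):

```r
# sketch: report, for each stratum identifier, whether a pre-computed
# estimate file already exists in estimates_dir (naming is illustrative)
estimates_exist <- function(stratum_ids, estimates_dir) {
  files <- file.path(estimates_dir, paste0(stratum_ids, ".json"))
  data.frame(
    stratum_id      = stratum_ids,
    estimates_exist = file.exists(files),
    stringsAsFactors = FALSE
  )
}

# example with a temporary directory holding one pre-computed stratum:
# the first stratum is flagged TRUE, the second FALSE
d <- tempdir()
file.create(file.path(d, "2002-secuestro.json"))
estimates_exist(c("2002-secuestro", "2003-secuestro"), d)
```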
verdata requires the LCMCR package as a dependency. Installing LCMCR requires the GNU Scientific Library; you may need to install this library on your computer separately before installing verdata.
This tells the user to install LCMCR and the GNU Scientific Library. This is not necessary, as LCMCR is available on CRAN and is correctly listed as a package Import. The suggestion may confuse new users.
Could you set up continuous integration, for example with GitHub Actions, to check that the tests and infrastructure are working correctly? The package {usethis} provides quick functionalities to get started. See here.
This issue is with reference to the JOSS review here openjournals/joss-reviews#5844
If replicates aren't in the folder where they should be, the function should alert the user that a replicate was not found.
From points raised in the beta test:
Currently confirm_files prints a message indicating the user has the right files, or an error if a file is incorrect. It would be more helpful for the function to output a data frame with two columns: file_path, indicating the path to the file being checked, and result, indicating whether the file has been verified.
It would also be helpful for the function to crash and list all of the broken files rather than stopping at the first broken file it encounters.
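A sketch of the suggested behaviour (not verdata's current API; the verification step here is just file.exists, where the real function would also check file contents):

```r
# check every file, collect results into a data frame, and only then
# warn, listing all files that failed rather than stopping at the first
confirm_files_sketch <- function(file_paths) {
  results <- data.frame(
    file_path = file_paths,
    result    = file.exists(file_paths),
    stringsAsFactors = FALSE
  )
  if (any(!results$result)) {
    warning("files failed verification: ",
            paste(results$file_path[!results$result], collapse = ", "))
  }
  results
}

paths <- c(tempfile(), system.file("DESCRIPTION", package = "base"))
res <- confirm_files_sketch(paths)  # warns about the missing file
res
```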
This package requires data to be downloaded from a government website. Unfortunately, it is not clear or straightforward how to download the data.
It would be very helpful if there were a vignette or short guide on how to download this data. The link provided in the README is the dataset description, not the path to the downloads themselves.
Alternatively, it would be quite nice if the package provided a utility to download the data itself. For example, I was able to download the secuestro data like so:
```r
# create a temporary directory
tmp <- tempdir()

# create path to save to
ver_path <- file.path(tmp, "verdata.zip")

# download the file
download.file(
  "https://microdatos.dane.gov.co/index.php/catalog/795/download/22642",
  ver_path
)

# unzip it
unzip(ver_path, exdir = file.path(tmp, "verdata"))

# read it
verdata::read_replicates(
  file.path(tmp, "verdata", "Secuestro.csv"),
  "secuestro",
  1
)
```
Ideally, we would be able to produce nicely formatted documentation in both Spanish and English for all aspects of the code and have it organized in an easy-to-access way for all users.
I've asked around the R-Ladies Slack for best practices for multilingual documentation, and it was suggested that I take a look at this proof of concept: https://github.com/eliocamp/translated
See suggestion in #20
Currently test-estimate_mse.R, test-lookup_estimates.R, and test-read_replicates.R are giving testthat warnings.
The examples provided here in a separate repo (https://github.com/HRDAG/verdata-examples) should be properly linked to this repository. One way to do that would be through package vignettes [See the vignettes section of the R Packages book]. To take this even further, it might be better to set up a website for this repository where the vignettes can be accessed as Articles. See the website chapter of the R Packages book.
I see there is a related issue here HRDAG/verdata-examples#2 to convert the examples to a human-readable format. I will leave this to the authors' discretion.
This issue is with reference to the JOSS review here openjournals/joss-reviews#5844
You are setting options(warn = -1) in your functions. This is not allowed; please use suppressWarnings() instead if really needed.
-> R/estimate_mse.R; tests/testthat/test-combine_estimates.R
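The difference is scope: options(warn = -1) silences warnings for the whole session until reset, while suppressWarnings() silences only the expression it wraps. A minimal illustration:

```r
noisy <- function() {
  warning("about to return NA")
  as.numeric("not a number")  # also warns: NAs introduced by coercion
}

# suppression is limited to this one call; global warning state is untouched
x <- suppressWarnings(noisy())
is.na(x)
#> [1] TRUE
```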
Many functions refer to the same or similar inputs or arguments, and there should be consistency in naming across all function files. In my original review of the documentation I noticed the use of both violation and violacion as argument names; there may be other instances where argument names (and the associated documentation) can be harmonized.
Many functions take a violation type as an input, but most do not do any sort of checking to ensure that the violation type matches one of the four acceptable options.
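A standard way to add this check is match.arg(), which errors with a message listing the acceptable options. The four violation types below are the ones covered by the project; the wrapper name and argument are illustrative:

```r
# illustrative input check; the real verdata functions would call this
# (or match.arg directly) at the top of each function taking a violation
check_violation <- function(violation = c("homicidio", "secuestro",
                                          "reclutamiento", "desaparicion")) {
  match.arg(violation)
}

check_violation("secuestro")
#> [1] "secuestro"

# check_violation("typo") errors:
# 'arg' should be one of "homicidio", "secuestro", "reclutamiento", "desaparicion"
```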
Add license file to .Rbuildignore
In my opinion, the second sentence onwards might be better placed in the README as it seems to get lost in the package DESCRIPTION file.
This issue is in relation to the JOSS review here openjournals/joss-reviews#5844
Not all functions provide explicit checks for the correctness of inputs, or helpful error messages when inputs take incorrect forms. There should be explicit input checking for each function in verdata, with associated testthat tests.