Coder Social home page Coder Social logo

hrdag / verdata Goto Github PK

View Code? Open in Web Editor NEW
10.0 10.0 4.0 3.74 MB

Una herramienta para el uso y análisis de los datos de Conflicto armado en Colombia resultantes del proyecto conjunto JEP-CEV-HRDAG.

License: GNU General Public License v2.0

R 99.46% TeX 0.54%
armed-conflict capture-recapture colombia multiple-imputation multiple-systems-estimation rstats

verdata's People

Contributors

mduranf avatar pamadoa avatar thegargiulian avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

verdata's Issues

README updates

Hi @thegargiulian and @pamadoa I've made updated the README, please change anything as you see fit .

@thegargiulian I added the estimates_exist function and made just a draft for it, but feel free to change it completely if you need to.

Create a list of the stratification used in the methodological report?

Hi @thegargiulian, since I have been trying to figure out the issue with mse, I thought it would be helpful in making the package a little bit more user friendly to provide a list of the stratification used for different estimations in the methodological report. For example, the estimates by year for desaparicion used a stratification by year and is_forced_dis. I don't think this would be obvious for everybody, maybe someone would just filter by is_conflict and is_forced_dis and the stratify by year, which would mean they wouldn't find the estimates, mse would take too long to run and they would end up with different results. Or in secuestro, the stratification for the estimates by year was by year and sex.

Maybe some people would like to use their own stratification, but I was thinking, for someone who wants to save time or just wants to replicate the results, maybe providing this would be helpful.

What do you think? And, in case you agree, where do you think we should put it?

Thanks!

Make data dictionary available as data frame in package

The README tells users that the data dictionary can be found in inst/docs however R users that are not package developers are not familiar with these files. The data dictionary should be made easy to access and interact with and preferably within R.

Consider exporting the data dictionary as a data.frame see R Packages Ch. 7.

At minimum, it would be helpful to demonstrate to users how to download or load the file itself. For example:

verdata_dict_fp <- system.file(
  "docs/diccionario-variables-replicas.xlsx",
  package = "verdata"
)

readxl::read_excel(verdata_dict_fp)
#> # A tibble: 55 × 4
#>    `Nombre variable` Tipo     `Detalle variable`           `Categorías variable`
#>    <chr>             <chr>    <chr>                        <chr>                
#>  1 replica           caracter Número de réplica            "R1 - R100"          
#>  2 match_group_id    caracter Identificador único           <NA>                
#>  3 in_1              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  4 in_2              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  5 in_3              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  6 in_4              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  7 in_5              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  8 in_6              numérico Variable dicotómica que tom… "1 \r\n0"            
#>  9 in_7              numérico Variable dicotómica que tom… "1 \r\n0"            
#> 10 in_8              numérico Variable dicotómica que tom… "1 \r\n0"            
#> # ℹ 45 more rows

Perhaps even adding download.file(verdata_dict_fp)

Replace use of &&

Hi @pamadoa!

I saw that there are a couple of places in summary_observed.R, combine_replicates.R, and test-summary-observed.R where you use &&.

Some of the latest versions of R don't like the use of && and || (explained in this Stackoverflow post). Do you think you could rewrite the affected lines of code to remove the && symbols, likely replacing them with a single & and probably also making use of all(...).

Thank you!

replace `\dontrun` with `\donttest`

For any examples that can be run, replace \dontrun with \donttest. \dontrun should only be used in instances where the example really can't be run by the user, \donttest can handle the examples that take >5 seconds to run

Issue with lookup_estimates and mse

Hi @thegargiulian I have been working on figuring out why it takes so long to run mse for strata that should already have estimates. I had worried about the changes we made to the data for DANE or the code we were using in the examples, so I created a temporary branch in the CO-examples repository for all the regarding testing. This code summarizes what I have tried.

First, I downloaded the data directly from DANE and the estimates from the Commission site and used it for this tests.
Then, I used the same code we are using in the examples for the stratification and tested to the lookup_estimates function to see if it could find the estimates. It can. However, whenever I try to run the mse function it takes way too long, even though I know the estimates exist. So I created short_mse function, which is basically mse, except it doesn't run lcmcr if it doesn't find the estimates, just returns a tibble containing NA. When I run this, it takes 1, maybe 2 minutes, and returns the results of the estimation, not the NA tibble, so again, the estimates are there, and lookup_estimates can find them.

I can't figure out why mse takes so long. Is it possible that it is trying to run lcmcr even though the estimates exist? I checked the if_else in mse and it seems fine to me, so I don't know what the issue is.

Fix documentation titles

Additional function reference titles that should be in title case and not the name of the function.

title function
proportions_imputed proportions_imputed
proportions_observed proportions_observed
run_lcmcr run_lcmcr

README contains unnecessary instructions

verdata requiere el paquete LCMCR como dependencia. La instalación de LCMCR requiere la instalación del GNU Scientific Library. Es posible que necesite instalar esta librería en su computadora por separado antes de instalar verdata.

This tells the user to install LCMCR and GNU Scientific Library. This is not necessary as the package LCMCR is available on CRAN and is listed, correctly, as a package Import. This suggestion may cause confusion for new users.

Set up continuous integration

Could you set up continuous integration, for example with GitHub Actions, to check that the tests and infrastructure are working correctly? The package {usethis} provides quick functionalities to get started. See here.

This issue is with reference to the JOSS review here openjournals/joss-reviews#5844

Replicates not found

If replicates aren't in the folder where they should be, the function should alert the user that a replicate was not found.

`confirm_files` should return an object with the results of the confirmation test

From points raised in the beta test:

Currently confirm_files prints a message for the user indicating they have the right files/an error if the file is incorrect. It would be more helpful for the function to output a data frame with two columns: file_path indicating the path to the file being checked and result indicating whether the file has been verified.

It would be helpful for the function to crash and list which files are broken rather than just stopping at the first broken file it encounters.

Provide guidance on how to download data

This package requires data to be downloaded from a government website. Naturally, it is not clear or straight forward how to download the data.

It would be very helpful if there was a vignette or short guide on how to download this data. The link provided in the readme is the dataset description and not the path to the downloads themselves.

Alternatively, it would be quite nice if the package provided a utility to actually download the data itself. For example I was able to download the secuestro data like so:

# create a temporary directory
tmp <- tempdir()

# create path to save to
ver_path <- file.path(tmp, "verdata.zip")

# download the file
download.file(
  "https://microdatos.dane.gov.co/index.php/catalog/795/download/22642",
  ver_path
)

# unzip it
unzip(ver_path, exdir = file.path(tmp, "verdata"))

# read it 
verdata::read_replicates(
  file.path(tmp, "verdata", "Secuestro.csv"),
  "secuestro",
  1
)

Function documentation in Spanish

Ideally, we would be able to produce nicely formatted documentation in both Spanish and English for all aspects of the code and have that organized in a easy to access way for all users.

I've asked around the R-Ladies Slack for best practices for multilingual documentation and was suggested to take a look at this proof of concept: https://github.com/eliocamp/translated

Review test warnings

Currently test-estimate_mse.R, test-lookup_estimates.R, and test-read_replicates.R are giving testthat warnings

Provide examples as vignettes

The examples provided here in a separate repo (https://github.com/HRDAG/verdata-examples) should be properly linked to this repository. One way to do that would be through package vignettes [See the vignettes section of the R Packages book]. To take this even further, it might be better to set up a website for this repository where the vignettes can be accessed as Articles. See the website chapter of the R Packages book.

I see there is a related issue here HRDAG/verdata-examples#2 to convert the examples to a human-readable format. I will leave this to the authors' discretion.

This issue is with reference to the JOSS review here openjournals/joss-reviews#5844

Remove `options(warn=-1)`

You are setting options(warn=-1) in your function. This is not allowed. Please rather use suppressWarnings() if really needed.
-> R/estimate_mse.R; tests/testthat/test-combine_estimates.R

Argument naming consistency

Many functions refer to the same/similar inputs or arguments and there should be consistency in naming across all function files. In my original review of the documentation I noticed the use of violation and violacion as arguments, there may be other instances where argument names (and associated documentation) can be harmonized

Violation type checking

Many functions take a violation type as an input, but most do not do any sort of checking to ensure that the violation type matches one of the four acceptable options.

Move part of package description to README

In my opinion, the second sentence onwards might be better placed in the README as it seems to get lost in the package DESCRIPTION file.

Description: This package facilitates use and analysis of data about the armed conflict in Colombia resulting from the joint JEP-CEV-HRDAG project. The package has three main sets of features. First, researchers can use this package to verify that they are using unaltered versions of the data published by the Colombian National Administrative Department of Statistics. Second, they can use the package to replicate the main findings of the joint JEP-CEV-HRDAG project. Finally, they can use the package to design their own statistical analyses of patterns of violence that address the two forms of missing data present in the documented data.

This issue is in relation to the JOSS review here openjournals/joss-reviews#5844

Explicit checks of input structure

Not all functions provide explicit checks for correctness of inputs or helpful error messages if the inputs take incorrect forms. There should be explicit input checking for each function in verdata and associated testthat tests.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.