This project forked from ropensci/gendercoder

Creating R package to code free text gender responses

Home Page: https://ropenscilabs.github.io/gendercoder

License: Other

gendercodeR

The goal of gendercodeR is to allow simple recoding of freetext gender responses.

Why would we do this?

Researchers who collect self-reported demographic data from respondents occasionally collect gender using a free-text response option. This has the advantage of respecting the gender diversity of respondents without prompting them and potentially eliciting misleading responses. However, it presents a challenge to researchers: inconsistencies in typography and spelling create a larger set of responses than is required to fully capture the demographic characteristics of the sample.

For example, male participants may provide freetext responses such as "male", "man", "mail" or "mael". Non-binary participants may provide responses such as "nonbinary", "enby", "non-binary" or "non binary".

This package uses dictionaries of common misspellings to recode these freetext responses into a consistent set of responses.
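
The idea of dictionary-based recoding can be sketched in a few lines of base R. This is only an illustration of the general technique, not the package's implementation: the `dictionary` vector and the `recode_with_dictionary()` helper are hypothetical names, and the entries are taken from the examples above rather than from the inbuilt dictionary.

```r
# Hypothetical mini-dictionary: raw response -> standardised response.
# Illustrative entries only, not the package's inbuilt dictionary.
dictionary <- c(
  "male" = "male", "man" = "male", "mail" = "male", "mael" = "male",
  "nonbinary" = "non-binary", "enby" = "non-binary",
  "non-binary" = "non-binary", "non binary" = "non-binary"
)

# Recode a character vector by exact lookup after lower-casing and trimming.
# Responses with no dictionary entry are carried over in their cleaned form.
recode_with_dictionary <- function(responses, dictionary) {
  cleaned <- trimws(tolower(responses))
  recoded <- unname(dictionary[cleaned])
  ifelse(is.na(recoded), cleaned, recoded)
}

recode_with_dictionary(c("Male", "mael", "enby", "I am male"), dictionary)
#> [1] "male"       "male"       "non-binary" "i am male"
```

Exact lookup keeps the coding transparent: every recoded value can be traced back to a single dictionary entry, and anything unexpected is left for a human to review.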

Installation

This package is not on CRAN. To use this package please run the following code:

devtools::install_github("ropenscilabs/gendercodeR")
library(gendercodeR)

Example

Suppose you have a dataframe in which individuals have provided their gender in a range of inconsistent formats.

library(gendercodeR)
#> Welcome to the genderCodeR package
#> 
#> This package attempts to remove typos from free text gender data
#> The defaults that we used are specific to our context and your data may be
#> different. We offer two categorisations, board and narrow both are opinionated
#> about how gender descriptors collapse into categories as these are culturally
#> specific they may not be suitable for your data. In particularly the narrow
#> setting makes opinionated choices about some responses that we want to
#> acknowledge are potentially problematic.
#>       In particular,
#>         * In 'narrow' coding intersex responses are recoded as 'sex and gender
#>           diverse'
#>         * In 'narrow' responses where people indicate they are trans and
#>           indicate their identified gender are recoded as the identified gender
#>           (e.g. 'Male to Female' is recoded as Female). We wish to acknowledge
#>           that this may not reflect how some individuals would classify
#>           themselves when given these categories and in some contexts may make
#>           systematic errors. The broad coding dictionary attempts to avoid these
#>           issues as much as possible - however users can provide a custom
#>           dictionary to add to or overwrite our coding decisions if they feel
#>           this is more appropriate. We welcome people to update the inbuilt
#>           dictionary where desired responses are missing.
#>         * The 'broad' coding separates out those who identify as trans
#>           female/male or cis female/male into separate categories it should not
#>           be assumed that all people who describe as male/female are cis, if you
#>           are assessing trans status we recommend a two part question see:
#> 
#>           Bauer, Greta & Braimoh, Jessica & Scheim, Ayden & Dharma, Christoffer.
#>           (2017).
#>           Transgender-inclusive measures of sex/gender for population surveys:
#>           Mixed-methods evaluation and recommendations.
#>           PLoS ONE. 12.

df <- data.frame(
  gender = c("male", "MALE", "mle", "I am male", "femail", "female", "enby"),
  age = c(34L, 37L, 77L, 52L, 68L, 67L, 83L),
  stringsAsFactors = FALSE
)

df
#>      gender age
#> 1      male  34
#> 2      MALE  37
#> 3       mle  77
#> 4 I am male  52
#> 5    femail  68
#> 6    female  67
#> 7      enby  83

Running the genderRecode() function takes the input dataset, matches the freetext gender responses against the dictionary, and creates a new column in the dataframe containing the recoded gender response. Freetext responses that are not in the dictionary are copied to the recoded gender column as they were entered, in lower case.

genderRecoded <- genderRecode(input = df,
                              genderColName = "gender",
                              method = "broad",
                              outputColName = "gender_recode",
                              missingValuesObjectName = NA,
                              customDictionary = NULL)
#> 
#> The following responses were not auto-recoded. The raw responses
#>         have been carried over to the recoded column 
#>  
#> # A tibble: 1 x 2
#> # Groups:   responses [1]
#>   responses     n
#>   <fct>     <int>
#> 1 i am male     1

genderRecoded
#>      gender age gender_recode
#> 1      male  34          male
#> 2      MALE  37          male
#> 3       mle  77          male
#> 4 I am male  52     i am male
#> 5    femail  68        female
#> 6    female  67        female
#> 7      enby  83    non-binary

Options within the function

method

The package offers two coding methods. Setting method = "broad" corrects spelling and standardises terms while maintaining the diversity of responses. Setting method = "narrow" compresses all responses down to male/female/'sex and gender diverse'.

Example using narrow coding

genderRecoded <- genderRecode(input = df,
                              genderColName = "gender",
                              method = "narrow",
                              outputColName = "gender2",
                              missingValuesObjectName = NA,
                              customDictionary = NULL)
#> 
#> The following responses were not auto-recoded. The raw responses
#>         have been carried over to the recoded column 
#>  
#> # A tibble: 1 x 2
#> # Groups:   responses [1]
#>   responses     n
#>   <fct>     <int>
#> 1 i am male     1

genderRecoded
#>      gender age                gender2
#> 1      male  34                   male
#> 2      MALE  37                   male
#> 3       mle  77                   male
#> 4 I am male  52              i am male
#> 5    femail  68                 female
#> 6    female  67                 female
#> 7      enby  83 sex and gender diverse
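
One way to picture the two methods is as a choice between two columns of the same dictionary. The sketch below assumes that mental model and uses a small stand-in dictionary whose rows are copied from the getDictionary() output shown later in this README; the `code_responses()` helper is hypothetical and is not part of the package.

```r
# Stand-in dictionary with the same three columns as the inbuilt one.
dict <- data.frame(
  entries = c("female", "male", "non binary", "trans woman"),
  broad   = c("female", "male", "non-binary", "transgender female"),
  narrow  = c("female", "male", "sex and gender diverse", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each response up in `entries` and return the
# value from the column named by `method`. Unmatched responses become NA here.
code_responses <- function(responses, dict, method = c("broad", "narrow")) {
  method <- match.arg(method)
  idx <- match(tolower(responses), dict$entries)
  dict[[method]][idx]
}

responses <- c("Male", "non binary", "trans woman")
code_responses(responses, dict, "broad")
#> [1] "male"               "non-binary"         "transgender female"
code_responses(responses, dict, "narrow")
#> [1] "male"                   "sex and gender diverse" "female"
```

Unlike genderRecode(), this sketch returns NA for unmatched responses rather than carrying the raw text over.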

missingValuesObjectName

By default, the unmatched responses and the number of times each occurs in the input dataframe are printed. Setting missingValuesObjectName saves the list of unmatched free text gender responses. We recommend assessing these, as they may be human code-able (e.g. "I am male") or otherwise meaningful and not captured by our base dictionary. We expect that researchers may wish to manually recode such responses or add frequently occurring responses to their custom dictionary.
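
A review of unmatched responses could look something like the following base R sketch. It does not call the package: `known_entries` is a hypothetical stand-in for the dictionary entries, and the lower-casing and whitespace trimming are assumptions made for the illustration.

```r
responses <- c("male", "I am male", "femail", "i am male ", "enby", "agender?")

# Hypothetical stand-in for the entries column of a dictionary.
known_entries <- c("male", "femail", "female", "enby")

# Normalise, then keep only the responses with no dictionary entry.
cleaned <- trimws(tolower(responses))
unmatched <- cleaned[!cleaned %in% known_entries]

# Tabulate the unmatched responses, most frequent first.
unmatched_counts <- as.data.frame(table(responses = unmatched),
                                  responseName = "n",
                                  stringsAsFactors = FALSE)
unmatched_counts <- unmatched_counts[order(-unmatched_counts$n), ]
unmatched_counts
```

Sorting by frequency puts the responses most worth adding to a custom dictionary at the top.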

customDictionary

Users can specify a custom dictionary that supplements or overwrites entries in the inbuilt dictionary.

There is a vignette for this function.
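
The vignette documents the format that customDictionary expects. The sketch below does not reproduce that format; it only illustrates, under assumed column names (`entries` and `coding`, both hypothetical), what "supplements or overwrites" means when two lookup tables are combined.

```r
# Hypothetical stand-in for part of an inbuilt dictionary.
inbuilt <- data.frame(
  entries = c("male", "female", "enby"),
  coding  = c("male", "female", "non-binary"),
  stringsAsFactors = FALSE
)

# Hypothetical custom entries: one new response and one override.
custom <- data.frame(
  entries = c("i am male", "enby"),
  coding  = c("male", "enby"),
  stringsAsFactors = FALSE
)

# Put custom rows first, then drop later duplicates so custom rows win.
combined <- rbind(custom, inbuilt)
combined <- combined[!duplicated(combined$entries), ]
combined
```

In this sketch "i am male" is added as a new entry, while the custom coding for "enby" replaces the stand-in inbuilt one.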

Viewing the inbuilt dictionary

The inbuilt dictionary can be assigned to an object using the getDictionary() function. This function does not take any arguments.

test <- getDictionary()
#> Parsed with column specification:
#> cols(
#>   entries = col_character(),
#>   broad = col_character(),
#>   narrow = col_character()
#> )
test
#> # A tibble: 56 x 3
#>    entries     broad              narrow                
#>    <chr>       <chr>              <chr>                 
#>  1 female      female             female                
#>  2 male        male               male                  
#>  3 androgynous androgynous        sex and gender diverse
#>  4 non-binary  non-binary         sex and gender diverse
#>  5 nonbinary   non-binary         sex and gender diverse
#>  6 non binary  non-binary         sex and gender diverse
#>  7 trans       transgender        sex and gender diverse
#>  8 trans man   transgender male   male                  
#>  9 trans woman transgender female female                
#> 10 transman    transgender male   male                  
#> # ... with 46 more rows
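
Because the dictionary is an ordinary table, it can be inspected with standard tools. As a minimal sketch, the code below rebuilds the ten rows printed above as a stand-in data frame (so it runs without the package) and groups the entries by their narrow coding; with the package installed, the object returned by getDictionary() could presumably be used in place of `dict`.

```r
# Stand-in for the first ten rows of the inbuilt dictionary shown above.
dict <- data.frame(
  entries = c("female", "male", "androgynous", "non-binary", "nonbinary",
              "non binary", "trans", "trans man", "trans woman", "transman"),
  narrow  = c("female", "male", rep("sex and gender diverse", 5),
              "male", "female", "male"),
  stringsAsFactors = FALSE
)

# List which entries collapse into each narrow category.
by_category <- split(dict$entries, dict$narrow)
by_category$male
#> [1] "male"      "trans man" "transman"
lengths(by_category)
```

A listing like this makes it easy to check whether the narrow coding decisions suit your data before relying on them.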

Contributing to this package

This package reflects the cultural context of its contributors. We welcome issues and pull requests that make the package more inclusive and/or suitable for a broader range of cultural contexts.

Acknowledgement of Country

We acknowledge the Wurundjeri people of the Kulin Nation as the custodians of the land on which this package was developed and pay respects to elders past, present and future.

Contributors

ekothe, fsingletonthorn, jlbeaudry, kylehamilton, kylehaynes, michaelweylandt, rhydwyn
