strengejacke / sjlabelled

Working with Labelled Data in R

Home Page: https://strengejacke.github.io/sjlabelled

R 100.00%

sjlabelled's Introduction

Hello there!

I'm a senior researcher working in the Institute of Medical Sociology (IMS) at the University Medical Center Hamburg-Eppendorf (UKE). I chair the working group on health and health care among the aged.

My research interests are

Health and health care among the aged
Family carers of older people
Organizational behaviour and health care research
Social inequalities in health care

Scientific and social media profiles

You can find further information on my website:

👉👉 www.danielluedecke.de 👈👈

including a list of publications.

Statistical skills

I'm co-chairing the working group "Research Methods" of the German Society of Medical Sociology (AG Methoden / DGMS) and do a lot of R programming. Currently, I'm mostly involved in projects related to R statistics (through the easystats project or my other packages like ggeffects). However, my previous main activity in developing research software was the Zettelkasten, in Java.

Here are the main R-package projects I'm working on:

Data wrangling and preparation

datawizard 🧙 Magic potions to clean and transform your data

Statistics and regression modelling

parameters: 📊 Computation and processing of models' parameters
performance: 💪 Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
bayestestR: 👻 Utilities for analyzing Bayesian models and posterior distributions
effectsize: 🐉 Compute and work with indices of effect size and standardized parameters
correlation: 🔗 Methods for Correlation Analysis
modelbased: 📈 Estimate effects, contrasts and means based on statistical models
report: 📜 🎉 Automated reporting of objects in R
insight: 🔮 Easy access to model information for various model objects
sjstats: Effect size measures and significance tests

Data Visualization

ggeffects: Estimated marginal means, contrasts and pairwise comparisons and effects plots for regression models
see: 🎨 Visualisation toolbox for beautiful and publication-ready figures
sjPlot: Plots and tables for summary statistics, descriptive statistics and regression models

sjlabelled's People

Forkers

guhjy mkim0710 gravitytrope onesandzeroes dataeducation daranzolin iago-contributedforks karthy257 martinctc sidsherborne vineetp6 davidhodge931

sjlabelled's Issues

Quasi-quotation for get_label function

Since var_labels and val_labels now support quasiquotation, I was wondering whether get_label, or a get_label equivalent, could also support quasiquotation so that it can be used inside one's own functions.

df %>% sjlabelled::get_label(!!var_name)
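
A possible workaround while get_label does not accept unquoted names, sketched in base R. It relies on the fact that sjlabelled stores the variable label in the "label" attribute of each column, so the label can be looked up with a column name held in a string. The helper name get_label_by_name is hypothetical and not part of sjlabelled.

```r
# Hypothetical helper (not part of sjlabelled): read the "label" attribute
# of one or more columns whose names are stored in a character vector.
get_label_by_name <- function(data, var_names) {
  vapply(
    var_names,
    function(v) {
      # exact = TRUE avoids partial matching against the "labels" attribute
      lab <- attr(data[[v]], "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else as.character(lab)
    },
    character(1)
  )
}

df <- data.frame(age = c(23, 35), sex = c(1, 2))
attr(df$age, "label") <- "Age in years"

var_name <- "age"
get_label_by_name(df, var_name)
get_label_by_name(df, c("age", "sex"))
```

Inside a user-written function, the column name would simply be passed as a string argument; columns without a label come back as NA.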

sjPlot::view_df: error while processing a data set imported via sjlabelled::read_spss

I double checked this issue with two data sets that worked before. The message below continues to appear:

"Error in sprintf("%i", range(x[[index]], na.rm = T)) : invalid format '%i'; use format %f, %e, %g or %a for numeric objects"

Please help. Thank you, Daniel.

Best,
Frank

In get_term_labels(), set attribute after call to convert_case()

Vignette does not show that sjmisc is loaded for `replace_na`

I see you use replace_na in your vignettes, yet the function is exported from neither haven nor sjlabelled. I thought this was related to tidyr::replace_na, but there's no method for labelled classes. I searched through the repo and found this commit dd36a83#diff-b70fc554e2e21020786db05c1cd5ff58, which suggests you might have deleted it.

Is there any way you can restore it? I'm actually very interested in using it.

label_to_colnames additional arguments for variable selection

Hi Daniel! Thanks for this awesome package. The label_to_colnames function is very handy for me. Would it be possible to additionally supply the function with the names of the variables for which you want the conversion to happen? Something like:

Converts all variables:
sjlabelled::label_to_colnames(data)

Converts q1:
sjlabelled::label_to_colnames(data, q1)

Converts anything starting with q:
sjlabelled::label_to_colnames(data, vars(starts_with("q")))
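
The requested behaviour can be approximated in base R. The sketch below assumes that variable labels live in the "label" attribute of each column (as they do for data labelled with sjlabelled or haven); the function name labels_to_colnames_for and the example data are made up for illustration.

```r
# Hypothetical helper: rename only the selected columns to their variable
# labels (the "label" attribute); columns without a label keep their name.
labels_to_colnames_for <- function(data, vars = names(data)) {
  for (v in vars) {
    lab <- attr(data[[v]], "label", exact = TRUE)
    if (!is.null(lab) && nzchar(lab)) {
      names(data)[names(data) == v] <- lab
    }
  }
  data
}

data <- data.frame(q1 = 1:2, q2 = 3:4, id = 5:6)
attr(data$q1, "label") <- "Satisfaction"
attr(data$q2, "label") <- "Trust"
attr(data$id, "label") <- "Respondent ID"

# Converts q1 only
names(labels_to_colnames_for(data, "q1"))

# Converts anything starting with q
names(labels_to_colnames_for(data, grep("^q", names(data), value = TRUE)))
```

Selecting columns with a character vector keeps the sketch free of tidyselect; a real implementation inside the package would more likely accept unquoted names or select helpers.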

expecting a sjlabelled cheat sheet

I recently found this https://raw.githubusercontent.com/rstudio/cheatsheets/master/labelled.pdf and wondered if sjlabelled could have its own cheat sheet to facilitate my teaching. Thank you.

How can I switch values and labels?

sjlabelled::read_spss: the argument enc does not work.

Hi, Daniel,

I am trying to import survey data that was encoded with Big5.
sjlabelled::read_spss: the argument enc does not work.
Please help check this issue.

Thank you.
Best,
Frank Liu

get_term_labels() returns wrong level if lowest factor level is 0

library(strengejacke)
data(efc)
efc$e42dep <- recode_to(efc$e42dep)
efc$e42dep <- as.factor(efc$e42dep)
m <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
get_term_labels(m)
#>           e42dep          c161sex          e42dep1          e42dep2 
#>         "e42dep" "carer's gender"        "e42dep0"        "e42dep1" 
#>          e42dep3          e42dep4 
#>        "e42dep2"        "e42dep3"

Created on 2018-11-01 by the reprex package (v0.2.1)

Retrieving original numeric values to make as_numeric(as_label(x)) consistent

Labels based on named vectors are very handy for reliably converting numeric values to factors with correctly named factor levels. However, at the moment I cannot find a way to retrieve the original numeric values of the labels when converting a labelled factor back to a numeric vector.
I always thought the use.labels = TRUE option does this, but found out that I had misunderstood the functionality.
Is there a way to create an exact reverse of the as_label() function, so that x and as_numeric(as_label(x)) are the same?

Example code:


library(sjlabelled)

#creating dataframe x and setting labels
x<-data.frame(a = c(0,1,0))

x$a <- set_labels(x$a, labels = c(`null` = 0, `one` = 1))

get_labels(x$a)

as_numeric(as_label(x$a))
x$a

#option use.labels fails in this example because labels are non-numeric
as_numeric(as_label(x$a), use.labels = TRUE)
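
For comparison, the round trip asked for here can be written in base R as a lookup in the named label vector. This is only a sketch of the idea, assuming the label vector is still available after the conversion; it is not how as_label() or as_numeric() are implemented.

```r
# Value labels as a named vector: names are the labels, values are the codes.
labs <- c(null = 0, one = 1)
x <- c(0, 1, 0)

# Forward: codes -> factor whose levels are the label texts.
f <- factor(x, levels = labs, labels = names(labs))

# Reverse: look each level up in the label vector to recover the codes.
back <- unname(labs[as.character(f)])

identical(back, x)
```

The reverse step works for non-numeric label texts such as "null" and "one" because it never tries to parse the level names as numbers.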

new function val_labels not working

Hi Daniel, I have downloaded the current github version of sjlabelled 1.0.15.9000 and it seems like the new val_labels() function with quasi-quotation is not working yet. At least when I run your example from the new help file I receive the error message

Error in val_labels(., ":="(!!x1, c("really low", "low", "a bit mid", :
unused argument (dummy3 = !!x2)

Was I testing too early?
Will the final version also support unquote-splicing for lists of arguments with the !!! operator?

Question regarding the sjmisc -> sjlabelled transition

Hi,
I just installed sjmisc, sjPlot and sjlabelled from CRAN and noticed that some functions still use label function from sjmisc, e.g. sjt.corr() (line 29).
The problem with this is that even when a new user properly loads sjlabelled, the message

This function will be removed in future versions of sjmisc and has been moved to package 'sjlabelled'. Please use sjlabelled::get_label() instead.This function will be removed in future versions of sjmisc and has been moved to package 'sjlabelled'. Please use sjlabelled::get_label() instead.

still appears. Is this intended? The behavior is a little confusing to me.

How to make sure that only the coefficient names - and not the terms - will be shown in the label

Hi,

I created a plot with plot_model, picking specific terms from the model. However, the plot shows the coefficient names preceded by the term names, as you can see below.

levels(b4$sexo2)
[1] "fem" "masc"
levels(b4$raca2)
[1] "branca" "parda" "preta" "amarela" "indigena"
levels(b4$escol2)
[1] "universitario" "2º grau" "< 2º grau"
levels(b4$rendacat2)
[1] "3" "2" "1"

I wanted only the coefficient names in the label, like in the document where you described how to plot estimates of regression models.

E.g. you showed how to keep only coefficients sex2, dep2 and dep3

plot_model(m1, terms = c("sex2", "dep2", "dep3"))

However, the plot axis shows different labels from those specified. Instead of showing sex2, dep2 and dep3, it shows slightly dependent, moderately dependent and female.

remove_var_labels

Hi,
I think it would be nice if we had separate functions to remove variable labels (remove_var_label) and value labels (remove_val_label). This would be useful when constructing tables where the variable label is too long.

Thank you.
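
A minimal base R sketch of the two operations, assuming variable labels are stored in the "label" attribute and value labels in the "labels" attribute. The helper names drop_var_labels and drop_val_labels are hypothetical and chosen so they do not clash with any package function.

```r
# Hypothetical helpers: drop only the variable label ("label") or only the
# value labels ("labels") from every column of a data frame.
drop_var_labels <- function(data) {
  data[] <- lapply(data, function(col) {
    attr(col, "label") <- NULL
    col
  })
  data
}

drop_val_labels <- function(data) {
  data[] <- lapply(data, function(col) {
    attr(col, "labels") <- NULL
    col
  })
  data
}

d <- data.frame(a = c(1, 2))
attr(d$a, "label") <- "A very long variable label that clutters a table"
attr(d$a, "labels") <- c(low = 1, high = 2)

d1 <- drop_var_labels(d)   # value labels are kept
d2 <- drop_val_labels(d)   # variable label is kept
attributes(d1$a)
attributes(d2$a)
```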

Copy value and variable labels to MERGED data frames

Hi,

I am having trouble migrating the variable labels from two smaller datasets to a merged form of the dataset. When I merge the data frames (technically I have them as tibbles), the variable labels are dropped. How do I go about attaching the variable labels back onto the variables in the new, merged data frame?

Below is my session info as well as an example of the code I am trying to run. I can't give any data for security reasons, but any help/suggestions/directions would be helpful as this issue will keep coming up.

carroll1 <- as_tibble(read.spss("../raw data/Carroll data/tsfdis.master11.26.2018.sav",
header = T, to.data.frame = T))
carroll2 <- as_tibble(read.spss("../raw data/Carroll data/tsfdis.humphreys.03.05.2019.sav",
header = T, to.data.frame = T))

test <- rbind(cbind(attr(carroll1, "variable.labels")),
cbind(attr(carroll2, "variable.labels")[-1]))

carroll <- copy_labels(carroll, test1)

Neither of the following shows that any variable labels have been set:
attributes(carroll)
attr(carroll,"variable.labels")

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sjlabelled_1.0.17 labelled_2.2.1 Hmisc_4.2-0 Formula_1.2-3 survival_2.44-1.1 lattice_0.20-38 sas7bdat_0.5 forcats_0.3.0
[9] stringr_1.3.1 dplyr_0.8.1 purrr_0.2.5 readr_1.3.0 tidyr_0.8.2 tibble_2.1.1 ggplot2_3.1.0 tidyverse_1.2.1
[17] tableone_0.10.0 arm_1.10-1 lme4_1.1-21 Matrix_1.2-14 MASS_7.3-51.4 expss_0.8.11 foreign_0.8-71 lavaan_0.6-3
[25] psych_1.8.10

loaded via a namespace (and not attached):
[1] nlme_3.1-137 matrixStats_0.54.0 lubridate_1.7.4 insight_0.3.0 RColorBrewer_1.1-2 httr_1.4.0 tools_3.5.1
[8] backports_1.1.4 utf8_1.1.4 R6_2.4.0 rpart_4.1-15 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1
[15] nnet_7.3-12 withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3 mnormt_1.5-5 compiler_3.5.1 cli_1.1.0
[22] rvest_0.3.2 htmlTable_1.13.1 xml2_1.2.0 scales_1.0.0 checkmate_1.9.3 digest_0.6.18 pbivnorm_0.6.0
[29] minqa_1.2.4 base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6 htmlwidgets_1.3 rlang_0.3.4 readxl_1.3.1
[36] rstudioapi_0.10 generics_0.0.2 jsonlite_1.6 acepack_1.4.1 magrittr_1.5 Rcpp_1.0.1 munsell_0.5.0
[43] fansi_0.4.0 abind_1.4-5 stringi_1.4.3 yaml_2.2.0 plyr_1.8.4 grid_3.5.1 parallel_3.5.1
[50] crayon_1.3.4 haven_2.1.0 splines_3.5.1 hms_0.4.2 zeallot_0.1.0 knitr_1.22 pillar_1.4.0
[57] boot_1.3-22 stats4_3.5.1 glue_1.3.1 mitools_2.4 latticeExtra_0.6-28 data.table_1.12.2 modelr_0.1.4
[64] vctrs_0.1.0 nloptr_1.2.1 cellranger_1.1.0 gtable_0.2.0 assertthat_0.2.1 xfun_0.7 broom_0.5.2
[71] survey_3.36 coda_0.19-2 cluster_2.0.9

Using set_labels function in a for-loop

I have spent some time trying to figure out how to use the sjlabelled package in loops, so that I can automatically apply labels from a codeplan (e.g. a list of variable and value labels defined in an Excel file) to an R data frame. The problem so far is that I cannot manage to use the set_label and set_labels functions in for loops or in R functions.

As an example, let's assume we would like to label variables 1, 2 and 3 of a data frame (columns 1, 2 and 3), and the formats are given in a single vector called formats$allvallabels (e.g. "1=Male, 2=Female, 9=Unknown").

for(i in 1:3) {
  if (!is.na(formats$`Value labels`[i]) & formats$`Value labels`[i]=="yes") {
    data[,i] <- set_labels(data[,i], labels=eval(parse(text = paste("c(",formats$allvallabels[i], ")"))))
  }
}

This does not work: there is no error message, but the data frame "data" remains unchanged.

However, using the exact same code and explicitly defining i=1, i=2, i=3 to loop through the variables (i.e. lines of the formatting/codeplan table) works:

i <- 1
if (!is.na(formats$`Value labels`[i]) & formats$`Value labels`[i]=="yes") {
  data[,i] <- set_labels(data[,i], labels=eval(parse(text = paste("c(",formats$allvallabels[i], ")"))))}
i <- 2
if (!is.na(formats$`Value labels`[i]) & formats$`Value labels`[i]=="yes") {
  data[,i] <- set_labels(data[,i], labels=eval(parse(text = paste("c(",formats$allvallabels[i], ")"))))}
i <- 3
if (!is.na(formats$`Value labels`[i]) & formats$`Value labels`[i]=="yes") {
  data[,i] <- set_labels(data[,i], labels=eval(parse(text = paste("c(",formats$allvallabels[i], ")"))))}

Do you have any explanation or workaround for this?
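
One way to sidestep both eval(parse()) and the data[, i] indexing is sketched below in base R. It does not diagnose why the loop above fails. It assumes the codeplan strings have the form "value=label, value=label" as in the example, and it writes the "labels" attribute directly instead of calling set_labels. The parser name parse_value_labels and the example codeplan are made up.

```r
# Hypothetical parser: turn "1=Male, 2=Female, 9=Unknown" into the named
# vector c(Male = 1, Female = 2, Unknown = 9) without eval(parse()).
parse_value_labels <- function(spec) {
  items <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  pairs <- strsplit(items, "=", fixed = TRUE)
  values <- as.numeric(vapply(pairs, function(p) trimws(p[1]), character(1)))
  labels <- vapply(pairs, function(p) trimws(p[2]), character(1))
  stats::setNames(values, labels)
}

# Made-up codeplan and data, mirroring the column names used above.
formats <- data.frame(
  `Value labels` = c("yes", NA, "yes"),
  allvallabels   = c("1=Male, 2=Female, 9=Unknown", NA, "0=No, 1=Yes"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
data <- data.frame(sex = c(1, 2, 9), age = c(30, 41, 58), smoker = c(0, 1, 1))

for (i in seq_len(nrow(formats))) {
  # isTRUE() treats a missing codeplan entry as "do not label"
  if (isTRUE(formats$`Value labels`[i] == "yes")) {
    # [[i]] always extracts the column vector, for data frames and tibbles
    attr(data[[i]], "labels") <- parse_value_labels(formats$allvallabels[i])
  }
}

attr(data$sex, "labels")
```

Using data[[i]] rather than data[, i] matters when the data are stored as a tibble, because single-column subsetting of a tibble with [ , i] returns a tibble rather than a vector.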

Labelling interaction terms in `tab_model`

It would be great if tab_model (and other functions) could apply variable labels to interactions, as in the below example:

library(sjPlot)
library(sjlabelled)
library(dplyr)

data(iris) 

iris = iris %>%
    mutate(Species = as.factor(Species)) %>%
    var_labels(Petal.Length = "Petal length",
               Petal.Width = "Petal width",
               Species = "Species of flower")
    
m1 = lm(Petal.Length ~ Petal.Width * Species, data = iris)

tab_model(m1, prefix.labels = "label")

I'd like to have a go at a pull request for this, but I'm a bit unsure where to start. Should insight be providing information about interaction terms? Or should this be done "post-hoc" in sjlabelled::get_term_labels? It seems like if insight was providing this info in a robust way, we could do something smarter than just splitting the coefficient names by ":" and trying to match them up to terms.
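
To make the "post-hoc" option concrete, here is a rough base R sketch of the naive approach: split each coefficient name at ":" and replace the parts that exactly match a variable name with that variable's label. The function name label_interaction_terms and the " * " separator are assumptions for illustration, not part of sjlabelled or insight.

```r
# Hypothetical helper: label coefficient names by splitting on ":" and
# replacing each part that matches a variable name with its label.
label_interaction_terms <- function(coef_names, var_labels) {
  vapply(
    strsplit(coef_names, ":", fixed = TRUE),
    function(parts) {
      known <- parts %in% names(var_labels)
      parts[known] <- var_labels[parts[known]]
      paste(parts, collapse = " * ")
    },
    character(1)
  )
}

var_labels <- c(Petal.Width = "Petal width", Species = "Species of flower")
m1 <- lm(Petal.Length ~ Petal.Width * Species, data = iris)

label_interaction_terms(names(coef(m1)), var_labels)
```

The sketch also shows the weakness of plain string matching: coefficient names such as Speciesversicolor combine the variable name with a factor level, so they are left unlabelled. Resolving those reliably needs information about which term each coefficient belongs to, which is the part that a model-aware package would have to supply.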

Several issues with sjlabelled (missing value labels) and other sjverse packages

Hi Daniel,

while using the sjverse packages for teaching data analysis this term, we noticed some issues with current versions, starting with missing value labels after reading in a Stata dataset. I included all issues in a markdown file, which is available here.

As for the issue specifically related to sjlabelled, it seems as if value labels get lost somehow, although they definitely exist in the original Stata dataset:

library(tidyverse)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(sjstats)
setwd('C:/Dropbox/lehre/Methoden der politischen Soziologie/0 Daten/GLES Vorwahl')

d <- read_stata("GLES_Vorwahlquerschnitt_ZA5700_v1-0-0.dta")

frq(d$q11bb)

# Beabsichtigte Stimmabgabe: Zweitstimme (Version B) (x) <numeric>
# total N=2001  valid N=2001  mean=-18.09  sd=70.99

  val frq raw.prc valid.prc cum.prc
  -99 109    5.45      5.45    5.45
  -98 176    8.80      8.80   14.24
  -97 313   15.64     15.64   29.89
  -83  12    0.60      0.60   30.48
    1 556   27.79     27.79   58.27
    4 376   18.79     18.79   77.06
    5  79    3.95      3.95   81.01
    6 161    8.05      8.05   89.06
    7 144    7.20      7.20   96.25
  171   2    0.10      0.10   96.35
  180   4    0.20      0.20   96.55
  206   5    0.25      0.25   96.80
  209   3    0.15      0.15   96.95
  215  31    1.55      1.55   98.50
  225   1    0.05      0.05   98.55
  237   2    0.10      0.10   98.65
  322  27    1.35      1.35  100.00
 <NA>   0    0.00        NA      NA

The dataset for reproduction and a sessionInfo() output are available in the rmarkdown file linked above.

Suggestion for using formats as an attribute

I am a 45+ year SAS user (yes -- I really am that old), and I love the idea that I can attach formats to variables once and for all, and then use these formats in printing without giving the issue much thought. I have functions to do that.

I would like to be able to use var_labels() and related functions to aid in assigning the functions. One option is to merely edit the function code, and replace the "label" with "format" in the appropriate place -- or whatever other attribute I want to set. But I would be much happier if I could simply change a parameter in the function call.

I suggest that var_labels() and related functions be modified to have the attribute passed as a parameter, with "label" as the default. If that can be pulled off existing users should not be impacted.
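
The suggested interface might look roughly like the following base R sketch, in which the attribute name is an argument that defaults to "label". The function name set_var_attr, the argument name attr_name and the SAS-style format strings are illustrative assumptions; var_labels() itself has no such parameter in the discussion above.

```r
# Hypothetical generalisation: set an arbitrary attribute (default "label")
# on named columns; `...` takes column = value pairs.
set_var_attr <- function(data, ..., attr_name = "label") {
  values <- list(...)
  for (v in names(values)) {
    if (!v %in% names(data)) stop("Unknown column: ", v)
    attr(data[[v]], attr_name) <- values[[v]]
  }
  data
}

d <- data.frame(income = c(1000, 2500), weight = c(70.5, 82))

# Default behaviour: variable labels
d <- set_var_attr(d, income = "Monthly income", weight = "Weight (kg)")

# Same call, different attribute
d <- set_var_attr(d, income = "DOLLAR8.", weight = "8.1", attr_name = "format")

attributes(d$income)
```

Because "label" stays the default, existing calls that do not pass attr_name would behave as before.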

Remove defunct functions

Duplicate value labels in `tab_stackfrq` when using `sjlabelled::as_label(keep.labels = TRUE)`

tab_stackfrq is really nice for displaying rated data.

However, I found a problem when using this with factor columns which have retained their labels. When using tab_stackfrq in this case, the value labels appear twice in the columns of the table.

For example, this works fine initially.

library(sjlabelled)
library(sjPlot)

likert_4 <- data.frame(
  q1 = sample(1:4, 500, replace = TRUE, prob = c(0.2, 0.3, 0.1, 0.4)),
  q2 = sample(1:4, 500, replace = TRUE, prob = c(0.5, 0.25, 0.15, 0.1)),
  q3 = sample(1:4, 500, replace = TRUE, prob = c(0.25, 0.1, 0.4, 0.25))
)
labs <- c("Independent" = 1, "Slightly dependent" = 2,
          "Dependent" = 3, "Severely dependent" = 4)

likert_4$q1 <- sjlabelled::add_labels(likert_4$q1, labels = labs)
likert_4$q2 <- sjlabelled::add_labels(likert_4$q2, labels = labs)
likert_4$q3 <- sjlabelled::add_labels(likert_4$q3, labels = labs)

sjPlot::tab_stackfrq(items = likert_4)
# 	Independent	Slightly dependent	Dependent	Severely dependent
# q1	20.20 %	31.00 %	9.80 %	39.00 %
# q2	49.40 %	25.20 %	17.00 %	8.40 %
# q3	29.40 %	9.20 %	35.00 %	26.40 %

But if the labelled numeric columns are then converted to factor while retaining the labels, then the table gives duplicates for each value with zero frequencies for the duplicates.

likert_4$q1 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)
likert_4$q2 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)
likert_4$q3 <- sjlabelled::as_label(likert_4$q1, keep.labels = TRUE)

sjPlot::tab_stackfrq(items = likert_4)
#  	Independent	Slightly dependent	Dependent	Severely dependent	Dependent	Independent	Severely dependent	Slightly dependent
# q1	20.20 %	31.00 %	9.80 %	39.00 %	0.00 %	0.00 %	0.00 %	0.00 %
# q2	20.20 %	31.00 %	9.80 %	39.00 %	0.00 %	0.00 %	0.00 %	0.00 %
# q3	20.20 %	31.00 %	9.80 %	39.00 %	0.00 %	0.00 %	0.00 %	0.00 %

I often find it important to keep the labels to allow me to convert back to numeric in the same way.

The problem seems to originate from the call to sjlabelled::get_labels() here which returns the duplicates i.e.

sjlabelled::get_labels(
  likert_4$q1,
  attr.only = F,
  values = "n",
  non.labelled = T
)
#                    1                    2                    3                    4            Dependent          Independent   Severely dependent   Slightly dependent 
#        "Independent" "Slightly dependent"          "Dependent" "Severely dependent"          "Dependent"        "Independent" "Severely dependent" "Slightly dependent"

Maybe there could be a separate case here for factors with labels? If non.labelled = T then it returns the correct labels, but I guess this change may affect other cases?
It would be great if tab_stackfrq could work with labelled factor columns as it's really useful to keep the labels when you are often switching between numeric and factor.
Many thanks!

sjlabelled::write_spss(): Codes change from R to SPSS

Hi,
thanks for your work on sjlabelled!

I just noticed that saving a data frame with labelled data via write_spss() changes the codes of the variables such that the lowest numeric code (in my case -9 for a certain kind of missing) becomes 1 and all the other codes are sequentially ordered -> 1, 2, 3, 4, 5, 6, 7 when the codes were -9, -8, 0, 1, 2, 3, 4 before. I did not define any codes as missing.

Is that the default behavior of write_spss()? Is it possible to keep the codes the same when exporting labelled data from R to SPSS? I can send a reprex later if necessary/helpful.

Thanks and all the best, G

`as_label` does not work for a variable with only NAs

For instance

a <- c(NA)
set_labels(a, labels = c(a = 1))

results in

Warning messages:
1: In min(x, na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In max(x, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
3: In set_labels_helper(x = .dat, labels = labels, force.labels = force.labels,  :
  Can't set value labels for "x". Infinite value range.

R session aborted when trying to import a .dta file

The R session is aborted in RStudio when I try to read a .dta file using read_stata(). I am using R version 4.1.0 and sjlabelled version 1.1.8 with Windows 10 as OS. The file has been generated with Stata version 15. This Stackoverflow post suggests that this may be related to changes in ReadStat which haven makes use of (I have also commented on a similar issue that has been opened for haven).

as_label(data, drop.na = FALSE) changed from version 1.1.3 to version 1.1.6, intended?

Thanks for the nice package, and sorry for the email we wrote before. We just figured out that the correct way to report bugs is to write you a message here. We are not sure whether the following is really a bug or whether it was intended:

In version 1.1.3 of the sjlabelled package, the call "as_label(data, drop.na=FALSE)" kept user-defined missing labels in the data set while converting the values of variables to their labels. However, this has changed in the current version, which is 1.1.6. With the latest version, "drop.na=FALSE" deletes the user-defined missing labels and merges them into system-missings. That is why we had to change our code to drop.na=TRUE, so that we keep the user-defined missing labels in our dataset when converting the values to their associated labels. Could you please let us know if this was intended?

Best wishes,
Ann-Kristin (and Team)

write_stata drops negative value labels

Hi,

writing a Stata dataframe replaces negative value labels:

library(sjmisc)
library(tidyverse)
library(sjlabelled)
data(efc)

efc <- efc %>%
  select(c160age, e17age) %>%
  rec(rec = "15:30=-99 [young]; 31:55=-98 [middle]; 56:max=3 [old]")

frq(efc$c160age_r)

# carer' age (x) <numeric> 
# total N=908  valid N=901  mean=-51.98  sd=50.38
 
 val  label frq raw.prc valid.prc cum.prc
 -99  young  48    5.29      5.33    5.33
 -98 middle 442   48.68     49.06   54.38
   3    old 411   45.26     45.62  100.00
  NA     NA   7    0.77        NA      NA


tmp <- tempdir()
write_stata(efc, paste0(tmp,"/df.dta"))
efc2 <- read_stata(paste0(tmp,"/df.dta"))

frq(efc2$c160age_r)

# carer' age (x) <numeric> 
# total N=908  valid N=901  mean=2.40  sd=0.59
 
 val  label frq raw.prc valid.prc cum.prc
   1  young  48    5.29      5.33    5.33
   2 middle 442   48.68     49.06   54.38
   3    old 411   45.26     45.62  100.00
  NA     NA   7    0.77        NA      NA

I'm not sure whether this is related to tidyverse/haven#367, because for that issue, the Stata file seems to be correct. I checked the file produced with write_stata and it already contains the incorrect labels.

sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.3

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2       sjstats_0.14.2-4   sjlabelled_1.0.10  sjmisc_2.7.1.9000 
 [5] sjPlot_2.4.1.9000  forcats_0.3.0      stringr_1.3.0      dplyr_0.7.4       
 [9] purrr_0.2.4        readr_1.1.1        tidyr_0.8.0        tibble_1.4.2      
[13] ggplot2_2.2.1.9000 tidyverse_1.2.1   

loaded via a namespace (and not attached):
  [1] TH.data_1.0-8       minqa_1.2.4         colorspace_1.3-2    modeltools_0.2-21  
  [5] ggridges_0.5.0      rsconnect_0.8.5     rprojroot_1.3-2     htmlTable_1.11.2   
  [9] estimability_1.3    snakecase_0.9.1     base64enc_0.1-3     rstudioapi_0.7     
 [13] glmmTMB_0.2.0       mvtnorm_1.0-7       lubridate_1.7.2     coin_1.2-2         
 [17] xml2_1.2.0          codetools_0.2-15    splines_3.4.4       mnormt_1.5-5       
 [21] knitr_1.20.2        effects_4.0-1       bayesplot_1.5.0     Formula_1.2-2      
 [25] jsonlite_1.5        nloptr_1.0.4        ggeffects_0.3.2     broom_0.4.4        
 [29] cluster_2.0.6       compiler_3.4.4      httr_1.3.1          emmeans_1.1.3      
 [33] backports_1.1.2     assertthat_0.2.0    Matrix_1.2-12       lazyeval_0.2.1     
 [37] survey_3.33-2       cli_1.0.0           acepack_1.4.1       htmltools_0.3.6    
 [41] tools_3.4.4         coda_0.19-1         gtable_0.2.0        glue_1.2.0         
 [45] reshape2_1.4.3      Rcpp_0.12.16        carData_3.0-1       cellranger_1.1.0   
 [49] nlme_3.1-131        psych_1.8.3.3       lmtest_0.9-36       lme4_1.1-17        
 [53] rvest_0.3.2         devtools_1.13.5     stringdist_0.9.4.7  MASS_7.3-48        
 [57] zoo_1.8-1           scales_0.5.0.9000   hms_0.4.2           parallel_3.4.4     
 [61] sandwich_2.4-0      pwr_1.2-2           TMB_1.7.12          RColorBrewer_1.1-2 
 [65] yaml_2.1.16         curl_3.1            gridExtra_2.3       memoise_1.1.0      
 [69] rpart_4.1-12        latticeExtra_0.6-28 stringi_1.1.7       highr_0.6          
 [73] checkmate_1.8.5     rlang_0.2.0.9001    pkgconfig_2.0.1     arm_1.10-1         
 [77] evaluate_0.10.1     lattice_0.20-35     prediction_0.3.2    bindr_0.1          
 [81] htmlwidgets_1.0     tidyselect_0.2.4    plyr_1.8.4          magrittr_1.5       
 [85] R6_2.2.2            Hmisc_4.1-1         multcomp_1.4-8      pillar_1.2.2       
 [89] haven_1.1.0         foreign_0.8-69      withr_2.1.2         survival_2.41-3    
 [93] abind_1.4-5         nnet_7.3-12         modelr_0.1.1        crayon_1.3.4       
 [97] rmarkdown_1.8       grid_3.4.4          readxl_1.0.0        data.table_1.10.4-3
[101] git2r_0.21.0        digest_0.6.15       xtable_1.8-2        stats4_3.4.4       
[105] munsell_0.4.3

add `ordered` option to `as_label()`

Thanks for all your work on sjlabelled. Of all the options to working with labelled data in R, I like your approach the most as it enables me to keep the labels no matter what I do with the data in between.

Depending on what I am doing at the moment, I typically like to switch between the numeric/value representation as a numeric vector and a factor vector with the labels as factor levels. I can do this using

 vector_factor <- sjlabelled::as_label(vector_numeric, keep.labels  = TRUE)
 vector_numeric <- sjlabelled::as_numeric(vector_factor, use.labels = TRUE)

But when I need the factor to be ordered, I have to use ordered() on the factor vector and lose the labels, and thus cannot convert back to numeric. Would you consider adding an ordered option to as_label() that would return an ordered factor that retains the labels and can easily be converted back to numeric using as_numeric()?

Thanks so much!

(Edit: Realized that the other functionality I was asking for is already possible with the argument prefix. Sorry for not reading the documentation more thoroughly!)
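
Until such an option exists, the idea can be sketched in base R: build the ordered factor from the labelled codes and re-attach the "labels" attribute so the codes can be recovered later. The helper names as_ordered_label and ordered_to_numeric are hypothetical, and the sketch assumes every value in the vector has a label.

```r
# Hypothetical helper: build an ordered factor from labelled codes, with
# levels sorted by code, and carry the "labels" attribute along.
as_ordered_label <- function(x) {
  labs <- sort(attr(x, "labels", exact = TRUE))
  out <- factor(
    as.vector(x),
    levels = unname(labs),
    labels = names(labs),
    ordered = TRUE
  )
  attr(out, "labels") <- labs
  out
}

# Reverse lookup from level text back to the original code.
ordered_to_numeric <- function(f) {
  unname(attr(f, "labels", exact = TRUE)[as.character(f)])
}

x <- c(3, 1, 2, 1)
attr(x, "labels") <- c(low = 1, mid = 2, high = 3)

f <- as_ordered_label(x)
f < "high"
identical(ordered_to_numeric(f), as.vector(x))
```

Note that base functions such as factor() or ordered() applied to f afterwards would drop the extra attribute again, which is the same problem described above; the attribute only survives operations that preserve attributes.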

as_labelled missing additional classes used by haven

Issue

RStudio prints labelled data nicely in a format that's easier to read. Using as_labelled to convert factors to labelled data messes with this formatting.

packageVersion("haven")
#> [1] '2.3.1'
packageVersion("sjlabelled")
#> [1] '1.1.7'


# SAV data read with the latest version of haven.
x <- structure(
  c(2, 1, 2, 2, 1, 1), 
  label = "What is your gender?", 
  format.spss = "F1.0", 
  display_width = 1L, 
  labels = c(Male = 1, Female = 2), 
  class = c("haven_labelled", "vctrs_vctr", "double")
)

RStudio prints labelled data differently than a regular R console.

# <labelled<double>[6]>: What is your gender?
#  [1] 2 1 2 2 1 1
#
# Labels:
#  value  label
# 1   Male
# 2 Female

The print formatting is lost when using as_labelled.

y <- structure(
  c(1L, 1L, 1L, 1L, 1L, 1L), 
  .Label = c("setosa", "versicolor", "virginica"), 
  class = "factor"
)
y <- sjlabelled::as_labelled(y)

Prints like it does in a regular R console.

# [1] 1 1 1 1 1 1
# attr(,"labels")
# setosa versicolor  virginica 
# 1          2          3 
# attr(,"class")
# [1] "haven_labelled"

Solution

as_labelled should add two additional classes.

class(sjlabelled::as_labelled(y))
#> [1] "haven_labelled"
class(x)
#> [1] "haven_labelled" "vctrs_vctr"     "double"

class(y) <- c("haven_labelled", "vctrs_vctr", "double")

# Displays correctly in the Rstudio Terminal.
# <labelled<double>[6]>
#   [1] 1 1 1 1 1 1

# Labels:
#   value      label
# 1     setosa
# 2 versicolor
# 3  virginica

Not a huge issue, but a small cosmetic thing that’s easily fixed. I haven't noticed any problems with the latest version of haven so I assume this wouldn't break anything. I’ll submit a PR and you can decide what to do about it.

Thanks!

Created on 2020-10-01 by the reprex package (v0.3.0)

restore support for quasi-quotation in set_labels

set_labels no longer appears to offer support for quasi-quotation.

the argument ref.lvl in sjmisc::to_factor() not working

Hi, Daniel, I recently found that the argument is no longer effective and does not seem to recognize labels.
Example data to download here

load(kao06)
library(sjmisc)
kao06$partyID <- rec(kao06$L2B,
rec="98:hi=NA; else=copy",
val.labels = c("partyA","partyB","partyC",
"partyD", "partyE", "others"),
as.num=F)

not working

kao06$partyID <- to_factor(kao06$partyID, ref.lvl="partyB")
kao06.mod.2 <- glm(turnout ~ partyID, family=binomial, data=kao06)
summary(kao06.mod.2)

not working

kao06$partyID <- to_factor(kao06$partyID, ref.lvl="partyA")
summary(kao06.mod.2)

not working

kao06$partyID <- to_factor(kao06$partyID, ref.lvl=2)
summary(kao06.mod.2)

this traditional method works:

kao06$partyID <- ref_lvl(kao06$partyID, lvl=2)
summary(kao06.mod.2)

write_spss not working with integer vectors

Hi, I noticed a problem when we label integer vectors and export data with write_spss.

labelling numeric vectors

library(rio)
library(sjlabelled)
library(tidyverse)

data <- mtcars %>% 
  as_tibble() %>%
  set_labels(vs:carb, labels = c(zero=0, one=1, two=2, three=3, four=4))
data %>% glimpse()
#> Rows: 32
#> Columns: 11
#> $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
#> $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
#> $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
#> $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
#> $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

data %>% write_spss("car.sav")
#> Tidying value labels. Please wait...
#> Writing spss file to 'car.sav'. Please wait...

Everything seems fine.

labelling integer vectors

However, when we set labels to integer vectors:

data <- mtcars %>% 
  as_tibble() %>%
  mutate(across(vs:carb, as.integer)) %>% 
  set_labels(vs:carb, labels = c(zero=0, one=1, two=2, three=3, four=4))
data %>% write_spss("car.sav")
#> Tidying value labels. Please wait...
#> Writing spss file to 'car.sav'. Please wait...
#> Error: Invalid input type, expected 'integer' actual 'double'

Created on 2021-06-10 by the reprex package (v2.0.0)
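
The error message suggests a type mismatch between the integer column and its label values. The base R sketch below shows how such a mismatch can arise and one way to align the types before export. That this is what happens inside set_labels() or write_spss() is an assumption; the report does not confirm it.

```r
# Integer column, as produced by mutate(across(vs:carb, as.integer))
vs <- as.integer(mtcars$vs)

# Numeric literals in R are doubles, so these label values are doubles too
labels <- c(zero = 0, one = 1)
attr(vs, "labels") <- labels

typeof(vs)                  # "integer"
typeof(attr(vs, "labels"))  # "double"

# Coerce the label values to integer while keeping their names
labels_int <- setNames(as.integer(labels), names(labels))
attr(vs, "labels") <- labels_int

typeof(attr(vs, "labels"))  # "integer"
```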

Copy variable labels from dataframes with different number of columns

It would be nice if copy_labels could copy the labels of variables with the same names from another data frame, irrespective of whether both data frames have the same number of columns.

Thank you.
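
A minimal base R sketch of the requested behaviour, matching columns by name instead of by position. The helper name copy_labels_by_name() and the example data are hypothetical; this is not how copy_labels is implemented.

```r
# Hypothetical helper: copy the "label" attribute for every column name
# that occurs in both data frames; other columns are left untouched.
copy_labels_by_name <- function(target, source) {
  shared <- intersect(names(target), names(source))
  for (nm in shared) {
    lab <- attr(source[[nm]], "label", exact = TRUE)
    if (!is.null(lab)) attr(target[[nm]], "label") <- lab
  }
  target
}

labelled <- data.frame(a = 1:3, b = 4:6, c = 7:9)
attr(labelled$a, "label") <- "First variable"
attr(labelled$c, "label") <- "Third variable"

# Different number of columns, only "c" is shared
subset_df <- data.frame(c = 1:2, z = 3:4)
result <- copy_labels_by_name(subset_df, labelled)

attr(result$c, "label")  # label copied from `labelled`
attr(result$z, "label")  # NULL, no matching column in `labelled`
```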

write_spss error message when there are value labels

Dear Daniel Lüdecke,
thanks for your package, that helps when working with R and SPSS in one team.
After an update to R version 4.0.1 (2020-06-06) there is an issue in a special case.
I get an error for write_spss if there are value labels. Without value labels everything is still fine.

By the way, there is a similar issue with the rio-package. Maybe this will help to narrow down the problem.

Kind regards
Thomas

#MWE
#Errormessage exporting SAV when there are value labels

rm(list=ls())

variablename <- as.numeric(c(1:3))

test <- data.frame(variablename)

attr(test$variablename, "label") <- "Variablelabel"

with this attribute there is an error

attr(test$variablename, "labels") <- c("Valuelabel 1"=1, "Valuelabel 2"=2, "Valuelabel 3"=3)

sjlabelled::write_spss(test, "test.sav")

without this attribute there is no error

attr(test$variablename, "labels") <- NULL

sjlabelled::write_spss(test, "test.sav")

sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_1.1.0 dplyr_1.0.0 stringr_1.4.0 sjlabelled_1.1.5

loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.14 purrr_0.3.4
[4] pander_0.6.3 lattice_0.20-41 haven_2.3.1
[7] tcltk_4.0.1 vctrs_0.3.1 summarytools_0.9.6
[10] generics_0.0.2 htmltools_0.4.0 yaml_2.2.1
[13] base64enc_0.1-3 rlang_0.4.6 pillar_1.4.4
[16] foreign_0.8-80 glue_1.4.1 pryr_0.1.4
[19] readxl_1.3.1 matrixStats_0.56.0 lifecycle_0.2.0
[22] plyr_1.8.6 sjmisc_2.8.5 cellranger_1.1.0
[25] zip_2.0.4 codetools_0.2-16 psych_1.9.12.31
[28] knitr_1.28 rio_0.5.16 forcats_0.5.0
[31] curl_4.3 parallel_4.0.1 Rcpp_1.0.4.6
[34] readr_1.3.1 backports_1.1.7 checkmate_2.0.0
[37] magick_2.3 tmvnsim_1.0-2 rapportools_1.0
[40] mnormt_2.0.0 hms_0.5.3 digest_0.6.25
[43] stringi_1.4.6 openxlsx_4.1.5 insight_0.8.5
[46] grid_4.0.1 tools_4.0.1 magrittr_1.5
[49] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3
[52] ellipsis_0.3.1 data.table_1.12.8 lubridate_1.7.9
[55] rstudioapi_0.11 R6_2.4.1 nlme_3.1-148
[58] compiler_4.0.1

Error : object 'get_note' is not exported by 'namespace:sjlabelled'

I got this error when installing sjPlot_2.4.1 or older versions:

Error : object 'get_note' is not exported by 'namespace:sjlabelled'
ERROR: lazy loading failed for package 'sjPlot'

I have installed other versions of sjlabelled, but it still does not work. I know there is a new release of sjPlot, but I really need to use an older version.
Thanks in advance.

var_labels attaching labels to wrong variables

Hi, I noticed that when we try to set a variable label for a non-existent variable,
var_labels continues and attaches labels to the wrong variables from that point on. If possible, it should keep the warning but ignore the non-existent variable.
Thank you!

library(dplyr); library(sjlabelled)

# setting variable labels (ok)
mtcars <- mtcars %>% as_tibble() 

mtcars %>%
  var_labels(mpg="mpg test", cyl="cyl test", disp="disp test", hp="hp test", drat="drat test",
             wt="wt test", qsec="qsec test", vs="vs test", am="am test", gear="gear test", 
             carb="carb test") %>%
  get_label()
#>         mpg         cyl        disp          hp        drat          wt 
#>  "mpg test"  "cyl test" "disp test"   "hp test" "drat test"   "wt test" 
#>        qsec          vs          am        gear        carb 
#> "qsec test"   "vs test"   "am test" "gear test" "carb test"

# setting a variable label for a non-existent variable (nonexist)
mtcars %>%
  as_tibble() %>%
  var_labels(mpg="mpg test", cyl="cyl test", nonexist="NON EXISTENT", disp="disp test", hp="hp test", drat="drat test",
             wt="wt test", qsec="qsec test", vs="vs test", am="am test", gear="gear test", 
             carb="carb test") %>%
  get_label()
#> Warning: Following elements are no valid column names in `x`: nonexist
#>            mpg            cyl           disp             hp           drat 
#>     "mpg test"     "cyl test" "NON EXISTENT"    "disp test"      "hp test" 
#>             wt           qsec             vs             am           gear 
#>    "drat test"      "wt test"    "qsec test"      "vs test"      "am test" 
#>           carb 
#>    "gear test"

Created on 2018-08-25 by the reprex package (v0.2.0).
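
The output above suggests that labels are assigned by position once the unknown name has been reported. A base R sketch of name-based assignment, which keeps the warning but skips the unknown entry, could look like this. The helper set_labels_by_name() is hypothetical and only illustrates the expected behaviour; it is not the package's code.

```r
# Hypothetical helper: `labels` is a named character vector, column name = label
set_labels_by_name <- function(data, labels) {
  unknown <- setdiff(names(labels), names(data))
  if (length(unknown) > 0) {
    warning("Ignoring labels for unknown columns: ",
            paste(unknown, collapse = ", "))
  }
  # Assign each label to the column of the same name, never by position
  for (nm in intersect(names(labels), names(data))) {
    attr(data[[nm]], "label") <- labels[[nm]]
  }
  data
}

# Warns about "nonexist", but the remaining labels stay on their own columns
out <- set_labels_by_name(
  mtcars,
  c(mpg = "mpg test", nonexist = "NON EXISTENT", disp = "disp test")
)

attr(out$disp, "label")  # "disp test"
attr(out$cyl, "label")   # NULL, no label was requested for cyl
```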

as_label() name collision with dplyr

Hi
Thanks for this nice package !
This is more a piece of information than an issue:

It seems that a recent (?) update in dplyr introduced a function named as_label()
If we use both packages we have to prefix the function with the package name.

Therefore some code in the vignette now fails:
https://cran.r-project.org/web/packages/sjlabelled/vignettes/labelleddata.html
e.g. in the section "Adding value labels as factor values":

barplot(
  table(as_label(efc$e42dep),
        as_label(efc$e16sex)), 
  beside = T, 
  legend.text = T
)

But the same in
https://strengejacke.github.io/sjlabelled/articles/labelleddata.html is still fine

set_labels() has a label order issue

In practice, the label "22" in the example code does not end up in the right position, compared with the result of using ordered(labels=).

example data file here-- https://jumpshare.com/v/9P1DetXkvdpuYI9vnAkw+/TEDS2018.rda
example code here-- https://jumpshare.com/v/SRaNm9MfFLY9xvv5MyiH+/issues02.rmd

suggestion: change the default of the read_spss() argument verbose from T to F

In a knitted Rmd file, the progress bar is printed into the output file because the argument is set to T by default. I don't think it needs to be on by default, since the progress messages make the document unnecessarily long.

Please also consider applying verbose=F to sjPlot::view_df(). Thank you.

add support for tidy select helpers in set_labels

set_labels appears to offer only limited support for tidy select helpers. While 'contains' and 'starts_with' work, others such as 'any_of', 'where' and 'matches' are not recognised.

set_label does not overwrite variable label

Hi,

trying to use set_label in the example below does not override the variable label:

d <- read_stata('GLES_Vorwahlquerschnitt_ZA5700_v1-0-0.dta')
get_label(d$q119a)

"Parteiidentifikation (Version A)"


cdu <- rec(d$q119a, '1:3=1; else=0')
staerke <- rec(d$q120, '1=5; 2=4; 3=3; 4=2; 5=1; -97=0')
d$cdu_pid <- cdu * staerke %>% set_labels(
  c('keine PID' = 0, 'starke PID' = 5))  %>%  set_label('PID CDU/CSU')
get_label(d$cdu_pid)

"Parteiidentifikation (Version A)"

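One possible explanation, offered as an assumption rather than a confirmed diagnosis: `%>%` binds more tightly than `*`, so in `cdu * staerke %>% set_labels(...) %>% set_label(...)` the pipe only sees `staerke`, and the multiplication then keeps the attributes of its first operand. The base R sketch below reproduces that pattern with a hypothetical `%then%` operator standing in for `%>%` (all `%any%` operators share the same precedence) and a hypothetical add_label() helper.

```r
# Hypothetical stand-ins for %>% and set_label()
`%then%` <- function(lhs, f) f(lhs)
add_label <- function(x) {
  attr(x, "label") <- "new label"
  x
}

a <- structure(c(1, 0, 1), label = "old label")  # plays the role of cdu
b <- c(5, 3, 1)                                   # plays the role of staerke

# Parsed as a * (b %then% add_label): only b is labelled, and the product
# keeps the "label" attribute of the first operand
wrong <- a * b %then% add_label

# Parentheses make the whole product the left-hand side of the pipe
right <- (a * b) %then% add_label

attr(wrong, "label")  # "old label"
attr(right, "label")  # "new label"
```
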
Rename a dataframe using labels

When working with functions/packages that don't support labels, it can be useful to set a data frame's column names using the variable labels. E.g. the default plot() method for data frames doesn't support labels; it only shows the column names.

I've written a rename_from_labels function that applies the variable labels to the dataframe column names, happy to work on a PR (including tests and documentation) if you think this would be useful and fit in the package.

Example:

library(sjlabelled)
library(dplyr)

rename_from_labels <- function(x) {
    labs <- sjlabelled::get_label(x)
    # Ignore variables without labels
    labs <- labs[labs != ""]
    new_names <- unname(labs)
    old_names <- names(labs)
    replace_vec <- setNames(old_names, new_names)
    dplyr::rename(x, !!! replace_vec)
}

data(iris)

iris <- iris %>%
    var_labels(
        Petal.Length = "Petal length (cm)",
        Petal.Width = "Petal width (cm)"
    )

# Show the variable names in the plot
iris %>%
    rename_from_labels() %>%
    plot()

Created on 2020-01-14 by the reprex package (v0.3.0)

remove_labels does not work when factor levels are strings

When I try to remove a label with sjlabelled::remove_labels from a factor whose levels are strings rather than numbers, all labels are deleted with a warning message (something like "NAs introduced by coercion").

Is var_labels fully tidyverse and NSE compatible?

Hi Daniel,
I am using the sjlabelled package in a tidyverse environment. I am still dreaming of being able to use your package in a way that I can easily define labels in a separate dataframe and then write a function to apply these labels to my real dataset. On the way there I tried your new var_labels function which you flagged to use in a pipe-workflow. I have then tried the following simple procedure borrowed from the last paragraph in this dplyr vignette https://dplyr.tidyverse.org/articles/programming.html#quasiquotation that reads "Setting variable names". Do you think the following code should work with your package?

testdf <- data.frame(PSEUDOPATID = 1:5, x = "zero", y=2)

varname <- "PSEUDOPATID"
varlabel <- "Patient ID"

#first test - trying to assign labels from the two values
labelled_test <- testdf %>%
  var_labels(!!varname := varlabel)

#second test - trying to assign labels RHS only
labelled_test <- testdf %>%
  var_labels(PSEUDOPATID = varlabel)

#third test - trying to assign labels LHS only
labelled_test <- testdf %>%
  var_labels(!!varname := "Patient ID LHS unquote only")

#works manually
labelled_test <- testdf %>%
  var_labels(PSEUDOPATID = "Patient ID manual")

I am not 100% sure if all my examples are correct, but I would assume that if possible with your function, at least the third test example would work.

`get_label` doesn't output the column name if the column doesn't have a label

library(sjlabelled)

data(efc)
efc_small <- efc[1:2]

efc_label <- get_label(efc_small)

# Names for every column: each one has a label
names(efc_label)
#> [1] "c12hour"  "e15relat"


set_label(efc_small[2]) <- ''

efc_label <- get_label(efc_small)

# Name for the first column: Second column doesn't have a label
names(efc_label) # I expect the same output as in line 9
#> [1] "c12hour" ""
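
A base R sketch of the expected behaviour, where every column keeps its name in the result whether or not it has a label. The helper get_labels_keep_names() is hypothetical and not part of sjlabelled.

```r
# Hypothetical helper: return "" for unlabelled columns, but keep all names
get_labels_keep_names <- function(data) {
  vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)[1]
  }, character(1))
}

df <- data.frame(a = 1:3, b = 4:6)
attr(df$a, "label") <- "First variable"  # b stays unlabelled

labs <- get_labels_keep_names(df)
names(labs)  # "a" "b"
```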

lbl_df() incompatible with dev version of tibble

I'm seeing check failures with sjlabelled: https://github.com/tidyverse/tibble/blob/e17970597fe63132c42229c22a6c06d653637351/revdep/problems.md#sjlabelled.

tibble is now using pillar to format the columns, this changes the internals of the return value for trunc_mat(). It seems that print.lbl_df() is relying on these internals.

I'd like to suggest that we work on a better interface in pillar to integrate additional header information in r-lib/pillar#49. So far I was lacking a use case, but labeled data frames seem like a good fit.

`write_spss` Error: SPSS only supports levels with <= 120 characters

Hello there,
I can't find a way around an issue where string variables imported from SPSS via
df <- read_spss(path = "fin.sav", verbose = F, atomic.to.fac = F)

are exported back to an SPSS file as strings using:

write_spss(df, "df.sav")

Even though I don't touch the string variables while cleaning the data in R, this problem always occurs:

$ Q56.0                 <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ MTurk_Code            <chr> "2178656", "466017", "4774295", "2489282", "8603893", "7867333", "3413926", "6734111", "7472350", "…
$ batch                 <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ CyberCond             <chr> "", "", "", "", "Exclusion", "", "", "", "", "", "", "", "", "Inclusion", "", "", "Exclusion", "", …
$ VAR00001              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, …
> write_spss(df, "df.sav")
Tidying value labels. Please wait...
Writing spss file to 'df.sav'. Please wait...
Error: SPSS only supports levels with <= 120 characters
Problems: `name`, `Q57`, `Q56.0`

The only workaround I know of is haven::write_sav(), but then I lose all labels.
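
To narrow down which columns exceed the 120-character limit before exporting, a base R check along these lines may help. The data frame below is made up for illustration; whether truncating or recoding the affected columns is acceptable depends on the data.

```r
# Hypothetical data: one column holds a string longer than 120 characters
df <- data.frame(
  id = 1:2,
  short = c("a", "b"),
  long = c(strrep("x", 121), ""),
  stringsAsFactors = FALSE
)

# Longest string (or factor level) per column; 0 for non-text columns
max_chars <- vapply(df, function(col) {
  if (is.factor(col)) col <- levels(col)
  if (!is.character(col)) return(0L)
  lens <- nchar(col)
  lens <- lens[!is.na(lens)]
  if (length(lens) == 0) 0L else max(lens)
}, integer(1))

# Names of the columns that would exceed the limit
names(max_chars)[max_chars > 120]
```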

different behavior for character vs. numeric data, when using named vector for labels argument of set_labels()?

Using 1.1.4.

The documentation for set_labels suggests that the format for a named vector being passed to the labels argument is:

c([desiredlabel1] = [datavalue1], ..., [desiredlabel{n}] = [datavalue{n}])

e.g. from Examples to set_labels():

# assign labels with named vector
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- set_labels(dummy, labels = c("very low" = 1, "very high" = 4))

This implies that the actual data values to be labelled are the elements of the labels, and the labels to be applied are the names of those elements. However, in practice, this gets reversed when the thing being labeled is a character vector.

See below:

# numeric version
numvec <- 
  set_labels(1:4, 
  labels = c(a = 1, b = 2, c = 3, d = 4)
  )

numvec
[1] 1 2 3 4
attr(,"labels")
   a b c d   <-- labels are from the names attributes of vector passed to "labels"
   1 2 3 4   <-- data values that are labelled come from elements vector passed to "labels"

get_labels(numvec)

# character version
charvec <- 
     set_labels(c("one", "two", "three", "four"), 
     labels = c(a = "one", b = "two", c = "three", d = "four")
     )

charvec
[1] "one" "two" "three" "four"
attr(,"labels")
   one  two  three  four    <-- here, labels come from the *elements* of vector passed to "labels"
   "a" "b" "c" "d"              <-- meanwhile,  data values that are labelled come from the *names* of vector passed to "labels"

When this is done with character vectors, get_labels() then produces the wrong values for "labels", providing the values in the data instead:

# This is fine:
get_labels(numvec)
[1] "a" "b" "c" "d"

# This is not:
get_labels(charvec)
[1] "one" "two" "three" "four"  <-- again, these are the values, not the labels

Is this a mistake, or is there something about intended behavior I'm not understanding?

It's showing up as an issue for me b/c I have a situation where the labels are generally serving multiple roles for Stata compatibility and helping merge datasets, but in a few cases I also want to provide metadata about more complex classification to my userbase, e.g.:

set_labels(c("TT", "CC", "TC", "CT", "TX", "XT", "CX", "XC", "XX"),
           labels = c(Treatment = "TT", Control = "CC",
                      Mixed = "CT", Mixed = "TC", 
                      PartialTreatment = "TX", PartialTreatment = "XT",
                      PartialControl = "CX", PartialControl = "XC",
                      Missing = "XX")
)
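
For reference, the documented convention c(label = value) can be expressed as a plain base R lookup that behaves the same for numeric and character data. This sketch only illustrates the expected direction of the mapping; it does not reproduce what set_labels() or get_labels() do internally.

```r
# Names are the labels, elements are the data values
num_labels <- c(a = 1, b = 2, c = 3, d = 4)
chr_labels <- c(a = "one", b = "two", c = "three", d = "four")

num_values <- c(2, 4, 1)
chr_values <- c("two", "four", "one")

# Look up the label (name) that belongs to each data value (element)
num_result <- names(num_labels)[match(num_values, num_labels)]
chr_result <- names(chr_labels)[match(chr_values, chr_labels)]

num_result  # "b" "d" "a"
chr_result  # "b" "d" "a"
```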

custom attributes for values and variables

Hi. Will you please consider using custom variable and value attribute names for collecting the variable and value labels?

I have been able to use the "get_labels" function to obtain the value labels contained in the label attribute from the readspss "read.sav" function, but I am not able to get the "get_label" function to obtain the variable labels contained in the var.label attribute from the same function.

Thank you in advance.

var_labels() function won't work with x as a variable name

When var_labels() is used to create labels on a data frame that has x as a column name, the function won't work if one attempts to assign a label to x. Based on the code for the function, it appears that this is because x is also the name of the data frame argument used internally in the function. The attached reprex uses the example from the vignette on CRAN; effectively the same code is in the help system for the function var_labels().

Reading the reprex below can be a bit painful, and the user can simply modify the vignette or help system example by using the variable names x, y, and z instead of the a, b, and c used in the examples.

reprex follows

# Minimal example to illustrate an issue with the var_labels function

# Example generated by William Anderson
# August 1, 2021


require(rlang) 
#> Loading required package: rlang
require(sjlabelled)
#> Loading required package: sjlabelled
#> 
#> Attaching package: 'sjlabelled'
#> The following objects are masked from 'package:rlang':
#> 
#>     as_character, as_label
# I typically use the whole tidyverse package, but am using just rlang here to keep the example minimal
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] sjlabelled_1.1.8 rlang_0.4.11    
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.27     withr_2.4.2       magrittr_2.0.1    reprex_2.0.0     
#>  [5] evaluate_0.14     highr_0.9         stringi_1.6.2     cli_3.0.1        
#>  [9] rstudioapi_0.13   fs_1.5.0          rmarkdown_2.9     tools_4.1.0      
#> [13] stringr_1.4.0     glue_1.4.2        xfun_0.24         yaml_2.2.1       
#> [17] compiler_4.1.0    htmltools_0.5.1.1 insight_0.14.2    knitr_1.33

# code from the vignette on cran.
# The help system for the var_labels has essentially the same example, but also uses the pipe

dummy <- data.frame(
  a = sample(1:4, 10, replace = TRUE),
  b = sample(1:4, 10, replace = TRUE),
  c = sample(1:4, 10, replace = TRUE)
)

# simple usage
test <- var_labels(dummy, a = "first variable", c = "third variable")

attr(test$a, "label")
#> [1] "first variable"
attr(test$b, "label")
#> NULL
attr(test$c, "label")
#> [1] "third variable"
# outputs agree with the vignette

# modify the frame, by changing variable names

dumby <- data.frame(
  x = sample(1:4, 10, replace = TRUE),
  y = sample(1:4, 10, replace = TRUE),
  z = sample(1:4, 10, replace = TRUE)
)

testb <- var_labels(dumby, x = "first variable", z = "third variable")
#> Warning: Following elements are no valid column names in `x`: ,z
# DOES NOT WORK
# produces a warning message, and testb is not a data frame

testbb <- var_labels(dumby, y = "second variable", z = "third variable")
attr(testbb$x, "label")
#> NULL
attr(testbb$y, "label")
#> [1] "second variable"
attr(testbb$z, "label")
#> [1] "third variable"
# works 

# Conclusion: The var_labels() function gives unexpected results if one attempts to create labels on the variable x.
require(reprex)
#> Loading required package: reprex

Created on 2021-08-01 by the reprex package (v2.0.0)
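
The warning text (an empty name followed by z) is consistent with R's argument matching: a named argument x = ... is matched to a formal parameter called x, and the data frame then falls into `...` without a name. The base R sketch below shows that mechanism with hypothetical stand-in functions; it assumes var_labels() has a signature of the form function(x, ...), as the report suggests.

```r
# Hypothetical stand-in with a formal named `x` plus `...`
show_matching <- function(x, ...) {
  list(x = x, dots_names = names(list(...)))
}

dumby <- data.frame(x = 1:3, y = 1:3, z = 1:3)

res <- show_matching(dumby, x = "first variable", z = "third variable")
res$x           # "first variable": the label was matched to the formal `x`
res$dots_names  # "" "z": the data frame landed in `...` without a name

# A formal name that is unlikely to be a column name avoids the clash
show_matching2 <- function(.data, ...) {
  list(is_df = is.data.frame(.data), dots_names = names(list(...)))
}

res2 <- show_matching2(dumby, x = "first variable", z = "third variable")
res2$dots_names  # "x" "z"
```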

Please consider setting F as the default for as.num and atomic.to.fac

I have been using survey/poll data with sjlabelled and wonder whether different defaults would better serve users who mostly work with factors (e.g. former SPSS users).

It would save a lot of time otherwise spent switching these arguments back every time variables are cleaned and recoded.

Hopefully, the defaults of the two arguments can be reconsidered and set to:
as.num = F
atomic.to.fac = F

Thank you.
