http://www.strengejacke.de/sjPlot/labelleddata/
This document shows basic usage of the package and how to work with labelled data.
Resources:
Download package from
Developer snapshot at
Submission of bug reports and issues at
Basically, this package covers three domains of functionality:
reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages
hence, sjmisc also includes functions to work with labelled data
frequently applied recoding and variable conversion tasks
In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:
factor(c("low", "high", "mid", "high", "low"))
## [1] low  high mid  high low
## Levels: high low mid
Reading SPSS data (with haven, foreign or sjmisc) keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample dataset efc, which is part of the sjmisc package:
library(sjmisc)
data(efc)
str(efc$e42dep)
## atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
## - attr(*, "label")= chr "elder's dependency"
## - attr(*, "labels")= Named num [1:4] 1 2 3 4
## ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
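To make this layout concrete, here is a minimal base-R sketch that builds such a vector by hand, without sjmisc. The vector `dep` and its values are made up; only the attribute names "label" and "labels" and their shape follow the `str()` output above.

```r
# A plain numeric vector carrying the two attributes shown above.
# The values are made up for illustration.
dep <- c(3, 3, 3, 4, 4, 1, 2)

# "label": the variable label (a single string)
attr(dep, "label") <- "elder's dependency"

# "labels": a named numeric vector; values are the codes,
# names are the corresponding value labels
attr(dep, "labels") <- c(
  "independent"          = 1,
  "slightly dependent"   = 2,
  "moderately dependent" = 3,
  "severely dependent"   = 4
)

str(dep)                     # numeric values plus both attributes
attr(dep, "label")           # the variable label
names(attr(dep, "labels"))   # the value labels, in code order
```

Because the data stay numeric, arithmetic and summaries work as usual; the labels simply travel along as metadata.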
While all plotting and table functions of the sjPlot package make use of these attributes, many packages and functions do not consider them, e.g. R base graphics:
library(sjmisc)
data(efc)
barplot(table(efc$e42dep, efc$e16sex), beside = TRUE, legend.text = TRUE)
As you can see in the above figure, the plot has no meaningful axis or legend labels: only the numeric codes are shown.
to_label is an sjmisc function that converts a numeric variable into a factor and uses the value-label attributes as factor levels. When factors with labelled levels are used, the bar plot is labelled:
barplot(table(to_label(efc$e42dep), to_label(efc$e16sex)), beside = TRUE, legend.text = TRUE)
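The core idea behind to_label can be approximated in base R by passing the codes as factor levels and the value labels as level names. The helper `labels_as_levels()` below is hypothetical and is not sjmisc's implementation; for example, it simply turns codes without a value label into `NA`, and the data are made up.

```r
# Hypothetical helper approximating the idea of to_label():
# codes become factor levels, value labels become level names.
labels_as_levels <- function(x) {
  lab <- attr(x, "labels", exact = TRUE)
  # codes not listed in `lab` end up as NA
  factor(x, levels = unname(lab), labels = names(lab))
}

# made-up data with the same attribute layout as efc$e42dep
dep <- c(3, 3, 4, 1, 2)
attr(dep, "labels") <- c("independent" = 1, "slightly dependent" = 2,
                         "moderately dependent" = 3, "severely dependent" = 4)

dep_f <- labels_as_levels(dep)
levels(dep_f)   # the four value labels, in code order
table(dep_f)    # counts are now labelled
```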
to_factor is a convenient replacement for as.factor: it converts a numeric vector into a factor but keeps the value and variable label attributes.
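A hypothetical helper `factor_keep_labels()` sketches the behaviour described for to_factor: `as.factor()` returns a new factor without the label attributes, so they are copied back explicitly. This is an approximation for illustration, not the package's code, and the data are made up.

```r
# Hypothetical helper approximating to_factor(): levels stay the numeric
# codes, and the label attributes are copied onto the new factor.
factor_keep_labels <- function(x) {
  f <- factor(x)
  attr(f, "label")  <- attr(x, "label", exact = TRUE)
  attr(f, "labels") <- attr(x, "labels", exact = TRUE)
  f
}

sex <- c(2, 1, 1, 2)
attr(sex, "label")  <- "carer's gender"
attr(sex, "labels") <- c(male = 1, female = 2)

plain <- as.factor(sex)          # label attributes are lost
kept  <- factor_keep_labels(sex) # levels "1", "2", labels preserved

attr(plain, "label", exact = TRUE)  # NULL
attr(kept, "label", exact = TRUE)   # "carer's gender"
```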
There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:
get_label() to get variable labels
get_labels() to get value labels
set_label() to set variable labels (adds them as a vector attribute)
set_labels() to set value labels (adds them as a vector attribute)
With get_label(), you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted:
barplot(table(to_label(efc$e42dep), to_label(efc$e16sex)), beside = TRUE, legend.text = TRUE, main = get_label(efc$e42dep))
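Conceptually, these labels are just the "label" and "labels" attributes. The base-R sketch below reads and writes them on a single vector; the helper names (`set_var_label()` and so on) are hypothetical stand-ins, not the sjmisc API, and the real functions handle more cases. The data are made up.

```r
# Hypothetical stand-ins for set_label()/get_label() and
# set_labels()/get_labels(); they only write and read attributes.
set_var_label <- function(x, label) {
  attr(x, "label") <- label
  x
}

get_var_label <- function(x) {
  # exact = TRUE matters: without it, attr() would partially match
  # the "labels" attribute when no "label" attribute exists
  attr(x, "label", exact = TRUE)
}

set_value_labels <- function(x, labels) {
  attr(x, "labels") <- labels  # named vector: names = labels, values = codes
  x
}

get_value_labels <- function(x) {
  names(attr(x, "labels", exact = TRUE))  # NULL if there are no value labels
}

sex <- c(1, 2, 2, 1)
sex <- set_var_label(sex, "elder's gender")
sex <- set_value_labels(sex, c(male = 1, female = 2))

get_var_label(sex)     # "elder's gender"
get_value_labels(sex)  # "male" "female"
```

The `exact = TRUE` detail is easy to miss: a vector that has only value labels would otherwise appear to have a variable label too.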
get_label(efc) returns the variable labels of all variables in the data frame, and get_labels(efc) returns a list with the value labels of all variables.
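For a whole data frame, the same idea is applied column by column. This minimal base-R sketch, with made-up columns and labels, collects a character vector of variable labels and a list of value labels; the exact return format of the sjmisc functions may differ.

```r
# made-up data frame; labels are attached to the columns as attributes
d <- data.frame(e16sex = c(1, 2, 2), c12hour = c(10, 20, 35))
attr(d$e16sex, "label")  <- "elder's gender"
attr(d$e16sex, "labels") <- c(male = 1, female = 2)
attr(d$c12hour, "label") <- "average number of hours of care per week"

# one variable label per column, NA where a column has none
var_labels <- vapply(d, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

# a list with the value labels of each column, NULL where there are none
value_labels <- lapply(d, function(col) {
  names(attr(col, "labels", exact = TRUE))
})

var_labels
value_labels
```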
Converting labelled vectors into factors usually drops the label attributes (e.g. with as_factor) or replaces the values with the associated labels (as to_label does). If you want to convert a labelled vector into a numeric factor but keep the label attributes (including variable labels), use to_factor.
Functions like lm simply copy these attributes and store this information in the returned object; see the following example using the sjPlot package:
library(sjPlot)
## #refugeeswelcome
data(efc)
# make education categorical
efc$c172code <- to_factor(efc$c172code)
fit <- lm(barthtot ~ c160age + c12hour + c172code + c161sex, data = efc)
sjt.lm(fit, group.pred = TRUE)
Total score BARTHEL INDEX
| | B | CI | p |
|---|---|---|---|
| (Intercept) | 87.54 | 76.34 – 98.75 | <.001 |
| carer’ age | -0.21 | -0.35 – -0.07 | .004 |
| average number of hours of care per week | -0.28 | -0.32 – -0.24 | <.001 |
| carer’s level of education | | | |
| intermediate level of education | 1.37 | -3.12 – 5.85 | .550 |
| high level of education | -1.64 | -7.22 – 3.93 | .564 |
| carer’s gender | -0.39 | -4.49 – 3.71 | .850 |
| Observations | 821 | | |
| R² / adj. R² | .271 / .266 | | |
Looking at str(fit$model) (the model frame stored in the lm object) shows that both the variable and value label attributes are still there. Packages like sjPlot make use of this feature and automatically label the table output (as seen above).
The base subset function drops label attributes (and vector attributes in general) when subsetting data. Since version 1.0.3 of the sjmisc package, there are two handy functions to deal with this problem: copy_labels and remove_labels.
copy_labels adds the labels of the original data frame back to a subsetted data frame, and remove_labels removes all label attributes.
efc.sub <- subset(efc, subset = e16sex == 1, select = c(4:8))
str(efc.sub)
## 'data.frame': 296 obs. of 5 variables:
## $ e17age : num 74 68 80 72 94 79 67 80 76 88 ...
## $ e42dep : num 4 4 1 3 3 4 3 4 2 4 ...
## $ c82cop1: num 4 3 3 4 3 3 4 2 2 3 ...
## $ c83cop2: num 2 4 2 2 2 2 1 3 2 2 ...
## $ c84cop3: num 4 4 1 1 1 4 2 4 2 4 ...
efc.sub <- copy_labels(efc.sub, efc)
str(efc.sub)
## 'data.frame': 296 obs. of 5 variables:
## $ e17age : atomic 74 68 80 72 94 79 67 80 76 88 ...
## ..- attr(*, "label")= Named chr "elder' age"
## .. ..- attr(*, "names")= chr "e17age"
## $ e42dep : atomic 4 4 1 3 3 4 3 4 2 4 ...
## ..- attr(*, "label")= Named chr "elder's dependency"
## .. ..- attr(*, "names")= chr "e42dep"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "independent" "slightly dependent" "moderately dependent" "severely dependent"
## $ c82cop1: atomic 4 3 3 4 3 3 4 2 2 3 ...
## ..- attr(*, "label")= Named chr "do you feel you cope well as caregiver?"
## .. ..- attr(*, "names")= chr "c82cop1"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "never" "sometimes" "often" "always"
## $ c83cop2: atomic 2 4 2 2 2 2 1 3 2 2 ...
## ..- attr(*, "label")= Named chr "do you find caregiving too demanding?"
## .. ..- attr(*, "names")= chr "c83cop2"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "Never" "Sometimes" "Often" "Always"
## $ c84cop3: atomic 4 4 1 1 1 4 2 4 2 4 ...
## ..- attr(*, "label")= Named chr "does caregiving cause difficulties in your relationship with your friends?"
## .. ..- attr(*, "names")= chr "c84cop3"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "Never" "Sometimes" "Often" "Always"
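For readers curious about the mechanics, here is a rough base-R approximation of what copy_labels and remove_labels do. The helpers `copy_label_attrs()` and `remove_label_attrs()` are hypothetical and not sjmisc's code; they only handle the "label" and "labels" attributes of columns present in both data frames. The data are made up.

```r
# Hypothetical stand-in for copy_labels(): copy "label"/"labels" from the
# original data frame onto matching columns of the subsetted one.
copy_label_attrs <- function(df_new, df_origin) {
  for (col in intersect(names(df_new), names(df_origin))) {
    for (a in c("label", "labels")) {
      attr(df_new[[col]], a) <- attr(df_origin[[col]], a, exact = TRUE)
    }
  }
  df_new
}

# Hypothetical stand-in for remove_labels(): drop both label attributes.
remove_label_attrs <- function(df) {
  for (col in names(df)) {
    attr(df[[col]], "label")  <- NULL
    attr(df[[col]], "labels") <- NULL
  }
  df
}

# made-up data: one grouping column, one labelled column
d <- data.frame(sex = c(1, 2, 2, 1), dep = c(4, 3, 1, 2))
attr(d$dep, "label")  <- "elder's dependency"
attr(d$dep, "labels") <- c(independent = 1, "slightly dependent" = 2,
                           "moderately dependent" = 3, "severely dependent" = 4)

d_sub <- subset(d, sex == 1, select = dep)
attr(d_sub$dep, "label", exact = TRUE)   # NULL: subset() dropped it

d_sub <- copy_label_attrs(d_sub, d)
attr(d_sub$dep, "label", exact = TRUE)   # "elder's dependency" again
```

Subsetting a vector with `[` keeps only names, dim and dimnames, which is why the labels disappear in the first place; copying them back afterwards is the simplest fix.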
When working with labelled data, especially data sets imported from other statistical software, it is very handy to make use of the label attributes. The sjmisc package supports this and offers useful functions for these tasks.