Working with labelled Data
Published: 2019-06-23


http://www.strengejacke.de/sjPlot/labelleddata/

Working with labelled Data {sjmisc}

This document shows basic usage of the sjmisc package and how to work with labelled data.

Resources:

  • Download package from 

  • Developer snapshot at 

  • Submission of bug reports and issues at 


The sjmisc-Package

Basically, this package covers three domains of functionality:

  • reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages

  • hence, sjmisc also includes functions to work with labelled data

  • frequently applied recoding and variable conversion tasks

Labelled Data

In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:

factor(c("low", "high", "mid", "high", "low"))
## [1] low  high mid  high low
## Levels: high low mid

Reading SPSS data (with haven, foreign or sjmisc) keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample dataset efc, which is part of the sjmisc package:

library(sjmisc)
data(efc)
str(efc$e42dep)
##  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##  - attr(*, "label")= chr "elder's dependency"
##  - attr(*, "labels")= Named num [1:4] 1 2 3 4
##   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
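To see that these labels are nothing more than ordinary attributes on a numeric vector, here is a minimal base-R sketch that builds a similar vector by hand. The values are made up for illustration and are not the efc data; only the attribute names "label" and "labels" are taken from the str() output above.

```r
# Illustrative values only (not the real efc data): a numeric vector
# carrying a variable label and value labels as plain attributes
dep <- c(3, 3, 3, 4, 4)
attr(dep, "label")  <- "elder's dependency"
attr(dep, "labels") <- c("independent" = 1, "slightly dependent" = 2,
                         "moderately dependent" = 3, "severely dependent" = 4)

names(attributes(dep))   # "label" "labels"
mean(dep)                # the codes still behave as ordinary numbers: 3.4
names(table(dep))        # table() uses the codes "3", "4", not the labels
```

This is also why functions that know nothing about these attributes simply see numbers.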

While all plotting and table functions of the  make use of these attributes, many packages and/or functions do not consider these attributes, e.g. R base graphics:

library(sjmisc)
data(efc)
barplot(table(efc$e42dep, efc$e16sex),
        beside = TRUE,
        legend.text = TRUE)

[Figure: base-graphics bar plot of e42dep by e16sex, without labels]

As you can see in the figure above, the plot has neither meaningful axis nor legend labels; it only shows the numeric codes.

Adding value labels as factor values

to_label is an sjmisc function that converts a numeric variable into a factor and uses the value labels stored in its attributes as factor levels. When the factor levels carry these labels, the bar plot is labelled accordingly.
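The idea behind to_label can be sketched in base R. The helper label_to_factor below is hypothetical and is not how sjmisc implements to_label; it simply assumes, as in the str() output shown earlier, that the value labels are stored as a named numeric "labels" attribute (names are the label texts, values are the codes).

```r
# Hypothetical helper (not sjmisc's implementation): turn a numeric vector
# with a "labels" attribute into a factor whose levels are the label texts
label_to_factor <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) return(factor(x))   # no value labels: plain factor
  # map each code to its label; codes are matched against the label values
  factor(x, levels = unname(labs), labels = names(labs))
}

dep <- c(3, 3, 1, 4)
attr(dep, "labels") <- c("independent" = 1, "slightly dependent" = 2,
                         "moderately dependent" = 3, "severely dependent" = 4)

dep_f <- label_to_factor(dep)
levels(dep_f)          # the four label texts, in code order
as.character(dep_f)    # "moderately dependent" "moderately dependent" ...
```

Passing the codes as levels keeps the factor levels in code order, even for labels that never occur in the data.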

barplot(table(to_label(efc$e42dep),
              to_label(efc$e16sex)),
        beside = TRUE,
        legend.text = TRUE)

[Figure: the same bar plot, labelled via to_label()]

to_factor is a convenient replacement for as.factor: it converts a numeric vector into a factor but keeps the value and variable label attributes.
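The difference between as.factor and a label-preserving conversion can be checked in base R. The sketch below shows that as.factor discards the custom attributes, and then uses a hypothetical helper, factor_keep_labels, to approximate what the text describes for to_factor (numeric codes as levels, label attributes kept). It is an illustration, not sjmisc's code.

```r
x <- c(1, 2, 2, 3)
attr(x, "label")  <- "carer's level of education"
attr(x, "labels") <- c(low = 1, intermediate = 2, high = 3)

f_plain <- as.factor(x)
is.null(attr(f_plain, "label"))   # TRUE: as.factor() drops the custom attributes

# Hypothetical helper (not sjmisc's to_factor): build the factor from the
# codes, then copy both label attributes over from the original vector
factor_keep_labels <- function(x) {
  f <- factor(x)
  attr(f, "label")  <- attr(x, "label", exact = TRUE)
  attr(f, "labels") <- attr(x, "labels", exact = TRUE)
  f
}

f_kept <- factor_keep_labels(x)
levels(f_kept)               # "1" "2" "3": the codes become the levels
attr(f_kept, "label")        # "carer's level of education"
```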

Getting and setting value and variable labels

There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:

  • get_label() to get variable labels

  • get_labels() to get value labels

  • set_label() to set variable labels (add them as vector attribute)

  • set_labels() to set value labels (add them as vector attribute)
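Under the hood, these functions read and write attributes. As a rough illustration for a single vector, here are hypothetical base-R stand-ins (my_set_label, my_get_label, my_set_labels, my_get_labels). Their names and signatures are assumptions for this sketch and do not match sjmisc's API; for instance, value labels are passed here as a named numeric vector.

```r
# Hypothetical stand-ins (not sjmisc's API) operating on one vector
my_set_label  <- function(x, label)  { attr(x, "label")  <- label;  x }
my_get_label  <- function(x) attr(x, "label", exact = TRUE)
my_set_labels <- function(x, labels) { attr(x, "labels") <- labels; x }
my_get_labels <- function(x) names(attr(x, "labels", exact = TRUE))

sex <- c(1, 2, 2, 1)
sex <- my_set_label(sex, "gender")                   # variable label
sex <- my_set_labels(sex, c(male = 1, female = 2))   # value labels

my_get_label(sex)    # "gender"
my_get_labels(sex)   # "male" "female"
```

The values of sex stay 1 and 2; only the attributes change.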

With get_label(), you can easily add titles to plots dynamically, i.e. depending on the variable being plotted.

barplot(table(to_label(efc$e42dep),
              to_label(efc$e16sex)),
        beside = TRUE,
        legend.text = TRUE,
        main = get_label(efc$e42dep))

[Figure: labelled bar plot with the title "elder's dependency"]

get_label(efc) returns the variable labels of all columns of the data frame, and get_labels(efc) returns a list with the value labels of every variable.
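For a whole data frame, the same attributes can be collected column by column. This base-R sketch uses sapply and lapply over a small, made-up data frame to approximate the kind of result described for get_label(efc) and get_labels(efc); the exact return format of the sjmisc functions may differ.

```r
# Made-up values; labels stored as attributes on each column
df <- data.frame(e42dep = c(3, 4, 1), c82cop1 = c(4, 3, 3))
attr(df$e42dep, "label")   <- "elder's dependency"
attr(df$c82cop1, "label")  <- "do you feel you cope well as caregiver?"
attr(df$e42dep, "labels")  <- c("independent" = 1, "slightly dependent" = 2,
                                "moderately dependent" = 3, "severely dependent" = 4)
attr(df$c82cop1, "labels") <- c(never = 1, sometimes = 2, often = 3, always = 4)

# one variable label per column, returned as a named character vector
var_labels <- sapply(df, function(col) attr(col, "label", exact = TRUE))

# a named list holding each column's value-label texts
value_labels <- lapply(df, function(col) names(attr(col, "labels", exact = TRUE)))

var_labels["e42dep"]      # "elder's dependency"
value_labels$c82cop1      # "never" "sometimes" "often" "always"
```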

Another example

Converting labelled vectors into factors usually drops label attributes (e.g. using as_factor) or replaces values with the associated labels (like to_label does). If you want to convert a labelled vector into a numeric factor, but keep the label attributes (including variable labels), use to_factor.

Functions like lm simply copy these attributes and store this information in the returned object; see the following example from the sjPlot package:

library(sjPlot)
data(efc)
# make education categorical
efc$c172code <- to_factor(efc$c172code)
fit <- lm(barthtot ~ c160age + c12hour + c172code + c161sex,
          data = efc)
sjt.lm(fit, group.pred = TRUE)
Total score BARTHEL INDEX

                                          B       CI              p
(Intercept)                               87.54   76.34 – 98.75   <.001
carer’ age                                -0.21   -0.35 – -0.07   .004
average number of hours of care per week  -0.28   -0.32 – -0.24   <.001
carer’s level of education
  intermediate level of education         1.37    -3.12 – 5.85    .550
  high level of education                 -1.64   -7.22 – 3.93    .564
carer’s gender                            -0.39   -4.49 – 3.71    .850
Observations                              821
R2 / adj. R2                              .271 / .266

Looking at str(fit$model) (the model frame stored in the lm object) shows that both variable and value label attributes are still there. Packages like sjPlot make use of this and automatically label the table output (as seen above).

Restore labels from subsetted data

The base subset function drops label attributes (or vector attributes in general) when subsetting data. Since version 1.0.3 of the sjmisc-package, there are handy functions to deal with this problem: copy_labels and remove_labels.

copy_labels adds labels back to a subsetted data frame, based on the original data frame, and remove_labels removes all label attributes.
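The following base-R sketch reproduces the problem on a toy data frame and shows one way to restore the attributes afterwards. copy_label_attrs and strip_label_attrs are hypothetical helpers that only handle the "label" and "labels" attributes; they approximate the behaviour described for copy_labels and remove_labels but are not their implementation.

```r
# Toy data (illustrative values); "dep" carries label attributes
df <- data.frame(sex = c(1, 2, 1), dep = c(4, 3, 1))
attr(df$dep, "label")  <- "elder's dependency"
attr(df$dep, "labels") <- c("independent" = 1, "severely dependent" = 4)

# Row subsetting applies `[` to each column, which drops these attributes
lost <- subset(df, sex == 1, select = dep)
attr(lost$dep, "label")        # NULL

# Hypothetical helper: copy "label"/"labels" from orig onto matching columns
copy_label_attrs <- function(sub, orig) {
  for (col in intersect(names(sub), names(orig))) {
    for (a in c("label", "labels")) {
      attr(sub[[col]], a) <- attr(orig[[col]], a, exact = TRUE)
    }
  }
  sub
}

# Hypothetical helper: remove both attributes from every column
strip_label_attrs <- function(d) {
  for (col in names(d)) {
    attr(d[[col]], "label")  <- NULL
    attr(d[[col]], "labels") <- NULL
  }
  d
}

restored <- copy_label_attrs(lost, df)
attr(restored$dep, "label")    # "elder's dependency"
cleaned <- strip_label_attrs(restored)
```

Matching columns by name rather than position keeps the sketch working when the subset selects only some of the original columns.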

Losing labels during subset

efc.sub <- subset(efc, subset = e16sex == 1, select = 4:8)
str(efc.sub)
## 'data.frame':    296 obs. of  5 variables:
##  $ e17age : num  74 68 80 72 94 79 67 80 76 88 ...
##  $ e42dep : num  4 4 1 3 3 4 3 4 2 4 ...
##  $ c82cop1: num  4 3 3 4 3 3 4 2 2 3 ...
##  $ c83cop2: num  2 4 2 2 2 2 1 3 2 2 ...
##  $ c84cop3: num  4 4 1 1 1 4 2 4 2 4 ...

Add back labels

efc.sub <- copy_labels(efc.sub, efc)
str(efc.sub)
## 'data.frame':    296 obs. of  5 variables:
##  $ e17age : atomic  74 68 80 72 94 79 67 80 76 88 ...
##   ..- attr(*, "label")= Named chr "elder' age"
##   .. ..- attr(*, "names")= chr "e17age"
##  $ e42dep : atomic  4 4 1 3 3 4 3 4 2 4 ...
##   ..- attr(*, "label")= Named chr "elder's dependency"
##   .. ..- attr(*, "names")= chr "e42dep"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "independent" "slightly dependent" "moderately dependent" "severely dependent"
##  $ c82cop1: atomic  4 3 3 4 3 3 4 2 2 3 ...
##   ..- attr(*, "label")= Named chr "do you feel you cope well as caregiver?"
##   .. ..- attr(*, "names")= chr "c82cop1"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "never" "sometimes" "often" "always"
##  $ c83cop2: atomic  2 4 2 2 2 2 1 3 2 2 ...
##   ..- attr(*, "label")= Named chr "do you find caregiving too demanding?"
##   .. ..- attr(*, "names")= chr "c83cop2"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "Never" "Sometimes" "Often" "Always"
##  $ c84cop3: atomic  4 4 1 1 1 4 2 4 2 4 ...
##   ..- attr(*, "label")= Named chr "does caregiving cause difficulties in your relationship with your friends?"
##   .. ..- attr(*, "names")= chr "c84cop3"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "Never" "Sometimes" "Often" "Always"

Conclusion

When working with labelled data, especially data sets imported from other software packages, it comes in very handy to make use of the label attributes. The sjmisc package supports this and offers useful functions for these tasks.

Reposted from h2appy's 51CTO blog. Original link: http://blog.51cto.com/h2appy/1878103. Please contact the original author for permission to repost.