Data for cleaning {epicalc}    R Documentation

Dataset for practicing cleaning, labelling and recoding

Description

The data come from clients of a family planning clinic.

For all variables except ID, the codes 9, 99, 99.9, 888 and 999 represent missing values.
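A minimal base-R sketch of turning these codes into NA. A value such as 99 can be a genuine measurement for some variables (e.g. weight), so the sketch assumes you decide per variable which codes mean "missing". The helper name `recode_missing`, the toy data and the code assigned to each column below are all hypothetical.

```r
# Hypothetical helper: replace missing-value codes with NA, column by column.
# 'codes' is a named list mapping column names to the codes that mean
# "missing" for that column; columns not listed are left unchanged.
recode_missing <- function(df, codes) {
  for (v in names(codes)) {
    df[[v]][df[[v]] %in% codes[[v]]] <- NA
  }
  df
}

# Toy data (not the real Planning values)
toy <- data.frame(ID    = 1:4,
                  RELIG = c(1, 2, 9, 1),
                  AGE   = c(25, 99, 31, 40),
                  WT    = c(52.5, 99.9, 60, 48))

# Assumed code per column, for illustration only
clean <- recode_missing(toy, list(RELIG = 9, AGE = 99, WT = 99.9))
print(clean)
```

ID is deliberately left out of `codes`, matching the rule that ID has no missing-value codes.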

Usage

data(Planning)

Format

A data frame with 251 observations on the following 11 variables.

ID
a numeric vector: ID code
AGE
a numeric vector
RELIG
a numeric vector: Religion

1 = Buddhist
2 = Muslim

PED
a numeric vector: Patient's education level

1 = none
2 = primary school
3 = secondary school
4 = high school
5 = vocational school
6 = university
7 = other

INCOME
a numeric vector: Monthly income in Thai Baht

1 = nil
2 = < 1,000
3 = 1,000-4,999
4 = 5,000-9,999
5 = 10,000 or more

AM
a numeric vector: Age at marriage
REASON
a numeric vector: Reason for family planning

1 = birth spacing
2 = enough children
3 = other

BPS
a numeric vector: Systolic blood pressure
BPD
a numeric vector: Diastolic blood pressure
WT
a numeric vector: Weight (kg)
HT
a numeric vector: Height (cm)
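The coded variables above (RELIG, PED and REASON) can be labelled with base R's factor(). This is a minimal sketch on made-up values. It uses plain factor() rather than epicalc's own labelling or recoding helpers, and INCOME could be handled the same way.

```r
# Toy coded values for four hypothetical clients (not the real Planning data)
toy <- data.frame(RELIG  = c(1, 2, 2, 1),
                  PED    = c(2, 6, 1, 7),
                  REASON = c(1, 3, 2, 1))

# Attach the documented labels; any code not listed in 'levels'
# (e.g. a missing-value code such as 9) becomes NA
toy$RELIG <- factor(toy$RELIG, levels = 1:2,
                    labels = c("Buddhist", "Muslim"))
toy$PED <- factor(toy$PED, levels = 1:7,
                  labels = c("none", "primary school", "secondary school",
                             "high school", "vocational school",
                             "university", "other"))
toy$REASON <- factor(toy$REASON, levels = 1:3,
                     labels = c("birth spacing", "enough children", "other"))

print(table(toy$RELIG))
```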

Examples

data(Planning)
des(Planning)

# Change variable names to lowercase
names(Planning) <- tolower(names(Planning))
use(Planning)
des()

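# Check for duplication of 'id'
any(duplicated(id))   # TRUE if any ID occurs more than once
duplicated(id)        # logical vector, TRUE at each repeated entry
id[duplicated(id)]    # the duplicated ID: 215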

# Which ID(s) in the range are never used?
setdiff(min(id):max(id), id)   # 216

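# Correct the wrong one. Assign into .data itself: 'id[duplicated(id)] <- 216'
# would only create a separate copy of 'id' in the global environment and
# leave .data unchanged.
.data$id[duplicated(.data$id)] <- 216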
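The same duplicate-and-gap check can be sketched in base R on a plain vector, without epicalc's attached `.data`. The toy ID vector below is hypothetical, and, like the example above, the sketch assumes the second occurrence of the duplicate is the mis-keyed one; in real cleaning the original records should be checked before relabelling.

```r
# Toy ID vector (hypothetical): IDs should run 1..6, but 4 was entered twice
# and 5 is absent
id <- c(1, 2, 3, 4, 4, 6)

dup_pos    <- which(duplicated(id))         # position of the second '4'
missing_id <- setdiff(min(id):max(id), id)  # IDs in the range never used

# Only fix automatically when exactly one ID is duplicated and one is missing
if (length(dup_pos) == 1 && length(missing_id) == 1) {
  id[dup_pos] <- missing_id
}
print(id)
```

The guard avoids silently reassigning IDs when there are several duplicates or gaps, where the correct pairing cannot be inferred from the IDs alone.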

[Package epicalc version 2.10.1.1 Index]