spc {zipfR} | R Documentation |
In the zipfR library, spc objects are used to represent a word frequency spectrum (either an observed spectrum or the expected spectrum of an LNRE model at a given sample size).
With the spc constructor function, an object can be initialized directly from the specified data vectors. It is more common, though, to read an observed spectrum from a disk file with read.spc or to compute an expected spectrum with lnre.spc.
spc objects should always be treated as read-only.
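To make the notion of an observed frequency spectrum concrete, here is a minimal base-R sketch that derives the vectors m and Vm from a toy token sample. The token vector is hypothetical; V_m is the number of distinct types that occur exactly m times. It does not use zipfR itself.

```r
# Hypothetical toy sample of word tokens (illustration only)
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat")

# Type frequencies: how often each distinct word occurs
type.freq <- table(tokens)

# Frequency spectrum: V_m = number of types that occur exactly m times
spectrum <- table(as.vector(type.freq))
m  <- as.integer(names(spectrum))  # frequency classes
Vm <- as.vector(spectrum)          # class sizes

print(data.frame(m = m, Vm = Vm))
```

With zipfR loaded, these vectors could then be passed to the constructor as spc(Vm, m); for real corpora, read.spc or lnre.spc remain the usual route.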
spc(Vm, m=1:length(Vm), VVm=NULL, N=NA, V=NA, VV=NA, m.max=0, expected=!missing(VVm))
m |
integer vector of frequency classes m (if omitted,
Vm is assumed to list the first k frequency classes
V_1, ..., V_k) |
Vm |
vector of corresponding class sizes V_m (may be fractional for expected frequency spectrum E[V_m]) |
VVm |
optional vector of estimated variances Var[V_m] (for expected frequency spectrum only) |
N, V |
total sample size N and vocabulary size V of
frequency spectrum. While these values are usually determined
automatically from m and Vm, they are required for an
incomplete frequency spectrum that does not list all non-empty
frequency classes. |
VV |
variance Var[V] of expected
vocabulary size. If VVm is specified, VV should
also be given. |
m.max |
highest frequency class m listed in incomplete
spectrum. If m.max is set, N and V also have
to be specified, and all non-zero frequency classes up to
m.max have to be included in the input vectors. Frequency
classes above m.max in the input will automatically be
deleted. |
expected |
set to TRUE if the frequency spectrum
represents expected values E[V_m] of the class sizes according
to some LNRE model (this is automatically triggered when the
VVm argument is specified). |
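The N, V and m.max arguments are easier to follow with numbers. The base-R sketch below uses a hypothetical hand-made spectrum: for a complete spectrum, N is the sum of m * V_m (tokens) and V is the sum of V_m (types); once classes above m.max are dropped, the remaining vectors no longer determine N and V.

```r
# Hypothetical complete spectrum (illustrative numbers only)
m  <- 1:6
Vm <- c(10, 4, 2, 1, 1, 1)

# For a complete spectrum, N and V follow directly from m and Vm
N <- sum(m * Vm)  # total number of tokens
V <- sum(Vm)      # total number of types

# Keep only frequency classes up to m.max, as in an incomplete spectrum
m.max  <- 3
keep   <- m <= m.max
m.inc  <- m[keep]
Vm.inc <- Vm[keep]

# The truncated vectors underestimate both totals
N.inc <- sum(m.inc * Vm.inc)
V.inc <- sum(Vm.inc)
cat("complete:  N =", N, " V =", V, "\n")
cat("truncated: N =", N.inc, " V =", V.inc, "\n")
```

This is why an incomplete spectrum would be built along the lines of spc(Vm.inc, m.inc, N=N, V=V, m.max=m.max), with the full-sample N and V supplied explicitly.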
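A spc object is a data frame with the following variables:
m | frequency class m (integer vector) |
Vm | class size V_m, i.e. the number of types in frequency class m (observed class size, or expected class size E[V_m] for an expected spectrum) |
VVm | optional estimated variance Var[V_m] of the expected class size (expected frequency spectrum only) |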
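The following attributes are used to store additional information about the frequency spectrum:
m.max | if non-zero, the spectrum is incomplete and lists only frequency classes up to m.max |
N, V | sample size N and vocabulary size V of the frequency spectrum. For a complete spectrum these values could be determined from m and Vm, but they are essential for an incomplete spectrum. |
VV | variance Var[V] of the expected vocabulary size; only meaningful if hasVariances is TRUE. Note that VV may have the value NA if the user failed to specify it. |
expected | if TRUE, the frequency spectrum lists expected class sizes E[V_m] (rather than observed sizes V_m). Note that the VVm variable is only allowed for an expected frequency spectrum. |
hasVariances | TRUE if the VVm variable is present |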
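The split between per-class columns and global attributes can be pictured with a plain data frame. The sketch below is an illustrative mock in base R only, assuming the variable and attribute names listed above; it is not how zipfR builds its objects (use spc() for that), and it deliberately does not set the spc class.

```r
# Illustrative mock of the layout described above: a plain data frame plus
# attributes. NOT zipfR's implementation; real objects come from spc().
mock <- data.frame(m = 1:3, Vm = c(10, 4, 2))
attr(mock, "m.max")        <- 0      # 0: complete spectrum
attr(mock, "N")            <- sum(mock$m * mock$Vm)
attr(mock, "V")            <- sum(mock$Vm)
attr(mock, "expected")     <- FALSE  # observed, not expected, class sizes
attr(mock, "hasVariances") <- FALSE  # no VVm column present

# Per-class data lives in the columns, global information in the attributes
print(mock)
str(attributes(mock)[c("m.max", "N", "V", "expected", "hasVariances")])
```

For genuine spc objects, the accessor generics such as N() and V() are the intended way to read this information rather than attr().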
An object of class spc representing the specified frequency spectrum. This object should be treated as read-only (although such behaviour cannot be enforced in R).
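Why read-only matters can be seen with a small base-R mock (hypothetical data, not a real spc object): R's copy-on-modify semantics leave the original untouched, but nothing prevents editing a column of the copy, and the stored N attribute then silently disagrees with the data.

```r
# Hypothetical spectrum stored like an spc object (mock, base R only)
spectrum <- data.frame(m = 1:3, Vm = c(10, 4, 2))
attr(spectrum, "N") <- sum(spectrum$m * spectrum$Vm)

# R does not stop a user from editing a column of a copy ...
edited <- spectrum
edited$Vm[1] <- 20

# ... the original is unaffected, but the copy's N attribute is now stale
cat("stored N:", attr(edited, "N"),
    " recomputed N:", sum(edited$m * edited$Vm), "\n")
```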
read.spc, write.spc, spc.vector, sample.spc, spc2tfl, tfl2spc, lnre.spc, plot.spc
Generic methods supported by spc objects are print, summary, N, V, Vm, VV, and VVm.
Implementation details and non-standard arguments for these methods can be found on the manpages print.spc, summary.spc, N.spc, V.spc, etc.
## load Brown imaginative prose spectrum and inspect it
data(BrownImag.spc)

summary(BrownImag.spc)
print(BrownImag.spc)

plot(BrownImag.spc)

N(BrownImag.spc)
V(BrownImag.spc)
Vm(BrownImag.spc,1)
Vm(BrownImag.spc,1:5)

## compute ZM model, and generate PARTIAL expected spectrum
## with variances for a sample of 10 million tokens
zm <- lnre("zm",BrownImag.spc)
zm.spc <- lnre.spc(zm,1e+7,variances=TRUE)

## inspect extrapolated spectrum
summary(zm.spc)
print(zm.spc)

plot(zm.spc,log="x")

N(zm.spc)
V(zm.spc)
VV(zm.spc)
Vm(zm.spc,1)
VVm(zm.spc,1)

## generate an artificial Zipfian-looking spectrum
## and take a look at it
zipf.spc <- spc(round(1000/(1:1000)^2))

summary(zipf.spc)
plot(zipf.spc)

## see manpages of lnre, and the various *.spc manpages
## for more examples of spc usage