plot(nypoly, col = color.clusters(out))
## End(Not run)
dmst.zones
Determine zones using the dynamic minimum spanning tree scan test
of Assuncao et al. (2006)
Description
dmst.zones determines the zones that produce the largest test statistic using a greedy algorithm.
Specifically, starting individually with each region as a starting zone, new (connected) regions are
added to the current zone in the order that results in the largest likelihood ratio test statistic. This is
used to implement the dynamic minimum spanning tree (dmst) scan test of Assuncao et al. (2006).
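The greedy growth described above can be sketched in base R as follows. This is an illustrative sketch only, not the package's implementation: poisson_stat and grow_zone are hypothetical names, the statistic is the standard Poisson scan likelihood ratio with a fixed total number of cases, and only the population bound is enforced (the distance bound is omitted for brevity).

```r
# Poisson likelihood ratio statistic for a zone with y observed and e expected
# cases, given N total cases. It is zero unless the zone has excess cases.
poisson_stat <- function(y, e, N) {
  if (y <= e) return(0)
  out <- y * log(y / e)
  if (N > y) out <- out + (N - y) * log((N - y) / (N - e))
  out
}

# Hypothetical helper: grow a zone greedily from region `start`.
# w is a binary adjacency matrix; the zone population must stay below max_pop.
grow_zone <- function(start, cases, pop, ex, w, max_pop) {
  N <- sum(cases)
  zone <- start
  stats <- poisson_stat(cases[start], ex[start], N)
  repeat {
    # regions adjacent to the current zone but not yet in it
    nbrs <- setdiff(which(colSums(w[zone, , drop = FALSE]) > 0), zone)
    # keep only neighbors that respect the population bound
    nbrs <- nbrs[sum(pop[zone]) + pop[nbrs] < max_pop]
    if (length(nbrs) == 0) break
    # statistic of the enlarged zone for each candidate neighbor
    cand <- sapply(nbrs, function(j) {
      poisson_stat(sum(cases[c(zone, j)]), sum(ex[c(zone, j)]), N)
    })
    best <- which.max(cand)
    zone <- c(zone, nbrs[best])
    stats <- c(stats, cand[best])
  }
  list(zone = zone, stats = stats)
}

# Toy data (made up): four regions on a line, 1-2-3-4.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
cases <- c(1, 8, 6, 1)
pop <- rep(100, 4)
ex <- sum(cases) / sum(pop) * pop   # constant risk expectation
res <- grow_zone(2, cases, pop, ex, w, max_pop = 0.75 * sum(pop))
res$zone
```

Each step records the statistic of the enlarged zone, so the nested sequence of zones grown from one starting region can be compared afterwards and the largest statistic identified.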
Usage
dmst.zones(coords, cases, pop, w, ex = sum(cases)/sum(pop) * pop,
ubpop = 0.5, ubd = 1, lonlat = FALSE, parallel = FALSE,
maxonly = FALSE)
Arguments
coords
An n × 2 matrix of centroid coordinates for the regions.
cases
The number of cases observed in each region.
pop
The population size associated with each region.
w
A binary spatial adjacency matrix.
ex
The expected number of cases for each region. The default is calculated under
the constant risk hypothesis.
ubpop
The upper bound of the proportion of the total population to consider for a
cluster.
ubd
The upper bound for the radius of a cluster. This should be a proportion in (0, 1].
The value is the proportion of the maximum intercentroid distance between any
two locations in coords. See Details.
lonlat
The default is FALSE, which specifies that Euclidean distance should be used. If
lonlat is TRUE, then the great circle distance is used to calculate the
intercentroid distance.
parallel
A logical indicating whether the test should be parallelized using the
parallel::mclapply function. Default is FALSE. If TRUE, no progress will be
reported.
maxonly
A logical value indicating whether to return only the maximum test statistic
across all candidate zones. Default is FALSE.
Details
The test is performed using the spatial scan test based on the Poisson test statistic and a fixed number
of cases. The first cluster is the most likely to be a cluster. If no significant clusters are found, then
the most likely cluster is returned (along with a warning).
Every zone considered must have a total population less than ubpop * sum(pop).
Additionally, the maximum intercentroid distance for the regions within a zone must
be no more than ubd * the maximum intercentroid distance across all regions.
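As a rough illustration of the two bounds above, the following base R sketch checks whether a candidate zone is admissible. The helper zone_ok and the toy data are hypothetical and not part of the package; Euclidean distance (lonlat = FALSE) is assumed.

```r
# Hypothetical helper: TRUE if `zone` (a vector of region indices)
# satisfies both the population bound and the distance bound.
zone_ok <- function(zone, coords, pop, ubpop = 0.5, ubd = 1) {
  d <- as.matrix(dist(coords))                    # n x n intercentroid distances
  pop_ok <- sum(pop[zone]) < ubpop * sum(pop)     # strictly less than the bound
  dist_ok <- max(d[zone, zone]) <= ubd * max(d)   # no more than the bound
  pop_ok && dist_ok
}

# Toy data (made up): four centroids on a line, equal populations.
coords <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
pop <- c(100, 100, 100, 100)

# population 200 < 300 and distance 1 <= 2.5, so this zone is admissible
ok_near <- zone_ok(c(1, 2), coords, pop, ubpop = 0.75, ubd = 0.25)
# distance 10 > 2.5, so this zone is rejected
ok_far <- zone_ok(c(1, 4), coords, pop, ubpop = 0.75, ubd = 0.25)
```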
Value
Returns a list of zones to consider for clustering that includes the location id of each zone and the
associated test statistic, number of cases, expected number of cases, and the population in the zone.
If maxonly = TRUE, then only the maximum test statistic across all of these zones is returned.
Author(s)
Joshua French
References
Assuncao, R.M., Costa, M.A., Tavares, A. and Neto, S.J.F. (2006). Fast detection of arbitrarily
shaped disease clusters, Statistics in Medicine, 25, 723-742.
Examples
data(nydf)
data(nyw)
coords = as.matrix(nydf[,c("longitude", "latitude")])
# find zone with max statistic starting from each individual region
max_zones = dmst.zones(coords, cases = floor(nydf$cases),
                       pop = nydf$pop, w = nyw, ubpop = 0.25,
                       ubd = 0.25, lonlat = TRUE)
head(max_zones)
dweights
Distance-based weights
Description
dweights constructs a distance-based weights matrix. The dweights function can be used to
construct a weights matrix w using the method of Tango (1995), Rogerson (1999), or a basic style.
Usage
dweights(coords, kappa = 1, lonlat = FALSE, type = "basic",
cases = NULL, pop = NULL)
Arguments
coords
An n × 2 matrix of centroid coordinates for the regions.
kappa
A positive constant related to strength of spatial autocorrelation.
lonlat
The default is FALSE, which specifies that Euclidean distance should be used. If
lonlat is TRUE, then the great circle distance is used to calculate the
intercentroid distance.
type
The type of weights matrix to construct. Current options are "basic", "tango",
and "rogerson". Default is "basic". See Details.
cases
The number of cases observed in each region.
pop
The population size associated with each region.
Details
coords is used to construct an n × n distance matrix d.
If type = "basic", then $w_{ij} = \exp(-d_{ij}/\kappa)$.
If type = "rogerson", then $w_{ij} = \exp(-d_{ij}/\kappa) / \left( \frac{\mathrm{cases}_i}{\mathrm{pop}_i} \cdot \frac{\mathrm{cases}_j}{\mathrm{pop}_j} \right)$.
If type = "tango", then $w_{ij} = \exp(-4 d_{ij}^2/\kappa^2)$.
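The three formulas above can be written out directly in base R. The sketch below only illustrates the formulas as stated here: simple_dweights is a hypothetical name rather than the package function, and Euclidean distance is assumed throughout.

```r
# Sketch of the three weight formulas, starting from a Euclidean distance matrix.
# `simple_dweights` is a hypothetical name, not the package function.
simple_dweights <- function(coords, kappa = 1, type = "basic",
                            cases = NULL, pop = NULL) {
  d <- as.matrix(dist(coords))   # n x n intercentroid distance matrix
  if (type == "basic") {
    exp(-d / kappa)
  } else if (type == "rogerson") {
    rate <- cases / pop          # per-region rate cases_i / pop_i
    # outer(rate, rate)[i, j] equals rate_i * rate_j
    exp(-d / kappa) / outer(rate, rate)
  } else if (type == "tango") {
    exp(-4 * d^2 / kappa^2)
  } else {
    stop("unknown type")
  }
}

# Toy data (made up): three centroids on a line.
coords <- cbind(c(0, 1, 3), c(0, 0, 0))
w_basic <- simple_dweights(coords, kappa = 1)
w_tango <- simple_dweights(coords, kappa = 2, type = "tango")
w_rog <- simple_dweights(coords, type = "rogerson",
                         cases = c(1, 2, 4), pop = c(10, 10, 10))
```

In all three cases the weights decay with distance, and kappa controls how quickly they decay.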
Value
Returns an n × n matrix of weights.
Author(s)
Joshua French
References
Tango, T. (1995). A class of tests for detecting "general" and "focused" clustering of rare diseases,
Statistics in Medicine, 14, 2323-2334.
Rogerson, P. (1999). The detection of clusters using a spatial version of the chi-square
goodness-of-fit test, Geographical Analysis, 31, 130-147.
See Also
tango.test
Examples
data(nydf)
coords = as.matrix(nydf[,c("longitude", "latitude")])
w = dweights(coords, kappa = 1)