Spatial Correspondence of Areal Distributions

  • Quadrat and nearest-neighbor analysis deal with a single distribution of points

  • Often, we want to measure the distribution of two or more variables

  • The coefficient of areal correspondence (CAC) and the chi-square statistic perform these tasks

Coefficient of Areal Correspondence

  • Simple measure of the extent to which two distributions correspond to one another

    • Compare wheat farming to areas of minimal rainfall
  • Based on the approach of overlay analysis

Overlay Analysis

  • Two distributions of interest are mapped at the same scale and the outline of one is overlaid with the other

Coefficient of Areal Correspondence

  • CAC is the ratio of the area where the two distributions overlap to the total area covered by either distribution (the combined area of both, with the overlap counted once)

Result of CAC

  • Where there is no correspondence, CAC is equal to 0

  • Where there is total correspondence, CAC is equal to 1

  • CAC provides a simple measure of the extent of spatial association between two distributions, but it cannot provide any information about the statistical significance of the relationship
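The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: it assumes both distributions have been rasterized onto the same grid of equal-area cells (so counting cells stands in for measuring area), and the `wheat` and `dry` grids are made-up data.

```python
def coefficient_of_areal_correspondence(grid_a, grid_b):
    """Return overlap area / combined area for two presence grids.

    grid_a, grid_b: lists of rows of 0/1 (or False/True) values on the
    same grid. Cells are assumed to have equal area (an assumption of
    this sketch), so cell counts are proportional to area.
    """
    overlap = 0  # cells covered by both distributions
    union = 0    # cells covered by at least one distribution
    for row_a, row_b in zip(grid_a, grid_b):
        for in_a, in_b in zip(row_a, row_b):
            if in_a and in_b:
                overlap += 1
            if in_a or in_b:
                union += 1
    if union == 0:
        raise ValueError("neither distribution covers any cell")
    return overlap / union


# Hypothetical example: wheat farming vs. areas of minimal rainfall.
wheat = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
dry = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

cac = coefficient_of_areal_correspondence(wheat, dry)
print(f"CAC = {cac:.2f}")
```

Here two cells overlap out of six covered by either distribution, so the value falls between the limits of 0 (no correspondence) and 1 (total correspondence).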

Resemblance Matrix

  • Proposed by Court (1970)

  • Advantages over CAC

    • Limits are –1 to +1 with a perfect negative correspondence given a value of –1
    • Sampling distribution is roughly normal, so you can test for statistical significance

Chi-Square Statistic

  • Measures the strength of association between two distributions

  • Class Example

    • Relationship between wheat yield and precipitation
    • Two maps showing high and low yields and high and low precipitation


  • By combining the two distributions on one map we can better understand the relationship between them

  • In this example we are using a grid

    • The finer the grid, the more precise the measurement
  • Four possibilities exist

    • Low rainfall, low yield
    • Low rainfall, high yield
    • High rainfall, low yield
    • High rainfall, high yield



  • Create a table of expected frequencies, that is, the counts we would expect if rainfall and yield were unrelated (for example, % high rain × number of high-yield cells)

    • Expected count for a cell = row total × column total / table total
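The expected-frequency rule can be sketched as follows. The cell counts are hypothetical (the counts from the class example are not reproduced here), and the function name is only illustrative.

```python
def expected_frequencies(observed):
    """Expected count per cell = row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [r * c / table_total for c in col_totals]
        for r in row_totals
    ]


# Hypothetical grid-cell counts (NOT the values from the class example).
# Rows: low rainfall, high rainfall. Columns: low yield, high yield.
observed = [
    [30, 10],
    [15, 45],
]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

With these counts, 60% of the cells have high rainfall and 55 cells have high yield, so the expected high-rain/high-yield count is 0.60 × 55 = 33, the same result as row total × column total / table total.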

Compute Chi-Square

  • Therefore, in our example we have

Interpreting Chi Square

  • Zero indicates no relationship

  • Large numbers indicate stronger relationship

  • Or, a table of significance can be consulted to determine if the specific value is statistically significant

  • The fact that we have shown that there is a correlation between variables does NOT mean that we have found out anything about WHY this is so.  In our analysis we might state our assumptions as to why this is so, but we would need to perform other analyses to show causation.
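As a rough sketch of the computation and its interpretation, the code below applies the standard chi-square formula (the sum over all cells of (observed − expected)² / expected) to a 2 × 2 table and compares the result with the tabled critical value. The counts are hypothetical, and the 0.05 significance level is an assumption of this sketch. A 2 × 2 table has one degree of freedom, for which the 0.05 critical value is 3.841.

```python
import math


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            statistic += (obs - exp) ** 2 / exp
    return statistic


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts. Rows: low/high rainfall. Columns: low/high yield.
observed = [
    [30, 10],
    [15, 45],
]

CRITICAL_05_DF1 = 3.841  # chi-square critical value, alpha = 0.05, df = 1

chi2 = chi_square(observed)
significant = chi2 > CRITICAL_05_DF1
print(f"chi-square = {chi2:.2f}, p = {p_value_df1(chi2):.2g}, "
      f"significant at 0.05: {significant}")
```

A significant result here says only that the two distributions are associated; as noted above, it says nothing about why.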

If you don’t have Chi-Square values

  • Yule’s Q

  • Value of Yule’s Q always lies between –1 and +1

  • Value of 0 indicates no relationship

  • Value of +1 indicates a positive relationship

  • Value of –1 indicates a negative relationship
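The slides do not give the formula, so the sketch below uses the standard definition of Yule's Q for a 2 × 2 table with cells a, b (top row) and c, d (bottom row): Q = (ad − bc) / (ad + bc). The counts are hypothetical.

```python
def yules_q(a, b, c, d):
    """Yule's Q for the 2 x 2 table [[a, b], [c, d]]."""
    concordant = a * d  # product of the main-diagonal cells
    discordant = b * c  # product of the off-diagonal cells
    if concordant + discordant == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts:
# a = low rain / low yield,   b = low rain / high yield,
# c = high rain / low yield,  d = high rain / high yield.
q = yules_q(30, 10, 15, 45)
print(f"Yule's Q = {q:.2f}")
```

Because it needs only the four cell counts, Q can be computed directly from the combined map without first building a table of expected frequencies.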

Analysis of Election 2000

  • Polygon to Polygon

  • Point to Polygon

Election 2000 Results

  • Join Count Analysis

  • Table 1. Expected vs. Actual Joins of Adjacent Counties Voting for the Same Candidate

                          Expected    Actual        Z
      Gore/Gore joins          438       879    15.47
      Bush/Bush joins         5516      6253     8.75

  • Overlay Analysis

  • Table 2. Cities Falling Inside a County Won by Either Bush or Gore

                           Expected         Observed             Z
      City size          Gore    Bush     Gore    Bush     Gore    Bush
      Large (> 75K)        66     238      184     119      267     272
      Medium (50-75K)      54     196      147      98      470      55
      Small (< 50K)       544    1273     2030    1236    4,998       3
      No City             427    1588      347    1690       18      29

Election 2000 Results

  • There was obvious spatial autocorrelation in the way people voted. That is, Bush counties and Gore counties were highly clustered

  • Also, there is a very high correlation between urbanized counties and voting for Gore, and between non-urbanized counties and voting for Bush

Analysis of Environmental Justice

  • Point in Polygon Analysis

  • By Greg Thorhaug, css620 project – Spring 2001

Erie Chi-Squared


  • Spatial data analysis is possible through basic statistical methods

  • More in-depth analysis is possible using spatial statistics

  • GIS software may be used to prepare data for statistical analysis

  • Spatial data analysis techniques provide a powerful tool for analyzing GIS data, and enable users to solve problems creatively

Cross Tabulation

  • Assume we have two 9-cell land cover maps, one from 1980 and one from 2000, each with three categories: A, B, and C.

  • The resulting cross tabulation provides a pixel-by-pixel comparison of the interpreted land cover types for the two dates. For the upper left cell, the 1980 land cover was A, and the 2000 land cover was also A. Therefore, this is a match between the 1980 data and the 2000 data. However, in the lower right cell the 1980 data indicated a value of C, while the 2000 value was B. This is not a match, and indicates a disagreement between the two sources.

  • We can now quantify the results in a matrix. This matrix is often called a confusion matrix
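A cross tabulation of this kind can be sketched as below. The two 3 × 3 grids are a hypothetical arrangement, chosen only so that the upper left cell is A in both years, the lower right cell is C in 1980 and B in 2000, and the resulting counts match the matrix values quoted in the text. The original maps may have been laid out differently.

```python
from collections import Counter

CLASSES = ["A", "B", "C"]

# Hypothetical cell layouts (the layouts of the original maps are not known).
map_1980 = [
    ["A", "A", "A"],
    ["A", "B", "B"],
    ["B", "C", "C"],
]
map_2000 = [
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["C", "C", "B"],
]


def cross_tabulate(first, second, classes):
    """Count cell-by-cell class pairs.

    Rows of the result are classes in `first`, columns are classes in `second`.
    """
    pairs = Counter(
        (x, y)
        for row_first, row_second in zip(first, second)
        for x, y in zip(row_first, row_second)
    )
    return [[pairs[(r, c)] for c in classes] for r in classes]


matrix = cross_tabulate(map_1980, map_2000, CLASSES)
for label, row in zip(CLASSES, matrix):
    print(label, row)
```

Each cell contributes exactly one (1980 class, 2000 class) pair, so the entries of the matrix always sum to the number of cells in the map.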

Confusion Matrix

  • The matrix on the right shows the comparison of the two hypothetical data sets: the 1980 data set and the 2000 data set.

  • As an example, geographic features that were classified as A on the map in 1980 and were still A in 2000 are counted in the upper left cell of the matrix, which has the value 2 (two pixels met this criterion). This means that 2 units on the overall map that were A actually are A. The same holds for classifications B and C.

  • But there may have been cases where the 1980 value was A and the 2000 value was B. Here, the second 2 in the top row of the matrix says that there are 2 units that were A in 1980 but are B in 2000.

  • We can begin to add these numbers up by adding an additional row and column. But what do these numbers tell us?

Comparing the maps

  • The bottom row tells us that there were two cells that were A, five cells that were B, and two cells that were C. The rightmost column tells us that we mapped four cells as A, three cells as B, and two cells as C. Adding up the diagonal cells tells us there were 5 cells where we actually got it right.

  • So, the overall map comparison is really a function of:

    • Total cells on the diagonal / total number of cells.
      • (2 + 2 + 1) / (2 + 2 + 0 + 0 + 2 + 1 + 0 + 1 + 1) = 5/9 ≈ 0.556, or about 56% agreement
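The same arithmetic, written as a short sketch. The matrix values are the ones listed in the calculation above, read row by row; the function name is illustrative.

```python
# Confusion matrix from the example: rows are the mapped classes
# (A, B, C), columns are the reference classes (A, B, C).
confusion = [
    [2, 2, 0],
    [0, 2, 1],
    [0, 1, 1],
]


def overall_agreement(matrix):
    """Total of the diagonal cells divided by the total of all cells."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return diagonal / total


agreement = overall_agreement(confusion)
print(f"Overall agreement: {agreement:.3f}")
```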

Other Accuracy Assessment

  • The total correspondence of our example is about 56%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:

    • If I were interested in mapping all the areas of B, how well did I get them all? This is called the map Producer’s Accuracy. That is, how well did we produce a map of classification B?
    • If I were to use the map to find B, how successful would I be? This is called the map User’s Accuracy. That is, how much confidence should a user of the map have in a given classification?
  • To compute the map user’s accuracy, we divide the number correct within a row by the total for the whole row. Staying with our example of classification B:

    • We said that we had two cells where B was correct. However, we mapped three cells as B (in other words, we incorrectly called a cell B when it should have been C). Therefore, we have:
      • 2 correct B values / 3 total values mapped as B ≈ 0.67 user’s accuracy.
    • This means that if we were to use this map and look for the classification of B, we would be correct about 67% of the time.
  • To compute the map producer’s accuracy, we divide the number correct within a column by the total for the whole column. Staying with our example of classification B:

    • We said that we had two cells where B was correct. However, the column total shows that five cells should have been B. Therefore, we have:
      • 2 correct B values / 5 total values that should be B = 0.4 producer’s accuracy
    • This means that the map captured only 40% of all the B’s that were out there.
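Both measures can be sketched with the example's confusion matrix, following the row and column convention used above (rows hold what was mapped, columns hold what should have been). The function names are illustrative.

```python
# Rows: mapped classes (A, B, C). Columns: reference classes (A, B, C).
CLASSES = ["A", "B", "C"]
confusion = [
    [2, 2, 0],
    [0, 2, 1],
    [0, 1, 1],
]


def users_accuracy(matrix, index):
    """Correct cells for a class divided by its row total."""
    return matrix[index][index] / sum(matrix[index])


def producers_accuracy(matrix, index):
    """Correct cells for a class divided by its column total."""
    column_total = sum(row[index] for row in matrix)
    return matrix[index][index] / column_total


b = CLASSES.index("B")
user_b = users_accuracy(confusion, b)
producer_b = producers_accuracy(confusion, b)
print(f"B user's accuracy:     {user_b:.2f}")
print(f"B producer's accuracy: {producer_b:.2f}")
```

Passing the index of another class to the same two functions gives its user's and producer's accuracy.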

User and Producer Accuracy

  • To test your understanding of all this, compute the user’s and producer’s accuracy for classifications A and C.

  • This also gives us some indication of the nature of the errors. For instance, it appears that we confused classification A with classification B (we said on two occasions that B was A). By understanding the nature of the errors, perhaps we can go back, look over our process and correct for that mistake.
