Create a table of expected frequencies using probability statistics (% high rain * # of high-yield cells)
Expected frequency for each cell = (row total * column total) / table total
Therefore, in our example, we compute each cell's expected frequency from its row and column totals.
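A minimal Python sketch of this computation, using hypothetical rainfall/yield counts (the values below are assumed for illustration, not the example's actual data):

    import numpy as np

    # Hypothetical observed counts (assumed for illustration; not the
    # original example's data). Rows = rainfall (high, low),
    # columns = yield (high, low).
    observed = np.array([[30, 10],
                         [15, 25]])

    row_totals = observed.sum(axis=1, keepdims=True)   # [[40], [40]]
    col_totals = observed.sum(axis=0, keepdims=True)   # [[45, 35]]
    table_total = observed.sum()                       # 80

    # Expected frequency per cell = row total * column total / table total
    expected = row_totals * col_totals / table_total
    print(expected)
    # [[22.5 17.5]
    #  [22.5 17.5]]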
Interpreting Chi Square
Zero indicates no relationship
Large values indicate a stronger relationship
Or, a significance table can be consulted to determine whether the computed value is statistically significant
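In practice, a statistics library can report the significance directly rather than requiring a printed table. A short sketch using scipy's chi2_contingency, reusing the hypothetical counts from above:

    from scipy.stats import chi2_contingency

    observed = [[30, 10],
                [15, 25]]

    # Returns the chi-square statistic, its p-value, the degrees of
    # freedom, and the expected-frequency table. Note that scipy applies
    # Yates' continuity correction to 2x2 tables by default.
    chi2, p, dof, expected = chi2_contingency(observed)
    print(chi2, p)   # a p-value below 0.05 is conventionally significant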
The fact that we have shown that there is a correlation between variables does NOT mean that we have found out anything about WHY this is so. In our analysis we might state our assumptions as to why this is so, but we would need to perform other analyses to show causation.
Table 2. Cities Falling Inside a County Won by Either Bush or Gore
                 Expected  Expected  Observed  Observed      Z     Z
                     Gore      Bush      Gore      Bush   Gore  Bush
Large (>75K)           66       238       184       119    267   272
Medium (50-75K)        54       196       147        98    470    55
Small (<50K)          544     1,273     2,030     1,236  4,998     3
No City               427     1,588       347     1,690     18    29
Election 2000 Results
There was obvious spatial autocorrelation in the way people voted. That is, Bush counties and Gore counties were highly clustered.
Also, there was a very high correlation between urbanized counties voting for Gore and non-urbanized counties voting for Bush.
Analysis of Environmental Justice
Point in Polygon Analysis
Greg Thorhaug css620 project – Spring 2001
Spatial data analysis is possible through basic statistical methods
More in-depth analysis is possible using spatial statistics
GIS software may be used to prepare data for statistical analysis
Spatial data analysis techniques provide a powerful tool for analyzing GIS data and enable users to solve problems creatively
Assume we have two 9-cell land cover maps, one from 1980 and one from 2000, each with three categories: A, B, and C.
You can see that the resulting cross tabulation provides a pixel-by-pixel comparison of the interpreted land cover types between the two dates. For the upper left-hand cell, the 1980 land cover was A, and the 2000 land cover was also A. Therefore, this is a match between the 1980 data and the 2000 data. However, in the lower right-hand cell the 1980 data indicated a value of C, while the 2000 value was B. This is not a match, and indicates a disagreement between the two sources.
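A minimal Python sketch of this pixel-by-pixel cross tabulation; the two 3x3 layouts below are hypothetical, chosen only so that their tallies match the matrix discussed next:

    import numpy as np

    # Hypothetical 9-cell maps; the layouts are assumed for illustration,
    # chosen only so the tallies match the example's confusion matrix.
    map_1980 = np.array([["A", "A", "A"],
                         ["A", "B", "B"],
                         ["B", "C", "C"]])
    map_2000 = np.array([["A", "A", "B"],
                         ["B", "B", "B"],
                         ["C", "C", "B"]])

    classes = ["A", "B", "C"]
    crosstab = np.zeros((3, 3), dtype=int)

    # Pixel-by-pixel comparison: rows index the 1980 class,
    # columns index the 2000 class.
    for c80, c00 in zip(map_1980.ravel(), map_2000.ravel()):
        crosstab[classes.index(c80), classes.index(c00)] += 1

    print(crosstab)
    # [[2 2 0]
    #  [0 2 1]
    #  [0 1 1]]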
We can now quantify the results into a matrix as shown below. This matrix is often called a confusion matrix.
The matrix on the right shows the comparison of the two hypothetical data sets: the 1980 data set and the 2000 data set.
As an example, geographic features that were classified as A on the map in 1980 and were still A in 2000 are represented by the upper left-hand cell of the matrix, with the value 2 (there were two pixels that met this criterion). This means that two cells in the overall map that were A in 1980 were still A in 2000. The same logic applies to classifications B and C.
But there may have been cells where the 1980 value was A and the 2000 value was B. In this case, the 2 in the top row of the matrix says that there were two cells that were A in 1980 but are B in 2000.
We can begin to add these numbers up by adding an additional row and column. But what do these numbers tell us?
Comparing the maps
The bottom row tells us that in 2000 there were two cells that were A, five cells that were B, and two cells that were C. The rightmost column tells us that in 1980 we mapped four cells as A, three cells as B, and two cells as C. Adding up the diagonal cells shows there were 5 cells where we actually got it right.
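The same marginal totals can be checked with a short numpy sketch (the matrix values are taken from the example above):

    import numpy as np

    # Confusion matrix from the example (rows = 1980 map, columns = 2000 map)
    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    print(crosstab.sum(axis=0))      # bottom row (actual in 2000): [2 5 2]
    print(crosstab.sum(axis=1))      # rightmost column (mapped):   [4 3 2]
    print(np.diag(crosstab).sum())   # diagonal (cells we got right): 5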
So, the overall map comparison is really a function of:
Total cells on the diagonal / total number of cells.
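In code, this is simply the trace of the confusion matrix divided by its grand total:

    import numpy as np

    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    # Overall accuracy = cells on the diagonal / total number of cells
    overall = np.trace(crosstab) / crosstab.sum()
    print(overall)   # 5 / 9, about 0.56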
The total correspondence of our example is 5/9, or about 56%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:
If I were interested in mapping all the areas of B, how well did I get them all? This is called the map Producer’s Accuracy. That is, how well did we produce a map of classification B.
If I were to use the map to find B, how successful would I be? This is called the Map User’s Accuracy. That is, how much confidence should a user of the map have in a given classification?
To compute the map user’s accuracy, we would divide the total number correct within a row by the total number in the whole row. Staying with our example of classification B:
We said that we had two cells where B was correct. However, we actually said that there were three cells that contained B (in other words, we incorrectly called a cell B, when it should have been C). Therefore, we have:
2 correct B values / 3 total values mapped as B ≈ 0.67 user’s accuracy.
To compute the map producer’s accuracy, we would divide the total number correct within a column by the total number in the whole column. Staying with our example of classification B:
We said that we had two cells where B was correct. However, we actually said that there were five cells that should have been B. Therefore, we have:
2 correct B values / 5 total values that should be B = 0.4 producer’s accuracy
This means that the map produced only 40% of all the B’s that were out there.
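Both measures for classification B fall out of one row sum and one column sum; a short sketch using the example's matrix:

    import numpy as np

    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    b = 1  # row/column index of classification B
    users_acc = crosstab[b, b] / crosstab[b, :].sum()      # 2 / 3, about 0.67
    producers_acc = crosstab[b, b] / crosstab[:, b].sum()  # 2 / 5 = 0.40
    print(users_acc, producers_acc)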
User and Producer Accuracy
To test your understanding of all this, compute the user’s and producer’s accuracy for classifications A and C.
This also gives us some indication of the nature of the errors. For instance, it appears that we confused classification A with classification B (on two occasions we called a cell A when it was actually B). By understanding the nature of the errors, perhaps we can go back, look over our process, and correct for that mistake.