Create a table of expected frequencies using probability statistics (% high rain * # of high-yield cells)
Expected frequency for each cell = (row total * column total) / table total
Therefore, in our example, we compute each cell's expected frequency from its row and column totals.
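A minimal Python sketch of this computation, using hypothetical rainfall/yield counts (the values below are assumed for illustration, not the example's actual data):

    import numpy as np

    # Hypothetical observed counts (assumed for illustration; not the
    # original example's data). Rows = rainfall (high, low),
    # columns = yield (high, low).
    observed = np.array([[30, 10],
                         [15, 25]])

    row_totals = observed.sum(axis=1, keepdims=True)   # [[40], [40]]
    col_totals = observed.sum(axis=0, keepdims=True)   # [[45, 35]]
    table_total = observed.sum()                       # 80

    # Expected frequency per cell = row total * column total / table total
    expected = row_totals * col_totals / table_total
    print(expected)
    # [[22.5 17.5]
    #  [22.5 17.5]]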
Interpreting Chi Square
Zero indicates no relationship
Large values indicate a stronger relationship
Or, a significance table can be consulted to determine whether the computed value is statistically significant
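In practice, a statistics library can report the significance directly rather than requiring a printed table. A short sketch using scipy's chi2_contingency, reusing the hypothetical counts from above:

    from scipy.stats import chi2_contingency

    observed = [[30, 10],
                [15, 25]]

    # Returns the chi-square statistic, its p-value, the degrees of
    # freedom, and the expected-frequency table. Note that scipy applies
    # Yates' continuity correction to 2x2 tables by default.
    chi2, p, dof, expected = chi2_contingency(observed)
    print(chi2, p)   # a p-value below 0.05 is conventionally significant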
The fact that we have shown that there is a correlation between variables does NOT mean that we have found out anything about WHY this is so. In our analysis we might state our assumptions as to why this is so, but we would need to perform other analyses to show causation.
Table 2. Cities Falling Inside a County Won by Either Bush or Gore
                 Expected  Expected  Observed  Observed      Z     Z
                     Gore      Bush      Gore      Bush   Gore  Bush
Large (>75K)           66       238       184       119    267   272
Medium (50-75K)        54       196       147        98    470    55
Small (<50K)          544     1,273     2,030     1,236  4,998     3
No City               427     1,588       347     1,690     18    29
Election 2000 Results
There was obvious spatial autocorrelation in the way people voted. That is, Bush counties and Gore counties were highly clustered.
Also, there was a very high correlation between urbanized counties voting for Gore and non-urbanized counties voting for Bush.
Analysis of Environmental Justice
Point in Polygon Analysis
Greg Thorhaug css620 project – Spring 2001
Spatial data analysis is possible through basic statistical methods
More in-depth analysis is possible using spatial statistics
GIS software may be used to prepare data for statistical analysis
Spatial data analysis techniques provide a powerful tool for analyzing GIS data and enable users to solve problems creatively
Assume we have two 9-cell land cover maps, one from 1980 and one from 2000, each with three categories: A, B, and C.
You can see that the resulting cross tabulation provides a pixel-by-pixel comparison of the interpreted land cover types between the two dates. For the upper left-hand cell, the 1980 land cover was A, and the 2000 land cover was also A. Therefore, this is a match between the 1980 data and the 2000 data. However, in the lower right-hand cell the 1980 data indicated a value of C, while the 2000 value was B. This is not a match, and indicates a disagreement between the two sources.
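A minimal Python sketch of this pixel-by-pixel cross tabulation; the two 3x3 layouts below are hypothetical, chosen only so that their tallies match the matrix discussed next:

    import numpy as np

    # Hypothetical 9-cell maps; the layouts are assumed for illustration,
    # chosen only so the tallies match the example's confusion matrix.
    map_1980 = np.array([["A", "A", "A"],
                         ["A", "B", "B"],
                         ["B", "C", "C"]])
    map_2000 = np.array([["A", "A", "B"],
                         ["B", "B", "B"],
                         ["C", "C", "B"]])

    classes = ["A", "B", "C"]
    crosstab = np.zeros((3, 3), dtype=int)

    # Pixel-by-pixel comparison: rows index the 1980 class,
    # columns index the 2000 class.
    for c80, c00 in zip(map_1980.ravel(), map_2000.ravel()):
        crosstab[classes.index(c80), classes.index(c00)] += 1

    print(crosstab)
    # [[2 2 0]
    #  [0 2 1]
    #  [0 1 1]]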
We can now quantify the results into a matrix as shown below. This matrix is often called a confusion matrix.
The matrix on the right shows the comparison of the two hypothetical data sets: the 1980 data set and the 2000 data set.
As an example, geographic features that were classified as A on the map in 1980 and were still A in 2000 are represented by the upper left-hand cell of the matrix, with the value 2 (there were two pixels that met this criterion). This means that two cells in the overall map that were A in 1980 were still A in 2000. The same logic applies to classifications B and C.
But there may have been cells where the 1980 value was A and the 2000 value was B. In this case, the 2 in the top row of the matrix says that there were two cells that were A in 1980 but are B in 2000.
We can begin to add these numbers up by adding an additional row and column. But what do these numbers tell us?
Comparing the maps
The bottom row tells us that in 2000 there were two cells that were A, five cells that were B, and two cells that were C. The rightmost column tells us that in 1980 we mapped four cells as A, three cells as B, and two cells as C. Adding up the diagonal cells shows there were 5 cells where we actually got it right.
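The same marginal totals can be checked with a short numpy sketch (the matrix values are taken from the example above):

    import numpy as np

    # Confusion matrix from the example (rows = 1980 map, columns = 2000 map)
    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    print(crosstab.sum(axis=0))      # bottom row (actual in 2000): [2 5 2]
    print(crosstab.sum(axis=1))      # rightmost column (mapped):   [4 3 2]
    print(np.diag(crosstab).sum())   # diagonal (cells we got right): 5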
So, the overall map comparison is really a function of:
Total cells on the diagonal / total number of cells.
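In code, this is simply the trace of the confusion matrix divided by its grand total:

    import numpy as np

    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    # Overall accuracy = cells on the diagonal / total number of cells
    overall = np.trace(crosstab) / crosstab.sum()
    print(overall)   # 5 / 9, about 0.56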
The total correspondence of our example is 5/9, or about 56%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:
If I were interested in mapping all the areas of B, how well did I get them all? This is called the map Producer’s Accuracy. That is, how well did we produce a map of classification B.
If I were to use the map to find B, how successful would I be? This is called the Map User’s Accuracy. That is, how much confidence should a user of the map have in a given classification?
To compute the map user’s accuracy, we would divide the total number correct within a row by the total number in the whole row. Staying with our example of classification B:
We said that we had two cells where B was correct. However, we actually said that there were three cells that contained B (in other words, we incorrectly called a cell B, when it should have been C). Therefore, we have:
2 correct B values / 3 total values mapped as B ≈ 0.67 user’s accuracy.
To compute the map producer’s accuracy, we would divide the total number correct within a column by the total number in the whole column. Staying with our example of classification B:
We said that we had two cells where B was correct. However, we actually said that there were five cells that should have been B. Therefore, we have:
2 correct B values / 5 total values that should be B = 0.4 producer’s accuracy
This means that the map produced only 40% of all the B’s that were out there.
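Both measures for classification B fall out of one row sum and one column sum; a short sketch using the example's matrix:

    import numpy as np

    crosstab = np.array([[2, 2, 0],
                         [0, 2, 1],
                         [0, 1, 1]])

    b = 1  # row/column index of classification B
    users_acc = crosstab[b, b] / crosstab[b, :].sum()      # 2 / 3, about 0.67
    producers_acc = crosstab[b, b] / crosstab[:, b].sum()  # 2 / 5 = 0.40
    print(users_acc, producers_acc)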
User and Producer Accuracy
To test your understanding of all this, compute the user’s and producer’s accuracy for classifications A and C.
This also gives us some indication of the nature of the errors. For instance, it appears that we confused classification A with classification B (on two occasions we called a cell A when it was actually B). By understanding the nature of the errors, perhaps we can go back, look over our process, and correct for that mistake.