Measures of Dispersion and Shape in Appraisal Data

Introduction
This chapter covers dispersion and shape as they apply to statistical analysis in real estate appraisal. Measures of dispersion quantify the spread or variability within a dataset; measures of shape describe the asymmetry and tail weight (peakedness) of its distribution. Together they show how variable appraisal data are and how they are distributed, which determines which statistical techniques are appropriate and how far conclusions drawn from the data can be trusted.
Measures of Dispersion
Measures of dispersion provide information about the spread or variability of data points in a dataset. A dataset with high dispersion indicates that the values are widely scattered, whereas low dispersion suggests that the values are clustered closely together.
Standard Deviation and Variance
The standard deviation and variance are fundamental measures of dispersion that consider all data points in a dataset.
Variance: The variance measures the average squared deviation of each data point from the mean. A higher variance indicates greater variability in the data.
Population Variance (σ²): Calculated as the sum of squared deviations from the population mean (μ), divided by the population size (N).
σ² = Σ(xᵢ - μ)² / N
where:
- xᵢ = individual data point
- μ = population mean
- N = population size
Sample Variance (S²): Calculated as the sum of squared deviations from the sample mean (X), divided by the sample size minus 1 (n-1).
S² = Σ(xᵢ - X)² / (n - 1)
where:
- xᵢ = individual data point
- X = sample mean
- n = sample size
Standard Deviation: The standard deviation is the square root of the variance. It provides a more interpretable measure of dispersion in the original units of the data.
Population Standard Deviation (σ): The square root of the population variance.
σ = √[Σ(xᵢ - μ)² / N]
Sample Standard Deviation (S): The square root of the sample variance.
S = √[Σ(xᵢ - X)² / (n - 1)]
Practical Application: Consider a dataset of sale prices for comparable properties. A high standard deviation indicates a wider range of sale prices, suggesting greater variability in the market. A low standard deviation suggests more uniformity in sale prices.
- Experiment: Collect sale prices of 20 similar residential properties in each of two neighborhoods. Calculate the standard deviation for each neighborhood. The neighborhood with the higher standard deviation has more price variability in absolute dollar terms.
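The experiment above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a prescribed workflow: the two price lists are hypothetical and use six sales each for brevity rather than twenty, and the names `neighborhood_a`, `neighborhood_b`, and `describe` are made up for this example.

```python
import statistics

# Hypothetical sale prices in dollars; illustrative values only.
neighborhood_a = [310_000, 325_000, 298_000, 340_000, 315_000, 305_000]
neighborhood_b = [250_000, 410_000, 295_000, 520_000, 330_000, 275_000]

def describe(prices):
    """Return the sample mean, sample variance, and sample standard deviation."""
    mean = statistics.mean(prices)
    variance = statistics.variance(prices)  # sum of squared deviations / (n - 1)
    std_dev = statistics.stdev(prices)      # square root of the sample variance
    return mean, variance, std_dev

mean_a, var_a, sd_a = describe(neighborhood_a)
mean_b, var_b, sd_b = describe(neighborhood_b)

print(f"A: mean={mean_a:,.0f}  S={sd_a:,.0f}")
print(f"B: mean={mean_b:,.0f}  S={sd_b:,.0f}")

# The population formulas (divide by N) are available as pvariance and pstdev.
pop_sd_a = statistics.pstdev(neighborhood_a)
print(f"A, population formula: sigma={pop_sd_a:,.0f}")
```

With these made-up figures, neighborhood B has the larger standard deviation, so its prices are the more widely scattered of the two. The population formula always gives a slightly smaller value than the sample formula for the same data, because it divides by N rather than n - 1.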
Coefficient of Variation
The coefficient of variation (CV) is a relative measure of dispersion that expresses the standard deviation as a percentage of the mean. This allows for comparison of variability across datasets with different units or scales.
Formula:
CV = (S / X) * 100%
where:
- S = sample standard deviation
- X = sample mean
Practical Application: When comparing the variability of land values and building values, the CV provides a standardized measure, because the two typically differ in scale even when both are measured in dollars.
- Example: If the mean land value is $100,000 with a standard deviation of $10,000 and the mean building value is $200,000 with a standard deviation of $15,000, the CV for land is 10% and the CV for the building is 7.5%. Land values are relatively more variable.
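The arithmetic in this example can be checked with a few lines of Python. The function name `coefficient_of_variation` is an assumption made for this sketch; the inputs are the figures from the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a percentage: (S / X) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

cv_land = coefficient_of_variation(10_000, 100_000)
cv_building = coefficient_of_variation(15_000, 200_000)

print(f"Land CV: {cv_land:.1f}%")          # Land CV: 10.0%
print(f"Building CV: {cv_building:.1f}%")  # Building CV: 7.5%
```

Although the building values have the larger standard deviation in dollars, the land values have the larger CV, which is why the example concludes that land is relatively more variable.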
Range
The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset.
Formula:
Range = Maximum Value - Minimum Value
Practical Application: The range can quickly indicate the spread of property sizes or ages in a sample of comparable properties.
- Limitations: The range is sensitive to extreme values and does not reflect the distribution of data points within the dataset.
Interquartile Range (IQR)
The interquartile range (IQR) measures the spread of the middle 50% of the data. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
Quartiles: Quartiles divide an ordered dataset into four equal parts.
- Q1: The value below which 25% of the data falls. Position = (n+1)/4
- Q2: The median, the value below which 50% of the data falls. Position = 2(n+1)/4
- Q3: The value below which 75% of the data falls. Position = 3(n+1)/4
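If the position is not a whole number, interpolate linearly between the two adjacent ordered values. Statistical software may use slightly different quartile conventions, so results can differ a little in small samples.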
Formula:
IQR = Q3 - Q1
Practical Application: The IQR is less sensitive to extreme values than the range and provides a more robust measure of dispersion. It is particularly useful for skewed distributions.
- Example: In a dataset of property taxes, the IQR can indicate the range of typical tax amounts, excluding unusually high or low values.
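The quartile positions and the IQR can be sketched as follows, using the k(n+1)/4 position rule with linear interpolation. The tax amounts are hypothetical, and the helper name `quartile` is an assumption made for this example.

```python
def quartile(ordered, k):
    """Return quartile k (1, 2, or 3) of ascending data using position k(n + 1)/4."""
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for this position rule")
    position = k * (n + 1) / 4   # 1-based position in the ordered data
    lower = int(position)        # whole part of the position
    fraction = position - lower  # how far to move toward the next value
    upper = min(lower + 1, n)    # stay in range when the position equals n
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])

# Hypothetical annual property taxes in dollars, with one unusually high bill.
taxes = sorted([2100, 2400, 2600, 2800, 3000, 3200, 3400, 3600, 4000, 9800])

q1, q3 = quartile(taxes, 1), quartile(taxes, 3)
iqr = q3 - q1
value_range = taxes[-1] - taxes[0]

print(f"Q1={q1:,.0f}  Q3={q3:,.0f}  IQR={iqr:,.0f}  Range={value_range:,.0f}")
# Q1=2,550  Q3=3,700  IQR=1,150  Range=7,700
```

The single $9,800 bill inflates the range to $7,700 but leaves the IQR at $1,150, which illustrates why the IQR is the more robust of the two. For reference, `statistics.quantiles(taxes, n=4)` with its default "exclusive" method follows the same (n+1) position rule.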
Measures of Shape
Measures of shape describe the overall form of a distribution, including its symmetry (skewness) and peakedness (kurtosis).
Skewness
Skewness measures the asymmetry of a distribution. A symmetrical distribution has zero skewness.
- Left Skew (Negative Skew): The tail extends to the left, and the mean is typically less than the median. Indicates a concentration of data points on the higher end of the scale, with a few unusually low values.
- Right Skew (Positive Skew): The tail extends to the right, and the mean is typically greater than the median. Indicates a concentration of data points on the lower end of the scale, with a few unusually high values.
Formula: One common measure of skewness is:
Skewness = [Σ(xᵢ - X)³ / n] / S³
where:
- xᵢ = individual data point
- X = sample mean
- n = sample size
- S = sample standard deviation
Practical Application: In real estate, skewness can arise in datasets of property values. A right skew typically indicates a few very expensive properties pulling the tail (and the mean) upward, while a left skew indicates a few unusually low-priced properties pulling the tail downward, with most values concentrated at the higher end.
- Experiment: Gather selling prices of homes in a specific area. Create a histogram. A long tail extending to the right indicates a positive skew.
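The skewness formula given above can be applied directly. The sketch below follows that formula term by term (third-moment sum divided by n, then by S cubed, with S the sample standard deviation); the price list is hypothetical, and `skewness` is a name chosen for this example.

```python
import statistics

def skewness(values):
    """Skewness = [sum((x - mean)^3) / n] / S^3, with S the sample standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / s ** 3

# Hypothetical selling prices in thousands of dollars, with one very expensive home.
right_skewed = [210, 225, 230, 240, 245, 255, 260, 275, 310, 640]
mirrored = [-x for x in right_skewed]  # same shape flipped, so the tail points left
symmetric = [1, 2, 3, 4, 5]

print(f"right-skewed prices: {skewness(right_skewed):+.2f}")
print(f"mirrored data:       {skewness(mirrored):+.2f}")
print(f"symmetric data:      {skewness(symmetric):+.2f}")
print("mean > median:", statistics.mean(right_skewed) > statistics.median(right_skewed))
```

The one expensive sale produces a positive coefficient and pulls the mean above the median, matching the description of right skew. Other skewness estimators exist (statistical packages often apply small-sample corrections), so values from software may differ slightly from this formula.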
Kurtosis
Kurtosis measures the thickness of a distribution's tails and, loosely, its “peakedness.” It describes how likely extreme values are compared with a normal distribution. The normal distribution has a kurtosis of 3; subtracting 3 gives excess kurtosis, which is 0 for the normal distribution.
- Mesokurtic: A distribution with kurtosis similar to the normal distribution (kurtosis ≈ 3, excess kurtosis ≈ 0).
- Leptokurtic: A distribution that is more peaked than the normal distribution, with heavier tails (kurtosis > 3, excess kurtosis > 0). Indicates a higher concentration of values near the mean and more extreme outliers.
- Platykurtic: A distribution that is flatter than the normal distribution, with thinner tails (kurtosis < 3, excess kurtosis < 0). Indicates a lower concentration of values near the mean and fewer extreme outliers.
Formula (excess kurtosis, to be compared with 0 rather than 3):
Excess Kurtosis = {[Σ(xᵢ - X)⁴ / n] / S⁴} - 3
where:
- xᵢ = individual data point
- X = sample mean
- n = sample size
- S = sample standard deviation
Practical Application: Kurtosis can help assess the stability of property values. A leptokurtic distribution might suggest greater risk, as extreme value changes are more likely.
- Experiment: Compare price volatility of stocks with differing kurtosis. A stock with high kurtosis may be more susceptible to drastic price swings.
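To see how the kurtosis formula behaves, the sketch below applies it to three simulated datasets. Because the formula subtracts 3, the result is compared with 0: near 0 for normal-like data, positive for heavy tails, negative for flat data. The data are random draws with a fixed seed, not appraisal or stock data, and the names `excess_kurtosis`, `normal_like`, `heavy_tailed`, and `flat` are assumptions made for this example.

```python
import random
import statistics

def excess_kurtosis(values):
    """{[sum((x - mean)^4) / n] / S^4} - 3, with S the sample standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / s ** 4 - 3

rng = random.Random(7)  # fixed seed so the run is repeatable
size = 5000

normal_like = [rng.gauss(0, 1) for _ in range(size)]
# The difference of two exponential draws follows a Laplace distribution (heavy tails).
heavy_tailed = [rng.expovariate(1) - rng.expovariate(1) for _ in range(size)]
flat = [rng.uniform(-1, 1) for _ in range(size)]

for label, data in [("normal-like", normal_like),
                    ("heavy-tailed", heavy_tailed),
                    ("flat", flat)]:
    print(f"{label:>12}: {excess_kurtosis(data):+.2f}")
```

The heavy-tailed sample should show a clearly positive value (leptokurtic) and the uniform sample a clearly negative one (platykurtic). Sample kurtosis is unstable in small samples, so a value computed from a handful of comparable sales should be read with caution.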
Box and Whisker Plots
Box and whisker plots (or boxplots) are graphical representations of data that display the five-number summary: minimum, Q1, median, Q3, and maximum. They provide a visual representation of the distribution’s shape, including skewness and potential outliers.
- Interpretation: The length of the box (IQR) indicates the spread of the middle 50% of the data. The position of the median within the box reveals skewness. Whiskers extending from the box show the range of the data (excluding outliers), and outliers are typically plotted as individual points beyond the whiskers.
- Practical Application: Boxplots are useful for comparing the distributions of property values across different neighborhoods or time periods.
- Example: Comparing boxplots of appraisal values performed by two different appraisers can reveal biases or inconsistencies in their valuations.
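The numbers behind a boxplot can be computed without drawing one. The sketch below derives the five-number summary for two hypothetical appraisers and flags points beyond 1.5 × IQR from the quartiles. The 1.5 multiplier is a widely used convention assumed here, not something specified in this chapter, and the valuations and function names are invented for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # The default "exclusive" method uses the (n + 1) position rule.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3 (k = 1.5 assumed)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical appraised values in thousands of dollars for comparable properties.
appraiser_a = [295, 300, 305, 310, 312, 318, 320, 325, 330, 335]
appraiser_b = [280, 300, 310, 315, 320, 330, 345, 360, 390, 520]

for name, values in [("A", appraiser_a), ("B", appraiser_b)]:
    print(name, five_number_summary(values), "outliers:", flag_outliers(values))
```

In this made-up comparison, appraiser B shows a wider box (a larger IQR) and one flagged valuation at 520, the kind of difference a side-by-side boxplot would make visible.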
Assessing Normality
Normality is a crucial assumption for many statistical tests. While perfectly normal data is rare in real-world appraisal, assessing the degree of departure from normality is essential.
Visual Inspection: Histograms, boxplots, and normal probability plots (Q-Q plots) provide visual assessments of normality.
Normal Probability Plots (Q-Q Plots): Plot observed data values against the expected values from a normal distribution. If the data is normally distributed, the points will fall along a straight line. Departures from the line indicate deviations from normality.
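The coordinates of a Q-Q plot can be computed with the standard library alone. The sketch below pairs each ordered observation with the quantile expected from a normal distribution fitted to the sample mean and standard deviation, then summarizes how straight the plotted line would be with a correlation coefficient. The plotting positions (i - 0.5)/n are one common choice among several, the correlation is only a rough numeric stand-in for inspecting the plot, and the price lists are hypothetical.

```python
import math
import statistics

def qq_points(values):
    """Return (expected normal quantile, observed value) pairs for a Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = statistics.NormalDist(statistics.fmean(ordered), statistics.stdev(ordered))
    # Plotting positions (i - 0.5) / n are an assumed convention; others exist.
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

def qq_correlation(values):
    """Pearson correlation between expected and observed quantiles (1.0 = straight line)."""
    points = qq_points(values)
    expected = [e for e, _ in points]
    observed = [o for _, o in points]
    mean_e, mean_o = statistics.fmean(expected), statistics.fmean(observed)
    sxy = sum((e - mean_e) * (o - mean_o) for e, o in points)
    sxx = sum((e - mean_e) ** 2 for e in expected)
    syy = sum((o - mean_o) ** 2 for o in observed)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sale prices in thousands of dollars.
roughly_symmetric = [262, 275, 281, 288, 294, 300, 306, 312, 319, 325, 338]
right_skewed = [210, 225, 230, 240, 245, 255, 260, 275, 310, 640]

print(f"roughly symmetric: r = {qq_correlation(roughly_symmetric):.3f}")
print(f"right-skewed:      r = {qq_correlation(right_skewed):.3f}")
```

A correlation close to 1 means the points would hug the straight line; the right-skewed list gives a visibly lower value because its largest sale sits far above where a normal distribution would place it.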
Statistical Tests for Normality:
- Kolmogorov-Smirnov Test: Tests whether a sample comes from a specific distribution.
- Shapiro-Wilk Test: A powerful test specifically designed for normality.
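P-value: The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis (the data are normally distributed) is true. A low p-value (conventionally ≤ 0.05) is evidence against the null hypothesis, suggesting that the data are not normally distributed. A high p-value does not prove normality; it only means the test did not detect a departure, which is common in small samples.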
Parametric vs. Nonparametric Statistics
The distributional characteristics of the data influence the choice between parametric and nonparametric statistical methods.
- Parametric Statistics: Assume that the data follows a specific distribution (e.g., normal distribution). These tests (e.g., t-tests, ANOVA) are generally more powerful if the assumptions are met.
- Nonparametric Statistics: Do not rely on specific distributional assumptions. These tests (e.g., Mann-Whitney U test, Kruskal-Wallis test) are suitable when the data are clearly not normally distributed or when the sample is too small to check the distributional assumptions with confidence.
Central Limit Theorem
The central limit theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size increases, whatever the shape of the population distribution, provided the observations are independent and the population variance is finite. This is pivotal for inference: even if the original data are not normally distributed, the sampling distribution of the mean tends toward normality as the sample size grows. The more skewed the population, the larger the sample needed before the approximation is good.
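A small simulation can make the theorem concrete. The sketch below draws from a strongly right-skewed exponential distribution, then compares the skewness of the raw draws with the skewness of sample means for samples of 5 and 30. Everything here is simulated with a fixed seed; the distribution, sample sizes, and function names are assumptions chosen for illustration.

```python
import random
import statistics

def skewness(values):
    """Skewness = [sum((x - mean)^3) / n] / S^3, with S the sample standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return (sum((x - mean) ** 3 for x in values) / n) / s ** 3

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_means(sample_size, replications=2000):
    """Draw many samples from the skewed population and return each sample's mean."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replications)
    ]

raw = [rng.expovariate(1.0) for _ in range(2000)]  # individual observations
means_5 = sample_means(5)
means_30 = sample_means(30)

print(f"skewness of raw data:          {skewness(raw):.2f}")
print(f"skewness of means, n = 5:      {skewness(means_5):.2f}")
print(f"skewness of means, n = 30:     {skewness(means_30):.2f}")
```

The skewness of the sample means should shrink toward zero as the sample size grows, although with n = 5 it remains noticeable. That residual skew is the practical concern with the small samples common in appraisal work.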
Conclusion
Understanding measures of dispersion and shape is fundamental for analyzing appraisal data effectively. These measures provide insights into the variability and distributional characteristics of datasets, guiding the selection of appropriate statistical methods and enabling meaningful interpretations. By carefully assessing dispersion, skewness, kurtosis, and normality, appraisers can enhance the reliability and validity of their analyses and conclusions.
Chapter Summary
This chapter on “Measures of Dispersion and Shape in Appraisal Data” from a real estate appraisal statistical analysis training course focuses on understanding the variability and distribution characteristics of appraisal data to improve the accuracy and reliability of statistical inferences. The core scientific points cover measures of dispersion, including standard deviation and variance, which quantify the spread of data around the mean. Standard deviation, specifically, is highlighted for its utility in further statistical analysis and inference making. The chapter explains how these measures can be compared to known distributions, like the normal distribution, to assess the suitability of parametric statistical methods. The coefficient of variation (CV) is introduced as a tool for comparing relative dispersion across different datasets by standardizing the standard deviation to the sample mean. The range and interquartile range are presented as additional measures of dispersion, offering simpler methods for assessing data spread.
The discussion extends to measures of shape, primarily skewness and kurtosis, which are crucial for determining the symmetry and peakedness of the data distribution, respectively. Skewness reveals the concentration of data (left or right skew), while kurtosis indicates the height and tail thickness of the distribution. Box and whisker plots and histograms are used as visual aids to assess skewness. Quantitative tests and normal probability plots are employed to assess the departure from normality.
The chapter emphasizes the importance of assessing normality because many parametric statistical tests assume normally distributed data. It acknowledges that real property data often come in small samples. Nonparametric statistics, which remain valid regardless of the underlying population distribution, are presented as an alternative when the data deviate significantly from normality or when the sample is too small for the central limit theorem to be relied on. When extreme values distort the arithmetic mean, the median is a better indicator of central tendency.
In conclusion, the chapter underscores that understanding data dispersion and shape is essential for selecting appropriate statistical methods and drawing valid inferences in real estate appraisal. By accurately assessing these characteristics, appraisers can improve the robustness and reliability of their analyses, whether employing parametric or nonparametric approaches.