Chapter 14: Dispersion, Shape, and Normality in Appraisal Data
Introduction
This chapter explores the concepts of dispersion, shape, and normality in the context of real estate appraisal data. Understanding these statistical properties is crucial for selecting appropriate analytical techniques and making valid inferences about property values.
14.1 Measures of Dispersion
Measures of dispersion quantify the variability or spread of data. They are essential for comparing different datasets and assessing the reliability of statistical inferences.
14.1.1 Standard Deviation and Variance
The standard deviation (S for a sample, σ for a population) and variance (S², σ²) are fundamental measures of dispersion, taking into account all data points.
1. Standard Deviation: Represents the typical distance of data points from the mean.
   - Population Standard Deviation (σ):
     σ = √[ Σ(xᵢ - μ)² / N ]
     where:
     - xᵢ = individual data point
     - μ = population mean
     - N = population size
   - Sample Standard Deviation (S):
     S = √[ Σ(xᵢ - X)² / (n - 1) ]
     where:
     - xᵢ = individual data point
     - X = sample mean
     - n = sample size
   Note the use of (n - 1) in the sample standard deviation formula. This is Bessel's correction, which yields an unbiased estimate of the population variance when working from sample data.
2. Variance: The square of the standard deviation. It represents the average squared deviation from the mean.
   - Population Variance (σ²): σ² = Σ(xᵢ - μ)² / N
   - Sample Variance (S²): S² = Σ(xᵢ - X)² / (n - 1)
3. Example: Using the garden apartment rent data, the sample standard deviation can be calculated as shown in Table 14.2.
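As a concrete illustration, the following Python sketch computes the sample and population versions of these formulas. The rent figures are hypothetical placeholders, not the garden apartment data from Table 14.2.

```python
import math

# Hypothetical monthly rents (illustrative only; not the Table 14.2 data)
rents = [950, 975, 1000, 1025, 1050, 1100, 1150]

n = len(rents)
mean = sum(rents) / n

# Sample variance uses (n - 1) in the denominator (Bessel's correction)
sample_variance = sum((x - mean) ** 2 for x in rents) / (n - 1)
sample_std_dev = math.sqrt(sample_variance)

# Population formulas divide by N instead of (n - 1)
population_variance = sum((x - mean) ** 2 for x in rents) / n

print(f"Mean rent:            {mean:.2f}")
print(f"Sample variance:      {sample_variance:.2f}")
print(f"Sample std deviation: {sample_std_dev:.2f}")
print(f"Population variance:  {population_variance:.2f}")
```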
14.1.2 Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, allowing comparisons of dispersion between datasets with different units or scales.
- Formula:
  CV = (S / X) * 100
  where:
  - S = sample standard deviation
  - X = sample mean
- Application: A higher CV indicates greater relative dispersion.
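A minimal Python sketch, using hypothetical rent and sale-price samples, shows why the CV is useful when the datasets being compared are on different scales:

```python
import statistics

# Hypothetical samples on different scales: monthly rents vs. sale prices
rents  = [950, 1000, 1050, 1100, 1200]         # dollars per month
prices = [250_000, 265_000, 280_000, 310_000]  # dollars

def coefficient_of_variation(data):
    """CV = (S / X) * 100: sample standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

print(f"CV of rents:  {coefficient_of_variation(rents):.1f}%")
print(f"CV of prices: {coefficient_of_variation(prices):.1f}%")
```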
14.1.3 Range
The range is the difference between the maximum and minimum values in a dataset.
- Calculation: Range = Maximum Value - Minimum Value
- Interpretation: A larger range indicates greater variability, although the range is sensitive to outliers because it depends only on the two most extreme observations.
- Relationship to Standard Deviation: In a normal distribution, the range is approximately equal to 6 standard deviations (-3S to +3S).
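The range and its relationship to the standard deviation can be checked directly. The sale prices below are hypothetical, and with small samples the observed range is usually well below 6S; the rule of thumb is most useful for larger datasets.

```python
import statistics

# Hypothetical sale prices (illustrative only)
prices = [240_000, 255_000, 260_000, 270_000, 275_000, 290_000, 305_000]

data_range = max(prices) - min(prices)
s = statistics.stdev(prices)

# Under approximate normality and a reasonably large sample,
# the range should be on the order of 6 standard deviations.
print(f"Range:             {data_range:,.0f}")
print(f"6 x std deviation: {6 * s:,.0f}")
print(f"Range / S:         {data_range / s:.2f}")
```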
14.1.4 Interquartile Range
The interquartile range (IQR) measures the spread of the middle 50% of the data.
- Quartiles: Divide the ordered data into four equal parts.
  - Q1 (First Quartile): The value below which 25% of the data falls.
  - Q2 (Second Quartile): The median, below which 50% of the data falls.
  - Q3 (Third Quartile): The value below which 75% of the data falls.
- IQR Calculation: IQR = Q3 - Q1
- Determining Quartiles
  - Position Point Calculation:
    - Q1 = the (n + 1)/4 ordered observation
    - Q2 = the median
    - Q3 = the 3(n + 1)/4 ordered observation
  - Decision Rules:
    - If the position point is an integer, the ordered observation occupying that position is the quartile boundary.
    - If the position point falls halfway between two integers, the quartile boundary is the midpoint of the two adjacent ordered observations.
    - If the position point is neither an integer nor halfway between two integers, round it to the nearest integer and use the corresponding ordered observation as the quartile boundary.
- Relationship to Standard Deviation: For a normal distribution, the IQR is approximately equal to 1.33 standard deviations.
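The position-point decision rules above can be implemented directly, as in the Python sketch below. The rents are hypothetical, and note that library routines (e.g., statistics.quantiles or numpy.percentile) use interpolation methods that may give slightly different quartile boundaries.

```python
import statistics

def quartile(sorted_data, position_point):
    """Locate a quartile using the position-point decision rules described above."""
    # Position points are 1-based; Python lists are 0-based.
    if position_point == int(position_point):
        return sorted_data[int(position_point) - 1]
    elif position_point % 1 == 0.5:
        lower = sorted_data[int(position_point) - 1]
        upper = sorted_data[int(position_point)]
        return (lower + upper) / 2
    else:
        return sorted_data[round(position_point) - 1]

# Hypothetical ordered rents (illustrative only)
rents = sorted([925, 950, 975, 1000, 1025, 1050, 1100, 1150, 1200])
n = len(rents)

q1 = quartile(rents, (n + 1) / 4)
q2 = statistics.median(rents)
q3 = quartile(rents, 3 * (n + 1) / 4)
iqr = q3 - q1

s = statistics.stdev(rents)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
# Under normality the IQR is roughly 1.33 standard deviations.
print(f"1.33 x S = {1.33 * s:.1f}")
```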
14.2 Measures of Shape
Measures of shape describe the symmetry and peakedness of a data distribution. These measures are essential for determining how closely a data distribution resembles a normal distribution.
14.2.1 Skewness
Skewness measures the asymmetry of a distribution.
- Types of Skewness:
  - Symmetrical Distribution: Mean = Median, skewness = 0.
  - Left-Skewed (Negative Skew): Mean < Median, skewness < 0. Data is concentrated on the right side of the distribution, and the tail extends to the left.
  - Right-Skewed (Positive Skew): Mean > Median, skewness > 0. Data is concentrated on the left side of the distribution, and the tail extends to the right.
- Skewness Calculation:
  Skewness = [ n / ((n-1)(n-2)) ] * Σ [ (xᵢ - X) / S ]³
  where:
  - X = sample mean
  - n = sample size
  - S = sample standard deviation
  - xᵢ = individual data point
- Graphical Depiction: Histograms and box plots visually represent skewness.
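The skewness formula above (the adjusted sample skewness used by common spreadsheet and statistics packages) can be coded directly. The sale prices in this sketch are hypothetical and deliberately right-skewed:

```python
import math

def sample_skewness(data):
    """Adjusted sample skewness: [n / ((n-1)(n-2))] * sum(((x - mean) / s)**3)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (n / ((n - 1) * (n - 2))) * sum(((x - mean) / s) ** 3 for x in data)

# Hypothetical sale prices with a long right tail
prices = [210_000, 220_000, 225_000, 230_000, 240_000, 255_000, 310_000, 450_000]
print(f"Skewness: {sample_skewness(prices):.2f}")  # positive => right-skewed
```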
14.2.2 Kurtosis
Kurtosis measures the “peakedness” of a distribution and the thickness of its tails.
- Types of Kurtosis:
  - Mesokurtic: Kurtosis ≈ 3. Normal distribution.
  - Leptokurtic: Kurtosis > 3. More peaked than normal, with heavier tails.
  - Platykurtic: Kurtosis < 3. Flatter than normal, with thinner tails.
- Software Calculation: Kurtosis is typically calculated using statistical software packages (e.g., Excel, Minitab, SPSS). Note that many packages report excess kurtosis (kurtosis minus 3), so a normal distribution is reported as approximately 0 rather than 3.
- Interpretation: Higher kurtosis indicates a distribution with more extreme values.
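A short sketch using scipy, with simulated data rather than actual appraisal observations; scipy's kurtosis function reports excess kurtosis by default:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

normal_sample = rng.normal(loc=300_000, scale=40_000, size=5_000)
heavy_tailed  = rng.standard_t(df=3, size=5_000)  # leptokurtic: heavier tails than normal

# scipy returns *excess* kurtosis by default (fisher=True), i.e., kurtosis - 3
print("Normal sample:", stats.kurtosis(normal_sample))  # ~ 0 (i.e., kurtosis ~ 3)
print("Heavy-tailed: ", stats.kurtosis(heavy_tailed))   # > 0 (i.e., kurtosis > 3)

# Use fisher=False to report kurtosis on the scale used in this chapter (normal ~ 3)
print("Normal sample (raw):", stats.kurtosis(normal_sample, fisher=False))
```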
14.3 Normality
Normality refers to whether a dataset follows a normal distribution (bell curve). Many statistical tests assume normality.
14.3.1 Assessing Normality
Several methods can be used to assess normality:
- Visual Inspection: Histograms, box plots, and normal probability plots.
- Measures of Shape: Skewness and kurtosis values close to zero and three, respectively, suggest normality.
- Empirical Rule: In a normal distribution, approximately:
  - 68% of data falls within ±1 standard deviation of the mean.
  - 95% of data falls within ±2 standard deviations of the mean.
  - 99.7% of data falls within ±3 standard deviations of the mean.
- Normality Tests: Formal statistical tests, such as the Kolmogorov-Smirnov test, Shapiro-Wilk test, and Anderson-Darling test, can be used to assess normality (see the sketch after this list).
  - Hypothesis Testing: Normality tests involve hypothesis testing. The null hypothesis (H₀) is that the data is normally distributed. A low p-value (typically < 0.05) indicates that the null hypothesis should be rejected, suggesting that the data is not normally distributed.
- Normal Probability Plots (Q-Q Plots): These plots graph the data against the expected values from a normal distribution. If the data is normally distributed, the points will fall approximately along a straight line. Deviations from the line indicate departures from normality.
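A brief sketch of these checks, using scipy and simulated (hypothetical) sale prices; the same calls would be applied to actual appraisal data:

```python
import numpy as np
from scipy import stats

# Hypothetical sale prices drawn from a right-skewed (lognormal) population
rng = np.random.default_rng(7)
prices = rng.lognormal(mean=12.5, sigma=0.3, size=60)

mean, s = prices.mean(), prices.std(ddof=1)

# Empirical-rule check: proportions within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    share = np.mean(np.abs(prices - mean) <= k * s)
    print(f"Within ±{k} S: {share:.1%}  (normal benchmark: {(68, 95, 99.7)[k - 1]}%)")

# Shapiro-Wilk test: H0 = the data come from a normal distribution
stat, p_value = stats.shapiro(prices)
print(f"Shapiro-Wilk p-value: {p_value:.4f}")  # p < 0.05 suggests non-normality

# Normal probability (Q-Q) plot data; r close to 1 means points lie near a straight line
(osm, osr), (slope, intercept, r) = stats.probplot(prices, dist="norm")
print(f"Q-Q plot correlation: {r:.3f}")
```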
14.4 Parametric vs. Nonparametric Statistics
The assumption of normality is critical for parametric statistical tests.
- Parametric Statistics: These tests (e.g., t-tests, ANOVA) assume that the data is normally distributed. If the data deviates significantly from normality, the results of these tests may be unreliable, particularly with small sample sizes.
- Nonparametric Statistics: These tests (e.g., the Mann-Whitney U test and the Wilcoxon signed-rank test) do not assume normality. They can be used when the data is not normally distributed or when the sample size is small.
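As a sketch of the two approaches, the example below applies a two-sample t-test and its nonparametric counterpart, the Mann-Whitney U test, to simulated (hypothetical) price-per-square-foot samples:

```python
import numpy as np
from scipy import stats

# Hypothetical price-per-square-foot samples from two submarkets (simulated, skewed)
rng = np.random.default_rng(3)
submarket_a = rng.lognormal(mean=5.0, sigma=0.25, size=12)
submarket_b = rng.lognormal(mean=5.1, sigma=0.25, size=12)

# Parametric: two-sample t-test (assumes approximate normality within each group)
t_stat, t_p = stats.ttest_ind(submarket_a, submarket_b, equal_var=False)

# Nonparametric alternative: Mann-Whitney U test (no normality assumption)
u_stat, u_p = stats.mannwhitneyu(submarket_a, submarket_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.3f}")
print(f"Mann-Whitney p-value: {u_p:.3f}")
```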
14.5 Central Limit Theorem and Inference
The Central Limit Theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the underlying population distribution.
- Implications: The CLT allows us to use parametric statistics to make inferences about population means, even if the population is not normally distributed, provided the sample size is sufficiently large (typically n > 30).
- Practical Application: In appraisal, the CLT can be used to estimate the average market value of properties, even if individual property values are not normally distributed, by collecting a sufficiently large sample of comparable sales.
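A small simulation illustrates the theorem: individual values drawn from a strongly right-skewed lognormal distribution are far from normal, but the means of repeated samples of 40 observations are nearly symmetric. The distribution parameters are arbitrary, chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# A right-skewed "population" of property values (lognormal), far from normal
population = rng.lognormal(mean=12.5, sigma=0.6, size=100_000)
print(f"Skewness of individual values: {stats.skew(population):.2f}")

# Means of repeated samples of n = 40 comparable sales from the same distribution
sample_means = rng.lognormal(mean=12.5, sigma=0.6, size=(2_000, 40)).mean(axis=1)
print(f"Skewness of sample means:      {stats.skew(sample_means):.2f}")  # much closer to 0
```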
Conclusion
Understanding dispersion, shape, and normality is essential for sound statistical analysis of appraisal data. By assessing these properties, appraisers can select appropriate statistical techniques and draw valid conclusions about property values and market trends. When the assumptions of parametric tests are not met, nonparametric methods provide alternative tools for analysis.
Chapter Summary
This chapter, “Dispersion, Shape, and Normality in Appraisal Data,” focuses on understanding and analyzing the distribution of appraisal data to determine the appropriateness of various statistical inference methods. It covers key concepts and techniques related to data dispersion, shape, and normality testing, emphasizing their implications for real estate appraisal.
The chapter begins by explaining measures of dispersion, specifically standard deviation and variance, highlighting their importance in gauging the variability within a dataset. Standard deviation, in particular, is emphasized for its ability to facilitate further statistical treatment and inference. It also introduces the coefficient of variation as a tool for comparing dispersion across different datasets by expressing the standard deviation as a percentage of each sample’s mean. The range and interquartile range are presented as additional measures of dispersion, with their relationship to the standard deviation under a normal distribution being discussed.
The chapter then delves into measures of shape, with a focus on skewness and kurtosis. Skewness indicates the asymmetry of the distribution, with left-skewness signifying a concentration of data on the right and a mean less than the median, and vice versa for right-skewness. Kurtosis describes the “peakedness” of the distribution, with higher kurtosis indicating a more peaked distribution and lower kurtosis indicating a flatter distribution. Box and whisker plots and histograms are introduced as visual tools for assessing skewness.
Normality is a key concept, as many parametric statistical tests rely on the assumption that the data is normally distributed. While appraisal data is seldom perfectly normal, the chapter provides methods for assessing the degree of departure from normality. These include examining the proportions of observations within specific standard deviations from the mean, analyzing the range and interquartile range relative to expected values under normality, and employing quantitative normality tests like the Kolmogorov-Smirnov (KS) test. Normal probability plots are introduced as a visual aid, where deviations from a straight line indicate departures from normality. The importance of the p-value from normality tests is explained: a low p-value suggests that the data may not have been drawn from a normally distributed population.
The chapter concludes by emphasizing that understanding the distribution of appraisal data is crucial for selecting appropriate statistical methods. If data deviates significantly from normality, particularly with small sample sizes, nonparametric tests or inferences on medians (rather than means) may be more suitable. The chapter highlights the importance of being able to apply a nonparametric test when working with small samples. Ultimately, the principles discussed help appraisers make sound inferences and draw meaningful conclusions from their data.