Measures of Dispersion and Shape: Inferential Statistics in Real Estate Appraisal

This chapter delves into measures of dispersion and shape, crucial components of inferential statistics that are essential for real estate appraisal. We will explore how these measures help us understand the variability and distribution of data, and how this knowledge informs our ability to make inferences about larger populations based on sample data.
1. Introduction to Measures of Dispersion
Measures of dispersion quantify the amount of variation or spread within a dataset. They provide insights into how closely the data points cluster around the central tendency (e.g., mean or median). Comparing dispersion measures against known distributions like the normal distribution is vital for determining the appropriateness of parametric statistical methods.
- Importance of Dispersion:
- Indicates the homogeneity or heterogeneity of data.
- Helps assess the reliability of measures of central tendency.
- Allows comparison of variability between different datasets.
- Informs the selection of appropriate statistical tests (parametric vs. non-parametric).
2. Standard Deviation and Variance
These are the two fundamental measures of dispersion because they take every data point in the dataset into account. Standard deviation (SD) is particularly important because it supports further statistical manipulation and inference, helping us understand the uncertainty associated with our estimates.
2.1. Definitions and Formulas
- Variance: A measure of the average squared deviation of data points from the mean.
  - Population Variance (σ²):
    σ² = Σ[(xᵢ - μ)²] / N
    Where:
    * xᵢ represents each individual data point.
    * μ is the population mean.
    * N is the population size.
    * Σ indicates summation across all data points.
  - Sample Variance (S²):
    S² = Σ[(xᵢ - X)²] / (n - 1)
    Where:
    * xᵢ represents each individual data point in the sample.
    * X is the sample mean.
    * n is the sample size.
    * The denominator (n - 1) represents the degrees of freedom.
- Standard Deviation: The square root of the variance. It represents the average distance of data points from the mean in the original units of measurement.
  - Population Standard Deviation (σ): σ = √[Σ[(xᵢ - μ)²] / N]
  - Sample Standard Deviation (S): S = √[Σ[(xᵢ - X)²] / (n - 1)]
2.2. Practical Application and Interpretation
A larger standard deviation indicates greater variability in the dataset. In real estate appraisal, a high standard deviation among comparable sales prices suggests a less reliable indication of value for the subject property, because it signals substantial price variation in the market that reduces the predictability of the sale price.
2.3. Example
Let's use the garden-level apartment rent data. We have the following sample of 36 monthly rents:
Monthly Rent: \$600, \$650, \$695, \$710, \$715, \$730, \$735, \$735, \$760, \$760, \$785, \$800, \$800, \$805, \$815, \$820, \$820, \$825, \$825, \$825, \$825, \$850, \$850, \$850, \$850, \$850, \$850, \$860, \$860, \$890, \$890, \$920, \$920, \$930, \$970, \$995
- Calculate the Sample Mean (X):
  X = \$29,370 / 36 = \$815.83
- Calculate the Sample Standard Deviation (S):
  S = √[Σ[(xᵢ - X)²] / (n - 1)] = \$84.71
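To make this calculation reproducible, here is a minimal Python sketch (assuming the rent figures above are stored in a plain list) that computes the sample mean and sample standard deviation with the standard library's statistics module, which uses the (n - 1) denominator shown above.

```python
import statistics

# Garden-level apartment rents from the example above (n = 36)
rents = [600, 650, 695, 710, 715, 730, 735, 735, 760, 760,
         785, 800, 800, 805, 815, 820, 820, 825, 825, 825,
         825, 850, 850, 850, 850, 850, 850, 860, 860, 890,
         890, 920, 920, 930, 970, 995]

sample_mean = statistics.mean(rents)   # expected: about $815.83
sample_sd = statistics.stdev(rents)    # uses (n - 1); expected: about $84.71

print(f"n = {len(rents)}")
print(f"Sample mean = ${sample_mean:,.2f}")
print(f"Sample SD   = ${sample_sd:,.2f}")
```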
3. Coefficient of Variation
The coefficient of variation (CV) is a relative measure of dispersion. It expresses the standard deviation as a percentage of the mean. This is particularly useful when comparing the variability of datasets with different units or substantially different means.
3.1. Formula
CV = (S / X) * 100%
Where:
- S is the sample standard deviation.
- X is the sample mean.
3.2. Interpretation
A higher CV indicates greater relative variability.
3.3. Example
Using the apartment rent data, the coefficient of variation is:
CV = (\$84.71 / \$815.83) * 100% = 10.38%
This allows us to compare the rent sample's variability directly with that of other samples, even when their means differ.
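As a short illustration (reusing the mean and standard deviation reported above as plain numbers), the CV calculation in Python is:

```python
# Figures from the rent example in Sections 2.3 and 3.3
sample_sd = 84.71
sample_mean = 815.83

cv = (sample_sd / sample_mean) * 100   # coefficient of variation, in percent
print(f"CV = {cv:.2f}%")               # about 10.38%
```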
4. Range
The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in the dataset.
4.1. Formula
Range = Maximum Value - Minimum Value
4.2. Limitations
- Sensitive to outliers.
- Only considers the extreme values, ignoring the distribution of the rest of the data.
4.3. Example
For the apartment rent data, the range is:
Range = \$995 - \$600 = \$395
5. Interquartile Range
The interquartile range (IQR) measures the spread of the middle 50% of the data. It is more robust to outliers than the range.
5.1. Quartiles
Quartiles divide the ordered dataset into four equal parts:
- Q1 (First Quartile): The value below which 25% of the data falls.
- Q2 (Second Quartile): The median, the value below which 50% of the data falls.
- Q3 (Third Quartile): The value below which 75% of the data falls.
5.2. Calculation
IQR = Q3 - Q1
5.3. Decision Rules
The position point of quartile Qk (k = 1, 2, 3) in the ordered dataset is k(n + 1)/4. The following rules convert a position point into a quartile boundary:
- Rule 1: If the position point calculation is an integer, then the ordered observation occupying that position point is the quartile boundary.
- Rule 2: If the position point is halfway between two integers, then the midpoint between the next-largest and next-smallest ordered observation is the quartile boundary.
- Rule 3: If the position point is neither an integer nor halfway between two integers, then the position point is rounded to the nearest integer and the corresponding ordered observation is the quartile boundary.
5.4. Example
Using the apartment rent data (36 observations):
- Calculate the positions of Q1, Q2, and Q3:
  - Q1 position = 37 / 4 = 9.25, rounded to the 9th ordered observation (Rule 3)
  - Q2 position = 2(37) / 4 = 18.5, halfway between the 18th and 19th ordered observations (Rule 2), so Q2 is the median
  - Q3 position = 3(37) / 4 = 27.75, rounded to the 28th ordered observation (Rule 3)
- Identify the corresponding values:
  - Q1 = \$760
  - Q2 = \$825
  - Q3 = \$860
- Calculate the IQR:
  - IQR = \$860 - \$760 = \$100
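The sketch below (the helper name quartile is illustrative) implements the three decision rules from Section 5.3 directly, using the position point k(n + 1)/4. Note that spreadsheet and NumPy percentile functions typically interpolate instead, so they can return slightly different quartile boundaries.

```python
def quartile(sorted_data, k):
    """Return the k-th quartile boundary (k = 1, 2, 3) using the
    position-point decision rules from Section 5.3."""
    n = len(sorted_data)
    pos = k * (n + 1) / 4            # position point, e.g. 37 / 4 = 9.25 for Q1
    whole = int(pos)
    frac = pos - whole
    if frac == 0:                    # Rule 1: integer position
        return sorted_data[whole - 1]
    if frac == 0.5:                  # Rule 2: halfway -> midpoint of neighbors
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    return sorted_data[round(pos) - 1]   # Rule 3: round to the nearest integer

rents = sorted([600, 650, 695, 710, 715, 730, 735, 735, 760, 760,
                785, 800, 800, 805, 815, 820, 820, 825, 825, 825,
                825, 850, 850, 850, 850, 850, 850, 860, 860, 890,
                890, 920, 920, 930, 970, 995])

q1, q2, q3 = (quartile(rents, k) for k in (1, 2, 3))
print(f"Q1 = ${q1}, Q2 = ${q2}, Q3 = ${q3}, IQR = ${q3 - q1}")  # Q1=760, Q3=860, IQR=100
```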
6. Measures of Shape
Measures of shape describe the overall form of a data distribution. These are essential to determine how close a data distribution is to normal and the extent to which extreme values are distorting the difference between the median and the mean. The most important measures of shape are skewness and kurtosis.
6.1. Skewness
Skewness refers to the asymmetry of a distribution.
- Symmetrical Distribution: Mean = Median, Skewness = 0.
- Left-Skewed (Negatively Skewed): Mean < Median, Skewness < 0 (tail extends to the left).
- Right-Skewed (Positively Skewed): Mean > Median, Skewness > 0 (tail extends to the right).
Formula:
Skewness = {n / [(n-1)(n-2)]} * Σ[(xᵢ - X) / S]³
Where:
* X = sample mean
* n = sample size
* S = sample standard deviation
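As a minimal sketch, the adjusted sample skewness formula above can be coded directly. Excel's SKEW function uses this adjusted form, while SKEW.P (used in the example below) computes the unadjusted population version, so the two can differ slightly for small samples.

```python
import math

def sample_skewness(data):
    """Adjusted sample skewness: {n / [(n-1)(n-2)]} * Σ[((x - X) / S)³]."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))   # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Example: sample_skewness(rents) is slightly negative for the rent data (left tail).
```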
6.2. Kurtosis
Kurtosis describes the “peakedness” or “tailedness” of a distribution.
- Mesokurtic: Kurtosis โ 3 (normal distribution).
- Leptokurtic: Kurtosis > 3 (more peaked, heavier tails).
- Platykurtic: Kurtosis < 3 (flatter, lighter tails).
6.3. Example
The skewness for the apartment rent data is -0.312 (Excel formula: SKEW.P, which computes the unadjusted population skewness). The kurtosis is 0.42 (Excel formula: KURT, which reports excess kurtosis, i.e., kurtosis minus 3, so values near 0 correspond to the mesokurtic benchmark of 3 above). Both values indicate only mild departures from normality: a slight left skew and a slightly more peaked distribution than normal.
7. Assessing Normality
Many statistical tests rely on the assumption of normality. Several methods can be used to assess whether a dataset is approximately normally distributed:
- Visual Inspection:
- Histograms: Check for a bell-shaped curve.
- Box Plots: Examine symmetry and outliers.
- Normal Probability Plots (Q-Q Plots): Data points should fall close to a straight line.
- Numerical Measures:
- Skewness and Kurtosis: Values should be close to zero (for skewness) and three (for kurtosis).
- Statistical Tests:
- Kolmogorov-Smirnov Test: Tests the null hypothesis that the data comes from a specified distribution (e.g., normal).
- Shapiro-Wilk Test: Another test for normality, often more powerful than Kolmogorov-Smirnov for small sample sizes.
- Standard deviation ranges of normality:
- Approximately 68% of the observations are expected to lie within ±1 standard deviation of the mean.
- Approximately 80% within ±1.28 standard deviations of the mean.
- Approximately 95% within ±2 standard deviations of the mean.
7.1. Interpreting Normality Test Results
The p-value from a normality test is the probability of obtaining a sample that departs from normality at least as much as the observed one, assuming the population really is normally distributed. A small p-value (typically less than 0.05) therefore suggests that the data is not normally distributed.
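As a hedged illustration (assuming SciPy is installed; the variable names are illustrative only), the following sketch runs the Shapiro-Wilk test on the rent sample and applies the ±1 standard deviation check described above.

```python
import statistics
from scipy import stats

rents = [600, 650, 695, 710, 715, 730, 735, 735, 760, 760,
         785, 800, 800, 805, 815, 820, 820, 825, 825, 825,
         825, 850, 850, 850, 850, 850, 850, 860, 860, 890,
         890, 920, 920, 930, 970, 995]

# Shapiro-Wilk test: a small p-value (< 0.05) suggests the data is not normal.
stat, p_value = stats.shapiro(rents)
print(f"Shapiro-Wilk W = {stat:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject normality at the 5% level; consider non-parametric methods.")
else:
    print("No strong evidence against normality.")

# Empirical-rule check: share of observations within ±1 standard deviation.
mean, sd = statistics.mean(rents), statistics.stdev(rents)
within_1_sd = sum(mean - sd <= r <= mean + sd for r in rents) / len(rents)
print(f"Within ±1 SD: {within_1_sd:.0%} (about 68% expected if roughly normal)")
```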
8. Parametric vs Nonparametric Statistics
- Parametric statistics are based on the assumption that the data follows a specific distribution, such as the normal distribution. These tests are more powerful and sensitive than nonparametric tests, but they require that the underlying assumptions of the test are met.
- Nonparametric statistics do not rely on assumptions about the specific form of the population distribution. These tests are useful when the data does not follow a normal distribution, when the sample size is small, or when the data is measured on a nominal or ordinal scale.
9. Central Limit Theorem and Inference
The Central Limit Theorem (CLT) states that the distribution of sample means will approach a normal distribution as the sample size increases, regardless of the shape of the population distribution. This is crucial for inferential statistics, as it allows us to make inferences about population parameters (e.g., mean) based on sample data, even if the population distribution is not normal.
- Importance of CLT in Appraisal: Even if the distribution of property values is not perfectly normal, the distribution of sample means of comparable sales will tend to be normal, allowing us to use parametric statistical tests for inference.
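To make the CLT concrete, here is a minimal simulation sketch. The lognormal "population of prices" and the sample size of 30 are arbitrary assumptions for illustration: even though the population is strongly right-skewed, the distribution of sample means is far more symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A strongly right-skewed "population" of property prices (illustrative only).
population = rng.lognormal(mean=13.0, sigma=0.5, size=100_000)

# Draw many samples of 30 comparables and record each sample mean.
sample_means = [rng.choice(population, size=30, replace=False).mean()
                for _ in range(2_000)]

print(f"Population skewness:  {stats.skew(population):.2f}")    # clearly positive
print(f"Sample-mean skewness: {stats.skew(sample_means):.2f}")  # much closer to 0
```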
10. Conclusion
Measures of dispersion and shape are critical tools for understanding the characteristics of real estate data. By carefully analyzing these measures, appraisers can:
- Assess the reliability of their estimates.
- Select appropriate statistical methods.
- Make sound inferences about property values and market trends.
- Understand the uncertainty associated with these estimations.
This knowledge is essential for making informed decisions and providing credible appraisal opinions.
Chapter Summary
This chapter, “Measures of Dispersion and Shape: Inferential Statistics in Real Estate Appraisal,” focuses on how these statistical concepts are crucial for making inferences about property values and market trends based on sample data. Understanding dispersion and shape allows appraisers to determine the appropriateness of parametric statistical methods that assume a normal distribution.
Measures of Dispersion: The chapter emphasizes standard deviation and variance as fundamental measures that quantify the variability within a dataset. Standard deviation is particularly important because it facilitates further statistical analysis and allows for statements about the uncertainty associated with inferences. The chapter highlights how the standard deviation can be used to determine if the data is close to normally distributed. The coefficient of variation (CV) is introduced as a tool for comparing dispersion across different datasets by standardizing it to each sample’s mean. The range and interquartile range (IQR) are presented as additional measures of dispersion, useful for understanding the spread of data.
Measures of Shape: These are critical for assessing the normality of a data distribution and whether extreme values unduly influence the relationship between the mean and median. Skewness, indicating asymmetry, and kurtosis, reflecting the "peakedness" of the distribution, are discussed. The chapter details how to visually assess skewness using box and whisker plots and histograms and how to interpret skewness values. It explains that data that departs substantially from normality may invalidate inferential tests that rely on normality assumptions (e.g., t-tests, F-tests) when sample sizes are small.
Inferential Statistics and Normality: The chapter emphasizes that many common inferential statistical tests rely on the assumption that data is normally distributed. While data is seldom perfectly normal, the measures of shape and quantitative normality tests, such as the Kolmogorov-Smirnov test, can assist in determining whether the departures from normality are extreme enough to preclude the use of parametric tests. The p-value from such tests is used to determine the probability of obtaining the observed sample if the underlying population were truly normal. A low p-value (typically below 5%) suggests the data may not be from a normally distributed population. In such cases, or when dealing with very small samples and unknown population distributions, the chapter suggests considering non-parametric statistical tests to infer population characteristics.
Stratified Random Sampling: The use of stratified random sampling is described as a technique to improve inferences on population parameters. By ensuring that the sample’s composition mirrors known proportions within the population (e.g., bedroom configurations in apartment units), stratified sampling can reduce variability and improve the accuracy of estimations.
Implications for Real Estate Appraisal: The chapter underscores that real property datasets often involve small samples. Therefore, understanding measures of dispersion and shape is vital for selecting appropriate statistical techniques. Appraisers can use these tools to assess the reliability of their sample data, determine whether parametric or non-parametric tests are suitable, and ultimately make more informed inferences about property values and market trends. By examining data distributions and identifying skewness or kurtosis, appraisers can better understand potential biases and uncertainties in their analyses.