Dispersion, Shape, and Normality in Appraisal Data

Introduction

This chapter examines three properties of appraisal data that affect the reliability and validity of real estate appraisal conclusions: dispersion, shape, and normality. Understanding them is essential for selecting appropriate statistical methods and drawing accurate inferences about property values.

Measures of Dispersion

Measures of dispersion quantify the variability within a dataset. They provide insight into how spread out the data points are around the central tendency, informing the choice of statistical methods.

Standard Deviation and Variance

Standard deviation and variance are fundamental measures of dispersion that consider all data points in a dataset.

  1. Variance: Measures the average squared deviation of each data point from the mean.

    • Population Variance ($\sigma^2$): The average squared deviation from the population mean.
      $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
      where:
      $x_i$ = each individual data point in the population
      $\mu$ = population mean
      $N$ = population size
    • Sample Variance ($S^2$): An estimate of the population variance calculated from a sample. Dividing by $n-1$ rather than $n$ makes the estimate unbiased.
      $S^2 = \dfrac{\sum (x_i - \bar{X})^2}{n-1}$
      where:
      $x_i$ = each individual data point in the sample
      $\bar{X}$ = sample mean
      $n$ = sample size
  2. Standard Deviation: The square root of the variance. Because it is expressed in the same units as the original data, the standard deviation is easier to interpret than the variance.

    • Population Standard Deviation ($\sigma$): The square root of the population variance.
      $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
    • Sample Standard Deviation ($S$): The square root of the sample variance.
      $S = \sqrt{\dfrac{\sum (x_i - \bar{X})^2}{n-1}}$

    Example: Given a sample of sales prices for comparable properties, calculating the standard deviation helps understand how much individual sales prices deviate from the average sales price. A high standard deviation suggests a wide range of prices, potentially indicating a less homogenous market or the presence of outliers.
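As a minimal sketch of the formulas above, the following uses Python's standard `statistics` module. The sale prices are hypothetical values chosen for illustration, not data from the chapter's source.

```python
import statistics

# Hypothetical comparable sale prices in dollars (illustrative only).
prices = [200_000, 220_000, 230_000, 240_000, 250_000,
          260_000, 270_000, 280_000, 290_000, 300_000]

mean_price = statistics.mean(prices)

# Sample statistics divide the sum of squared deviations by n - 1.
sample_var = statistics.variance(prices)
sample_sd = statistics.stdev(prices)

# Population statistics divide by N and treat the data as the full population.
pop_var = statistics.pvariance(prices)
pop_sd = statistics.pstdev(prices)

print(f"mean:                {mean_price:,.0f}")
print(f"sample variance:     {sample_var:,.0f}")
print(f"sample std dev:      {sample_sd:,.0f}")
print(f"population variance: {pop_var:,.0f}")
print(f"population std dev:  {pop_sd:,.0f}")
```

For the same data the sample standard deviation is always slightly larger than the population standard deviation, because the divisor n - 1 is smaller than N.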

Coefficient of Variation

The coefficient of variation (CV) is a relative measure of dispersion, expressing the standard deviation as a percentage of the mean. It is useful for comparing the variability of datasets with different units or scales.

$CV = \dfrac{S}{\bar{X}} \times 100$

where:
$S$ = sample standard deviation
$\bar{X}$ = sample mean

Example: Compare the CV of sale prices in two different neighborhoods. Even when the standard deviations differ because the price levels differ, the CV reveals which neighborhood has more price variability relative to its average sale price.
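The neighborhood comparison can be sketched as follows. The two price lists and the helper name `coefficient_of_variation` are assumptions made for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical sale prices in dollars for two neighborhoods (illustrative only).
neighborhood_a = [180_000, 200_000, 220_000]
neighborhood_b = [850_000, 900_000, 950_000]

sd_a = statistics.stdev(neighborhood_a)
sd_b = statistics.stdev(neighborhood_b)
cv_a = coefficient_of_variation(neighborhood_a)
cv_b = coefficient_of_variation(neighborhood_b)

print(f"A: std dev {sd_a:,.0f}, CV {cv_a:.1f}%")
print(f"B: std dev {sd_b:,.0f}, CV {cv_b:.1f}%")
```

In this constructed case neighborhood B has the larger standard deviation but the smaller CV, so its prices vary less relative to their mean.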

Range

The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset.

Range = Maximum Value - Minimum Value

While easy to calculate, the range is highly sensitive to outliers and doesn’t provide information about the distribution of data points within the range.

Example: For a set of comparable properties, the range is the spread between the highest and the lowest sale price.
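A short sketch of how a single extreme sale changes the range. The prices and the added outlier are hypothetical, and `value_range` is an illustrative helper name.

```python
def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

# Hypothetical comparable sale prices in dollars (illustrative only).
prices = [200_000, 220_000, 230_000, 240_000, 250_000,
          260_000, 270_000, 280_000, 290_000, 300_000]

base_range = value_range(prices)

# Adding one unusually expensive sale changes the range dramatically,
# even though the other ten prices are unchanged.
outlier_range = value_range(prices + [750_000])

print(f"range without outlier: {base_range:,}")
print(f"range with outlier:    {outlier_range:,}")
```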

Interquartile Range

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1) of a dataset. It represents the range containing the middle 50% of the data. It is less sensitive to extreme values than the range.

IQR = Q3 - Q1

Quartile Calculation:

  1. Order the data from smallest to largest.
  2. Calculate Quartile Positions:

    • Q1 position = (n + 1) / 4
    • Q2 position (Median) = (n + 1) / 2
    • Q3 position = 3(n + 1) / 4

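    where n is the sample size. This is one of several quartile conventions; spreadsheet and statistical software may use others and return slightly different values.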

  3. Determine Quartile Values:

    • If the position is an integer, the quartile is the value at that position in the ordered data.
    • If the position is not an integer, use linear interpolation between the two adjacent values in the ordered data.

Example: Suppose the sale prices of 10 comparable properties, ordered from smallest to largest, are:
\$200,000, \$220,000, \$230,000, \$240,000, \$250,000, \$260,000, \$270,000, \$280,000, \$290,000, \$300,000

Q1 position = (10 + 1) / 4 = 2.75
Q3 position = 3(10 + 1) / 4 = 8.25

Q1 = \$220,000 + 0.75(\$230,000 - \$220,000) = \$227,500
Q3 = \$280,000 + 0.25(\$290,000 - \$280,000) = \$282,500

IQR = \$282,500 - \$227,500 = \$55,000
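The quartile steps above can be sketched in Python as follows. The function name `quartile_value` is an assumption for illustration; the data is the ten-price example. The last lines cross-check against `statistics.quantiles`, whose default "exclusive" method also uses (n + 1)-based positions.

```python
import math
import statistics

def quartile_value(sorted_values, fraction):
    """Quartile by the (n + 1) position rule with linear interpolation.

    fraction is 0.25 for Q1, 0.5 for the median, and 0.75 for Q3.
    """
    n = len(sorted_values)
    position = fraction * (n + 1)      # 1-indexed position in the ordered data
    if position <= 1:                  # position falls at or before the first value
        return float(sorted_values[0])
    if position >= n:                  # position falls at or after the last value
        return float(sorted_values[-1])
    lower = math.floor(position)       # 1-indexed rank of the value just below
    weight = position - lower          # fractional distance toward the next value
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + weight * (above - below)

prices = sorted([200_000, 220_000, 230_000, 240_000, 250_000,
                 260_000, 270_000, 280_000, 290_000, 300_000])

q1 = quartile_value(prices, 0.25)
q3 = quartile_value(prices, 0.75)
iqr = q3 - q1
print(f"Q1 = {q1:,.0f}, Q3 = {q3:,.0f}, IQR = {iqr:,.0f}")

# Cross-check with the standard library's quartile cut points.
library_quartiles = statistics.quantiles(prices, n=4)
print(library_quartiles)
```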

Measures of Shape

Measures of shape describe the overall form of a data distribution, particularly its symmetry and peakedness. This helps assess how well the data fits a normal distribution.

Skewness

Skewness measures the asymmetry of a distribution.

  1. Symmetrical Distribution: Mean = Median. Skewness = 0.
  2. Left-Skewed (Negatively Skewed) Distribution: Mean < Median. Tail extends to the left. Values are concentrated on the higher side of the mean.
  3. Right-Skewed (Positively Skewed) Distribution: Mean > Median. Tail extends to the right. Values are concentrated on the lower side of the mean.

Formula for Sample Skewness:

$\text{Skewness} = \dfrac{n}{(n-1)(n-2)} \sum \left(\dfrac{x_i - \bar{X}}{S}\right)^3$

where:
$x_i$ = each individual data point in the sample
$\bar{X}$ = sample mean
$n$ = sample size
$S$ = sample standard deviation

Example: In appraisal data, sale prices might be right-skewed due to a few high-value properties pulling the mean above the median. This means that most properties sell for less than the average.
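A minimal sketch of the sample skewness formula, applied to a hypothetical set of sale prices in which one high-value property pulls the mean above the median. The function name and the data are assumptions for illustration.

```python
import statistics

def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) times the sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("sample skewness needs at least 3 observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1)
    cubed_z = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_z

# Hypothetical sale prices in dollars; the last sale is unusually high.
prices = [210_000, 225_000, 230_000, 240_000, 255_000, 640_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
skew = sample_skewness(prices)

print(f"mean {mean_price:,.0f}, median {median_price:,.0f}, skewness {skew:.2f}")

# A perfectly symmetric sample has skewness of zero.
symmetric_skew = sample_skewness([1, 2, 3, 4, 5])
```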

Kurtosis

Kurtosis measures the “peakedness” or tail heaviness of a distribution.

  1. Mesokurtic: Kurtosis ≈ 3 (excess kurtosis ≈ 0). Normal distribution.
  2. Leptokurtic: Kurtosis > 3 (excess kurtosis > 0). More peaked than a normal distribution, with heavier tails. Indicates more extreme values.
  3. Platykurtic: Kurtosis < 3 (excess kurtosis < 0). Flatter than a normal distribution, with thinner tails. Indicates fewer extreme values.

Excess kurtosis is kurtosis minus 3, so an excess kurtosis of 0 indicates the same kurtosis as a normal distribution. The formula below reports excess kurtosis, so its result is compared with 0 rather than 3.

Formula for Sample Excess Kurtosis:

$\text{Excess kurtosis} = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\dfrac{x_i - \bar{X}}{S}\right)^4 - \dfrac{3(n-1)^2}{(n-2)(n-3)}$

where:
$x_i$ = each individual data point in the sample
$\bar{X}$ = sample mean
$n$ = sample size
$S$ = sample standard deviation

Example: A dataset of land values might be leptokurtic if there are a few exceptionally high-priced parcels.
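The sample excess kurtosis formula can be sketched as below. The function name and both datasets are hypothetical. The land values include one exceptionally high-priced parcel, while the evenly spaced values have no extreme observations.

```python
import statistics

def sample_excess_kurtosis(values):
    """Sample excess kurtosis; a normal distribution scores about 0."""
    n = len(values)
    if n < 4:
        raise ValueError("sample excess kurtosis needs at least 4 observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1)
    fourth_z = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_z - correction

# Hypothetical land values in dollars with one very expensive parcel.
land_values = [240_000, 245_000, 250_000, 250_000, 255_000, 260_000, 600_000]
heavy_tailed = sample_excess_kurtosis(land_values)

# Evenly spaced values: flat, with no extreme observations.
flat = sample_excess_kurtosis(list(range(1, 11)))

print(f"land values:   {heavy_tailed:.2f}  (positive, leptokurtic)")
print(f"evenly spaced: {flat:.2f}  (negative, platykurtic)")
```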

Box and Whisker Plots

Box and whisker plots are graphical representations of the five-number summary (minimum, Q1, median, Q3, maximum). They provide a visual display of the distribution’s shape, skewness, and potential outliers.

  • The “box” represents the interquartile range (IQR), spanning from Q1 to Q3.
  • A line within the box marks the median.
  • “Whiskers” extend from the box to the smallest and largest data values that lie within a specified distance of the box (often 1.5 times the IQR below Q1 and above Q3).
  • Outliers are plotted as individual points beyond the whiskers.
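The quantities a box and whisker plot displays can be computed without drawing anything. The sketch below assumes the common 1.5 × IQR rule and uses hypothetical sale prices; quartiles come from `statistics.quantiles`, whose default "exclusive" method uses (n + 1)-based positions.

```python
import statistics

# Hypothetical comparable sale prices in dollars; the last one is unusually high.
prices = sorted([200_000, 220_000, 230_000, 240_000, 250_000, 260_000,
                 270_000, 280_000, 290_000, 300_000, 450_000])

q1, median, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1

# Fences mark the farthest a whisker may reach under the 1.5 x IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [p for p in prices if lower_fence <= p <= upper_fence]
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

# Whiskers end at the most extreme data values that are still inside the fences.
lower_whisker = min(inside)
upper_whisker = max(inside)

print(f"box: {q1:,.0f} to {q3:,.0f}, median {median:,.0f}")
print(f"whiskers: {lower_whisker:,} to {upper_whisker:,}")
print(f"outliers: {outliers}")
```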

Normality

Normality refers to whether a dataset follows a normal distribution. The normal distribution is a symmetrical, bell-shaped distribution characterized by its mean and standard deviation. Many statistical tests assume that the data is normally distributed. If the assumption of normality is violated, the results of these tests may not be reliable.

Assessing Normality

  1. Visual Inspection:

    • Histograms: Should resemble a bell shape.
    • Normal Probability Plots (Q-Q Plots): Data points should fall along a straight diagonal line if the data is normally distributed. Deviations from the line indicate non-normality.
  2. Quantitative Tests:

    • Shapiro-Wilk Test: Tests the null hypothesis that the data comes from a normally distributed population. A small p-value (typically less than 0.05) suggests that the data is not normally distributed, and you reject the null hypothesis.
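    • Kolmogorov-Smirnov Test: Compares the sample's empirical cumulative distribution with that of a reference normal distribution. The p-value is interpreted in the same way as for the Shapiro-Wilk test.
    • Anderson-Darling Test: Tests whether a sample came from a population with a specified distribution (here, the normal distribution). It gives more weight to the tails than the Kolmogorov-Smirnov test.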
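As a rough sketch of what a normal probability plot shows, the code below pairs each ordered observation with the standard normal quantile expected at its rank, then measures how close the points are to a straight line. The plotting position (i - 0.5) / n is one common convention and an assumption here, as are the function names and the hypothetical prices. The correlation is a descriptive aid, not a formal test such as Shapiro-Wilk.

```python
import math
from statistics import NormalDist, fmean

def normal_qq_points(values):
    """Pair each ordered value with the standard normal quantile for its rank."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sale prices in dollars (illustrative only).
even_prices = [200_000, 220_000, 230_000, 240_000, 250_000,
               260_000, 270_000, 280_000, 290_000, 300_000]
skewed_prices = even_prices[:-1] + [900_000]   # replace the top sale with an extreme one

r_even = qq_correlation(normal_qq_points(even_prices))
r_skewed = qq_correlation(normal_qq_points(skewed_prices))
print(f"Q-Q correlation, even prices:   {r_even:.3f}")
print(f"Q-Q correlation, skewed prices: {r_skewed:.3f}")
```

The extreme sale bends the upper end of the plot away from the line, which lowers the correlation.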

Addressing Non-Normality

If data is not normally distributed, several options are available:

  1. Transformations: Apply mathematical functions to the data to make it more normal. Common transformations include:

    • Log Transformation: Useful for right-skewed data.
    • Square Root Transformation: Also useful for right-skewed data, particularly when dealing with counts.
    • Box-Cox Transformation: A more general transformation that can handle various types of non-normality.
  2. Non-Parametric Tests: Use statistical tests that do not rely on the assumption of normality. These tests are often less powerful than parametric tests but are more robust when the normality assumption is violated. Examples include:

    • Mann-Whitney U Test: For comparing two independent groups.
    • Wilcoxon Signed-Rank Test: For comparing two related samples.
    • Kruskal-Wallis Test: For comparing three or more independent groups.
  3. Central Limit Theorem (CLT): For a sufficiently large sample, the CLT states that the distribution of sample means approaches a normal distribution regardless of the shape of the underlying population distribution, provided the population variance is finite. This can justify the use of parametric tests on means even if the original data is not normally distributed. A sample size of 30 or greater is often cited as a guideline, but the required sample size depends on the extent of non-normality.
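To illustrate the log transformation listed above, the sketch below compares sample skewness before and after taking natural logarithms. The prices are hypothetical and deliberately constructed so that each is double the previous one, which makes the logged values evenly spaced; real appraisal data will rarely transform this cleanly. The skewness helper repeats the chapter's sample skewness formula.

```python
import math
import statistics

def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) times the sum of cubed z-scores."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)

# Hypothetical right-skewed prices in dollars: each is twice the previous one.
prices = [100_000, 200_000, 400_000, 800_000, 1_600_000]
log_prices = [math.log(p) for p in prices]

raw_skew = sample_skewness(prices)
log_skew = sample_skewness(log_prices)

print(f"skewness of prices:      {raw_skew:.2f}")
print(f"skewness of log(prices): {log_skew:.2f}")
```

Any analysis run on the transformed values yields results in log units, so conclusions need to be translated back to dollars before they are reported.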

Practical Applications in Appraisal

Understanding dispersion, shape, and normality is crucial for:

  1. Selecting Comparable Properties: Identifying and excluding outliers to ensure a homogenous sample.
  2. Adjusting Comparable Sales: Applying appropriate statistical methods to adjust sale prices for differences in features or market conditions.
  3. Determining the Appropriate Weighting of Indicators of Value: Giving more weight to the sales comparison approach if the sales data is reliable (for example, low dispersion and no strong skew), or adjusting comparable sales based on statistical methods if the data is skewed.
  4. Estimating the Range of Value: Constructing confidence intervals to reflect the uncertainty in the appraisal estimate.

Conclusion

Analyzing the dispersion, shape, and normality of appraisal data is essential for conducting sound and reliable appraisals. By understanding these concepts and applying appropriate statistical methods, appraisers can improve the accuracy and credibility of their value conclusions.

Chapter Summary

This chapter on “Dispersion, Shape, and Normality in Appraisal Data” from a real estate appraisal statistical analysis course covers essential concepts for understanding and interpreting appraisal data. The core focus is on assessing the variability, form, and distribution characteristics of data sets, particularly in relation to the normal distribution.

The chapter begins by discussing measures of dispersion, including standard deviation and variance, emphasizing their role in determining the applicability of parametric statistical methods. Standard deviation, as a fundamental measure, allows for statistical inferences and assessments of uncertainty. The coefficient of variation (CV) is introduced as a tool for comparing dispersion across different data sets by standardizing it relative to the mean. The range and interquartile range are presented as simpler measures of spread, useful for initial data examination.

The shape of the data distribution is then examined using measures like skewness and kurtosis. Skewness indicates the asymmetry of the distribution, differentiating between left-skewed (mean < median) and right-skewed (mean > median) data. Kurtosis describes the “peakedness” of the distribution relative to the normal distribution. Box and whisker plots and histograms are employed as visual aids to assess skewness and overall data shape.

The concept of normality is explored, highlighting that appraisal data rarely perfectly fits a normal distribution. Quantitative tests for normality, such as the Kolmogorov-Smirnov test, and normal probability plots are introduced to assess the degree of departure from normality. The p-value associated with these tests helps determine whether the hypothesis of a normally distributed population can be rejected.

The chapter emphasizes the importance of understanding these characteristics to select appropriate statistical methods. If the data deviates significantly from normality, particularly with small sample sizes, nonparametric statistical methods may be more suitable. The chapter highlights that when extreme values distort the mean, the median may be a more representative measure of central tendency.

In conclusion, this chapter equips real estate appraisers with the statistical knowledge to analyze the dispersion, shape, and normality of appraisal data. This understanding is critical for making informed decisions about the selection and application of statistical tests, ultimately leading to more reliable and accurate appraisal inferences.
