Dispersion, Shape, and Normality in Appraisal Data

Introduction

This chapter explores measures of dispersion, shape, and normality, which are crucial for understanding and interpreting appraisal data. These concepts help determine the suitability of various statistical methods and provide insights into the characteristics of the underlying population.

1. Measures of Dispersion

Measures of dispersion quantify the spread or variability within a dataset. They are essential for comparing datasets and for judging whether parametric inferential statistics are appropriate.

1.1. Standard Deviation and Variance

The standard deviation and variance are fundamental measures of dispersion that consider all data points in a dataset.

Standard Deviation: Measures the typical deviation of data points from the mean. A higher standard deviation indicates greater variability.

- Population Standard Deviation (σ):
  σ = √[ Σ(xi - μ)² / N ]

  where:
  - xi = individual data point
  - μ = population mean
  - N = population size
- Sample Standard Deviation (S):
  S = √[ Σ(xi - X)² / (n - 1) ]

  where:
  - xi = individual data point
  - X = sample mean
  - n = sample size

Variance: The square of the standard deviation, providing a measure of the average squared deviation from the mean.

- Population Variance (σ²):
  σ² = Σ(xi - μ)² / N
- Sample Variance (S²):
  S² = Σ(xi - X)² / (n - 1)

Example:

Consider a sample of monthly rents for garden-level apartments: $600, $650, $695, $710, $715, $730, $735, $735, $760, $760, $785, $800, $800, $805, $815, $820, $820, $825, $825, $825, $825, $850, $850, $850, $850, $850, $850, $860, $860, $890, $890, $920, $920, $930, $970, $995.

The sample mean (X) is $815.83.

The sample standard deviation (S) is $84.71 (see Table 14.2, Sample Standard Deviation (S) Calculation).

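The sample variance (S²) is approximately 7,176.43, expressed in squared dollars. (Squaring the rounded standard deviation of $84.71 gives about 7,175.78; the small difference is rounding error.)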
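The figures in this example can be reproduced with a short script. The sketch below is one possible way to do so, assuming Python's standard `statistics` module; the variable names are illustrative only. Computing the variance directly from the data can differ in the last digits from squaring a rounded standard deviation.

```python
import statistics

# Monthly rents for the garden-level apartment sample (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

n = len(rents)
sample_mean = statistics.mean(rents)      # X
sample_var = statistics.variance(rents)   # S², divides by (n - 1)
sample_sd = statistics.stdev(rents)       # S, the square root of S²

# Population versions divide by N instead of (n - 1), so they are
# slightly smaller when applied to the same numbers.
pop_var = statistics.pvariance(rents)
pop_sd = statistics.pstdev(rents)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.2f}")
print(f"sample variance = {sample_var:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"population standard deviation = {pop_sd:.2f}")
```

The sample versions are the appropriate choice when the rents are treated as a sample drawn from a larger market rather than as the entire population.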

Practical Application:

In appraisal, standard deviation can be used to evaluate the consistency of comparable sales data. A high standard deviation might indicate significant differences among the comparables, requiring careful consideration of adjustments.

1.2. Coefficient of Variation

The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean. It is useful for comparing the variability of different datasets, especially when the means are different.

Formula:
CV = (S / X) * 100%

where:
- S = sample standard deviation
- X = sample mean

Example:

For the apartment rent data, the CV is:
CV = ($84.71 / $815.83) * 100% = 10.38%
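As a rough illustration, the CV calculation can be wrapped in a small helper. The function name `coefficient_of_variation` is hypothetical, and the sketch assumes Python's standard `statistics` module. It also shows that rescaling the data (here, expressing rents in hundreds of dollars) leaves the CV unchanged, which is what makes it useful for comparing datasets measured on different scales.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (S / X) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

rent_cv = coefficient_of_variation(rents)

# The same rents expressed in hundreds of dollars: the mean and the
# standard deviation both shrink by a factor of 100, so the CV is unchanged.
rents_in_hundreds = [r / 100 for r in rents]
rescaled_cv = coefficient_of_variation(rents_in_hundreds)

print(f"CV of rents: {rent_cv:.2f}%")
print(f"CV after rescaling: {rescaled_cv:.2f}%")
```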

Practical Application:

The CV allows appraisers to compare the relative variability of different property characteristics (e.g., price per square foot vs. gross rent multiplier) to identify which factors exhibit the most dispersion.

1.3. Range

The range is the difference between the maximum and minimum values in a dataset. It is a simple measure of dispersion but is sensitive to outliers.

Calculation:
Range = Maximum Value - Minimum Value

Example:

For the apartment rent data, the range is:
Range = $995 - $600 = $395

Practical Application:

The range provides a quick assessment of the overall spread of data. It can be used to identify potential errors or unusual values that warrant further investigation.

1.4. Interquartile Range

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). It represents the range of the middle 50% of the data and is less sensitive to outliers than the range.

Quartiles:
- Q1: The value below which 25% of the data falls.
- Q2 (Median): The value below which 50% of the data falls.
- Q3: The value below which 75% of the data falls.

Calculating Quartile Positions (in the data ordered from smallest to largest):
- Q1 = the value at ordered position (n + 1)/4
- Q2 = the median, at ordered position (n + 1)/2
- Q3 = the value at ordered position 3(n + 1)/4

Calculation:
IQR = Q3 - Q1

Example:

For the apartment rent data:
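- Q1 = $760 (position (36 + 1)/4 = 9.25, rounded to the 9th ordered observation)
- Q2 = $825 (position (36 + 1)/2 = 18.5, the average of the 18th and 19th ordered observations, both $825)
- Q3 = $860 (position 3(36 + 1)/4 = 27.75, rounded to the 28th ordered observation)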

IQR = $860 - $760 = $100
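The quartile positions for the rent data work out to 9.25 and 27.75, which the example resolves to the 9th and 28th ordered observations. The sketch below encodes one possible rounding convention that reproduces those results. The helper `quartile_by_position` and its rounding rule are assumptions made for illustration, not a definitive statement of the method. For comparison, it also calls `statistics.quantiles` from Python's standard library, which interpolates between neighbouring observations and therefore returns a slightly different Q3.

```python
import statistics

def quartile_by_position(sorted_values, k):
    """Return quartile k (1, 2, or 3) using the k(n + 1)/4 positional rule.

    Assumed rounding convention: a whole-number position is used directly,
    a position ending in .5 averages the two neighbouring observations,
    and any other fractional position is rounded to the nearest integer.
    """
    n = len(sorted_values)
    if n < 3:
        raise ValueError("need at least three observations")
    position = k * (n + 1) / 4      # 1-based position in the ordered data
    lower = int(position)           # whole part of the position
    fraction = position - lower     # always 0, 0.25, 0.5, or 0.75
    if fraction == 0:
        return sorted_values[lower - 1]
    if fraction == 0.5:
        return (sorted_values[lower - 1] + sorted_values[lower]) / 2
    nearest = lower + 1 if fraction > 0.5 else lower
    return sorted_values[nearest - 1]

rents = sorted([
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
])

q1 = quartile_by_position(rents, 1)
q2 = quartile_by_position(rents, 2)
q3 = quartile_by_position(rents, 3)
iqr = q3 - q1

# Library alternative: same (n + 1) positions, but with linear interpolation
# between the two neighbouring observations instead of rounding.
interpolated = statistics.quantiles(rents, n=4, method="exclusive")

print(f"Positional rule: Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"Interpolated quartiles: {interpolated}")
```

Because software packages differ in how they resolve fractional positions, it is worth checking which convention a tool uses before comparing quartiles across reports.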

Practical Application:

The IQR helps assess the spread of the central portion of the data and is useful for identifying potential outliers.
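One common way to use the IQR for outlier screening is to flag values lying more than 1.5 × IQR below Q1 or above Q3. That multiplier is a widely used convention and an assumption here; the chapter does not prescribe a specific rule. A minimal sketch using the quartiles from the rent example:

```python
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

q1, q3 = 760, 860           # quartiles from the rent example
iqr = q3 - q1

multiplier = 1.5            # conventional choice; an assumption, not from the chapter
lower_fence = q1 - multiplier * iqr
upper_fence = q3 + multiplier * iqr

# Values outside the fences are candidates for review, not automatic rejects.
flagged = [r for r in rents if r < lower_fence or r > upper_fence]

print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Flagged for review: {flagged}")
```

A flagged rent is a prompt to check the underlying record (unit condition, concessions, data entry), not a reason to discard it.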

2. Measures of Shape

Measures of shape describe the symmetry and peakedness of a distribution. They are crucial for assessing how closely a dataset resembles a normal distribution.

2.1. Skewness

Skewness measures the asymmetry of a distribution.

Symmetrical Distribution: Mean = Median; Skewness = 0.
Left-Skewed (Negative Skewness): Typically Mean < Median; the tail is longer on the left side.
Right-Skewed (Positive Skewness): Typically Mean > Median; the tail is longer on the right side.

Formula:

Skewness = [n / ((n - 1) * (n - 2))] * Σ[(xi - X) / S]³

where:
- xi = individual data point
- X = sample mean
- n = sample size
- S = sample standard deviation

Example:

For the apartment rent data, skewness is approximately -0.312, indicating slight left skewness.
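The skewness formula above can be checked against the rent data with a few lines of code. The sketch below is a direct transcription of that formula; the function name `sample_skewness` is illustrative, and the standard `statistics` module is assumed for the mean and sample standard deviation.

```python
import statistics

def sample_skewness(values):
    """Skewness = [n / ((n - 1)(n - 2))] * sum(((xi - X) / S) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample standard deviation S
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_z = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_z

rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

rent_skew = sample_skewness(rents)
rent_mean = statistics.mean(rents)
rent_median = statistics.median(rents)

print(f"skewness = {rent_skew:.3f}")
print(f"mean = {rent_mean:.2f}, median = {rent_median}")
```

The negative result agrees with the mean ($815.83) falling below the median ($825), the pattern expected for a left-skewed sample.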

Practical Application:

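In appraisal, skewness can indicate whether sales prices are clustered around the lower or higher end of the market. Positive skewness suggests that most prices cluster toward the lower end, with a relatively small number of much higher-priced properties stretching the right tail and pulling the mean above the median. Negative skewness suggests the reverse.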

2.2. Kurtosis

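Kurtosis measures the “peakedness” of a distribution and the heaviness of its tails. The benchmarks below use the convention in which a normal distribution has a kurtosis of 3; some software instead reports excess kurtosis (kurtosis minus 3), for which the normal benchmark is 0.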

Mesokurtic: Kurtosis = 3 (Normal distribution).
Leptokurtic: Kurtosis > 3 (More peaked than a normal distribution).
Platykurtic: Kurtosis < 3 (Less peaked than a normal distribution).
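The chapter does not give a kurtosis formula, so the sketch below assumes the simple moment-based definition (fourth central moment divided by the squared second central moment), which equals 3 for a normal distribution. Statistical packages often apply small-sample corrections or report excess kurtosis, so their output may not match this number exactly. The function names and the tolerance used for the "mesokurtic" label are hypothetical.

```python
def moment_kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2

def classify_kurtosis(kurtosis, tolerance=0.1):
    """Label a kurtosis value; the tolerance band around 3 is an assumption."""
    if abs(kurtosis - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if kurtosis > 3 else "platykurtic"

rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

rent_kurtosis = moment_kurtosis(rents)
rent_excess = rent_kurtosis - 3      # the "excess kurtosis" convention

print(f"kurtosis = {rent_kurtosis:.3f} ({classify_kurtosis(rent_kurtosis)})")
print(f"excess kurtosis = {rent_excess:.3f}")

# A small, evenly spread dataset is flatter than a normal distribution.
flat_example = moment_kurtosis([1, 2, 3, 4, 5])
print(f"kurtosis of [1, 2, 3, 4, 5] = {flat_example:.1f}")
```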

Practical Application:

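Kurtosis provides insights into the concentration of data around the mean and the presence of extreme values. High kurtosis might indicate a market in which many properties cluster tightly near the average while a few extreme values sit far out in the tails; low kurtosis might indicate prices spread more evenly across the range.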

3. Assessing Normality

Normality refers to whether the distribution of a dataset resembles a normal (bell-shaped) distribution. Many statistical tests rely on the assumption of normality.

3.1. Visual Inspection

Histograms: Visually assess the shape of the distribution.
Box Plots: Examine skewness and identify potential outliers.
Normal Probability Plots (Q-Q Plots): Assess how closely the data points fall along a straight line. Deviations from the line indicate departures from normality.
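To make the normal probability plot concrete, the sketch below computes the points such a plot would display for the apartment rent sample used earlier in the chapter, without drawing it. It assumes Python's standard `statistics.NormalDist` and the plotting positions (i - 0.5)/n, which is one common convention among several. The correlation between the two sets of quantiles gives a rough numerical sense of how straight the plotted line would be.

```python
from statistics import NormalDist

rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

ordered = sorted(rents)
n = len(ordered)
standard_normal = NormalDist()      # mean 0, standard deviation 1

# Theoretical quantiles at plotting positions (i - 0.5) / n (an assumed convention).
theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point on the Q-Q plot: (theoretical z, observed rent).
qq_points = list(zip(theoretical, ordered))

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

qq_r = pearson_r(theoretical, ordered)

for z, rent in qq_points[:3]:
    print(f"z = {z:+.3f}, rent = {rent}")
print(f"Q-Q correlation = {qq_r:.3f}")
```

A correlation close to 1 corresponds to points lying near a straight line; curvature at either end of the plot points to skewness or heavy tails.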

3.2. Quantitative Tests

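Kolmogorov-Smirnov (K-S) Test: Tests whether a sample comes from a specified distribution, such as a normal distribution with a given mean and standard deviation.
Shapiro-Wilk Test: Tests whether a sample comes from a normally distributed population (more powerful than the K-S test in many situations).
Anderson-Darling Test: A goodness-of-fit test for normality that gives more weight to the tails of the distribution than the K-S test.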
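The idea behind the K-S test can be sketched with the standard library alone: the test statistic D is the largest vertical gap between the sample's empirical cumulative distribution and the reference normal curve. The example below uses the apartment rent sample from earlier in the chapter and fits the normal curve with the sample mean and standard deviation, which is an assumption of this sketch; estimating the parameters from the same data means standard K-S critical values no longer apply directly. In practice a statistics library would normally be used to obtain p-values, for example SciPy's `scipy.stats.shapiro`, `scipy.stats.kstest`, and `scipy.stats.anderson`.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x,
        # so the gap is checked on both sides of the step.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

rent_d = ks_statistic_normal(rents)
print(f"K-S statistic D = {rent_d:.3f}")
```

A smaller D means the sample tracks the normal curve more closely; whether a given D is small enough depends on the sample size and the critical values for the chosen test.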

Practical Application:

In appraisal, normality tests can help determine if parametric statistical tests, such as t-tests and F-tests, can be used reliably with a particular dataset. If the data is not normally distributed, non-parametric tests may be more appropriate.

3.3. Central Limit Theorem

The central limit theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the underlying population distribution (provided the population has a finite variance).

Practical Application:

The CLT allows appraisers to use parametric tests on sample means even if the underlying population is not normally distributed, provided the sample size is sufficiently large. A common rule of thumb is n > 30, although strongly skewed populations may require larger samples.
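A small simulation can illustrate the CLT. The sketch below draws repeated samples from a strongly right-skewed population and measures the skewness of the resulting sample means. The exponential population, the seed, the replication count, and the sample sizes are all arbitrary assumptions chosen for illustration; they are not appraisal data.

```python
import random

def moment_skewness(values):
    """Simple moment-based skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def simulate_sample_means(sample_size, replications, rng):
    """Draw many samples from an exponential population; return their means."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(replications)
    ]

rng = random.Random(42)     # fixed seed so the run is repeatable
replications = 4000

skew_by_size = {}
for sample_size in (1, 5, 36):
    means = simulate_sample_means(sample_size, replications, rng)
    skew_by_size[sample_size] = moment_skewness(means)

for sample_size, skew in skew_by_size.items():
    print(f"n = {sample_size:>2}: skewness of sample means = {skew:.2f}")
```

As the sample size grows, the skewness of the sample means shrinks toward zero, even though the individual observations remain heavily skewed.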

4. Parametric vs. Nonparametric Statistics

Parametric Statistics: Assume that the data comes from a specific distribution (often the normal distribution). They are generally more powerful than nonparametric tests when their assumptions are met.
Nonparametric Statistics: Do not require assumptions about the underlying population distribution. They are useful for small samples or when the normality assumption is violated.

Conclusion

Understanding dispersion, shape, and normality is essential for proper data analysis in real estate appraisal. These concepts help determine the appropriate statistical methods to use and provide valuable insights into the characteristics of appraisal data.

Chapter Summary

This chapter focuses on understanding the distribution of appraisal data and its implications for statistical analysis. It emphasizes the importance of examining dispersion, shape, and normality to determine the appropriateness of different statistical methods.

Main Scientific Points:

Dispersion: Measures of dispersion, such as standard deviation and variance, quantify the variability within a dataset. A higher standard deviation indicates greater dispersion. The coefficient of variation (CV) allows for relative comparisons of dispersion between different datasets by standardizing the standard deviation to the sample mean. The range and interquartile range provide simpler measures of spread.
Shape: Measures of shape, particularly skewness and kurtosis, describe the symmetry and peakedness of the distribution. Skewness indicates whether the data is concentrated more on one side of the mean (left-skewed or right-skewed). Kurtosis describes the “peakedness” of the distribution relative to a normal distribution (mesokurtic, leptokurtic, or platykurtic). Box and whisker plots and histograms are useful visual tools for assessing shape.
Normality: Normality refers to whether the data follows a normal distribution. While appraisal data may approximate a normal distribution, it is seldom perfect. Quantitative tests for normality and normal probability plots assess the degree of departure from normality.
Parametric vs. Nonparametric Statistics: The chapter differentiates between parametric and nonparametric statistical methods. Parametric statistics rely on assumptions about the underlying population distribution (often normality), while nonparametric statistics do not. Nonparametric methods are particularly useful for small sample sizes or when the population distribution is unknown.
Central Tendency: The chapter discusses the importance of understanding the central tendency of the data and how measures of shape, such as skewness, can influence the appropriateness of using the mean versus the median as an indicator of central tendency.

Conclusions:

Analyzing dispersion, shape, and normality is crucial for selecting appropriate statistical techniques for appraisal data.
Data that deviates significantly from normality may require nonparametric methods, especially with small sample sizes.
Extreme values and skewness can distort the mean, making the median a more suitable measure of central tendency in some cases.

Implications:

Appraisers must understand these concepts to properly analyze market data, support their opinions, and avoid misapplication of statistical techniques.
The choice of statistical methods directly impacts the validity and reliability of appraisal conclusions.
By understanding the characteristics of their data, appraisers can make more informed decisions about which statistical tools to use and how to interpret the results.
