Measures of Central Tendency, Dispersion, and Shape in Real Estate Appraisal

Introduction

Statistical analysis is a crucial tool in real estate appraisal, providing a framework for understanding market trends, property characteristics, and value indicators. This chapter focuses on fundamental statistical concepts: measures of central tendency, dispersion, and shape. These measures provide concise summaries of data sets, enabling appraisers to make informed judgments and support their opinions of value.

1. Measures of Central Tendency

Measures of central tendency describe the typical or average value within a dataset. They offer a single, representative number that summarizes the overall level of a variable. Common measures include the mean, median, and mode. The applicability of each measure depends on the nature of the data and the specific question being addressed. Moreover, the sample central tendency can be used to infer the corresponding population central tendency.

1.1. Arithmetic Mean (Average)

  • Definition: The arithmetic mean, often simply called the “mean” or “average,” is calculated by summing all values in a dataset and dividing by the number of values.

  • Formula:

    • Population Mean (μ): μ = (Σxᵢ) / N
    • Sample Mean (X̄): X̄ = (Σxᵢ) / n

    Where:
    * xᵢ represents each individual value in the dataset.
    * N is the population size.
    * n is the sample size.
    * Σ represents the summation operation.

  • Properties:

    • Sensitive to extreme values (outliers). An outlier can significantly skew the mean.
    • Easy to calculate and understand.
    • Used in many statistical calculations and inference methods.
  • Practical Application: Calculating the average price per square foot of comparable properties in a neighborhood to estimate the subject property’s value.

  • Example:
    Suppose you have the following sales prices for five comparable properties: $300,000, $320,000, $330,000, $340,000, and $500,000. The mean sales price is:

    X̄ = ($300,000 + $320,000 + $330,000 + $340,000 + $500,000) / 5 = $358,000

    The outlier, $500,000, heavily influences the mean.
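The effect of the outlier can be checked with a short Python sketch. It uses only the standard library and the five sale prices from the example; dropping the $500,000 sale is purely illustrative and is not a recommendation to discard data.

```python
from statistics import mean

# Sale prices of the five comparables from the example.
prices = [300_000, 320_000, 330_000, 340_000, 500_000]

mean_all = mean(prices)               # includes the $500,000 sale
mean_without_top = mean(prices[:-1])  # same data with the highest sale left out

print(f"Mean of all five sales:    ${mean_all:,.0f}")
print(f"Mean without the top sale: ${mean_without_top:,.0f}")
```

A single sale moves the mean by $35,500, which is why the mean should be read alongside the median when outliers are present.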

1.2. Median

  • Definition: The median is the middle value in a dataset when the values are arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle values.

  • Calculation:

    1. Sort the dataset in ascending order.
    2. If the number of values (n) is odd, the median is the value at position (n+1)/2.
    3. If the number of values (n) is even, the median is the average of the values at positions n/2 and (n/2) + 1.
  • Properties:

    • Resistant to extreme values. Outliers have little to no impact on the median.
    • Represents the “typical” value in a dataset.
    • Useful when the data contains skewed distributions.
  • Practical Application: Determining the typical sales price in a market where there are a few very expensive or inexpensive properties.

  • Example:
    Using the same sales prices as above: $300,000, $320,000, $330,000, $340,000, and $500,000.

    1. Sorted data: $300,000, $320,000, $330,000, $340,000, $500,000.
    2. The median is the middle value, $330,000.

    The median ($330,000) is lower than the mean ($358,000), indicating a positive skew in the data.
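A minimal Python sketch of the calculation steps follows. The odd-count case uses the five prices from the example; the sixth sale of $350,000 in the even-count case is a hypothetical value added only to show the averaging rule.

```python
from statistics import median

prices = [300_000, 320_000, 330_000, 340_000, 500_000]

# Odd number of values: the median is the value at position (n + 1) / 2.
median_odd = median(prices)

# Even number of values: the median is the average of the two middle values.
# The $350,000 sale is hypothetical, added only for illustration.
prices_even = prices + [350_000]
median_even = median(prices_even)

print(f"Median of five sales: ${median_odd:,.0f}")
print(f"Median of six sales:  ${median_even:,.0f}")
```

`statistics.median` sorts the data internally, so the input does not need to be ordered first.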

1.3. Mode

  • Definition: The mode is the value that appears most frequently in a dataset. A dataset can have no mode (if all values are unique), one mode (unimodal), or multiple modes (bimodal, trimodal, etc.).

  • Calculation: Simply count the frequency of each value in the dataset and identify the value with the highest frequency.

  • Properties:

    • Easy to determine.
    • Useful for categorical data or datasets with discrete values.
    • May not be a good measure of central tendency if the frequencies of all the values are nearly equal.
  • Practical Application: Identifying the most common house size (number of bedrooms, square footage) in a particular area.

  • Example:
    In a dataset of house sizes (square footage): 1200, 1400, 1500, 1500, 1600, 1700, 1700, 1700, 1800.

    The mode is 1700 square feet because it appears most frequently (three times).
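The frequency count can be sketched in Python with the standard library. The data are the house sizes from the example; `multimode` is used rather than `mode` because it returns every tied value, which makes bimodal data visible.

```python
from collections import Counter
from statistics import multimode

# House sizes in square feet, from the example.
sizes = [1200, 1400, 1500, 1500, 1600, 1700, 1700, 1700, 1800]

counts = Counter(sizes)    # frequency of each distinct value
modes = multimode(sizes)   # list of the most frequent value(s)

for size, frequency in sorted(counts.items()):
    print(f"{size} sq ft: {frequency}")
print("Mode(s):", modes)
```

Note that when every value is unique, `multimode` returns all of them, so a result as long as the dataset itself should be read as "no mode".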

2. Measures of Dispersion

Measures of dispersion quantify the spread or variability of data points in a dataset. They indicate how much the individual values deviate from the central tendency. Common measures include range, variance, standard deviation, and coefficient of variation. Measures of dispersion are useful because they can be compared to the characteristics of a known distribution, such as the normal distribution, to determine whether a particular set of parametric inferential statistics can be used. Measures of dispersion also facilitate comparison of two data sets to determine which is more variable.

2.1. Range

  • Definition: The range is the difference between the maximum and minimum values in a dataset.

  • Formula: Range = Maximum value - Minimum value

  • Properties:

    • Simple to calculate.
    • Highly sensitive to outliers.
    • Provides a basic understanding of data spread.
  • Practical Application: Quickly assessing the price range of properties in a specific neighborhood.

  • Example:
    Consider the following property prices: $250,000, $280,000, $300,000, $320,000, $450,000.

    Range = $450,000 - $250,000 = $200,000
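The range and its sensitivity to a single extreme value can be sketched as follows. The prices are those of the example; leaving out the highest sale is only a demonstration of how much one observation drives the result.

```python
# Property prices from the example, already in ascending order.
prices = [250_000, 280_000, 300_000, 320_000, 450_000]

price_range = max(prices) - min(prices)

# Same calculation with the highest sale left out (illustration only).
trimmed = prices[:-1]
range_without_top = max(trimmed) - min(trimmed)

print(f"Range of all sales:         ${price_range:,}")
print(f"Range without the top sale: ${range_without_top:,}")
```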

2.2. Variance

  • Definition: Variance measures the average squared deviation of each value from the mean. It quantifies the overall dispersion of data points around the mean.

  • Formulas:

    • Population Variance (σ²): σ² = Σ(xᵢ - μ)² / N
    • Sample Variance (S²): S² = Σ(xᵢ - X̄)² / (n-1)

    Where:
    * xᵢ represents each individual value in the dataset.
    * μ is the population mean.
    * X̄ is the sample mean.
    * N is the population size.
    * n is the sample size.

  • Properties:

    • Always non-negative.
    • Sensitive to outliers due to the squared deviations.
    • Expressed in squared units of the original data.
    • Variance is simply the square of the standard deviation.
  • Practical Application: Comparing the price volatility of different real estate markets.
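The difference between the two variance formulas is only the divisor. The sketch below works through both by hand and compares the results with the standard library. Treating the five sale prices from Section 1.1 as either a full population or a sample is an assumption made here for illustration.

```python
from statistics import pvariance, variance

prices = [300_000, 320_000, 330_000, 340_000, 500_000]

n = len(prices)
x_bar = sum(prices) / n
sum_of_squares = sum((x - x_bar) ** 2 for x in prices)

pop_var = sum_of_squares / n           # divisor N: data treated as the whole population
sample_var = sum_of_squares / (n - 1)  # divisor n - 1: data treated as a sample

print(f"Population variance: {pop_var:,.0f} (dollars squared)")
print(f"Sample variance:     {sample_var:,.0f} (dollars squared)")

# The library functions implement the same two formulas.
print(pvariance(prices), variance(prices))
```

The sample variance is always the larger of the two, and the gap narrows as n grows.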

2.3. Standard Deviation

  • Definition: The standard deviation is the square root of the variance. It measures the typical distance of data points from the mean (strictly, the root-mean-square deviation rather than the simple average distance) and is expressed in the original units of the data.

  • Formulas:

    • Population Standard Deviation (σ): σ = √[Σ(xᵢ - μ)² / N]
    • Sample Standard Deviation (S): S = √[Σ(xᵢ - X̄)² / (n-1)]
  • Properties:

    • Always non-negative.
    • Sensitive to outliers.
    • Provides a more interpretable measure of spread than variance.
    • Lends itself to further statistical treatment, allowing inferences to be drawn and statements to be made regarding the degree of uncertainty associated with an inference.
  • Practical Application: Assessing the reliability of a sales price prediction model.

  • Empirical Rule (68-95-99.7 Rule): For a normally distributed dataset, approximately:

    • 68% of the data falls within one standard deviation of the mean (μ ± σ).
    • 95% of the data falls within two standard deviations of the mean (μ ± 2σ).
    • 99.7% of the data falls within three standard deviations of the mean (μ ± 3σ).
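The percentages in the empirical rule come straight from the normal curve and can be reproduced with Python's standard library. The sketch also computes the sample standard deviation of the five sale prices from Section 1.1; reusing that dataset here is an illustrative choice.

```python
from statistics import NormalDist, stdev

prices = [300_000, 320_000, 330_000, 340_000, 500_000]
s = stdev(prices)  # sample standard deviation (n - 1 divisor)
print(f"Sample standard deviation: ${s:,.0f}")

# Share of a normal distribution lying within k standard deviations of the mean.
standard_normal = NormalDist()  # mean 0, standard deviation 1
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"Within {k} standard deviation(s): {shares[k]:.1%}")
```

The rule applies only to data that are close to normal. A five-sale dataset with one extreme value, like this one, should not be expected to follow it.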

2.4. Coefficient of Variation (CV)

  • Definition: The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage. It measures relative variability, allowing comparison of dispersion across datasets with different means.

  • Formula: CV = (S / X̄) * 100%

    Where:
    * S is the sample standard deviation.
    * X̄ is the sample mean.

  • Properties:

    • Dimensionless (no units), enabling comparison across datasets with different scales.
    • Higher CV indicates greater relative variability.
  • Practical Application: Comparing the price dispersion of properties in different neighborhoods with varying average prices.

  • Example:

    • Neighborhood A: Mean price = $400,000, Standard deviation = $50,000
    • Neighborhood B: Mean price = $800,000, Standard deviation = $75,000

    CV(A) = ($50,000 / $400,000) * 100% = 12.5%
    CV(B) = ($75,000 / $800,000) * 100% = 9.375%

    Neighborhood A has greater relative price variability than Neighborhood B.
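The comparison can be sketched as a small Python function. The function name `coefficient_of_variation` is a hypothetical helper introduced here; the inputs are the neighborhood figures from the example.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the coefficient of variation as a percentage."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean_value * 100

cv_a = coefficient_of_variation(50_000, 400_000)  # Neighborhood A
cv_b = coefficient_of_variation(75_000, 800_000)  # Neighborhood B

print(f"CV(A) = {cv_a:.3f}%")
print(f"CV(B) = {cv_b:.3f}%")
print("More variable relative to its mean:", "A" if cv_a > cv_b else "B")
```

Neighborhood B has the larger standard deviation in dollars, yet the smaller CV, which is exactly the distinction the measure is meant to expose.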

2.5. Interquartile Range

  • A data set's ordered array can be divided into four subsets of equal size by identifying its quartiles.

  • Definition: The interquartile range (IQR) is the distance between the first and third quartiles, IQR = Q3 - Q1. Fifty percent of the ordered observations fall between Q1 and Q3.

  • Formulas (position of each quartile in the ordered array):

    • Quartile 1 (Q1): the observation at position (n+1)/4
    • Quartile 2 (Q2): the median
    • Quartile 3 (Q3): the observation at position 3(n+1)/4
  • Calculation:

    The following decision rules also apply:

    1. If the position point calculation is an integer, then the ordered observation occupying that position point is the quartile boundary.

    2. If the position point is halfway between two integers, then the midpoint between the next-largest and next-smallest ordered observation is the quartile boundary.

    3. If the position point is neither an integer nor halfway between two integers, then the position point is rounded to the nearest integer and the corresponding ordered observation is the quartile boundary.

  • Properties:

    • When data is normally distributed, the interquartile range is approximately equal to 1.35 standard deviations.
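The position formulas and the three decision rules can be sketched in Python as follows. The helper names `quartile_boundary` and `quartiles` are hypothetical, and the data are the house sizes from Section 1.3. Statistical packages often interpolate between observations instead of rounding, so their quartiles may differ slightly from the ones produced by these rules.

```python
import math

def quartile_boundary(ordered, position):
    """Return the quartile boundary at a 1-based position in a sorted list."""
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        # Rule 1: integer position, take that observation.
        return ordered[lower - 1]
    if fraction == 0.5:
        # Rule 2: halfway between two positions, take the midpoint of the neighbors.
        return (ordered[lower - 1] + ordered[lower]) / 2
    # Rule 3: otherwise round the position to the nearest integer.
    return ordered[round(position) - 1]

def quartiles(values):
    """Return (Q1, Q3) using positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    q1 = quartile_boundary(ordered, (n + 1) / 4)
    q3 = quartile_boundary(ordered, 3 * (n + 1) / 4)
    return q1, q3

sizes = [1200, 1400, 1500, 1500, 1600, 1700, 1700, 1700, 1800]
q1, q3 = quartiles(sizes)   # positions 2.5 and 7.5, so rule 2 applies to both
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

With nine observations both positions fall halfway between integers, so Q1 is the midpoint of the second and third values and Q3 the midpoint of the seventh and eighth.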

3. Measures of Shape

Measures of shape describe the symmetry and peakedness of a data distribution. They indicate how closely the data resembles a normal distribution and the extent to which extreme values are pulling the mean away from the median. The normal distribution, which is the basis for many statistical inferences, is symmetrical; that is, its median and mean are equal.

3.1. Skewness

  • Definition: Skewness measures the asymmetry of a distribution. A symmetrical distribution has a skewness of zero. A left-skewed (negatively skewed) distribution has a longer tail on the left side, while a right-skewed (positively skewed) distribution has a longer tail on the right side.

  • Formula:

    Skewness = [n / ((n-1)(n-2))] * Σ[(xᵢ - X̄) / S]³

    Where:
    * X̄ is the sample mean.
    * S is the sample standard deviation.
    * n is the sample size.

  • Interpretation:

    • Skewness = 0: Symmetrical distribution. The mean and median are approximately equal.
    • Skewness < 0: Left-skewed distribution. The mean is typically less than the median.
    • Skewness > 0: Right-skewed distribution. The mean is typically greater than the median.
  • Practical Application: Identifying the presence of outliers that could distort the average sales price in a market.

  • Example: If a dataset of property prices has a large positive skewness, a few properties sold for substantially more than the typical property; a large negative skewness indicates a few sales far below it.
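The skewness formula translates directly into Python. The sketch below is a minimal implementation of that formula; the function name `sample_skewness` is hypothetical, and the data are the five sale prices from Section 1.1, where the $500,000 sale creates a long right tail.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Skewness = [n / ((n-1)(n-2))] * sum of cubed standardized deviations."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    cubed_sum = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum

prices = [300_000, 320_000, 330_000, 340_000, 500_000]
skew = sample_skewness(prices)
print(f"Skewness of sale prices: {skew:.2f}")   # positive: right-skewed

symmetric = [1, 2, 3, 4, 5]
print(f"Skewness of symmetric data: {sample_skewness(symmetric):.2f}")
```

The result for the sale prices is roughly 2.05, consistent with the mean ($358,000) sitting above the median ($330,000).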

3.2. Kurtosis

  • Definition: Kurtosis measures the “peakedness” or “tailedness” of a distribution. It describes the concentration of data around the mean and the heaviness of the tails.

  • Types of Kurtosis:

    • Mesokurtic: Kurtosis ≈ 3. A distribution with kurtosis similar to that of the normal distribution.
    • Leptokurtic: Kurtosis > 3. A distribution with a sharper peak and heavier tails than the normal distribution. More data is concentrated around the mean, and there are more extreme values.
    • Platykurtic: Kurtosis < 3. A distribution with a flatter peak and thinner tails than the normal distribution. Data is more dispersed around the mean, and there are fewer extreme values.
  • Practical Application: Assessing the riskiness of real estate investments. Leptokurtic distributions suggest higher risk due to the greater probability of extreme returns (both positive and negative).
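The chapter gives no formula for kurtosis, so the sketch below assumes the moment-based definition (fourth central moment divided by the squared second central moment), for which a normal distribution scores 3. The function name `kurtosis` is hypothetical. Many spreadsheet and statistics functions instead report excess kurtosis (this value minus 3), often with a small-sample adjustment, so their output will not match these numbers directly.

```python
from statistics import fmean

def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (normal distribution = 3). Assumed definition."""
    n = len(values)
    x_bar = fmean(values)
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - x_bar) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2

# Sale prices from Section 1.1 (one extreme value) versus evenly spread values.
prices = [300_000, 320_000, 330_000, 340_000, 500_000]
flat = [1, 2, 3, 4, 5]

k_prices = kurtosis(prices)
k_flat = kurtosis(flat)
print(f"Kurtosis, sale prices: {k_prices:.2f}")
print(f"Kurtosis, evenly spread values: {k_flat:.2f}")   # below 3: platykurtic
```

With only five observations these figures are illustrative at best; kurtosis estimates are unstable in small samples.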

3.3. Box and Whisker Plots

  • Definition: A box and whisker plot (or boxplot) is a graphical representation of a dataset’s five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.

  • Interpretation:

    • The box represents the interquartile range (IQR), containing the middle 50% of the data.
    • The median is marked within the box.
    • The whiskers extend from the box to the minimum and maximum values (or to a defined range, with outliers plotted as individual points).
    • The shape of the box and the length of the whiskers provide insights into the skewness and spread of the data.
  • Practical Application: Visually comparing the distributions of property prices across different areas or property types.
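The numbers behind a box and whisker plot can be computed without any plotting library. The sketch below uses nine hypothetical sale prices and assumes the common convention of drawing whiskers to the most extreme observations within 1.5 × IQR of the box, with anything beyond plotted as an outlier. `statistics.quantiles` interpolates between observations, so its quartiles can differ from those produced by the rounding rules in Section 2.5.

```python
from statistics import quantiles

# Hypothetical sale prices, for illustration only.
prices = [250_000, 280_000, 300_000, 310_000, 320_000,
          330_000, 340_000, 360_000, 600_000]

q1, q2, q3 = quantiles(prices, n=4)  # quartiles by interpolation
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [p for p in prices if lower_fence <= p <= upper_fence]
outliers = [p for p in prices if p < lower_fence or p > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Box: Q1 = {q1:,.0f}, median = {q2:,.0f}, Q3 = {q3:,.0f}")
print(f"Whiskers: {whisker_low:,} to {whisker_high:,}")
print("Outliers:", outliers)
```

The long distance from the upper whisker to the outlier, compared with the short lower whisker, is the visual signature of right skew.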

3.4. Histograms

  • Definition: Histograms are graphic depictions of a frequency or percentage distribution.

  • Interpretation:

    • If the data distribution is symmetrical, the portions of the histogram to the right and left of center are mirror images. If instead the left side extends farther from the center, the distribution is left-skewed; if the right side extends farther, it is right-skewed.

4. Normality Testing

Normality tests assess whether a dataset follows a normal distribution. Several statistical tests and graphical methods can be used. Quantitative tests for normality and normal probability plots are useful for assessing the degree of departure from normality.

4.1. Kolmogorov-Smirnov (KS) Test and Shapiro-Wilk Test:
These are statistical tests that assess whether a sample comes from a population with a specific distribution (often, the normal distribution). A p-value is generated. If the p-value is below a chosen significance level (e.g., 0.05), the null hypothesis (that the data is normally distributed) is rejected.

4.2. Normal Probability Plots (Q-Q Plots):
These plots compare the quantiles of the dataset to the quantiles of a standard normal distribution. If the data is normally distributed, the points will fall approximately along a straight line. Deviations from the straight line indicate departures from normality.
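The coordinates of a normal probability plot can be generated with the standard library. The sketch below assumes the plotting positions (i - 0.5) / n, one of several conventions in use; the function name `normal_qq_points` is hypothetical, and the data are the five sale prices from Section 1.1.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each ordered observation with its standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

prices = [300_000, 320_000, 330_000, 340_000, 500_000]
points = normal_qq_points(prices)
for z, price in points:
    print(f"z = {z:+.3f}   price = ${price:,}")
```

Plotting price against z, the first four points lie close to a line while the $500,000 sale sits well above it, which is the pattern a heavy right tail produces.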

5. Parametric vs. Nonparametric Statistics

  • Parametric Statistics:
    These statistical methods rely on assumptions about the underlying distribution of the population data. Many parametric tests assume a normal distribution. Examples include t-tests, ANOVA, and linear regression. If the normality assumption is violated, the results of these tests may be unreliable.
  • Nonparametric Statistics:
    These statistical methods do not rely on strong assumptions about the underlying distribution of the population data. They are often used when the normality assumption is violated or when the data is ordinal or nominal. Examples include the Mann-Whitney U test, Wilcoxon signed-rank test, and Kruskal-Wallis test.

6. Central Limit Theorem and Inference

  • The central limit theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has a finite variance). This theorem is crucial for statistical inference because it allows us to make inferences about population parameters (e.g., the population mean) based on sample statistics, even when the population distribution is not normal.
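A small simulation illustrates the theorem. The sketch below draws repeated samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1) and summarizes the resulting sample means. The population, the sample size of 30, the number of samples, and the random seed are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, stdev

random.seed(42)       # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # observations per sample (assumed)
NUM_SAMPLES = 2000    # number of repeated samples (assumed)

# Each sample is drawn from an exponential population: mean 1, std dev 1, right-skewed.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = fmean(sample_means)
spread_of_means = stdev(sample_means)
theoretical_spread = 1.0 / SAMPLE_SIZE ** 0.5   # sigma / sqrt(n)

print(f"Mean of sample means:    {mean_of_means:.3f} (population mean is 1)")
print(f"Std dev of sample means: {spread_of_means:.3f}")
print(f"sigma / sqrt(n):         {theoretical_spread:.3f}")
```

The sample means cluster around the population mean with a spread close to σ/√n, even though individual draws from the population are far from normal.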

Conclusion

Measures of central tendency, dispersion, and shape are essential tools for real estate appraisers. They provide a framework for summarizing and analyzing data, identifying trends, and making informed decisions. By understanding these fundamental statistical concepts, appraisers can improve the accuracy and reliability of their opinions of value.

Chapter Summary

This chapter on “Measures of Central Tendency, Dispersion, and Shape in Real Estate Appraisal” provides a foundational understanding of statistical concepts crucial for analyzing real estate data. It emphasizes the use of these measures to understand the characteristics of a sample and to make inferences about the larger population from which the sample is drawn.

The chapter begins by explaining measures of central tendency, like the mean, median, and mode, noting how the sample central tendency can be used to infer population characteristics. Stratified random sampling is introduced as a method to improve inferences by ensuring sample variability aligns with the underlying population.

Next, measures of dispersion such as standard deviation, variance, coefficient of variation, range, and interquartile range are discussed. These measures quantify the variability within a dataset and facilitate comparisons between different datasets. The standard deviation is highlighted for its role in statistical inference and uncertainty assessment. The chapter clarifies how to calculate sample and population standard deviation and variance. It also explains how the coefficient of variation allows relative comparisons of dispersion.

The importance of assessing the shape of the data distribution is then emphasized, with a focus on skewness and kurtosis. Skewness describes the asymmetry of the distribution (left or right skew), while kurtosis indicates the “peakedness” or tail thickness. Tools like box and whisker plots and histograms are presented as visual aids for understanding skewness. Quantitative measures for skewness and kurtosis are also mentioned, noting how these are easily generated by spreadsheet programs and statistical software.

The chapter explains how evaluating these shape measures helps in determining if the data conforms to a normal distribution. While perfect normality is rare, the degree of departure from normality is crucial because many parametric statistical tests rely on this assumption. Tests for normality, such as the Kolmogorov-Smirnov test, are introduced, along with the interpretation of p-values.

Finally, the chapter touches upon the central limit theorem and inference, highlighting that if the data distribution is too far from normal, inferential tests based on assumptions of normality may not be applicable to small samples. If extreme values in the data are distorting the arithmetic mean, the median is likely to be a better indicator of central tendency. This underscores the importance of understanding these statistical measures for sound real estate appraisal practice.
