Descriptive Statistics: Dispersion and Shape

Introduction
This chapter expands on descriptive statistics, focusing on measures of dispersion and shape. Understanding these measures is crucial for real estate appraisal because they provide insights into the variability and distribution of property data, such as sales prices or rental rates. This understanding informs the selection of appropriate statistical methods and the interpretation of results.
Measures of Dispersion
Measures of dispersion quantify the spread or variability within a dataset. High dispersion indicates data points are widely scattered, while low dispersion indicates they cluster closely around the central tendency. These measures are vital because they help assess the reliability and representativeness of sample data.
Standard Deviation and Variance
The standard deviation and variance are fundamental measures of dispersion that use every data point in a dataset. Both are built from the squared deviations of the data points from the mean.
1. Variance: The variance (σ² for population, S² for sample) is the average of the squared differences between each data point and the mean. Squaring the differences ensures that both positive and negative deviations contribute positively to the measure.
* **Population Variance (σ²):**
σ² = Σ[(Xi - μ)²] / N
Where:
* Xi represents each individual data point in the population.
* μ represents the population mean.
* N represents the population size.
* **Sample Variance (S²):**
S² = Σ[(Xi - X̄)²] / (n - 1)
Where:
* Xi represents each individual data point in the sample.
* X̄ represents the sample mean.
* n represents the sample size. The use of (n-1) provides an unbiased estimator for the population variance.
2. Standard Deviation: The standard deviation (σ for population, S for sample) is the square root of the variance. It represents the typical deviation of data points from the mean, expressed in the original units of measurement. This makes it more interpretable than the variance.
* **Population Standard Deviation (σ):**
σ = √σ² = √{Σ[(Xi - μ)²] / N}
* **Sample Standard Deviation (S):**
S = √S² = √{Σ[(Xi - X̄)²] / (n - 1)}
Example: Consider a dataset of monthly rents for garden-level apartments (Table 14.1):
$600, $650, $695, $710, $715, $730, $735, $735, $760, $760, $785, $800, $800, $805, $815, $820, $820, $825, $825, $825, $825, $850, $850, $850, $850, $850, $850, $860, $860, $890, $890, $920, $920, $930, $970, $995
Following the calculations in Table 14.2, the sample mean (X̄) is $815.83, and the sample standard deviation (S) is $84.71. This indicates that rents in the sample typically lie about $84.71 from the mean.
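As a minimal sketch, the mean and standard deviation quoted above can be reproduced with Python's standard library. The variable names (`rents`, `sample_sd`, and so on) are assumptions introduced here for illustration; the data are the 36 rents listed above.

```python
import math
import statistics

# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

n = len(rents)
mean = statistics.mean(rents)

# Sum of squared deviations from the mean, shared by both variance formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in rents)

# Sample variance and standard deviation: divide by (n - 1).
sample_var = sum_sq_dev / (n - 1)
sample_sd = math.sqrt(sample_var)

# Population versions (divide by N), shown only for comparison.
pop_var = sum_sq_dev / n
pop_sd = math.sqrt(pop_var)

# The library functions should agree with the manual formulas.
assert math.isclose(sample_sd, statistics.stdev(rents))
assert math.isclose(pop_sd, statistics.pstdev(rents))

print(f"mean = {mean:.2f}")           # mean = 815.83
print(f"sample S = {sample_sd:.2f}")  # sample S = 84.71
print(f"population sigma = {pop_sd:.2f}")
```

Because the rents are a sample of the market rather than every garden-level apartment, the (n - 1) version is the one that matches the figures in the text.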
Practical Applications and Related Experiments:
- Comparable Property Analysis: Calculate the standard deviation of sale prices for comparable properties. A lower standard deviation suggests a more consistent market and greater confidence in the indicated value.
- Rental Market Volatility: Track the standard deviation of rental rates over time. An increasing standard deviation may indicate increased market volatility or significant changes in property characteristics.
- Sensitivity Analysis: Evaluate how changes in input parameters (e.g., discount rate, vacancy rate) affect the standard deviation of projected property values. This helps assess the risk associated with different scenarios.
- Experiment: Collect data for the prices of similar properties across various neighborhoods. Calculate the standard deviation for each neighborhood and compare the values to understand price dispersion differences.
Coefficient of Variation
The coefficient of variation (CV) is a relative measure of dispersion that expresses the standard deviation as a percentage of the mean. It is particularly useful for comparing the variability of datasets with different means or units of measurement.
Formula:
CV = (S / X̄) * 100
Where:
- S is the sample standard deviation.
- X̄ is the sample mean.
Example: For the apartment rent data, the coefficient of variation is:
CV = ($84.71 / $815.83) * 100 = 10.38%
This means that the standard deviation is 10.38% of the mean rent.
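The same calculation can be sketched in Python. The helper name `coefficient_of_variation` is hypothetical, and the annual-rent rescaling is an added illustration of why the CV is useful for comparing datasets measured on different scales.

```python
import statistics

# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv = coefficient_of_variation(rents)
print(f"CV = {cv:.2f}%")  # CV = 10.38%

# The CV is unit-free: converting every rent to an annual figure
# multiplies both S and the mean by 12, so the ratio is unchanged.
annual_rents = [12 * r for r in rents]
cv_annual = coefficient_of_variation(annual_rents)
print(f"CV of annual rents = {cv_annual:.2f}%")  # CV of annual rents = 10.38%
```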
Practical Applications:
- Comparing Investment Risk: Calculate the CV for different real estate investments (e.g., residential vs. commercial). A higher CV indicates greater relative risk.
- Analyzing Market Consistency: Compare the CV of sale prices in different neighborhoods. A lower CV suggests a more consistent and predictable market.
- Portfolio Diversification: Use the CV to assess the diversification benefits of combining different types of real estate assets in a portfolio.
- Experiment: Compute the mean and standard deviation for the sale prices of single-family homes and apartment complexes. Calculate and compare their Coefficients of Variation to assess the relative price variation.
Range
The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset.
Formula:
Range = Maximum Value - Minimum Value
Example: For the apartment rent data, the range is:
Range = $995 - $600 = $395
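A short Python sketch of the range follows. The second calculation, which drops the single lowest and highest rents, is an added illustration (not part of the original example) of how heavily the range depends on the two extreme observations.

```python
# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

value_range = max(rents) - min(rents)
print(value_range)  # 395

# Remove the lowest and highest rent and recompute the range.
trimmed = sorted(rents)[1:-1]
trimmed_range = max(trimmed) - min(trimmed)
print(trimmed_range)  # 320
```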
Practical Applications:
- Quick Assessment of Variability: Provides a quick indication of the potential spread in property values or rental rates.
- Identifying Outliers: A large range may indicate the presence of outliers that require further investigation.
- Market Surveys: The range of prices obtained from a market survey can provide a general sense of the price spectrum.
- Experiment: Record the daily high and low temperatures for a month. Calculate the range for each day to understand the daily temperature variation.
Interquartile Range
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). It represents the range of the middle 50% of the data and is less sensitive to outliers than the range.
1. Quartiles:
* Q1: Divides the lowest 25% of values from the highest 75%. Position = (n+1)/4
* Q2: The median, divides the dataset in half. Position = 2(n+1)/4
* Q3: Divides the lowest 75% of values from the highest 25%. Position = 3(n+1)/4
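If the quartile position is not a whole number, either round it to the nearest ordered observation (as in the example that follows) or interpolate between the two adjacent ordered observations.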
Formula:
IQR = Q3 - Q1
Example: For the apartment rent data:
n = 36
* Q1 Position = (36+1)/4 = 9.25 ≈ 9th ordered observation = $760
* Q2 Position = 2(36+1)/4 = 18.5, Median = $825
* Q3 Position = 3(36+1)/4 = 27.75 ≈ 28th ordered observation = $860
IQR = $860 - $760 = $100
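The quartile positions above can be sketched in Python in two ways. The helper `quartile_by_rounding` is hypothetical and mirrors the rounding used in the example; the interpolated alternative relies on `statistics.quantiles`, whose default "exclusive" method uses the same k(n+1)/4 positions with linear interpolation. The two approaches give slightly different answers for Q3.

```python
import statistics

# Monthly rents from the example above (n = 36), already in ascending order.
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

def quartile_by_rounding(sorted_values, k):
    """k-th quartile at position k(n+1)/4, rounded to the nearest observation."""
    n = len(sorted_values)
    position = k * (n + 1) / 4                    # 1-based position
    index = min(n, max(1, int(position + 0.5)))   # round half up, stay in range
    return sorted_values[index - 1]

ordered = sorted(rents)
q1 = quartile_by_rounding(ordered, 1)
q3 = quartile_by_rounding(ordered, 3)
iqr_rounded = q3 - q1
print(q1, q3, iqr_rounded)  # 760 860 100

# Interpolated quartiles at the same positions.
q1_i, median_i, q3_i = statistics.quantiles(rents, n=4)
iqr_interpolated = q3_i - q1_i
print(q1_i, median_i, q3_i, iqr_interpolated)  # 760.0 825.0 857.5 97.5
```

Either convention is defensible; what matters is applying the same one to every dataset being compared.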
Practical Applications:
- Robust Measure of Dispersion: Provides a more stable measure of variability when outliers are present.
- Box Plot Construction: The IQR is used to construct box plots, which visually represent the distribution of data and identify potential outliers.
- Identifying Market Segments: The IQR can help identify price ranges that represent the core of a particular market segment.
- Experiment: Split a data set of home sale prices into different neighborhood zones. Compute and compare the IQR for each zone to determine price variation.
Measures of Shape
Measures of shape describe the symmetry and peakedness of a distribution. They help assess whether data follows a normal distribution, which is a crucial assumption for many statistical tests.
Skewness
Skewness measures the asymmetry of a distribution. A symmetrical distribution has a skewness of zero. A distribution is:
- Left-skewed (negatively skewed): The tail is longer on the left side, and the mean is typically less than the median.
- Right-skewed (positively skewed): The tail is longer on the right side, and the mean is typically greater than the median.
Formula:
Skewness = [n / ((n - 1)(n - 2))] * Σ[(Xi - X̄) / S]³
Where:
- Xi represents each individual data point in the sample.
- X̄ represents the sample mean.
- S represents the sample standard deviation.
- n represents the sample size.
Example: The skewness for the apartment rent data is -0.312, indicating a slight left skewness.
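A minimal sketch of the skewness calculation for the rent data follows. It uses the adjusted form n / ((n - 1)(n - 2)) * Σ[(Xi - X̄) / S]³, which is the form used by common spreadsheet SKEW functions; that choice is an assumption made here because it reproduces the -0.312 reported above. The function name `sample_skewness` is hypothetical.

```python
import statistics

# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) times the sum of cubed z-scores."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, (n - 1) denominator
    sum_z_cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * sum_z_cubed

skew = sample_skewness(rents)
print(f"skewness = {skew:.3f}")  # skewness = -0.312

# For this left-skewed sample the mean sits below the median.
print(statistics.mean(rents) < statistics.median(rents))  # True
```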
Practical Applications:
- Assessing Data Normality: Skewness is a key indicator of whether data is normally distributed.
- Identifying Market Anomalies: Significant skewness in sales price data may indicate market anomalies or unusual transactions.
- Appropriate Central Tendency: If the distribution is significantly skewed, the median may be a better representation of central tendency than the mean.
- Experiment: Record the income distribution of residents in an area. Calculate skewness to determine income distribution asymmetry.
Kurtosis
Kurtosis measures the “peakedness” of a distribution, specifically the concentration of data around the mean and the thickness of the tails.
- Mesokurtic: Kurtosis = 3. The distribution has a shape similar to the normal distribution.
- Leptokurtic: Kurtosis > 3. The distribution has a sharper peak and heavier tails than the normal distribution.
- Platykurtic: Kurtosis < 3. The distribution has a flatter peak and thinner tails than the normal distribution.
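The text gives no formula for kurtosis, so the sketch below assumes the simple moment ratio m4 / m2², for which a normal distribution yields 3. The two small datasets are made up purely to illustrate the platykurtic and leptokurtic cases; they are not appraisal data.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Illustrative data only: evenly spread values have a flat shape and thin tails.
flat = list(range(1, 101))
# Illustrative data only: mostly identical values plus two extremes give heavy tails.
heavy = [0] * 98 + [-10, 10]

k_flat = kurtosis(flat)
k_heavy = kurtosis(heavy)
print(f"flat:  {k_flat:.2f}")   # flat:  1.80  (below 3, platykurtic)
print(f"heavy: {k_heavy:.2f}")  # heavy: 50.00 (above 3, leptokurtic)
```

Some software reports excess kurtosis (this value minus 3), so it is worth checking which convention a tool uses before comparing a result against the thresholds above.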
Practical Applications:
- Tail Risk Assessment: Leptokurtic distributions indicate a higher probability of extreme values (tail risk).
- Model Selection: Kurtosis helps select appropriate statistical models for data analysis.
- Market Efficiency: Price changes in an efficient market are often assumed to be approximately mesokurtic. Deviations from this may indicate market inefficiencies.
- Experiment: Obtain stock price data. Calculate kurtosis to evaluate financial market distribution properties.
Normality Tests
Assessing normality is vital for deciding whether parametric statistical tests are suitable for a dataset. Several tools help evaluate how closely a sample follows a normal distribution.
Quantitative Tests for Normality
Tests like the Kolmogorov-Smirnov (KS) test, Shapiro-Wilk test, Anderson-Darling test, and Ryan-Joiner test compare a sample distribution to a normal distribution and provide a p-value.
- P-value: The p-value is the probability of obtaining sample data at least as far from normality as the data observed, if the population is in fact normally distributed.
- Interpretation:
- A large p-value (typically > 0.05) suggests that the sample data is consistent with a normal distribution; it does not prove normality.
- A small p-value (typically ≤ 0.05) suggests that the sample data deviates significantly from a normal distribution, which is evidence that the population is not normal.
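To make the idea behind these tests concrete, the sketch below computes a Kolmogorov-Smirnov style distance between the rent sample and a normal distribution fitted to it, using only the standard library. Two assumptions are made here and are not from the text: the normal parameters are estimated from the sample (the Lilliefors variant of the KS test), and the 5% cutoff uses the commonly tabulated large-sample approximation 0.886 / √n. In practice a statistics package (for example SciPy's `scipy.stats.shapiro`) would normally be used to obtain a p-value directly.

```python
import math
import statistics

# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    data = sorted(values)
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

d_stat = ks_distance_to_normal(rents)

# Assumed approximation: Lilliefors 5% critical value for samples larger than 30.
critical_5pct = 0.886 / math.sqrt(len(rents))

print(f"D = {d_stat:.3f}, approximate 5% critical value = {critical_5pct:.3f}")
if d_stat < critical_5pct:
    print("No significant departure from normality at the 5% level")
else:
    print("Significant departure from normality at the 5% level")
```

A distance below the cutoff corresponds to the "large p-value" case described above: the sample is consistent with a normal population, which is not the same as proof that the population is normal.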
Normal Probability Plots
A normal probability plot (also called a Q-Q plot) visually compares the quantiles of a sample distribution to the quantiles of a normal distribution. If the data is normally distributed, the points on the plot will fall approximately along a straight line. Deviations from the straight line indicate departures from normality.
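The coordinates of a normal probability plot can be sketched without a plotting library. The plotting positions (i - 0.5) / n are one common convention and are an assumption here, as is the helper `pearson_r`; the correlation between the two sets of quantiles is used as a rough numerical summary of how straight the plotted line would be.

```python
import statistics

# Monthly rents from the example above (n = 36).
rents = [
    600, 650, 695, 710, 715, 730, 735, 735, 760, 760, 785, 800,
    800, 805, 815, 820, 820, 825, 825, 825, 825, 850, 850, 850,
    850, 850, 850, 860, 860, 890, 890, 920, 920, 930, 970, 995,
]

data = sorted(rents)
n = len(data)
std_normal = statistics.NormalDist()  # mean 0, standard deviation 1

# Theoretical normal quantile paired with each ordered observation.
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(theoretical, data)

# Each (theoretical, observed) pair is one point on the plot.
for q, x in list(zip(theoretical, data))[:3]:
    print(f"normal quantile {q:+.2f} -> rent {x}")
print(f"straight-line correlation r = {r:.3f}")
```

Values of r close to 1 mean the points lie near a straight line; plotting `theoretical` against `data` with any charting tool gives the visual version.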
Practical Applications:
- Validating Statistical Assumptions: Verify that the data meets the normality assumption required for parametric statistical tests (e.g., t-tests, ANOVA).
- Data Transformation: If the data is not normally distributed, consider applying transformations (e.g., logarithmic transformation) to improve normality.
- Choosing Statistical Methods: If normality cannot be achieved, consider using nonparametric statistical methods, which do not require the normality assumption.
- Experiment: Compare residential property sales data with commercial property sales data. Analyze if they follow a normal distribution using different normality tests.
Central Limit Theorem and Inference
The Central Limit Theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has a finite variance). This theorem is crucial for statistical inference because it allows us to make inferences about population parameters (e.g., mean) even when the population distribution is unknown.
Practical Applications:
- Sample Size Determination: The CLT guides the determination of the sample size needed to achieve a desired level of precision in estimating population parameters.
- Confidence Interval Construction: Confidence intervals for population means can be constructed based on the assumption of normality of the sampling distribution.
- Hypothesis Testing: Hypothesis tests about population means can be conducted using t-tests or z-tests, which rely on the CLT.
- Experiment: Simulate random property sales price samples of increasing sizes and check if the distribution of sample means approaches normality.
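The simulation experiment in the last item can be sketched as follows. Everything about the population is hypothetical: sale prices are drawn from a right-skewed lognormal distribution with a median near $300,000, chosen only to show the effect. The skewness of the sample means is used as a simple indicator of how close their distribution is to the symmetric normal shape.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def draw_price(rng):
    """Hypothetical right-skewed sale price (lognormal, median near $300k)."""
    return rng.lognormvariate(12.6, 0.6)

def skewness(values):
    """Simple moment-based skewness m3 / m2**1.5."""
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

replications = 2000
results = {}
for size in (1, 5, 30, 100):
    # Draw many samples of the given size and record each sample mean.
    means = [
        statistics.fmean([draw_price(rng) for _ in range(size)])
        for _ in range(replications)
    ]
    results[size] = skewness(means)
    print(f"n = {size:>3}: skewness of sample means = {results[size]:+.2f}")
```

With a sample size of 1 the "means" are just the skewed prices themselves; as the sample size grows, the skewness of the sample means should shrink toward zero, which is the behavior the CLT describes.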
Conclusion
Understanding and applying measures of dispersion and shape are essential skills for real estate appraisers. These measures provide valuable insights into the characteristics of property data, guide the selection of appropriate statistical methods, and inform the interpretation of results. By mastering these concepts, appraisers can improve the accuracy and reliability of their valuations and analyses.
Chapter Summary
This chapter on “Descriptive Statistics: Dispersion and Shape” in the context of real estate appraisal focuses on methods for characterizing the variability and form of data sets, essential for making sound statistical inferences. Measures of dispersion, including standard deviation and variance, quantify the spread of data around the mean. Standard deviation, being the square root of the variance, is particularly useful for further statistical analysis and inference. The chapter details the calculation of both sample and population standard deviation and variance, illustrating their use with a practical example of apartment rents.
The chapter explains how to use the standard deviation to understand the distribution of data, particularly in relation to the normal distribution, where approximately 68%, 80%, and 95% of observations fall within 1, 1.28, and 2 standard deviations of the mean, respectively. The coefficient of variation (CV) provides a relative measure of dispersion, standardizing it to the sample mean for comparisons across different datasets. The range (difference between the highest and lowest values) offers a quick but less comprehensive view of data spread. The interquartile range (IQR), representing the middle 50% of the data, is less sensitive to outliers and describes the spread of the central portion of the data.
Measures of shape, specifically skewness and kurtosis, help determine the normality of a data distribution. Skewness indicates the asymmetry of the distribution; a left-skewed distribution has a longer tail on the left (mean < median), while a right-skewed distribution has a longer tail on the right (mean > median). Kurtosis describes the “peakedness” of the distribution, with leptokurtic distributions being more peaked (kurtosis > 3) and platykurtic distributions being flatter (kurtosis < 3) than the normal (mesokurtic) distribution (kurtosis = 3). Box and whisker plots and histograms are presented as graphical tools for visualizing skewness.
The chapter emphasizes the importance of assessing normality. While real-world data rarely perfectly fits a normal distribution, understanding the degree of departure from normality is crucial. Quantitative tests for normality, such as the Kolmogorov-Smirnov test, and normal probability plots are useful in this assessment. The chapter highlights that if data deviates significantly from normality, parametric tests that assume normality may not be appropriate, especially for small samples. In such cases, nonparametric tests, which do not rely on distributional assumptions, may be more suitable. Finally, the chapter alludes to the Central Limit Theorem and its role in making inferences, particularly when dealing with large sample sizes.