Selecting Distributions and Modeling Correlations in Monte Carlo Real Estate Analysis

Understanding the Importance of Distribution Selection

The accuracy and reliability of a Monte Carlo simulation hinge on selecting probability distributions that accurately represent the uncertainty inherent in the input variables. Choosing inappropriate distributions can lead to misleading results and flawed decision-making. Therefore, careful consideration must be given to the characteristics of each input variable when assigning a distribution.

  • Theoretical Consistency: The selected distribution should align with the theoretical properties of the variable being modeled. For example, variables that cannot take negative values (e.g., stock prices, property values, rents) should be modeled using distributions that are bounded below at zero, such as the Lognormal or Exponential distribution. Variables confined to the interval [0, 1], such as occupancy rates, are better served by a distribution bounded on both sides, such as the Beta distribution.

  • Data Fit: The distribution should adequately fit the available data. This can be assessed using statistical techniques like goodness-of-fit tests (e.g., Chi-squared test, Kolmogorov-Smirnov test) and visual inspection of histograms and probability plots.

  • Continuous vs. Discrete Data: Determine if the data is continuous (can take any value within a range) or discrete (can only take specific values). While discrete data is often treated as continuous, especially with large sample sizes, using discrete distributions (e.g., Bernoulli, Poisson, Binomial) may be more appropriate in some cases.

  • Understanding the Data Generating Process: Consider the process that generates the data. Does the variable exhibit trends, seasonality, or other underlying structures? If so, these factors should be incorporated into the model, perhaps using regression analysis or other time-series techniques.

Common Probability Distributions in Real Estate Analysis

Several probability distributions are commonly used in Monte Carlo simulations for real estate investment analysis. Each distribution has its own characteristics and is suitable for modeling different types of variables.

  • Normal Distribution: Characterized by its bell-shaped curve, defined by its mean (μ) and standard deviation (σ). It’s useful for modeling variables where values are clustered around the mean, and deviations from the mean are equally likely in both directions.

    • Formula: Probability Density Function (PDF): f(x) = (1 / (σ * sqrt(2π))) * exp(-((x - μ)^2) / (2 * σ^2))
  • Lognormal Distribution: The logarithm of the variable follows a normal distribution. Useful for modeling variables that cannot be negative and exhibit positive skewness (i.e., a longer tail to the right), such as property values or rental rates.

    • Formula: PDF: f(x) = (1 / (x * σ * sqrt(2π))) * exp(-((ln(x) - μ)^2) / (2 * σ^2)) for x > 0
  • Triangular Distribution: Defined by its minimum (a), maximum (b), and most likely (c) values. It’s useful when limited data is available and only these three values can be estimated.

    • Formula:
      • f(x) = (2 * (x - a)) / ((b - a) * (c - a)) for a <= x <= c
      • f(x) = (2 * (b - x)) / ((b - a) * (b - c)) for c <= x <= b
  • Uniform Distribution: All values within a specified range are equally likely. Useful when there is no information to suggest that any particular value is more probable than others.

    • Formula: PDF: f(x) = 1 / (b - a) for a <= x <= b
  • Exponential Distribution: Models the time until an event occurs, such as the time until a property is leased or the duration of a lease.

    • Formula: PDF: f(x) = λ * exp(-λx) for x >= 0, where λ is the rate parameter
  • Beta Distribution: Defined on the interval [0, 1], it’s useful for modeling probabilities or proportions, such as occupancy rates or loan-to-value ratios.

    • Formula: PDF: f(x) = (x^(α-1) * (1-x)^(β-1)) / B(α, β) for 0 <= x <= 1, where α and β are shape parameters and B(α, β) is the Beta function.
  • Discrete Distributions: For variables that can only take a limited number of values, consider using discrete distributions such as:

    • Bernoulli Distribution: Models the probability of success or failure (e.g., whether a tenant renews a lease).
    • Binomial Distribution: Models the number of successes in a fixed number of trials (e.g., the number of units leased in a building).
    • Poisson Distribution: Models the number of events occurring in a fixed period of time or space (e.g., the number of maintenance requests received per month).
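
As an illustration of the distributions listed above, the following minimal sketch draws one set of inputs from each family using only Python's standard `random` module. The variable names and every parameter value (for example, a 5% to 8% exit cap rate range) are hypothetical placeholders chosen for the example, not calibrated estimates.

```python
import random

def sample_inputs(rng):
    """Draw one hypothetical set of inputs, one per distribution family."""
    return {
        # Normal: symmetric around the mean (here, annual expense growth).
        "expense_growth": rng.gauss(0.03, 0.01),
        # Lognormal: strictly positive and right-skewed (here, rent per sq ft).
        # The two arguments are the mean and std dev of ln(x), not of x itself.
        "rent_psf": rng.lognormvariate(3.4, 0.2),
        # Triangular: note the argument order is (low, high, mode).
        "exit_cap_rate": rng.triangular(0.05, 0.08, 0.06),
        # Uniform: every value in the range is equally likely.
        "closing_cost_pct": rng.uniform(0.01, 0.03),
        # Exponential: waiting time with rate lambda (mean = 1 / lambda months).
        "months_to_lease": rng.expovariate(1 / 6),
        # Beta: bounded on [0, 1]; alpha=18, beta=2 has mean 18 / 20 = 0.9.
        "occupancy": rng.betavariate(18, 2),
        # Bernoulli: a single yes/no outcome with success probability 0.7.
        "tenant_renews": rng.random() < 0.7,
        # Binomial: number of successes in 20 independent Bernoulli trials.
        "units_leased": sum(rng.random() < 0.9 for _ in range(20)),
    }

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_inputs(rng) for _ in range(10_000)]

mean_occupancy = sum(d["occupancy"] for d in draws) / len(draws)
print(f"mean simulated occupancy: {mean_occupancy:.3f}")
```

In a full model, each dictionary of draws would feed one pass of the cash flow calculation. The draws here are independent of one another; correlated inputs need additional treatment.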

Fitting Distributions to Data: Techniques and Considerations

When historical data is available, the chosen distribution should be fitted to the data as accurately as possible. Several techniques can be used for this purpose:

  1. Visual Inspection: Plot a histogram of the data and compare it to the shapes of candidate probability distributions to get an initial indication of which might be appropriate. A box-and-whisker plot is a compact way to check the data for skewness and outliers.

  2. Parameter Estimation: Estimate the parameters of the distribution (e.g., mean, standard deviation) using sample statistics calculated from the data.

  3. Goodness-of-Fit Tests: Perform statistical tests to assess how well the distribution fits the data. Common tests include:

    • Chi-squared Test: Compares the observed frequencies of data in different bins to the expected frequencies under the assumed distribution.
    • Kolmogorov-Smirnov (K-S) Test: Measures the maximum distance between the empirical cumulative distribution function (ECDF) of the data and the cumulative distribution function (CDF) of the assumed distribution.
    • Anderson-Darling Test: Similar to the K-S test but gives more weight to the tails of the distribution.
  4. Software Packages: Statistical software packages (e.g., R, Python, MATLAB) provide tools for fitting distributions to data and performing goodness-of-fit tests.
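
The parameter estimation and goodness-of-fit steps above can be sketched as follows. This is a minimal illustration using only the Python standard library: the rent data is synthetic (generated from a lognormal with assumed parameters), the helper names `fit_lognormal` and `ks_statistic` are hypothetical, and the K-S distance is computed by hand rather than through a statistics package.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def fit_lognormal(data):
    """Estimate (mu, sigma) of ln(x) from strictly positive data."""
    logs = [math.log(x) for x in data]
    return mean(logs), stdev(logs)

def ks_statistic(data, cdf):
    """Maximum gap between the empirical CDF and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(7)
# Synthetic "historical rents": an assumption made purely for illustration.
rents = [rng.lognormvariate(3.0, 0.5) for _ in range(500)]

# Candidate 1: lognormal, fitted on the log scale.
mu_hat, sigma_hat = fit_lognormal(rents)
log_dist = NormalDist(mu_hat, sigma_hat)
d_lognormal = ks_statistic(rents, lambda x: log_dist.cdf(math.log(x)))

# Candidate 2: normal, fitted with the sample mean and standard deviation.
d_normal = ks_statistic(rents, NormalDist(mean(rents), stdev(rents)).cdf)

print(f"fitted mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"K-S distance, lognormal fit: {d_lognormal:.4f}")
print(f"K-S distance, normal fit:    {d_normal:.4f}")
```

A smaller K-S distance indicates a closer fit. Because the parameters are estimated from the same data being tested, standard K-S critical values are too lenient, so the distances are best used to compare candidates rather than as a formal test.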

Modeling Correlations Among Variables

In real estate investment analysis, variables are often correlated. Ignoring these correlations can lead to inaccurate and unrealistic simulation results. For example, rental growth rates and vacancy rates are typically negatively correlated: higher rental growth is often associated with lower vacancy rates.

  • Correlation Coefficient: A measure of the linear relationship between two variables, ranging from -1 to +1. A value of +1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates no linear correlation.

    • Formula: ρ(X, Y) = Cov(X, Y) / (σX * σY), where Cov(X, Y) is the covariance between variables X and Y, and σX and σY are their respective standard deviations.
  • Types of Correlation:

    • Positive Correlation: As one variable increases, the other tends to increase as well.
    • Negative Correlation: As one variable increases, the other tends to decrease.
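    • No Correlation: There is no linear relationship between the variables. Note that zero correlation does not by itself imply independence; two variables can be uncorrelated and still be related in a non-linear way.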
  • Methods for Modeling Correlation:

    1. Copulas: Copulas are mathematical functions that describe the dependence structure between random variables, independent of their marginal distributions. They allow you to model complex dependencies that go beyond simple linear correlation. Examples include Gaussian copulas, Student’s t-copulas, and Clayton copulas. The choice of copula depends on the nature of the dependence you want to model (e.g., tail dependence).

    2. Cholesky Decomposition: This method is used to generate correlated random numbers from a multivariate normal distribution. It involves decomposing the correlation matrix into a lower triangular matrix (Cholesky matrix) and using this matrix to transform independent random numbers into correlated ones.

      • Steps:
        1. Calculate the correlation matrix (C) of the random variables.
        2. Perform Cholesky decomposition on C to obtain a lower triangular matrix L such that C = L * L'.
        3. Generate a vector of independent standard normal random numbers (Z).
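        4. Calculate the correlated random numbers (X) as X = μ + D * L * Z, where μ is the vector of means and D is the diagonal matrix of standard deviations. Because C is a correlation matrix, L * Z has unit variances and must be rescaled by D; if the covariance matrix is decomposed instead of C, the rescaling is not needed.
    3. Correlation Matrix Adjustment: In practice, empirically derived correlation matrices are not always positive definite, which Cholesky decomposition requires. In that case, the matrix must first be adjusted, for example by computing the nearest valid correlation matrix (e.g., Higham 2002).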

    4. Regression Analysis: If there is a causal relationship between variables, regression analysis can be used to model the dependence. The independent variables can be used to predict the dependent variable, and the error term in the regression can be modeled as a random variable. Be aware of potential simultaneity issues, which can bias ordinary least squares (OLS) estimates.

    5. Rank Correlation (Spearman’s Rho): This measures the monotonic relationship between two variables. This is useful when the relationship is non-linear, but consistent in direction. The formula involves ranking the data and measuring the correlation of the ranks.

    • Example:

      Let’s say you want to simulate correlated rental growth and exit cap rates. You estimate the correlation coefficient between these two variables to be -0.5. Using Cholesky decomposition, you can generate correlated random numbers for rental growth and cap rates, ensuring that the simulated values reflect the negative correlation between them. If the correlation is misspecified, the Monte Carlo simulation will not accurately reflect risk.
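
The rental growth and exit cap rate example can be sketched in code. The version below is a minimal illustration in standard-library Python: the correlation of -0.5 comes from the example, while the means and standard deviations (3% and 2% for rental growth, 6% and 0.75% for the exit cap rate) and the function names are assumptions made for the sketch. Both variables are treated as normally distributed, which is a simplification.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular L with matrix = L * L'.

    Raises ValueError (from math.sqrt) if the matrix is not positive definite.
    """
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L

def correlated_draw(rng, means, stdevs, L):
    """One draw: x_i = mean_i + stdev_i * (L z)_i, with z independent N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in means]
    lz = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
    return [m + s * v for m, s, v in zip(means, stdevs, lz)]

def pearson(xs, ys):
    """Sample Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

corr = [[1.0, -0.5],
        [-0.5, 1.0]]           # rental growth vs. exit cap rate
means = [0.03, 0.06]           # assumed means
stdevs = [0.02, 0.0075]        # assumed standard deviations

L = cholesky(corr)
rng = random.Random(11)
sims = [correlated_draw(rng, means, stdevs, L) for _ in range(20_000)]
growth = [s[0] for s in sims]
cap_rate = [s[1] for s in sims]

sample_rho = pearson(growth, cap_rate)
print(f"target correlation: -0.5, simulated: {sample_rho:.3f}")
```

Checking the simulated correlation against the target, as the last lines do, is a cheap safeguard against a misspecified matrix or a transposed factor.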

  1. Sensitivity Analysis: Conduct sensitivity analysis to assess the impact of different distribution choices and correlation assumptions on the simulation results. Vary the parameters of the distributions (e.g., mean, standard deviation, correlation coefficient) and observe how the output distributions (e.g., NPV, IRR) change.

  2. Scenario Analysis: Create different scenarios based on various combinations of input distributions and correlations. For example, a “best-case” scenario might involve optimistic distributions for rental growth and occupancy rates, with a strong negative correlation between vacancy rates and expense growth. A “worst-case” scenario might involve pessimistic distributions for these variables, with a weaker correlation.

  3. Validation: Validate the model by comparing the simulation results to historical data or expert opinions. If the model produces results that are significantly different from reality, it may be necessary to revise the distribution choices or correlation assumptions.

  4. Experiment: Consider an office building project. Create a Monte Carlo model with key input variables like rental growth, vacancy rates, construction costs, and exit cap rates. Assign appropriate distributions to these variables based on historical data and expert opinions. Experiment with different correlation assumptions between rental growth and vacancy rates, and between construction costs and interest rates. Analyze how the output distributions of NPV and IRR change under different scenarios.

  5. Modeling Stochastic Growth:

    In a deterministic world, prices might grow at an exponential rate µ. However, in a stochastic world where prices fluctuate randomly, the price in period T is:
    PT = P0 * e^((µ - 0.5 * σ^2) * T + σ * Z * sqrt(T))
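    Where:
    • PT is the price in period T, a lognormally distributed random variable.
    • P0 is the starting price.
    • µ is the expected growth rate per period.
    • σ is the standard deviation (volatility) of the growth rate per period.
    • Z is a standard normal random variable (mean of 0, standard deviation of 1).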

    Important Note: The 0.5 * σ^2 term is a volatility drag correction. Without this correction, the expected value of the stochastic process will be higher than simply growing the initial price at the rate µ. This correction becomes more important as volatility increases.
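
The effect of the volatility drag correction can be checked numerically. The sketch below simulates the terminal price with and without the 0.5 * σ^2 term and compares both averages to the deterministic value P0 * e^(µ * T). The inputs (P0 = 100, µ = 3%, σ = 15%, T = 10) and the function name `terminal_price` are assumptions chosen for illustration.

```python
import math
import random

def terminal_price(p0, mu, sigma, t, z, drag_correction=True):
    """Price after t periods for one standard normal draw z."""
    drift = mu - 0.5 * sigma ** 2 if drag_correction else mu
    return p0 * math.exp(drift * t + sigma * z * math.sqrt(t))

p0, mu, sigma, t = 100.0, 0.03, 0.15, 10   # assumed inputs
n = 200_000

rng = random.Random(2024)
zs = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same random draws for both versions, so the only difference is the drift.
mean_corrected = sum(terminal_price(p0, mu, sigma, t, z) for z in zs) / n
mean_uncorrected = sum(
    terminal_price(p0, mu, sigma, t, z, drag_correction=False) for z in zs
) / n
deterministic = p0 * math.exp(mu * t)

print(f"deterministic growth at mu:     {deterministic:.2f}")
print(f"simulated mean with correction: {mean_corrected:.2f}")
print(f"simulated mean without it:      {mean_uncorrected:.2f}")
```

With the correction, the simulated mean should sit close to the deterministic value. Without it, the mean is inflated by a factor of roughly e^(0.5 * σ^2 * T), which grows with both volatility and the holding period.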

Chapter Summary

This chapter emphasizes the crucial role of selecting appropriate probability distributions and modeling correlations between random variables in Monte Carlo real estate analysis. It contrasts Monte Carlo with deterministic analysis and outlines the benefits of the former in capturing the full spectrum of possible outcomes.

  • Monte Carlo analysis uses a broad range of possible values for each variable, weighted by their probability of occurrence, providing a more realistic representation of uncertainty compared to deterministic scenarios.
  • Selecting the right distribution is paramount; it should align with financial theory (e.g., non-negative stock prices) and accurately reflect the underlying data. Treating discrete data as continuous is often acceptable when the observation count is high.
  • Regression analysis can be used to model underlying structures or trends within random variables, for example, modeling office construction starts as a function of prices, vacancy rates, and interest rates. Techniques like two-stage least squares are needed when error terms and explanatory variables are correlated.
  • Box-whisker plots offer a compact way to represent data distributions, especially those that are skewed or non-normal, providing a five-number summary (minimum, lower quartile, median, upper quartile, maximum) and displaying outliers.
  • When historical data is limited, triangular distributions can be used to estimate the minimum, maximum, and most likely outcomes for random variables.
  • Correlations between variables (e.g., employment growth, vacancy rates, rental changes, and cap rates) should be carefully considered. Failure to do so can lead to inaccurate risk assessments. Accounting for the correlation between rental growth and cap rates significantly impacts DCF calculations and promote values.
  • Correctly accounting for stochastic growth is crucial. The price in period T is defined by PT = P0 * e^((µ - 0.5 * σ^2) * T + σ * Z * sqrt(T)). The greater the standard deviation, the more the distribution spreads out to the right over time.
  • Monte Carlo analysis offers insights into avoiding the “winner’s curse” in bidding wars by quantifying risk and revealing the potential impact of hidden information and volatility.
