Identifying Model Complexity: A Machine Learning Framework
"All models are wrong, but some are useful" is an oft-used mantra, particularly when a model's ability to capture the full complexity of social life is questioned. However, an appropriate functional form is key to valid statistical inference, and under-estimating model complexity can lead to biased results. Unfortunately, it is unclear a priori what the appropriate complexity of a functional form should be. I propose using methods from machine learning to generate an estimate of the fit potential in a dataset. By comparing this fit potential with that of a functional form originally hypothesized by a researcher, a lack of complexity in the latter can be identified. These flexible models can then be unpacked to reveal what kind of complexity is missing. I illustrate the approach with simulations and real-world case studies, showing that the framework is easy to implement and leads to improved model specification.
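The core comparison can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the choice of a random forest as the flexible benchmark, cross-validated R² as the fit measure, and the simulated quadratic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated data where the true relationship is quadratic.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=1000)

# Researcher's hypothesized (under-complex) specification: linear in X.
hypothesized_fit = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="r2"
).mean()

# Flexible machine-learning model as an estimate of the fit potential.
fit_potential = cross_val_score(
    RandomForestRegressor(n_estimators=200, random_state=0),
    X, y, cv=5, scoring="r2",
).mean()

# A large gap between the two flags missing complexity in the
# hypothesized functional form.
gap = fit_potential - hypothesized_fit
print(f"hypothesized R2 = {hypothesized_fit:.2f}, "
      f"fit potential R2 = {fit_potential:.2f}, gap = {gap:.2f}")
```

In this simulation the linear specification leaves most of the explainable variation on the table, while the flexible model recovers it; the gap is the diagnostic signal that the hypothesized form is too simple.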