Transportist: Ensembles Assemble
I believe the search for the single best model is becoming more and more misguided in a complex world with randomness, measurement error, observer bias, and so on.
Ensembles combine multiple individual models to improve predictive performance. They help to mitigate the limitations of single models and improve overall prediction accuracy. To better understand ensemble models, first consider a traditional model.
A traditional model can be represented as:

Y = f(X)

Here, Y represents the output (predictions) and X represents the set of input features. The function f is a model that maps the input features (X) to the output (Y). This single model is used to make predictions based on the input data. While there may be variation around the central tendency, it is generally the central tendency that is reported.
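As a minimal illustration of the single-model case (the synthetic data and the choice of linear regression are assumptions for the example, not part of the argument), the code might look like:

```python
# A minimal sketch of Y = f(X): one data set, one estimation technique, and a
# single reported prediction per observation.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # input features X
Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

f = LinearRegression().fit(X, Y)                 # the single model f
print(f.predict(X[:5]))                          # the central tendency that gets reported
```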
In contrast, an ensemble model combines multiple models, each with its own data, estimation techniques, and functions, to produce both a central tendency and alternative forecasts. The ensemble model can be represented as:

**Y** = c(y_1, y_2, …, y_n)
Here, y_i represents the output of each individual model (indexed by the subscript i) in the ensemble, and c is the combining rule that integrates the outputs (y_i) of all individual models to produce the final output (**Y**); the bold indicates that it is a matrix of outputs, not a single result. The instance of each individual component model can be represented as:

y_i = f_i(d_i, e_i), where each d_i is a selection from the Data D and each e_i is drawn from the set of Estimation techniques E
The components of the ensemble model can be explained as follows (a short code sketch after the list puts them together):
c: Combining rules - These are the methods used to merge the outputs of the different models in the ensemble. Common combining rules include majority voting, weighted averaging, and, in the machine learning (ML) community, bagging, boosting, and stacking.1 However, unlike the narrow definitions used in machine learning, ensembles can be formed not just from a single data set analysed by a single modeller, but from multiple data sets analysed by one or more modellers, using not only ML but other types of models built under very different paradigms. They just need to align on the thing that is predicted.
d: Data (D) - In an ensemble, each individual model may be trained on a different selection (d) of measurements, observations, or assumptions from the set of available Data (D). This helps to capture diverse patterns and improve the overall performance of the ensemble model. Even if only a single data set (d = D) is available, it can be partitioned differently, with multiple training and testing data sets, to help reduce variance, as is done in ML.
e: Estimation technique (E) - This refers to the specific algorithm or approach (e) used to build the individual models, selected from the set of all possible Estimation techniques (E). The set of estimation techniques includes regression, machine learning algorithms (e.g., decision trees, support vector machines, neural networks), and physical models, among others.
f: Function - This represents the selection and combination of variables within the individual models. The function can vary between models in the ensemble, allowing the ensemble to capture different relationships between input features and the output.
i: Model instance - Each individual model within the ensemble is considered a separate instance. The ensemble model combines the outputs from all instances to produce the final prediction.
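To make the notation concrete, here is a minimal sketch in Python. The data, the three component models, and the simple averaging rule are all illustrative assumptions for the example, not taken from the papers below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                       # the full data set D
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=500)

# Each instance i has its own data selection d_i (rows), function f_i
# (choice of columns), and estimation technique e_i.
components = [
    {"rows": rng.choice(500, 400, replace=False), "cols": [0, 1, 2], "e": LinearRegression()},
    {"rows": rng.choice(500, 400, replace=False), "cols": [0, 1], "e": DecisionTreeRegressor(max_depth=4)},
    {"rows": rng.choice(500, 400, replace=False), "cols": [1, 2], "e": KNeighborsRegressor()},
]

# Fit each component on its own selection and collect the matrix of outputs y_i.
outputs = []
for m in components:
    m["e"].fit(X[np.ix_(m["rows"], m["cols"])], Y[m["rows"]])
    outputs.append(m["e"].predict(X[:, m["cols"]]))
outputs = np.column_stack(outputs)                  # bold Y: one column per component model

# Combining rule c: here a simple unweighted average; weighted averages,
# voting, or a meta-model (stacking) are drop-in alternatives.
ensemble_prediction = outputs.mean(axis=1)
print(ensemble_prediction[:5])
```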
Ensemble models combine the strengths of multiple individual models, each with their unique data, estimation techniques, and functions, to improve prediction accuracy and mitigate the limitations of single models. By using combining rules, ensemble models can effectively capture diverse patterns and relationships within the input data, leading to better overall performance. The evidence from our group is that ensembles improve prediction. And given the rise in computational performance, and the expansion of data and modelling methods, continued reliance on a single model for understanding a complex system like human behaviour looks backward.
Our work on ensembles
Wu, Hao, and Levinson, D. (2022) Ensemble Models of For-hire Vehicle Trips. Frontiers in Future Transportation. 3. [doi]
Wu, Hao, and Levinson, D. (2021) The Ensemble Approach to Forecasting: A Review and Synthesis. Transportation Research Part C. Volume 132, 103357. [doi]
Ji, Ang and Levinson, D. (2020) Injury Severity Prediction from Two-vehicle Crash Mechanisms with Machine Learning and Ensemble Models. IEEE Open Journal of Intelligent Transportation Systems. [doi][VIDEO]
From a post on StackExchange:
Here is a short description of all three methods:
Bagging (stands for Bootstrap Aggregating) is a way to decrease the variance of your prediction by generating additional data for training from your original dataset, using combinations with repetitions to produce multisets of the same cardinality/size as your original data. By increasing the size of your training set you can't improve the model's predictive force; you just decrease the variance, narrowly tuning the prediction to the expected outcome.
Boosting is a two-step approach, where one first uses subsets of the original data to produce a series of moderately performing models and then "boosts" their performance by combining them together using a particular cost function (e.g., majority vote). Unlike bagging, in classical boosting the subset creation is not random and depends upon the performance of the previous models: every new subset contains the elements that were (likely to be) misclassified by previous models.
Stacking is similar to boosting: you also apply several models to your original data. The difference, however, is that you don't have just an empirical formula for your weight function; instead you introduce a meta-level and use another model/approach to estimate the input together with the outputs of every model to estimate the weights, or, in other words, to determine which models perform well and which badly given these input data.
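For readers who want to try these three combining strategies directly, scikit-learn ships standard implementations. The sketch below is illustrative only: the synthetic classification task, the estimator choices, and the parameters are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Bagging: trees fit to bootstrap resamples, combined by majority vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: models fit sequentially, each focusing on the cases the
    # previous ones got wrong.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model learns how much weight to give each base model.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```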