Elsevier

Journal of Choice Modelling

Volume 32, September 2019, 100170

Software Paper
Apollo: A flexible, powerful and customisable freeware package for choice model estimation and application

https://doi.org/10.1016/j.jocm.2019.100170

Abstract

The community of choice modellers has expanded substantially over recent years, covering many disciplines and encompassing users with very different levels of econometric and computational skills. This paper presents an introduction to Apollo, a powerful new freeware package for R that aims to provide a comprehensive set of modelling tools for both new and experienced users. Apollo also incorporates numerous post-estimation tools, allows for both classical and Bayesian estimation, and permits advanced users to develop their own routines for new model structures.

Introduction

Choice modelling techniques have been used across different disciplines for over four decades (see McFadden, 2000 for a retrospective and Hess and Daly, 2014 for recent contributions and applications across fields). For the majority of that time, the number of users, especially of the most advanced models, was rather small, and this community correspondingly relied on a small number of software packages. In the last two decades, the pool of users of choice models has expanded dramatically, both in number and in the breadth of disciplines covered. At the same time, new modelling approaches have been developed, and gains in computer performance and software availability have given an ever broader group of users access to ever more advanced models.

These developments have also brought a certain fragmentation of the community in terms of software, which in part runs along discipline lines. Leaving aside the most advanced users, who develop their own code, often for their own models, there is first a split between the users of commercial software and those using freeware tools. Commercial packages have historically been computationally more powerful but may have more limitations in terms of available model structures or the possibility for customisation. On the other hand, freeware packages may have limitations in terms of performance and user friendliness but may benefit from more regular developments to accommodate new model structures.

A further key differentiation between packages is the link between user inputs and interface and the actual underlying methodology. Many existing packages, both freeware and commercial, are black box tools where the user has little or no knowledge of what goes on “under the hood”. While this has made advanced models accessible to a broader group of users, a disconnect between theory and software not only increases the risk of misinterpretations and misspecifications, but can also hide relevant nuances of the modelling process and mistakenly give the impression that choice models are “easy tools” to use. On the other hand, software that relies on users to code all components from scratch arguably imposes too high a bar in terms of access.

Existing software also almost exclusively supports either classical or Bayesian estimation techniques, but not both. This fragmentation again runs largely in parallel with discipline boundaries and has only contributed further to the lack of interaction and dialogue between the classical and Bayesian communities. A final difference arises in terms of software environment. While commercial software usually provides a custom user interface, freeware options in general (though not exclusively) rely on existing statistical or econometric software and are made available as packages within these. The latter at times means that freeware packages are not really free to use (if the host software is not), while there are also cases of software being available on only one of Windows or Linux.

The above points served in large part as the motivation for the development of Apollo (cf. Fig. 1). Our aims were:

  • Free access: Apollo is a completely free package that does not rely on commercial statistical software as a host environment.

  • Big community: Apollo relies on R, a free software environment for statistical computing and graphics, which is very widely used across disciplines and works well across different operating systems (R Core Team, 2017).

  • Transparent, yet accessible: Apollo is neither a black box nor does it require expert econometric skills. The user can see as much or as little detail of the underlying methodology as desired, but the link between inputs and outputs remains.

  • Ease of use: Apollo combines easy-to-use R functions with new intuitive functions without unnecessary jargon or complexity.

  • Modular nature: Apollo uses the same code structure independently of whether the simplest multinomial logit model is to be estimated, or a complex structure using random coefficients and combining multiple model components.

  • Fully customisable: Apollo provides functions for many well-known models, but the user is able to add new structures and still make use of the overall code framework. This extends, for example, to coding expectation-maximisation routines.

  • Discrete and continuous: Apollo incorporates functions not just for commonly used discrete choice models but also for a family of models that looks jointly at discrete and continuous choices.

  • Novel structures: Apollo goes beyond standard choice models by incorporating the ability to estimate Decision Field Theory (DFT) models, a popular accumulator model from mathematical psychology.

  • Classical and Bayesian: Apollo does not restrict the user to either classical or Bayesian estimation but easily allows changing from one to the other.

  • Easy multi-threading: Apollo allows users to split the computational work across multiple processors without making changes to the model code.

  • Not limited to estimation: Apollo provides a number of pre- and post-estimation tools, including diagnostics as well as prediction/forecasting capabilities and posterior analysis of model estimates.
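
As a rough illustration of the "Classical and Bayesian" and "Easy multi-threading" points above, the sketch below builds a settings list in the style used by Apollo model files. It is a plain R list and does not call Apollo. The field names follow our reading of the package manual and should be checked against the manual for the version in use; the model name and ID column name are hypothetical placeholders.

```r
# Settings list in the style of an Apollo model file (plain R, no Apollo call).
# Field names are taken from the package manual as we understand it;
# all values are placeholders chosen for illustration.
apollo_control <- list(
  modelName = "hybrid_sketch",  # hypothetical model name
  indivID   = "ID",             # hypothetical name of the respondent ID column
  mixing    = TRUE,             # the model uses random draws
  nCores    = 2,                # split the computational work across two cores
  HB        = FALSE             # FALSE: classical estimation; TRUE: Bayesian estimation
)

# Inspect the settings that would be passed on to the package.
str(apollo_control)
```

Changing nCores leaves the model code itself untouched. Moving to Bayesian estimation involves further settings beyond this one flag, which are described in the online manual.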

While Apollo is easy to use, we also remain of the opinion that users of choice modelling software should understand the actual process that happens during estimation. For this reason, the user needs to explicitly include or exclude calls to specific functions that are model and dataset specific. For example, in the case of repeated choice data, the user needs to include a call to a function that takes the product across choices for the same person (apollo_panelProd). Similarly, in the case of a mixed logit model, the user needs to include a call to a function that averages across draws (apollo_avgInterDraws and/or apollo_avgIntraDraws). If calls to these functions are missing when needed, or if a user calls a function that should not be used in the specific model, the code will fail and tell the user why. This is in our view much better than the software permitting users to make mistakes and fixing them behind the scenes.
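
To make these two operations concrete, the following base-R sketch mimics what a product across choices and an average across draws compute. It does not call Apollo and is not the package's implementation; the toy probabilities, the ID vector and the helper names panel_prod and avg_draws are hypothetical and used for illustration only.

```r
# Toy setup: 2 individuals with 3 choice tasks each, and 4 draws.
# Rows are observations (choice tasks), columns are draws.
id <- c(1, 1, 1, 2, 2, 2)
set.seed(1)
P_obs <- matrix(runif(6 * 4, min = 0.2, max = 0.9), nrow = 6, ncol = 4)

# Product across the choice tasks of the same person, separately for each draw
# (hypothetical helper, analogous in spirit to apollo_panelProd).
panel_prod <- function(P, id) {
  # rowsum() adds the log-probabilities within each id;
  # exp() turns those sums back into products.
  exp(rowsum(log(P), group = id))
}

# Average across draws, giving one likelihood value per person
# (hypothetical helper, analogous in spirit to apollo_avgInterDraws).
avg_draws <- function(P) rowMeans(P)

# Person-level likelihoods and the resulting simulated log-likelihood.
L_person <- avg_draws(panel_prod(P_obs, id))
loglik <- sum(log(L_person))
```

With draws that vary across individuals only, the product is taken within each draw before averaging; averaging first and multiplying afterwards would give a different likelihood, which is one reason the order of these calls is left visible to the user.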

Apollo is the culmination of many years of development of individual choice modelling routines, starting with code developed by Hess while at Imperial College (cf. Hess, 2005) using Ox (Doornik, 2001). This code was gradually transitioned to R at the University of Leeds, with substantial further developments once Palma joined the team in Leeds, bringing with him ideas developed at Pontificia Universidad Católica de Chile (cf. Palma, 2016). No code is an island, and we have been inspired especially by ALogit (ALogit, 2016) and Biogeme (Bierlaire, 2003), and Apollo mirrors at least some of their features.

This paper presents a brief introduction to the capabilities of Apollo. We focus on the case of a hybrid choice model so as to give an illustration of the functionalities of the package. We illustrate this using both classical and Bayesian estimation and also explain a number of pre-estimation and post-estimation functions. Of course, in the context of an academic paper, we can only scratch the surface of the full level of detail, and furthermore, software packages change over time. For this reason, a more detailed manual (which also shows full details on function inputs), numerous examples (with data) and a user forum are available on the Apollo website (www.ApolloChoiceModelling.com). Users can also obtain help on specific functions directly in R, using e.g. ?apollo_mnl for help on the apollo_mnl function. The syntax in the present paper is for Apollo version 0.0.8, but should remain forward compatible where not otherwise noted in the online manual. We strongly recommend that prospective users study the actual manual in detail rather than just relying on the short overview in the present paper.

This paper does not include any comparisons with other packages in terms of capabilities or speed, so as not to risk misrepresentations but also given the growing number of freeware tools, some of which we might not be aware of. The code has been widely tested to ensure accuracy. In our view, any speed comparison offers little practical benefit. For simple models, there is a clear advantage for highly specialised code, while, for complex models, any benchmarking is impacted substantially by the specific implementation and degree of optimisation used.

The remainder of this paper is organised as follows. Section 2 briefly covers installation. Section 3 discusses the econometric setup for our empirical example. Section 4 then presents the hybrid choice model application using classical estimation, with the Bayesian version covered in Section 5. A number of other functions are discussed in Section 6 before we present a summary in Section 7.

Installing Apollo

Apollo runs in R, with a minimum R version of 3.1.0. The easiest way to install Apollo is directly from CRAN using
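
The command itself is missing from this copy of the text. A standard CRAN installation, assuming the package is published on CRAN under the name apollo, would look as follows:

```r
# Install Apollo and its dependencies from CRAN (needs an internet connection).
install.packages("apollo")

# Load the package for the current R session.
library(apollo)
```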

This requires a working internet connection, but it has the benefit of installing all dependencies, i.e. other packages used by Apollo, automatically. Users of macOS (i.e. Apple computers) are advised to select the binary version of the package when prompted during installation. Alternatively, the source code of Apollo can be downloaded from www.ApolloChoiceModelling.com or

Empirical example setup

In this section, we describe the setup of the empirical example used in the remainder of this paper. The data file (apollo_drugChoiceData.csv) and the source files for the model using classical (hybrid_model_classical.r) or Bayesian (hybrid_model_bayesian.r) estimation are available from the software website (www.ApolloChoiceModelling.com).

Hybrid choice model example: classical estimation

In this section, we look at classical estimation of the hybrid choice model defined in Section 3. The structure of an Apollo model file varies across specifications, but a general overview is shown in Fig. 3.

Extension to Bayesian estimation

Apollo allows the user to replace classical estimation by Bayesian estimation, for all models. We do not provide details here on Bayesian theory but instead refer the reader to Lenk (2014) and the references therein. Bayesian estimation in Apollo makes use of the RSGHB package, and the user is referred to the documentation in Dumont and Keller (2019) for RSGHB-specific settings.

The key advantage for the user is that Apollo provides a wrapper around RSGHB so that the syntax in

Additional functionalities

Apollo provides many additional functionalities beyond those covered in this paper. A full overview is provided in the online manual and only some brief highlights are presented here.

Summary

In this paper, we have given a brief overview of the capabilities of Apollo. We have illustrated how a popular class of models, namely hybrid choice structures, can be easily implemented in Apollo and estimated using either classical or Bayesian estimation. Numerous functions are then available for processing of the results. Of course, a user can also make use of the many tabulation and plotting functions available in R for further analysis and formatting of model results.

Throughout the paper,

Acknowledgments

While the Apollo package is the result of many years of development, the core of this work was carried out under the umbrella of the European Research Council (ERC) funded consolidator grant 615596-DECISIONS. We are grateful to the many colleagues who provided suggestions and/or tested the code extensively, including Chiara Calastri, Romain Crastes dit Sourd, Andrew Daly, Jeff Dumont, Joe Molloy and Basil Schmid. We would like to especially thank Thijs Dekker for his contributions to

References

  • A.R. Pinjari et al. (2010). A multiple discrete–continuous nested extreme value (MDCNEV) model: formulation and application to non-worker activity time-use and timing behavior on weekdays. Transp. Res. Part B Methodol.

  • A. Vij et al. (2016). How, when and why integrated choice and latent variable models are latently useful. Transp. Res. Part B Methodol.

  • C.-H. Wen et al. (2001). The generalized nested logit model. Transp. Res. Part B Methodol.

  • M. Abou-Zeid et al. Hybrid choice models.

  • ALogit (2016). ALOGIT 4.3.

  • M. Bierlaire. BIOGEME: a free package for the estimation of discrete choice models.

  • M. Bierlaire et al. (2010). A heuristic for nonlinear global optimization. Inf. J. Comput.

  • J.R. Busemeyer et al. (1993). Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment. Psychol. Rev.

  • A. Daly et al. Improved multiple choice models.

  • A.J. Daly et al. (2012). Using ordered attitudinal indicators in a latent variable choice model: a study of the impact of security on rail travel behaviour. Transportation.

  • J.A. Doornik (2001). Ox: an Object-Oriented Matrix Language.