Abstract
Single-site recreation demand and dichotomous choice contingent valuation analyses are typically conducted by implementing models containing strong parametric assumptions, which are rarely underpinned by theoretical arguments. This work illustrates how these assumptions can be relaxed and the estimation conducted semiparametrically by using generalized additive models (GAMs). This approach directly estimates the degree of the variables’ nonlinearities from the data, thereby avoiding subjective choices on the smoothing parameters and offering many advantages when compared to the conventional modeling techniques that dominate the environmental economics literature. Additionally, this paper illustrates how GAMs can be specified to construct theoretically consistent willingness-to-pay measures. (JEL C14, Q51)
I. Introduction
The elicitation of welfare measures corresponding to alternative natural resource management options is an important input to environmental and public economics decisions. Such welfare measures are commonly obtained via stated (e.g., contingent valuation [CV]) or revealed preference methods (e.g., estimation of recreation demand [RD]). These analyses are typically conducted using models that rely heavily upon parametric assumptions regarding the nature of consumer preferences (Cooper 2002; Haab and McConnell 2002; Crooker and Herriges 2004). Unfortunately, the imposition of specific functional forms a priori can rarely be justified on the basis of economic theory and, when unsupported by the data, is likely to generate biased welfare estimates (Kling 1989; Creel 1997; Poe and Vossler 2002). While the bias is typically small for average, in-sample measures that are unconditional on the explanatory variables, it can become substantial when considering welfare measures conditional on the regressors (Creel and Loomis 1997; Belluzzo 2004). This is relevant in at least three cases: (1) if the sample is nonrandom and, therefore, the means of the explanatory variables in the sample differ from those of the population; (2) when analyzing the distributional effects of welfare changes; and (3) when transferring welfare measures to other populations or case studies.
These concerns have stimulated the application of nonparametric (NP) techniques, which allow the estimation of flexible relationships by removing the restrictive functional form assumptions underpinning parametric models. While NP approaches (e.g., Kriström 1990; Haab and McConnell 1997; Boman, Bostedt, and Kriström 1999; Crooker and Kling 2000; Cooper 2000) allow extreme flexibility, they suffer from the curse of dimensionality and, in small samples, can be grossly inaccurate. This has motivated the introduction of semiparametric (SP) techniques, which lie between parametric and NP methods. Typically, SP models assume a specific density for the error term while including the explanatory variables in a NP fashion, or jointly estimate the distribution function of the stochastic component and the relationships of interest using flexible functional forms. Various methods have been proposed to model either CV or RD data (Li 1996; Chen and Randall 1997; Creel and Loomis 1997; Cooper 2000, 2002; Crooker and Herriges 2004; Huang, Nychka, and Smith 2008).1
These approaches, albeit more flexible than standard parametric models, have not yet found many applications in empirical environmental economics studies (Belluzzo 2004). Most likely, the main reason is that their implementation requires considerable experience, particularly regarding the choice of the smoothing parameters that regulate the flexibility of the functional form, exposing the analysis to an element of both subjectivity and complexity. As pointed out by Creel (1997), Creel and Loomis (1997), and Cooper (2000), this is a delicate task, and careful modeling is critical to avoid overfitting. Furthermore, the process rapidly becomes more uncertain and prone to error as the number of explanatory variables increases. Finally, standard parametric models are typically not nested in these SP specifications, and therefore, model-reduction testing requires nonstandard inferential procedures (see Creel and Loomis 1997, for an example).
This paper addresses these issues by taking advantage of recent developments in statistical modeling, implementing generalized additive models (GAMs) for the analysis of revealed and stated preference data. About 20 years ago Hastie and Tibshirani (1986, 1990) introduced GAMs, an extension of the traditional generalized linear model (GLM) framework, which encompasses the exogenous variables via smoothing terms, that is, flexible, nonlinear, differentiable functions. A well-known issue with this approach is the choice of the right amount of smoothing. Wood (2000, 2004) proposes a convenient solution by implementing GAMs coupled with a penalized likelihood function and an automated smoothing selection criterion, which estimate the optimal degree of nonlinearity of the model directly from the data. This resolves the subtle task of determining the model flexibility a priori by incorporating this choice into the actual estimation process. A second important feature of this method, compared with the SP approaches proposed in environmental economics cited above, is that it automatically tests the standard parametric models as special cases of the more advanced GAM specifications, providing a direct comparison with the traditional approaches applied in the CV and RD literature. Third, unlike in other SP approaches, multidimensional smoothing functions of more than one exogenous variable can be estimated without incurring severe computational difficulties. Fourth, we show that, since GAMs can be rewritten in the standard GLM form, welfare measures for both RD and CV data can be easily derived in this framework following minor adaptations of the standard techniques implemented for parametric specifications. These measures are consistent with economic theory; that is, willingness to pay (WTP) is nonnegative and bounded from above by income and is not vexed by a priori functional form assumptions. Finally, GAMs can also be applied in many other circumstances where no theoretical justification supports a parametric form, such as hedonic price studies.
GAMs based on penalized likelihood and automated smoothing selection have recently been implemented in various disciplines such as biology (e.g., Bininda-Emonds et al. 2007), ecology (e.g., Mueter and Litzow 2008), and medicine (e.g., Clements, Armstrong, and Moolgavkar 2005). Surprisingly, these techniques have not yet found many applications in economics (an exception is Greiner and Kauermann [2007], in the context of modeling U.S. public debt) and, to our knowledge, have not previously been applied at all within the field of environmental and resource economics.
II. Traditional Nonmarket Valuation Models in the GLM Framework
This section illustrates the foundations of welfare measures elicitation using both the dichotomous choice CV (DCCV) approach and the single-site RD (SSRD) estimation. While this background is well established, a brief review is useful to demonstrate how the econometric specification of both approaches can be encompassed in the GLM framework, which represents the basic structure defining GAMs.
Dichotomous Choice Contingent Valuation
The single-bound DC approach, introduced by Bishop and Heberlein (1979), has become the most established method of elicitation in applied CV research (Haab and McConnell 2002; Boyle 2003). Considering a CV framework, the WTP for an environmental quality improvement from q0 (status quo) to q1 (alternative scenario), for each respondent, can be defined as the quantity solving the following equality:
[1] U(Ii − WTPi, xi, q1, εi) = U(Ii, xi, q0, εi),

where U(⋅,⋅,⋅,⋅) is the indirect utility function; I is income; x is a vector of market commodities, prices, and individual characteristics; and ε is a stochastic component unobservable to the researcher. Following the random WTP approach, introduced by, among others, Cameron (1987) and Cameron and James (1987), the respondent will answer “yes” to the hypothetical CV question only if
[2] WTPi(q0, q1, Ii, xi, εi, β) > ti,

where t is the threshold price offered and β is the vector of parameters to be estimated. In this framework the researcher does not observe the WTP but rather a binary variable Vi, which takes value 1 if respondent i answers “yes” and 0 otherwise. The density function of the endogenous variable Vi can be written as the Bernoulli:
[3] f(Vi) = pi(β)^Vi [1 − pi(β)]^(1−Vi),

where pi(β) indicates the probability that [2] is true. Common probability distributions for pi(β) are the logistic and the normal, which lead, respectively, to the logit model and the probit model. These specifications are used to estimate β and recover respondents' WTP.
Single-Site Recreation Demand
The SSRD valuation framework is based on estimating the demand function for a recreational location by using information on individuals' trips to the site under analysis. In this illustration, we consider the case in which respondents are randomly sampled from the population of interest, and therefore, we can include individuals who are not visiting the site. Alternatively, if respondents are sampled directly on site, the estimator has to be modified slightly, to take the sample-selection bias into account (Shaw 1988). In this case the endogenous variable Ri represents the number of trips for respondent i in a fixed unit of time. The demand function can be written as a count data model:
[4] E(Ri) = f(ci, xi, β),

where ci is the cost of traveling to the site (including the opportunity cost of time); xi is a vector of exogenous explanatory variables, including, for example, the travel cost and time taken to visit substitute sites, income, and other socioeconomic characteristics of the individual i; and β is the vector of parameters to be estimated. Given the typically low budget shares of recreational activities, the WTP for access can be calculated as the area under the income-constant demand curve for the site (Haab and McConnell 2002):
[5] WTPi = ∫_{ci}^{ci*} f(c, xi, β) dc,

where ci* is the choke price, that is, the price at which the quantity demanded is zero.
A common specification for equation [4] is the Poisson distribution, with probability density:
[6] Pr(Ri = ri) = e^(−ηi) ηi^(ri) / ri!,

where ηi is both the mean and the variance of the distribution and is typically expressed as a log function of the exogenous variables (ci, xi) and the parameters β. This model has been found to be too restrictive for many applications, mainly because travel cost data are often characterized by overdispersion (i.e., the variance is higher than the mean) and by an excess of zero outcomes (Cameron and Trivedi 1986; Gurmu and Trivedi 1996). Therefore, alternative modeling strategies have been proposed. A possibility is to add an additional overdispersion parameter in the Poisson distribution, which can be interpreted as resulting from unobserved heterogeneity in the data-generating process. This strategy leads to the same estimates of β as in the standard Poisson, but with the covariance matrix adjusted for overdispersion (see Cameron and Trivedi 1990, for details). On the other hand, assuming that the overdispersion follows a gamma distribution leads to another generalization of the standard Poisson: the negative binomial model.2
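As a brief illustration of these alternatives, the following sketch fits the standard Poisson, the overdispersed (quasi-) Poisson, and the negative binomial to simulated trip data; the data frame d and the variable names (trips, cost, income) are placeholders rather than quantities from any study discussed here.

# Sketch: Poisson, overdispersed Poisson, and negative binomial counts (hypothetical data).
library(MASS)                                    # provides glm.nb()
set.seed(1)
d <- data.frame(cost = runif(300, 1, 20), income = runif(300, 10, 80))
d$trips <- rpois(300, exp(1 - 0.1 * d$cost + 0.01 * d$income))
m.pois  <- glm(trips ~ cost + income, family = poisson,      data = d)  # standard Poisson
m.quasi <- glm(trips ~ cost + income, family = quasipoisson, data = d)  # same point estimates,
                                                                        # covariance rescaled by the
                                                                        # estimated dispersion
m.nb    <- glm.nb(trips ~ cost + income, data = d)                      # gamma heterogeneity
summary(m.quasi)$dispersion                      # estimated overdispersion parameter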
GLM Specification
The standard models applied in the literature to analyze DCCV data (the logit and the probit) and SSRD data (the Poisson and the negative binomial) can all be encompassed in the GLM framework, introduced by Nelder and Wedderburn (1972). Indicating with yi the endogenous variable for respondent i (with expected value μi), with xi a vector of exogenous regressors, and with β the parameters to be estimated, a GLM can be written as
[7] g(μi) = xi′β,

where g(⋅) is a known, monotonic “link function” and the quantity xi′β is often referred to as the “linear predictor.” The distribution function of the endogenous variable (yi), which can be used to represent both DCCV (Vi) and SSRD (Ri) responses, is assumed to belong to the exponential family, whose probability density can be written as
[8] f(yi) = exp{[yi θi − v(θi)]/ϕ + c(yi, ϕ)},

where v(⋅) and c(⋅,⋅) are known functions, θi is the canonical parameter, and ϕ is the scale (dispersion) parameter. The mean and variance of such distributions are
[9] E(yi) = μi = v′(θi)

and
[10] Var(yi) = ϕ v″(θi).

Equation [9] is the key element that links the parameters of the distribution function to the model parameters β. The analyst can choose which member of the family of distributions to use by specifying the functions v(⋅) and c(⋅,⋅) accordingly. Possible densities include the Gaussian, the Poisson, the Bernoulli, the negative binomial, and the gamma. Therefore, the standard approaches implemented in DCCV and SSRD empirical studies can all be rewritten as GLMs. The model parameters can be estimated via maximum likelihood or, equivalently, by iteratively reweighted least squares.
The Bernoulli log-likelihood [3] can be derived from [8] by imposing ϕ = 1, θi = log[μi/(1−μi)], v(θi) = log(1 + e^θi), and c(yi, ϕ) = 0. Choosing the canonical link function (i.e., a link function that equates the linear predictor to the canonical parameter),

g(μi) = log[μi/(1−μi)] = xi′β,

defines the standard binomial logit model. Alternatively, the link function can be specified as the standard Gaussian inverse cumulative distribution function, Φ⁻¹(μi), which leads to the binomial probit. Also the Poisson distribution [6] can be rewritten as a GLM by imposing ϕ = 1, θi = log(μi), v(θi) = exp(θi), and c(yi, ϕ) = −log(yi!), with the log as the link function.
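In R, these standard GLMs are fit directly by declaring the family and the link; the sketch below uses simulated single-bound CV responses, and the variable names (yes, bid) are hypothetical.

# Sketch: the DCCV logit and probit as GLMs (simulated data, hypothetical variable names).
set.seed(1)
d <- data.frame(bid = rep(c(5, 10, 20, 40), each = 50))
d$yes <- rbinom(nrow(d), 1, plogis(2 - 0.08 * d$bid))                   # simulated "yes"/"no" answers
logit  <- glm(yes ~ bid, family = binomial(link = "logit"),  data = d)  # canonical (logit) link
probit <- glm(yes ~ bid, family = binomial(link = "probit"), data = d)  # Gaussian inverse CDF link
rbind(logit = coef(logit), probit = coef(probit))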
III. GAMs: A Primer
GAMs, introduced by Hastie and Tibshirani (1986, 1990), are GLMs in which part of the linear predictor is specified as the sum of smoothing functions of the exogenous variables. Equation [7] is, therefore, generalized to
[11] g(μi) = α1 x1,i + ... + αh xh,i + sh+1(xh+1,i) + ... + sk(xk,i),

where x1,i, ..., xh,i are the exogenous variables entering in a parametric form with α1, ..., αh the corresponding parameters, and sh+1(⋅), ..., sk(⋅) are the smoothing functions of the remaining variables xh+1,i, ..., xk,i. The assumption of additive effects embedded in equation [11] is a special case of the more general multidimensional smoothing function s(xh+1,i, xh+2,i, ..., xk,i), which can also be included in the GAM framework. GAMs are generalizations of GLMs and, therefore, can encompass different types of endogenous variables, including counts, binary responses, and censored outcomes, all commonly analyzed in nonmarket valuation studies. At the same time, by specifying the functional relationships via smoothing functions, they avoid imposing parametric restrictions a priori.3
The smoothing terms are typically represented by regression splines. These are linear combinations of known basis functions of the regressors, bl,j(xj,i), weighted by estimated parameters δl,j. Dropping the subscript i for simplicity, they are defined as
[12] sj(xj) = Σl=1,…,qj δl,j bl,j(xj),

where qj is the number of basis functions that determines the maximum possible flexibility of the relation between xj and y (the higher the value, the more “wiggly” the estimated effect can be). In cubic regression splines (CRSs), for instance, the range of values of the exogenous variable is divided into various segments, separated by knots, and a local cubic regression is fitted for each segment. To ensure continuity and smoothness at the knots, conditions are imposed on the first- and second-order derivatives (Keele 2008 provides an overview). As an illustration, consider a CRS with two knots (k1 and k2). The bases in [12] are then defined as b1,j(xj) = 1, b2,j(xj) = xj, b3,j(xj) = xj², b4,j(xj) = xj³, b5,j(xj) = I{xj > k1}(xj − k1)³, and b6,j(xj) = I{xj > k2}(xj − k2)³, where I{xj > k} is an indicator function taking value 1 when xj > k and 0 otherwise. For estimation, these bases can be rewritten in a more convenient fashion to minimize collinearity (e.g., Wood 2006b). In this paper we use natural CRSs, which, in order to avoid erratic behavior at the extremes, constrain the fit before the first knot and after the last one to be linear (i.e., the second derivative is set to zero at and beyond the boundary knots). This leads the number of basis functions qj to equal the number of knots. Therefore, the number of knots determines the overall model flexibility and nonlinearity.
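As a concrete, hypothetical illustration of equation [12], the sketch below builds a natural cubic regression spline basis with splines::ns() and fits the resulting (unpenalized) model as an ordinary GLM; the knot locations, simulated data, and variable names are arbitrary.

# Sketch: a natural CRS is a linear combination of known basis functions, eq. [12].
library(splines)
set.seed(1)
x <- runif(300, 0, 50)                                    # e.g., a distance-type regressor
y <- rbinom(300, 1, plogis(1.5 - 0.15 * pmin(x, 20)))     # simulated binary responses
B <- ns(x, knots = c(10, 25), Boundary.knots = c(0, 50))  # basis functions b_l(x); linear beyond
                                                          # the boundary knots
fit <- glm(y ~ B, family = binomial)                      # given the knots, this is a standard GLM
summary(fit)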
Given the number and the locations of the knots, natural CRS bases can be computed as a function of the regressors, and the model can be estimated as a standard GLM via iteratively reweighted least squares. Indicating with β = {α, δ} the vector of parameters (the scale parameter ϕ can also be included, depending on the chosen distribution of the exponential family) that determines the natural parameter θi according to equations [9] and [11], the log-likelihood of a sample of n observations can be written as
[13] l(β) = Σi=1,…,n {[yi θi − v(θi)]/ϕ + c(yi, ϕ)}.

In practice, there is a trade-off between choosing the number of basis functions qj to be large enough to accurately represent any unknown, nonlinear relation sj(⋅) (j = h+1, ..., k) and, at the same time, avoiding the risk of overfitting. A possible solution is augmenting the likelihood [13] by including a penalty for the excessive “wiggliness” of the smoothing functions. Such a penalty can be written as a function of the integral of the second derivatives of the smoothing functions, which is a measure of their curvature. The objective function to be maximized becomes
[14] lp(β) = l(β) − (1/2) Σj=h+1,…,k λj ∫ [sj″(x)]² dx,

with λj smoothing parameters that control the trade-off between model flexibility and overfitting by determining the amount of weight given to each penalty. Large values of λj lead to straight-line estimates, while λj = 0 results in unpenalized splines. In this setting, the degree of nonlinearity of the model is no longer determined by the number of knots, which actually makes little difference, but by the values of the smoothing parameters (e.g., Ruppert, Wand, and Carroll 2003). As shown by Wood (2004), equation [14] can be conveniently reexpressed as a quadratic function of the parameters as
[15] lp(β) = l(β) − (1/2) β′Sβ,

where S = Σj λj Sj and the Sj are known weight matrices. Given the smoothing parameters, maximization of [15] can be achieved by setting its derivatives with respect to the elements of β to zero:
Σi=1,…,n [(yi − μi)/V(yi)] ∂μi/∂βj − [Sβ]j = 0,

where [⋅]j denotes the jth row of a vector. These equations correspond to the ones that would solve the penalized nonlinear least squares problem
minβ Σi=1,…,n (yi − μi)²/V(yi) + β′Sβ,

where the variances V(yi) are inversely proportional to some weights w computed iteratively. The solution can be found by penalized iteratively reweighted least squares (P-IRLS), which can be implemented as follows: choosing some starting values for the parameters, β[0], compute the linear predictor estimate and the corresponding mean response vector μ[0]. Then compute the weights w[0],
wi[0] = 1/{V(μi[0]) [g′(μi[0])]²},

and the pseudodata z[0],
zi[0] = g′(μi[0]) (yi − μi[0]) + Xi β[0].

Finally, minimize the objective function,
‖√W[0] (z[0] − Xβ)‖² + β′Sβ,

with respect to β to find β[1], and repeat until convergence is achieved. This approach estimates the model coefficients given the values of the smoothing parameters λj. However, the problem of estimating the value of such smoothing parameters, that is, the level of nonlinearity of the model, persists. This corresponds to the issue of determining the model flexibility in the SP approaches mentioned in Section I. While with one or two explanatory variables this problem can be solved, for example, by conducting grid search procedures, this strategy becomes more uncertain and complex when the number of regressors increases. The appealing property of penalized-likelihood GAMs is that both β and λj can be jointly estimated from the data following the strategy introduced by Wood (2004, 2006a). Specifically, this can be achieved by minimizing the generalized cross validation (GCV) score, which can be approximated as
n D(β̂)/(n − edf)²,

where D(β̂) denotes the model deviance, if the scale parameter is unknown (e.g., in the overdispersed Poisson case), or the unbiased risk estimator (UBRE),
D(β̂)/n + 2 edf/n − 1,

when the scale parameter is equal to 1, as in the Poisson and binomial cases. The term edf indicates the “effective degrees of freedom,” which, as in standard linear models, are defined as the trace of the hat matrix A, that is, the matrix that satisfies ŷ = Ay. In penalized GAMs, A is given by (X′WX + S)⁻¹X′WX, with X the matrix containing the spline bases. Since the λj's, which can take values from zero to infinity, are included in the definition of A, the edf of a GAM can also take all noninteger positive values. The value of edf can also be computed for each regressor separately. It indicates how nonlinear the estimate of the corresponding relation is and can be calculated by selecting the elements of A that pertain to the related smoothing function.
Wood (2006a) indicates two possible strategies to jointly estimate both smoothing and model parameters using iterative methods. The first approach is called performance iteration and is based on finding the optimal smoothing parameter values for each model in the P-IRLS algorithm. The second approach, called outer iteration, minimizes directly the GCV/UBRE score by iterating the P-IRLS scheme for each trial value of the smoothing parameters. While the performance iteration is faster, the outer iteration is more stable and has better properties when concurvity is present, in other words, when the explanatory variables are strongly related. The appealing feature of using penalized likelihood and automated smoothing selection criterion via GCV/UBRE is that, if nonlinear relations are not supported by the data, the penalties will increase and the smoothing functions will reduce to standard linear forms. Therefore, this estimation strategy automatically tests the standard GLM as a special case of the more-advanced GAM specification.
In large samples, the parameter estimates are approximately normally distributed. However, while always consistent, they are unbiased estimators only if the value of the parameter in the population is zero. Therefore, standard inference can be used only to test hypotheses of coefficients equal to zero, for example, for choosing between two nested models.4 Alternatively, following Wahba (1983), Silverman (1985), and Wood (2006a, 2006b), a Bayesian approach to coefficient uncertainty estimation can be implemented. This strategy recognizes that, by imposing a particular penalized likelihood [14], we are effectively including some prior beliefs about the likely characteristics of the correct model. This can be translated into a Bayesian framework by specifying an (improper) prior distribution for the parameters β. Moreover, it turns out (Wood 2006b) that the “confidence” intervals derived via Bayesian theory are also well behaved from a frequentist point of view; that is, their average coverage probability is very close to the nominal level 1−α, where α is the significance level. Therefore, one can use Bayesian intervals to approximate confidence intervals for the smoothing functions s(⋅). Finally, model checking can be conducted as in GLMs, that is, by looking at the deviance residuals, which, if the model assumptions are satisfied, are expected to behave approximately like standard normal random variables (Cameron and Trivedi 1998). Given space constraints, we limited the illustration of the GAMs to a brief overview; for a more in-depth discussion we refer the reader to Ruppert, Wand, and Carroll (2003) and Wood (2006a).
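In practice, the estimation strategy described in this section is implemented in the R package mgcv (Wood 2006a). The sketch below, a minimal illustration on simulated data with arbitrary variable names, fits a binary GAM with automatic GCV/UBRE smoothing selection, reports the per-term edf, plots the smooths with Bayesian “confidence” bands, and inspects the deviance residuals.

# Sketch: penalized GAM with automatic smoothing selection (simulated data, arbitrary names).
library(mgcv)
set.seed(1)
n <- 400
d <- data.frame(x1 = runif(n), x2 = runif(n), z = rbinom(n, 1, 0.5))
d$y <- rbinom(n, 1, plogis(1 + sin(2 * pi * d$x1) - 0.5 * d$x2 + 0.3 * d$z))
fit <- gam(y ~ z + s(x1, bs = "cr") + s(x2, bs = "cr"),   # the dummy z enters parametrically
           family = binomial, data = d,
           method = "GCV.Cp")                             # UBRE here, since the binomial scale is 1
summary(fit)                        # per-term edf: an edf of 1 means the smooth reduced to a line
fit$gcv.ubre                        # minimized UBRE/GCV score
plot(fit, pages = 1, seWithMean = TRUE)   # estimated smooths with Bayesian intervals
gam.check(fit)                      # deviance residual diagnostics, as in a GLM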
IV. Theoretically Consistent WTP Calculation Using GAMs
This section introduces theoretically consistent WTP elicitation techniques for DCCV and SSRD studies via GAMs. Given that GAMs are a generalization of GLMs, the WTP estimation can be implemented with some relatively simple modification of the standard approaches proposed in the literature.
Dichotomous Choice Contingent Valuation
Haab and McConnell (1998, 2002) discuss three criteria that a DCCV measure of WTP needs to satisfy for valuation purposes: (1) WTP is nonnegative and not greater than the available income, (2) estimation and calculation are accomplished with no arbitrary truncation, and (3) there is consistency between the randomness of estimation and randomness of calculation. The first condition requires that the proposed change is typically considered as an improvement over the present situation. The second recognizes that arbitrary assumptions, such as the choice of ad hoc truncation points, can lead to equally arbitrary WTP estimates. The third and final criterion states that the model estimation approach and the WTP elicitation technique need to be grounded in the same statistical assumptions. However, the same authors recognize that “not all models that fail to satisfy these criteria are misspecified” and that, in particular, “if the probability that WTP exceeds a bound is small enough, the problem is minor” (Haab and McConnell 2002, 85).
A possible GAM specification that satisfies all three criteria is
[16] WTPi = inci G(α1 x1,i + ... + αh xh,i + sh+1(xh+1,i) + ... + sk(xk,i) + εi),

where inci is the available household income and G(⋅) is a known function bounded between 0 and 1 that represents WTP as a proportion of income. This model extends the parametric specification proposed by Haab and McConnell (2002) by modeling the exogenous variables xh+1, ..., xk via smoothing functions. Therefore, it is not only consistent with economic theory but also does not require imposing parametric restrictions on the relationship between WTP and the regressors. A simple specification for G(⋅) is
[17] G(z) = (1 + e^−z)⁻¹.

In this framework, the probability of individual i responding “yes” to the offered bid ti can be written as
[18] Pr(yesi) = Pr(WTPi > ti) = Pr[εi > −ln((inci − ti)/ti) − α1 x1,i − ... − αh xh,i − sh+1(xh+1,i) − ... − sk(xk,i)].

As in the standard parametric case, the error component might be logistically or normally distributed. Assuming a logistic distribution with standard deviation σ, the probability of a positive response becomes
[19] Pr(yesi) = (1 + exp{−(1/σ) ln[(inci − ti)/ti] − (1/σ)[α1 x1,i + ... + αh xh,i + sh+1(xh+1,i) + ... + sk(xk,i)]})⁻¹.

This can be substituted into the Bernoulli likelihood [3] in order to derive a GAM with a logit link, where x1, ..., xh and ln[(inci − ti)/ti] are exogenous variables encompassed in a linear form and xh+1, ..., xk are exogenous variables modeled via smoothing functions. As in the standard parametric case, the parameter of ln[(inci − ti)/ti] is an estimate of 1/σ (which we refer to as σ*), whereas the parameters associated with the other regressors are equal to the corresponding parameters divided by σ. Again, we refer to them by adding an asterisk to the parameter. For example, the parameter relative to x1 is an estimate of α1/σ, and it is indicated by α1*. After estimation, the median WTP can be calculated as
[20] WTPi^med = inci/{1 + exp[−(Ai + Si)/σ*]},

where

Ai = α1* x1,i + ... + αh* xh,i

and

Si = sh+1*(xh+1,i) + ... + sk*(xk,i).
Bayesian “confidence” intervals for the median WTP can be computed via Monte Carlo simulation by following Krinsky and Robb's (1986) procedure, that is, by drawing from the posterior distribution of β = {α*, δ*, σ*}, which in large samples is approximately normal, to generate a numerical estimate of the distribution of the welfare measure. As discussed by Wood (2006a, 2006b), these Bayesian intervals also enjoy desirable frequentist properties and can be used to approximate confidence intervals for any nonlinear combination of the model parameters. Note, however, that the smoothing parameters' uncertainty (i.e., the fact that the λj's have also been estimated from the data) is not taken into account in this procedure, and therefore, the confidence intervals are only approximate. Wood (2006a, 202–4) presents fully Bayesian, computationally intensive procedures that can be used to address this limitation.5
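The following sketch illustrates this simulation step on simulated data; the data frame, the variable names (yes, t, inc, lbid, x), and the covariate profile are hypothetical, and the model corresponds to the income-bounded specification [16]–[19] with ln[(inci − ti)/ti] entering linearly.

# Sketch: Krinsky-Robb-style simulation of the median WTP [20] from an income-bounded GAM
# (all data and names are placeholders).
library(mgcv); library(MASS)
set.seed(1)
n <- 400
d <- data.frame(t = rep(c(10, 25, 50, 100), each = n / 4),
                inc = runif(n, 500, 2000), x = runif(n, 0, 50))
d$lbid <- log((d$inc - d$t) / d$t)                        # the ln[(inc - t)/t] regressor
d$yes  <- rbinom(n, 1, plogis(0.8 * d$lbid - 0.03 * d$x))
fit <- gam(yes ~ lbid + s(x, bs = "cr"), family = binomial, data = d, method = "GCV.Cp")

newd <- data.frame(lbid = 0, x = 10, inc = 1000)          # covariate profile of interest
Xp   <- predict(fit, newdata = newd, type = "lpmatrix")   # maps coefficients to the linear predictor
jbid <- which(colnames(Xp) == "lbid")
sims <- mvrnorm(2000, coef(fit), vcov(fit))               # draws from the approximate posterior
wtp  <- apply(sims, 1, function(b) {
  AS <- drop(Xp[, -jbid, drop = FALSE] %*% b[-jbid])      # A_i + S_i (already scaled by 1/sigma)
  newd$inc / (1 + exp(-AS / b[jbid]))                     # equation [20], with sigma* = b[jbid]
})
quantile(wtp, c(0.025, 0.5, 0.975))                       # approximate interval for the median WTP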
In particular circumstances, when the expected nonmarket values are low and, therefore, WTP exceeding the income bound is not a concern, a simpler GAM specification can be implemented, which is an extension of the exponential WTP model introduced by Cameron (1987) and Cameron and James (1987). Individuals' WTP can now be defined as
[21] WTPi = exp[α1 x1,i + ... + αh xh,i + sh+1(xh+1,i) + ... + sk(xk,i) + εi].

This model bounds WTP to be strictly positive and encompasses the exogenous variables via flexible smoothing functions. The probability of a positive answer becomes
[22] Pr(yesi) = Pr(WTPi > ti) = Pr[εi > ln ti − α1 x1,i − ... − αh xh,i − sh+1(xh+1,i) − ... − sk(xk,i)].

Specifying the stochastic component εi as a logistic, the median WTP can be written as
[23] WTPi^med = exp[(Ai + Si)/σ*],

with Ai and Si defined as in equation [20], and the mean WTP as
[24] WTPi^mean = exp[(Ai + Si)/σ*] ⋅ (π/σ*)/sin(π/σ*),

which is finite provided σ* > 1. As in the income-bounded version, approximate confidence intervals can be derived using the Bayesian approach.
Single-Site Recreation Demand
The common distributions used for SSRD data (Poisson, overdispersed Poisson, and negative binomial) can all be encompassed in the GLM framework, and therefore, corresponding GAM specifications can be derived. In all of them, the expected number of trips to the site under valuation can be related to the explanatory variables (which include the cost of traveling to the site, ci ) via a log link function,
[25] E(Ri) = exp[α1 x1,i + ... + αh xh,i + sh+1(xh+1,i) + ... + sk(xk,i)],

which defines an exponential demand function. This ensures that the expected demand for visits is always nonnegative. Recalling equation [5], we can calculate the WTP for access as the area under the expected demand curve. Since in the exponential demand function the choke price ci* is infinite, the consumer surplus for access can be written as
[26] WTPi = ∫_{ci}^{∞} E(Ri | c, xi) dc,

where E(Ri | c, xi) is the expected demand [25] evaluated at travel cost c. Unlike in the standard parametric case, this integral does not typically have a closed form and can be computed via numerical methods such as adaptive quadrature (e.g., Piessens et al. 1983). Confidence intervals can be derived using the Bayesian approach, as described in the previous section. Note that, in this case, the procedure will be more computationally intensive, since each Monte Carlo draw from the posterior distribution also requires the computation of a numerical integral. Finally, note that this model specification does not ensure that the integral will have a finite value, but neither does a parametric specification. Such an occurrence should normally be taken as an indication of poorly collected data or faults in the empirical specification, rather than a methodological flaw. Given their flexibility, GAMs should be more likely to detect these issues than standard parametric methods.
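For comparison (a standard textbook result, recalled here as an illustration rather than taken from the original derivation), the parametric case is simple precisely because the log demand is linear in the travel cost: with travel-cost coefficient βc < 0,

\[
\int_{c_i}^{\infty} \exp\left(x_i'\beta + \beta_c c\right) dc
 \;=\; \frac{\exp\left(x_i'\beta + \beta_c c_i\right)}{-\beta_c}
 \;=\; \frac{E(R_i)}{|\beta_c|},
\]

whereas replacing the term βc c with a smooth s(c) generally leaves no such simplification, so the integral in [26] must be evaluated numerically.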
V. Implementing GAMs for Nonmarket Valuation
This section illustrates the insights that GAMs can provide in applied environmental economics by presenting two empirical applications: DCCV for landfill site closure in the province of Siena, Italy, and SSRD analysis for polychlorinated biphenyls (PCB) contamination in the New Bedford area in Massachusetts (McConnell 1986; Haab and McConnell 2002).
In both datasets, distance to the improved site is likely to be an important factor in determining respondents' WTP. While economic theory implies a monotonic relationship between WTP and distance, the specific shape of this relationship is an open empirical question (for discussions see, e.g., Loomis 2000; Bateman et al. 2006). Therefore, the study of such distance effects provides an interesting example of how GAMs can be used to model unknown relations without imposing restrictive parametric assumptions.
Both SSRD and DCCV data analyses are implemented with the freely available software R (R Development Core Team 2008). GAMs estimation via penalized maximum likelihood and UBRE/GCV is based on the library mgcv (Wood 2006a). All the code relative to the WTP estimation is available from the authors on request.
CV for Landfill Site's Closure: The Province of Siena Study
The CV data refer to a survey aimed at valuing a new waste management plan designed to increase waste recycling and decrease the number of landfill sites in the province of Siena, Italy. The aim of the research was to estimate welfare measures related to different waste management options, including recycling, landfill, and incineration. In this study we focus on the environmental benefits due to landfill closure. The survey was administered face to face in the year 2000, and a full description is reported by Basili, Di Matteo, and Ferrini (2006). The data analyzed here refer to 339 individuals. Respondents were asked if they were willing to pay an additional amount ti on their annual garbage bill to improve the quality of the waste management and shut down six of the eight landfill sites operating at that time in the province.
While the original study does not consider the distance from the respondent's home to the landfill site, we may expect a significant relationship between this variable and the WTP for landfill closure. Landfills are, in fact, commonly considered as a source of nuisance. Negative externalities include noise, operation of heavy equipment, flies and other insects, rats, odor, and appearance. Furthermore, evidence of a negative effect of landfill sites on nearby property values is widely documented (e.g., Nelson, Genereux, and Genereux 1992). In this empirical application we investigate the shape of the distance-WTP relation by using GAMs.
The explanatory variables included in our analysis are offered bid (t), road distance from the respondents' town to the nearest landfill expected to close (dist), and annual household income (inc). All monetary values are reported in Italian lire (₤), the currency used in Italy at the time the survey was conducted. For comparison, ₤10,000 corresponds roughly to €5. Descriptive statistics are reported in Table 1. The data are characterized by a considerable variability and, therefore, provide an interesting case study to explore possible nonlinearities in both the income and distance effects on respondents' WTP elicited via the DCCV framework. Tables 2 and 3 list the percentage of positive responses to the CV question for the different bid levels and for different distances to the landfill site. The data are consistent with our expectations: higher bids and longer distances to the site under improvement correspond to lower percentages of “yes” responses.
Since the expected nonmarket values in this analysis are generally low relative to the respondents' incomes, we specify WTP as in equation [21], that is, as
[27] WTPi = exp[α0 + f(disti) + g(inci) + εi],

where εi is the stochastic component. We analyze respondents' answers following the model in equations [2] and [3] and contrast three possible parametric specifications for f(⋅) and g(⋅) (linear, quadratic, and cubic) against the GAM approach.6 All the models are estimated assuming that εi is logistically distributed, and therefore, the linear version is equivalent to the standard binomial logit model. The GAM has a Bernoulli log-likelihood and a logit link function.
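A sketch of how these competing specifications might be coded in mgcv follows; the simulated data frame siena and the variable names (yes, bid, dist, inc) are placeholders for the actual survey data, and the log-bid regressor follows from the exponential WTP model [21]–[22].

# Sketch: parametric logits vs. GAMs for the DCCV data (placeholder data and variable names).
library(mgcv)
set.seed(1)
siena <- data.frame(bid  = sample(c(5, 10, 20, 40) * 1000, 339, replace = TRUE),
                    dist = runif(339, 0.5, 60),
                    inc  = runif(339, 20, 80))
siena$yes <- rbinom(339, 1, plogis(7 - 0.8 * log(siena$bid) - 0.06 * pmin(siena$dist, 20)))
# Parametric specifications for f(dist) and g(inc): linear, quadratic, cubic.
m.lin  <- gam(yes ~ log(bid) + dist + inc, family = binomial, data = siena)
m.quad <- gam(yes ~ log(bid) + dist + I(dist^2) + inc + I(inc^2), family = binomial, data = siena)
m.cub  <- gam(yes ~ log(bid) + poly(dist, 3, raw = TRUE) + poly(inc, 3, raw = TRUE),
              family = binomial, data = siena)
# GAMs: Model A smooths both dist and inc; Model B keeps income linear.
m.gamA <- gam(yes ~ log(bid) + s(dist, bs = "cr") + s(inc, bs = "cr"),
              family = binomial, data = siena, method = "GCV.Cp")
m.gamB <- gam(yes ~ log(bid) + s(dist, bs = "cr") + inc,
              family = binomial, data = siena, method = "GCV.Cp")
AIC(m.lin, m.quad, m.cub, m.gamA, m.gamB)
anova(m.lin, m.gamA, test = "Chisq")   # approximate LR test of the linear logit against the GAM
summary(m.gamA)                        # per-smooth edf: 1 means the effect was smoothed to linear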
The estimates of the parametric models are reported in the first three columns of Table 4. The explanatory variables are highly significant, and the signs are in line with our expectations. In the linear model the WTP increases with income and decreases with distance to the nearest closing landfill site. The quadratic and the cubic specifications investigate the presence of nonlinear relationships. While the choice between the linear and the quadratic is somewhat a matter of taste (likelihood ratio [LR] test with p-value 0.078), the cubic seems the preferred specification according to standard statistical criteria (LR p-value of 0.003 and lower Akaike information criterion [AIC]).
While the three models have similar median WTP calculated at the income and distance sample means (about ₤11,900, ₤11,600, and ₤10,000), they have very different distance decay functions.7 This is illustrated in Figure 1, which compares the estimated canonical parameters (which, in this specification, are equal to the linear predictor and, therefore, are linear functions of the model parameters) and median WTPs (as in equation [23]) in the three parametric models as functions of the distance to the nearest closing landfill site (the range of distances is chosen to match the one in the estimation sample). The effects of distance on the canonical parameter, in the top half of Figure 1, are very different from each other: both nonlinear specifications present nonmonotonic behaviors that are also reflected in the WTP, reported in the bottom half. These shapes look rather strange, since simple intuition would predict a nonincreasing relationship between WTP and distance. These unexpected estimates are clearly spurious, being the consequence of fitting rigid parametric forms to data that are, in fact, generated by a different relationship. Indeed, in order to fit the strong negative effect of distance on WTP in the first 10 km, the quadratic specification requires a positive effect beyond 40 km, whereas the cubic model estimates an odd S-shaped relation. Overall, while the statistical fits of the nonlinear specifications are superior to that of the linear one, their economic interpretations are somewhat more problematic.
An appealing strategy is to approximate the distance function using a more flexible specification. In this respect GAMs can provide significant insights. Their estimates, obtained by minimizing the UBRE score of the penalized likelihood, are reported in the last two columns of Table 4. In Model A, both the distance to the landfill site and income are included via smoothing functions. The effective degrees of freedom (edf) indicate the estimated “wiggliness” of each smoothing function. For example, an edf equal to one means that the best approximation of s(⋅) is linear. In our analysis, while there are significant nonlinearities in the distance decay to the landfill site, the P-IRLS/UBRE method completely smooths out the nonlinearities in the income effect, which is estimated to be linear (edf = 1). Therefore, the model flexibility is no longer imposed a priori but directly determined by the data. For comparison, Model B in Table 4 reports the estimates of a model that assumes the effect of inc to be linear and includes only dist via a smoothing function. The parameter estimates, with the exception of the constant, are basically the same as those in Model A, and, as indicated by the UBRE score and the WTP values, the fit of the two models is identical. Another advantage of GAMs is that the models can be tested against the linear specification with LR tests. In this example the approximate p-value strongly supports the GAM.
Finally, Figure 2 plots the effect of distance estimated by the GAM specification. The relationship with the canonical parameter is highly nonlinear. There is a strong negative effect for low values of distance (until approximately 15–20 km), which then levels out for the remaining range of the data. This relationship is even stronger in the WTP: individuals living very close to the landfill sites are, ceteris paribus, willing to pay three times more for its closure than those residing 10 km from it. Furthermore, for people living 20 km away or beyond, the WTP is not significantly affected by distance. The relationship is decreasing and monotone, consistent with intuition and with the results of previous studies.
SSRD and PCB Contamination: The Fort Phoenix Study
The data set used for our SSRD analysis refers to a survey conducted by McConnell (1986) to study the effect of PCB contamination in New Bedford, Massachusetts. Respondents were drawn from random phone surveys of households in the greater New Bedford area, and their trips to saltwater beaches were recorded. As in the textbook by Haab and McConnell (2002), we analyze the demand for trips to the beach at Fort Phoenix. We select a subsample of 223 respondents, corresponding to the individuals between 21 and 35 years old.8
We specify the recreational trips (Ri) to Fort Phoenix as a function of the round-trip travel costs to the site (CFP), the round-trip travel costs to a substitute site (C1), defined as the nearest beach to the respondent's home, and a dummy variable indicating whether the respondent has a parking pass for Fort Phoenix (PFP). The descriptive statistics are presented in Table 5.
We specify the expected value of number of trips to Fort Phoenix as
[28] E(Ri) = exp[α0 + α1 PFPi + s(CFPi, C1i)],

which leads to a Poisson model with a log link function. To take into account possible overdispersion, we also estimate the scale parameter (ϕ) by minimizing the GCV score. We consider three possible specifications for the bidimensional function of the two cost variables s(⋅,⋅): a standard, linear parametric form with interaction (s = b1 CFP + b2 C1 + b3 CFP C1), a GAM with two separate unidimensional smoothing functions to analyze nonlinear distance effects (s = s1(CFP) + s2(C1)), and a GAM with a joint bidimensional smoothing function to also capture nonlinear interaction effects (s = s(CFP, C1)).
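A sketch of the three specifications in mgcv follows; the simulated data frame nb and the variable names (trips, cfp, c1, pfp) are placeholders for the actual survey data.

# Sketch: the three count specifications for the trip data (placeholder data and names).
library(mgcv)
set.seed(1)
nb <- data.frame(cfp = runif(223, 0.5, 12), c1 = runif(223, 0.5, 8), pfp = rbinom(223, 1, 0.3))
nb$trips <- rpois(223, exp(1.2 - 0.25 * nb$cfp + 0.3 * pmin(nb$c1, 3.5) + 0.5 * nb$pfp))
# quasipoisson: the scale (overdispersion) parameter is estimated rather than fixed at 1.
m.par <- gam(trips ~ pfp + cfp * c1, family = quasipoisson, data = nb)
m.add <- gam(trips ~ pfp + s(cfp, bs = "cr") + s(c1, bs = "cr"),
             family = quasipoisson, data = nb, method = "GCV.Cp")
m.biv <- gam(trips ~ pfp + s(cfp, c1), family = quasipoisson, data = nb,
             method = "GCV.Cp")               # bidimensional (thin plate) smooth of both costs
summary(m.biv)                                # edf, estimated scale parameter, and GCV score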
Table 6 reports the three model estimates. In all specifications, the estimated scale parameter is greater than one, indicating overdispersion. Assuming ϕ = 1 would have led to biased inference, overestimating the statistical significance of the parameters. The parametric model, reported in the first column, estimates a significant effect for the travel cost to the site and for the parking pass dummy. However, while it has the correct sign, the effect of the cost to the substitute site (C1) is not significant at the 10% level. Compared with these estimates, both GAMs provide a significant improvement in the likelihood and in the inference. Both the cost to Fort Phoenix and the cost to the substitute site are now estimated to be significant and characterized by nonlinear effects. Furthermore, the better GCV score, AIC, and likelihood support the bidimensional GAM and the hypothesis of significant interaction effects.
Why does the GAM estimate a significant effect of C1 whereas the linear specification does not find any significant substitute effect? To answer this question, Figure 3 plots the expected number of visits to Fort Phoenix as a function of both CFP and C1, for both the linear model and the bidimensional GAM. The two estimated relationships for CFP, in the top half of the figure, are very similar to each other. Consistent with economic theory, both models predict the number of visits to decrease with the cost of visiting the site. On the other hand, considering the substitute's effect, in the bottom half of the figure, the models lead to different conclusions. The GAM estimates a nonlinear relation, with the number of visits to Fort Phoenix progressively increasing with the cost of visiting the substitute beach, until C1 reaches about $3.50. For higher costs, the estimated relationship is flat. In other words, C1 is important in determining the expected number of trips to Fort Phoenix only when its value is small. For values higher than $3.50, C1 does not present a significant effect. The parametric model is constrained to fit a linear function to this nonlinear behavior and, therefore, estimates a slope that lies between the steep curve below the threshold of $3.50 and the flat relationship above it. Overall, this “average” relation is not very strong, and it is not statistically significant according to a standard z-test. In this example, therefore, by imposing the assumption of linearity we would have erroneously concluded that the travel cost to the substitute beach does not have any significant effect on the number of visits to the site when, in fact, this effect is significant, but also nonlinear.
Recalling equation [26], we can calculate the WTP for access as the area under the expected demand curve, that is, as
[29] WTPi = ∫_{CFPi}^{∞} exp[α0 + α1 PFPi + s(c, C1i)] dc.

We compute this integral numerically, using the adaptive quadrature routine provided in R, which is based on the QUADPACK routines developed by Piessens et al. (1983). Figure 4 plots the estimated WTP for access as a function of the distance to the Fort Phoenix beach and its nearest substitute according to the GAM with the bidimensional smoothing function. Consistent with Figure 3, WTP decreases with the distance to the site and increases with the distance to the substitute. The relationships are generally monotonic and nonlinear. The interaction effect between the two distances seems to be substantial, suggesting that a failure to consider substitutes might generate a bias in the derived nonmarket values.
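A sketch of this welfare computation is given below, reusing the hypothetical bidimensional fit m.biv and the placeholder variable names from the previous sketch; integrate() is R's QUADPACK-based adaptive quadrature routine.

# Sketch: consumer surplus [29] as a numerical integral of the fitted demand curve.
demand <- function(cost, c1, pfp, fit) {
  newd <- data.frame(cfp = cost, c1 = c1, pfp = pfp)
  as.numeric(predict(fit, newdata = newd, type = "response"))  # expected trips at each cost level
}
cs_access <- function(cfp0, c1, pfp, fit) {
  # As noted in the text, finiteness of this integral is not guaranteed and should be checked.
  integrate(demand, lower = cfp0, upper = Inf, c1 = c1, pfp = pfp, fit = fit)$value
}
cs_access(cfp0 = 2, c1 = 3, pfp = 0, fit = m.biv)              # WTP for access at a given profile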
VI. Conclusions
This paper introduced GAMs for nonmarket valuation and WTP estimation for both revealed and stated preference valuation studies. We implement the penalized likelihood estimation method with automatic smoothing selection criterion proposed by Wood (2000, 2004). This technique presents various advantages with respect to both the traditional parametric models and the SP approaches recently introduced in the nonmarket valuation literature.
Compared with the parametric models, GAMs provide a systematic framework for testing more flexible functional forms without imposing any restrictive parametric assumption a priori. Equally important, the simpler linear specifications are nested in the GAMs and are automatically tested on the data during estimation. Compared with other SP approaches for SSRD and DCCV, the most important advantage of GAMs estimated by penalized likelihood with GCV/UBRE is that the degree of nonlinearity of the model is automatically determined by the data. This eliminates the complex and subjective task of determining the model flexibility a priori by incorporating it within the estimation process. This characteristic makes GAMs attractive for applied research. Furthermore, as illustrated in the SSRD case study, multidimensional smoothing functions of more than one predictor can be included without incurring severe computational difficulties. Moreover, since GAMs are generalizations of the standard parametric approaches developed in the literature, theoretically consistent WTP estimation can also be implemented via relatively straightforward modifications of the established routines. The downside is that, unlike some other SP approaches (e.g., Chen and Randall 1997; Creel and Loomis 1997), GAMs are not distribution-free estimators. However, the density function of the endogenous variables can be chosen among any of those belonging to the exponential family, allowing flexibility and comparison across different specifications. On this point, Abe (1999) shows that GAMs are remarkably robust to misspecifications in the error term. Another appealing feature is that GAM estimation does not require particularly large samples, as confirmed by our case studies, which are both based on fewer than 400 observations.
Of course, it should be noted that GAMs are no panacea. Poorly designed samples, unobserved variables, and other specification errors will certainly affect the performance of these estimators. GAMs use fewer a priori assumptions and rely more on the data than parametric models and, therefore, might be less robust to these issues. For example, GAMs perform well when the observed values of the exogenous variables modeled via smoothing functions are randomly distributed across the range of their possible values. However, the results can be much less satisfactory when those values are concentrated around a few points, leaving large portions of the data range “uncovered.” This lack of information typically will not allow GAMs to estimate meaningful smoothing relations. However, this is not an entirely undesirable property, because it promotes the detection of poor sampling strategies, and the potential resolution of these problems, rather than their avoidance.
Finally, while this paper presents GAMs for dichotomous choice and count data, this methodology can also be applied in other frameworks. GAMs can be implemented in any context in which theory fails to provide guidance on the functional forms describing the variables of interest and the statistical model can be written as a GLM, including linear and censored data. For example, this approach can be used for the estimation of hedonic price functions, for which, as pointed out by Ekeland, Heckman, and Nesheim (2004), there exists no theoretical justification for a parametric specification. We are optimistic that GAMs, coupled with the fast and stable estimation routines now available, can be helpful in supporting a more widespread use of flexible functional forms in applied studies. Additional work in this area may prove rewarding. In particular, it seems worthwhile to extend GAMs to multivariate distributions, such as the ones used in multisite RD and choice experiment applications.
Acknowledgments
We are grateful to the editor and two anonymous referees for their comments, which helped improve the quality of the paper. Many thanks to Kenneth McConnell for sharing with us the New Bedford travel cost data. We are also grateful to Ian Bateman, Cathy Kling, Joe Herriges, and the participants of the 11th Occasional Workshop on Environmental and Resource Economics at UCSB, to which a previous version of this paper was presented. All errors are the sole responsibility of the authors. This research was supported by the SEER project, which is funded by the UK ESRC (Reference RES-060-25-0063).
Footnotes
The authors are, respectively, researcher, Centre for Social and Economic Research on the Global Environment (CSERGE), School of Environmental Sciences, University of East Anglia, Norwich, U.K., and lecturer, Economics, University of Siena, Italy; and researcher, Centre for Social and Economic Research on the Global Environment (CSERGE), School of Environmental Sciences, University of East Anglia, Norwich, U.K.
↵1 Another approach is to specify a parametric functional form for the exogenous variables without imposing a specific density for the error term (for a review see Powell 1994). However, this strategy has not found many applications in environmental economics.
↵2 Other strategies have been proposed in the literature: for example, hurdle models (Mullahy 1986) and zero inflated Poisson models (Mullahy 1986; Haab and McConnell 1996). For a thorough illustration see, for example, Gurmu and Trivedi (1996) and Cameron and Trivedi (1998).
↵3 Note that smoothing functions cannot be applied to noncontinuous variables, such as categorical and dummy variables, which need to be included in a parametric form.
↵4 Note, however, that inference will only be approximate because the standard distributions used for testing do not take into account the smoothing parameter uncertainty. Discussion is provided by Wood (2006a, 194–96).
↵5 As in the standard parametric case, mean WTP does not have a closed-form solution, and numerical procedures can be used to integrate under the WTP function.
↵6 We also tested more general specifications with interaction terms. These parameters, however, were never significant and, therefore, are not discussed here.
↵7 These values roughly correspond to €6.15, €6.00, and €5.19.
↵8 We decided not to include the entire New Bedford data because we found evidence of variation in the parameters of distance among the different age groups, and we felt that discussing such a feature would lead us too far astray from the scope of this paper. This choice has the additional advantage of demonstrating the performance of GAMs in small samples.










