Convergent Validity and the Time Consistency of Preferences: Evidence from the Iowa Lakes Recreation Demand Project

DongGyu Yi and Joseph A. Herriges


We examine the consistency of preferences over time and as implied by revealed versus stated preference (contingent behavior) data. Studies in this area are typically based upon a single site and focus on consumer response to changes in travel cost or site access. We use data from the Iowa Lakes Project, which provides information on usage patterns over several years and for 132 lakes, along with detailed water quality data. This allows us to test for convergence in how individuals respond to changing site characteristics, as well as the consistency of preferences over time and between actual versus contingent trips. (JEL Q51)


The task of valuing environmental amenities is hampered by the lack of direct markets for such goods. To fill this void, researchers have turned to a variety of revealed preference (RP) and stated preference (SP) methods to infer the values of interest.1 For example, recreation demand models are often used to estimate the value of both sites as a whole and their individual attributes (e.g., water quality) by modeling individual visitation patterns as a function of travel cost and site characteristics. The logic is that individuals reveal information about the value they place on an environmental amenity by incurring travel costs to reach sites with the amenity. One practical problem with this approach is that there may be little variation among the available sites with the amenity of interest, making it difficult to tease out its marginal value to consumers. SP techniques can help to alleviate this problem. In particular, contingent behavior surveys ask individuals how they might shift their visitation patterns in response to changes in site access, travel costs, or individual site attributes. This provides not only the variation needed to identify the marginal impact of a particular amenity but, as noted by von Haefen and Phaneuf (2008), also avoids potential omitted variables bias through the random assignment of survey scenarios.

The potential problem with combining SP and RP data sources is that they may not be driven by the same data-generating process. One might, for example, be concerned that respondents to a contingent behavior survey have an incentive to overstate their additional trips to a site improved under a proposed policy scenario. By doing so, the respondent encourages policy makers to adopt the change, creating the option for future use of the improved site and, depending upon how the policy is to be paid for, incurring little or no direct cost.2 Problems can also emerge when the data sources differ in a temporal dimension. For example, anticipated trips for the coming year, even without changing conditions, can differ significantly from actual trips because respondents are overly optimistic about their future recreational activities (much like one might overstate future trips to the gym). Even comparisons of actual behavior over time may be problematic, as preferences evolve or are impacted by changes over time in unobservable individual or site attributes.

There have been a number of papers in the literature examining the convergent validity of valuations based on actual versus contingent behavior responses (e.g., Azevedo, Herriges, and Kling 2003; Grijalva et al. 2002; Loomis and Richardson 2006). However, most of these studies are based upon a single site and focus on consumer response to changes in either travel cost or site access. The purpose of this paper is to examine the convergent validity of nonmarket valuations (and the underlying preference parameters) along several dimensions, drawing on a unique database from the Iowa Lakes Project. The Iowa Lakes Project is a multiyear panel study of the usage patterns of Iowa households with regard to the 132 primary recreational lakes in the state. Of particular interest for this paper are the datasets collected in the 2004 and 2005 surveys. In 2004, households were asked to report on the following:

  • Actual single-day trips to each lake in 2004 (AT04)

  • Expected single-day trips to each lake in 2005 under current water quality conditions (ET05)

  • Expected single-day trips to each lake in 2005 contingent on water quality improvements to a subset of the lakes (CB05)

In 2005, these same households were asked to report on their actual single-day trips to each of the 132 lakes (AT05). The four datasets allow for a total of six pairwise convergent validity tests, including convergence between actual and expected trips under fixed water quality conditions (AT05 vs. ET05), convergence between expected trip responses with and without water quality improvements (CB05 vs. ET05), convergence between actual trips in differing years (AT04 vs. AT05), and convergence between actual and contingent behavior responses (AT05 vs. CB05).3 For each pairwise comparison, we examine convergence along three dimensions:

  • Convergence in individual parameters (such as the marginal utility of income)

  • Joint convergence in the parameter estimates

  • Convergence in the implied welfare measures for a series of policy scenarios


An extensive literature has emerged over the past two decades, both in combining SP and RP data (see, e.g., Cameron 1992; Whitehead et al. 2008) and in testing for the convergent validity of the two data sources (e.g., Carson et al. 1996; Carson and Hanemann 2005). While much of this literature has focused on contingent valuation as the source of the SP data, a number of studies draw on contingent behavior data. Adamowicz, Louviere, and Williams (1994), for example, combine the RPs implied by visitation patterns to 24 recreational sites with SPs elicited through a choice experiment. They find that the fundamental preferences in the two cases are similar and that combining the RP and SP data yields benefits in estimation. They conclude that the multicollinearity among quality attributes, which often plagues RP data sources, can be ameliorated through a well-designed SP survey.

Englin and Cameron (1996) combine observed behavior and contingent behavior responses to an increased travel cost to estimate the demand for recreational angling. Like Adamowicz, Louviere, and Williams (1994), they conclude that contingent behavior data can be a valuable supplement to observed data. Grijalva et al. (2002) test the convergent validity of contingent behavior trip data across three different levels of recreational site access: data before and after a policy restricting climbing access, and hypothetical changes in site access. They suggest that contingent behavior data can be a useful supplement to RP data when the range of quality changes envisioned by a policy proposal is historically unobserved. Loomis and Richardson (2006) compare observed and intended visitation behavior at Rocky Mountain National Park under climate change scenarios. They too find no statistical difference between the preferences implied by their RP and contingent behavior data.

Though the above literature supports the consistency between RP and SP data, other studies reject convergent validity. Adamowicz et al. (1997), for example, test consistency between observed and contingent behavior data in their moose hunting demand study and, at least for some modeling configurations, reject convergent validity. von Haefen and Phaneuf (2008), using the same data source and a mixed logit framework, also reject consistency between the implied RPs and SPs. Azevedo, Herriges, and Kling (2003) combine RPs and SPs under proposed higher trip costs using data from the Iowa Wetlands Survey. They test the hypothesis of consistency between the RP and SP data sources using four different definitions (degrees) of consistency and find that in all cases the hypothesis of consistency is rejected. Whitehead et al. (2010) consider consistency in the context of trips to beaches in southern North Carolina. They construct three models: a Kuhn-Tucker model, a single-equation count data model of RP-SP trips, and a count data demand system. Their results are mixed: the Kuhn-Tucker and RP-SP models exhibit convergent validity in trip prediction but not in the implied welfare effects, and the count system model fails convergent validity relative to both the Kuhn-Tucker and RP-SP models.

One of the limitations with most studies combining RP and contingent behavior data is that the actual trips data lack significant variation in the underlying site characteristics, either because little variation exists or because it is not observed, making it difficult to isolate the impact that these characteristics have on the RP trips. Indeed, this is one of the primary arguments for combining RP and SP data. The problem, of course, is that convergent validity tests in this setting are essentially limited to testing convergence in the implied marginal utility of income and cannot test for convergence in the marginal willingness to pay for site attributes. One exception is a study by Jeon and Herriges (2010), who draw on a portion of the Iowa Lakes data employed in this paper. Specifically, they develop a joint repeated mixed logit model of observed trip patterns in 2004 (AT04) and expected trips in 2005 under both baseline (ET05) and improved water quality (CB05) conditions. Within their structural model, they focus on two hypotheses. First, they test whether individuals respond differently to proposed water quality changes than they do to existing cross-sectional variation in water quality (comparing CB05 and ET05 trip choices). Second, they test whether households anticipate changes in their patterns of trips between 2004 and 2005 (comparing AT04 versus ET05). Both hypotheses are soundly rejected. In particular, households appear to be more responsive to actual water quality differences across sites than they are to proposed changes embodied in the SP questions.

There are three limitations of Jeon and Herriges’s (2010) paper that we seek to address here. First, Jeon and Herriges rely on a mixed logit framework to jointly model individual usage patterns from the disparate RP and SP data sources and, within this joint framework, test for convergent validity. The advantage of this approach is that it explicitly models the correlation between the different data sources (see Herriges and Phaneuf 2002). If the assumed correlation structure is correct, estimation of the joint model will yield efficiency gains. The downside of the approach is its lack of robustness: any subsequent consistency test is conditional on the assumed correlation structure. If the assumed correlation structure is incorrect, then the corresponding convergent validity tests will be incorrect as well. We instead rely on a bootstrapping approach so as to remain agnostic as to the form of any RP/SP correlations. In doing so, we potentially sacrifice efficiency for the sake of robustness.4 Second, Jeon and Herriges do not control for potential unobserved site-specific factors influencing RP and SP choices. Following Murdock (2006), we incorporate a full set of alternative specific constants (ASCs) into our random utility model, using a second-stage regression to disentangle the role that site-specific attributes might have on these ASCs. Third, unlike Jeon and Herriges, we make use of observed trip behavior in 2005 (i.e., AT05) to examine the consistency of preferences over time.


The primary data source for this paper is the Iowa Lakes Project, a multiyear panel data study jointly funded by the Iowa Department of Natural Resources and the U.S. Environmental Protection Agency. The project was designed to track the annual visitation patterns of a randomly selected sample of Iowa households to the 132 primary recreational lakes in the state. Understanding how travel costs to and attributes of the sites impact where households choose to recreate allows one to infer the value placed both on the sites themselves and on individual site characteristics. Moreover, the panel nature of the dataset provides additional opportunities to control for unobserved factors influencing recreation demand.

The original project was designed to run for four years, 2002 through 2005, though subsequent funding from the Iowa Department of Natural Resources and the U.S. Environmental Protection Agency has allowed for additional surveys in 2009 and 2014. The survey instruments have varied from year to year in order to incorporate topics of interest to both the funding agencies and the project’s principal investigators. For example, some years included contingent valuation questions, while others focused on better understanding the attitudes of Iowa residents toward water quality or potential restoration projects. However, a common core was included in all of the surveys, with a section eliciting the numbers of trips to the primary recreation lakes in the state and a section gathering sociodemographic data.

This paper draws on data from the 2004 and 2005 surveys.5 The primary reason for this is that the 2004 survey included a unique contingent behavior component. In addition to asking households how often they visited each of the 132 primary lakes in the state (AT04), respondents were asked how many trips they anticipated taking to these same lakes in 2005 under current water quality conditions (ET05), as well as how many trips they would take given improvements to a subset of the lakes (CB05). Both the current and improved water quality conditions were conveyed using a water quality ladder (Figure 1). The water quality ladder, as described by Mitchell and Carson (1989), characterizes water quality in terms of allowable uses (boatable, fishable, swimmable, etc.). In introducing the proposed water quality changes, the survey included the following explanation of the water quality ladder:

The top of the water quality ladder stands for the best possible quality of water, and the bottom of the ladder stands for the worst. On the ladder you can see the different levels of water quality. For example: The lowest level is so polluted that it has oil, raw sewage, and/or other things in it like trash; it has almost no plant or animal life, smells bad, and contact with it is dangerous to human health. Water quality that is “boatable” would not harm you if you happened to fall into it for a short time while boating or sailing. Water quality that is “fishable” is a higher level of quality than “boatable.” Although some kinds of fish can live in boatable water, it is only when water is “fishable” that game fish like bass can live in it. Finally, “swimmable” water is of a high enough quality that it is safe to swim in and ingest in small amounts.

The proposed water quality improvements were then described as increasing the water quality throughout the state to be at least swimmable (a 7 on the water quality scale). Specifically, 52 lakes would be improved to swimmable, while the remaining lakes would remain unchanged.

Following the description of the policy scenario, the 2004 survey elicited actual trips in 2004 and expected trips in 2005 under current water quality conditions. As illustrated in Figure 2, which depicts a portion of the trip spreadsheet, the current water quality ladder rating for each lake was provided as an integer rating from 0 to 10, with the water quality ladder itself repeated on each page of the trip spreadsheet. The water quality ladder ratings under the proposed policy were also provided on a lake-by-lake basis, and respondents were asked to indicate their anticipated numbers of trips. What is unique about this contingent behavior study is that it not only involves changes to a series of lakes (whereas most contingent behavior studies focus on a single site), but also gathers contingent trips to both the changed and the unchanged sites. As Jeon and Herriges (2010) note, the responses exhibit the anticipated pattern of households, on average, reporting additional trips to the improved sites and fewer trips to (i.e., substituting away from) the sites that were not improved. Data from the 2005 survey are also included in our analysis, as they provide actual trip information for 2005 (AT05) that can be compared to both the prior year’s actual trips (AT04) and the expected trips (ET05) reported in 2004 for 2005.


Figure 2: Excerpt from 2004 Survey

Both survey instruments were administered via direct mail using components of the Dillman approach. Specifically, in both instances an initial mailing was followed up with a postcard reminder after approximately two weeks, and a second copy of the survey instrument was mailed if no response had been received after four weeks. A $10 incentive was provided upon returning a completed survey. The 2004 survey was mailed to the 5,206 Iowans who had completed the 2003 survey, yielding a total of 4,242 responses.6 The 2005 survey was then mailed to these respondents, yielding 3,993 completed surveys.

In analyzing the data, we restrict the sample along three dimensions. First, we exclude households taking more than 52 trips (i.e., more than one trip per week on average) in a given year. The rationale for this restriction is that our interest is in trips taken explicitly for the purpose of recreation. Examination of the data for households taking more than 52 trips in a year indicates that these individuals typically live close to a particular lake. The concern is that these individuals are simply passing by their hometown lake while commuting to or from work. This restriction reduces the 2004 and 2005 samples by approximately 3% and 5%, respectively. Second, we employ a balanced panel, requiring responses for all four trip types (AT04, AT05, ET05, and CB05). This reduces the sample to 2,150 observations.7 Third, and finally, we focus our attention on the 100 most frequently visited sites. We do this for two reasons. First, as described below, we employ a bootstrapping procedure to account for potential correlation across the alternative trip categories. Specifically, in constructing the bootstrapped samples, we select an individual and include all of that person’s trip data. For infrequently visited sites, a given bootstrapped sample may not include any visitors to the site, making it impossible to estimate the ASC for that site. Restricting our attention to the 100 most frequently visited sites eliminates this problem in practice. Second, the bootstrapping procedure is computationally intensive, and focusing on the 100 most frequently visited sites alleviates this burden as well. As a practical matter, the 100 most frequently visited sites account for roughly 98% of the actual trips reported in the 2004 and 2005 Iowa Lakes surveys.

The second data source used in this paper is provided by Iowa State University’s Limnology Lab. A unique feature of the Iowa Lakes Project is that it ran parallel to a five-year Limnology Lab study monitoring the water quality of the 132 primary recreational lakes three times a year from 2001 to 2005. The lab gathered 13 distinct measures of water quality, including measures of Secchi transparency, total phosphorus, and total nitrogen used in the current study. Of particular importance is the fact that Iowa lakes vary significantly along these water quality metrics, with some of the cleanest and dirtiest lakes in the country. Secchi transparency, for example, which measures how far into the lake water one can see, ranges from 0.3 m (less than 1 ft) to over 6 m (or nearly 20 ft). In addition to water quality variables, the Iowa Department of Natural Resources provided information on site characteristics for each lake, including lake size and a series of dummy variables indicating the presence of a boat ramp, wake restrictions, and handicap facilities and whether the lake had an associated state park.

Finally, travel cost plays an important role in recreation demand models. In the current paper, the travel cost ($C_{ij}$) for individual i to visit site j is calculated using the following formula:

$$C_{ij} = f \cdot d_{ij} + \omega_i \cdot t_{ij}, \tag{1}$$

where f denotes the fuel cost per mile, $d_{ij}$ and $t_{ij}$ denote the round-trip travel distance and time, respectively, required for individual i to visit site j, and $\omega_i$ denotes individual i's opportunity cost of time. In this paper, PCmiler was used to compute both trip distance and time.8 The CPI-adjusted gasoline price (dollars/gallon) divided by the average fuel efficiency of U.S. light-duty vehicles (miles/gallon) is used as the fuel cost (dollars/mile).9 For the opportunity cost of time, we use one-third of the household's wage rate, where the wage rate is calculated as the CPI-adjusted household income divided by 2,000.
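As a concrete illustration, the travel cost calculation in equation [1] can be sketched as follows. The gasoline price and fuel efficiency values below are placeholders for illustration, not the paper's CPI-adjusted figures.

```python
def travel_cost(round_trip_miles, round_trip_hours, income,
                gas_price=2.0, mpg=20.0):
    """Travel cost per eq. [1]: C_ij = f * d_ij + w_i * t_ij.

    f   : fuel cost per mile (gasoline price / fleet fuel efficiency)
    w_i : opportunity cost of time, one-third of the implied wage
          (annual household income / 2,000 hours)
    gas_price and mpg are placeholder values, not the paper's
    CPI-adjusted figures.
    """
    fuel_cost_per_mile = gas_price / mpg   # f, dollars per mile
    wage = income / 2000.0                 # implied hourly wage
    opportunity_cost = wage / 3.0          # w_i, dollars per hour
    return (fuel_cost_per_mile * round_trip_miles
            + opportunity_cost * round_trip_hours)

# A 200-mile, 4-hour round trip for a $60,000 household:
# fuel component 0.10 * 200 = 20; time component (30 / 3) * 4 = 40
cost = travel_cost(200, 4, 60000)
```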

Table 1 provides summary statistics for the final database, including average numbers of trips by trip category. The average annual RP trips per person are similar in 2004 (AT04) and 2005 (AT05), with less than a 4.5% increase in trips between the two years. There is, however, a substantial difference (over 25%) between the actual trips taken in 2005 (AT05) and those anticipated in 2004 for 2005 (ET05). This is consistent with the notion that individuals are overly optimistic regarding their future recreation activity, much like one might be overly optimistic about one’s use of a gym membership. Comparing expected 2005 trips under current conditions (ET05) to those under hypothetically improved conditions (CB05), we see that total trips indeed increase with the improvement, though only by a modest 1.5%. These comparisons, of course, potentially mask differences in the patterns of trips across the data sources. We capture these differences below by comparing nested logit models of the various data sources.


Table 1: Summary Statistics (N = 2,150)


The basic model estimated for each of the data types is the repeated nested logit (RNL) model originally proposed by Morey, Rowe, and Watson (1993). Within a random utility maximization framework, the RNL model assumes that individuals choose from among J + 1 alternatives on a series of T choice occasions, where J denotes the number of sites in the choice set (lakes in our case), with the outside (“stay at home”) option available on each choice occasion. In our application T = 52, corresponding to an average of one choice occasion each week. The utility that individual i receives from choosing alternative j on choice occasion t for data type v (v = AT04, AT05, ET05, and CB05) is assumed to take the form

$$U_{ijtv} = \begin{cases} Z_{iv}\phi_v + \epsilon_{i0tv}, & j = 0, \\ \alpha_v + \tau_v C_{ijv} + X_{jv}\beta_v + W_{ijv}\gamma_v + \xi_{jv} + \epsilon_{ijtv}, & j = 1, \dots, J, \end{cases} \tag{2}$$

where j = 0 denotes the stay-at-home option, $Z_{iv}$ denotes individual characteristics, $C_{ijv}$ denotes travel cost, $X_{jv}$ denotes observed site attributes, and $W_{ijv}$ denotes interactions between individual characteristics and site attributes. A key feature of the model is the inclusion of $\xi_{jv}$, capturing site characteristics that are unobserved by the analyst, but assumed known to the decision-maker. This is the model structure advocated by Murdock (2006). Unobservable site attributes are realistic possibilities in this setting, as analysts typically have relatively few observable site attributes available. As Murdock demonstrates in her paper, allowing for these unobservables is important from an econometric perspective for two reasons. First, because the unobservables are potentially correlated with $C_{ijv}$, $X_{jv}$, and/or $W_{ijv}$, it is important to control for them in estimation so as to avoid potential omitted variables bias, particularly for the key travel cost parameter ($\tau_v$). Second, Murdock demonstrates that estimating the model in [2] without controlling for the possibility of unobserved site attributes can significantly understate the estimated standard errors for the observed site attribute parameters (i.e., $\beta_v$).

A cost of controlling for the unobservable factors is that a two-stage estimation approach is required to recover all of the parameters of the model in [2]. This can be seen by rewriting [2] as

$$U_{ijtv} = \delta_{jv} + \tau_v C_{ijv} + W_{ijv}\gamma_v + \epsilon_{ijtv}, \quad j = 1, \dots, J, \tag{3}$$

where

$$\delta_{jv} = \alpha_v + X_{jv}\beta_v + \xi_{jv} \tag{4}$$

is an ASC associated with site j. The parameter $\delta_{jv}$ captures how both observed and unobserved factors impact the utility received by visiting site j. As described in more detail in the next section, estimation of the parameters of the model proceeds in two stages: Stage 1 estimates the parameters (including the ASCs) in [3], and Stage 2 recovers the parameters $\alpha_v$ and $\beta_v$ by estimating equation [4] using the fitted ASCs from Stage 1. Any concerns regarding correlation between $X_{jv}$ and $\xi_{jv}$ are isolated to the second-stage regression (protecting the Stage 1 parameters from such bias) and can be dealt with, as necessary, using instrumental variables techniques. It is important to note that the two-stage estimation process does not alter the role of the main site attributes (as captured by $X_{jv}\beta_v$), but simply alters the process by which the relevant parameters are recovered.
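The Stage 2 step in equation [4] is an ordinary least squares problem: regress the fitted ASCs on observed site attributes. A minimal sketch with simulated data; the attribute values, coefficients, and noise scale are made up for illustration and do not reflect the paper's estimates.

```python
import numpy as np

# Stage 2 sketch (eq. [4]): delta_j = alpha + X_j * beta + xi_j,
# where delta_j are the fitted ASCs from Stage 1. All values below
# are simulated for illustration only.
rng = np.random.default_rng(0)
J = 100                                   # number of sites
X = rng.normal(size=(J, 3))               # site attributes (e.g., ST, TP, TN)
xi = rng.normal(scale=0.1, size=J)        # unobserved site attributes
true_alpha, true_beta = 0.5, np.array([1.0, -0.5, 0.2])
delta = true_alpha + X @ true_beta + xi   # stand-in for fitted ASCs

# OLS of the ASCs on a constant and the site attributes
design = np.column_stack([np.ones(J), X])
coef, *_ = np.linalg.lstsq(design, delta, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
```

In the paper, any correlation between the attributes and the error term in this regression would be handled with instrumental variables; the sketch above assumes exogenous attributes.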

The final step in the model specification is the choice of the underlying distribution for the idiosyncratic error terms, that is, the $\epsilon_{ijtv}$. We employ a common specification in the literature, assuming that the error vectors $\epsilon_{itv} \equiv (\epsilon_{i0tv}, \epsilon_{i1tv}, \dots, \epsilon_{iJtv})$ are independent and identically distributed across individuals and choice occasions and drawn from a generalized extreme value distribution inducing a nested logit structure in which the J lake sites form a nest and the stay-at-home option forms a singleton nest. The corresponding probability that individual i will choose alternative j on choice occasion t for data type v becomes

$$P_{ijv} = \begin{cases} P_{i0v}, & j = 0, \\ P_{iTrip,v} \cdot P_{ij|Trip,v}, & j = 1, \dots, J, \end{cases} \tag{5}$$

where

$$P_{ij|Trip,v} = \frac{\exp(V_{ijv}/\theta_v)}{\sum_{k=1}^{J}\exp(V_{ikv}/\theta_v)}, \qquad P_{iTrip,v} = \frac{\exp(\theta_v I_{iv})}{\exp(V_{i0v}) + \exp(\theta_v I_{iv})}, \qquad I_{iv} \equiv \ln\sum_{k=1}^{J}\exp(V_{ikv}/\theta_v), \tag{6}$$

$V_{ijv}$ denotes the nonstochastic portion of the utility in [3], and $I_{iv}$ denotes the inclusive value for the trip nest.

$P_{i0v}$ denotes the probability that individual i chooses to stay at home on any given choice occasion for data type v, $P_{iTrip,v} = (1 - P_{i0v})$ denotes the corresponding probability that the individual takes a trip to one of the sites, and $P_{ij|Trip,v}$ denotes the probability that this person chooses to visit site j conditional on having decided to take a trip. The dissimilarity coefficient, $\theta_v \in (0,1]$, determines the correlation among the $\epsilon_{ijtv}$'s for the trip alternatives (i.e., j = 1, ..., J) on a given choice occasion. As $\theta_v$ shrinks toward zero, the correlation among these alternatives increases and the alternatives become more similar in terms of the utility provided. The model reduces to a standard logit structure if $\theta_v = 1$. The contribution of individual i to the log-likelihood function for data type v becomes

$$l_{iv} = \sum_{j=0}^{J} n_{ijv} \ln P_{ijv}, \tag{7}$$

where $n_{ijv}$ denotes the number of times that individual i chooses alternative j for data type v.
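The nested logit choice probabilities and the individual log-likelihood contribution can be sketched as follows. This is an illustrative implementation of equations [5] through [7], not the authors' estimation code.

```python
import math

def nested_logit_probs(v_home, v_sites, theta):
    """Choice probabilities for the two-level nest of eqs. [5]-[6]:
    the J lakes share one nest with dissimilarity coefficient theta,
    while the stay-at-home option is a singleton nest. v_home and
    v_sites hold the nonstochastic utilities V_i0v and (V_i1v, ..., V_iJv)."""
    inclusive = math.log(sum(math.exp(v / theta) for v in v_sites))  # I_iv
    p_trip = (math.exp(theta * inclusive)
              / (math.exp(v_home) + math.exp(theta * inclusive)))
    denom = sum(math.exp(v / theta) for v in v_sites)
    # P_ijv = P_trip * P_j|trip for each lake j
    p_sites = [p_trip * math.exp(v / theta) / denom for v in v_sites]
    return 1.0 - p_trip, p_sites  # (P_i0v, site probabilities)

def log_lik_i(trips, v_home, v_sites, theta):
    """Individual i's contribution to eq. [7]: sum_j n_ijv * ln(P_ijv),
    where trips = (n_i0, n_i1, ..., n_iJ)."""
    p_home, p_sites = nested_logit_probs(v_home, v_sites, theta)
    return trips[0] * math.log(p_home) + sum(
        n * math.log(p) for n, p in zip(trips[1:], p_sites))
```

When theta equals 1 the expressions collapse to a standard (unnested) logit, which provides a convenient sanity check.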

Two versions of the model are reported below. In Version A, water quality is captured via a single water quality measure, namely, the water quality index, WQI. A site’s WQI enters the utility received from visiting that site both as a main effect (i.e., WQI is one of the site attributes in Xjv in equation [2]) and as interacted with demographic characteristics, with Wijv = WQIjv · Ziv in equation [2].10 In Version B, WQI is replaced with three water quality attributes: Secchi transparency (a measure of how far into the water one can see), total phosphorus (TP), and total nitrogen (TN). Secchi transparency (ST) is particularly appealing as an indicator of water quality given that it is readily apparent to recreators. All three water quality measures appear as main effects (i.e., all three are included in Xjv). However, only Secchi transparency is interacted with sociodemographic characteristics (i.e., Wijv = STjv · Ziv in equation [2]).


The separate estimation of the RNL model for each of the data types based on the log-likelihood function in [7] is a straightforward task but would ignore two sources of correlation inherent in the data. First, within the individual data type models, the structure ignores potential correlations across choice occasions for the same individual. As a result, traditional maximum likelihood estimation standard errors would overstate the precision with which the parameters are estimated. One approach to this concern would be to explicitly model such correlation through the inclusion of random parameters shared across choice occasions, as was done by Jeon and Herriges (2010). The downside of this approach is that it relies on specific structural assumptions regarding the form of the correlation. Alternatively, one could cluster the standard errors at the individual level (e.g., Wooldridge 2003), allowing for a more general form of correlation across choice occasions.

Second, since the four data types are drawn from responses by the same set of individuals, there is also potential correlation across the data types. This correlation is particularly important, since it will induce correlation among the parameter estimates for the different data types, which will in turn impact any test for differences across the implied data type preferences. In this case, the random parameters option is again available but would suffer from the additional problem of requiring joint estimation of the models for the four data types. Clustering the standard errors across data types in a pooled data model would control for such correlation in constructing the standard errors for the estimated parameters. However, it would not provide an estimate of the correlation itself. Such correlation estimates are essential for testing convergence between any two data sources.

We instead account for both sources of correlation by using a bootstrapping procedure. Specifically, B = 1,000 bootstrap samples were generated by drawing N = 2,150 individual respondents with replacement from the original sample. Bootstrap sample b is represented by the index set $S^b = \{i_1^b, \dots, i_N^b\}$, where $i_n^b$ denotes the number of the respondent chosen as the nth observation in the bth bootstrap sample, with b = 1, ..., B. These respondents were then used to create bootstrapped datasets for each data type (denoted by $D_v^b$, b = 1, ..., B; v = AT04, AT05, ET05, and CB05). The datasets $D_v^b$ consist of the trips vectors $n_{i \cdot v} \equiv (n_{i0v}, n_{i1v}, \dots, n_{iJv})$, the travel cost vectors $C_{i \cdot v} \equiv (C_{i1v}, \dots, C_{iJv})$, and the vectors of individual characteristics ($Z_{iv}$) stacked for each i in the index set $S^b$ (including duplicates). Note that any correlation across choice occasions and data types induced by having the same individuals providing survey responses is directly reflected in the bootstrapped samples' data-generating process.

With the bootstrapped datasets in hand, we then estimate the parameters of the RNL model for each bootstrapped sample and each data type. The advantage of this approach is that we can estimate the models for the data types separately, while capturing correlation in the parameters across data types through the bootstrap’s data generating process and without imposing a specific structure on the correlation patterns.11
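The resampling step can be sketched as follows. The function name and layout are our own, but the logic (drawing whole respondents with replacement, so that all four trip records for a respondent travel together) follows the description above.

```python
import numpy as np

def bootstrap_indices(n_respondents, n_samples, seed=0):
    """Draw B bootstrap samples of respondents with replacement,
    mirroring the index sets S^b = {i_1^b, ..., i_N^b}. Because each
    draw indexes a whole respondent, that person's AT04, AT05, ET05,
    and CB05 records move together, preserving correlation across
    data types in the bootstrap's data-generating process."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_respondents, size=(n_samples, n_respondents))

idx = bootstrap_indices(n_respondents=2150, n_samples=1000)
# Each row indexes one bootstrapped dataset; to build D_v^b, apply the
# same row of indices to every data type, e.g. (pseudocode):
#   data_b = {v: trips[v][idx[b]] for v in ("AT04", "AT05", "ET05", "CB05")}
```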

Estimation of the model for a given bootstrap sample proceeds in two stages (as described by Murdock [2006]). In Stage 1, the parameters of the model in equation [3] — the ASCs ($\delta_{jv}$), the travel cost parameter ($\tau_v$), the interaction parameters ($\gamma_v$), and the dissimilarity coefficient ($\theta_v$), collectively denoted $\hat\Gamma_v^b$ — are estimated via maximum likelihood. The ASCs absorb any potential unobserved site characteristics (the $\xi_{jv}$'s), insulating the key travel cost parameter from potential omitted variables bias caused by such factors. A second-stage estimation is used to recover the parameters in [4] (i.e., $\hat\alpha_v^b$ and $\hat\beta_v^b$) for each bootstrapped sample. The ASCs from Stage 1 can be thought of as capturing the overall “appeal” of the individual sites. This second-stage regression then examines the correlation between this appeal of a site and site attributes, such as site facilities (e.g., boat ramps) and lake water quality. To reflect the sampling variability due to the sites included in the model, the second stage includes a second bootstrapping step in which the sites are randomly selected with replacement for this regression.


Drawing on the results from the bootstrapping process, we consider five broad categories of hypothesis tests. First, we can examine the consistency of individual parameters across data types. For example, in the case of the travel cost parameter, one can test the hypothesis that the travel cost parameter is the same for data types v and v′:

$$H_0^A: \tau_v = \tau_{v'}. \tag{8}$$

We conduct these tests employing percentile intervals (Efron and Tibshirani 1993) constructed using the paired parameter differences from the bootstraps. For example, for the travel cost coefficient, the 90th percentile interval is constructed by selecting the 50th and 950th highest values from the ordered set of differences τvb − τv′b, b = 1, . . . , B. The hypothesis H0A is rejected at the 90% confidence level if the 90th percentile interval does not contain zero. Similar tests are conducted at the 95% and 99% confidence levels.
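The percentile-interval test is straightforward to implement. A minimal sketch (with an illustrative function name) follows:

```python
import numpy as np

def percentile_interval_test(diffs, level=0.90):
    """Percentile-interval test (Efron and Tibshirani 1993) of H0: a
    parameter is equal across two data types, based on the B paired
    bootstrap differences (e.g., travel cost coefficient differences).

    With B = 1,000 and level = 0.90, the endpoints are the 50th and
    950th values of the ordered differences; H0 is rejected when the
    interval excludes zero."""
    d = np.sort(np.asarray(diffs, dtype=float))
    B = len(d)
    k = max(int(round(B * (1.0 - level) / 2.0)), 1)
    lower, upper = d[k - 1], d[B - k - 1]
    reject = (lower > 0.0) or (upper < 0.0)
    return lower, upper, reject
```

Running the same routine with level = 0.95 or 0.99 reproduces the tests at the other confidence levels.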

Second, joint hypotheses of the equality of parameters across data types can be tested using Wald statistics. For example, let Θvb denote the vector of first-stage parameter estimates for data type v using bootstrap sample b. One can test the joint hypothesis that this parameter vector is the same across data types v and v′, that is,

H0B: Θv = Θv′, [9]

using a standard Wald test with the Wald statistic

W = d̄′V̂d⁻¹d̄, [10]

where

d̄ ≡ (1/B)∑b=1,...,B (Θvb − Θv′b) [11]

denotes the vector of mean paired differences in parameters and

V̂d [12]

denotes the variance-covariance matrix of this paired difference across the B bootstrapped samples.
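Equations [10] through [12] can be sketched directly from the bootstrap draws. The sketch below is a hypothetical implementation under the stated formulas, with illustrative names:

```python
import numpy as np

def wald_test(theta_v, theta_vp):
    """Wald statistic for H0: first-stage parameter vectors are equal
    across two data types.

    theta_v, theta_vp: (B, K) arrays, row b holding the bootstrap-b
    parameter estimates for each data type. Following equations
    [10]-[12], W = dbar' Vd^{-1} dbar, where dbar is the mean paired
    difference and Vd its covariance across the B samples. Under H0,
    W is compared with a chi-squared critical value with K degrees
    of freedom."""
    d = np.asarray(theta_v) - np.asarray(theta_vp)   # paired differences
    dbar = d.mean(axis=0)
    Vd = np.atleast_2d(np.cov(d, rowvar=False))      # (K, K) covariance
    W = float(dbar @ np.linalg.solve(Vd, dbar))
    return W, d.shape[1]
```

The returned degrees of freedom equal the number of restrictions, i.e., the length of the parameter vector being compared.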

The limitation of hypothesis tests such as H0A and H0B is that they ignore potential changes in the scale of the unobservable factors across data types, that is, changes in the variance of the 𝜖ijtv’s in equation [2]. To allow for changes in scale, the model parameters relative to the travel cost parameter are examined, both individually and jointly.12 For example, the corresponding test for the ASC for site 1 would take the form

H0C: δ̃1v = δ̃1v′, [13]

where

δ̃1v ≡ δ1v/τv [14]

denotes the ASC for site 1, data case v, relative to the corresponding travel cost parameter. The necessary percentile intervals can be constructed in a fashion analogous to those used to test the hypotheses H0A, only now using parameter ratios rather than their original levels. Likewise, joint hypothesis tests (H0D) are conducted using Wald statistics constructed from vectors of parameter ratios.

There is a potential concern with using a Wald test in the context of parameter ratios, since they are unlikely to be normally distributed.13 As an alternative perspective on the problem, however, one can reparameterize the model in equation [3] in terms of the travel cost parameter and the parameter ratios, [15] with

ηv ≡ γv/τv, [16]

ζv ≡ ρv/τv, [17]

and

ψjv ≡ δjv/τv. [18]

This is what Scarpa, Thiene, and Train (2008 996) refer to as writing “utility in WTP space.” The advantage here is that we can estimate the parameter ratios (i.e., ηv, ζv, and ψjv) directly, and use these as an alternative means of conducting the joint hypothesis tests for relative parameters (i.e., H0D).

The fifth and final set of hypotheses we consider examines the consistency across data types in the welfare impact of site closures. Specifically, we test hypotheses of the form

H0E: CVjv = CVjv′, [19]

where

CVjv [20]

denotes the average annual compensating variation associated with the loss of site j in data case v. The hypothesis tests H0E are conducted using percentile confidence intervals based on the paired bootstrap sample differences CVjvb − CVjv′b. We test hypothesis H0E for two sites: Saylorville Lake (the most popular site in the dataset) and Big Creek Lake (a moderately popular site).
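To fix ideas, the compensating variation from a site closure can be illustrated with a simple (non-nested) conditional logit log-sum, a stand-in for the paper's repeated nested logit formula; the utilities and names below are purely illustrative:

```python
import numpy as np

def site_closure_cv(V, tau, closed_site, n_occasions=1):
    """Per-person compensating variation from losing one site in a
    simple conditional logit model.

    V:   (J,) systematic utilities of the alternatives.
    tau: the (negative) travel cost coefficient, converting utils
         into dollars.
    The per-occasion CV is the drop in the log-sum (inclusive value)
    when the site is removed, scaled by 1/|tau|."""
    logsum_all = np.log(np.exp(V).sum())
    keep = np.ones(len(V), dtype=bool)
    keep[closed_site] = False
    logsum_without = np.log(np.exp(V[keep]).sum())
    return n_occasions * (logsum_all - logsum_without) / abs(tau)
```

The paired bootstrap differences in these CVs across two data types can then be fed into the same percentile-interval test used for H0A.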


Tables 2 and 3 provide the Stage 1 parameter estimates for the model identified in equations [3] and [5], which are those associated with the sociodemographic factors impacting the stay-at-home option (γv), the interactions between demographic factors and water quality (ρv), travel cost (τv), and the dissimilarity coefficient (θv).14 The two versions of the model reported in Tables 2 and 3 differ in terms of the water quality measure interacted with the sociodemographic characteristics, with Version A (Table 2) measuring water quality in terms of the water quality ladder index (WQI) and Version B (Table 3) measuring water quality in terms of Secchi transparency. Tables 4 and 5 provide the corresponding pairwise differences in parameter estimates across the four datasets used in testing hypotheses of the form H0A. For both versions of the model, the parameter estimates show a great deal of consistency across the four datasets in terms of sign, size, and statistical significance. The travel cost parameter is consistently negative and statistically significant at a 1% level, falling in the narrow range between -0.016 and -0.020. The estimated dissimilarity coefficient, as expected, lies within the unit interval for all four datasets, indicating a relatively strong nesting of the trip alternatives, with θv between 0.241 and 0.293. For all of the models, older adults and women (Gender = 0) are found to be significantly less likely to take a trip on a given choice occasion (all else equal). College graduates are consistently more likely to stay at home, though this result is not always statistically significant. Larger households are significantly more likely to take trips in two cases (Version B: ET05 and CB05). Examining the interactions between water quality and demographic characteristics, few clear patterns emerge. 
For both versions of the model, college graduates are more positively influenced by a site’s water quality when choosing where to recreate, with the corresponding ρv > 0 and statistically significant at a 1% level in all but one case. Larger households also appear to be more responsive to site water quality, though this effect is typically not significant.


Stage 1 Parameter Estimates Based on B = 1,000 Bootstrap Samples Version A: Using Water Quality Index


Stage 1 Parameter Estimates Based on B = 1,000 Bootstrap Samples Version B: Using Water Quality Attributes


Pairwise Differences in Stage 1 Parameters (H0A) Version A: Using Water Quality Index


Pairwise Differences in Stage 1 Parameters (H0A) Version B: Using Water Quality Attributes

Examining the pairwise differences across datasets in Tables 4 and 5, we find relatively few statistically significant differences. This is particularly true in comparisons involving the AT04 dataset. The most persistent pattern appears to be that the parameters on Age are (with one exception) significantly different across the datasets, and this difference appears to be largely due to a smaller Age effect in 2005 actual trips (i.e., AT05). There are also significant differences in terms of interaction effects between water quality and gender. These differences emerge generally in comparisons between actual trips (AT04 and AT05) and contingent trips (ET05 and CB05). While there are other isolated differences, they are typically significant only at the 10% level, if at all.

Stage 2 in the estimation process involves regressing the fitted site-specific constants (i.e., the δjv) on site characteristics, as in equation [4]. There are two versions of the Stage 2 model, paralleling those in Stage 1. In Version A, water quality is represented by a quadratic in the water quality ladder index (WQI), in addition to five site attributes (the log of lake size and four dummy variables for the presence of boat ramps, wake restrictions, and handicap facilities and for the lake’s designation as a state park). In Version B, the WQI quadratic is replaced by linear terms for three water quality attributes: Secchi transparency, total phosphorus, and total nitrogen.

The resulting Stage 2 parameter estimates (i.e., αv and βv) are presented in Tables 6 and 7. The corresponding pairwise differences in parameters across datasets are provided in Tables 8 and 9. Again, there is considerable consistency across the four data cases. For both versions of the Stage 2 model, the estimates consistently indicate that residents are more likely to visit lakes that are larger in size, have wake restrictions in place, and are designated as state parks. The water quality index (WQI) quadratic used in Version A exhibits a similar pattern across the four data sources, with a positive linear term and a negative quadratic term, suggesting a positive, but diminishing, impact of water quality on site value. However, this pattern is statistically significant only for the AT05 data source. None of the water quality measures are individually significant in Version B of the model.15 Turning to Tables 8 and 9, none of the pairwise parameter differences for individual site attributes are statistically significant. Only the intercepts (i.e., the αv’s) differ significantly across the datasets. While the lack of significant differences across the data sources at this stage is promising in terms of convergent validity, some caution is appropriate in interpreting these findings. For a given data source, the Stage 2 analysis essentially relies on only 100 observations (i.e., the 100 ASCs). Consequently, the resulting parameter estimates are not precisely measured. This is particularly true for the contingent behavior dataset (CB05), since the WQI under the improved water quality scenario takes on only three values (7, 8, and 9), limiting the variation used to identify the water quality parameters. In hindsight, from a design perspective it might have been better to have improved only a subset of the lower-quality lakes so as to retain greater variability in WQI across lakes.


Stage 2 Parameter Estimates Version A: Using Water Quality Index


Stage 2 Parameter Estimates Version B: Using Water Quality Attributes


Pairwise Differences in Stage 2 Parameters (H0A) Version A: Using Water Quality Index


Pairwise Differences in Stage 2 Parameters (H0A) Version B: Using Water Quality Attributes

Table 10 provides a series of joint hypothesis tests along the lines of H0B. For the Stage 1 parameter estimates (Table 10, panel a), three sets of hypotheses are considered. First, we test the joint hypothesis exactly as stated in equation [9], that is, that all of the first-stage parameters are the same across a given pair of data types. In this case, we do not reject the hypothesis of common parameters for the two actual trip datasets (i.e., AT04 and AT05), nor do we reject the hypothesis of common parameters for the two forward-looking trip datasets (i.e., ET05 and CB05). We do, however, reject all other pairwise consistency tests. Second, we test the joint hypotheses of common ASCs across data types (i.e., H0B:δjv = δjv′ ∀j = 1,...,J). Interestingly, this hypothesis is not rejected in any of the pairwise comparisons. This suggests that the pattern of trips, conditional on taking a trip, is consistent across all of the four data types. Third, and finally, we test the hypothesis of consistency in all of the non-ASC parameters. The results here are similar to those of the first hypothesis tests, though now we do reject consistency in these parameters between the AT04 and AT05 datasets.


Results of Wald Tests for the Joint Equality of Parameters (H0B)

For the Stage 2 parameter estimates, we test two joint hypotheses. First, we test the joint hypothesis that all of the Stage 2 parameters are the same (i.e., H0B: αv = αv′ and βv = βv′). Second, we test the joint hypothesis that all of the slope terms are the same (i.e., H0B: βv = βv′). Not surprisingly, given the individual differences reported in Tables 8 and 9, the first hypothesis is rejected in a number of cases (due mainly to differences in the intercept terms αv), whereas the restriction on the slope terms alone is not rejected in any of the pairwise comparisons. This is consistent with other findings in the SP literature, where marginal effects are found to be consistent across hypothetical and actual willingness-to-pay sources.

The cross-dataset comparisons in Tables 4 through 10 are all based on comparisons of parameter levels. However, such comparisons fail to control for changes in the scale of the unobserved factors captured by the 𝜖ijtv’s in equation [2]. Tables 11 through 15 repeat these comparisons using parameter ratios, with each parameter divided by the corresponding travel cost coefficient. Thus, for the ASCs these parameter ratios are as in equation [14]. Note that in the case of the Stage 2 parameters, these coefficient ratios, β̃v ≡ βv/τv, can be interpreted as marginal willingness-to-pay measures for the corresponding attributes.

Focusing on parameter ratios does not substantially alter the conclusions based upon parameter levels. However, a few differences are worth noting. First, Table 11 indicates that there are additional significant differences across Stage 1 parameters in Version A. These mainly arise in comparisons involving the actual 2005 trips data, which has the largest travel cost parameter. These differences, however, are significant at the 10% level but not at the 5% level. For Version B of the model (Table 12), there are fewer significant differences in general across the datasets. The results in Tables 13 and 14 indicate that the marginal willingness to pay for a given site attribute (and not just the corresponding parameter level) is consistent across the four datasets for all of the site attributes, including the various water quality attributes. Second, as indicated in Table 15, we now find in most instances that the ASC parameter ratios as a group are significantly different across the datasets, whereas their levels are not.16 None of the Stage 2 joint hypotheses are now rejected.


Pairwise Differences in Stage 1 Parameter Ratios (H0C) Version A: Using Water Quality Index


Pairwise Differences in Stage 1 Parameter Ratios (H0C) Version B: Using Water Quality Attributes


Pairwise Differences in Stage 2 Parameter Ratios (H0C) Version A: Using Water Quality Index


Pairwise Differences in Stage 2 Parameter Ratios (H0C) Version B: Using Water Quality Attributes


Results of Wald Tests for the Joint Equality of Parameter Ratios (H0D)

The joint hypothesis tests reported in Table 15 are based on Wald tests using parameter ratios. As noted above in Section VI, there is a potential concern with using a Wald test in the context of parameter ratios, since they are unlikely to be normally distributed. We reestimated the models using the WTP-space specification in equation [15], thus providing direct estimates of the relative parameters (i.e., ηv, ζv, and ψjv) to be used in the Wald tests of interest. We do not report the specific results here for the sake of space, but the basic conclusions are the same as those reported in Table 15.17

Finally, Table 16 provides estimates of the welfare loss from the closure of two individual lakes (Big Creek Lake and Saylorville Lake), with Table 17 providing the corresponding cross-dataset comparisons. In the case of the RP data sources (AT04 and AT05), the compensating variation (CVjv) associated with the loss of site j is little changed over time, with CVj,AT04 differing from CVj,AT05 by roughly 2% for Big Creek Lake and 4% for Saylorville Lake. As Table 17 indicates, these differences are not only economically small but also statistically insignificant. CVs for the two forward-looking (contingent) datasets (ET05 and CB05) are likewise similar, though the differences are larger (on the order of 8% to 10%) and statistically significant at the 5% level. These differences are not unexpected, however, as they reflect the fact that individual lake characteristics differ across the two datasets. Specifically, under the contingent behavior scenario, no changes were proposed for Big Creek Lake, while Saylorville Lake was to improve from a water quality index of 6 to a water quality index of 7. Thus, it is not surprising to see CVj,CB05 > CVj,ET05 for Saylorville Lake, while CVj,CB05 < CVj,ET05 for Big Creek Lake, a result consistent with there being more high-quality lakes available to substitute for Big Creek Lake under the contingent behavior scenario. The biggest differences in Table 16 occur in comparisons between an RP data source (either AT04 or AT05) and a contingent (forward-looking) data source (either ET05 or CB05). In these cases, the cross-data-source differences range from 18% to 40%, with the contingent data sources yielding statistically significantly higher site valuations (Table 17). This is consistent with an overall tendency to anticipate greater participation in future periods than is ultimately realized, implicitly placing higher value on trip-taking as a whole.
For the most directly comparable data sources, AT05 and ET05, the compensating variation is 24% to 31% higher when based on anticipated rather than actual trips. Whether these differences are economically important will, of course, depend upon the application.


Estimated Impact of Site Loss (Annual Compensating Variation)


Differences in Site Loss Compensating Variation (H0E)


SP data are often used by practitioners to estimate the value of changing environmental amenities, particularly when the available RP data lack the variation needed to identify the amenity’s value. The concern, however, is that the underlying data-generating processes for the SP and RP datasets are not consistent with one another (i.e., they lack convergent validity). In the context of recreation demand models, when the SP takes the form of contingent behavior data, tests for convergent validity are typically limited by the very rationale for adding SP data in the first place, namely, the lack of variation in the amenity of interest in the RP data. If one cannot estimate the marginal impact of the amenity within the RP data, then the marginal impact cannot be compared across the two data sources. As a result, most RP/SP validity comparisons are implicitly limited to tests of convergent validity in the travel cost coefficients.

This paper draws upon a unique dataset from the Iowa Lakes Project, which collected panel data on the visitation patterns of Iowa households to 132 lakes, lakes that vary substantially both in terms of measured water quality and individual site characteristics. This variation allows the estimation and comparison of the marginal impact of individual site attributes across various RP and SP data sources from the study. In particular, we examine both the individual and joint consistency of parameter estimates across several dimensions, including consistency over time in RP data (AT04 vs. AT05), consistency between actual and anticipated trips (AT05 vs. ET05), and consistency between actual trips and those anticipated under a contingent behavior scenario (AT05 vs. CB05).

In general, the results are mixed. We find a remarkable degree of consistency among the various data sources in terms of the sign, size, and statistical significance of individual parameters. This is true for both stages of the model. However, significant differences do emerge. In Stage 1 of the analysis, for example, the travel cost and dissimilarity coefficients are consistent across the data sources, but the impact of age on participation varies significantly, as do individual interaction terms between water quality and sociodemographic factors. In Stage 2 of our analysis, we find no significant differences in terms of the marginal impacts that site attributes (including water quality) have on the overall appeal of a site (as captured by the site’s ASC). However, there appear to be level differences. The most noticeable differences arise between actual (RP) trip data (AT04 and AT05) and forward-looking (contingent behavior) trip data (ET05 and CB05). Households appear to anticipate greater participation rates into the future than are ultimately realized. This, in turn, implies greater site valuations derived from contingent behavior sources than are implied by their RP counterparts. One way to handle this when combining RP and contingent behavior data might be to allow for a difference in the level (but not the relative pattern) of ASCs for RP versus contingent behavior data sources.18

There are several important caveats associated with our analysis. First, the comparisons made in the paper provide information only on the convergent validity of recreational use values. This is necessarily the case given the nature of both the revealed and contingent behavior data. One might expect greater comparability in this setting than when nonuse values are involved.19 However, understanding convergent validity in this setting is still important, and even here we do find some significant and substantive differences. Second, our Stage 2 analysis finds no significant differences in terms of how site attributes (including water quality) impact the overall appeal of a site; that is, we find no evidence disputing convergent validity in terms of the marginal impact of site attributes on site values. In the case of water quality, however, these marginal effects are not precisely estimated. This is due, in part, to the fact that these marginal effects are essentially based on relatively few observations, specifically, the variation in water quality across the 100 sites used in our analysis. This variation is particularly small in the case of the contingent behavior dataset, as the hypothetical water quality improvements increased the water quality index to lie in the narrow range from 7 to 9, rather than 3 to 9 in the RP sources. Additional research is needed to better isolate the impact of water quality on site valuations in both revealed and contingent behavior settings.


The authors would like to thank Cathy Kling for comments on earlier drafts of this paper. This paper was supported by MSU AgBioResearch. Funding for the Iowa Lakes Project used in this analysis was provided by the Iowa Department of Natural Resources and by the U.S. Environmental Protection Agency’s Science to Achieve Results (STAR) program. Although the research described in the article has been funded in part by the U.S. Environmental Protection Agency’s STAR program through grant R830818, it has not been subjected to any EPA review and therefore does not necessarily reflect the views of the agency, and no official endorsement should be inferred.


  • The authors are, respectively, associate fellow, Korea Institute of Public Finance, Sejong-si, Korea; and professor, Department of Economics and Department of Agricultural, Food, and Resource Economics, Michigan State University, East Lansing, Michigan.

  • 1 See Freeman, Herriges, and Kling (2014) for an overview of this literature.

  • 2 This problem is analogous to the difficulty associated with valuing a private good using contingent valuation techniques (see, e.g., Carson and Hanemann 2005).

  • 3 Comparisons are also possible between AT04 vs. CB05 and between AT04 vs. ET05, though these comparisons compound changes in years and some other factor.

  • 4 A secondary benefit of our bootstrapping approach lies in its ease of estimation, as it involves separately estimating a large number of well-behaved nested logit models. In contrast, a joint structural model of multiple data sources allowing for a rich pattern of correlations across the data sources and differences in the underlying patterns of preferences would require a more complex specification with a large number of parameters.

  • 5 Copies of the 2004 and 2005 survey instruments are available from the authors upon request.

  • 6 The 2002 Iowa Lakes survey was mailed to 8,000 randomly selected Iowa households. The 2003 Iowa Lakes survey was, in turn, mailed to the 4,423 households who completed the 2002 Iowa Lakes survey, plus an additional 3,577 randomly selected Iowa households.

  • 7 The sample sizes for completed AT04, ET05, and CB05 trips in the 2004 survey were 3,968, 3,694, and 2,882, respectively, so that most of the sample reduction in 2004 is due to missing contingent behavior data. The 2005 respondents provide 3,524 observations with completed trips data (i.e., AT05).

  • 8 See

  • 9 These figures come, respectively, from the U.S. Energy Information Administration (Midwest all-grades, all-formulations retail gasoline prices), the Research and Innovative Technology Administration in the U.S. Department of Transportation (average fuel efficiency of U.S. light-duty vehicles), and the Bureau of Labor Statistics in the U.S. Department of Labor (annual CPI and average hourly earnings).

  • 10 The other site attributes included in Xjv are the log of lake size and four dummy variables for the presence of boat ramps, wake restrictions, and handicap facilities and the lake’s designation as a state park.

  • 11 The downside of the procedure, of course, is that the estimation is less efficient because it does not take account of the cross-dataset correlation. Any such efficiency gains, however, would be conditional on having correctly specified the structure of the correlation patterns.

  • 12 Dividing by the travel cost parameter converts many of the parameter estimates into measures of the corresponding marginal willingness to pay.

  • 13 We thank a reviewer for pointing this out.

  • 14 The ASCs (i.e., the δjv) are also estimated at this stage but are not reported here for the sake of space. They are, instead, available from the authors upon request.

  • 15 As suggested by a reviewer, our finding that the more subjective WQI measure resonates better with recreation site choice is broadly consistent with earlier findings by Adamowicz et al. (1997) that the “model based on perceptions slightly outperforms the models based on objective attribute measures” (p. 65).

  • 16 Note that this does not alter our conclusion from Table 6 that the site choice probabilities are consistent across the models, since the parameter levels, not their normalized counterparts, determine these conditional probabilities.

  • 17 The corresponding Table A9 is available from the authors upon request.

  • 18 Yet another option, suggested by a reviewer, would be to first estimate a model based on RP data alone, using the resulting ASCs as constraints in the subsequent combined RP/contingent behavior model.

  • 19 We thank a reviewer for suggesting this.