Incorporating Stated Consequentiality Questions in Stated Preference Research

Patrick Lloyd-Smith, Wiktor Adamowicz and Diane Dupont


Although consequentiality has transformed the focus of stated preference research, there are concerns with including elicited consequentiality perceptions in econometric models. We test the effect of varying the order of the valuation and consequentiality questions using data from a drinking water reliability survey. We find that this ordering has a substantial impact on consequentiality perceptions. We address the potential endogeneity of consequentiality perceptions and find that they do not have a significant impact on voting. These results provide caution on the use of consequentiality questions and suggest these questions may not be a panacea for stated preference validity issues. (JEL Q25, Q51)

1. Introduction

The stated preference (SP) literature has been transformed by a focus on ensuring that respondents perceive their answers as consequential, as a way to mitigate the hypothetical nature of SP surveys.1 Carson and Groves (2007) outline the theoretical result that respondents’ answers can be expected to be truthful if the respondent views the survey as potentially influencing the policy outcome and there is some probability that he or she will have to pay. They suggest that consequentiality can serve as the incentive compatibility mechanism that ensures respondents tell the truth. However, implementing this concept in applied work is not straightforward.

Herriges et al. (2010) provide an early empirical test of this consequentiality hypothesis, in which survey respondents were given direct quotes from policy makers on the future use of the survey results. Using willingness-to-pay (WTP) estimates, Herriges et al. (2010) find empirical evidence consistent with this prediction. Other empirical evidence on the importance of consequentiality in survey design has been provided by Landry and List (2007), Vossler, Doyon, and Rondeau (2012), and Interis and Petrolia (2014). Although the idea of consequentiality has played a transformative role in how SP practitioners think about the validity of survey responses, a number of issues in implementing this concept remain. One concern is how to assess whether respondents think the survey is consequential. The most common practice is to ask respondents, on a Likert scale, the extent to which they believe the survey results will be used by policy makers or will affect decision-making. With these responses in hand, a second concern relates to incorporating them into the valuation estimation equation, which can raise issues of endogeneity. Endogeneity can arise because responses to these consequentiality questions likely suffer from measurement error and are probably driven by factors unobserved by the analyst that are related to responses to the valuation questions (Herriges et al. 2010).

The aim of this paper is to explore the potential endogeneity of consequentiality questions in the econometric modeling of SP data. While this issue has been explored previously by Herriges et al. (2010) and Groothuis et al. (2017) among others, we build on the previous work in two ways. First, we implement a new approach for addressing endogeneity, the special regressor approach,2 and compare this approach to the common naive method of incorporating consequentiality in voting behavior models. A second novel aspect is that the survey uses a split sample approach that varies the order of the consequentiality and valuation questions. The study uses data from a SP survey implemented to value the public benefits of reducing boil water advisories (BWAs) in the province of Alberta, Canada. The SP survey asks a single binary choice question in a clear public good context and thus provides a clean test of the underlying incentive compatibility theory.

The results make an important contribution to the growing body of theoretical and empirical evidence on the importance of consequentiality (Herriges et al. 2010; Czajkowski et al. 2017; Oehlmann and Meyerhoff 2017). The results of the split sample suggest that the order of the valuation and consequentiality questions matters for self-reported levels of perceived consequentiality. Specifically, in our application the share of respondents who find the survey inconsequential increases by about 68% (from 13.3% to 22.3%) if the consequentiality question is asked after the valuation question rather than before it. This result has important implications, as attempts to exogenously manipulate perceived consequentiality by varying the information provided to respondents about policy makers’ use of surveys have had mixed results (Herriges et al. 2010; Czajkowski et al. 2017). Furthermore, the results suggest that once the endogeneity of consequentiality is addressed, its importance in determining voting behavior is less clear. Both of these results raise important issues for survey design.

Several papers have explored the potential endogeneity of consequentiality responses in SP surveys. The first paper to tackle this problem head on was by Herriges et al. (2010), exploring whether the consequentiality perceptions of respondents affect their WTP for improvements in lake water quality in Iowa. The source of endogeneity examined by Herriges et al. (2010) is the unobserved confounding problem: respondents who state a high degree of consequentiality may do so in part because they place a high value on the proposed environmental improvement programs. To assess and ultimately address these concerns, a split sample approach was implemented in which half of the surveys included a letter from a government official stating that the information from these surveys would be used in the decision-making process. The empirical results suggest that respondents’ perceived degree of consequentiality is positively affected by the presentation of this letter. Using this exogenous information treatment, the causal impact of consequentiality perceptions on WTP is estimated using a Bayesian treatment effect model. The empirical findings are consistent with the “knife-edge” theoretical result of Carson and Groves (2007), which states that respondents who perceive their responses to have a positive probability of being taken into account by policy makers have similar WTP distributions, while respondents who perceive the survey to be purely inconsequential have no incentive to respond truthfully and may have different WTP distributions.

While the findings of Herriges et al. (2010) suggest that the endogeneity of consequentiality questions is an important consideration in econometric modeling, three more recent papers find the opposite result. Vossler, Doyon, and Rondeau (2012) test for the endogeneity of consequentiality questions using sociodemographic indicators such as occupation status, gender, and age as instruments. Using a generalized method of moments overidentification test, they fail to reject the hypothesis that the consequentiality interaction terms included in their model are jointly exogenous. Vossler and Watson (2013) use the same set of sociodemographic variables as instruments and tentatively conclude there is little empirical evidence for endogeneity of consequentiality responses, although the authors admit the instruments are weak. Finally, Interis and Petrolia (2014) use a similar set of sociodemographic instruments to test for endogeneity using a two-step instrumental variable (IV) probit model. The results suggest that the null hypothesis that consequentiality is exogenous cannot be rejected.

Groothuis et al. (2017) use a bivariate probit approach to address endogeneity of perceived consequentiality responses. The correlation coefficient between the two error terms is estimated to be negative and significant, suggesting that perceived consequentiality is endogenous and that the unobserved characteristics increase perceived levels of consequentiality and decrease the likelihood of voting for the program.3

The mixed evidence found in these studies regarding the endogeneity of consequentiality questions can likely be explained by differences in empirical applications, instruments, and modeling approaches. The information treatment used by Herriges et al. (2010) is perhaps the most convincing instrument for addressing the endogeneity of consequentiality questions. Subsequent attempts to replicate these effects using different forms of consequentiality scripts, however, have been largely unsuccessful (Czajkowski et al. 2017). Sociodemographic characteristics of respondents are generally not strongly correlated with perceived consequentiality, which limits their usefulness as instruments. Thus the search for a robust instrument for perceived consequentiality in voting behavior models continues.

2. Survey Data

Survey Development and Design

We use data from an online SP survey on drinking water reliability in Alberta. The survey was designed using three focus groups conducted in the spring of 2014 in Edmonton (n = 10), Calgary (n = 8), and Okotoks (n = 11). The survey text and program description were modified based on focus group comments, as detailed below. An online pilot survey was administered to 155 respondents between January and February 2015.

The survey was divided into three sections and included 52 questions. The first part of the survey introduced respondents to drinking water reliability issues in Alberta and collected information on past experiences and future risk perceptions of short- and long-term water outages and BWAs. The second part of the survey included the BWA valuation question that is the focus of the current paper. The final section of the survey included debriefing questions that collected sociodemographic information.

Before the valuation question, respondents were provided with additional information on the frequency of BWAs by community size over the past 5 years to describe the current situation. Respondents were then told about a proposed program that would reduce the annual number of BWAs in Alberta. The scope of the BWA reduction varied by small (under 500 residents), medium (between 500 and 50,000 residents), and large (over 50,000 residents) community sizes. The payment vehicle was described as additional income taxes collected over the next 10 years. Based on comments from focus group participants, we added information on how many days a typical BWA lasts and a statement that the federal government would pay its fair share for reducing BWAs in communities that fall under its jurisdiction.

The valuation task used a single binary choice question that presented respondents with several attributes. Although it collects less information per respondent than repeated multinomial-choice formats, the single binary choice format has several key advantages. First, it closely mimics a real referendum vote that, combined with the clear public goods framing of the program, enhances incentive compatibility (Carson and Groves 2007). Second, it avoids the ordering effects, learning, strategic behavior, and other potential biases associated with repeated choice questions (Johnston et al. 2017). These reasons are especially pertinent given the paper’s focus on consequentiality questions. The valuation question as presented to respondents appears as Figure C2 in Appendix C.

The valuation task was described as a referendum vote between the current situation and the proposed program. Table 1 describes the attributes and their respective levels used in the single binary choice question. The question employed a D-efficient design consisting of 31 final valuation tasks, using responses to the pilot survey to inform the priors.4 The number of BWAs for the current situation is informed by historical data presented in Table 1 and includes 50 BWAs in small communities, 4 BWAs in medium communities, and 1 BWA in large communities. The attribute levels in the proposed program range from 5 to 50 BWAs in small communities, 1 to 4 BWAs in medium communities, and 0 to 1 BWAs in large communities. In addition to reducing BWAs, the proposed program also included an attribute that specified the method used to improve reliability. The first reliability improvement method was investment in traditional drinking water treatment systems (i.e., gray infrastructure). The second improvement method was investment in watershed and forest management to reduce the potential for events such as forest fires to cause water reliability problems downstream (i.e., green infrastructure).

Table 1

Single Binary Choice Question Attributes and Levels

The other notable aspect of the survey is its 2 × 2 design, which crosses two different treatments. First, a split sample treatment was implemented in which half the surveys included the consequentiality question before the valuation question and half included it after. We used the standard consequentiality question: “To what extent do you believe that the voting results collected from you and other survey respondents will be taken into consideration by policy makers?” Respondents answered on a five-point Likert scale from “not taken into account” to “definitely taken into account.”5 The consequentiality question as presented to respondents appears as Figure C1 in Appendix C.

Second, half the surveys included a statement on the government agencies involved in the project. Specifically, at the beginning of the survey the following statement was included: “Partners in this project include Alberta Environment and Sustainable Resource Development and the Canadian Forest Service.” The purpose of this statement was to act as an information treatment to enhance perceived consequentiality, similar to Herriges et al. (2010).

Survey Administration

The final survey was administered online by an Edmonton-based market research firm in March 2015. In addition to a representative sample of 1,000 Alberta residents, 250 additional responses from residents of rural communities were included because water reliability challenges are more common in rural communities.6 The survey also included a separate set of valuation questions focused on private home water outages, and the order of these valuation questions was randomized. For the empirical analysis, we include only the 757 respondents who received the BWA valuation question before the other valuation questions, to avoid ordering issues.7

Descriptive statistics for these respondents are presented in Table 2. The average age of respondents was 49 years, and 31% of respondents had household incomes over $150,000 per year. A further 12% of respondents did not answer the income question. In the sample, 37% of respondents had attended some or completed college, and 45% had completed at least some university. Compared with the 2011 Canadian census results for Alberta, the sample has a similar gender mix (51% versus 50% female) but a higher proportion of people over the age of 50 (56% compared to 44% in the general population).8 The sample also included 31% of respondents with household incomes greater than $150,000 per year, compared to only 18% in the broader Alberta population. Thus our sample is older and has higher incomes than the general population.

Table 2

Summary of Sociodemographic Characteristics

3. Econometric Model

The econometric analysis of stated choices is grounded in McFadden’s (1973) random utility model (RUM). In a single binary choice setting, respondents are asked to choose between two alternatives: a status quo alternative and a program alternative. Both alternatives are characterized by various attributes. The RUM is based on the idea that an individual chooses whichever of the two alternatives yields the higher expected utility.

We start with the following binary decision model:

$$D_i = I\left(X_i\beta + \gamma C_i + \alpha B_i + \varepsilon_i \ge 0\right), \qquad [1]$$

where I(·) represents an indicator function equal to one when the argument is true and zero otherwise, Di is the yes/no answer to the valuation question for respondent i, Ci is the response to the perceived consequentiality question, Bi is the tax amount presented to the respondent, Xi is a vector of exogenous regressors, εi is the error term, and β, γ, and α are the corresponding coefficients (with α < 0 for the tax amount). If Ci is assumed to be exogenous, then the parameters of the model can be estimated using a probit model, and WTP can be derived in the conventional fashion.
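For reference, the conventional WTP calculation for a linear-index probit divides each non-tax coefficient by the negative of the tax coefficient, and program WTP is the tax at which the deterministic part of the index equals zero. The sketch below assumes such a linear index; the coefficient names and values are hypothetical and chosen only for illustration, not taken from Table 4.

```python
def marginal_wtp(beta_k: float, alpha_tax: float) -> float:
    """WTP for a one-unit increase in attribute k: -beta_k / alpha_tax."""
    if alpha_tax >= 0:
        raise ValueError("the tax coefficient is expected to be negative")
    return -beta_k / alpha_tax


def program_wtp(coefs: dict[str, float], levels: dict[str, float], alpha_tax: float) -> float:
    """Mean WTP for a program: the tax that sets the deterministic index to zero.

    `levels` gives each non-tax regressor's value (use 1.0 for the constant);
    regressors missing from `levels` are treated as 0.
    """
    if alpha_tax >= 0:
        raise ValueError("the tax coefficient is expected to be negative")
    index = sum(coef * levels.get(name, 0.0) for name, coef in coefs.items())
    return -index / alpha_tax


# Hypothetical coefficients for illustration only (not estimates from the paper).
coefs = {"constant": 0.20, "small_bwa_reduction": 0.017, "consequential": 0.30}
alpha_tax = -0.005
print(marginal_wtp(coefs["small_bwa_reduction"], alpha_tax))
print(program_wtp(coefs, {"constant": 1.0, "small_bwa_reduction": 25.0, "consequential": 1.0}, alpha_tax))
```

Setting the hypothetical `consequential` level to 0 or 1 gives separate WTP figures for respondents who do and do not view the survey as consequential, which is the kind of split reported later in the paper.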

If we think that Ci might be endogenous in equation [1], then we require an approach to properly account for potential endogeneity biases. The traditional approach to handling potentially endogenous binary variables in binary choice models is through a bivariate probit model (Wooldridge 2015).9 The main limitation of the bivariate probit is that it imposes strong distributional and specification assumptions on the model.10 A control function approach, which uses a two-step estimation method, can also be used in these contexts. In the control function approach, the estimated residuals from the first-stage model are included in the probit model to act as controls for the endogeneity of the consequentiality variables.11 Both approaches rely on a correctly specified first-stage equation and on particular assumptions about the joint distribution of the error terms, which are strong assumptions that we might want to relax.
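The control function idea can be illustrated with a short simulation. This is a generic two-step sketch, not the specification used in the paper: it assumes a linear first stage for a continuous C (the paper’s Ci is a Likert response or a dummy), one exogenous regressor x, one instrument z, and simulated data. Because adding the control rescales the second-stage probit, its coefficients are identified only up to scale, so ratios are the natural thing to compare with the simulated values.

```python
import math
import random


def solve(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for j in range(col, n + 1):
                    m[r][j] -= f * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, max_iter=50, tol=1e-8):
    """Probit maximum likelihood via Fisher scoring (expected-information Newton steps)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            xb = sum(bj * rj for bj, rj in zip(beta, r))
            p = 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))  # standard normal CDF
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            pdf = math.exp(-0.5 * xb * xb) / math.sqrt(2.0 * math.pi)
            q = p * (1.0 - p)
            for i in range(k):
                grad[i] += pdf * (di - p) / q * r[i]
                for j in range(k):
                    info[i][j] += pdf * pdf / q * r[i] * r[j]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return beta


def control_function(d, x, c, z):
    """Stage 1: OLS of C on (1, x, z). Stage 2: probit of D on (1, x, C, v_hat)."""
    first = [[1.0, xi, zi] for xi, zi in zip(x, z)]
    pi = ols(first, c)
    v_hat = [ci - sum(pj * fj for pj, fj in zip(pi, row)) for ci, row in zip(c, first)]
    second = [[1.0, xi, ci, vi] for xi, ci, vi in zip(x, c, v_hat)]
    return probit(second, d)


def simulate(n=3000, seed=11):
    """Hypothetical data in which the vote error is correlated with C through v."""
    rng = random.Random(seed)
    d, x, c, z = [], [], [], []
    for _ in range(n):
        xi, zi, vi, ei = (rng.gauss(0.0, 1.0) for _ in range(4))
        ci = 0.3 * xi + 0.8 * zi + vi
        eps = 0.5 * vi + math.sqrt(0.75) * ei  # unit variance, corr(eps, v) = 0.5
        d.append(1 if 0.2 + 0.5 * xi + 1.0 * ci + eps >= 0 else 0)
        x.append(xi)
        c.append(ci)
        z.append(zi)
    return d, x, c, z


if __name__ == "__main__":
    b = control_function(*simulate())
    print("x/C coefficient ratio:", round(b[1] / b[2], 3))  # simulated ratio is 0.5
```

Dropping the `v_hat` column and rerunning `probit` gives the naive estimator, whose x/C ratio is distorted by the correlation between v and the vote error; the control function removes that distortion only if the linear first stage and the assumed error structure are correct, which is the limitation noted above.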

Special Regressor Approach

An alternative approach to controlling for endogenous regressors is provided by Lewbel (2000). He introduced a simple multistep estimator for the scaled probit model that is useful given our discrete choice setting. The main advantage of the special regressor approach is that it does not rely on the strong specification and error distribution assumptions associated with the bivariate probit and control function methods discussed above.

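To operationalize the approach, we can rewrite equation [1] so that the special regressor’s coefficient is normalized to one:

$$D_i = I\left(-B_i + X_i\tilde{\beta} + \tilde{\gamma} C_i + \tilde{\varepsilon}_i \ge 0\right), \qquad [2]$$

where β, γ, and α are the coefficients on Xi, Ci, and Bi in equation [1], $\tilde{\beta} = \beta/(-\alpha)$, $\tilde{\gamma} = \gamma/(-\alpha)$, and $\tilde{\varepsilon}_i = \varepsilon_i/(-\alpha)$; because α < 0, dividing by −α leaves the inequality unchanged. The special regressor, B, must have the following properties: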

  1. B is additively separable with respect to the model error εi.

  2. B is independent of the model error, εi, conditional on the set of regressors (i.e., B is exogenous).

  3. E(D|X,C,B) increases with B.

  4. The conditional distribution of B given X and C is continuous and has a large support.

In our application, we use the randomly assigned tax amount presented to respondents as the special regressor (B), a choice also made in previous research (Lewbel, McFadden, and Linton 2011; Riddel 2011; Kalisa, Riddel, and Shaw 2016). Property 1 is satisfied if we use a linear functional form of the indirect utility function, as is common in the discrete choice literature. Property 2 is satisfied because the specific tax amounts are selected by the researcher and randomly presented to respondents as part of the experimental design.12 We would expect the probability of voting yes for the proposed program to decrease with the tax amount; therefore, to ensure Property 3 holds, we use the negative of the tax amount as the special regressor. Property 4 is a common assumption in semiparametric binary choice models (Horowitz 1992) and in our application implies that the support of the distribution of WTP is large relative to the model error, ε (Kalisa, Riddel, and Shaw 2016).13

Estimation in the special regressor approach proceeds in five steps (Riddel 2011):

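Step 1. Create $V_i = -(B_i - \bar{B})$, the negative of the tax amount demeaned by its sample mean $\bar{B}$, to allow $V_i$ to take on a range of positive and negative values.

Step 2. Estimate the equation $V_i = S_i'b + U_i$, where $S_i$ collects the exogenous regressors Xi and the instruments Zi, using a linear regression, and save the residuals $\hat{U}_i = V_i - S_i'\hat{b}$.

Step 3. Compute the nonparametric kernel estimator $\hat{f}$ of the density f of $U_i$ at each $\hat{U}_i$ using the following equation, where n is the sample size:

$$\hat{f}(\hat{U}_i) = \frac{1}{nh}\sum_{j=1}^{n} K\left(\frac{\hat{U}_j - \hat{U}_i}{h}\right),$$

and compute the estimates $\hat{f}_i = \hat{f}(\hat{U}_i)$. For this step, we require a choice of kernel K(·) and a bandwidth h.14

Step 4. Construct $\hat{T}_i$ for each observation i as

$$\hat{T}_i = \frac{D_i - I(V_i \ge 0)}{\hat{f}_i},$$

where $I(V_i \ge 0)$ is an indicator function equal to 1 if $V_i \ge 0$ and 0 otherwise.15

Step 5. Estimate the choice-model parameters of the scaled probit, $\hat{T}_i = X_i\tilde{\beta} + \tilde{\gamma} C_i + \tilde{e}_i$, where $\tilde{\beta}$ and $\tilde{\gamma}$ are the scaled coefficients of equation [2], using a two-stage least squares regression of $\hat{T}_i$ on Xi and Ci with instruments Zi.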

These five steps comprise the special regressor approach used in this paper to recover the parameters of the scaled probit model in equation [2].16 The coefficient on the tax amount is interpreted as the marginal utility of income and in the current application is normalized to one. Therefore, the coefficients on the other model variables, $\tilde{\beta}$ and $\tilde{\gamma}$, can be interpreted as WTP.17
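Why the constructed variable in Step 4 can stand in for the latent index in Step 5 follows from a simple integration identity. The sketch below uses generic notation and is not reproduced from the paper: V_i is the demeaned negative tax amount from Step 1, S_i the exogenous regressors and instruments, s_i the rest of the latent index (any shift from demeaning is absorbed into the intercept of $X_i\tilde{\beta}$), and, as in Lewbel (2000), V_i is assumed independent of (C_i, $\tilde{\varepsilon}_i$) given S_i, with support containing both 0 and −s_i.

$$
\begin{aligned}
D_i &= I\left(V_i + s_i \ge 0\right), \qquad s_i = X_i\tilde{\beta} + \tilde{\gamma} C_i + \tilde{\varepsilon}_i, \\
E\left[\frac{D_i - I(V_i \ge 0)}{f(V_i \mid S_i)} \,\middle|\, S_i, C_i, \tilde{\varepsilon}_i\right]
&= \int_{-\infty}^{\infty} \left[ I(v \ge -s_i) - I(v \ge 0) \right] dv = s_i, \\
E\left[ S_i \left( T_i - X_i\tilde{\beta} - \tilde{\gamma} C_i \right) \right] &= E\left[ S_i \tilde{\varepsilon}_i \right] = 0,
\end{aligned}
$$

where T_i is the ratio in the second line. The integral equals s_i because the integrand is 1 on [−s_i, 0) when s_i > 0 and −1 on [0, −s_i) when s_i < 0. The last line, which uses the exogeneity of S_i, is the moment condition exploited by two-stage least squares; writing V_i = S_i′b + U_i with U_i independent of S_i gives f(V_i | S_i) = f_U(U_i), which is the density the kernel estimate targets.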

The special regressor approach can use an IV (Zi) in Step 5. The main instrument we consider is the voting and consequentiality question order.18 As illustrated in the next section, this variable has a significant effect on levels of perceived consequentiality. Because the survey versions were randomized, we do not expect this variable to affect voting behavior except through its impact on consequentiality.
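The five steps can be sketched in pure Python on simulated data. This is an illustration under stated assumptions, not the authors’ code: it uses a Gaussian kernel with Silverman’s rule-of-thumb bandwidth (the paper’s own kernel and bandwidth choices are not shown here), a single exogenous regressor x, a continuous endogenous C, and a hypothetical randomized binary instrument z standing in for the question-order dummy. Because the instruments (1, x, z) exactly match the regressors (1, x, C), two-stage least squares reduces to the just-identified IV solve used in Step 5.

```python
import math
import random


def solve(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for j in range(col, n + 1):
                    m[r][j] -= f * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def cross(a_rows, b_rows):
    """A'B for two equally long lists of row vectors."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a_rows, b_rows))
             for j in range(len(b_rows[0]))] for i in range(len(a_rows[0]))]


def cross_vec(a_rows, y):
    """A'y for a list of row vectors and a vector y."""
    return [sum(ra[i] * yi for ra, yi in zip(a_rows, y)) for i in range(len(a_rows[0]))]


def special_regressor(d, tax, x, c, z, h=None):
    """Steps 1-5 for D = I(WTP >= tax) with WTP = theta0 + theta1*x + theta2*c + e."""
    n = len(d)
    mean_tax = sum(tax) / n
    v = [mean_tax - t for t in tax]                    # Step 1: negative demeaned tax
    s = [[1.0, xi, zi] for xi, zi in zip(x, z)]        # exogenous regressors + instrument
    b = solve(cross(s, s), cross_vec(s, v))            # Step 2: OLS of V on S
    u = [vi - sum(bj * sj for bj, sj in zip(b, si)) for vi, si in zip(v, s)]
    if h is None:                                      # assumed rule-of-thumb bandwidth
        h = 1.06 * math.sqrt(sum(ui * ui for ui in u) / (n - 1)) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    f = [norm * sum(math.exp(-0.5 * ((uj - ui) / h) ** 2) for uj in u)
         for ui in u]                                  # Step 3: Gaussian KDE at each U_i
    t = [(di - (1.0 if vi >= 0 else 0.0)) / fi
         for di, vi, fi in zip(d, v, f)]               # Step 4
    w = [[1.0, xi, ci] for xi, ci in zip(x, c)]        # regressors, C endogenous
    theta = solve(cross(s, w), cross_vec(s, t))        # Step 5: just-identified 2SLS
    theta[0] += mean_tax                               # undo the Step 1 intercept shift
    return theta


def simulate(n=2000, seed=7):
    """Hypothetical data: randomized bids, endogenous C, randomized binary z."""
    rng = random.Random(seed)
    d, tax, x, c, z = [], [], [], [], []
    for _ in range(n):
        xi, vi, ei = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        zi = 1.0 if rng.random() < 0.5 else 0.0        # e.g. a randomized question order
        ci = 1.6 * zi + vi                             # z shifts C; v also enters WTP
        wtp = 126.0 + 15.0 * xi + 30.0 * ci + 9.0 * vi + 12.0 * ei
        bid = rng.uniform(0.0, 300.0)                  # wide support relative to WTP
        d.append(1 if wtp >= bid else 0)
        tax.append(bid)
        x.append(xi)
        c.append(ci)
        z.append(zi)
    return d, tax, x, c, z


if __name__ == "__main__":
    estimates = special_regressor(*simulate())
    print([round(e, 1) for e in estimates])  # compare with the simulated (126, 15, 30)
```

The returned coefficients are in dollar (WTP) units because the bid enters with a coefficient normalized to one. The estimator is noisy, since Step 4 divides by a density, which is one practical reason Property 4 (large support of the bid relative to WTP) matters. The O(n²) kernel loop is fine for a few thousand observations but would need binning or a library routine for larger samples.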

4. Results

We first assess whether the distribution of perceived consequentiality responses is affected by question order. Figure 1 compares the distributions of perceived consequentiality responses for the two survey versions that varied the question order, providing preliminary evidence that the ordering matters. A more formal test of distributional differences can be conducted using Pearson’s chi-squared test. The test statistic is 11.9 (p-value = 0.018), so we can reject, at the 5% significance level, the null hypothesis that consequentiality responses are independent of the ordering treatment.19 The most striking result in Figure 1 is that the proportion of respondents indicating that their voting results will “not be taken into account” by policy makers jumps from 13.3% to 22.3% if the consequentiality question is asked after, rather than before, the valuation question. The ordering of the consequentiality question thus has a marked impact on the least consequential responses.
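As an arithmetic check on the figures reported here, the sketch below recomputes the p-value of the reported chi-squared statistic and the relative increase in the “not taken into account” share. It assumes the test has (5 − 1)(2 − 1) = 4 degrees of freedom, as implied by a five-point Likert scale crossed with two question-order versions; for even degrees of freedom the chi-squared upper tail has a closed form, so only the Python standard library is needed.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail p-value of a chi-squared statistic for even degrees of freedom.

    For df = 2k the survival function is exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def relative_change(before: float, after: float) -> float:
    """Proportional change from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


# Five Likert categories crossed with two question-order versions.
df = (5 - 1) * (2 - 1)
p_value = chi2_sf_even_df(11.9, df)
share_increase = relative_change(13.3, 22.3)  # share answering "not taken into account"
print(f"p = {p_value:.3f}, relative increase = {share_increase:.1%}")
```

With these inputs the p-value is about 0.018 and the relative increase is about 0.677, i.e., roughly 68%, consistent with the figures in the text.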

Figure 1

Ordering Effect on Perceived Consequentiality (Sample Size = 757: 375 Respondents in the Consequentiality Question “Before” Treatment and 382 Respondents in the “After” Treatment)

We convert the consequentiality perceptions variable into a “knife-edge” dummy variable that takes a value of zero if the respondent stated that his or her responses would “not be taken into account” and a value of one otherwise (Oehlmann and Meyerhoff 2017).20 Table 3 reports the results of various probit model specifications with the binary consequentiality variable as the dependent variable. We focus on the small community BWA and tax amount attributes, as the medium and large community BWA and treatment method attributes are insignificant determinants of voting behavior. The first column includes the “consequentiality question after” treatment dummy variable, which has a negative and significant effect on perceived consequentiality, consistent with Figure 1. In contrast to Groothuis et al. (2017), we do not find a significant effect of the tax amount presented to respondents on perceived consequentiality.21 The partner information variable indicates whether the survey version included the statement on government partner agencies involved in the research. Its coefficient is not statistically significant, which corroborates the findings of Oehlmann and Meyerhoff (2017) and Czajkowski et al. (2017) on the difficulty of inducing consequentiality perceptions. To investigate the impact of the valuation question attributes on perceived consequentiality, the second column of Table 3 shows the results for the subsample that received the consequentiality question after the valuation question. Column three of Table 3 uses the sample of respondents who received the consequentiality question before the valuation question. As expected, none of the program attributes has a significant effect on perceived levels of consequentiality.

Table 3

Probit Model of Perceived Consequentiality

Table 4 shows the results of the water management referendum voting models. For all these models, we convert the small community BWA attribute to represent reductions from the status quo level. The first column is the base model without any controls for consequentiality. In this specification, we find that respondents prefer programs with larger reductions in the number of BWAs in small communities.22 The coefficient on the tax amount is negative and significant.

Table 4

Choice Models of Water Management Referendum Voting

The second column includes a perceived consequentiality variable using the dummy variable specification (Vossler and Watson 2013; Groothuis et al. 2017).23 The coefficient on the consequentiality variable is positive and significant, suggesting that respondents who perceive the survey to be at least somewhat consequential are more likely to vote for the program. The other explanatory variables remain relatively stable. While this model highlights the importance of incorporating consequentiality, it assumes that perceived consequentiality is exogenous in these voting models. The third and fourth columns show the probit model estimates using the question order subsamples. While the coefficient for BWA reductions in small communities is significant for the sample that received the consequentiality question before the valuation question, it is not statistically different from zero for the other subsample.

The special regressor approach results are presented in the seventh column of Table 4. These estimates are given by the two-stage least squares regression in Step 5 of the special regressor approach outlined in the previous section. The coefficient of the special regressor B is not shown because it is normalized to one. Therefore, the other coefficients can be interpreted as WTP.

In sum, the results of the special regressor approach suggest that perceived consequentiality is not a statistically significant determinant of voting behavior for these data. This result is in contrast to the naive probit model where perceived consequentiality is an important determinant of voting behavior.24

Table 5 summarizes the WTP estimates for various programs to reduce BWAs using the coefficients from Table 4. The first row reports the marginal WTP (MWTP) to reduce one BWA in a small community. While the average MWTP across the full sample is $3.40 to $3.43 per household per year for 10 years to reduce a single BWA in a small community, MWTP differs substantially between the subsamples that varied the order of the consequentiality question. The mean MWTP for the sample that received the consequentiality question before the valuation question is nearly three times that of the other group ($4.78 versus $1.72). Using the special regressor approach, the MWTP for a BWA reduction in a small community is estimated to be $1.25, substantially less than in the other models.

Table 5

Mean Annual Willingness to Pay for 10 Years for Programs to Reduce Boil Water Advisories (BWAs)

The remaining rows report WTP for a program to reduce 25 BWAs in small communities. The second row shows that the average WTP is $90.20 across all respondents, while it drops to $73.38 for respondents who received the consequentiality question before the valuation question and increases to $111.30 for those who received it after. The last two rows report WTP for respondents who viewed the valuation question as inconsequential and as consequential. For respondents who view the question as inconsequential, mean WTP is negative and not statistically different from zero in all models. For respondents who view it as consequential, mean WTP is $117.60 using the consequentiality dummy model and $19.12 using the special regressor approach. Thus, addressing the potential endogeneity of consequentiality questions decreases the estimated WTP for the BWA reduction program.

5. Conclusion

Attention to the consequentiality perceptions of respondents in SP surveys has risen rapidly in recent years. Mechanism design theory provides clear guidance on why consequentiality matters for eliciting truthful responses, and ensuring that surveys are consequential is now considered a “best practice” in SP work (Johnston et al. 2017). Yet questions remain on the best way to elicit consequentiality perceptions of respondents and, more importantly, on how to incorporate this information in the econometric analysis of SP data. This paper focuses on how to use responses to perceived consequentiality follow-up questions in econometric modeling of voting behavior. The results of the study provide additional evidence of the difficulty of eliciting meaningful consequentiality perceptions and appropriately modeling these responses. Given these difficulties, the use of consequentiality follow-up questions is no substitute for ensuring that the survey is consequential in the first place.

In all survey applications eliciting consequentiality perceptions to date, the consequentiality question is posed after the valuation question. This study provides the first empirical evidence that the order of the consequentiality and valuation questions matters for consequentiality perceptions. Specifically, the share of respondents who view the survey as inconsequential increases by about 68% (from 13.3% to 22.3%) if the valuation question is posed first. This finding is important in light of the theoretical and empirical knife-edge results identified by Carson and Groves (2007) and documented by Herriges et al. (2010). Furthermore, the ordering also affects welfare measures: WTP for a BWA reduction program rises by roughly 50% if the consequentiality question is posed after the valuation question.

In naive models without endogeneity controls, perceived consequentiality is found to be an important determinant of voting behavior. However, using the special regressor approach to address endogeneity concerns, we find that consequentiality beliefs do not have a significant impact on voting. Together with the question ordering results, the results of this paper suggest that the new trend of including consequentiality follow-up questions in surveys may not be a panacea for SP validity issues.

Why we observe such a large difference in perceived levels of consequentiality remains an open question, in particular why more people found the survey to be inconsequential when the consequentiality question was posed second. A somewhat speculative explanation is that going through the voting process may cause respondents to doubt how applicable the results will be for policy makers. The online voting mode administered to respondents is quite different from the way referendums have traditionally been administered by policy makers. An alternative explanation is that once respondents have seen the program and voted, they are more likely to engage in K-level thinking (Camerer, Ho, and Chong 2004) and to act strategically in responding to the consequentiality question. Part of the challenge in interpreting this result is the absence of a consensus view about the meaning of these responses. While Carson and Groves (2007) argue that economic theory does not predict how respondents who do not view the survey as consequential make choices, economic reasoning may not be the sole driver of response behavior. Other disciplines that study human decision-making, such as psychology, may provide a useful lens for interpreting these responses (Tourangeau, Rips, and Rasinski 2000). Furthermore, there is evidence that people may feel uncomfortable not representing what they think they would actually do (Dellavigna et al. 2017).

These results are best viewed as a first step in understanding how the location of the consequentiality question within a survey affects consequentiality perceptions. Additional research is needed on the placement of the consequentiality question, along with other follow-up questions, to assess whether the results from this study can be replicated and to explore alternative explanations for why the question order matters. If the effect replicates, research effort should focus on identifying the reasons for these differences in responses to the consequentiality question. While we employed a single binary choice question in the survey, the ordering effects of consequentiality questions in contexts with multiple valuation questions are also worthy of further investigation, given the prevalence of these applications. Assessing consequentiality beliefs before and after the valuation question for the same respondent may also provide insights into how these beliefs change and which respondent attributes correlate with changing consequentiality perceptions.

This research does not yield a clear recommendation on the appropriate placement of the consequentiality question. Placing the consequentiality question first may be preferable, as it perhaps provides a cleaner assessment of these beliefs that is not affected by the specifics of the valuation question. However, respondents may not have the information needed to answer the consequentiality questions appropriately if they have not yet seen the specific alternatives to choose between. For example, if people are asked to choose based on a status quo description or tax amount that they view as unreasonable, they may not view the question as particularly useful for policy makers. In the meantime, we recommend that researchers vary the location of the consequentiality and valuation questions in their SP survey designs, at a minimum to control for any ordering effects.
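Varying the location of the two questions amounts to a split-sample design like the one used in this study. A minimal Python sketch of balanced random assignment to the two question orders, assuming respondent identifiers are known before fielding; the function name, treatment labels, and seed are hypothetical:

```python
import random

# Hypothetical labels for the two question-order treatments
ORDERS = ("consequentiality_first", "valuation_first")


def assign_question_order(respondent_ids, seed=2017):
    """Randomly split respondents into two order treatments of (near-)equal size."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    ids = list(respondent_ids)
    rng.shuffle(ids)
    # Alternate labels over the shuffled list so group sizes differ by at most one
    return {rid: ORDERS[i % 2] for i, rid in enumerate(ids)}


assignment = assign_question_order(range(1, 11))
```

The resulting order indicator can then enter the consequentiality and voting models, so that any ordering effect is estimated rather than confounded with the valuation results.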

Moving beyond eliciting consequentiality perceptions to ensuring survey consequentiality, when justified, is an important task for SP practitioners going forward, recognizing that this is not possible in all survey contexts. Consequentiality is an important part of the reliability “toolkit” to assess, and ultimately improve, SP surveys. Nevertheless, the broader hard and unglamorous work of good survey design, including focus groups, cognitive interviews, and pretests, although largely unsung and perceived to be “noneconomic,” continues to be the most important factor in improving the reliability of SP surveys.


We thank the Water Economics, Policy, and Governance Network (WEPGN) and Alberta Innovates for financial support. We also thank Alfred Appiah for research assistance.


  • 1 Understanding whether and why people respond differently in hypothetical and real contexts has long been a focus of SP research. In general, this issue has been studied in experiments using both real and hypothetical payment treatments, with any resulting difference in behavior being attributed to hypothetical bias. If preferences collected in SP surveys do not represent how individuals actually behave in the real world, the use of SP value estimates in economic analysis is called into question. Neoclassical economic explanations of why respondents misrepresent their preferences in SP surveys have largely focused on strategic behavior. As a result, the focus of survey design has been on ensuring that respondents have the incentive to truthfully report their preferences (Carson and Groves 2011). On the other hand, behavioral economics has offered a diverse set of explanations for hypothetical bias, such as the social context, the level of scrutiny, and restrictions on the time horizon and choice set (Levitt and List 2007; Carlsson 2010).

  • 2 The special regressor approach has previously been applied in the environmental valuation literature by Lewbel, McFadden, and Linton (2011) to control for the endogeneity of double-bounded choice questions and by Riddel (2011) and Kalisa, Riddel, and Shaw (2016) to address the endogeneity of risk perceptions.

  • 3 Another proposed approach to dealing with endogeneity of consequentiality is the hybrid choice model (Czajkowski et al. 2017). However, Budziński and Czajkowski (2017) demonstrate through simulations that hybrid choice models do not eliminate the bias in estimated coefficients of endogenous variables.

  • 4 The valuation tasks were designed using Ngene. We removed one of the 32 valuation tasks initially created by Ngene because the status quo and program in that task differed only in treatment method and cost.

  • 5 We do not differentiate between policy consequentiality, defined as respondents perceiving their survey responses will influence the outcome they care about, and payment consequentiality, defined as respondents perceiving some probability that they will have to pay (Herriges et al. 2010). The single consequentiality question used in this study captures some degree of both elements of consequentiality, as policy makers in this context both control the provision of the good and can impose taxes. Understanding whether the question ordering effect differs across these two elements of consequentiality is left for future work.

  • 6 The survey research firm does not disclose response rates for the survey. The completion rate was 60% for the 2,105 people who started the survey.

  • 7 A total of 769 respondents fit this criterion, but 5 respondents did not answer the voting question, 2 respondents did not answer the consequentiality question, and 5 respondents did not answer the age question.

  • 8 These percentages are computed excluding census data for people under the age of 18 because they were not eligible for the survey. Census data were collected from Statistics Canada (2011).

  • 9 The IV probit model produces inconsistent estimates if the endogenous variable is not continuous (Lewbel 2014), as is the case with consequentiality perceptions reported on a Likert scale.

  • 10 Specifically, the first-stage equation is correctly specified, and the error terms of the first-stage equation and equation [2] follow a joint normal distribution.

  • 11 There is some confusion in the literature on whether control function approaches can be used in nonlinear models with discrete endogenous variables. Bontemps and Nauges (2016) and Kalisa, Riddel, and Shaw (2016) both state that control function approaches are inconsistent in this setting. This is generally true for the commonly used IV probit approach, which uses a linear regression in the first stage (i.e., equation [2]). However, control function approaches need not use a linear functional form for the first stage. As shown by Wooldridge (2015), if a probit model is used in the first stage and its generalized residuals are included in the second stage, the control function approach is neither more nor less robust than the bivariate probit model.

  • 12 One potential limitation of using the tax amount as a special regressor to control for the endogeneity of perceived consequentiality arises if perceived consequentiality is itself affected by the tax amount presented to respondents. Groothuis et al. (2017) provide some empirical evidence for this notion, finding that levels of perceived consequentiality decrease as the tax amount given to respondents increases. In our application, we test for this effect by including the tax amount in the perceived consequentiality equation and do not find a significant effect.

  • 13 The use of discrete tax amounts in the experimental design of the valuation question has the potential to bias the results of the special regressor approach, as the underlying theory is developed assuming the special regressor is continuous. Lewbel, McFadden, and Linton (2011) conduct a simulation study comparing discrete and continuous designs for the special regressor and find that the resulting bias can be substantial for small samples (n = 100 in their context) but decreases as the sample size increases. Given our larger sample size, we believe the potential bias arising from discrete tax amounts is minimal.

  • 14 An alternative approach for this step is to use the ordered data estimator proposed by Lewbel and Schennach (2007).

  • 15 As noted by Bontemps and Nauges (2016), the denominator of the transformed dependent variable, the estimated conditional density of the special regressor, can take on very small values, which implies extremely large absolute values of the transformed dependent variable. This can induce large standard errors in the two-stage least squares regression in Step 5. Consequently, Lewbel (2014) recommends some form of trimming or winsorizing to limit the influence of these extreme values. For our main specification, we use 5% winsorization. Table B3 in Appendix B presents the results with trimming and winsorizing at different cutoff levels.

  • 16 The special regressor approach is implemented in Stata using Baum’s (2012) sspecialreg routine.

  • 17 This normalization has no impact on marginal effects and estimated WTP because the parameters of probit models are identified only up to location and scale. Specifically, the parameters of the two models are related through a rescaling by σϵ, where σϵ is the root mean square error of the regression (Riddel 2011).

  • 18 As a robustness check, we also consider an alternative IV: whether the respondent would vote for the incumbent government’s political party. This political support dummy variable equals one if the respondent indicated he would vote for the incumbent political party in the upcoming provincial election and zero otherwise. At the time of the survey, the political party holding power in the provincial government had been there for 43 years, and we expect party supporters to be more inclined to perceive that their responses would be taken into account by decision makers. However, there are reasonable doubts about the strict exogeneity of this instrument, as these party supporters may be more or less inclined to favor the proposed program for reasons besides perceived consequentiality. At the very least, this dummy variable is insignificant in the voting probit model. The results obtained using this instrument are similar to the results presented in the paper, and the instruments pass the Sargan-Hansen test of overidentification.

  • 19 While randomization should control for the influence of sociodemographic differences, a formal test comparing the two split samples (Table A1 in Appendix A) indicates some statistically significant differences in the size of the communities where respondents live.

  • 20 We also estimated an ordered probit model using the full five-point scale and find that the ordering treatment has a significant effect on consequentiality perceptions. These results are available upon request.

  • 21 Although not shown, and similar to Vossler and Watson (2013) and Oehlmann and Meyerhoff (2017), there is a rather weak relationship between perceived consequentiality and sociodemographic variables.

  • 22 To simplify the presentation, we include neither the medium- and large-sized BWA reductions nor the water treatment method, as none of these coefficients are statistically significant at the 10% level. Given that these variables are exogenous in the experimental design, excluding them does not affect the main results that are the focus of this paper.

  • 23 We also estimate a model with indicator variables for each of the consequentiality levels, with inconsequential (C = 1) as the omitted category, and find that each consequential level (C = 2, C = 3, C = 4/5) is statistically different from the omitted level. These results support treating the inconsequential respondents separately from the others and are provided in Table B1 in Appendix B.

  • 24 Because respondents living in small communities were oversampled, our sample is not representative of the underlying population. In the representative sample, 62.5% of respondents live in large communities, 23% live in medium communities, and 14.5% live in small communities. As shown in Table 4, respondents living in small communities constitute 34% of the sample used in the analysis, or 256 respondents out of 757. As a robustness check to account for this oversampling, we use bootstrap sampling to repeatedly sample 85 respondents from these 256 people living in small communities. This sampling is repeated 500 times, and Table B2 in Appendix B provides the main results with this weighting.
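The figure of 85 in note 24 follows from the reported shares: with 757 - 256 = 501 respondents outside small communities, restoring a 14.5% small-community share requires about 0.145 / (1 - 0.145) x 501, or roughly 85, small-community respondents. A minimal Python sketch of that calculation and of the 500 resampling draws, assuming draws are made with replacement (the note does not say) and using hypothetical respondent indices and function names; re-estimating the models on each combined sample is omitted:

```python
import random

N_TOTAL = 757        # respondents used in the analysis
N_SMALL = 256        # respondents living in small communities (oversampled)
SMALL_SHARE = 0.145  # small-community share in the representative sample


def target_small_count(n_total, n_small, small_share):
    """Small-community count that restores the representative share, holding others fixed."""
    n_other = n_total - n_small
    return round(small_share / (1 - small_share) * n_other)


def bootstrap_subsamples(small_ids, k, reps, seed=0):
    """Draw `reps` subsamples of size k, with replacement, from small-community respondents."""
    rng = random.Random(seed)
    return [rng.choices(small_ids, k=k) for _ in range(reps)]


k = target_small_count(N_TOTAL, N_SMALL, SMALL_SHARE)
print(k)  # 85
draws = bootstrap_subsamples(list(range(N_SMALL)), k=k, reps=500)
```

Each draw would be combined with the 501 respondents from medium and large communities before re-estimating the models.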