Abstract
We estimate the effect of levee-related flood-risk reduction on rates of new housing development. Using a fixed-effect Poisson regression and a nonlinear difference-in-differences identification strategy, we find that newly constructed levees increased the rate of residential development by more than 50% compared with areas without levee protection. Contemporary analysis using a duration model indicates effects lasting decades later, with the magnitude of the induced development attenuating over time. Our findings inform discussion of the “levee effect” and highlight the possibility that further flood-risk reduction investment in levees may be partially offset through increased development activity.
1. Introduction
Rising sea levels, aging infrastructure, and increased intensity of extreme events have resulted in more households than ever facing flood risk in the United States. Attempts to reduce the collective vulnerability to flooding require the coordinated efforts of households and various levels of government. While government involvement in flood insurance markets and infrastructure management projects is necessary to minimize market failures, such as adverse selection, charity hazard, and insufficient provision of public goods (Cornes 1993; Kousky and Shabman 2014), these interventions can also lead to unintended consequences (Bagstad, Stapleton, and D’Agostino 2007). Perhaps nowhere in the federal flood-risk management policy portfolio is this more consequential than in regard to the construction of structural flood-risk defenses such as levees.
The 2019 spring floods in the Midwest of the United States provided a reminder of how heavily we rely on levees to mitigate damages from flooding and what can happen when they fail. Though managed by levee districts, many of the levees in this area were independently constructed in the early to mid-twentieth century and have little federal oversight (Lowe 2019). When faced with 50% more water than the previous high in 1952, dozens of levees failed. In total, 62 levees breached across the Midwest in March 2019, with hundreds of miles of levees sustaining damage (Smith and Schwartz 2019). Initial estimates predicted losses totaling $3.5 billion in Nebraska and Iowa alone (Eller 2019).
States such as Nebraska and Iowa provide examples of recent levee failures, and exposure to similar risk is dispersed throughout the United States, with the most striking example occurring in New Orleans after Hurricane Katrina (Craig 2017). As the 2021 Infrastructure Report Card assessed the quality of the national levee system to be at a D level (with many individual levees receiving failing grades), crucial and expensive repairs and maintenance are needed throughout the country to ensure these levees remain effective.1
When prioritizing these improvements, it is essential to consider how individuals will respond to changes associated with levee construction and maintenance. While the construction of levees reduces risk for some homeowners, it may impose negative externalities on downstream communities (Wang 2020). This may take the form of increased flood heights and flow rates (Yen 1995; Pitlick 1997), the acceleration of flood waves (Jacobson, Lindner, and Bitner 2015), or elevated peak flood discharges (Remo et al. 2018). The effects of changes in flood risk are well studied on the demand side of real estate markets and are reflected in hedonic analysis of property values (Beltrán, Maddison, and Elliott 2018b).
Focusing on the supply side of the housing market, increases in development that result from decreased annual risk of flooding may offset efforts to reduce vulnerability, potentially increasing the expected damages from flooding events (Pinter et al. 2016). This can occur in several ways. First, because no levee provides 100% risk reduction, there remains a degree of residual risk after levee construction. If the increases in protected value outpace the reduction in the residual risk, total exposure to damages will be greater than before the levee was constructed or improved. Second, real flood risk is not as discrete as traditional flood insurance maps suggest, and there is a greater gradient of risk than the purported spectrum of a 1% chance in a given year, a 0.2% chance, or a 0% chance (Horn 2019). If levees induce development and initiate agglomeration economies, peripheral development could be pushed toward the edge of the protected area and could put more homes in vulnerable locations. Finally, by increasing the relative quantity of impervious surfaces, channelizing existing streams, and decreasing the absorptive capacity in leveed areas, induced development may increase the probability of flash floods, leading to greater damages if a flood occurs (Cutter et al. 2018).
Collectively, these potential supply-side outcomes make up “the levee effect” (Tobin 1995). Awareness of this phenomenon dates back to at least Gilbert White’s (1945) “Human Adjustments to Floods,” and despite its history and intuitive rationale, empirical estimation of the relationship between levee construction and residential development is limited. Prior work has largely relied on aggregated macro-level analysis to yield correlations of increased populations (Di Baldassarre et al. 2013; Hutton, Tobin, and Montz 2018), increased conversion of forests to agricultural land (Stavins and Jeffe 1990), and increased number of structures in floodplains (White et al. 1958; Montz 1986) with the construction of levees. Causal identification of the effect of levees on rates of residential development requires a counterfactual estimate of the rate of development that would have occurred had the levee not been constructed and an explicit control for the highly endogenous determination of levee construction (i.e., levees are often built to protect areas expected to experience growth). We overcome these challenges and fill a hole in the literature using micro-level housing data in a difference-in-difference model of new housing development and a duration model of the timing of new housing development.
In this article, we use the discrete construction of the Central and Southern Florida Project levees in the mid-twentieth century and a count-data difference-in-differences identification strategy to causally identify the extent to which newly constructed levees affect the rate of residential development. The location of these levees is shown in Figure 1. While minimizing the potential for omitted variable bias by relying on narrow spatial and temporal samples as well as precise fixed effects, we find that the level of new housing construction per year increased by 57% in newly leveed areas. Recognizing that the average age of federally affiliated levee systems is 55 years, we extend the analysis to consider the enduring effect of historically constructed levees on current residential development patterns. Using a discrete time duration model, we find that parcels in areas receiving flood-risk protection from a levee are more likely to develop, long after initial levee construction. Our results suggest that not only do newly constructed levees significantly induce greater residential development, potentially increasing exposure, but also that housing markets may take decades to reequilibrate following a relaxation of a constraint on developable land.
The results of this analysis are of both practical and policy significance. In addition to informing more accurate modeling of the levee effect, providing a causal estimate of the effect of flood-risk reduction from levees on residential development yields a critical input into ongoing policy discussions. This work is of particular relevance for discussions concerning floodplain reconnection and whether levees should be maintained and reconstructed or whether buyouts should be offered to rural residents behind their walls while removing some levees to ease the burden on other ones up- and downstream. While structural flood defenses have saved lives and property from numerous catastrophic flooding events and may present the most efficient means for mitigating flood risk, they are not the only means by which flood risk can be reduced and may not present the most efficient method for doing so (Tobin 1995). Overlooking the possibility that maintaining, constructing, or improving levees could induce development and potentially increase vulnerability and exposure biases the cost-benefit analysis of flood-risk management policy alternatives toward structural solutions. Our results have the potential to better inform such comparative evaluations.
2. Flood Prevention in the United States
The history of flood prevention in the United States is long, with the earliest artificial levees being built before the arrival of European settlers (Lafrance 2015). While decentralized and uncoordinated planning of earthen embankments along waterways characterized American flood-risk mitigation strategies up until the mid-nineteenth century, the Swamp Land Acts of 1849 and 1850 established a new precedent for government involvement in land reclamation and flood control. In addition to securing revenue to finance the construction of drainage canals and levees, these acts gave rise to the organization of levee districts with substantial autonomy, including eminent domain (White 1945). By the end of the 1800s, publicly funded projects had begun lining the entire lower Mississippi River and had managed to desiccate Tulare Lake in central California, at the time the largest freshwater lake west of the Mississippi.
Continued episodes of severe flooding led to a series of federally enacted Flood Control Acts (FCAs). The first of these, the FCA of 1917, was the first act of Congress to exclusively target flood protection (with no mention of land reclamation). The FCA of 1928 and the FCA of 1936 established the federal government’s role as the primary provider of flood protection, putting federal spending on flood control on par with other public works projects. With nine more FCAs authorized after 1936 and with floods overcoming levees on an annual basis, levee heights have seen sustained growth over the years, with the typical Mississippi levee growing from a height of 3 ft. in 1717 to 8 ft. in 1882 to 22 ft. in 1914 to 30 ft. following the FCA of 1928 (Mississippi River Commission 2007). Considering that many of the more than 9,000 levee systems across the United States have experienced similar patterns of modernization, it is often more appropriate to consider the construction of many levees to be a continual process rather than a discrete one. This makes the estimation of the effect of levee construction on residential development challenging.
Fortunately, historical accident yields a notable case of a more discrete pattern of levee construction. Because of the harsh conditions for development, the Civil War, and financial mismanagement of earlier efforts, by 1913, flood control infrastructure in Florida did not exceed drainage canals and modest levees on the southern shore of Lake Okeechobee (Davis and Ogden 1994). These levees were revealed to be inadequate by the hurricane of 1928, which caused the lake to overflow, killing 2,600 people in what has been estimated to be the second deadliest flooding event in U.S. history (Blake and Gibney 2011). Although an improved levee was built to contain Lake Okeechobee, it was not until another period of intense flooding in 1947 that the impetus for systematic flood control was realized. Over a 25-day period, intense rains and hurricanes led to 90% of southeastern Florida being underwater, directly resulting in the FCA of 1948, authorizing a system of levees in southern and central Florida.2 With universal flooding the summer before, and with the FCA of 1936 requiring all federally funded levees to pass a cost-benefit analysis, the levees authorized by the FCA of 1948 were to be constructed as soon as possible and specifically to protect existing populations, reducing the scope for endogeneity from levees potentially being located to coincide with new development.
This system of levees, known collectively at the time as the Central and Southern Florida Project for Flood Control and Other Purposes (C+SF), stretches nearly 1,000 miles and was constructed in several phases, with individual projects generally undertaken in order of urgency. In general, the East Coast Protective Levees were constructed first, followed by the levees surrounding Lake Okeechobee and in the Everglades Agricultural Area (EAA).3 Finally, the L-31 levees were constructed in what is now southern Miami-Dade County. In total, there are 161 levee systems in Florida cataloged by the National Levee Database. The 20 included in our analysis are the subset of the larger population that have documented construction completion dates and were built in response to an FCA authorized between 1948 and 1968. Because these levees were constructed before regulations to require flood insurance for homes purchased with federally backed mortgages, the salience of the spatial extent of levee protection would not have been conveyed to developers or homebuyers through flood insurance premium rates. However, state reports from the 1960s suggest that the Federal Housing Authority would only guarantee a housing loan if the area were “sufficiently safe from flooding” and that flood-risk reduction projects would permit greater lending opportunities in the newly protected areas (Kohout and Hartwell 1967). Regardless of the availability of flood insurance, residential mortgage availability and affordability provide a salient link between flood-risk reduction through levees and housing supply decisions.
While these federally authorized levees have provided sufficient protection to withstand decades of hurricanes and avoid catastrophic failures similar to those that plagued New Orleans in 2005 or the Midwest in 2019, concerns over structural integrity persist. As of 2015, the levees surrounding Lake Okeechobee were deemed to be “critically near failure” and of “extremely high risk” (South Florida Water Management District 2016). Other significant improvements have been made to C+SF project levees since 2010, costing taxpayers an average of approximately $53 million a year to maintain the integrity of the structural flood control system (Mitnik 2018). Concurrently, pressure from environmentalist interest groups to restore the Everglades has given rise to the possibility of removing some levees instead of maintaining or improving them (U.S. Army Corps of Engineers 2002). While Florida faces unique challenges in the face of climate change, this decision between preserving preventive infrastructure and reconnecting floodplains is shared by many municipalities across the country.4
In response to the flood of 1993, Congress authorized a study to formulate a comprehensive plan for flood-risk management along the upper Mississippi River. Though existing flood-risk management facilities prevented up to 97% of the potential damages from the flood of 1993, Hurricane Katrina highlighted the importance of preparing for residual risk. With this in mind, the study formulated, evaluated, and compared 14 alternative plans for minimizing flood risk. The plans were generally distinguished by either supporting levee improvement without floodplain reconnection, supporting floodplain reconnection without levee improvement, supporting both, or maintaining the status quo. However, no mention was made of induced residential development attributable to levee improvements or the limited number of new levees that could be constructed (U.S. Army Corps of Engineers 2008).
Analyzing revealed behavior in Florida in response to levee construction provides insight into the housing supply response attributable to a perceived reduction in vulnerability. A critical input into the policy decision of whether to improve or remove levees is knowledge of the effect that such improvement or construction will have on the housing stock in the protected area. While several studies have found that those living in leveed areas underestimate their risk of flooding or take fewer protective measures (Ludy and Kondolf 2012; Atreya, Ferreira, and Michel-Kerjan 2015), this could reflect confirmation bias or a selection process that would render these residents unrepresentative of the greater population. To determine the average treatment effect on the treated (leveed) areas, we use a difference-in-differences identification strategy and a fixed-effect Poisson model as described in the following section.
3. Methodology
There are three key components to our identification strategy. We discuss them in turn, beginning with the count-data specification necessary to account for the discrete and nonnegative distribution of the data before moving on to the feasibility of difference-in-difference identification in nonlinear models. Finally, we address the stringent fixed effects needed to explain development patterns from the most recent decade back to the 1920s.
The Poisson Model
Models of new residential development assume a utility-maximizing landowner deciding how to secure the greatest possible discounted return from their land (Bockstael 1996; Irwin and Geoghegan 2001). However, while these models explicitly consider the individual behavior that gives rise to aggregate outcomes, the conversion of predicted probabilities of development into predicted development is not obvious and may require more information (Bockstael 1996). We return to the atomistic model of development in our auxiliary analysis. Because our ultimate goal is to estimate predicted development and its determinants, here we consider a conceptual framework where development (or land use change) is characterized by the locally aggregated decisions of many individual landowners in a count-data setting (Kline 2003; Towe, Klaiber, and Wrenn 2017). The Poisson regression model is the standard approach used to predict aggregate behavior when the distribution of the outcomes is characterized by generally small, positive integers and zero values (Greene 2002). This model originates from the premise that every outcome, $y_i$, is drawn from a Poisson distribution:
$$\Pr(Y = y_i \mid X_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots \qquad [1]$$
where $y_i$ are nonnegative integers and $\lambda_i$ specifies the Poisson distribution, most commonly a log-linear model so that $\ln(\lambda_i) = \beta X_i$. Given this distribution, the expected number of events per period is
$$E[y_i \mid X_i] = \mathrm{Var}[y_i \mid X_i] = \lambda_i = e^{\beta X_i} \qquad [2]$$
and
$$\frac{\partial E[y_i \mid X_i]}{\partial X_i} = \lambda_i \beta. \qquad [3]$$
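As a minimal numerical sketch of the Poisson specification described above, the following Python snippet evaluates the Poisson probability mass function under a log-linear mean and checks that the mean and variance both equal λ. The coefficient and covariate values are hypothetical and chosen only for illustration; they are not estimates from our data.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson distribution with mean lam (computed in logs)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def conditional_mean(beta: float, x: float) -> float:
    """Log-linear specification: ln(lambda) = beta * x."""
    return math.exp(beta * x)


# Hypothetical coefficient and covariate (illustration only).
lam = conditional_mean(beta=0.5, x=2.0)  # equals exp(1.0)

# Recover the mean and variance numerically from the pmf.
# The sum is truncated at 200, far into the negligible tail for this lam.
support = range(0, 200)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
```

In this sketch, `mean` and `var` both coincide with `lam` up to floating-point error, which is the equidispersion property that the negative binomial model relaxes.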
To predict the conditional mean and thus produce unbiased estimates of the coefficients on the covariates, the only assumption required by the fixed-effect Poisson model is that the conditional mean is correctly specified. Following Pregibon (1979), we test whether the square of the predicted values has any explanatory power when included in the regression. Because we fail to reject the null that the coefficient on this term is equal to zero, we determine that the conditional mean is correctly specified.
We predict the number of homes built in year t, near levee l, and in census block group BG according to the equation below:
$$E[y_{t,l,BG}] = \exp\left(\beta_1 \text{Leveed} + \beta_2 \text{After} + \beta_3 (\text{Leveed} \times \text{After}) + \mu_{t,l,BG}\right) \qquad [4]$$
where each of the included variables is a 0/1 indicator (described in more detail in the next subsection) and $\mu_{t,l,BG}$ are fixed effects (described in detail later). Because the units of observation are annual snapshots of development in portions of block groups, which eventually receive or never receive flood protection from the nearest levee, each home construction event only contributes toward the observed count in one unit of observation, regardless of the number of levees present in a block group.
Difference-in-Differences Identification in Nonlinear Models
To develop a difference-in-differences estimator, we define treatment by whether an area is protected by a levee and whether the homes in it were built after the construction of the levee. This interaction term captures whether an observation unit is in an area protected by a levee after the levee was built. The standard depiction of parallel trends is provided in Figure 2 (top). The vertical axis represents the average number of homes built in a unit of analysis, while the horizontal axis represents the build year of the home relative to the year of levee completion. For example, the x-axis value for the count of homes built five years before the construction of the nearest levee would be −5. Several patterns are evident in this figure. First, the number of homes built in permanently unprotected areas (represented by the gray line) and in eventually leveed areas (represented by the black line) increases at a similar rate as the date of levee construction draws nearer, with growth in permanently unleveed areas consistently outpacing growth in eventually leveed areas. Second, on completion of the levee, growth in the leveed areas accelerates and the number of homes built in these areas in a given year eventually eclipses the number of homes built in still unleveed areas. Whereas growth in the number of homes in permanently unleveed areas may slightly exceed that in eventually leveed areas prior to the construction of a levee, the difference appears to be minimal, and if a bias exists, it implies that our results are overly conservative. To provide additional context regarding the timing of levee construction and the expansion of the treatment group over time, Figure 2 (bottom) illustrates the relationship between the construction dates of the levees in our sample and the number of homes they protect in absolute terms. All vertical lines represent the date of construction of levees that were commissioned in response to the FCAs from 1948 to 1968.
Despite the popularity of the difference-in-differences estimation strategy, identification is challenging when this strategy is applied to nonlinear models such as the Poisson regression described by equation [4]. Because the expectation of the outcome variable is bounded, the treatment effect is not constant across treated populations and the cross-difference of the potential outcome is not zero, which is an identifying assumption in the linear models (Ai and Norton 2003; Athey and Imbens 2006). However, by applying the difference-in-differences identifying assumption to the unobserved latent linear index of a nonlinear model, identification comes not from the cross-difference of the expectation of the potential outcome being zero but by the nonlinear parametric restriction on the cross-difference (Puhani 2012).
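Therefore, the treatment effect is not simply the cross-difference of the expectation of the observed outcome as it is in a linear model, but the difference of the cross-differences of the expectations of the observed and potential outcomes, as shown in equation [5]:
$$\tau(L = 1, A = 1) = \frac{\Delta^2 E[y \mid L, A]}{\Delta L \, \Delta A} - \frac{\Delta^2 E[y^0 \mid L, A]}{\Delta L \, \Delta A} = \exp(\beta_1 + \beta_2 + \beta_3 + \mu) - \exp(\beta_1 + \beta_2 + \mu) \qquad [5]$$
where L = Leveed, A = After, $y^0$ is the potential outcome without treatment, $\beta_1$ and $\beta_2$ are the coefficients on the Leveed and After indicators, $\mu$ is the fixed effect, and τ is the treatment effect of the difference-in-differences model. The treatment effect is then the incremental effect of the interaction term coefficient. Because this is not β3, the coefficient on the interaction term, we report these two effects separately.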
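To make the distinction between the interaction coefficient and the incremental effect concrete, the sketch below follows the Puhani (2012) logic cited above for an exponential conditional mean. All coefficient values and the function names are hypothetical. Reading a 57% increase as exp(β3) − 1 is an assumption made here for illustration, not a statement of how the reported estimate was computed.

```python
import math


def incremental_effect(b1: float, b2: float, b3: float, mu: float) -> float:
    """Expected count with treatment minus the counterfactual expected
    count without it, for a treated unit (Leveed = 1, After = 1)."""
    observed = math.exp(b1 + b2 + b3 + mu)
    counterfactual = math.exp(b1 + b2 + mu)
    return observed - counterfactual


def proportional_effect(b3: float) -> float:
    """Proportional change in the expected count implied by b3."""
    return math.exp(b3) - 1.0


# Hypothetical coefficients (illustration only, not estimates).
b1, b2 = -0.2, 0.3
b3 = math.log(1.57)  # interaction coefficient implying a 57% increase

# The proportional effect does not depend on the fixed effect mu ...
pct = proportional_effect(b3)

# ... but the incremental effect in homes per year scales with the baseline.
tau_low = incremental_effect(b1, b2, b3, mu=0.0)
tau_high = incremental_effect(b1, b2, b3, mu=1.0)
```

Under these assumptions, the same β3 yields a larger incremental effect in a group with a higher baseline rate of construction, which is one reason the coefficient and the treatment effect are worth reporting separately.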
Fixed Effects in Count-Data Models
The historical identification strategy we use limits the availability of variables we can include in our model. As a result, it is necessary to account for spatial amenities by including fixed effects. Requiring identification to come from variation within small enough spatial groupings should hold most variables constant, including the price of land. It is possible that sharply delineated boundaries in some neighborhoods (possibly elevated roads or drainage canals) demarcated areas where the price of land was susceptible to discrete changes over continuous measures of distance before the construction of levees. However, by considering 20 different levee systems covering more than 1,300 square miles, we expect that this particular concern would not systematically bias estimates from the entire area covered by the C+SF Project levees.
The need for fixed effects also motivates the choice of the Poisson model over a negative binomial model. Despite the flexibility gained by relaxing the assumption that the conditional mean is equal to the conditional variance, negative binomial models are incompatible with traditional fixed effects when estimated by conditional maximum likelihood (Greene 2005; Guimarães 2008). While it is possible to condition out the incidental parameters for each fixed-effect group in a negative binomial regression, because this model allows for group-specific variation in the dispersion parameter, time-invariant variables are not necessarily subsumed under the fixed effect (Hausman, Hall, and Griliches 1984). This results in the fixed-effect negative binomial model not providing a true “within-group” estimator and an inability to control for neighborhood level unobservables. For these reasons, the fixed-effect Poisson model is our preferred specification.
The creation of our preferred set of fixed effects is shown in Figure 3. Figure 3a illustrates the location of the L-31 levees in southern Miami-Dade County. The dark gray shapes represent the areas protected from flood risk by levees. We want to hold as many spatial attributes constant across leveed and unleveed areas as possible, so in our primary model, we restrict identification to come from changes within a mile of the boundary of the leveed area, as depicted in Figure 3b. Similarly, we avoid focusing on development patterns deep in the leveed areas, restricting identification to come from differences in development patterns between the light gray and hashed bands surrounding the levees in Figure 3c. Following previous empirical work (Zhou, McMillen, and McDonald 2008; Kuminoff and Pope 2012; Turner, Haughwout, and Van Der Klaauw 2014), limiting our analysis to areas close to the border of leveed and unleveed land helps ensure that unobservable attributes of the land and of the economic agents vary continuously rather than discretely as the boundary between leveed and unleveed land is crossed.5
To control for neighborhood-specific amenities that vary along the length of a given levee, we include census block group fixed effects, as depicted in Figure 3d. In this setting, identification arises from changes in development patterns between the hashed and light gray bands within block groups. In our preferred model, we further interact the set of fixed effects depicted in Figure 3d with a set of year dummy variables, which requires identification to come from differences in annual residential development in only the block groups that have leveed and unleveed areas within a mile of a leveed area boundary. With this set of fixed effects, any time-varying unobservables are controlled for at the neighborhood level. Though this provides a high level of control, it also comes with a sizable reduction in the observations included in our sample, as any block group that does not include both leveed and unleveed areas provides no variation for the within-group estimator. The effect of this restriction on our sample size is demonstrated in Table 1.
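The sample restriction implied by these fixed effects can be sketched as follows. The record layout, the function name, and the example values are hypothetical. The sketch keeps only block group by year groups that contain both a leveed and an unleveed portion, since only those groups provide within-group variation in levee protection.

```python
from collections import defaultdict

# Hypothetical observations: (block_group, year, leveed, homes_built).
observations = [
    ("BG1", 1950, 1, 4),
    ("BG1", 1950, 0, 2),
    ("BG2", 1950, 1, 3),  # no unleveed portion in this block group-year
    ("BG3", 1951, 0, 1),  # no leveed portion in this block group-year
]


def identifying_sample(rows):
    """Keep rows whose block group-by-year group has leveed and unleveed units."""
    groups = defaultdict(list)
    for row in rows:
        block_group, year, _leveed, _count = row
        groups[(block_group, year)].append(row)

    kept = []
    for members in groups.values():
        statuses = {member[2] for member in members}
        if statuses == {0, 1}:  # group spans the leveed area boundary
            kept.extend(members)
    return kept


sample = identifying_sample(observations)
```

Here only the two BG1 rows survive, mirroring the reduction in observations that accompanies the tighter fixed effects.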
4. Data
There are two primary data sets necessary to estimate our empirical model: a detailed record of property transactions for the state of Florida (purchased from national real estate information provider CoreLogic) and the National Levee Database. The record of property transactions consists of county-level assessor’s office data, providing information on spatial, structural, and sale characteristics of the property. For a property to appear in this data set, it must have sold during the first decade of the twenty-first century. Although this may not provide the complete universe of residential structures in Florida, there is unlikely to be a correlation between selection into this subset of the population of residential structures and the residual component of the predicted count of homes per group in our model. We calculate the number of homes in each group by recording the year of initial construction for every home in our data set as well as the geographic coordinates of the property centroids before aggregating them by the spatial and temporal groupings discussed in the previous section.
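The aggregation step described above can be sketched in a few lines. The record layout, the levee label, and the function names are hypothetical and stand in for the assessor data. The sketch counts homes by year of initial construction within each spatial unit and fills in zeros for unit-years with no construction, so that the outcome is a nonnegative count for every unit in every year.

```python
from collections import Counter

# Hypothetical home records: (year_built, block_group, nearest_levee, leveed).
homes = [
    (1950, "BG1", "L-31", 1),
    (1950, "BG1", "L-31", 1),
    (1950, "BG1", "L-31", 0),
    (1952, "BG1", "L-31", 1),
]


def count_new_homes(records):
    """Count homes by (block_group, nearest_levee, leveed, year_built)."""
    return Counter(
        (block_group, levee, leveed, year)
        for (year, block_group, levee, leveed) in records
    )


def balanced_panel(counts, years):
    """Expand the counts to every unit-year, recording zero where no homes were built."""
    units = sorted({key[:3] for key in counts})
    return {
        unit + (year,): counts.get(unit + (year,), 0)
        for unit in units
        for year in years
    }


panel = balanced_panel(count_new_homes(homes), range(1950, 1953))
```

Each home enters exactly one cell of `panel`, consistent with each construction event contributing to the count of a single unit of observation.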
Because identification comes from variation in these groupings, we are unable to assess covariate balance in demographics across treated and control units using demographic data from the U.S. Census. However, using data from each of the 407 block groups in the sample, a linear regression of the percentage of each block group that is leveed on the share of the block group population that is white and the share that is male reveals no statistically significant relationships, with standard errors larger in magnitude than the point estimates. This suggests that the demographics of the population living in the newly leveed areas resemble those of the population living in the unleveed areas.
The National Levee Database is a congressionally authorized, publicly available, and continually updated record of the location and condition of the levees in the United States. The database currently provides the location of over 8,000 levee systems covering approximately 30,000 linear miles. However, there are only complete records of risk and condition for 2,000 of these systems, and these are primarily levees affiliated with the U.S. Army Corps of Engineers. This inconsistency results in a number of levees with unidentifiable dates of construction or boundaries of protected area, rendering them potentially unusable for our analysis. Fortunately, the levees constructed in response to the FCAs of 1948-1968 are generally well documented, allowing for a more exhaustive analysis.
While other variables aside from levee protection likely influence the decision of when and where to build a home (see Section 5), in our primary analysis we rely on the fixed effects to control for these other factors. Because we are restricting identification to come from within-census-block-group by year groupings, differentiated only by whether they receive flood-risk protection from a levee, and discarding transactions occurring more than a mile from the leveed area boundary, the scope for omitted variable bias is minimized and would need to arise from differences in neighborhoods. Table 1 demonstrates how the precision of our fixed effects removes a downward bias in our causal estimate.
5. Results
The columns in Table 1 represent four models that have progressively more restrictive fixed effects when read from left to right. While the spatial unit of analysis is the same for all four models (the portion of the block group that is leveed or to be leveed in the future) and the counts are calculated for every year, the fixed effects range from the year level to the specific levee by block group by year level. With the exception of model 2, which may suffer from an omitted variable bias related to the exclusion of temporal controls, the range of models demonstrates the magnitude and direction of the bias attributable to a failure to control for spatial and temporal omitted variables. The tighter fixed effects reveal the upward bias in the estimate of levee location (before and after levee construction) when failing to control for annual effects at the neighborhood level, perhaps capturing the dual incidence of risk and positive spatial amenities correlated with flood-prone areas. Most important, the estimated treatment effect increases significantly from model 1 to model 4, growing from a 33% increase in development attributable to levee construction to a 57% increase.
Although more restrictive fixed effects are essential in a model as parsimonious as ours due to the historical nature of the treatment effect, they come at the cost of a decrease in the effective sample size because many observations may belong to singleton groups, without any other observations sharing enough in common to help identify within-group variation. This effect is evident in Table 1, as our number of observations decreases from more than 20,000 in model 1 to 3,000 in model 4. The adjustment from model 3 to model 4 is subtle but important. Because some block groups receive protection from multiple levees, the dates by which certain areas are protected in a block group may vary. By interacting the fixed effects from model 3 with an identifier for the nearest levee boundary, we restrict identification to come solely from differences in development experienced across protected area designations for each levee in each block group by year grouping. To assess the sensitivity of our estimates to our controls for spatial and temporal proximity, we turn to Table 2.
The models in Table 2 each use the same preferred set of fixed effects from model 4 of Table 1 but manipulate either a spatial or temporal restriction on the data to assess whether our results are an artifact of how we select our sample. In column (9), we relax the assumption that the data-generating process is nonlinear. Importantly, the causally identified parameter is significant at the 1% level in all specifications, suggesting that although restrictions to the temporal and spatial scope of our analysis may influence the magnitude of the point estimate, there is a qualitatively robust effect underlying our estimation.
In models 1 and 2, the temporal restriction is tightened to 15 years, meaning that only rates of residential construction 15 years before or after the completion of the levee are used for estimation. This allows us to rule out the possibility that our results are driven by changes in residential development attributable to factors unrelated to levee protection more than two decades before or after levee construction, which reduces the scope for time-varying omitted variable bias. In model 2, the additional restriction is made to exclude observations in the two years before and after the completion of levee construction to allow for miscoded dates of levee completion or anticipatory effects. The results are not qualitatively different from those in model 1. Importantly, the results do not seem to be sensitive to the duration of this period of data omission, as the treatment effects estimated by excluding data from the four years or 10 years before and after levee construction are also similar to the results from model 4 in Table 1. Together, these three models suggest a limited scope for bias arising from any potential discrepancies between the Army Corps of Engineers-certified date of levee construction completion and the date any residential development or lending regulations may have changed.
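The temporal restrictions described above amount to a simple sample filter. The following is a minimal sketch under stated assumptions: the function `keep_observation`, the completion year, and the range of years are hypothetical, and the sketch assumes that the completion year itself is dropped whenever years around completion are excluded.

```python
def keep_observation(year, completion_year, window=15, donut=0):
    """Return True if an observation year stays in the estimation sample.

    window: maximum number of years before or after levee completion.
    donut:  years on each side of completion to drop (0 keeps them all).
    Assumption: when donut > 0 the completion year itself is dropped as well.
    """
    rel = abs(year - completion_year)
    if rel > window:
        return False
    if donut > 0 and rel <= donut:
        return False
    return True

completion = 1955  # hypothetical levee completion year
years = range(1935, 1976)

# 15-year window only (in the spirit of model 1 in Table 2)
window_only = [y for y in years if keep_observation(y, completion, window=15)]
# 15-year window with the two years around completion removed (model 2)
with_donut = [y for y in years if keep_observation(y, completion, window=15, donut=2)]

print(len(window_only), len(with_donut))
```

Setting `donut` to 4 or 10 gives the wider exclusion periods mentioned in the text.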
Model 5 adjusts the temporal restrictions in the opposite direction, removing them completely and allowing identification to come from differences in rates of residential development within block groups up to 74 years before or 64 years after the construction of the levee. This relaxation of the restriction to the sample almost doubles the number of observations from our preferred specification, but potentially introduces omitted variables bias, as evident in the significant decrease in the magnitude of the estimate of the causal effect. These results suggest that our preferred estimates are not an artificial product of bounds on the temporal scope of our analysis. Models 6-8 take a similar approach as model 5, but do so spatially rather than temporally. These models relax the restriction that identification come from differences within one-mile bands of leveed area boundaries, extending this buffer to three or five miles. Identification is still restricted to come from within fixed effects, which is why the sample size in model 6 in Table 2 is not markedly different from that in model 4 of Table 1. By replacing block groups with census tracts and expanding the buffer to three and five miles, we allow identification to come from changes in development significantly further from the border between leveed and permanently unleveed areas.
In addition to assessing the sensitivity of our results to the inclusion of the one-mile buffer, this expansion of the spatial scope of the study lends further credence to the salience of the boundaries. If the National Levee Database lines indicating the borders of levee protection are imprecisely drawn, we would expect attenuation bias as the sample is restricted, because the assignment of homes to treated or control observations becomes less accurate close to the boundary. However, we find the opposite: the effect is weaker when we include development in areas further away from the boundary of the leveed area. Therefore, it appears that, on average, these boundaries did convey meaningful information to developers and homebuyers in the mid-twentieth century.
Finally, model 9 in Table 2 reports the results from a linear regression of the difference-in-differences estimator. To account for the skew in the data, which arises from the discrete and nonnegative distribution of the number of homes built in a given place and year, we consider log-transforming our dependent variable to estimate a model similar to one that estimated the effect of political ideology on housing development (Kahn 2011). However, because there are a nonnegligible number of observations for which the number of homes built is zero, and the log of zero is undefined, we instead employ an inverse hyperbolic sine transformation. The interpretation of a coefficient on a dummy variable in a model with an inverse hyperbolic sine-transformed dependent variable is identical to the interpretation of a coefficient on a dummy variable in a model with a log-transformed dependent variable, so the coefficients on both variables should be interpreted as approximate semi-elasticities, subject to the transformation below, where represents the regression estimate and measures the calculation of the treatment effect (Bellemare and Wichman 2020). [6] After making the appropriate correction, the linear model results suggest that newly constructed levees increase the rate of residential development by 51%, an effect similar in magnitude to that obtained from the fixed-effect Poisson model difference-in-differences estimate.
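As a rough illustration of the transformation and the conversion of a dummy-variable coefficient into a percentage effect, the sketch below uses the large-value approximation discussed by Bellemare and Wichman (2020), with an optional variance term as a small-sample correction. The function names and the coefficient value are hypothetical, and the sketch does not claim to reproduce the exact form of equation [6].

```python
import math

def ihs(y):
    """Inverse hyperbolic sine: defined at zero and close to log(2y) for large y."""
    return math.asinh(y)

def dummy_percent_effect(beta, var_beta=0.0):
    """Approximate percent change in the outcome when a dummy switches from 0 to 1.

    beta:     coefficient on the dummy in the transformed-outcome regression.
    var_beta: estimated variance of beta (0.0 skips the small-sample correction).
    """
    return 100.0 * (math.exp(beta - 0.5 * var_beta) - 1.0)

# The transformation handles zero counts, unlike the log.
print(ihs(0), ihs(100), math.log(2 * 100))

# Hypothetical coefficient, not an estimate reported in Table 2.
print(dummy_percent_effect(0.4))
```

The approximation relies on the outcome being large enough that the inverse hyperbolic sine behaves like a shifted log, so it is less reliable when many observations are near zero.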
Enduring Effects
Knowing the effect of levee construction on rates of residential development is imperative for understanding the relationship between protective infrastructure and vulnerability. However, because the average levee is 55 years old, an equally policy-relevant question concerns what effect legacy levees continue to have on patterns of housing construction. We supplement our primary findings by addressing this question using largely the same data sources and a recent innovation in the land use literature.
To estimate the continued effect of 50-year-old levees on rates of residential development, we face an identification trade-off. Because there is no longer a temporal dimension to our measure of flood protection provided by levees, we lose the ability to estimate the traditional, event-study style, difference-in-differences model. However, we benefit from the availability of contemporary data, specifically, property transaction prices. This allows us to more explicitly model the optimal stopping decision, which characterizes the landowner’s choice of whether or not to develop a parcel of land (Capozza and Helsley 1989). This model operates under the premise that land should be converted to residential use when the annualized value of the previous use of the land plus the expectation of the conversion capital opportunity cost equals the annualized value from residential development. Although this has been a workhorse model for evaluating sprawl (Irwin and Bockstael 2004; Livanis et al. 2006), agricultural land value forecasting (Plantinga and Miller 2001; Guiling, Brorsen, and Doye 2009), and tracking real prices and speculative bubbles in metropolitan housing markets (Abraham and Hendershott 1994; Goodman and Thibodeau 2008), it also assumes that price is exogenous. If price is endogenous and correlated with our variable of interest, then our estimate will be biased. We overcome this potential barrier to unbiased estimates by instrumenting for price using a control function.
As explained by Wrenn, Klaiber, and Newburn (2017, 2019), there are several channels through which endogeneity could manifest in the real estate options value framework. However, because this process is estimated with a nonlinear duration model, standard two-stage least squares instrumental variable approaches are inconsistent. Wrenn et al. solve this problem by including the residuals from a first-stage regression of the neighborhood price index on neighborhood-specific attributes and a weighted average of exogenous distant-neighborhood-specific attributes in the second-stage duration model. Including this residual purges the model of correlation between the latent profitability of development in the given neighborhood and unobservable neighborhood attributes.
The instrumental variables needed to generate the residuals are derived from the theory of spatial equilibrium and take the form of a geographically weighted average of the exogenous neighborhood characteristics for neighborhoods outside a given buffer distance from the focal neighborhood. This decision proceeds from the logic that the exogenous characteristics from distant neighborhoods will affect the development in those neighborhoods (positively or negatively), which will affect the price in those neighborhoods, which will either expedite or slow development in substitutable neighborhoods, potentially including the focal neighborhood. The process by which our instrumental variable is created is illustrated in Figure 4 (left), which depicts the census tracts used to model neighborhoods in this auxiliary analysis. For consistency with our difference-in-differences estimation, we only consider development southeast of Orlando. Figure 4 (right) illustrates the census tracts in Broward County and details the difference between the focal neighborhood of interest, the local neighborhoods excluded from the calculation of the instrument to ensure exogeneity, and the area-weighted neighborhoods outside the buffer. When calculating the instruments, the values of the exogenous neighborhood attributes for all neighborhoods outside of the local neighborhood buffer are averaged, weighted by their area.
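The construction of the instrument can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the estimates: the tract identifiers, coordinates, areas, and attribute values are hypothetical, distance is measured between tract centroids, and the local buffer is represented by dropping a fixed number of nearest tracts around the focal tract.

```python
import math

def area_weighted_instrument(focal, tracts, n_excluded):
    """Area-weighted mean of an attribute over tracts outside the local buffer.

    tracts maps a tract id to a dict with 'xy' (centroid), 'area', and 'attr'.
    The focal tract and its n_excluded nearest tracts are left out.
    """
    focal_xy = tracts[focal]["xy"]
    others = sorted(
        (tid for tid in tracts if tid != focal),
        key=lambda tid: math.dist(focal_xy, tracts[tid]["xy"]),
    )
    distant = others[n_excluded:]
    if not distant:
        raise ValueError("no tracts remain outside the local buffer")
    total_area = sum(tracts[t]["area"] for t in distant)
    return sum(tracts[t]["area"] * tracts[t]["attr"] for t in distant) / total_area

# Hypothetical tracts laid out along a line.
tracts = {
    "A": {"xy": (0.0, 0.0), "area": 1.0, "attr": 10.0},
    "B": {"xy": (1.0, 0.0), "area": 2.0, "attr": 20.0},
    "C": {"xy": (5.0, 0.0), "area": 1.0, "attr": 30.0},
    "D": {"xy": (9.0, 0.0), "area": 3.0, "attr": 50.0},
}

# Exclude the single nearest tract (B) when building the instrument for A.
instrument_A = area_weighted_instrument("A", tracts, n_excluded=1)
print(instrument_A)
```

Repeating the calculation for different values of `n_excluded` and comparing first-stage F-statistics would mirror the way the size of the local neighborhood group is chosen.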
Because of the variance in the size of census tracts, the local neighborhoods are not defined by a Euclidean distance, but by whether they are one of the N nearest neighbors. The appropriate number of nearest neighbors to be omitted from calculating the instrument is determined by identifying the size of the local neighborhood group that produces the strongest instrument. Instrument strength is assessed via the F-statistics for a series of first-stage regressions with varying numbers of local neighborhoods omitted to calculate the instrument. These statistics are reported in the Appendix. Omitting 16 neighbors from the calculation of the instrument leads to the largest F-statistic, which decreases monotonically as the number of neighbors omitted moves away from 16 in either direction. Because this F-statistic is greater than 10, previous analysis suggests that this instrument is not weak (Stock, Wright, and Yogo 2002). The resulting specification of the duration model with the control function is provided in equation [7]: [7] Here, the probability that parcel i in neighborhood j is developed at time t is equal to a parametric proportional hazard model of construction based on observable attributes of the parcel and neighborhood. Because development events are observed at annual intervals, a discrete hazard applies to the grouped duration data and is estimable via a binary dependent variable model (Beck, Katz, and Tucker 1998). Using a probit model allows us to satisfy the joint normality assumption necessary for the application of the control function (Wrenn and Klaiber 2019). Phi then captures the standard normal distribution function, and rho represents the joint normal correlation between the error terms in the control function and the duration model.
and Xjt are parcel and neighborhood characteristics affecting profitability, respectively; Pjt is the quality-adjusted price of housing at the neighborhood level; and vjt are residuals from a price regression used to purge endogeneity from equation [7]. A set of time-specific hazard shifters, , is included to model the baseline hazard.
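For orientation, one common way of writing a probit hazard with a control function is sketched below. The notation is partly assumed: X_jt, P_jt, v_jt, Phi, and rho follow the definitions in the text, whereas D_ijt (the development indicator), Z_ijt (parcel characteristics), alpha_t (the time-specific hazard shifters), and the coefficient symbols gamma, beta, delta, and theta are placeholders chosen for this sketch. The scaling by the square root of 1 minus rho squared is the standard form for a probit with a jointly normal first-stage error and may differ from the exact expression in equation [7].

```latex
\Pr\left(D_{ijt} = 1 \mid Z_{ijt}, X_{jt}, P_{jt}, v_{jt}\right)
  = \Phi\!\left(
      \frac{\alpha_t + Z_{ijt}'\gamma + X_{jt}'\beta + \delta P_{jt} + \theta v_{jt}}
           {\sqrt{1 - \rho^{2}}}
    \right)
```

In this form, a statistically significant coefficient on the residual v_jt indicates that price is endogenous, which is the test applied to the control function residual in Table 3.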
The results for a naive survival analysis and a duration model with a control function are shown in Table 3, columns (1) and (2), respectively.6 The coefficient on the control function residual is significant, suggesting that there is a downward bias on the estimation of price in the naive analysis attributable to endogeneity. This is supported by the sizable difference in the estimated effect associated with price, which increases by an order of magnitude when the residual from the control function is included in the analysis. The resulting estimate of the price elasticity of housing supply of 1.51 is slightly less than the population-weighted elasticity of housing supply for the average metropolitan area in the United States of 1.75, which is consistent with the constraint on land development provided by the Everglades and Atlantic Ocean (Saiz 2010). Accounting for endogeneity in price also reveals a downward bias in the effect of being leveed in the naive analysis, which suggests that properly identifying the role of price is important because of correlation between housing prices and leveed areas. The resulting estimate of the marginal effect of levee protection indicates that protection is associated with a 2.6% increase in the probability of a residential parcel being developed in a given year. We also find that a 1% annual risk of flooding, intuitively, decreases the probability of development for a residential parcel, but not significantly.
Overall, our exploration of enduring effects of historically leveed areas on rates of residential development suggests that the perceived protection afforded by levees in Florida remains associated with a faster rate of residential development, conditional on price and other factors affecting the transition to the residential use of land. To consider how this effect attenuates over time, we estimate the same model and include an interaction term between a variable capturing the age of the nearest levee and a variable capturing the presence of a levee. Doing so reveals that the induced development effect is greatest for the most recently built levees and that this effect is reduced each year after a levee is built. Specifically, a newly built levee increases the probability of development by nearly 12%, but this effect decreases by 0.2% for each year the levee ages.
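Read as a linear age profile, the two figures above imply the following back-of-the-envelope calculation. The function name is hypothetical, the inputs are the rounded values from the text ("nearly 12%" and 0.2% a year), and extending the line across all ages is an assumption of this sketch, not a result from the model.

```python
def levee_effect_pct(age_years, effect_at_zero=12.0, decay_per_year=0.2):
    """Implied increase in development probability (in percent) for a levee of a given age.

    Assumes the interaction term enters linearly in levee age.
    """
    return effect_at_zero - decay_per_year * age_years

# Implied effect at a few illustrative levee ages.
for age in (0, 25, 50):
    print(age, levee_effect_pct(age))
```

Under these rounded inputs, the implied effect for levees several decades old is a small fraction of the effect for a new levee, in line with the attenuation described above.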
This finding may help explain the differences between the results in our analyses of immediate and enduring effects of levee construction. As the number of new homes built in a given year is a function of the underlying conditions that affect the likelihood that individual parcels will be developed, our count and duration modeling frameworks capture the same data-generating process. However, the different sampling timeframes make direct comparisons between the results of the two models challenging, so the results must be read in light of those timeframes. Given the rapid development of southeastern Florida in the mid-twentieth century, the number of parcels at risk of development was significantly higher during the timeframe sampled in our Poisson model (1922-2002) than during the timeframe sampled in our duration model (2000-2016). It is therefore possible that the estimated 2.6% increase in the likelihood of an individual parcel being developed could imply an aggregate increase in annual development of 50% or more if the number of developable parcels were sufficiently large. For example, if we consider a hypothetical spatial unit of observation with 100 undeveloped parcels, a baseline rate of development of four homes a year, and two time periods (before and after levee construction), then a 2 percentage point increase in the likelihood of development following levee construction results in an expected addition of two developed parcels alongside the baseline rate of four, sufficient for a 50% increase in the number of homes built per unit of observation. Because both the number of undeveloped parcels and the induced likelihood of individual parcel development attributable to newly built levees were greater in the mid-twentieth century than in the early twenty-first century, this simple example illustrates how the results of our two models may not be as different as they may otherwise appear.
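The arithmetic of the hypothetical example can be written out directly. The function name is an illustrative placeholder; the first call uses the figures from the example above, and the second call, with fewer undeveloped parcels, uses a hypothetical value added only to show how the aggregate effect shrinks as developable land is used up.

```python
def aggregate_increase(undeveloped_parcels, baseline_homes_per_year, prob_increase):
    """Expected extra homes per year and the proportional increase over baseline.

    prob_increase is the rise in each parcel's annual development probability,
    expressed as a fraction (0.02 means two percentage points).
    """
    extra_homes = undeveloped_parcels * prob_increase
    return extra_homes, extra_homes / baseline_homes_per_year

# Example from the text: 100 parcels, four homes a year, two percentage points.
extra, share = aggregate_increase(100, 4, 0.02)
print(extra, share)

# Hypothetical contrast: same per-parcel effect, far fewer developable parcels.
extra_small, share_small = aggregate_increase(20, 4, 0.02)
print(extra_small, share_small)
```

The same per-parcel effect therefore translates into very different aggregate increases depending on how many parcels remain at risk of development.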
6. Discussion
This analysis provides causal estimates of the effect of the construction of levees on the rate of residential development. This relationship constitutes half of the levee effect, a phenomenon that has been theorized for over 70 years but has not been empirically tested beyond case studies of single levee systems. Our findings suggest that the construction of levees induced residential development, increasing the number of new homes built in a given year by over 50% compared with the number of homes that would have been built had the levee not been constructed. Because this counterfactual is determined by the rate of development in the treated area prior to treatment and the rate of development in the untreated area after treatment, spillover development from newly leveed areas into permanently unleveed areas (a potential form of the levee effect) would attenuate the estimate of the treatment effect. The rate of development in permanently unleveed areas is slower after levee construction than before, and levees are still associated with an increased rate of development decades after their construction, so the scope for this potential attenuating bias is limited in our sample. However, we cannot completely discard the possibility that the effect of levee construction on rates of residential development is even greater than we estimate.
Our supply-side analysis also complements the hedonic literature on the effect of levees and other forms of structural flood defenses on property values. Early hedonic work found that risk mitigation provided by flood control structures was positively capitalized in property values (Holway and Burby 1990). More recent work has shown that other flood-risk-reducing infrastructure capitalizes in home values (Atreya and Czajkowski 2019; Davlasheridze and Fan 2019) and is heterogeneous across the urban-rural gradient (Beltrán, Maddison, and Elliott 2018a). Our results provide an additional explanation for the discrepancy between rural and urban capitalization of levees that is attributable to the levee effect. When the addition of an amenity causes an increase in the housing supply, the premium paid for that amenity will be less than would be expected with a fixed housing supply (Lutz 2015). Because the housing supply is more elastic in rural areas than in urban areas, if the presence of levees induces further development, price effects will be less in rural areas, potentially leading to the disamenity effects outweighing the positive capitalization of risk reduction.
This positive effect of levee construction on residential development also has a long-term persistent effect—in the case of southern Florida, up to 70 years after levees were built. Using a control function duration model to account for the endogeneity of price, we find that differences in flood risk and levee protection are distinctly incorporated into the sorting equilibrium, an outcome that may be unobserved when estimating capitalization effects (Fell and Kousky 2015) or may mask otherwise positive capitalization effects. Although this effect dissipates over time, the results suggest that any contemporary levee construction decisions will have lasting consequences for the sprawl of development.
Together, these results inform both policy makers and scholars. In the broadest terms, our causal estimates justify concern regarding “field of dreams” levees: those levees that, when built, induce greater development in the areas that they protect (Sun 2011). Although this development may well be associated with a net reduction in the expected damages due to flooding, channels exist by which this development may lead to an increase in expected damages instead. Though estimation of the ultimate effect of levee construction on vulnerability extends beyond the scope of this analysis, our findings highlight the need for planners at all levels of government to recognize the immediate and enduring effects that levee construction decisions have on residential development as well as the direct and indirect effects on overall vulnerability associated with induced new development.
Acknowledgments
We gratefully acknowledge funding from the National Science Foundation (CBE-1160961 and GSS-1127044). The authors declare that they have no relevant or material financial interests that relate to the research described herein.
Footnotes
Appendix materials are freely available at http://le.uwpress.org and via the links in the electronic version of this article.
1 American Society of Civil Engineers’ Infrastructure Report Card, available at https://infrastructurereportcard.org/cat-item/levees/.
2 South Florida Sun-Sentinel, “The Great South Florida Flood,” South Florida Sun-Sentinel, September 9, 1990, available at https://www.sun-sentinel.com/news/fl-xpm1990-09-09-9002130092-story.html.
3 The EAA levees stand out as outliers when compared with the other C+SF Project levees because they were opportunistically constructed to reclaim and cultivate land south of Lake Okeechobee. Today, this area is the nation’s largest supplier of sugarcane, rice, and winter vegetables (EAA 2018 Pre-Harvest Celebration). Our results are not sensitive to the inclusion or omission of these levees, which are included for a more exhaustive account of the impact of the C+SF Project.
4 The Nature Conservancy, “New Tool Protects Floodplains in the Mississippi River Basin,” Nature Conservancy, October 30, 2019, available at https://www.nature.org/en-us/what-we-do/our-priorities/protect-water-and-land/land-andwater-stories/new-tool-protect-floodplains-mississippi-river-basin/.
5 The levee boundary is derived from terrain-based calculations and often aligns with raised highways, although it may take more irregular shapes in the absence of such divisions, as is the case in southern Miami-Dade County.
6 We supplement the data from our primary analysis with a variety of sources to calculate the parcel- and neighborhood-specific attributes. These sources include the National Hydrography Dataset, SSURGO data from the Natural Resources Conservation Service (NRCS), the University of Wisconsin–Madison’s Spatial Analysis for Conservation and Sustainability (SILVIS Lab), the Homeland Infrastructure Foundation-Level Data (HIFLD) database, and Holian and Kahn’s (2015) database of central business district coordinates.