Abstract
When nonresidents purchase agricultural properties, they may operate the farmland below its potential while still qualifying for tax credits. We empirically investigate how nonresident ownership affects agricultural land use decisions in upstate New York. A difference-in-difference matching approach shows a causal link between purchases by nonresidents and a loss of 11% of acreage to a lower-productivity use. A generalization shows that this conversion accounts for one-seventh of the decrease in agricultural land in intensive uses in similar counties. Possibly a simple opportunistic use of the tax-credit criteria, this phenomenon runs counter to the policy’s objective and might impose other consequences on rural communities.
1. Introduction
Farmland protection has long been an important issue in the United States. Preferential property tax policies are among the central tools for protecting farmland: they lower farmers’ land-carrying costs and slow farmland conversion, especially in suburban areas where farmland faces considerable development pressure. In the rural areas adjacent to the suburbs (far exurban areas) that lie beyond commuting distance of the urban core, however, development pressure is low. Still, nonresidents’ pursuit of amenities can shift agricultural operations toward land uses with higher recreational value. More important, these nonresident owners can exploit the same preferential tax policies. Nonresident farmland ownership can therefore result in land use changes that target amenity preservation and tax minimization instead of agricultural profit, reducing the farmland in production and affecting the rural job market, the fiscal soundness of local government, and even the supply of land for local food production. This article is the first to empirically examine the impact of nonresident ownership on agricultural land use.
Recent studies suggest that rural–urban migration continues apace, but rural areas offering natural amenities have also increasingly experienced housing pressure (Castle, Wu, and Weber 2011; Lichter and Brown 2014). One result of the latter trend is that farmland is increasingly held by nonresident owners.1 Such nonresident ownership is most likely in urban-rural adjacent areas that are close to an affluent urban area and within driving distance but too distant for a daily commute. We define these urban-rural adjacent areas around a major city as the “second-home shed.” A second-home shed forms because wealth and population are concentrated in the urban area, but so are certain disamenities (e.g., noise, congestion). Many urban residents therefore choose to spend their weekends and short vacations outside urban areas. Leveraging this trend, the vacation rental industry (e.g., Airbnb) has surged in these areas, giving rural properties reliable investment value and further emphasizing amenity value in land use decisions.2 Combined with the preferential tax policies, these factors make purchasing homes in the second-home shed a feasible and increasingly popular option for metropolitan dwellers.
In the second-home shed, preferential tax policies may promote lower-productivity agricultural uses by offering tax credits to operations that do little agriculture in practice. In our study area, upstate New York, nonresident owners have multiple ways to qualify for an agricultural exemption. For instance, they can rent to a local agricultural enterprise that already qualifies for the agricultural tax credit. This lessee farmer is required to file paperwork stating that the land is an active part of their operation (being farmed) but is not required to earn income from the leased land.3 As such, the incentive structures in place encourage rental but not full agricultural use of the land. Suppose a farm is purchased as a recreational home and the land is leased to a neighboring farmer; the nonresident owner gains the rural amenity as well as a significant property tax credit. The lease rate can be close to zero or even zero. The lessee farmer does not necessarily operate the land to its full agricultural potential, often simply baling low-quality bedding hay once a season.4 This can happen for many reasons, including capacity limitations, the low cost of the land, or even the wishes of a nonresident owner who wants to preserve the pastoral views provided by a mostly dormant field. Thus, land on these leased nonresident-owned farms tends to be underutilized relative to its full agricultural potential (State Board of Equalization and Assessment 1991), to say nothing of unleased parcels. We focus on hay as the tax-minimizing land use and treat prime farmland cut for hay as a field mismatched in quality and use.5
The literature on farmland policies and rural-urban relationships has predominantly focused on urban sprawl and farmland conversion (Brueckner and Kim 2003; Livanis et al. 2006; Irwin and Bockstael 2007; Liu and Lynch 2011). As suggested by Polyakov and Zhang (2008), preferential tax policies not only slow down development but also significantly affect rural land use change between cropland and forest. However, they did not show evidence that the current tax policy might have a downside. Moreover, few have formally studied nonresident ownership and land use changes in farmland owing to data limitations and the thinness of the market. This article contributes to the literature as the first study using annual land cover data sets and a causal inference framework to investigate the impact of nonresident ownership on farmland use in rural counties experiencing little direct urban influence. By addressing the downside of existing property tax policies in the emerging rural-urban relationship, this article adds to the debate around the rural-urban interdependence framework presented by Castle, Wu, and Weber (2011) and Wu, Weber, and Partridge (2017).
In this article, we identify the land use change in farmland that results from purchase by a nonresident owner. We focus on an area of upstate New York where nonresident agricultural ownership exerts pressure on farmland. The major identification problem is that recreational homes are not randomly assigned, since second-home transactions result from an endogenous process. In fact, many of the amenities that make good agricultural land are also those sought by second-home buyers. Moreover, finding a valid instrument that affects the purchase decision but not the subsequent land use decision is almost impossible. Thus, we rely on a rich data set of observable attributes and a difference-in-difference (DID) matching approach (Heckman, Ichimura, and Todd 1997; Smith and Todd 2005) to deal with the endogeneity problem as well as measurement errors.
While agriculture in the Northeast has struggled for years (even decades), the surge of interest in local foods has made farming viable again for farms close to urban centers. In this market, our results suggest that nonresident owners convert land from other agricultural cropping uses to hay at a much higher rate than existing residents, many of whom are farmers. Specifically, if an agricultural parcel in our primary study period (2001–2010) and area (Columbia County, New York) is sold by a resident owner to a nonresident owner, the agricultural land cover proportion decreases by at least 0.11, and the hay proportion increases by at least 0.11. The assumptions for the DID-matching estimates are satisfied, and we find similar results in a neighboring county with similar geolocational amenities (Sullivan County). Moreover, a rough estimate of the total impact across similar counties with strong second-home markets suggests that underutilized farmland bought by nonresident owners accounts for more than one-seventh of the decline (from 2002 to 2012) in total acreage of farmland in intensive uses in the agricultural second-home counties. To be clear from the outset, it is undoubtedly within landowners’ choice set to engage in land uses that maximize their utility but not necessarily their profit. However, the use-value assessment in agricultural tax policies treats soil type and productivity as major components and intends to encourage agricultural use commensurate with land quality. When a landowner benefits from the tax policy but engages in a lower-productivity use than the land can support, part of the county’s fiscal burden shifts onto other residents. Because these urban second-home owners are wealthier than other county residents, this shift exacerbates existing rural-urban financial inequalities.
2. Conceptual Framework
Housing markets function in a way that the highest bidder generally wins the right to purchase, and the property market in the second-home shed functions properly in this sense. In this section, we begin with a brief conceptual framework to explain why the nonresident owners would underuse their farmland. In our context, underutilization is the deviation from for-profit farmers’ optimal land use decisions based on historical land use decisions.
All farmland owners (farmers or nonresident owners) choose a division of their land to maximize their utility,6 which consists of two parts: the value of amenities (V(·)) and the outside good (x). This maximization problem is constrained by a budget balance condition and the total acreage of the disposable land (li). If we simplify the model so that the agricultural land division is between unmanaged hay (lHi) and agricultural uses excluding unmanaged hay (lAi), this maximization problem can be formulated as
1
In equation [1], and in the rest of this article, i and t denote parcel i and period t, Yi denotes income from outside the farm, f(·) represents the production function, and Si denotes parcel-specific attributes other than acreage. pA, pH, and px represent the prices of agricultural products, hay products, and the outside good, respectively. CA(Si) and CH(Si) are the total costs per acre of agricultural production and hay production. CH(Si) < CA(Si), because hay is a minimally invasive land use that requires no fencing, no irrigation, and minimal land preparation. Prices vary across time but are identical for all farms in our study area, whereas costs (C), linear in acreage, depend on parcel-specific attributes.7 Variables tA and tH are tax credits per acre of agricultural production and hay production. Di = 1 if the parcel meets the minimum requirement of agricultural assessment, and Di = 0 otherwise. The following equation can be derived from the first-order conditions:
2
where $U_1 = \partial U / \partial x$ and $U_2 = \partial U / \partial V$. Three features may differentiate farmers from nonresident owners and lead to different solutions. First, farmers and nonresident owners may have different marginal utilities ($U_1^R \neq U_1^F$ and $U_2^R \neq U_2^F$, where superscripts R and F denote recreational owner and farmer, respectively). Second, farmers almost always meet the criteria of agricultural assessment ($D_i^F = 1$), while nonresident owners do not necessarily meet them. Third, nonresident owners’ production function might differ from that of farmers.8 Appendix A derives solutions for special cases and shows that the optimal land division of farmers differs from that of nonresidents in each case. However, it is not theoretically clear that nonresidents’ optimal hay usage always exceeds that of farmers ($l_{Hi}^{R*} > l_{Hi}^{F*}$) in normal cases. In a special case where farmers derive utility only from production and nonresidents’ production function is flat, it is likely that $l_{Hi}^{R*} > l_{Hi}^{F*}$.
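To make the first-order condition concrete, the block below writes one plausible version of problem [1] and the condition it implies. This is a sketch under stated assumptions, not the authors’ exact specification: it assumes V depends on the two land allocations and Si, uses a single production function f for both uses, and drops the period subscript t.

```latex
% ASSUMED form of problem [1] (illustrative; period subscript t dropped).
\max_{x,\, l_{Ai},\, l_{Hi}} \; U\bigl(x,\; V(l_{Ai}, l_{Hi}, S_i)\bigr)
\quad \text{s.t.} \quad
\begin{aligned}[t]
  p_x x &= Y_i + p_A f(l_{Ai}, S_i) + p_H f(l_{Hi}, S_i)
          - C_A(S_i)\, l_{Ai} - C_H(S_i)\, l_{Hi}
          + D_i \bigl(t_A l_{Ai} + t_H l_{Hi}\bigr), \\
  l_i   &= l_{Ai} + l_{Hi}.
\end{aligned}

% Substituting l_{Ai} = l_i - l_{Hi} and x from the budget constraint, the
% first-order condition for l_{Hi} rearranges to (f' = derivative in acreage):
\frac{U_2}{U_1}
\Bigl(\frac{\partial V}{\partial l_{Hi}} - \frac{\partial V}{\partial l_{Ai}}\Bigr)
= \frac{1}{p_x}\Bigl[p_A f'(l_{Ai}, S_i) - p_H f'(l_{Hi}, S_i)
  - \bigl(C_A(S_i) - C_H(S_i)\bigr) + D_i\,(t_A - t_H)\Bigr]
```

Under this assumed form, the left side is the amenity gain from moving one acre into hay, valued in units of the outside good, and the right side is the net farm income and tax credit forgone. The three differences listed above enter through $U_2/U_1$, $D_i$, and $f$, respectively.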
Nonresident-owned agricultural properties will be underutilized in the sense that their land use management deviates from resident farmers’ optimal, profit-maximizing management. This deviation results from a combination of different valuations of aesthetic amenities and different farming capabilities. The empirical question is whether the observed hay proportion of nonresident-owned farms is larger than it would be under resident owners, and by how much. Relying on these theoretical predictions, we choose land cover proportions across time as the outcome, which has two advantages. First, the land cover data are derived from data sets combining a large-scale survey and satellite images, which are less influenced by attrition bias and more reliable than survey data. Second, hay, identified in the literature as the most favorable land cover for nonresident owners (e.g., State Board of Equalization and Assessment 1991), is a distinguishable land cover in available data sets. A finding of hay intensification would suggest that nonresident owners tend to underutilize their farmland.
3. Data
Before delving into the data, we briefly introduce the empirical analysis to provide context. We focus on an area of upstate New York where agricultural nonresident ownership exerts pressure. After first identifying nonresident owners, we investigate whether and by how much land use differs between nonresident owners and regular farmers. Using a rich set of observational data, we apply a DID-matching approach with regression adjustment, which addresses selection problems arising from time-invariant unobservables (i.e., second-home selection bias). The primary outcome of this study is agricultural land use. We conduct several tests to show that our sample satisfies the assumptions of DID matching and to check the robustness of the estimated effects. Since the methodology largely focuses on solving problems in the available data, we introduce it in detail later. Data for this study can be divided into three general types: geographic data, tax parcel data, and county-level information.
Geographic Data
Geographic data contain spatial information about land cover, soil, temperature, precipitation, elevation, towns, populated areas, roads, conservation areas, and agricultural districts. The land cover data are from the National Land Cover Database (NLCD) produced by the U.S. Geological Survey (USGS). Specifically, we used NLCD 2001, NLCD 2006, and NLCD 2011,9 which cover New York State. To match the time frame of the tax parcel data (ending 2014), we retrieved additional land cover data (Cropland Data Layer [CDL]) from the National Agricultural Statistics Service,10 including land cover information for 2002 and from 2008 to 2014. The soil data were downloaded from the Soil Survey Geographic Database (SSURGO).11 The agricultural district data were acquired from the Cornell University Geospatial Information Repository.12 All the other data sets are from the Geospatial Data Gateway of the Natural Resources Conservation Service.13
Tax Parcel Data
These data sets contain cadastral data and transaction data. Cadastral data are from New York’s GIS clearinghouse,14 including tax parcel point files with detailed property information from 2004 to 2010 and a tax parcel polygon file with detailed information for 2014 or 2015 (varies across counties). The transaction data come mainly from the Economic Research Service at the USDA and contain property transactions in New York from 1999 to 2008. With these transaction data, we can get detailed information about the landowners and the second-home transactions. The temporal variations allow us to extend the ownership status (resident or nonresident) back to 2001.
County Information Data
These data consist of commuting data, population and income data, American Community Survey data, County Business Patterns data, and data on farm operations with direct sales. Commuting information comes from the U.S. Census Bureau’s data set “Journey to Work,”15 which includes estimates based on large-scale surveys of the number of workers commuting from county to county. The commuting information is used to build a variable indicating nonresident ownership, as described later. Other data sets are mainly from the U.S. Census Bureau, and these national-level data are used to estimate the total impact.
Outcome
The available data allow a feasible quasi-experimental approach to show how the land use of an agricultural property might differ between a nonresident owner and a farmer. Here, we discuss the specification of the outcome variable. Following the conceptual framework, the outcome of interest is naturally the land use decision, whose corresponding measurements in the geographic data sets are the land cover proportions of agricultural parcels. Examples of how the land cover layers interact with parcel boundaries are shown in Figure 1. We classify land cover into six types: agricultural land (used for agriculture other than hay, including land in crops, vegetables, fruits, tree nuts, and pasture), hay, forest, wetland, vacant land, and developed land. We calculate the proportion of each land cover type on each parcel across time and use these proportions as the outcome variables. Furthermore, we define mismatched hay as hay on high-quality farmland that has alternative higher-profit options.16
Figure 1. Example of Land Cover and Parcel Layers
Note: Cropland Data Layer maps are not shown as examples, but they have the same resolution as the National Land Cover Database data.
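The per-parcel outcome described above is a simple share computation. The sketch below assumes each raster cell has already been reclassified into one of the six land cover types and assigned to a parcel (for example, by whether the cell center falls inside the parcel polygon); that preprocessing rule and the example labels are hypothetical, not the study’s documented procedure.

```python
from collections import Counter

# The six land cover groups used as outcomes. Cells are ASSUMED to be
# reclassified into these labels before this step (hypothetical preprocessing).
LAND_COVER_TYPES = ("agricultural", "hay", "forest", "wetland", "vacant", "developed")


def land_cover_proportions(cell_labels):
    """Return the share of a parcel's raster cells in each land cover type.

    cell_labels: labels of the raster cells assigned to one parcel in one year
    (the cell-to-parcel assignment rule is an assumption of this sketch).
    """
    counts = Counter(cell_labels)
    unknown = set(counts) - set(LAND_COVER_TYPES)
    if unknown:
        raise ValueError(f"unrecognized land cover labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("parcel contains no raster cells")
    # Counter returns 0 for labels that never occur, so absent types get share 0.
    return {cover: counts[cover] / total for cover in LAND_COVER_TYPES}


# Illustrative 10-cell parcel (made-up labels, not study data).
parcel_cells = ["agricultural"] * 6 + ["hay"] * 3 + ["forest"]
shares = land_cover_proportions(parcel_cells)
print(shares["agricultural"], shares["hay"])  # 0.6 0.3
```

Repeating this for the same parcel in each available year yields the panel of outcome variables.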
Covariates
The set of covariates naturally consists of the independent variables that influence agricultural land use decisions. A subset of these observables (e.g., the variables determining a parcel’s rural amenity) is very likely to be correlated with the property’s second-home status, and omitting them from the analysis would cause selection bias. By combining geographic data sets and tax parcel data sets, we obtain a comprehensive set of parcel characteristics, including parcel acreage, total assessed value, age of the house, house square footage, distance to the primary road, whether the parcel is divided by a road, streams or ponds on the parcel, distance to population centers, elevation, average minimum temperature, erosion percentage, excess water percentage, soil limit percentage (SSURGO classification), land capability, prime farmland percentage, and certified agricultural district status. Though not exhaustive, these covariates cover most of the attributes of a parcel relevant to land use and second-home transaction decisions.
4. Methodology
The main hypothesis we want to test requires narrowing the study location to areas popular for vacation properties and identifying nonresident owners in these areas. First, we define the second-home shed of a major city, New York City (NYC). Then we identify the agricultural counties in the second-home shed. Finally, we discuss how to identify the nonresident owners in these agricultural second-home counties.
Identifying the Study Area and Nonresident Owners
Conceptually, the second-home shed is an area close enough to a wealthy urban area for regular short trips but too far for a daily commute, where ample amenities exist and the rural land market experiences pressure from urban dwellers.17 NYC is a major metropolitan area that exerts a strong influence on surrounding rural areas; Figure 2 illustrates the location of its second-home shed. The approach underlying Figure 2 begins with a basic assumption: if the tax documents are sent to NYC,18 the landowner’s primary residence is in NYC. This approach identifies the NYC-dweller-owned properties in upstate New York. Figure 2 shows that a large share of the properties owned by NYC dwellers lies within the 200 km buffer but outside the 100 km buffer of NYC, where the density of NYC homeowners is even higher than in the 50–100 km range. Therefore, we assume that the second-home shed of NYC lies between its 100 km and 200 km buffers.
Figure 2. Second-Home Shed, New York City: (a) Distribution of New York City Homeowners in Upstate New York; (b) New York City Homeowners and Commuters
Source: Data are from New York State tax records and the Journey to Work survey.
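The distance-buffer logic can be sketched with great-circle distances. This is a minimal illustration that assumes a single NYC reference point (approximate lower Manhattan coordinates) and a hypothetical boolean `mailed_to_nyc` derived from the tax-bill address; the study itself works with GIS buffers, and its reference geometry is not specified here.

```python
import math

# ASSUMED reference point for NYC (approximate lower Manhattan coordinates).
NYC_LATLON = (40.7128, -74.0060)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_band(latlon, edges=(50, 100, 200)):
    """Label a parcel by the NYC distance band it falls in."""
    d = haversine_km(NYC_LATLON, latlon)
    lower = 0
    for upper in edges:
        if d < upper:
            return f"{lower}-{upper} km"
        lower = upper
    return f">={edges[-1]} km"


def nyc_owned_by_band(parcels):
    """Count NYC-mailed parcels per distance band.

    parcels: iterable of (latlon, mailed_to_nyc) pairs; mailed_to_nyc is a
    hypothetical flag for tax bills sent to an NYC address.
    """
    counts = {}
    for latlon, mailed_to_nyc in parcels:
        if mailed_to_nyc:
            band = distance_band(latlon)
            counts[band] = counts.get(band, 0) + 1
    return counts


# Approximate coordinates: Hudson, NY (Columbia County) and a Manhattan point.
print(distance_band((42.2529, -73.7910)))  # 100-200 km
print(distance_band((40.7812, -73.9665)))  # 0-50 km
```

Parcels labeled "100-200 km" with NYC-mailed tax bills correspond to the second-home shed as defined above.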
We then identify the agricultural second-home counties in this second-home shed. Not all the properties owned by urban dwellers in Figure 2a are recreational homes, since some of the owners may commute to NYC and have their tax bills delivered to their businesses. Figure 2b shows the number of workers commuting to NYC by county, along with the number of homeowners from NYC for comparison. The histogram suggests that Dutchess County and Putnam County, which lie partly within the 100 km radius of NYC, have many commuters to the city but relatively few homeowners from the city. Counties like Columbia and Sullivan have many more homeowners from the city than commuters to the city, indicating that a fair number of people own properties there but do not consider these properties their primary residences. Hence, Columbia, Ulster, Sullivan, Delaware, and Greene appear to be candidate second-home counties of NYC.19 Because our focus is on agricultural land use, we narrow the location of interest to agriculturally important counties. In New York, agricultural districts, in which law encourages the use of land for farming, have been established since 1971 and are certified regularly; we take them as an indicator of agricultural importance.20 Appendix Figure G1 shows the wide geographic dispersion of agricultural districts: Columbia, Sullivan, and Delaware have large acreage in agricultural districts, whereas Greene and Ulster are mostly occupied by the Catskill Mountains. Thus, nonresident ownership, preferential agricultural taxes, and high amenity values combine in Columbia, Sullivan, and Delaware Counties. We conduct our impact evaluation for Columbia County first and then apply it to Sullivan County to verify its robustness.21
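The two-step county screen can be expressed as a pair of filters. This sketch assumes a simple "more NYC homeowners than NYC commuters" rule and a hypothetical 0.3 minimum share of land in agricultural districts; the article screens agricultural importance by inspecting district maps, so both cutoffs and the example numbers are illustrative only.

```python
def screen_counties(counties, min_ag_district_share=0.3):
    """Flag candidate second-home counties, then the agricultural ones.

    counties: dict mapping county name -> dict with keys
      "nyc_homeowners"    parcels whose tax bills go to NYC,
      "nyc_commuters"     workers commuting to NYC (Journey to Work),
      "ag_district_share" share of county land in agricultural districts.
    Both rules below are ASSUMED simplifications of the screening in the text.
    """
    second_home = [name for name, c in counties.items()
                   if c["nyc_homeowners"] > c["nyc_commuters"]]
    agricultural = [name for name in second_home
                    if counties[name]["ag_district_share"] >= min_ag_district_share]
    return second_home, agricultural


# Made-up numbers for three anonymous counties (not study data).
example = {
    "County A": {"nyc_homeowners": 900, "nyc_commuters": 200, "ag_district_share": 0.5},
    "County B": {"nyc_homeowners": 300, "nyc_commuters": 5000, "ag_district_share": 0.4},
    "County C": {"nyc_homeowners": 800, "nyc_commuters": 100, "ag_district_share": 0.1},
}
print(screen_counties(example))  # (['County A', 'County C'], ['County A'])
```

County B mimics a commuter county, and County C mimics a mountainous second-home county with little agricultural district land.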
Finally, we examine individual properties to find the agricultural properties owned by nonresidents.22 This involves some complexity, since nonresident owners are not necessarily from NYC (see Appendix C for details). If the tax bill of a property is consecutively sent to a zip code that is not commutable, we treat the property as a second home (see the detailed second-home zone for Columbia in Appendix Figure G2). If the tax parcel data sets (since 1999) show that this property has recently been used for agricultural purposes, we classify it as a nonresident-owned agricultural property. Although this approach generates an indicator for nonresident-owned farmland, two issues with the approach and a question concerning land cover data quality may make our estimates lower bounds of the true effects.
First, with this approach, we may identify fewer nonresident owners than exist in reality. We exclude nearby nonresident owners, whose short commutes make it convenient for them to manage the farm, although this does not necessarily mean they actively operate it. This approach is conservative and includes only the most credible second-home owners. Because we omit some real nonresident owners or mistakenly assign them to the control group (farmers), our estimates would be a lower bound of the true effects of nonresident owners on land use.
Second, the assumption that noncommutable tax bills indicate agricultural recreational homes may fail when a large agricultural enterprise from a nearby urban area owns a large acreage of for-profit farmland. In this situation, the underlying assumption that nonresident owners are not optimizing agricultural profit may fail. In our data, we identify only a few extremely large parcels owned by nonresidents, and the empirical evidence does not suggest that this is a major concern (Table 3). Moreover, if this issue were pervasive, it would bias our estimates downward, because these profit-motivated nonresident owners would presumably operate like resident farmers.
Table 3. Difference-in-Difference Matching Results: Average Treatment Effects on the Treated (ATT) for Columbia County with Sample Limited by Acreage
Third, owing to limitations in the available land cover data, we combine data from the NLCD and the CDL. These land cover data are used to construct the outcome variable, and the estimation strategy can address random technical differences between these sources, as these differences are not systematically related to nonresident parcels. While economists apply these data sets in empirical studies (e.g., Hendricks, Smith, and Sumner 2014), questions have been raised about their quality (e.g., Irwin and Bockstael 2007; Wickham et al. 2013), although these concerns are context-specific. As a representative critique in economic research, Irwin and Bockstael (2007) suggest that the NLCD data are inaccurate for a target raster cell (30 × 30 m) if the surrounding cells are of different land cover types.23 This issue matters most when the data set is used to identify low-density development or very small landscape changes. This problem does not influence our results, since identifying low-density development usually involves detecting land cover of around 0.05 acres, about one-quarter of a cell, whereas the farms we study mostly span more than 10 cells.24 Even if this issue did exist (i.e., the owner turns a tiny portion of land from crops to hay and the change is undetectable because of data quality limits), it would cause the estimated effect to be a lower bound of the true effect.
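The parcel-level classification rule described before these three caveats (tax bill consecutively sent to a noncommutable zip code, plus recent agricultural use since 1999) can be sketched as follows. The record layout, the two-year minimum for "consecutively," and the illustrative zip codes are all assumptions; the study’s commutable zone is the one mapped in Appendix Figure G2.

```python
def longest_noncommutable_run(records, commutable_zips):
    """Longest run of consecutive years whose tax bill went to a noncommutable zip.

    records: (year, mailing_zip, in_ag_use) tuples for one parcel-owner spell,
    ASSUMED to contain at most one record per year.
    """
    best = run = 0
    prev_year = None
    for year, zip_code, _ in sorted(records):
        if zip_code in commutable_zips:
            run = 0                      # bill sent within commuting range
        elif run > 0 and year == prev_year + 1:
            run += 1                     # run continues into the next year
        else:
            run = 1                      # new run (first record or a year gap)
        best = max(best, run)
        prev_year = year
    return best


def is_nonresident_ag_property(records, commutable_zips, min_years=2, first_year=1999):
    """Hypothetical version of the nonresident-owned farmland rule."""
    consecutive = longest_noncommutable_run(records, commutable_zips) >= min_years
    ag_use = any(in_ag for year, _, in_ag in records if year >= first_year)
    return consecutive and ag_use


commutable = {"12534"}  # illustrative commutable zip set (hypothetical)
spell = [(2004, "10023", True), (2005, "10023", True), (2006, "10023", False)]
local = [(2004, "12534", True), (2005, "12534", True)]
print(is_nonresident_ag_property(spell, commutable))  # True
print(is_nonresident_ag_property(local, commutable))  # False
```

Requiring a run of consecutive years, rather than a single mailing address, guards against one-off billing changes; the minimum run length is a tunable assumption.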
Identification Problems
The theory presented in Section 2 suggests that underutilization of nonresident-owned farmland arises naturally from nonresidents’ utility-maximizing decisions, given their looser financial constraints and orientation toward recreational amenities. Our empirical work aims to quantify the agricultural land use changes caused by this nonresident farmland ownership. The identification issues fall into two types: selection bias due to the nonrandom assignment of nonresident ownership and measurement errors due to data limitations.
Nonresident ownership is not randomly assigned to farmland: nonresident buyers choose which property to buy, and farmers choose whether to sell. In practice, agricultural parcels with certain attributes are more likely to be owned by nonresidents, and these attributes are also correlated with the land use outcome. Methods ignoring this type of selection tend to yield inconsistent estimates. Given that no persuasive instrumental variables are available (exogenous attributes influencing second-home transactions but not land use decisions), matching estimators are a good choice, especially when the available variables can adequately describe both the selection process and the land use outcome (Blundell, Dearden, and Sianesi 2005). Owing to the relatively short time range of the data, we assume that if a parcel’s boundaries do not change throughout this period, its area, farmland quality, geography, precipitation, temperature, elevation, assessed values, and other attributes do not vary either.25 We use a propensity score matching estimator based on observable parcel attributes to deal with the selection-on-observables problem.
Besides the selection-on-observables problem, reverse causation and selection-on-unobservables are relevant issues in this context. For example, the nonresident buyers are very likely to make purchase decisions based on prior land use, and the land use variable is usually autocorrelated. If second-home buyers prefer farmland with more land in hay, the land cover proportion of hay is likely to be higher in the nonresident purchased parcels, even if nonresident owners behave the same way as a farmer might. In this situation, the causal relation cannot be fully recovered by simply implementing regression or matching estimator methods without considering selection on prior outcomes. Moreover, a similar problem happens when the nonresident buyers prefer some time-invariant unobservables, and these unobservables happen to be correlated with land use outcomes. The reverse causation could be considered as a special case of the selection-on-unobservable problem, and both are usually addressed in the literature with a DID approach when these unobservables are time-invariant.
Strategies to deal with the selection problems assume that parcels stay the same throughout the study period. But the tax assessment data show that some agricultural parcels split up or merge,26 which makes it hard to address the selection issue. We also consider this a major source of measurement error, since the acquired parcel boundaries cannot reflect these changes, and the resulting parcel attributes would be computed incorrectly. The analysis therefore includes only parcels with the same acreage and assessed values throughout the study period to avoid problems caused by changing parcels.
Another kind of measurement error might arise from the two groups of land cover data we use: one from the NLCD by the USGS and the other from the CDL by the USDA. Even though they use the same satellite data,27 the classification standards applied by the two agencies may differ over time, as suggested by Wickham et al. (2013). One major difference is that CDL 2002 denotes only alfalfa hay, so other hay cannot be distinguished from grassland,28 whereas later land cover data sets clearly distinguish alfalfa and other types of hay (CDL 2002 is therefore not used in the analysis). Moreover, some measurement accuracy issues might persist in all data sets and bias the estimates. The DID-matching approach can address these measurement errors, since time-invariant biases (persistent errors) are differenced out, while the technical differences between data sets are unlikely to depend on nonresident ownership (though there might be an attenuation effect because more noise is included).
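A stylized derivation shows why differencing removes persistent classification error. It assumes additive errors; the symbols $e_i$ (persistent parcel-level error), $u_{it}$ (transitory error), and $T_i$ (nonresident purchase indicator) are introduced here for illustration only.

```latex
% ASSUMED additive error structure: observed share = true share
% + persistent parcel error + transitory error.
\tilde{y}_{it} = y_{it} + e_i + u_{it}

% Differencing a pre-purchase year t_0 and a post-purchase year t_1
% removes the persistent error e_i:
\Delta\tilde{y}_i = \tilde{y}_{i t_1} - \tilde{y}_{i t_0}
                  = \Delta y_i + \bigl(u_{i t_1} - u_{i t_0}\bigr)

% Comparing treated (T_i = 1) and matched control (T_i = 0) parcels:
E[\Delta\tilde{y}_i \mid T_i = 1] - E[\Delta\tilde{y}_i \mid T_i = 0]
  = E[\Delta y_i \mid T_i = 1] - E[\Delta y_i \mid T_i = 0]
  + \underbrace{E[\Delta u_i \mid T_i = 1] - E[\Delta u_i \mid T_i = 0]}_{=\,0 \text{ if } u \text{ is unrelated to } T}
```

Under these assumptions, cross-source technical differences that are unrelated to ownership add noise to $\Delta u_i$ but do not shift the comparison.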
Identification Strategy
To avoid major measurement errors and deal with the selection problems, we first exclude parcels that experienced “major” divisions or mergers. Taking Columbia County as an example, we purge parcels with an unbalanced panel of ownership information or with vastly changing acreage on record.29 This process yields a balanced panel of 813 stable agricultural parcels.30 We assume that nonresident owners influence the split or merged parcels in a similar way as resident owners do. By avoiding these major identification problems, this approach yields a less biased estimator and still provides a lower-bound estimate of the population impact, as noted earlier.
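The stable-parcel screen can be sketched as a filter over a parcel-year panel. The data layout, the example years, and the relative tolerance parameters are assumptions; a tolerance of 0.0 reproduces the "same acreage and assessed values" rule, while the study’s exact threshold for "vastly changing acreage" is not stated in this section.

```python
def stable_parcels(panel, years, acre_tol=0.0, value_tol=0.0):
    """Return IDs of parcels with a balanced, stable record over `years`.

    panel: dict parcel_id -> dict year -> (acreage, assessed_value).
    A parcel is kept only if it has a record for every year and its acreage
    and assessed value vary by no more than the given relative tolerances
    (0.0 means exactly unchanged; any other threshold is an ASSUMPTION).
    """
    keep = []
    for pid, recs in panel.items():
        if any(year not in recs for year in years):
            continue  # unbalanced panel: a year is missing
        acres = [recs[y][0] for y in years]
        values = [recs[y][1] for y in years]
        if max(acres) - min(acres) > acre_tol * max(acres):
            continue  # acreage changed: likely split or merged
        if max(values) - min(values) > value_tol * max(values):
            continue  # assessed value changed
        keep.append(pid)
    return keep


years = (2006, 2008, 2010)
panel = {  # made-up parcels, not study data
    "A": {2006: (52.0, 180000), 2008: (52.0, 180000), 2010: (52.0, 180000)},
    "B": {2006: (40.0, 150000), 2010: (40.0, 150000)},                         # missing 2008
    "C": {2006: (120.0, 300000), 2008: (60.0, 160000), 2010: (60.0, 160000)},  # split
}
print(stable_parcels(panel, years))  # ['A']
```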
We first consider a fixed-effects model to solve the selection problems.31 However, the limitation in the available assessment data (2004–2010) and land cover information (data available for 2001, 2002, 2006, and 2008–2014) results in a final panel including complete information for only 2006, 2008, and 2010. Also, because the land cover does not usually change immediately, the changes in homeownership between 2006 and 2008 may not be revealed by the land cover in 2008 but would be in 2010. Thus, we end up with a DID approach that uses the outcome in a way that accommodates the time-lagged nature of land cover change and requires less data.
The main analysis combines the DID approach and a matching estimator to deal with the endogeneity problem as well as the remaining measurement errors. We take those parcels newly purchased (between 2001 and 2010) by nonresident owners as the treated group and parcels owned by residents, who are more likely to be actively farming throughout the period, as the control group. The average treatment effects on treated (ATT) are estimated based on a propensity score matching approach, in which the scores are generated through a logit model linking the treatment dummy (one if the farmland was purchased by a nonresident from a farmer in the study period, zero otherwise) to various parcel attributes. Intuitively, by this process, we compare the land use changes in the treated group in the study period with those in the control group, given that these paired parcels have nearly identical time-invariant attributes. This DID-matching estimator (Heckman, Ichimura, and Todd 1997) is the preferred estimator among many matching estimators (Smith and Todd 2005). It deals with the endogeneity of nonresident ownership in our context (selection on time-invariant observables and unobservables), and in practice, it performs better when the study integrates different data sets.
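A minimal sketch of the DID-matching idea: fit a logit propensity score, match each treated parcel to its nearest control on that score (with replacement), and average the differences in outcome changes (for example, hay share in 2010 minus 2006). The gradient-ascent logit, one-nearest-neighbor matching, and the synthetic data are assumptions chosen for brevity; the refinements discussed in the following paragraphs are what the study actually prefers.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, d, lr=0.1, iters=5000):
    """Logit coefficients [intercept, b1, ..., bk] by plain gradient ascent.

    X: list of covariate vectors (no intercept column); d: 0/1 treatment dummies.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            resid = di - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))


def did_nn_att(X, d, dy):
    """ATT from one-nearest-neighbor matching (with replacement) on the score.

    dy: each parcel's change in the outcome between the pre and post years.
    """
    beta = fit_logit(X, d)
    scores = [propensity(beta, xi) for xi in X]
    controls = [i for i, di in enumerate(d) if di == 0]
    effects = []
    for i, di in enumerate(d):
        if di == 1:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            effects.append(dy[i] - dy[j])
    return sum(effects) / len(effects)


# Synthetic data (not study data): the change depends on one covariate, and
# treatment adds 0.11. Treated parcels have higher covariate values.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [2.0], [3.0], [4.0]]
d = [0, 0, 0, 0, 0, 1, 1, 1]
dy = [0.1 * x[0] + 0.11 * t for x, t in zip(X, d)]
naive = sum(y for y, t in zip(dy, d) if t) / 3 - sum(y for y, t in zip(dy, d) if not t) / 5
print(round(naive, 2), round(did_nn_att(X, d, dy), 2))  # 0.21 0.11
```

The naive comparison of changes is biased because treated parcels differ in the covariate; matching on the score recovers the built-in effect of 0.11.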
The validity of the DID-matching estimator rests on three assumptions: the stable unit treatment value assumption (SUTVA), the conditional parallel trends assumption (unconfoundedness or ignorable assignment), and the common support (overlap) assumption. Appendix B and Appendix Figure G3 report the details of these assumptions in our context; here we discuss the common support assumption, since it usually raises the most concern in empirical work. The common support assumption essentially requires that for every nonresident parcel purchase (a treated observation), there exists a farmer-owned parcel (in the control group) with roughly the same observed attributes and hence the same propensity score. In empirical applications this assumption is often implemented loosely; for example, it is sometimes reduced to a requirement that the propensity scores of the different groups share the same domain (as in Dehejia and Wahba 1999, 2002).32 Based on carefully conducted simulations, Huber, Lechner, and Wunsch (2013) suggest that trimming observations with too much weight (i.e., with weak support) is important for all estimators and that radius matching (see Dehejia and Wahba 2002) combined with regression performs best overall.
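The loose, domain-based version of common support mentioned above can be illustrated in a few lines. This sketch, with hypothetical function names, keeps the propensity score interval where the treated and control ranges overlap and flags treated observations outside it; it is not the weight-based trimming rule the authors ultimately adopt.

```python
def common_support(ps_treated, ps_control):
    """Overlap of the two propensity score ranges (a loose min/max rule)."""
    lower = max(min(ps_treated), min(ps_control))
    upper = min(max(ps_treated), max(ps_control))
    return lower, upper


def treated_off_support(ps_treated, ps_control):
    """Indices of treated observations whose scores fall outside the overlap."""
    lower, upper = common_support(ps_treated, ps_control)
    return [i for i, p in enumerate(ps_treated) if p < lower or p > upper]
```

Because this rule only checks the range endpoints, a treated score inside the interval can still have very few nearby controls, which is why weight-based trimming is generally preferred.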
On top of the DID-matching strategy, we also use a regression-adjustment approach to address the leftover discrepancies in covariates. Average nearest neighbor matching estimators assign the same weight to each of the chosen number of nearest neighbors, regardless of their distance in propensity score from the treated observation; such estimators may be consistent in large enough samples under some circumstances (Abadie and Imbens 2006), but this is unlikely to hold here.33 Other pure matching estimators have also been shown to perform worse than estimators that combine matching with regression adjustment (Abadie and Imbens 2011). Moreover, in a relatively small sample, nonexact matching leads to a potential bias through the leftover discrepancies in the matched sample (Caliendo and Kopeinig 2008), and regression adjustment reduces this bias by accounting for the differences in covariates. Thus, we follow Rubin's (1973) recommendation to combine matching with regression adjustment, an approach used in many empirical studies (e.g., Heckman, Ichimura, and Todd 1998; Rubin and Thomas 2000; Lawley and Towe 2014).
Therefore, the DID-matching estimator identifies the ATT in the following form:
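$$\mathrm{ATT} = E\left[Y_{1,it'} - Y_{1,it} \mid Z_i, D_i = 1\right] - E\left[Y_{0,it'} - Y_{0,it} \mid Z_i, D_i = 0\right] \tag{3}$$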
where Y1,it and Y0,it denote the pretreatment outcomes of observation i at time t for treated and control observations, respectively, and Y1,it′ and Y0,it′ denote the corresponding posttreatment outcomes. Di is a dummy for treatment status, with Di = 1 denoting treated observations (nonresident-owned parcels), and Zi is the vector of time-invariant parcel attributes. The regression-adjustment estimators presented below further adjust this ATT for the differences in covariates between the matched pairs and can be written as
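$$\mathrm{ATT}_{RA} = \mathrm{ATT} - \left(E\left[Z_i \mid D_i = 1\right] - E\left[Z_i \mid D_i = 0\right]\right)'\beta_0 \tag{4}$$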
where β0 is the vector of coefficients obtained from a regression of the outcome on Zi, estimated only on the matched controls.
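Sample analogs of equations (3) and (4) can be sketched as follows. Each treated parcel's outcome change is compared with the average change among its matched controls; the regression-adjusted version first shifts each matched control j's outcome change by (Zi − Zj)′β0, with β0 from OLS on the matched controls. The input layout (arrays of outcome changes, covariate matrices, and a list of matched-control indices per treated parcel) and the function names are illustrative assumptions, and standard errors are not computed.

```python
import numpy as np


def did_matching_att(dy_t, dy_c, matches):
    """Sample analog of equation (3).

    dy_t: outcome changes (post minus pre) for treated parcels.
    dy_c: outcome changes for control parcels.
    matches: for each treated parcel, a list of matched control indices.
    """
    dy_c = np.asarray(dy_c, dtype=float)
    gaps = [dy_t[i] - dy_c[m].mean() for i, m in enumerate(matches)]
    return float(np.mean(gaps))


def regression_adjusted_att(dy_t, dy_c, Z_t, Z_c, matches):
    """Sample analog of equation (4): adjust each matched control's outcome
    change by (Z_i - Z_j)' beta0, with beta0 from OLS on the matched controls."""
    dy_c = np.asarray(dy_c, dtype=float)
    Z_t = np.asarray(Z_t, dtype=float)
    Z_c = np.asarray(Z_c, dtype=float)
    used = np.unique(np.concatenate([np.asarray(m, dtype=int) for m in matches]))
    design = np.column_stack([np.ones(len(used)), Z_c[used]])
    coef, *_ = np.linalg.lstsq(design, dy_c[used], rcond=None)
    beta0 = coef[1:]  # slopes only; the intercept cancels in the differences
    gaps = []
    for i, m in enumerate(matches):
        m = np.asarray(m, dtype=int)
        adjusted = dy_c[m] + (Z_t[i] - Z_c[m]) @ beta0  # controls moved to Z_i
        gaps.append(dy_t[i] - adjusted.mean())
    return float(np.mean(gaps))
```

If outcome changes were exactly linear in the covariates, the adjusted estimator would recover the true effect even when some matches are imperfect, whereas the unadjusted one would carry the covariate gap into the estimate.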
5. Results and Discussion
Second-home markets around NYC have become increasingly active in recent decades as urban property prices have risen to historical highs. Since 2001, nonresident ownership of farmland has increased by 22.7% in Columbia County and by 29.8% in Sullivan County, and it has remained stable in Ulster County, where rates have hovered around 65%, suggesting a saturated market. Appendix Table G1 shows these results.
The Impact of Nonresidents on Farmland
We apply our estimation strategies to Columbia County, where the second-home market is active and expanding. Guided by the conceptual framework, regressing land cover proportions on parcel attributes may yield insights into how management strategies differ between farmers and nonresidents. Preliminary regression models and details are shown in Appendix D. Separate OLS regressions for resident and nonresident owners show clearly that their land cover management responds quite differently to parcel attributes.
To obtain the proposed ATT, we proceed with the DID-matching approach and transform the panel data into a cross-sectional data set. We divide the 813 stable agricultural parcels into four groups: those transacted from residents to nonresidents during the study period, those transacted from nonresidents to residents during the same period, those owned by residents throughout the study period, and those owned by nonresidents throughout the study period.
In the main specification, the treatment group of interest consists of parcels transacted from residents to nonresidents during the study period (treatment group 1), and the control group consists of parcels owned by residents. This gives 49 agricultural parcels in treatment group 1 and 609 in the control group.34 Because the study period runs from 2002 to 2010,35 we construct the outcome for each parcel as the difference between its land use proportions in 2012 and 2001.36 Table 1 reports the summary statistics. The sample covariates are not balanced, suggesting that nonresidents do seek different parcels than the average property buyer does. Before 2002, parcels in the treatment group had a higher average proportion of land in agriculture and a lower proportion in hay.
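The grouping and outcome construction can be sketched as below. The sketch assumes, hypothetically, that each parcel's ownership is coded as a sequence of "R" (resident) or "N" (nonresident) flags over the study years and that land cover shares are stored by year and class; classifying by the first and last flags is a simplification that ignores multiple switches within the period.

```python
GROUPS = {
    ("R", "N"): "treatment_1",         # resident -> nonresident during the period
    ("N", "R"): "treatment_2",         # nonresident -> resident during the period
    ("R", "R"): "control",             # owned by residents throughout
    ("N", "N"): "always_nonresident",  # owned by nonresidents throughout
}


def ownership_group(statuses):
    """Classify a parcel from its yearly 'R'/'N' ownership flags.

    Only the first and last flags are used, so multiple switches within
    the period are ignored (a simplifying assumption).
    """
    return GROUPS[(statuses[0], statuses[-1])]


def land_use_change(shares, land_class, pre=2001, post=2012):
    """DID outcome: post-period minus pre-period share of a land cover class."""
    return shares[post][land_class] - shares[pre][land_class]
```

For a parcel whose agricultural share falls from 0.50 in 2001 to 0.35 in 2012, the outcome is a change of about −0.15.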
Summary Statistics for Columbia
A logit model generates the propensity scores (shown in Appendix Figure G4) used in the DID-matching estimator. Logit results and covariates are shown in Appendix Table G2. These covariates are time-invariant, influence both treatment and outcome, and are not affected by the anticipation of treatment. We exclude prior land use from the logit model because, in a DID-matching estimation, conditioning on pretreatment outcomes introduces bias and is equivalent to simple matching (Chabé-Ferret 2015).37
Following the idea of trimming observations with weak support (Huber, Lechner, and Wunsch 2013), we require at least six control observations within a propensity score radius of 0.01 of each treated observation. This process trims 3 of the 49 treated observations.38 Appendix Table G3 and Figure G5 show the balance tests and distributions before and after radius-matching resampling (n = 6). They indicate that this radius-matching approach generally ensures covariate overlap between the treatment and control groups, an improvement over the unbalanced raw sample.
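The trimming rule and the kind of balance measures that Stata's pstest reports can be sketched as below. The radius (0.01) and minimum count (six) follow the text; the standardized percentage bias uses the Rosenbaum and Rubin form (mean difference over the square root of the average of the two group variances); the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def trim_weak_support(ps_treated, ps_control, radius=0.01, min_controls=6):
    """Split treated indices into kept and trimmed.

    A treated observation is kept only if at least `min_controls` control
    scores lie within `radius` of its propensity score.
    """
    kept, trimmed = [], []
    for i, p in enumerate(ps_treated):
        n_close = sum(1 for q in ps_control if abs(q - p) <= radius)
        if n_close >= min_controls:
            kept.append(i)
        else:
            trimmed.append(i)
    return kept, trimmed


def standardized_bias(x_treated, x_control):
    """Standardized percentage bias of one covariate (Rosenbaum-Rubin form)."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return 100.0 * (mean(x_treated) - mean(x_control)) / pooled_sd


def variance_ratio(x_treated, x_control):
    """Ratio of treated to control sample variances; near 1 suggests balance."""
    return variance(x_treated) / variance(x_control)
```

Computing the two balance measures on the raw and on the matched samples, covariate by covariate, gives the before/after comparison described in the text.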
Table 2 shows the results of the OLS estimator (as a baseline), radius-matching estimators, and regression-adjusted matching estimators with different numbers of nearest neighbors restricted by a caliper of 0.01. With the full sample, the treatment (nonresident) effects estimated by OLS are −0.09 (significant at the 95% level) for the change in agricultural land cover proportion and 0.07 (significant at the 90% level) for hay. All matching estimators suggest a greater impact than the OLS estimator (negative, significant, and larger than 0.11 in absolute value for the change in agricultural use; positive, significant, and larger than 0.1 for the change in hay), which points to the difference made by mitigating selection bias. Compared with the regression-adjusted estimators, average nearest neighbor matching generally yields a smaller effect size and a larger standard error, owing to three issues: the leftover discrepancies in covariates, unevenly distributed control counterparts for treated observations near the boundaries (with scores very close to the maximum or minimum scores), and the failure of bootstrapped standard errors for matching estimators (Abadie and Imbens 2008). The regression-adjustment specifications address the first two sources of potential bias and use robust standard errors (Abadie and Imbens 2006); they yield significantly positive impacts of at least 0.11 on the change in hay, robust to the number of neighbors. Regression adjustment with n = 3 gives the largest ATTs (−0.13 for agriculture and 0.13 for hay, both significant at the 95% level), whereas regression adjustment with n = 6 gives the smallest (−0.12 for agriculture and 0.11 for hay, both significant at the 95% level). Propensity score tests suggest that the matching improves as the number of nearest neighbors rises.39 Thus, we take the results with n = 6 as the least biased ATTs.
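Nearest neighbor selection within a caliper can be sketched as follows. For each treated propensity score, the sketch returns the indices of up to n controls whose scores lie within the caliper, closest first. Matching with replacement (a control may serve several treated parcels) and tie-breaking by control index are our assumptions, and the function name is hypothetical.

```python
def nearest_within_caliper(ps_treated, ps_control, n=6, caliper=0.01):
    """For each treated score, indices of up to n closest controls within the caliper.

    Matching is with replacement; ties are broken by control index.
    An empty list means no control lies within the caliper.
    """
    matches = []
    for p in ps_treated:
        candidates = sorted(
            (abs(q - p), j) for j, q in enumerate(ps_control) if abs(q - p) <= caliper
        )
        matches.append([j for _, j in candidates[:n]])
    return matches
```

Raising n adds more distant (but still within-caliper) controls, which trades a little bias for lower variance; treated parcels with empty match lists would already have been removed by the trimming step.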
Compared with resident owners of identical farmland, nonresident owners in Columbia County on average bring a decrease of at least 0.12 in the proportion of farmland used for agriculture (excluding hay) and an increase of 0.11 in the proportion used for hay.
Difference-in-Difference Matching Results: Average Treatment Effects on Treated (ATT) for Columbia County
The thinness of the market we study and the identification issues limit the size of our treated group, which raises one concern: the estimated results might be driven by outliers. It is therefore important to show that the results are similar across subsamples. Because very large agricultural parcels carry higher opportunity costs and are thus less likely to be recreational second homes, we conduct further estimations restricted to parcels at or below an acreage cutoff. Table 3 shows the results for cutoffs of 70 and 90 acres. In general, the regression-adjustment estimators for the smaller parcels consistently produce larger ATTs for the change in agricultural use; for example, for parcels of 90 acres or less, the estimated ATT (with n = 3) is −0.15 for agriculture (significant at the 90% level) and 0.14 for hay (just short of significance at the 90% level). This is consistent with the theory that recreation-oriented landowners tend to decrease agricultural use while keeping the land in hay to retain the agricultural assessment.
Placebo Test and Alternative Hypothesis Test
To support the validity of the estimated average treatment effects, three conditions must hold: SUTVA, conditional parallel trends, and common support. Because SUTVA is not a major concern in our context and the caliper-matching process reinforces the common support condition, the conditional parallel trends condition is the only one left to validate. To test it, we would usually compare the treatment group's pretreatment land use trend with the control group's corresponding trend, conditional on a comprehensive set of covariates. The difficulty here is that we have only one year of land cover (2001) before the first relevant second-home transaction, which is not enough for a trend test. To deal with this, we take all parcels transacted from residents to nonresidents between 2006 and 2010 as a new treatment group and estimate placebo DID-matching ATTs using land cover changes before the actual treatment (2001–2006) as the outcome.
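The placebo design amounts to relabeling the data, which can be sketched as below under a hypothetical record layout. Parcels sold by residents to nonresidents in 2006–2010 form the placebo treatment group, resident-owned parcels form the controls, and the outcome is the 2001–2006 land cover change that precedes the actual treatment; the resulting outcome lists would then go through the same DID-matching steps. The function name and record keys are illustrative.

```python
def placebo_outcomes(parcels, land_class="agriculture", window=(2006, 2010),
                     pre=2001, placebo_post=2006):
    """Build placebo treated/control outcome changes.

    parcels: hypothetical records with keys 'group' (e.g., 'treatment_1',
    'control'), 'sale_year' (int or None), and 'shares' ({year: {class: share}}).
    Placebo treated: resident-to-nonresident sales inside `window`.
    Outcome: land cover change from `pre` to `placebo_post`, before treatment.
    """
    first, last = window
    treated, control = [], []
    for parcel in parcels:
        shares = parcel["shares"]
        change = shares[placebo_post][land_class] - shares[pre][land_class]
        sale = parcel.get("sale_year")
        if parcel["group"] == "treatment_1" and sale is not None and first <= sale <= last:
            treated.append(change)
        elif parcel["group"] == "control":
            control.append(change)
    return treated, control
```

A resident-to-nonresident sale that falls before the window (for example, in 2003) is left out of the placebo treatment group, since its 2001–2006 change would already contain treatment.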
We expect zero treatment effects from this test, since no treatment has yet occurred in the treatment group during the outcome window. Table 4 shows the placebo test results; radius matching trims four treated observations. Although OLS gives significant results in the direction opposite to the estimated effects, all adjusted matching estimators give small, insignificant trend differences. Across all matching specifications, there are no statistically significant differences in prior conditional trends between the treatment and control groups, and the insignificant opposite-direction differences further suggest that our estimated ATTs are conservative lower bounds. These results support the parallel trends condition and also make it unlikely that random pretreatment errors (arising from the small sample size) drive the estimated effects.
Placebo Test Results: Average Treatment Effects on the Treated (ATT) for Columbia
Having checked that our data set satisfies the three conditions, we are confident that the differences in land cover trends are caused by the second-home transactions. But are these effects driven by nonresident owners underusing farmland, or by the discontinuity in management associated with any transaction or ownership change? To rule out this alternative hypothesis, we apply the same DID-matching approach to another group of parcels (treatment group 2), which includes all parcels transacted from nonresidents to residents between 2002 and 2010. The results, shown in Appendix Table E1, reveal no significant effects for any estimator across all land cover types, so the increase in the proportion of farmland used for hay appears to be due primarily to nonresident ownership rather than simply to the change of ownership.
Based on our estimates and tests, if an agricultural parcel in Columbia County is sold to a nonresident owner instead of a resident, its change in agricultural land cover proportion over 2001–2012 will be at least 0.11 lower, and its change in hay proportion over 2001–2012 will be at least 0.11 higher. Moreover, nonresidents tend to produce hay on farmland with high capability and greater potential profit in other uses. Finally, these estimates reflect recent transactions and relatively quick changes in land use; they do not capture land use changes made by second-home owners before the study period began.
A Robustness Check: The Impact of Nonresidents on Sullivan
As discussed previously and illustrated in Appendix Table G1, Columbia and Sullivan Counties are similar in the share of farmland owned by nonresidents, whereas the agricultural parcels in Ulster resemble a saturated second-home market. Because there are few new nonresident owners in Ulster, we can apply the DID-matching approach only to Sullivan to gauge the robustness and generalizability of the estimated impacts for Columbia. Although Sullivan has a proportion of nonresident owners similar to Columbia's, it is not geographically close to Columbia: the Hudson River and the Catskill Mountains lie between them. Thus, similar results for Sullivan would suggest that the underlying behavior of nonresident owners is not county-specific and that the negative impact of nonresident ownership exists generally, at least in emerging second-home counties.
Appendix Table G4 shows the DID-matching estimates for Sullivan, and the results are generally similar to those for Columbia: the impacts on agricultural use and hay are sizable and significant. Because the sample in Sullivan is even smaller, we increase the matching radius to 0.15 and place more trust in the specifications with more nearest neighbors. Based on the adjusted matching estimates with n = 6, compared with regular residents, nonresident owners on average bring at least 0.09 less agricultural land use (significant at the 90% level), 0.09 more hay (significant at the 90% level), and 0.11 more mismatched hay (significant at the 95% level) to farmland in Sullivan County. As in Columbia, the Sullivan data satisfy the SUTVA, common support, and conditional parallel trends conditions (test results available on request).
In sum, the DID-matching approach yields very similar average treatment effects of nonresident owners on farmland in Columbia and Sullivan Counties, where agricultural second-home markets are emerging. Taking the land cover change on resident-owned parcels as the counterfactual outcome for parcels newly owned by nonresidents, we show that the differences in land cover change are caused by the nonresident owners' land cover management behavior, which is quite different from that of farmers. Specifically, during the study period (2002–2010), if an agricultural parcel is owned by a nonresident instead of a resident, the change in its agricultural proportion will be lower by about 0.1 on average, and its hay proportion will rise by about 0.1.
6. Discussion and Total Impact Estimation
We have identified the nonresident owners' impacts on agricultural land cover in two second-home counties of NYC, but one might wonder about the prevalence of this situation in the country at large and the potential magnitude of the aggregate impact. We assume that other major urban areas with considerable wealth and population are also sources of second-home owners, who generate land cover trends similar to those estimated here. To find urban-adjacent areas most like Columbia County, we use a distance measure incorporating Mahalanobis distances and a few structural assumptions, described fully in Appendix F. For each county, this procedure considers its distance to all surrounding metropolitan areas and uses a comprehensive set of metropolitan characteristics (Appendix Table F1). It generates a ranking list with the counties most similar to Columbia ranked first. Not surprisingly, the most similar counties lie along the Northeast corridor from Washington, DC, to Boston. Many of these counties are known second-home spots, and all of them fall under agricultural assessment tax provisions. To estimate the total impact of nonresident owners on agriculture in this region, we apply the estimated 0.1 share of underutilized farmland acreage on nonresident-owned parcels to the counties in the top 4% of the ranking list.
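A simplified sketch of the similarity ranking follows: it computes each county's Mahalanobis distance to a reference county from a matrix of county characteristics and orders counties from most to least similar. The procedure in Appendix F additionally incorporates distances to surrounding metropolitan areas and structural assumptions that are not reproduced here; the function name and inputs are hypothetical, and the pseudo-inverse is our choice to guard against a singular covariance matrix.

```python
import numpy as np


def rank_by_mahalanobis(X, ref_index):
    """Order counties by Mahalanobis distance to a reference county.

    X: (n_counties, k) array of county-level characteristics.
    ref_index: row of the reference county (e.g., Columbia).
    Returns (order, distances), most similar first.
    """
    X = np.asarray(X, dtype=float)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))   # pinv tolerates singularity
    diff = X - X[ref_index]
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances per county
    dist = np.sqrt(np.clip(d2, 0.0, None))
    return np.argsort(dist, kind="stable"), dist
```

The reference county always ranks first with distance zero; a cutoff such as the top 4% is then a slice of the returned order.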
We use the acreage of land in farms in each selected county from the USDA National Agricultural Statistics Service (USDA NASS) data as the measure of the total area of agricultural land, and the acreage of land in production (harvested cropland) as the comparison.40 Table 5 summarizes the rough estimates of total underutilized acreage under different scenarios. Assuming that the selected counties in the ranking list are in a situation similar to Columbia's (20% of agricultural parcels owned by nonresidents), the total underutilized acreage of land in farms adds up to at least 45,393 acres in 2012, about 5% of the total harvested cropland acreage in the same counties. If these counties have reached a saturated second-home market (65% nonresidents), the total acreage underutilized by nonresidents will be 147,527 acres, about 16% of the total harvested cropland acreage.
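The scenario arithmetic can be checked with a short sketch. It assumes, as the text does, that underutilized acreage is the 0.1 underuse share applied to nonresident-owned land in farms, so totals scale proportionally with the nonresident share; the function names are illustrative.

```python
UNDERUSE_RATE = 0.1  # estimated share of nonresident-owned farmland shifted to lower-productivity use


def underutilized_acres(land_in_farms, nonresident_share, underuse_rate=UNDERUSE_RATE):
    """Acres shifted to lower-productivity use under a given nonresident share."""
    return land_in_farms * nonresident_share * underuse_rate


def rescale_to_share(acres, from_share, to_share):
    """Rescale an underutilized-acreage total to a different nonresident share.

    Valid because the estimate is linear in the nonresident share.
    """
    return acres * to_share / from_share
```

For example, `rescale_to_share(45393, 0.20, 0.65)` returns about 147,527.25, which rounds to the 147,527 acres reported for the saturated scenario.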
Estimation of the Total Impacts of Nonresident Owners
More importantly, the decrease in harvested acreage is much larger than the decrease in land in farms, a discrepancy suggesting that the harvested proportion of land in farms is falling over time. Given the definition of harvested cropland, this type of loss reflects land use changes within farms. In Table 5, the estimated increase in acreage underutilized by nonresidents (11,348) is comparable to the farmland loss reported by USDA NASS, whether we compare it with the decrease in farmland in production (−70,211) or with the decrease in land in farms (−33,415). These facts illustrate the potentially important role of farmland underutilization in decreasing agricultural production; because underuse is defined here in just one way, within nonresident-owned farms, the estimate is most likely a lower bound. Moreover, this kind of underutilization implies not only agricultural production loss but also misallocation: the numbers shown here also indicate the total amount of suboptimal agricultural tax benefits that might be given out.
Though not the direct focus of this article, the underutilization of peri-urban farmland could have negative impacts from other perspectives. One example is its impact on the ever-popular "local" food supply (Darby et al. 2008; Hardesty 2008). Lands prime for local food production lie in the same distance belt as the lands in the second-home shed.41 Urban dwellers who buy recreational homes on farmland thus pose a threat to the production of local foods, even though these very landowners highly value such products (Thilmany, Bond, and Bond 2008). Moreover, underutilized agricultural land may have social spillover effects on rural residents, who may pay higher tax rates to offset the tax credits offered to nonresident owners.
7. Conclusion
Our analysis indicates that nonresident owners underutilize a considerable amount of farmland. In itself, this is a perfectly acceptable market outcome. However, under the incentives created by the flexible and nonbinding criteria for agricultural assessment, this underutilized land remains eligible for a tax credit. Once such a parcel is enrolled in the credit, the broader base of rural taxpayers effectively subsidizes its underutilization. Based on results from a regression-adjusted DID-matching approach, we find patterns in second-home counties of NYC suggesting that, compared with farmers, nonresident owners of farmland place a smaller proportion of their land in crops and a larger proportion in hay. These results should be taken as lower bounds of the true impact of nonresident owners on agricultural land use because our method of identifying nonresident owners is rather conservative and omits long-term nonresident owners, and some identified nonresident owners might be actual farming enterprises.
A great deal of attention has been paid to farmland conversion, but few studies have considered the problem at the intensive margin of agricultural parcels. Incentivized by agricultural tax credits that are easily attained, nonfarmers can operate farmland for primarily nonfarming objectives at a societal expense.42 This phenomenon, likely driven by nonbinding tax credit criteria, contradicts the policy's objective and might impose other consequences on rural communities. Admittedly, nonfarmer owners could provide certain social benefits by converting crop production to conservation land covers, and in practice, owners could be compensated for these social benefits through programs like the Conservation Reserve Program (CRP).43 However, in our study area and in similar counties, CRP is perceived to target specific lands with issues limiting agricultural production (e.g., erosion), and its enrollment of farmland is limited. This suggests that converting farmland to conservation covers is generally not perceived as socially optimal behavior, even though agricultural tax credits still incentivize it.
Future work could proceed in several directions. The first is to extend the impact evaluation to more detailed outcomes (e.g., farmers' perceptions and valuations of nearby nonresident-owned farmland, or land cover classified into more detailed types indicating productivity, such as high-quality and low-quality hay). A second direction is to extend the impact evaluation to other counties of interest. Finally, and more broadly, the effects on the supply of local food and on the farmland available to new and beginning farmers producing local products are important policy outcomes to examine.
Acknowledgments
We thank participants at the Agricultural and Applied Economics Association summer meetings (2017 and 2018) and seminar participants at the University of Connecticut for many helpful comments and suggestions as this work progressed through many stages and data improvements. This work was supported by the USDA, National Institute of Food and Agriculture, Hatch Project 1007175.
Footnotes
Appendix materials are freely available at http://le.uwpress.org and via the links in the electronic version of this article.
↵1 In this article, “nonresident owner” is generally a synonym for recreational or second-home owner. A similar concept is “non-operators” in Nickerson et al. (2012).
↵2 The COVID-19 pandemic has induced similar migrations from dense urban areas.
↵3 See §301-1, §301-2, and §301-4 (28) in New York Agricultural District law, available at http://www.agriculture.ny.gov/ap/agservices/25-AA.pdf. See also Bryant (1976).
↵4 They primarily use it for low-quality hay, according to the State Board of Equalization and Assessment (1991).
↵5 The data do not allow a full distinction between a plot that is actively managed for hay (with seeding and nutrient application) and a plot that is unmanaged and simply cut, often only once a season, to produce a low-quality feed product or bedding.
↵6 We choose a utility-maximizing framework instead of a profit-maximizing one, since the nonresidents are not necessarily maximizing profit.
↵7 That is, costs of inputs other than land vary across different farms. Specifically, farms with different conditions need different amounts of other inputs.
↵8 A likely situation is that nonresidents have relatively limited resource (i.e., time, equipment, or skills) to increase crop productivity.
↵9 Specifically, the 2011 edition of NLCD data. See https://www.mrlc.gov/data/legends/national-land-cover-database-2011-nlcd2011-legend.
↵11 The SSURGO database contains information about soil, which is collected by the National Cooperative Soil Survey; see http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/?cid=nrcs142p2_053627.
↵14 See http://gis.ny.gov.
↵15 See https://www.census.gov/topics/employment/commuting.html.
↵16 This analysis is based on yield data from SSURGO and price data from the USDA National Agricultural Statistics Service. Having alternative higher-profit options means the operators have high opportunity costs to plant hay on these farms.
↵17 Appendix Figure F1 illustrates the pressure imposed by major metropolitan areas on rural land markets.
↵18 Since tax bills are important, we assume that landowners do not have their tax bills sent to addresses other than their primary residences or offices (and that their offices are within commuting distance of their primary residences). These assumptions will be reconsidered to identify individual nonresident owners. This figure is based on the billing addresses for landowners' tax documents in 2010.
↵19 This assumption is confirmed in many articles documenting New York City residents' movements during the COVID-19 pandemic; see Tully and Stowe (2020).
↵20 In Jackson-Smith and Jensen (2009), agricultural importance was formally defined and calculated at county level. According to this definition, Columbia and Sullivan Counties are agriculturally important.
↵21 The analysis for Delaware County could not be conducted because of data limitation.
↵22 We confirmed that individual parcels use the agricultural tax credit from current county tax rolls.
↵23 Wickham et al. (2013) conducted an accuracy assessment of NLCD 2006 data and suggest that "the high overall agreement rate was driven by the large proportion of area of no change."
↵24 The fifth percentile of the sample acreage is about 2.2 acres, which means that 95% of the investigated farms include at least 10 cells.
↵25 At least, the changes in these attributes would not be big enough to influence the second-home transaction.
↵26 We can observe that some parcels with the same parcel ID have very different acreage across time.
↵27 The data are from the Landsat program, which is the longest-running enterprise for acquisition of satellite imagery of Earth.
↵28 Although the econometrics are supposed to deal with random errors, this issue is crucial enough to be a reason not to use CDL 2002 in our main analysis.
↵29 By excluding parcels with "vastly changing acreage," we mean that we keep only parcels whose absolute percentage change in acreage is smaller than or equal to 5%.
↵30 This panel for ownership information ranges from 2001 to 2010. Specifically, we have data for 2001, 2002, 2004, 2006, 2008, 2010, among which the 2001 and 2002 ownership information is imputed based on sales data.
↵31 With the sample of stable parcels, an ideal approach to solve our identification problems would be a fixed effects panel model based on prematched sample as in Alix-Garcia, Sims, and Yañez-Pagans (2015), where the prematching deals with the selection on observable problem by ensuring ample covariate overlap, and fixed effects mitigate the biases brought by time-invariant unobservables and measurement errors.
↵32 To address the problem of lacking overlap, Crump et al. (2009) developed a systematic approach and suggested a simple rule of discarding propensity scores outside (0.1, 0.9), but this strategy is not applicable in this study because it will delete too many treated observations.
↵33 In a relatively small sample, nonexact matching leads to a potential bias through the leftover discrepancies in the matched sample (Caliendo and Kopeinig 2008).
↵34 Other publications using similar sample sizes and similar methods include Lawley and Towe (2014) and Bertram-Huemmer and Kraehnert (2017).
↵35 The study period lasts from 2002 to 2010 for two reasons: it is consistent with the eight-year cycle of the agricultural district certification and would result in a reasonable average estimate over time; it is also consistent with the temporal range of available data, allowing us to get the land cover before the first possible transaction and still have a while after the latest transaction.
↵36 Using the proportions of 2012 allows two years’ tolerance for the land cover change to take place while not including too much confounding effect from later unobserved ownership changes.
↵37 We do not use simple matching that controls for the pretreatment outcome, since DID matching generally performs better according to previous studies (Smith and Todd 2005; Chabé-Ferret 2015). We also exclude variables that are not vital for the outcome and do not explain treatment at all, including average minimum temperature, average road distance to the nearest business districts, square footage of the house, and history of the house. Including them would increase the variance of the estimated propensity scores and could cause problems in a sample of limited size (Augurzky and Schmidt 2001).
↵38 This does not vary when we set the trimming criterion to three, four, or five neighbors in a radius of 0.01.
↵39 These tests are conducted with the pstest command in Stata, which calculates several measures (the t-test, the standardized percentage bias, and the variance ratio) of the balancing of the specified variables.
↵40 The USDA National Agricultural Statistics Service defines a farm as any place from which $1,000 or more of agricultural products are produced and sold, or normally will be sold, during the year (https://www.nass.usda.gov/Charts_and_Maps/Farms_and_Land_in_Farms/index.php). Land in farms consists of land used for all kinds of operation, consistent with the land cover types investigated in this article.
↵41 The local food ("locavore") movement defines local food as food produced within 161 km (100 miles).
↵42 The agricultural tax credits function as an incentive here for two reasons. First, the reason the properties are in our sample is that they were historically agricultural lands in or before the study period, which means they are qualified for and likely to receive agricultural tax credits. Second, switching to hay but not nonagricultural usage (e.g., natural grassland) suggests that reducing management effort and acquiring natural amenities are not the whole purpose of the land use switch; the precise target is to achieve acceptable levels of these two goals while maintaining low land holding costs. While this second point suggests nonresidents are making rational decisions balancing benefits and costs, the tax credits function essentially as the incentive.
↵43 CRP, which provided an average payment of $82 nationally (2020), is perceived to be a competing option in this context. Although New York State's CRP rate is typically lower than the national average, it is reasonable to say that CRP is an alternative program (at least in some cases) to agricultural assessment for lowering land holding costs. However, CRP is generally not the choice in our study region for a few reasons. (1) The studied properties are primarily agricultural land and do not include many CRP properties (except possibly some newly enrolled ones), and official figures confirm that CRP acres are limited in the investigated counties (in 2014, based on https://conservation.ewg.org/crp_regions.php?fips=36000&regionname=NewYork, Columbia 449 acres, Sullivan 41 acres). (2) If a farmer switches from agriculture to CRP, he needs to pay higher property tax, while the CRP payment may not (and usually does not) offset the difference. (3) CRP sets a relatively high bar for enrollment approval: it is perceived to target specific lands with erosion issues, mostly in the Midwest. So in practice, the preferential tax policy is the program of choice for nonresidents in our study region to lower farmland holding costs.