Using Auxiliary Population Samples for Sample-Selection Correction in Models Based on Crowd-Sourced Volunteered Geographic Information

Trudy Ann Cameron and Sonja H. Kolstoe

Tables

  • Table 1 Descriptive Statistics (Proportions) for Variables in First-Stage Engagement-Intensity Models
    | Variable | qBus Sample Proportions | eBird Sample Proportions |
    |---|---|---|
    | Engagement data available | 1.000 | 1.000 |
    | 1=Unfamiliar with eBird CS project | 0.802 | 0.000 |
    | 2=Heard of eBird but not a member | 0.083 | 0.000 |
    | 3=eBird member, but report rarely | 0.031 | 0.391 |
    | 4=eBird member, report <1/2 of birds | 0.029 | 0.280 |
    | 5=eBird member, report >1/2 of birds | 0.030 | 0.177 |
    | 6=eBird member, report almost all birds | 0.024 | 0.152 |
    | Travel 1+ mile data available | 0.442 | 0.769 |
    | Trips 1+ miles = 0 | 0.348 | 0.277 |
    | Trips 1+ miles = [1,4) | 0.063 | 0.113 |
    | Trips 1+ miles = [4,7) | 0.065 | 0.065 |
    | Trips 1+ miles = [7,10) | 0.048 | 0.025 |
    | Trips 1+ miles = [10,21) | 0.076 | 0.093 |
    | Trips 1+ miles = [21,41) | 0.065 | 0.065 |
    | Trips 1+ miles = [41,72) | 0.063 | 0.078 |
    | Trips 1+ miles = [72,124) | 0.065 | 0.067 |
    | Trips 1+ miles = [124,174) | 0.063 | 0.052 |
    | Trips 1+ miles = [174,238) | 0.063 | 0.032 |
    | Trips 1+ miles = [238,364) | 0.062 | 0.054 |
    | Trips 1+ miles = 365 | 0.017 | 0.078 |
    | Audubon CBC data available | 1.000 | 1.000 |
    | Has participated in CBC | 0.092 | 0.528 |
    | Bird-hunting data available | 1.000 | 1.000 |
    | Hunts birds | 0.224 | 0.073 |
    | Gender data available | 1.000 | 0.994 |
    | Gender: male | 0.489 | 0.427 |
    | Gender: female | 0.511 | 0.573 |
    | Age data available | 1.000 | 0.993 |
    | Age: ≤24 years | 0.125 | 0.018 |
    | Age: 25-34 years | 0.224 | 0.065 |
    | Age: 35-44 years | 0.196 | 0.089 |
    | Age: 45-54 years | 0.135 | 0.146 |
    | Age: 55-64 years | 0.175 | 0.311 |
    | Age: ≥65 years | 0.145 | 0.370 |
    | Income data available | 1.000 | 0.804 |
    | Income: <$25K | 0.179 | 0.072 |
    | Income: $25K-$50K | 0.219 | 0.203 |
    | Income: $50K-$75K | 0.189 | 0.231 |
    | Income: $75K-$100K | 0.141 | 0.173 |
    | Income: ≥$100K | 0.272 | 0.321 |
    | Region data available | 1.000 | 1.000 |
    | Region: West | 0.225 | 1.000 |
    | Region: Northeast | 0.186 | 0.000 |
    | Region: Midwest | 0.217 | 0.000 |
    | Region: South | 0.372 | 0.000 |
    | Employment status data available | 1.000 | 0.849 |
    | Employment status: full time | 0.473 | 0.359 |
    | Employment status: part time | 0.132 | 0.080 |
    | Employment status: looking for work | 0.057 | 0.008 |
    | Employment status: unemployed | 0.145 | 0.066 |
    | Employment status: retired | 0.193 | 0.487 |
    | Education data available | 1.000 | 0.976 |
    | Education: high school | 0.226 | 0.036 |
    | Education: some college | 0.356 | 0.158 |
    | Education: college grad | 0.263 | 0.288 |
    | Education: master’s degree | 0.118 | 0.396 |
    | Education: doctoral degree | 0.038 | 0.121 |
    | Observations | 4,161 | 1,081 |
    • Note: Availability indicators are proportions of the total sample; group shares are proportions of the available data.
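
The two kinds of proportions described in the note can be illustrated with a minimal sketch. The function name `availability_and_shares` and the sample responses are hypothetical and are not taken from the paper; the sketch only assumes that a missing answer is recorded as `None`.

```python
from collections import Counter


def availability_and_shares(values):
    """Return (availability, shares) for one categorical variable.

    `values` holds one entry per respondent; None marks a missing answer.
    Availability is a proportion of the total sample, while each group
    share is a proportion of the non-missing answers only.
    """
    available = [v for v in values if v is not None]
    availability = len(available) / len(values)
    if not available:
        return availability, {}
    counts = Counter(available)
    shares = {cat: n / len(available) for cat, n in counts.items()}
    return availability, shares


# Made-up responses for illustration only (not data from the paper).
income = ["<$25K", "≥$100K", None, "<$25K", "$25K-$50K", None, "≥$100K", "<$25K"]
availability, shares = availability_and_shares(income)
print(availability)  # share of all respondents who reported income
print(shares)        # category shares among those who reported income
```

Because the group shares are computed over available data only, they sum to one within each variable even when the availability indicator is well below one, as with the travel and income variables in Table 1.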

  • Table 2 Estimated Coefficients, Ordered-Probit Engagement-Level Models: Examples with Maximum Heterogeneity
    | Variable | Ordered Probit, qBus Data | Ordered Probit, eBird Data |
    |---|---|---|
    | Travel 1+ mile data available | 0.136 (0.661) | a |
    | Trips 1+ miles = 0 | −0.870 (0.667) | −2.828*** (0.228) |
    | Trips 1+ miles = [1,4) | −0.260 (0.678) | −3.113*** (0.268) |
    | Trips 1+ miles = [4,7) | −0.554 (0.683) | −1.862*** (0.287) |
    | Trips 1+ miles = [7,10) | −0.388 (0.681) | −2.243*** (0.372) |
    | Trips 1+ miles = [10,21) | −0.0566 (0.667) | −1.820*** (0.250) |
    | Trips 1+ miles = [21,41) | −0.0189 (0.668) | −1.496*** (0.262) |
    | Trips 1+ miles = [41,72) | 0.287 (0.665) | −1.354*** (0.254) |
    | Trips 1+ miles = [72,124) | 0.518 (0.663) | −0.711*** (0.258) |
    | Trips 1+ miles = [124,174) | 0.400 (0.663) | −0.644** (0.293) |
    | Trips 1+ miles = [174,238) | 0.487 (0.662) | −0.582* (0.321) |
    | Trips 1+ miles = [238,364) | 0.730 (0.661) | −0.422 (0.286) |
    | Trips 1+ miles = 365 | 0.699 (0.687) | b |
    | Has participated in CBC | 1.916*** (0.0706) | 0.170 (0.107) |
    | Hunts birds | 0.0640 (0.0965) | −0.0767 (0.181) |
    | Gender: female | −0.169*** (0.0509) | −0.111 (0.107) |
    | Relative to Omitted Category: 45-54 years | | |
    | Age: ≤24 years | 0.539*** (0.0991) | 0.994** (0.424) |
    | Age: 25-34 years | 0.545*** (0.0871) | 0.293 (0.216) |
    | Age: 35-44 years | 0.360*** (0.0894) | 0.325* (0.192) |
    | Age: 55-64 years | −0.207* (0.110) | −0.117 (0.159) |
    | Age: ≥65 years | −0.298** (0.134) | 0.146 (0.201) |
    | Relative to Omitted Category: $50K-$75K | | |
    | Income: <$25K | −0.0590 (0.0853) | −0.0751 (0.254) |
    | Income: $25K-$50K | −0.0192 (0.0774) | 0.143 (0.150) |
    | Income: $75K-$100K | −0.0423 (0.0867) | −0.0150 (0.158) |
    | Income: ≥$100K | −0.0118 (0.0784) | 0.143 (0.138) |
    | Relative to Omitted Category: West | | |
    | Region: Northeast | 0.164** (0.0737) | b |
    | Region: Midwest | −0.0186 (0.0754) | b |
    | Region: South | 0.0552 (0.0660) | b |
    | Relative to Omitted Category: Full Time | | |
    | Empl. status: part time | 0.0409 (0.0751) | −0.148 (0.188) |
    | Empl. status: looking for work | −0.173 (0.108) | −0.819 (0.650) |
    | Empl. status: unemployed | −0.109 (0.0805) | −0.0110 (0.217) |
    | Empl. status: retired | −0.148 (0.107) | −0.325** (0.163) |
    | Relative to Omitted Category: 4-Year College Degree | | |
    | Education: high school | −0.0294 (0.0754) | 0.693** (0.309) |
    | Education: some college | −0.128* (0.0671) | 0.00246 (0.170) |
    | Education: master’s degree | 0.269*** (0.0826) | 0.259** (0.121) |
    | Education: doctoral degree | 0.165 (0.124) | 0.00792 (0.165) |
    | Ordered-Probit Thresholds | | |
    | Cut1 | 1.279*** (0.113) | −2.651*** (0.288) |
    | Cut2 | 1.895*** (0.116) | −1.309*** (0.279) |
    | Cut3 | 2.258*** (0.119) | −0.363 (0.274) |
    | Cut4 | 2.690*** (0.123) | c |
    | Cut5 | 3.341*** (0.132) | c |
    | Observations | 4,161 | 572 |
    | Max. log likelihood | −2,390.32 | −582.44 |
    • Note: The first model is a six-level model using the full qBus sample. The second is a four-level model for the subset of 572 eBird survey respondents with complete data for this least-restricted specification. Specifications with fewer explanatory variables can use more observations in the eBird data set. Other models to accommodate all of the patterns of missing variables in the eBird data are in the Appendixes.

    • a Too few missing values to identify this coefficient.

    • b These indicators are zero for all observations in the eBird data.

    • c There are only four levels of engagement in the eBird data.

    • * p < 0.10;

    • ** p < 0.05;

    • *** p < 0.01.
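
As a rough illustration of how ordered-probit thresholds map a linear index into engagement-level probabilities, the sketch below uses the five qBus cutpoints reported in Table 2. It assumes the common parameterization with no separate constant, in which P(level = j) = Φ(cut_j − xβ) − Φ(cut_{j−1} − xβ); the paper's exact parameterization is not stated here, so this is an assumption. The function names and the index value `xb = 0.0` are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, cuts):
    """Category probabilities for an ordered probit.

    xb   : linear index x'beta for one respondent (hypothetical value here)
    cuts : increasing thresholds; len(cuts) + 1 categories result
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    cdf = [std_normal_cdf(b - xb) for b in bounds]
    # Each category's probability is the normal mass between adjacent cuts.
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]


# Cutpoints from the qBus column of Table 2 (six engagement levels).
qbus_cuts = [1.279, 1.895, 2.258, 2.690, 3.341]

# xb = 0.0 is an arbitrary illustrative index, not a respondent in the data.
probs = ordered_probit_probs(0.0, qbus_cuts)
for level, p in enumerate(probs, start=1):
    print(level, round(p, 4))
```

Raising `xb` shifts probability mass toward the higher engagement levels, which is how positive coefficients in Table 2 should be read under this parameterization.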

  • Table 3 Descriptive Statistics
    | Variable | Mean | Std. Dev. |
    |---|---|---|
    | Dependent Variable (Consideration-Set-Radius, Lower Bound of Chosen Interval) | | |
    | Self-reported maximum radius (i.e., extent) of travel in miles | 83.28 | 58.10 |
    | Explanatory Variable | | |
    | Employment status: employed | 0.373 | 0.484 |
    | Income data available | 0.804 | 0.397 |
    | Income in 10K, if reported | 7.011 | 5.891 |
    | Gender: female | 0.570 | 0.495 |
    | Age: <45 years | 0.171 | 0.377 |
    | Age: >64 years | 0.367 | 0.482 |
    | Education: graduate school | 0.505 | 0.500 |
    | No interest: perching birds | 0.057 | 0.233 |
    | No interest: other game birds | 0.111 | 0.314 |
    | Selection-Correction Options | | |
    | Binary-probit IMR | 2.035 | 1.212 |
    | Adjusted ordered probit IMR | 1.668 | 0.978 |
    | Engagement propensity (demeaned using qBus mean) | 0.657 | 1.261 |
    | Observations | 1,081 | |
    • Note: Variables are for the outcome model, elicited from the eBird member survey sample for n = 1,081 respondents who answered the question about maximum one-way distance for a birding day trip.

  • Table 4 Consideration-Set-Radius Models without and with Engagement-Intensity Weights and Either Sample-Selection Corrections or Interactions between All Regressors and Demeaned Ordered-Probit Selection Propensities
    | Variable | (1) Naive | (2) Weights Only | (3) Binary Probit IMR | (4) Ordered Probit IMR | (5) Demeaned Propensity |
    |---|---|---|---|---|---|
    | Main Variable | | | | | |
    | Empl. status: employed | −0.0351 (0.0696) | −0.0613 (0.0973) | −0.0872 (0.0943) | −0.0934 (0.0936) | −0.299* (0.154) |
    | Income data available | −0.429*** (0.127) | −0.362** (0.180) | −0.342* (0.176) | −0.354** (0.175) | −0.223 (0.186) |
    | ln(Income in 10K, if reported) | 0.218*** (0.0533) | 0.171** (0.0730) | 0.156** (0.0718) | 0.158** (0.0710) | 0.0628 (0.0887) |
    | Gender: female | −0.127** (0.0590) | −0.123 (0.0796) | −0.0374 (0.0787) | −0.0410 (0.0783) | 0.000944 (0.106) |
    | Age: <45 years | 0.185** (0.0835) | 0.225** (0.112) | 0.167 (0.106) | 0.137 (0.108) | −0.317 (0.197) |
    | Age: >64 years | −0.0489 (0.0713) | −0.114 (0.0982) | −0.0287 (0.100) | −0.103 (0.0961) | −0.0110 (0.115) |
    | Education: graduate school | 0.113* (0.0598) | 0.0132 (0.0751) | −0.0649 (0.0741) | −0.0642 (0.0748) | −0.259*** (0.0958) |
    | No interest: perching birds | −0.811*** (0.148) | −0.862*** (0.198) | −0.847*** (0.205) | −0.829*** (0.202) | −0.749*** (0.190) |
    | No interest: other game birds | −0.634*** (0.107) | −0.660*** (0.130) | −0.577*** (0.131) | −0.578*** (0.130) | −0.549*** (0.127) |
    | Constant | 4.066*** (0.0972) | 4.170*** (0.127) | 4.494*** (0.135) | 4.403*** (0.131) | 4.102*** (0.140) |
    | Selection-Correction Strategy | | | | | |
    | Binary-probit IMR | | | −0.169*** (0.0305) | | |
    | Ordered probit IMR | | | | −0.386*** (0.0689) | |
    | Interaction between Main Variables and Demeaned Engagement Propensity (Demeaned Using qBus Sample Mean) | | | | | |
    | Empl. status: employed × engagement prop. (demeaned) | | | | | 0.185 (0.158) |
    | Income data avail. × engagement prop. (demeaned) | | | | | −0.120 (0.0756) |
    | ln(Income in 10K, if reported) × engagement prop. (demeaned) | | | | | 0.134* (0.0692) |
    | Gender: female × engagement prop. (demeaned) | | | | | 0.224* (0.117) |
    | Age: <45 years × engagement prop. (demeaned) | | | | | −0.00451 (0.0733) |
    | Age: >64 years × engagement prop. (demeaned) | | | | | −0.0223 (0.0607) |
    | Education: graduate school × engagement prop. (demeaned) | | | | | 0.0176 (0.0551) |
    | No interest: perching birds × engagement prop. (demeaned) | | | | | −0.231 (0.175) |
    | No interest: other game birds × engagement prop. (demeaned) | | | | | 0.160 (0.107) |
    | Constant × engagement prop. (demeaned) | | | | | 0.282*** (0.0813) |
    | ln(σε) | −0.0803*** (0.0237) | −0.0583* (0.0306) | −0.0788** (0.0313) | −0.0808*** (0.0310) | −0.105*** (0.0310) |
    | Observations | 1,081 | 1,081 | 1,081 | 1,081 | 1,081 |
    | Log likelihood | −2,411.41 | −2,429.96 | −2,409.82 | −2,408.03 | −2,382.37 |
    | AIC | 4,844.83 | 4,881.92 | 4,843.64 | 4,840.05 | 4,806.74 |
    | BIC | 4,899.67 | 4,936.77 | 4,903.47 | 4,899.88 | 4,911.44 |
    | Weighted? | No | Yes | Yes | Yes | Yes |
    • Note: Standard errors are in parentheses. Dependent variable: log of maximum one-way distance willingly traveled on a typical birdwatching day trip.

    • * p < 0.10;

    • ** p < 0.05;

    • *** p < 0.01.
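
The AIC and BIC rows of Table 4 can be checked against the reported log likelihoods with the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. In the sketch below, the parameter counts `n_params` are an assumption obtained by counting the estimated coefficients listed for each column (ten main coefficients plus ln(σε), one more for an IMR term, ten more for the interactions); they are not stated in the source.

```python
import math

n_obs = 1081  # observations in every column of Table 4

# Log likelihoods as reported in Table 4, keyed by model number.
log_lik = {1: -2411.41, 2: -2429.96, 3: -2409.82, 4: -2408.03, 5: -2382.37}

# Assumed parameter counts, inferred by counting coefficients per column.
n_params = {1: 11, 2: 11, 3: 12, 4: 12, 5: 21}

# AIC and BIC as reported in Table 4, used only for comparison.
reported = {
    1: (4844.83, 4899.67),
    2: (4881.92, 4936.77),
    3: (4843.64, 4903.47),
    4: (4840.05, 4899.88),
    5: (4806.74, 4911.44),
}


def aic(ll, k):
    """Akaike information criterion."""
    return 2 * k - 2 * ll


def bic(ll, k, n):
    """Bayesian (Schwarz) information criterion."""
    return k * math.log(n) - 2 * ll


criteria = {m: (aic(log_lik[m], n_params[m]), bic(log_lik[m], n_params[m], n_obs))
            for m in log_lik}

for m, (a, b) in criteria.items():
    print(m, round(a, 2), round(b, 2), reported[m])
```

Under these assumed counts the recomputed values agree with the reported ones to within rounding of the log likelihoods, which suggests the counts are a reasonable reading of the table.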