Collector Reporting Bias
Eric Olson1
“Big Data” is quickly becoming a central theme of both concern and hope for archaeology’s digital future (see Huggett 2020; McCoy 2020; VanValkenburgh and Dufton 2020). Many archaeological reports, discoveries, and collections are being digitized (Olson 2017; Olson et al. 2021), giving researchers the opportunity to analyze data from the comfort of their home computers. Projectile points, specifically, are “superabundant” (Shott 2020:245) and diagnostic of broad segments of time (Pitblado and Shott 2015; Shott 2008), which makes them useful for the diachronic study of mobility (Mullet 2009; Nolan 2014; Seeman et al. 2020). As more artifacts are photographed, a growing number of analytical techniques can be applied to these data. Applications such as TPSdig (Rohlf 2015) and AGMT3-D (Herzlinger and Grosman 2018) have made geometric morphometric analysis, and the capture of linear measurements and angles from photographs, far more accessible.
The Central Ohio Archaeological Digitization Survey (COADS) demonstrates the analytical value of private collections and the power of these new data-capture techniques (Olson et al. 2021). The caveat, however, is the quality of the raw data. Context is everything, and it cannot be captured with an aimless snap of the camera shutter. Photographs need to be in color and at high resolution, and they need to include a scale and provenience data (usually in a separate spreadsheet or embedded in the file name). Then there is the issue of parallax, or the distortion of shape caused by the angle from which the photograph was taken.
COADS factors these issues into the project design and provides a rough “baseline” dataset of the representative distributions of projectile point types through time and space. Over 16,000 projectile points and lithic tools were documented as part of COADS, with the aim of obtaining samples of projectile points that are as representative of the region as possible. The same cannot be said of digitized collections that are captured by different researchers, self-reported by collectors, and photographed with different equipment under different conditions.
The following study compares the frequency distributions of projectile points from COADS (Olson et al. 2021), Seeman et al. (2020), and 1,282 points data-mined from various online digitized sources. All three of these datasets represent projectile points from private collections. They differ, however, in how the data were captured. The dataset compiled for this study was mined from online digitized archives (Ohio Memory, Ohio State University’s Knowledge Bank), auction websites (eBay, Rowlands Relics, Estatesales.net and .com), and a limited number of private collections photographed by the author. The compilation of data was a multi-year process, with the general aim of adding data to datasets such as COADS or complementing other projectile point datasets. What began as data mining slowly turned into a study of collector and market preferences, and of the biases that private collecting introduces into Ohio projectile point type distributions. Many professional archaeologists already have an anecdotal understanding of what gets bought and sold, and of what gets displayed, photographed, or otherwise circulated among collectors. This study, however, provides a quantitative breakdown of these biases.
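As a rough illustration of what comparing frequency distributions between two datasets can involve, the sketch below computes the share of each point type and a Pearson chi-square statistic for a two-row table of type counts. It is a minimal sketch only: the text above does not state which statistical test the study uses, and the type labels (`type_A`, `type_B`, `type_C`), the counts, and the function names are hypothetical placeholders, not values from COADS, Seeman et al. (2020), or the online dataset.

```python
from collections import Counter


def type_proportions(type_labels):
    """Return {type: share of total} for a list of point-type labels."""
    counts = Counter(type_labels)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    counts_a and counts_b map point-type labels to counts in two datasets.
    """
    types = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand_total = total_a + total_b
    stat = 0.0
    for t in types:
        column_total = counts_a.get(t, 0) + counts_b.get(t, 0)
        for observed, row_total in (
            (counts_a.get(t, 0), total_a),
            (counts_b.get(t, 0), total_b),
        ):
            # Expected count if type frequencies were the same in both datasets.
            expected = row_total * column_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, len(types) - 1


# Hypothetical counts for illustration only; not data from any cited source.
baseline = Counter({"type_A": 50, "type_B": 30, "type_C": 20})
online = Counter({"type_A": 20, "type_B": 30, "type_C": 50})

stat, dof = chi_square_statistic(baseline, online)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A large statistic relative to its degrees of freedom suggests that the two sources do not share the same type distribution, which is the kind of pattern a collector or market preference would be expected to produce.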