TY - JOUR
T1 - An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning
AU - Wilf, Peter
AU - Wing, Scott L.
AU - Meyer, Herbert W.
AU - Rose, Jacob A.
AU - Saha, Rohit
AU - Serre, Thomas
AU - Cúneo, N. Rubén
AU - Donovan, Michael P.
AU - Erwin, Diane M.
AU - Gandolfo, María A.
AU - González-Akre, Erika
AU - Herrera, Fabiany
AU - Hu, Shusheng
AU - Iglesias, Ari
AU - Johnson, Kirk R.
AU - Karim, Talia S.
AU - Zou, Xiaoyu
N1 - Funding Information:
Funding for this work came from NSF grants EAR-1925755, EAR-1925481, and EAR-1925552 (PW, TS, MAG, and others); DEB-1556666 and DEB-1556136 (PW, MAG, and others); and the National Park Service (HWM).
Acknowledgments:
Many researchers, staff, students, and volunteers contributed to the development of the collections aggregated here over many years. These include the investigators named in the manuscript, the collectors and field crews who made the primary collections around the world, and the collections staff, technicians, and volunteers at the numerous involved herbaria and fossil repositories. We further acknowledge the following, with apologies for any missing names. Assistance in the original assembly of the NCLC-W cleared-leaf collection, including performance of most specimen selection, registration, and leaf clearing and mounting: Sandy Wilson, Robyn Burnham, Russell O’Connell, H. Meyer, and others. Databasing and photography of NCLC-W at NMNH: Dane Miller, Ian Tom, Stephanie Bailey, and many volunteers at the Smithsonian Institution. Photography and curatorial support for NCLC-H at YPM: Whitney Barlow, Donna Beeson, Alyssa Cheung, Serra Vidinli Dedeoglu, Larry Gall, Michelle Garcia, Gabriela Gonzalez, William Guth, Zoe Kitchel, Linda Klise, Philip Kuchuk, Joanna Liu, Ivette Lopez, Steven Mordarski, Sally Palatto, Paul Pena, John Petrucelli, Ornella Rossi, Carl Russell, Jared Shayne, Harry Shyket, Robert Swerling, Cecilia Tenorio, Tim White. Sampling and imaging support for the Wing x-ray collection: Hazel Beehler, Keith Boi, Lynn Gillespie, Scott Krueger, David and Susan Rosen. Florissant fossil photography: Ashley Ferguson, Kelly Hattori, several other Geoscientist-in-the-Parks (GIP) interns, and Conni O’Connor. Additional fossil photography: Bárbara Cariglino, Cassi Knight, Lisa Merkhofer, Dana Royer. Cerrejón and Bogotá (SGC-ICP) support: Carlos Jaramillo and Mónica R. Carvalho. Additional database support: Sarah Allen, Sven Eberhardt, Ana Van Gulick, Jenny Kissell, Thao Nguyen, Edward Spagnuolo, Alysa Young. 
Museum curators and staff (other than the authors) who provided collections support and re-use permissions for images of fossils: Patricia Coorough Burke, Milwaukee Public Museum; Thomas Demere, San Diego Natural History Museum; Peta Hayes, Natural History Museum, London; Kathy Hollis and Jon Wingerath, NMNH; Ashley Klymiuk, Field Museum; Andrew Knoll and Michaela Schmull, Harvard University Herbaria; Kristen MacKenzie and Ian Miller, Denver Museum of Nature & Science; Steven Manchester and Hongshan Wang, University of Florida; Ruth O’Leary, American Museum of Natural History; Andrew Ross, National Museums Scotland. We thank Steven Manchester, Patrick Herendeen, Susanne Renner, one anonymous reviewer, and Editor Sandra Knapp for their helpful comments that improved the manuscript.
Publisher Copyright:
© 2021 Peter Wilf et al. All Rights Reserved.
PY - 2021
Y1 - 2021
N2 - Leaves are the most abundant and visible plant organ, both in the modern world and the fossil record. Identifying foliage to the correct plant family based on leaf architecture is a fundamental botanical skill that is also critical for isolated fossil leaves, which often, especially in the Cenozoic, represent extinct genera and species from extant families. Resources focused on leaf identification are remarkably scarce; however, the situation has improved due to the recent proliferation of digitized herbarium material, live-plant identification applications, and online collections of cleared and fossil leaf images. Nevertheless, the need remains for a specialized image dataset for comparative leaf architecture. We address this gap by assembling an open-access database of 30,252 images of vouchered leaf specimens vetted to family level, primarily of angiosperms, including 26,176 images of cleared and x-rayed leaves representing 354 families and 4,076 of fossil leaves from 48 families. The images maintain original resolution, have user-friendly filenames, and are vetted using APG and modern paleobotanical standards. The cleared and x-rayed leaves include the Jack A. Wolfe and Leo J. Hickey contributions to the National Cleared Leaf Collection and a collection of high-resolution scanned x-ray negatives, housed in the Division of Paleobotany, Department of Paleobiology, Smithsonian National Museum of Natural History, Washington D.C.; and the Daniel I. Axelrod Cleared Leaf Collection, housed at the University of California Museum of Paleontology, Berkeley. The fossil images include a sampling of Late Cretaceous to Eocene paleobotanical sites from the Western Hemisphere held at numerous institutions, especially from Florissant Fossil Beds National Monument (late Eocene, Colorado), as well as several other localities from the Late Cretaceous to Eocene of the Western USA and the early Paleogene of Colombia and southern Argentina. The dataset facilitates new research and education opportunities in paleobotany, comparative leaf architecture, systematics, and machine learning.
UR - http://www.scopus.com/inward/record.url?scp=85123891749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123891749&partnerID=8YFLogxK
U2 - 10.3897/PHYTOKEYS.187.72350
DO - 10.3897/PHYTOKEYS.187.72350
M3 - Article
AN - SCOPUS:85123891749
SN - 1314-2011
VL - 187
SP - 93
EP - 128
JO - PhytoKeys
JF - PhytoKeys
ER -