Information models for managing plot records and multiple taxonomies in vegetation databanks
PEET R.K.
Department of Biology CB#3280, University of North Carolina, Chapel Hill, NC 27599-3280, USA.
Email: peet@unc.edu.
Community ecology is on the brink of a dramatic transformation that will be made possible by the emergence of the new field of ecoinformatics. Just as the availability of massive amounts of gene sequence data gave rise to the new field of bioinformatics, which has dramatically altered molecular biology, so too can we expect community ecology to be reshaped by the forthcoming availability of massive amounts of information on species co-occurrences, site attributes, and species attributes. Our understanding of ecological communities is likely to increase greatly as large numbers of plot records become available. For ecoinformatics to realize its potential, public archives are needed for plot storage and preservation, for plot access and identification, and for documentation of plots in the literature and in databases. To maximize the availability and utility of such data, ecologists need to develop, and conform to, standard data structures and exchange formats.
I illustrate key elements of the future information infrastructure of vegetation science with examples from the ongoing North American VegBank project. Components include a national vegetation plots database to serve the data on which the US National Vegetation Classification will be based, a taxonomic database that contains both species concepts and names so as to resolve ambiguities inherent in contemporary plant nomenclature, and a vegetation classification database. As part of the VegBank project we have developed a general data model for co-occurrence records to facilitate archiving, recovery, and sharing of vegetation data. In addition, databases of organism and community taxa need to be linked. However, taxonomic standards for both organisms and communities vary with time, place, and investigator, such that taxa of organisms and communities frequently have multiple names, and the same names have frequently been applied to multiple taxon concepts. When we combine diverse data into a single database, we need to reconcile these different standards. The traditional solution of using standardized lists does not allow effective integration of datasets because (1) online lists are periodically updated and thus present a moving target, (2) one name can be used for multiple taxonomic concepts and one concept can be labeled with multiple names, and (3) different parties hold different perspectives on which names are acceptable and what meanings are associated with them. We have developed a general data model for semantic mediation of types of organisms and communities to complement our taxon co-occurrence data model.
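The distinction between names and concepts drawn above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VegBank schema: the classes `Concept`, `Perspective`, `Observation`, and `Plot`, the `cover` attribute, and all example taxa ("Aus bus" is a placeholder) and references are hypothetical. The design point it shows is that a plot observation refers to a concept (a name as used in a particular reference), while each party's preferred name for that concept is stored separately.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    """A taxon concept: a name as used in (sec.) one particular reference."""
    name: str
    sec: str

@dataclass
class Perspective:
    """One party's view: which name it applies to each concept it recognizes."""
    party: str
    names: dict[Concept, str] = field(default_factory=dict)

@dataclass
class Observation:
    """A co-occurrence record links a plot to a concept, not to a bare name."""
    concept: Concept
    cover: float  # percent cover; this attribute is illustrative only

@dataclass
class Plot:
    code: str
    observations: list[Observation] = field(default_factory=list)

def concepts_for_name(name: str, concepts: list[Concept]) -> list[Concept]:
    """One name can label several concepts, so a lookup by name may be ambiguous."""
    return [c for c in concepts if c.name == name]

def names_for_concept(concept: Concept, views: list[Perspective]) -> set[str]:
    """One concept can carry different names depending on the party."""
    return {v.names[concept] for v in views if concept in v.names}

def report(plot: Plot, view: Perspective) -> list[tuple[str, float]]:
    """List a plot's taxa using one party's names.

    Concepts the party does not recognize keep their original
    'name sec. reference' label instead of being silently renamed.
    """
    rows = []
    for obs in plot.observations:
        label = view.names.get(obs.concept)
        if label is None:
            label = f"{obs.concept.name} sec. {obs.concept.sec}"
        rows.append((label, obs.cover))
    return rows

# Hypothetical example data: placeholder taxa and invented references.
broad = Concept("Aus bus", "Checklist 1990")   # wide circumscription
narrow = Concept("Aus bus", "Flora 2005")      # same name, narrower concept
segregate = Concept("Aus cus", "Flora 2005")   # split off from the broad concept

flora = Perspective("Flora 2005", {narrow: "Aus bus", segregate: "Aus cus"})
revision = Perspective("Revision 2010", {narrow: "Xus bus"})  # genus transfer

plot = Plot("P-001", [Observation(broad, 30.0), Observation(segregate, 5.0)])

print(concepts_for_name("Aus bus", [broad, narrow, segregate]))
print(names_for_concept(narrow, [flora, revision]))
print(report(plot, flora))
```

Here the name "Aus bus" resolves to two concepts, the narrow concept carries two names depending on the party, and the plot's broad-concept record is not forced onto a name that one party uses for a narrower concept.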
There is enormous potential for exchanging and merging data sets among databanks. At the same time, large data sets of heterogeneous origin pose challenges, both technical and scientific. An international data exchange format for vegetation plots will be critical, and the VegBank project provides a starting point. The International Association for Vegetation Science (IAVS) should assume its natural leadership role and guide the development of an international data exchange format, and perhaps a uniform data model, so as to facilitate database queries across many independent plot archives. This might best be realized by establishing a new IAVS Working Group for data standards to review and build on the standards developed by the VegBank program and other parallel efforts. The working group should specifically address data exchange formats, requirements for extended queries, data ownership and intellectual property rights, confidentiality issues, and management of multiple taxonomies for organisms and communities.
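To make the exchange-format issues above concrete, the following is a minimal sketch of serializing one plot record to JSON with Python's standard `json` module. The field names (`plot_code`, `owner`, `sec`, `coordinates_confidential`) and the example values are hypothetical and do not represent a published standard. The record illustrates three of the listed concerns: each taxon travels as a name plus the reference that defines its concept, ownership is carried with the data, and exact coordinates are withheld when the owner marks them confidential.

```python
import json

def to_exchange_record(plot: dict) -> str:
    """Serialize a plot (hypothetical field names) as JSON for exchange.

    Taxa are sent as name + defining reference so the receiving archive
    can reconcile taxonomies; coordinates are omitted if confidential.
    """
    record = {
        "plot_code": plot["plot_code"],
        "owner": plot["owner"],
        "taxa": [
            {"name": t["name"], "sec": t["sec"], "cover": t["cover"]}
            for t in plot["taxa"]
        ],
    }
    if not plot.get("coordinates_confidential", False):
        record["latitude"] = plot["latitude"]
        record["longitude"] = plot["longitude"]
    return json.dumps(record, sort_keys=True)

# Made-up example values for illustration only.
plot = {
    "plot_code": "P-001",
    "owner": "Example Survey",
    "coordinates_confidential": True,
    "latitude": 35.0,
    "longitude": -79.0,
    "taxa": [{"name": "Aus bus", "sec": "Checklist 1990", "cover": 30.0}],
}

print(to_exchange_record(plot))
```

A real standard would also need plot-level metadata, versioning, and agreed vocabularies; the sketch only shows where concept references, ownership, and confidentiality could sit in a record.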