Metadata (i.e., data about data) of digital objects plays an important role in digital libraries and archives, and thus its quality needs to be well maintained. However, as digital objects evolve over time, their associated metadata evolves as well, causing consistency issues. Since many functionalities of applications containing digital objects (e.g., digital libraries, public image repositories) are based on metadata, evolving metadata directly affects the quality of such applications. To make matters worse, modern data applications are often large-scale (having millions of digital objects) and are constructed by software agents or crawlers (and thus often have automatically populated, erroneous metadata). In such an environment, it is challenging to quickly and accurately identify evolving metadata and fix it (if needed) while applications keep running. Despite the importance and implications of the problem, conventional solutions have been very limited. Most existing metadata-related approaches either focus on the model and semantics of metadata or simply keep an authority file of some sort for evolving metadata, and never fully exploit its potential usage from the system point of view. In contrast, the question that we raise in this paper is: "given millions of digital objects and their metadata, (1) how can evolving metadata be identified quickly in various contexts, and (2) once the evolving metadata is identified, how can it be incorporated into the system?" The significance of this paper is that we investigate scalable algorithmic solutions for the identification of evolving metadata, emphasize the role of "systems" in maintenance, and argue that "systems" must proactively keep track of metadata changes and leverage the learned knowledge in their various services.
All Science Journal Classification (ASJC) codes
- Library and Information Sciences