Workload studies of large-scale systems may help locating possible bottlenecks and improving performances. However, previous workload analysis for Web applications is typically focused on generic platforms, neglecting the unique characteristics exhibited in various domains of these applications. It is observed that different application domains have intrinsically heterogeneous characteristics, which have a direct impact on the system performance. In this study, we present an extensive analysis into the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns. Logs of a computer science literature digital library, CiteSeer, are collected and analyzed. We intentionally remove service details specific to CiteSeer. We believe our analysis is applicable to other systems with similar characteristics. While many of our findings are consistent with previous Web analysis, we discover several unique characteristics of scientific literature digital library workload. Furthermore, we discuss how to utilize our findings to improve system performance.
All Science Journal Classification (ASJC) codes
- Library and Information Sciences