TY - GEN
T1 - WILSON
T2 - Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021
AU - Liao, Yiming
AU - Wang, Shuguang
AU - Lee, Dongwon
N1 - Funding Information:
The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was in part supported by NSF awards #1742702, #1820609, #1909702, #1915801, and #1934782. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsors. Last but not least, we truly appreciate Everdeen Mason, Sophie Ho and their journalist team at the Washington Post for the valuable feedback and support.
Publisher Copyright:
© 2021 Copyright held by the owner/author(s).
PY - 2021
Y1 - 2021
N2 - Major news media frequently use news timeline summarization to summarize important daily news about major events along a timeline. While various sophisticated methods have been proposed to generate news timelines that are both concise and complete, in practice, generating timelines from a large number of news articles faces not only quality issues but also the challenge of generation speed, which all existing methods have neglected. To mitigate these issues, we propose to speed up timeline generation by adopting a “divide and conquer” philosophy and dividing the whole summarization task into two sub-tasks: (1) date selection and (2) text summarization. Furthermore, since existing methods in news timeline summarization pay less attention to date selection than to text summarization, we re-examine the role of date selection and demonstrate that accurate date selection “alone” can significantly contribute to the task of news timeline summarization. Building on explicit date selection, we then propose a simple yet fast and effective news timeline summarization method, named WILSON (neWs tImeLine SummarizatiON). In an empirical evaluation on two widely used timeline summarization benchmark datasets, timeline17 and crisis, WILSON outperforms state-of-the-art approaches in both speed and ROUGE scores, significantly improving ROUGE-2 F1 scores by 9.5% to 17.7% and reducing generation time by two orders of magnitude. A further user study with professional journalists also validates the superiority of WILSON. Finally, we build a real-time news timeline summarization system and achieve encouraging results on an industrial-level corpus.
UR - http://www.scopus.com/inward/record.url?scp=85113712012&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113712012&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2021.73
DO - 10.5441/002/edbt.2021.73
M3 - Conference contribution
AN - SCOPUS:85113712012
T3 - Advances in Database Technology - EDBT
SP - 635
EP - 645
BT - Advances in Database Technology - EDBT 2021
A2 - Velegrakis, Yannis
A2 - Zeinalipour, Demetris
A2 - Chrysanthis, Panos K.
A2 - Guerra, Francesco
PB - OpenProceedings.org
Y2 - 23 March 2021 through 26 March 2021
ER -