This paper describes a computational system for the automatic analysis of thematic structure, as defined in Systemic Functional Linguistics, in written English. The system takes an English text as input and outputs an analysis of the thematic structure of each sentence. It is evaluated on data from the Wall Street Journal section of the Penn Treebank (Marcus et al. 1993) and the British Academic Written English corpus (Gardner & Nesi 2013). An experiment on these data shows that the system is highly reliable both in identifying theme-rheme boundaries and in determining several linguistic properties of the identified themes, including syntactic nodes, theme function, markedness, mood types, and theme roles. To illustrate how the system is used, we describe an example application that compares collections of novice and expert academic writing in terms of thematic structure.
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language