Previous methods of analyzing the substance of political attention have had to make several restrictive assumptions or been prohibitively costly when applied to large-scale political texts. Here, we describe a topic model for legislative speech, a statistical learning model that uses word choices to infer topical categories covered in a set of speeches and to identify the topic of specific speeches. Our method estimates, rather than assumes, the substance of topics, the keywords that identify topics, and the hierarchical nesting of topics. We use the topic model to examine the agenda in the U.S. Senate from 1997 to 2004. Using a new database of over 118,000 speeches (70,000,000 words) from the Congressional Record, our model reveals speech topic categories that are both distinctive and meaningfully interrelated and a richer view of democratic agenda dynamics than had previously been possible.
All Science Journal Classification (ASJC) codes
- Sociology and Political Science
- Political Science and International Relations