Two sides of a coin: Separating personal communication and public dissemination accounts in Twitter

Peifeng Yin, Nilam Ram, Wang Chien Lee, Conrad Tucker, Shashank Khandelwal, Marcel Salathé

Research output: Contribution to journal (Conference article)

8 Scopus citations

Abstract

Twitter hosts millions of accounts. In this paper, we categorize Twitter accounts into two types: Personal Communication Accounts (PCAs) and Public Dissemination Accounts (PDAs). PCAs are operated by individuals and are used to express that individual's thoughts and feelings. PDAs, on the other hand, are accounts owned by non-individual entities such as companies and governments. Generally, tweets from PDAs (i) disseminate a specific type of information (e.g., job openings, shopping deals, car accidents) rather than sharing an individual's personal life, and (ii) may be produced by non-human entities (e.g., bots). We aim to develop techniques for identifying PDAs so as to (i) help social scientists reduce "noise" in their study of human behavior, and (ii) index these accounts for potential recommendation to users looking for specific types of information. Through analysis, we find that the two types of accounts follow different temporal, spatial, and textual patterns. Accordingly, we develop probabilistic models based on these features to identify PDAs. We also conduct a series of experiments to evaluate these algorithms for cleaning the Twitter data stream.

Original language: English (US)
Pages (from-to): 163-175
Number of pages: 13
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8443 LNAI
Issue number: PART 1
State: Published - Jan 1 2014
Event: 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014 - Tainan, Taiwan, Province of China
Duration: May 13 2014 to May 16 2014

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
