Landscape Built of Words: Topic Modeling of an Online
May 31st, 2008
Submitted to the NSF Visualization Contest
Alex Dragulescu & Aaron Zinman
Advisor: Judith Donath
Sociable Media Group
MIT Media Lab
Twitter is a popular microblogging site on which over 1 million people post short answers to the question “what are you doing?” Three million of these messages (“tweets”) are posted each day. How can we discover the patterns of interest as they ebb and flow over time? How can we make sense of individual inhabitants and their networks?
We use latent Dirichlet allocation (LDA) to compress the semantic space into meaningful clusters, or "topics", based on the statistical distributions of words in the corpus (5 million posts from 60K users).
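To make the LDA step concrete, here is a minimal sketch of topic inference by collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the pipeline used for this project: the sampler, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy posts are all hypothetical stand-ins.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word w is in topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed estimates: topics per document (theta), words per topic (phi).
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
           for k in range(num_topics)]
    return theta, phi, vocab

# Made-up toy posts, already tokenized (illustrative only).
toy_docs = [
    "coffee breakfast morning coffee".split(),
    "morning coffee bagel breakfast".split(),
    "debugging code deploy server".split(),
    "server code crash debugging".split(),
]
theta, phi, vocab = lda_gibbs(toy_docs, num_topics=2)
```

Each row of `theta` is one post's mixture over topics, and each row of `phi` is one topic's distribution over the vocabulary. A corpus of millions of posts would call for an optimized implementation rather than this loop.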
Building the Landscape:
Multiscale dimensional reduction of the Kullback-Leibler distances between the topic distributions over the documents gives us a 2D landscape. Over time, topic mountains rise and fall according to the frequency of topic membership in the corpus.
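The layout step can be sketched as follows. The text does not specify the exact reduction method, so this example swaps in classical multidimensional scaling as an assumption; it also symmetrizes the Kullback-Leibler divergence (which is not symmetric on its own) by averaging both directions. The function names and the four toy topic distributions are hypothetical.

```python
import math
import random

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against zero entries in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of both KL directions (an assumption made for this sketch)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def top_eigenpair(matrix, iterations=1000, seed=0):
    """Largest algebraic eigenvalue and its eigenvector, by shifted power iteration."""
    n = len(matrix)
    # Gershgorin bound: adding it to the diagonal makes every eigenvalue non-negative.
    shift = max(sum(abs(x) for x in row) for row in matrix)
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        vec = [x / norm for x in nxt]
    # Rayleigh quotient on the unshifted matrix recovers the eigenvalue.
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec

def classical_mds_2d(dist):
    """Embed a symmetric dissimilarity matrix in 2D with classical MDS."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-center the squared dissimilarities to get a Gram matrix.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))  # negative eigenvalues contribute nothing
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords

# Made-up topic distributions over three documents (illustrative only).
topics = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.2, 0.7],
    "D": [0.1, 0.3, 0.6],
}
names = list(topics)
dist = [[symmetric_kl(topics[a], topics[b]) for b in names] for a in names]
layout = dict(zip(names, classical_mds_2d(dist)))
```

In `layout`, topics with similar distributions (A and B, or C and D) land near each other, while dissimilar ones sit far apart. The height of each topic "mountain" at a given moment would then come from a separate count of how often that topic appears in the posts of that time window.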
These visualizations show the distribution of topics over a year of Twitter. They help us understand and compare individual users, and see them in the context of their local social networks.