This program visualizes Twitter hashtag usage over time as a collection of circles, one per hashtag. The size of each circle is proportional to how frequently its hashtag is used at a given point in time.
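The description says circle size is proportional to usage frequency, but it does not say whether the radius or the area scales with the count. The following Python sketch assumes area is the proportional quantity, so each radius grows with the square root of the count. `bubble_radii` and `max_radius` are hypothetical names, not taken from the project's Lua code, and the counts are made up.

```python
import math

def bubble_radii(counts, max_radius=100.0):
    """Map {hashtag: count} to {hashtag: radius}.

    Assumption: circle *area* is proportional to count, so radius is
    proportional to sqrt(count). The most used hashtag gets max_radius.
    """
    if not counts:
        return {}
    peak = max(counts.values())
    if peak == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: max_radius * math.sqrt(n / peak) for tag, n in counts.items()}

# Made-up counts: a hashtag used 4x as often gets 2x the radius (4x the area).
print(bubble_radii({"#a": 400, "#b": 100}))  # {'#a': 100.0, '#b': 50.0}
```

Scaling the radius linearly instead would make a hashtag used four times as often look sixteen times larger, which visually exaggerates the differences.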
The program is written in Lua and uses the LOVE framework for graphics and user input. A Python script was used to collect and format the Twitter data, and the Box2D physics engine handles collisions between circles.
The following video shows the program in action. The dataset in this video consists of 120,000 tweets spanning a 6-hour period, from 3pm to 9pm (PST) on October 15th, 2013.
Data is collected from the Twitter Firehose, which is offered through Twitter's Streaming API. Only English-language tweets containing at least one hashtag are collected. The relevant data is then serialized into chunks in the form of Lua tables. Data chunks follow the naming convention:
The data chunks are then fed into the program, where the user can set the playback speed.
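The collection script itself is not shown. Here is a minimal Python sketch of the filtering and serialization steps. It assumes tweets arrive as Twitter API v1.1 JSON objects with `lang` and `entities.hashtags` fields, and that a chunk stores per-hashtag mention counts. The chunk contents and all function names (`keep_tweet`, `hashtag_counts`, `lua_string`, `to_lua_table`) are hypothetical, and no file naming is implied.

```python
from collections import Counter

def keep_tweet(tweet):
    """True for English tweets that mention at least one hashtag.
    Field names follow Twitter's v1.1 tweet JSON (assumed input format)."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return tweet.get("lang") == "en" and len(hashtags) > 0

def hashtag_counts(tweets):
    """Count hashtag mentions across the tweets that pass keep_tweet."""
    counts = Counter()
    for tweet in tweets:
        if keep_tweet(tweet):
            for tag in tweet["entities"]["hashtags"]:
                counts["#" + tag["text"]] += 1
    return counts

def lua_string(s):
    """Quote s as a Lua string literal, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_lua_table(counts):
    """Serialize {hashtag: count} as a Lua chunk that returns a table.
    Hypothetical layout: the project's real chunk contents are not described."""
    lines = ["return {"]
    for tag, n in sorted(counts.items()):
        lines.append(f"  [{lua_string(tag)}] = {n},")
    lines.append("}")
    return "\n".join(lines)

# Made-up sample tweets: only the first passes the filter.
sample = [
    {"lang": "en", "entities": {"hashtags": [{"text": "ENGvPOL"}]}},
    {"lang": "pl", "entities": {"hashtags": [{"text": "ENGvPOL"}]}},
    {"lang": "en", "entities": {"hashtags": []}},
]
print(to_lua_table(hashtag_counts(sample)))
```

Because each chunk is a `return { ... }` expression, the Lua side could load it with Lua's `dofile` or LOVE's `love.filesystem.load`. Which one the project actually uses is not stated.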
A useful feature of this program is that the user can select a hashtag bubble to view the words that define that hashtag. Defining words are selected using the term frequency-inverse document frequency (tf-idf) statistic.
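tf-idf weights a term by how often it appears for one hashtag (term frequency) and discounts it by how many hashtags use it at all (inverse document frequency). The Python sketch below treats all tweet text for one hashtag as a single document and uses tf = count / document length and idf = log(N / df). The document grouping, these particular tf and idf variants, and the simple tokenizer are all assumptions, since the text does not say which the project uses. The sample text is made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_by_hashtag(docs):
    """docs maps hashtag -> all tweet text that used it (one document per
    hashtag, an assumption). Returns hashtag -> {term: tf-idf score}."""
    term_counts = {tag: Counter(tokenize(text)) for tag, text in docs.items()}
    n_docs = len(term_counts)
    # Document frequency: number of hashtags whose text contains the term.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    scores = {}
    for tag, counts in term_counts.items():
        total = sum(counts.values())
        # tf = count / document length; idf = log(N / df).
        scores[tag] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores

def top_terms(scores, tag, k=5):
    """The k highest-scoring (term, score) pairs for one hashtag."""
    return sorted(scores[tag].items(), key=lambda kv: kv[1], reverse=True)[:k]

# Made-up example documents.
docs = {
    "#a": "goal goal england poland",
    "#b": "rest in peace mandela",
    "#c": "goal of the day",
}
scores = tfidf_by_hashtag(docs)
print(top_terms(scores, "#a", k=3))
```

Here "goal" appears twice under `#a` but also under `#c`, so it ranks below "england" and "poland". A term that appears under every hashtag gets idf = log(1) = 0 and can never rank as a defining word.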
The following image displays a tf-idf visualization for the hashtag "#ENGvPOL". In this view, the size of each bubble is proportional to its tf-idf score.
The following sample of raw tf-idf data is from a dataset collected on December 5th, 2013.
A notable event on this day was the death of Nelson Mandela. The following table shows the defining terms for the hashtag #RIPNelsonMandela, along with their tf-idf scores.