Analyzing Twitter Feed Data for Subjectivity and Polarity

Recently, I became aware of the TextBlob library, which is built on top of the Natural Language Toolkit (NLTK). It allows you to analyze text for “part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more” (per the documentation page). The features that really interested me were the ability to analyze text for both its subjectivity and its polarity (how positive or negative the text is). For example, a statement such as “The Earth is round” is neutral (neither positive nor negative) and entirely objective. On the other hand, a statement such as “I love you” or “I hate you” is neither objective nor neutral (one is very positive, the other very negative). The library lets you score statements like these, giving a numerical basis for how neutral and objective they are.
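
To make the scoring concrete, here is a minimal sketch of how those example statements could be scored. It assumes TextBlob is installed and uses its default sentiment analyzer. The names `Score`, `score_text`, `describe`, and `print_scores`, as well as the 0.5 cutoff for calling something “subjective”, are illustrative choices and not part of the library.

```python
from typing import NamedTuple


class Score(NamedTuple):
    polarity: float      # -1.0 (very negative) to 1.0 (very positive)
    subjectivity: float  # 0.0 (very objective) to 1.0 (very subjective)


def score_text(text: str) -> Score:
    """Score one piece of text with TextBlob's default sentiment analyzer."""
    # Imported here so the rest of the sketch loads even if TextBlob is missing.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return Score(sentiment.polarity, sentiment.subjectivity)


def describe(score: Score) -> str:
    """Hypothetical helper: turn a numeric score into a rough label."""
    if score.polarity > 0:
        tone = "positive"
    elif score.polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    # Assumption: treat anything above the midpoint as leaning subjective.
    leaning = "subjective" if score.subjectivity > 0.5 else "objective"
    return f"{tone}, leaning {leaning}"


def print_scores() -> None:
    """Score the three example statements from the text and print the results."""
    for text in ("The Earth is round", "I love you", "I hate you"):
        score = score_text(text)
        print(f"{text!r}: {score} ({describe(score)})")
```

Calling `print_scores()` prints one line per statement. The exact numbers depend on the language data that ships with TextBlob, so they may not match intuition for every sentence.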

I came up with the idea of using this library to analyze individuals’ Twitter feeds. This provides a way to know, on average, how objective or subjective and how positive or negative an individual is when posting statuses on Twitter. So, I wrote a Python script using Tweepy to analyze an individual user’s subjectivity and polarity. The source code itself can be found on my GitHub here.

The script is pretty simple. You pass in a consumer key and secret (given by registering for an API key on the Twitter dev pages), as well as a user name to analyze, a maximum tweet count to look at, and whether or not to include retweets. The script looks through the selected user’s tweets (up to 800, a limit imposed by Twitter’s own API) and shows the average polarity and subjectivity. It also shows the most positive and most negative tweets, as well as the most subjective and most objective tweets. A tweet with a subjectivity value of 0.0 is considered very objective, while a tweet with a subjectivity of 1.0 is considered very subjective. Similarly, polarity ranges from -1.0 (very negative) to 1.0 (very positive). I have included an example of the output from analyzing @macklemore’s tweets.

Average Polarity of Tweets: 0.151289301423
Average Subjectivity of Tweets: 0.347877407229
Most Negative Tweet: I picked the wrong night to drink 2 energy drinks.... Can't sleep. Can't wait. LETS GO!!!!!!! #seahawks http://t.co/YHfWiM7t4L
Most Positive Tweet: Leonardo blocking out the haters. He can't see you. My best friend. #pals #animals&me #fox #vintage… http://t.co/LEDfOuEJTW
Most Objective Tweet: Seahawks plane headed to NYC.....TURN UP!!!!!!! #superbowl #seahawks http://t.co/fM6JpfnT7H
Most Subjective Tweet: SEATTLE... I'm pumped to announce that for our Key Arena show on the 11th we're bringing out Sir Mix-A… http://t.co/v3XMmvD6hT

As you can see, Macklemore’s tweets are, on average, slightly positive. Further, they lean towards being objective rather than subjective. You can read his most positive and most negative tweets, and his most objective and most subjective ones, for fun if you’d like. If the results seem inaccurate, it’s because the script relies on TextBlob’s built-in language data for its analysis. In this instance, the most negative tweet returned by the script doesn’t actually seem all that negative. TextBlob can be trained with custom input to improve results, and further training could alleviate this.
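
The summary step described earlier (the two averages plus the four extreme tweets) can be sketched roughly as follows. This is not the script’s actual source; it assumes each tweet has already been scored, and the names `ScoredTweet` and `summarize` and the sample tweets and scores in the usage note are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, NamedTuple, Union


class ScoredTweet(NamedTuple):
    text: str
    polarity: float      # -1.0 (very negative) to 1.0 (very positive)
    subjectivity: float  # 0.0 (very objective) to 1.0 (very subjective)


def summarize(tweets: List[ScoredTweet]) -> Dict[str, Union[float, str]]:
    """Compute average scores and pick out the extreme tweets."""
    if not tweets:
        raise ValueError("need at least one scored tweet")
    return {
        "average_polarity": mean([t.polarity for t in tweets]),
        "average_subjectivity": mean([t.subjectivity for t in tweets]),
        # Lowest and highest polarity give the most negative/positive tweets.
        "most_negative": min(tweets, key=lambda t: t.polarity).text,
        "most_positive": max(tweets, key=lambda t: t.polarity).text,
        # Lowest and highest subjectivity give the most objective/subjective.
        "most_objective": min(tweets, key=lambda t: t.subjectivity).text,
        "most_subjective": max(tweets, key=lambda t: t.subjectivity).text,
    }
```

For example, with three invented tweets and invented scores:

```python
sample = [
    ScoredTweet("great show tonight", 0.5, 1.0),
    ScoredTweet("worst flight ever", -0.5, 0.5),
    ScoredTweet("plane lands at noon", 0.0, 0.0),
]
for label, value in summarize(sample).items():
    print(f"{label}: {value}")
```

When several tweets tie for an extreme, `min` and `max` return the first one encountered, which may or may not match what the real script does.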

This script could be expanded to further use cases, such as analyzing the average polarity and subjectivity of tweets directed at a user, or the average subjectivity and polarity of a hashtag. It’s fairly simple at the moment, but the results are interesting nonetheless. Please feel free to analyze your own Twitter feed and discover something about how you tweet!
