This section of the project covers the sentiment analysis performed on the tweets themselves. I first used the program to pull and analyze a sample of tweets, which gave me a better sense of how to clean the text so that TextBlob could analyze it accurately. The cleaned tweet text is read from the CSV file and passed to the script below, which performs standard lexicon-based sentiment matching on the tokenized text.
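The exact cleaning rules are not listed here, so the following is only a sketch of what such a step might look like. The function name `clean_tweet` and the specific rules (dropping links, the leading retweet marker, mentions, and the `#` symbol) are assumptions for illustration, not the rules actually applied to this dataset.

```python
import re


def clean_tweet(text):
    """Strip common Twitter noise so mostly the message wording remains."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"^RT\s+", "", text)         # remove a leading retweet marker
    text = re.sub(r"@\w+:?", " ", text)        # remove @mentions
    text = text.replace("#", "")               # keep the hashtag word, drop the symbol
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace


print(clean_tweet("RT @user: Loving the new #Python release! https://t.co/abc123"))
```

Keeping the hashtag word while dropping the symbol is a deliberate choice in this sketch, since hashtags often carry sentiment-bearing words that a lexicon can match.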
The code is pretty simple, but it is important to understand how it works.
The process involves three steps:
- First, start with a large dictionary of words, commonly known as a lexicon, and assign each word a polarity in the range [-1, 1], where -1 is very negative and 1 is very positive.
- Then, each word in the tweet is matched to an entry in the lexicon, given that entry's default value, and categorized by its role in the sentence (e.g., verb, subject).
- Lastly, TextBlob adjusts each word's value based on this context (for example, intensifiers and negations act as multipliers) and combines the results into a single polarity for the tweet. It also returns a second value, a subjectivity score from 0 (objective) to 1 (subjective), which this project stores as the confidence level for the estimate.
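As a rough illustration of the three steps above, here is a toy lexicon scorer. It is not TextBlob's implementation: the lexicon entries, the modifier values, and the name `toy_polarity` are all made up for this sketch, and it only handles intensifiers and negations that directly precede a sentiment word.

```python
import re

# Toy lexicon: word -> polarity in [-1, 1]. Values are invented for illustration.
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7, "terrible": -1.0}

# Hypothetical modifiers: intensifiers scale the next sentiment word,
# and a negation flips its sign and dampens it.
INTENSIFIERS = {"very": 1.3, "really": 1.2}
NEGATIONS = {"not", "never"}
NEGATION_FACTOR = -0.5


def toy_polarity(text):
    """Return the mean adjusted polarity of the lexicon words found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    multiplier = 1.0
    for token in tokens:
        if token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in NEGATIONS:
            multiplier *= NEGATION_FACTOR
        elif token in LEXICON:
            score = LEXICON[token] * multiplier
            scores.append(max(-1.0, min(1.0, score)))  # clamp to [-1, 1]
            multiplier = 1.0
        else:
            multiplier = 1.0  # any other word ends the modifier's reach
    if not scores:
        return 0.0  # no lexicon words matched: treat as neutral
    return sum(scores) / len(scores)


for sample in ["good", "not good", "very great", "the weather today"]:
    print(sample, "->", toy_polarity(sample))
```

Averaging the per-word scores keeps the result inside [-1, 1] regardless of tweet length, which is why the final polarity can be compared across tweets.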
The code I used is below:
sentiment.py
import sys

import pandas
import tweepy
from textblob import TextBlob
# Twitter API credentials: placeholders only, substitute your own values.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
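
# Authenticate with the Twitter API. The client is not used by the analysis below.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Load the cleaned tweets; the first command-line argument selects the file.
colnames = ['date', 'text', 'followers']
df = pandas.read_csv('tweetdata/' + sys.argv[1] + 'tweets.csv',
                     encoding='latin-1', names=colnames, header=None)
df['polarity'] = 0.0
df['sentiment_confidence'] = 0.0

# Score each tweet. TextBlob's sentiment is a (polarity, subjectivity) pair;
# the subjectivity value is stored in the 'sentiment_confidence' column.
for index, row in df.iterrows():
    analysis = TextBlob(row['text'])
    polarity, subjectivity = analysis.sentiment
    df.at[index, 'polarity'] = polarity
    df.at[index, 'sentiment_confidence'] = subjectivity

# Write the scored tweets to the output CSV.
df.to_csv('sentimentdata/' + sys.argv[1] + 'sentiment.csv')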