Sentiment Analysis Using the SentiWordNet Lexicon

Srishti Sharma
Jun 30, 2021

Sentiment Analysis is the computational study of opinions, sentiments and emotions expressed in text. Earlier, most text information processing methods (e.g., web search, text mining) worked with factual information. Over the past decade or so a shift has taken place: with the rapid proliferation of websites on the Internet, huge volumes of opinionated text now exist in the form of user posts on social networking sites and online product, hotel, restaurant and other reviews. People the world over turn to online reviews when making decisions.

According to a study conducted by the Kelsey Group with comScore, Inc. (NASDAQ: SCOR), a leader in measuring the digital world, more than three-quarters of review readers in nearly every category reported that reviews had a significant influence on their purchases, with hotels ranking the highest. Furthermore, 97% of those surveyed who made a purchase based on an online review found the review to be accurate. As such, researchers as well as organizations the world over have realized the untapped potential of Sentiment Analysis.

The basic approaches to Sentiment Analysis are Lexicon based (using a dictionary or thesaurus), Machine Learning based (using an ML algorithm), and a hybrid of the two. In this post, I am going to cover the most basic approach, i.e. Lexicon based Sentiment Analysis, using the SentiWordNet lexicon.

WordNet

  1. WordNet is a lexical database of English words, grouped as synonyms into what are known as synsets.
  2. It is a freely available tool, which can be downloaded from its official website.
  3. While WordNet can be loosely termed a Thesaurus, it is more semantically precise, since it stores synonyms of words grouped by specific contexts (senses).
  4. All the words are linked together by the ISA relationship (more commonly, Generalisation). For example, a car is a type of vehicle, just as a truck is.

SentiWordNet

  1. SentiWordNet operates on the database provided by WordNet.
  2. The additional functionality it provides is a measure of positivity, negativity and neutrality, as required for Sentiment Analysis.

Thus, every synset s is associated with three scores:

Pos(s): a positivity score
Neg(s): a negativity score
Obj(s): an objectivity (neutrality) score

Pos(s) + Neg(s) + Obj(s) = 1

The scores are sense-specific, pertaining to the word in its particular context. All three scores lie in the range [0, 1].

The Algorithm

Step 1 Perform data preprocessing on the dataset, including removal of stopwords and punctuation marks. The sentences can be stored in Python dictionaries to make them easier to manipulate.

Step 2 While using SentiWordNet, it is important to find the part of speech of each word present in the dictionaries. The parts of speech include:

Noun (n), Verb (v), Adjective (a), Adverb (r), Preposition, Conjunction, Pronoun, Interjection. The first three are the most commonly used when scoring the sentiment of a sentence.

Step 3 The polarity of each word, in the context of its POS tag, is computed using the SentiWordNet methods pos_score(), neg_score() and obj_score().

Example

Let us consider the sentence — I disliked the movie.

The overall sentiment of the above sentence is negative. This can be demonstrated using the SentiWordNet functions described above.

The negativity score for the word dislike (the verb form) is 0.5. The remaining tokens, like I and the, will be filtered out during preprocessing. Meanwhile, the positivity and negativity scores of movie are both zero, making its objectivity score 1.0. Thus, the overall sentiment of the sentence will be negative, since only the positive and negative scores are used to calculate the sentiment.

import nltk
nltk.download('sentiwordnet')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import WordNetLemmatizer

sentence = 'It was a really good day'
tokens = nltk.word_tokenize(sentence)
after_tagging = nltk.pos_tag(tokens)
print(tokens)
print(after_tagging)

def penn_to_wn(tag):
    """
    Convert a Penn Treebank tag to a simple WordNet tag
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

sentiment = 0.0
tokens_count = 0
lemmatizer = WordNetLemmatizer()
for word, tag in after_tagging:
    wn_tag = penn_to_wn(tag)
    # Skip words whose tag has no WordNet equivalent
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV, wn.VERB):
        continue

    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        continue

    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        continue

    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())
    print(swn_synset)

    sentiment += swn_synset.pos_score() - swn_synset.neg_score()
    tokens_count += 1

print(sentiment)
