Opinion mining using ensemble text hidden Markov models for text classification

Mangi Kang, Jaelim Ahn, Kichun Lee

Research output: Contribution to journal, Article, Research, peer-review

16 Citations (Scopus)

Abstract

With the rapid growth of social media, text mining is extensively utilized in practical fields, and opinion mining, also known as sentiment analysis, plays an important role in analyzing opinion and sentiment in texts. Methods in opinion mining generally depend on a sentiment lexicon, which is a set of predefined key words that express sentiment. Opinion mining requires proper sentiment words to be extracted in advance and has difficulty classifying sentences that imply an opinion without using any sentiment key words. This paper presents a new sentiment analysis method, based on text-based hidden Markov models (TextHMMs), for text classification that uses a sequence of words in training texts instead of a predefined sentiment lexicon. We sought to learn text patterns representing sentiment through ensemble TextHMMs. Our method defines hidden variables in TextHMMs by semantic cluster information in consideration of the co-occurrence of words, and thus calculates the sentiment orientation of sentences by fitted TextHMMs. To reflect diverse patterns, we applied an ensemble of TextHMM-based classifiers. In the experiments with a benchmark data set, we show that this method is superior to some existing methods and particularly has potential to classify implicit opinions. We also demonstrate the practicality of the proposed method in a real-life data set of online market reviews.
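The core idea in the abstract, scoring a word sequence under fitted per-sentiment HMMs and choosing the more likely class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-state toy parameters, and the three-word vocabulary are all assumptions made for illustration. The sketch also omits the semantic clustering that defines the hidden states and the ensemble step. It only shows the scaled forward algorithm used to compare likelihoods.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a word-id sequence under a discrete HMM.

    obs:   list of observation indices (word ids)
    start: start[i]    = P(first state = i)
    trans: trans[i][j] = P(next state = j | current state = i)
    emit:  emit[i][o]  = P(word = o | state = i)
    """
    if not obs:
        return 0.0  # an empty sequence has probability 1
    n_states = len(start)
    # Forward variables for the first word.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit word t.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs, models):
    """Return the label whose HMM gives the sequence the highest likelihood."""
    return max(
        models,
        key=lambda label: forward_log_likelihood(obs, *models[label]),
    )


# Hypothetical toy parameters: 2 hidden states, vocabulary of 3 word ids.
uniform_start = [0.5, 0.5]
uniform_trans = [[0.5, 0.5], [0.5, 0.5]]
toy_models = {
    "positive": (uniform_start, uniform_trans,
                 [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "negative": (uniform_start, uniform_trans,
                 [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]),
}

label_a = classify([0, 0, 1], toy_models)
label_b = classify([2, 2, 1], toy_models)
```

In a real setting the parameters would be estimated from labeled training texts rather than written by hand, and several such classifiers would be combined, as the abstract describes.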

Original language: English
Pages (from-to): 218-227
Number of pages: 10
Journal: Expert Systems with Applications
Volume: 94
DOI: 10.1016/j.eswa.2017.07.019
State: Published - 2018 Mar 15

Fingerprint

Hidden Markov models
Classifiers
Semantics
Experiments

Keywords

  • Boosting
  • Clustering
  • Ensemble
  • Hidden Markov models
  • Opinion mining
  • Sentiment analysis

Cite this

@article{8fb18b1634c845b2a6334f790fad60ca,
title = "Opinion mining using ensemble text hidden Markov models for text classification",
abstract = "With the rapid growth of social media, text mining is extensively utilized in practical fields, and opinion mining, also known as sentiment analysis, plays an important role in analyzing opinion and sentiment in texts. Methods in opinion mining generally depend on a sentiment lexicon, which is a set of predefined key words that express sentiment. Opinion mining requires proper sentiment words to be extracted in advance and has difficulty classifying sentences that imply an opinion without using any sentiment key words. This paper presents a new sentiment analysis method, based on text-based hidden Markov models (TextHMMs), for text classification that uses a sequence of words in training texts instead of a predefined sentiment lexicon. We sought to learn text patterns representing sentiment through ensemble TextHMMs. Our method defines hidden variables in TextHMMs by semantic cluster information in consideration of the co-occurrence of words, and thus calculates the sentiment orientation of sentences by fitted TextHMMs. To reflect diverse patterns, we applied an ensemble of TextHMM-based classifiers. In the experiments with a benchmark data set, we show that this method is superior to some existing methods and particularly has potential to classify implicit opinions. We also demonstrate the practicality of the proposed method in a real-life data set of online market reviews.",
keywords = "Boosting, Clustering, Ensemble, Hidden Markov models, Opinion mining, Sentiment analysis",
author = "Mangi Kang and Jaelim Ahn and Kichun Lee",
year = "2018",
month = "3",
day = "15",
doi = "10.1016/j.eswa.2017.07.019",
language = "English",
volume = "94",
pages = "218--227",
journal = "Expert Systems with Applications",
issn = "0957-4174",

}

Opinion mining using ensemble text hidden Markov models for text classification. / Kang, Mangi; Ahn, Jaelim; Lee, Kichun.

In: Expert Systems with Applications, Vol. 94, 15.03.2018, p. 218-227.

TY - JOUR

T1 - Opinion mining using ensemble text hidden Markov models for text classification

AU - Kang, Mangi

AU - Ahn, Jaelim

AU - Lee, Kichun

PY - 2018/3/15

Y1 - 2018/3/15

N2 - With the rapid growth of social media, text mining is extensively utilized in practical fields, and opinion mining, also known as sentiment analysis, plays an important role in analyzing opinion and sentiment in texts. Methods in opinion mining generally depend on a sentiment lexicon, which is a set of predefined key words that express sentiment. Opinion mining requires proper sentiment words to be extracted in advance and has difficulty classifying sentences that imply an opinion without using any sentiment key words. This paper presents a new sentiment analysis method, based on text-based hidden Markov models (TextHMMs), for text classification that uses a sequence of words in training texts instead of a predefined sentiment lexicon. We sought to learn text patterns representing sentiment through ensemble TextHMMs. Our method defines hidden variables in TextHMMs by semantic cluster information in consideration of the co-occurrence of words, and thus calculates the sentiment orientation of sentences by fitted TextHMMs. To reflect diverse patterns, we applied an ensemble of TextHMM-based classifiers. In the experiments with a benchmark data set, we show that this method is superior to some existing methods and particularly has potential to classify implicit opinions. We also demonstrate the practicality of the proposed method in a real-life data set of online market reviews.

KW - Boosting

KW - Clustering

KW - Ensemble

KW - Hidden Markov models

KW - Opinion mining

KW - Sentiment analysis

UR - http://www.scopus.com/inward/record.url?scp=85026395526&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2017.07.019

DO - 10.1016/j.eswa.2017.07.019

M3 - Article

VL - 94

SP - 218

EP - 227

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -