Any business is obliged to understand its clients: their needs, their opinions, their satisfaction with the product. A large web-based company needs to analyse hundreds of thousands or even millions of opinions about different products, and simply searching for pre-defined "good" or "bad" words in the comments is not enough. With the rise of machine learning, in particular deep neural networks, sentiment analysis (the problem of understanding the emotional tone of a text) can now be solved with very high accuracy. In this article we want to show the best way to solve this problem today, using word representations, deep learning and GPUs, and to present business cases where sentiment analysis can be applied. The performance of current methods is impressive: code that you can launch in a minute can give you a model with an accuracy of 90%.
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, “beyond polarity” sentiment classification looks, for instance, at emotional states such as “angry”, “sad”, and “happy”.
And that is basically what we need. For example, given the service feedback "I’ve used your product XXX and I haven’t been disappointed, only the opposite!" we need to understand that this client, and perhaps thousands of others, are satisfied with product XXX. Unfortunately, many rule-based or keyword-extraction models for sentiment analysis will not find any word like "happy", "lucky" or "good" in this sentence, and may instead pay attention to "disappointed" or "haven’t", which can lead to misclassifying the sentence as neutral or even negative. Modern machine learning solutions help us avoid these and other problems.
In 2017, deep neural networks can be applied to almost any problem that can be stated as "classify my piece of text into some category". Since text can be treated as a sequence of words, recurrent neural networks are very well suited: they consume a sentence word by word and learn its structure in order to perform a particular task. You can find a more detailed review of them in Andrej Karpathy’s excellent blog post.
Recurrent neural networks are able to catch dependencies among words in different positions and to handle language-level particularities such as separable verbs and their prefixes in German ("Wo hast du dein Handy her?"). Moreover, modern deep neural network architectures can be equipped with an artificial attention module, which helps the model learn which part of a sentence or text to concentrate on in order to understand it better. This concept was borrowed from the computer vision community, where attending to parts of an image is also a very important topic.
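The core of the attention idea can be sketched in a few lines of numpy: score each position against a learned query vector, normalize the scores with a softmax, and take a weighted sum. The hidden states and the query below are made-up toy values purely for illustration; in a real network they are learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# hypothetical RNN hidden states for a 4-word sentence, 3 features each
hidden = np.array([[0.1, 0.0, 0.2],
                   [0.9, 0.8, 0.7],   # an emotionally loaded word
                   [0.2, 0.1, 0.0],
                   [0.0, 0.3, 0.1]])
query = np.array([1.0, 1.0, 1.0])     # stand-in for a learned attention vector

weights = softmax(hidden @ query)     # how much to focus on each word
context = weights @ hidden            # weighted summary of the sentence
```

Here the second word gets the largest weight, so it dominates the sentence summary that the classifier sees.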
Okay, we know that a neural network can learn dependencies between words, but what about the words themselves? How do we understand that "good" and "awesome" are positive, while "lagging" and "broken" mostly describe negative things? And how do we understand "I had a broken chair, but I could fix it with your screwdrivers", which contains no really positive word but does contain a negative one, "broken"? To learn similarities between different words, it has become popular to use models that project every word into a vector space, where the concept of "similarity" can be defined as the distance between vectors. One of the basic but very powerful solutions is word2vec. The picture below shows examples of how it finds similarities between concepts; you can find more examples here.
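The "similarity as distance between vectors" idea can be illustrated with cosine similarity. The vectors below are hand-made toy values, not real word2vec output; a real model trained on a large corpus learns such vectors automatically so that words used in similar contexts end up close together.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity: 1 = same direction, -1 = opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy, hand-made vectors just to illustrate the idea
vectors = {
    "good":    np.array([0.9, 0.8, 0.1]),
    "awesome": np.array([0.8, 0.9, 0.2]),
    "broken":  np.array([-0.7, -0.8, 0.3]),
    "lagging": np.array([-0.8, -0.6, 0.2]),
}

print(cosine(vectors["good"], vectors["awesome"]))  # high: similar sentiment
print(cosine(vectors["good"], vectors["broken"]))   # negative: opposite sentiment
```

With vectors like these, a downstream classifier can generalize from "good" to "awesome" even if "awesome" was rare in its training data.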
How do we put all these neural networks and vectors together? The pipeline looks like the following:
- Gather a labeled dataset in your domain
- Train a word vector model on all the texts to learn a representation for every word
- Define a recurrent neural network for classification
- Transform raw texts into sequences of word vectors and train a neural network
```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# input: sequences of word vectors, e.g. 100 timesteps of 300-dim embeddings
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, input_shape=(100, 300)))
model.add(Dense(1, activation='sigmoid'))  # probability of positive sentiment
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
- Adapt to your particular business case
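The text-preprocessing step above (turning raw text into fixed-length sequences that can then be mapped to word vectors) can be sketched with a minimal hand-rolled tokenizer. The helper names are made up for illustration; in practice you would use an existing tokenizer such as the one shipped with Keras.

```python
def build_vocab(texts):
    # assign each word a positive integer id; 0 is reserved for padding
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def texts_to_padded_sequences(texts, vocab, maxlen):
    # map words to ids, truncate, and left-pad to a fixed length
    seqs = []
    for text in texts:
        ids = [vocab.get(w, 0) for w in text.lower().split()][:maxlen]
        seqs.append([0] * (maxlen - len(ids)) + ids)
    return seqs

texts = ["the product is great", "the product is broken"]
vocab = build_vocab(texts)
padded = texts_to_padded_sequences(texts, vocab, maxlen=6)
```

The resulting integer sequences are what an embedding lookup converts into the word-vector sequences the recurrent network consumes.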
Only one problem is left: what if you don’t have a big enough labeled dataset, or don’t have one at all? The concept of transfer learning helps in this situation. The main idea is to take some external dataset (for example, an open-source one about movies), train a big neural network on it to learn the main dependencies, and then finish training the model on your small dataset. Given the main concepts of language and emotion learned from the big open-source dataset (movie reviews), we can easily add new concepts from our own area (food, for example).
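The mechanics of "pre-train on a big source dataset, then fine-tune on a small target dataset" can be shown with a toy numpy sketch. Here a plain logistic regression stands in for the neural network, and the "movie" and "food" datasets are synthetic; real transfer learning would reuse the weights of a deep model in the same way, typically with a smaller learning rate for the fine-tuning phase.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.1, steps=500):
    # plain logistic regression trained with gradient descent
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# "source" domain: many labelled examples (think: movie reviews)
X_src = rng.normal(size=(200, 3))
y_src = (X_src[:, 0] + 0.3 * X_src[:, 1] > 0).astype(float)

# "target" domain: only a handful of labelled examples (think: food reviews)
X_tgt = rng.normal(size=(10, 3)) + np.array([0.0, 0.5, 0.0])
y_tgt = (X_tgt[:, 0] + 0.3 * X_tgt[:, 1] > 0).astype(float)

w = train(np.zeros(3), X_src, y_src)            # pre-train on the big dataset
w = train(w, X_tgt, y_tgt, lr=0.01, steps=100)  # fine-tune on the small one

acc = np.mean(((1.0 / (1.0 + np.exp(-X_tgt @ w))) > 0.5) == y_tgt)
```

The small, low-learning-rate fine-tuning pass nudges the pre-trained weights toward the target domain without throwing away what was learned from the large source dataset.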
If we don’t have a dataset at all, that is not a reason to give up. We still have our reviews, and we can work with them. Remember that the inputs to our recurrent neural network are word vectors. The idea you can apply in this case is to train a word vector model on two datasets at once: the labelled one about movies and the unlabelled one about food. The resulting model may be a bit redundant, but it will understand both domains, and you can then try passing the "mixed" word vectors into the neural network trained on movies.
There are more sophisticated methods for unsupervised sentiment analysis and domain adaptation, but we omit them in this article.
Sentiment analysis can be applied in the following areas:
- Customer review analysis and service evaluation (e-commerce, booking, services)
- Advanced A/B testing
- Marketing plan improvement based on large-scale feedback analysis
- Informing operational improvements and capital expenditure decisions
- Recommender system performance enhancement
- Components of conversational systems
- Social media monitoring and abusive content filtering
Analysis of customer opinions is a must-have for any modern business. With sentiment analysis applied to different cases, you can significantly improve your metrics and find further ways of developing your business with AI. In this article we described the basic pipeline for sentiment analysis, showed some ways to overcome the lack of tagged data, and listed several important applications.
©2017 HPA | High Performance Analytics