An NLP Approach to Mining Online Reviews using Topic Modeling (with Python codes)

阿新 • Published: 2018-12-28

E-commerce has revolutionized the way we shop. That phone you’ve been saving up to buy for months? It’s just a search and a few clicks away. Items are delivered within a matter of days (sometimes even the next day!).

Online retailers face far fewer constraints on inventory and shelf space, so they can list as many different products as they want. Brick and mortar stores, by contrast, can stock only a limited number of products because of the finite space they have available.

I remember placing orders for books at my local bookstore and waiting over a week for them to arrive. It seems like a story from ancient times now!

But online shopping comes with its own caveats. One of the biggest challenges is verifying the authenticity of a product. Is it as good as advertised on the e-commerce site? Will the product last more than a year? Are the reviews given by other customers really true or are they false advertising? These are important questions customers need to ask before splurging their money.

This is a great place to experiment and apply Natural Language Processing (NLP) techniques. This article will help you understand the significance of harnessing online product reviews with the help of Topic Modeling.

If you need a quick refresher on Topic Modeling, it is worth going through an introductory article on the subject before continuing.

Table of Contents

Importance of Online Reviews
Problem Statement
Why Topic Modeling for this task?
Python Implementation
Other methods to leverage online reviews
What’s Next?

Importance of Online Reviews

A few days back, I took the e-commerce plunge and purchased a smartphone online. It was well within my budget, and it had a more than decent rating of 4.5 out of 5.

Unfortunately, it turned out to be a bad decision as the battery backup was well below par. I didn’t go through the reviews of the product and made a hasty decision to buy it based on its ratings alone. And I know I’m not the only one out there who made this mistake!

Ratings alone do not give a complete picture of the products we wish to purchase, as I found to my detriment. So, as a precautionary measure, I always advise people to read a product’s reviews before deciding whether to buy it or not.

But then an interesting problem comes up. What if the number of reviews is in the hundreds or thousands? It’s just not feasible to go through all those reviews, right? And this is where natural language processing comes up trumps.

Setting the Problem Statement

A problem statement is the seed from which your analysis blooms. Therefore, it is really important to have a solid, clear and well-defined problem statement.

How can we analyze a large number of online reviews using Natural Language Processing (NLP)? Let’s define this problem.

Online product reviews are a great source of information for consumers. From the sellers’ point of view, online reviews can be used to gauge consumer feedback on the products or services they are selling. However, the sheer number of reviews and the amount of information in them can be overwhelming, so an intelligent system capable of finding key insights (topics) in these reviews would be of great help to both consumers and sellers. This system will serve two purposes:

Enable consumers to quickly extract the key topics covered by the reviews without having to go through all of them
Help the sellers/retailers get consumer feedback in the form of topics (extracted from the consumer reviews)

To solve this task, we will use the concept of Topic Modeling (LDA) on Amazon Automotive Review data. You can download it from this link. Similar datasets for other categories of products can be found here.

Why Should you use Topic Modeling for this task?

As the name suggests, Topic Modeling is a process to automatically identify topics present in a text object and to derive hidden patterns exhibited by a text corpus. Topic Models are very useful for multiple purposes, including:

Document clustering
Organizing large blocks of textual data
Information retrieval from unstructured text
Feature selection

A good topic model, when trained on some text about the stock market, should produce topics made up of words like “bid”, “trading”, “dividend”, “exchange”, etc. The image below illustrates how a typical topic model works:

In our case, instead of text documents, we have thousands of online product reviews for the items listed under the ‘Automotive’ category. Our aim here is to extract a certain number of groups of important words from the reviews. These groups of words are basically the topics which would help in ascertaining what the consumers are actually talking about in the reviews.

Python Implementation

In this section, we’ll power up our Jupyter notebooks (or any other IDE you use for Python!). Here we’ll work on the problem statement defined above to extract useful topics from our online reviews dataset using the concept of Latent Dirichlet Allocation (LDA).

Note: As I mentioned in the introduction, I highly recommend going through this article to understand what LDA is and how it works.

Let’s first load all the necessary libraries:

import nltk
from nltk import FreqDist
nltk.download('stopwords') # run this one time

import pandas as pd
pd.set_option("display.max_colwidth", 200)
import numpy as np
import re
import spacy
import gensim
from gensim import corpora

# libraries for visualization
import pyLDAvis
import pyLDAvis.gensim
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
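One caveat on the visualization imports: as far as I am aware, newer pyLDAvis releases (3.3 and later) renamed the pyLDAvis.gensim module to pyLDAvis.gensim_models, so the import above may fail depending on your installed version. The sketch below tries both names. The helper import_first_available and the alias gensimvis are hypothetical names I introduced for illustration; they are not part of pyLDAvis.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or one of its dependencies) is missing; try the next name
            continue
    return None


# Newer name first, then the older one used in this article
gensimvis = import_first_available(["pyLDAvis.gensim_models", "pyLDAvis.gensim"])

if gensimvis is None:
    print("No pyLDAvis gensim helper found; install pyLDAvis for the visualization step")
```

If the helper returns a module, you can use gensimvis wherever the article refers to pyLDAvis.gensim.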

To import the data, first extract the downloaded file to your working directory, then use the read_json() function of pandas to read it into a pandas dataframe. The lines=True argument tells pandas that the file holds one JSON object per line.

df = pd.read_json('Automotive_5.json', lines=True)
df.head()
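If you are curious what lines=True does, here is a small sketch that reads a two-line, in-memory sample in the same one-object-per-line layout. The records and the field names reviewerID, overall and reviewText are assumptions made up for illustration; check df.columns on the real file to see what it actually contains.

```python
import io

import pandas as pd

# Two made-up records, one JSON object per line (JSON Lines layout)
sample = (
    '{"reviewerID": "A1", "overall": 5, "reviewText": "Works great on my truck."}\n'
    '{"reviewerID": "B2", "overall": 2, "reviewText": "Stopped charging after a week."}\n'
)

# lines=True parses each line as a separate record (one row per line)
df_sample = pd.read_json(io.StringIO(sample), lines=True)

print(df_sample.shape)
print(df_sample["reviewText"].tolist())
```

Without lines=True, pandas would try to parse the whole input as a single JSON document and fail on a file like this one.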
