
How to build your own Twitter Sentiment Analysis Tool

In this article we will show how you can build a simple Sentiment Analysis tool that classifies tweets as positive, negative or neutral by using the Twitter REST API v1.1 and the Datumbox API v1.0. Even though the examples are given in PHP, you can easily build your own tool in the programming language of your choice.

Update: The Datumbox Machine Learning Framework is now open-source and free to download. If you want to build a Sentiment Analysis classifier without hitting the API limitations, use the com.datumbox.applications.nlp.TextClassifier class.

You can find the complete PHP code of the Twitter Sentiment Analysis tool on GitHub.

Social Media Monitoring & Sentiment Analysis

Social Media Monitoring is one of the hottest topics nowadays. As more and more companies use Social Media Marketing to promote their brands, it has become necessary for them to evaluate the effectiveness of their campaigns.

Building a Social Media Monitoring tool requires at least 2 modules: one that evaluates how many people are influenced by the campaign and one that finds out what people think about the brand.

Evaluating the generated buzz is usually performed by using various KPIs such as the number of followers/friends, the number of likes/shares/RTs per post and more complex ones such as the engagement rate, the response rate and other composite metrics. Quantifying the buzz is usually straightforward and can be performed by using basic statistics.
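As a rough illustration of how simple these statistics can be, the sketch below computes an engagement rate per post and averages it over a campaign. The formula (interactions divided by followers) is an assumption made for this example, since different tools define the metric differently, and the function names and sample numbers are hypothetical.

```php
<?php
/**
 * Engagement rate of one post: interactions divided by audience size.
 * This definition is an assumption for illustration only.
 */
function engagementRate(int $likes, int $shares, int $replies, int $followers): float
{
    if ($followers <= 0) {
        return 0.0; // avoid a division by zero for accounts without followers
    }
    return ($likes + $shares + $replies) / $followers;
}

/**
 * Average engagement rate over a list of posts of the same account.
 */
function averageEngagementRate(array $posts, int $followers): float
{
    if (count($posts) === 0) {
        return 0.0;
    }
    $total = 0.0;
    foreach ($posts as $post) {
        $total += engagementRate($post['likes'], $post['shares'], $post['replies'], $followers);
    }
    return $total / count($posts);
}

// Hypothetical sample data for an account with 1000 followers
$posts = array(
    array('likes' => 30, 'shares' => 15, 'replies' => 5),
    array('likes' => 10, 'shares' => 0, 'replies' => 0),
);
printf("%.2f%%\n", 100 * averageEngagementRate($posts, 1000));
```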

On the other hand, evaluating the opinion of the users is not a trivial matter. It requires performing Sentiment Analysis, which is the task of automatically identifying the polarity, the subjectivity and the emotional state of a particular document or sentence. This in turn requires Machine Learning and Natural Language Processing techniques, and this is where most developers hit the wall when they try to build their own tools.

Thankfully, Datumbox simplifies the process of using Machine Learning, since it offers several API functions which allow you to build custom Social Media Monitoring tools in no time. Some of the services available to API users are the Sentiment Analysis, the Twitter Sentiment Analysis and the Subjectivity Analysis API functions. In this article we will focus only on the Twitter Sentiment Analysis method; nevertheless, as you will easily find out, the rest of the functions work similarly.

Performing Sentiment Analysis on Twitter

Performing Sentiment Analysis on Twitter is trickier than doing it for long reviews. This is because tweets are very short (up to 140 characters) and usually contain slang, emoticons, hashtags and other Twitter-specific jargon. This is why Datumbox offers a completely different classifier for performing Sentiment Analysis on Twitter.

Building the Sentiment Analysis tool

In order to build the Sentiment Analysis tool we will need 2 things: first, to be able to connect to Twitter and search for tweets that contain a particular keyword; second, to evaluate the polarity (positive, negative or neutral) of the tweets based on their words. For the first task we will use the Twitter REST API v1.1 and for the second the Datumbox API v1.0.
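To make the first task more concrete, here is a minimal sketch of the search parameters for a keyword search. The keys q, lang, count and result_type are parameters of the Twitter search/tweets endpoint; the helper buildSearchParams() itself and its default values are hypothetical and are not part of either API client.

```php
<?php
/**
 * Builds the parameter array for a search/tweets request.
 * Hypothetical helper written for this example.
 */
function buildSearchParams(string $keyword, int $count = 50): array
{
    $keyword = trim($keyword);
    if ($keyword === '') {
        throw new InvalidArgumentException('The search keyword must not be empty.');
    }

    return array(
        'q' => $keyword,                     // the keyword or phrase to search for
        'lang' => 'en',                      // restrict the search to English tweets
        'count' => max(1, min($count, 100)), // search/tweets returns at most 100 tweets per request
        'result_type' => 'recent',           // other accepted values: 'mixed', 'popular'
    );
}

print_r(buildSearchParams('datumbox', 500));
```

An array like this is what you would pass as the search parameters when calling the Twitter API.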

To speed up development we will use 2 classes: the great PHP-Twitter-API client written by Tim Whitlock and the Datumbox PHP-API-Client offered by our service. As you will soon find out, getting the tweets via the Twitter API is the most complicated task of this tutorial.

Create your own Twitter Application

Unfortunately, Twitter has made it more complicated for developers to use its API. In order to search for particular tweets you must authenticate yourself by using the OAuth protocol. Fortunately, Tim's API client takes care of most of the tasks and enables a fast and easy integration. Still, you are required to create a new Twitter application before using the library.

So go to the Twitter Applications Console, log in with your credentials, click the “Create new Application” button and fill in the form to register a new app. Once it is created, select the application, go to the “Details” tab (the first tab) and click the “Create my access token” button at the bottom of the page. Once you have done this, go to the “OAuth tool” tab and note down the values: Consumer Key, Consumer secret, Access token and Access token secret.

Get your Datumbox API key

To access the Datumbox API, sign up for a free account and visit your API Credentials panel to get your API Key.

Developing the Twitter Sentiment Analysis class

All we need to do in order to develop the tool is write a TwitterSentimentAnalysis class which uses the Twitter and Datumbox API Clients to fetch the tweets and evaluate their polarity.

Below you can see the code along with the necessary comments.

<?php
class TwitterSentimentAnalysis {
    
    protected $datumbox_api_key; //Your Datumbox API Key. Get it from http://www.datumbox.com/apikeys/view/
    
    protected $consumer_key; //Your Twitter Consumer Key. Get it from https://dev.twitter.com/apps
    protected $consumer_secret; //Your Twitter Consumer Secret. Get it from https://dev.twitter.com/apps
    protected $access_key; //Your Twitter Access Key. Get it from https://dev.twitter.com/apps
    protected $access_secret; //Your Twitter Access Secret. Get it from https://dev.twitter.com/apps
    
    /**
    * The constructor of the class
    * 
    * @param string $datumbox_api_key   Your Datumbox API Key
    * @param string $consumer_key       Your Twitter Consumer Key
    * @param string $consumer_secret    Your Twitter Consumer Secret
    * @param string $access_key         Your Twitter Access Key
    * @param string $access_secret      Your Twitter Access Secret
    * 
    * @return TwitterSentimentAnalysis  
    */
    public function __construct($datumbox_api_key, $consumer_key, $consumer_secret, $access_key, $access_secret){
        $this->datumbox_api_key=$datumbox_api_key;
        
        $this->consumer_key=$consumer_key;
        $this->consumer_secret=$consumer_secret;
        $this->access_key=$access_key;
        $this->access_secret=$access_secret;
    }
    
    /**
    * This function fetches the list of tweets and evaluates their sentiment
    * 
    * @param array $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    * 
    * @return array
    */
    public function sentimentAnalysis($twitterSearchParams) {
        $tweets=$this->getTweets($twitterSearchParams);
        
        return $this->findSentiment($tweets);
    }
    
    /**
    * Calls the Search/tweets method of the Twitter API for particular Twitter Search Parameters and returns the list of tweets that match the search criteria.
    * 
    * @param mixed $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    * 
    * @return array $tweets
    */
    protected function getTweets($twitterSearchParams) {
        $Client = new TwitterApiClient(); //Use the TwitterAPIClient
        $Client->set_oauth ($this->consumer_key, $this->consumer_secret, $this->access_key, $this->access_secret);

        $tweets = $Client->call('search/tweets', $twitterSearchParams, 'GET' ); //call the service and get the list of tweets
        
        unset($Client);
        
        return $tweets;
    }
    
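    /**
    * Calls the Datumbox Twitter Sentiment Analysis service for every English tweet and returns the tweets along with their sentiment.
    * 
    * @param array $tweets The response of the Twitter API, as returned by getTweets()
    * 
    * @return array
    */
    protected function findSentiment($tweets) {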
        $DatumboxAPI = new DatumboxAPI($this->datumbox_api_key); //initialize the DatumboxAPI client
        
        $results=array();
        foreach($tweets['statuses'] as $tweet) { //foreach of the tweets that we received
            if(isset($tweet['metadata']['iso_language_code']) && $tweet['metadata']['iso_language_code']=='en') { //perform sentiment analysis only for the English Tweets
                $sentiment=$DatumboxAPI->TwitterSentimentAnalysis($tweet['text']); //call Datumbox service to get the sentiment
                
                if($sentiment!=false) { //if the sentiment is not false, the API call was successful.
                    $results[]=array( //add the tweet message in the results
                        'id'=>$tweet['id_str'],
                        'user'=>$tweet['user']['name'],
                        'text'=>$tweet['text'],
                        'url'=>'https://twitter.com/'.$tweet['user']['screen_name'].'/status/'.$tweet['id_str'], //the URL needs the screen name (handle), not the display name
                        
                        'sentiment'=>$sentiment,
                    );
                }
            }
            
        }
        
        unset($tweets);
        unset($DatumboxAPI);
        
        return $results;
    }
}
?>

What we do is pass to the constructor the necessary keys for all the services. Then, in the public sentimentAnalysis function, we first call the Twitter service to get the list of tweets which match our search parameters, and then we call the Datumbox service for each tweet to get its polarity.
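Once you have the results, a natural next step is to aggregate them. The sketch below tallies the sentiment labels of a result list that has the same shape as the array returned by the class above. The summarizeSentiments() helper and the sample data are hypothetical, and the code assumes that the labels are the strings "positive", "negative" and "neutral".

```php
<?php
/**
 * Counts how many tweets fall into each sentiment category.
 * Hypothetical helper; labels outside the three known categories are ignored.
 */
function summarizeSentiments(array $results): array
{
    $counts = array('positive' => 0, 'negative' => 0, 'neutral' => 0);
    foreach ($results as $result) {
        $label = strtolower($result['sentiment']);
        if (isset($counts[$label])) {
            $counts[$label]++;
        }
    }
    return $counts;
}

// Made-up results in the same shape that findSentiment() returns
$results = array(
    array('id' => '1', 'user' => 'Alice', 'text' => 'I love this!', 'sentiment' => 'positive'),
    array('id' => '2', 'user' => 'Bob', 'text' => 'Worst update ever.', 'sentiment' => 'negative'),
    array('id' => '3', 'user' => 'Carol', 'text' => 'So good.', 'sentiment' => 'positive'),
);

print_r(summarizeSentiments($results));
```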

That is it! You are now ready to use this class to perform Sentiment Analysis on tweets and build your own Social Media Monitoring tool. You can find the complete PHP code of the Twitter Sentiment Analysis tool on GitHub.

Extra: Detailed Information about the Twitter Sentiment Analysis Classifier

This part is optional for those of you who are interested in learning how Datumbox’s Twitter Sentiment Analysis works.

In order to detect the sentiment of the tweets we used our Machine Learning framework to build a classifier capable of detecting Positive, Negative and Neutral tweets. Our training set consisted of 1.2 million tweets evenly distributed across the 3 categories. We tokenized the tweets by extracting their bigrams and by taking into account the URLs, the hashtags, the usernames and the emoticons.
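To give a feeling of what such a tokenization step can look like, here is a simplified sketch. It is not the Datumbox implementation: the placeholder tokens (_url_, _user_, _smile_, _frown_), the regular expressions and the function names are assumptions chosen for illustration only.

```php
<?php
/**
 * Turns a tweet into a list of lowercase tokens. URLs, usernames and two
 * basic emoticon families are mapped to placeholder tokens; hashtags are kept.
 */
function tokenizeTweet(string $text): array
{
    $text = strtolower($text);
    $text = preg_replace('~https?://\S+~', ' _url_ ', $text);  // URLs
    $text = preg_replace('/@\w+/', ' _user_ ', $text);         // usernames
    $text = preg_replace('/[:;=]-?\)/', ' _smile_ ', $text);   // :) ;) =) :-)
    $text = preg_replace('/[:;=]-?\(/', ' _frown_ ', $text);   // :( ;( =( :-(
    $text = preg_replace('/[^a-z0-9_#\s]/', ' ', $text);       // drop the remaining punctuation
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Returns the bigrams (pairs of consecutive tokens) of a token list.
 */
function extractBigrams(array $tokens): array
{
    $bigrams = array();
    for ($i = 0; $i + 1 < count($tokens); $i++) {
        $bigrams[] = $tokens[$i] . ' ' . $tokens[$i + 1];
    }
    return $bigrams;
}

$tokens = tokenizeTweet('@bob I love #PHP :) http://t.co/x');
print_r(extractBigrams($tokens));
```

Replacing URLs and usernames with placeholders keeps the fact that they were present without creating a separate feature for every distinct link or account.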

In order to select the best features we tried several different algorithms and in the end we chose Mutual Information. Finally, after performing several tests with various models and configurations, we selected Binarized Naïve Bayes as the best performing classifier for this particular problem (strangely enough, Naïve Bayes beat SVM, Max Entropy and other classifiers which are usually known to perform better than NB). To evaluate the results we used 10-fold cross-validation, and our best performing classifier achieves an accuracy of 83.26%.
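For readers who are not familiar with the model, the following is a toy sketch of a Binarized Naïve Bayes classifier: a multinomial Naïve Bayes in which every feature is counted at most once per document. It is not the Datumbox implementation; the class name, the add-one (Laplace) smoothing and the tiny training set are assumptions made for this example, and feature selection and cross-validation are left out.

```php
<?php
class BinarizedNaiveBayes
{
    private $docCounts = array();     // label => number of training documents
    private $featureCounts = array(); // label => feature => documents containing the feature
    private $totalFeatures = array(); // label => sum of all feature counts
    private $vocabulary = array();    // feature => true

    public function train(array $features, string $label): void
    {
        if (!isset($this->docCounts[$label])) {
            $this->docCounts[$label] = 0;
            $this->featureCounts[$label] = array();
            $this->totalFeatures[$label] = 0;
        }
        $this->docCounts[$label]++;
        // Binarization: array_unique() makes each feature count once per document
        foreach (array_unique($features) as $feature) {
            $this->featureCounts[$label][$feature] = ($this->featureCounts[$label][$feature] ?? 0) + 1;
            $this->totalFeatures[$label]++;
            $this->vocabulary[$feature] = true;
        }
    }

    public function predict(array $features): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $bestLabel = '';
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docCount) {
            $score = log($docCount / $totalDocs); // log prior of the class
            foreach (array_unique($features) as $feature) {
                if (!isset($this->vocabulary[$feature])) {
                    continue; // features never seen in training are ignored
                }
                $count = $this->featureCounts[$label][$feature] ?? 0;
                // log likelihood with add-one smoothing
                $score += log(($count + 1) / ($this->totalFeatures[$label] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $bestLabel = $label;
            }
        }
        return $bestLabel;
    }
}

// Toy training set of bigram features (made up for this example)
$classifier = new BinarizedNaiveBayes();
$classifier->train(array('i love', 'love it'), 'positive');
$classifier->train(array('i hate', 'hate it'), 'negative');
$classifier->train(array('it is', 'is ok'), 'neutral');

echo $classifier->predict(array('i love', 'love php')), "\n";
```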
