Ei! Real-time Sentiment Analysis of Microblogging Online Social Network Streams       


Paulo Cavalin


Overview and goals

From sporting events to marketing campaigns, millions of people react and interact in Online Social Networks (OSN), especially on microblogging services like Twitter. As people react to these events by expressing their opinions and feelings through messages, a large stream of data is generated every minute. Capturing human behavior from this data stream during the event’s lifespan can lead to valuable insights about the event, such as what people think about a given player or product, or how a given service can be improved.

To conduct this task successfully, one has to cope with both technological and scientific challenges. Often a large network and a large volume of data have to be processed within minutes (or even seconds), which requires an appropriate infrastructure to host the system. The algorithms must be efficient enough not to waste resources, and they must also conduct the desired analyses precisely enough to produce meaningful reports. For this reason, the project is highly multidisciplinary, involving Cognitive Science, High Performance Computing, Natural Language Processing, Data Mining, Machine Learning, and Data Visualization, among others.

Ei! at the 2013 FIFA Confederations Cup

Ei! was implemented for real-time sentiment analysis of streaming Twitter data during the games of the Brazilian squad at the 2013 FIFA Confederations Cup. The main goal was to analyze the sentiment of Brazilian Twitter users about Brazil's national soccer team. This included their sentiment about each player, the coach and the team itself.

For this system, we implemented a Machine Learning-based classifier to classify the polarity of each tweet during the game. For meaningful time windows, such as the first half, the second half, and the halftime interval, the system provided a report of the most frequent topics, terms, and their co-occurrences within that window, along with the corresponding volume of messages classified as positive, negative, or neutral. These results were made available to both an online app and a nationwide television network.
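As a rough illustration of the per-window report described above, the following Python sketch aggregates tweets that have already been labeled with a polarity. It is not the Ei! implementation, which was written in SPL: the function name `window_report`, the whitespace tokenizer, and the sample tweets are all assumptions made for the example, and the classifier itself is left out.

```python
from collections import Counter
from itertools import combinations


def window_report(labeled_tweets, top_n=3):
    """Summarize one time window of (text, polarity) pairs.

    Hypothetical helper: assumes each tweet was already labeled
    "positive", "negative", or "neutral" by some classifier.
    """
    polarity_counts = Counter()
    term_counts = Counter()
    pair_counts = Counter()
    for text, polarity in labeled_tweets:
        polarity_counts[polarity] += 1
        # Naive whitespace tokenization; each term counts once per tweet.
        terms = sorted(set(text.lower().split()))
        term_counts.update(terms)
        # Count every unordered pair of terms that share a tweet.
        pair_counts.update(combinations(terms, 2))
    return {
        "polarity": dict(polarity_counts),
        "top_terms": term_counts.most_common(top_n),
        "top_pairs": pair_counts.most_common(top_n),
    }


# Made-up tweets standing in for one window, e.g. the first half.
first_half = [
    ("neymar great goal", "positive"),
    ("what a goal neymar", "positive"),
    ("terrible foul", "negative"),
    ("second half starts", "neutral"),
]
report = window_report(first_half)
print(report)
```

A real stream would need proper tokenization of Twitter text (hashtags, mentions, slang), which is one source of the high variability mentioned in this section.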

The architecture was designed on the IBM InfoSphere Streams platform, for high availability and scalability, using the Streams Processing Language (SPL), and was run on the IBM Smart Cloud Enterprise service. Efficient algorithms were developed to process Twitter texts and deal with their high variability with minimum latency.

The figure below shows one of the reports provided by this system: the volume of tweets for the Brazil versus Spain game, the competition's final. The highest peak, about 125 thousand tweets in a 5-minute interval, was reached during the 19:39-19:44 interval, after Brazil's defender David Luiz made an incredible save to deny Spain a goal. This interval shows that Twitter users reacted in accordance with what was happening in the game being broadcast on TV. David Luiz's save was one of the most acclaimed plays of the entire competition. During this game the system handled 1.5 million tweets.
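The peak-volume figure quoted above comes from counting tweets in fixed 5-minute intervals. A minimal Python sketch of that kind of bucketing follows; the function `peak_window`, the choice of `origin`, and the timestamps are illustrative assumptions, not data or code from the system.

```python
from collections import Counter
from datetime import datetime, timedelta


def peak_window(timestamps, origin, window=timedelta(minutes=5)):
    """Return (start, end, volume) of the busiest fixed-size window.

    Hypothetical helper: windows are laid out back to back starting
    at `origin`. Raises ValueError if `timestamps` is empty.
    """
    # Integer index of the window each timestamp falls into.
    counts = Counter((ts - origin) // window for ts in timestamps)
    index, volume = max(counts.items(), key=lambda item: item[1])
    start = origin + index * window
    return start, start + window, volume


# Made-up timestamps; the date and origin are placeholders for the example.
origin = datetime(2013, 6, 30, 19, 34)
sample = [
    datetime(2013, 6, 30, 19, 35, 10),
    datetime(2013, 6, 30, 19, 40, 0),
    datetime(2013, 6, 30, 19, 41, 30),
    datetime(2013, 6, 30, 19, 43, 59),
    datetime(2013, 6, 30, 19, 44, 0),
]
peak = peak_window(sample, origin)
print(peak)
```

In a streaming setting the counts would be updated incrementally as tweets arrive rather than computed over a stored list.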


The results provided by the system can be further exported to different visualizations, such as word clouds and polarity plots.


Beyond soccer games

The system developed for the Confederations Cup can be applied to many other domains. From fashion businesses to manufacturers, retailers, and service providers, anyone can make better use of social network analysis for the same purpose: to know what people think and to reach them more efficiently.