Geotagging     


Geotagging - overview


What is the Geotagger?

The Geotagger uses machine learning to assign a geographical location to a piece of text. The service takes text as input and outputs the most probable location from a discrete set of predefined locations, such as metropolitan cities or countries. This makes it possible to partition data by location, which in turn makes location-based applications feasible (e.g., local event detection, regional sentiment analysis) and avoids processing large volumes of irrelevant data.


For instance, "i'm heading to the MCG this arvo to watch the cricket can't wait to see the aussie middle order collapse... *sigh*" is assigned to Melbourne, Victoria, Australia. The text does not contain gazetted terms such as "Melbourne", but the Geotagger is able to geolocate it on the basis of location-indicative words such as "aussie", "MCG" and "arvo".


These location-indicative words are learnt automatically from a large collection of Twitter data. The tagger is tuned for social media data, but it is also applicable to other general text domains.
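
To make the idea of location-indicative words concrete, here is a minimal sketch of a text classifier that picks the most probable location from a small fixed set. It is not the Geotagger's implementation: the multinomial naive Bayes model, the six made-up training sentences, the label set, and the function names (tokenize, train, predict) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: (text, location label). The real service
# learns from a large Twitter collection; these sentences are invented.
TRAINING = [
    ("watching the footy at the mcg this arvo", "Melbourne, Victoria, Australia"),
    ("aussie cricket at the mcg", "Melbourne, Victoria, Australia"),
    ("stuck on the tube near camden again", "London, England, United Kingdom"),
    ("proper rainy day in london innit", "London, England, United Kingdom"),
    ("bagels in brooklyn then the subway to midtown", "New York, New York, United States"),
    ("yankees game tonight in the bronx", "New York, New York, United States"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per location, documents per location, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, location in examples:
        tokens = tokenize(text)
        word_counts[location].update(tokens)
        doc_counts[location] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the location with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_location, best_score = None, -math.inf
    for location in doc_counts:
        score = math.log(doc_counts[location] / total_docs)
        total_words = sum(word_counts[location].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no signal
            # Add-one smoothing so unseen (word, location) pairs are not zero.
            count = word_counts[location][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_location, best_score = location, score
    return best_location


model = train(TRAINING)
print(predict("i'm heading to the MCG this arvo to watch the cricket", model))
```

With this toy data the example text is assigned to Melbourne because "mcg", "arvo" and "cricket" only occur in the Melbourne training sentences, even though the word "Melbourne" never appears.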

Accuracy statistics:

Test case                                                            City-level accuracy   Country-level accuracy
A single English tweet                                               21.9%                 68.2%
A given Twitter user (200 recent English tweets in the timeline)     40.6%                 90.1%
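
The two accuracy columns can be read as the same predictions scored at two granularities. The sketch below assumes that accuracy means the fraction of test cases whose predicted label matches the gold label, and that labels are strings ending in the country name (as in "Melbourne, Victoria, Australia"). The helper names country_of and accuracy and the four sample predictions are hypothetical and are not taken from the Geotagger's evaluation.

```python
def country_of(label):
    """Return the country part of a 'City, Region, Country' label."""
    return label.rsplit(", ", 1)[-1]


def accuracy(predicted, gold):
    """Return (city-level accuracy, country-level accuracy) as fractions."""
    pairs = list(zip(predicted, gold))
    city_hits = sum(p == g for p, g in pairs)
    country_hits = sum(country_of(p) == country_of(g) for p, g in pairs)
    return city_hits / len(pairs), country_hits / len(pairs)


# Invented gold labels and predictions, for illustration only.
gold = [
    "Melbourne, Victoria, Australia",
    "Sydney, New South Wales, Australia",
    "London, England, United Kingdom",
    "New York, New York, United States",
]
predicted = [
    "Melbourne, Victoria, Australia",   # right city, right country
    "Melbourne, Victoria, Australia",   # wrong city, right country
    "London, England, United Kingdom",  # right city, right country
    "London, England, United Kingdom",  # wrong city, wrong country
]

city_acc, country_acc = accuracy(predicted, gold)
print(f"city-level: {city_acc:.2f}, country-level: {country_acc:.2f}")
```

A prediction that names the wrong city in the right country still counts at country level, which is one reason country-level accuracy is never lower than city-level accuracy under this definition.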

 

Published papers:

Text-Based Twitter User Geolocation Prediction, Journal of Artificial Intelligence Research, 2014.
Identifying Diseases, Drugs and Symptoms in Twitter, MedInfo, 2015.
Investigating Public Health Surveillance using Twitter, BioNLP, 2015.

One filed patent:

Dynamic Modeling of Geospatial Words in Social Media, Bo Han, Christopher Butler, Jennifer Lai, 2015.

Projects using the Geotagger:

The Geotagger service is used in Project SMART, an IBM Research social media analytics project, to geolocate tweets for which the user has not enabled geolocation. It has also been used in the Australian Crisis Tracker project and the MedTweets project from IBM Research Australia.

Press/links to projects using the Geotagger:

SMART
Australian Crisis Tracker