Cloud Geospatial     

Raghu K. Ganti

Cloud Geospatial - overview


The IBM Cloud Geospatial team has been developing geospatial and spatiotemporal analytics for over a decade. Our approach is developer-centric: we created a library that enables non-GIS experts to work with geospatial data seamlessly. The key features of this library are:

1. Solves a 40+ year old problem in geospatial analytics:
Traditional approaches to geospatial analytics rely on a planar spatial engine. Since it is impossible to map, say, both New York and London onto the same plane without incurring significant distortions, the world has been divided into about 8,848 piecewise planar regions. An operation called geospatial projection converts latitude/longitude on an ellipsoidal manifold to x/y coordinates on a plane.
 
An obvious issue with the prior approach is that each of these planes has its own (0, 0) origin, so one cannot meaningfully apply a spatial operation (e.g., Euclidean distance) between (x1, y1) and (x2, y2) if they belong to two different planes. One can argue that it is uncommon to apply an operation across geometries in New York and London, but the problem remains even within the same geography. It is well known that no planar projection can simultaneously preserve even two of the following four properties: topology (contains, intersects), angles, distance, area. For instance, if one used a topology-preserving planar projection, then the distortion in distance calculation is roughly the inverse of the cosine of the latitude - in New York, cos(40°) is approximately 0.77 - which results in a 1/0.77 stretch, or about 30% distortion in distances!
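
The cosine argument above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes a spherical Earth and a plane in which longitude and latitude in degrees are used directly as x and y, and the function name `east_west_stretch` is hypothetical, not part of the library.

```python
import math


def east_west_stretch(lat_deg):
    """Factor by which an unscaled lat/lon plane overstates east-west distance.

    On a sphere, one degree of longitude spans cos(latitude) times the
    distance it spans at the equator, so a plane that ignores this
    overstates east-west distances by 1 / cos(latitude).
    """
    return 1.0 / math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    for lat in (0.0, 40.0, 60.0):
        print(f"latitude {lat:4.1f} deg: stretch factor {east_west_stretch(lat):.3f}")
```

At 40 degrees the factor comes out near 1.3, in line with the rough figure quoted above, and it doubles by 60 degrees.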
 
Our approach completely eliminates projections! We perform all operations on the ellipsoidal/spherical Earth - all geometries (such as points, lines, polygons) and geospatial operations are supported on the true manifold using accurate math constructs such as great arcs and oriented geometries. Hence it becomes easy to support geospatial analytics on platforms such as Watson Studio without going through the projections gatekeeper or simply punting the problem of projections to the developer. This makes geospatial analytics more readily available to citizen data scientists, and it avoids the additional costs arising from calls to a web server (e.g., a GIS server or a database spatial blade).
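
As a minimal illustration of working directly on the sphere, the sketch below computes a great-arc distance from latitude/longitude with the standard haversine formula. It is not the library's implementation: it assumes a perfectly spherical Earth with a mean radius of 6371.0088 km, and the names `great_arc_km` and `EARTH_RADIUS_KM` are hypothetical.

```python
import math

# Assumption: spherical Earth with the commonly used mean radius.
EARTH_RADIUS_KM = 6371.0088


def great_arc_km(lat1, lon1, lat2, lon2):
    """Great-arc (haversine) distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    new_york = (40.7128, -74.0060)
    london = (51.5074, -0.1278)
    print(f"New York to London: {great_arc_km(*new_york, *london):.0f} km")
```

No projection or planar region is chosen anywhere in this calculation, so New York and London can be compared directly. An ellipsoidal model would refine the result slightly.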
 
2. New spatial indexing algorithms that have replaced the traditional R*-tree algorithm:
The R-tree, introduced in a 1984 paper, and its later R*-tree variant were the de facto standard for geospatial indexing for decades. The emergence of big data platforms (key-value stores, Apache HBase, object stores) and cloud platforms (label-based routing in Kubernetes, IBM Streams) required new hash functions on geospatial data that can convert a geometry into label(s). This made it nearly impossible to port older spatial indexes (such as the R*-tree or a grid index) to big data / cloud platforms.
 
Our approach introduced elastic space-filling curves, which provide: (i) performance that is provably no slower than 2x that of an R*-tree, with an average case very close to 1x; (ii) a hash function that translates geometry into labels and thus enables geospatial operations on big data / cloud platforms; (iii) a telescopic (elastic) hash that naturally handles hot spots (think of New York City versus upstate New York); (iv) a hash function implementation using bit arithmetic, making it very fast in software and well suited to hardware implementations (GPU/FPGA). The ease of integration is best illustrated by an anecdote: when column stores were introduced, it was far easier to integrate our indexing approach than to port the existing solution from the database row store. Our approach has also produced significant performance improvements over the prior spatial indexing approach (sample workloads from large insurance companies that used to take minutes now take just seconds). This approach has become the new de facto standard for spatial indexing on big data and cloud platforms in several offerings.
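
The elastic space-filling curve itself is not specified in this overview. As a stand-in, the sketch below uses a plain Z-order (Morton) curve to show the general idea of hashing a point into an integer label with bit arithmetic, and of dropping low-order bits to obtain the label of a coarser enclosing cell. All names (`quantize`, `interleave`, `morton_label`, `coarsen`) and the 16-bit default resolution are assumptions made for illustration.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    index = int((value - lo) / (hi - lo) * cells)
    return min(cells - 1, max(0, index))


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_label(lat, lon, bits=16):
    """Hash a lat/lon point (degrees) into a Z-order label of 2 * bits bits."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


def coarsen(code, levels):
    """Label of the enclosing cell that is `levels` levels coarser."""
    return code >> (2 * levels)
```

Because a coarser cell is just a prefix of the finer label, sparse regions can be labeled with short codes and dense regions with long ones. That is one simple way to picture a variable-resolution hash, though the elastic scheme described above may differ in its details.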
 
3. Spatial ML:
New geometry types (latitude segments) were introduced to make the output of the hash function (in 2) a first-class geometry on which all geospatial operations (in 1) could be applied. Now, the output of the hash function on a geometry is also a geometry! This essentially provided a uniform approach to both spatial operations and machine learning. Trajectories could now be hashed into a sequence: when required, they could be treated as symbols (e.g., one can apply string edit distance algorithms and perform trajectory clustering), and when required, they could be treated as geometries (e.g., for spatial auto-regression). This opened up the space for spatial machine learning, allowing a suite of pattern mining techniques (such as FP-Growth, PrefixSpan, frequent subsequence mining) to be used in conjunction with statistical techniques (such as Hidden Markov Models and mapping of geospatial data into tensors for deep learning).
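
To make the "trajectories as symbols" idea concrete, the sketch below turns a trajectory into a sequence of cell labels and compares two such sequences with the classic Levenshtein edit distance. The labeling here is a deliberately simple fixed grid, used only as a placeholder for the hash function described above. The names `cell_label`, `to_symbols`, and `edit_distance` and the 0.01 degree cell size are assumptions.

```python
import math


def cell_label(lat, lon, cell_deg=0.01):
    """Placeholder hash: index of the fixed grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def to_symbols(trajectory, cell_deg=0.01):
    """Convert (lat, lon) points to cell labels, collapsing consecutive repeats."""
    symbols = []
    for lat, lon in trajectory:
        label = cell_label(lat, lon, cell_deg)
        if not symbols or symbols[-1] != label:
            symbols.append(label)
    return symbols


def edit_distance(a, b):
    """Levenshtein distance between two sequences of hashable symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Pairwise distances computed this way could feed a standard clustering routine, while the same labels, viewed as cells on the Earth, remain usable as geometries.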
 
In addition, the library supports machine learning on both raster and vector data. On raster data, the capabilities include building rooftop identification, damage assessment from wildfires, and detection of water bodies and vegetation. The library also supports 2.5D raster data analysis (sometimes also referred to as RGB-D: Red, Green, Blue with Depth information) arising from infrared and lidar backscatter for size and angle estimation.