With the emergence and proliferation of the Internet of Things (IoT), more and more sensor devices are being deployed to instrument our planet, giving enterprises, cities and individuals a better sense of what is happening in the physical world. This creates the challenge of efficiently managing the massive volume of data that sensors generate. Compared with traditional applications such as business transactions and web applications, IoT applications place a different workload on the data management system. Traditional applications read data more often than they write it, and their data types can be diverse and complicated, so existing data management systems, e.g. relational databases, have been optimized for queries and for complex data types. In IoT applications, by contrast, the majority of data is sensor-generated time-series data, and a huge number of sensors may pump data continuously into the data management system. In addition, many IoT applications require random access to data from the past several years; a typical example is looking up meter readings from the same season in past years in order to predict the power usage of a home. This requires the data management system to keep several years of data online, instead of archiving most of it in a warehouse and keeping only a limited amount online. Different design principles should therefore be adopted when designing a data management system for IoT.
The Real-time Operational Database (RODB) project aims to develop a new type of database to support IoT applications, with the following features:
- A single SQL interface for accessing both time-series data and relational data, including joins across the two types of data
- 100X higher write throughput than a relational database on a single server when handling time-series data
- Efficient storage management with a 10~100X compression ratio
- A virtual table view that makes data distribution transparent to the user, with auto scaling
- In-database analytics on time-series data to simplify development and improve the performance of analytical applications
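To make the first feature concrete, the following is a minimal sketch of what a single SQL query joining time-series readings with relational metadata can look like. It is not RODB code: SQLite (through Python's standard sqlite3 module) is used only as a stand-in engine so that the query runs, and the table names, column names and sample values (meters, readings, kwh) are hypothetical. The query follows the meter-reading scenario described earlier, averaging July readings per household and year.

```python
import sqlite3

# Stand-in engine: SQLite is used only so the query is runnable.
# The schema and data below are hypothetical, not taken from RODB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meters (meter_id INTEGER PRIMARY KEY, household TEXT, city TEXT);
CREATE TABLE readings (meter_id INTEGER, ts TEXT, kwh REAL);
""")
conn.executemany(
    "INSERT INTO meters VALUES (?, ?, ?)",
    [(1, "Home A", "City X"), (2, "Home B", "City Y")],
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2013-07-01T00:00", 1.25),
        (1, "2013-07-01T01:00", 0.75),
        (1, "2014-07-01T00:00", 1.5),
        (2, "2014-07-01T00:00", 0.5),
        (1, "2014-12-01T00:00", 2.5),  # winter reading, filtered out below
    ],
)


def july_usage(connection):
    """Average July kWh per household and year, joining both kinds of table."""
    query = """
        SELECT m.household,
               strftime('%Y', r.ts) AS year,
               AVG(r.kwh)           AS avg_kwh
        FROM readings AS r
        JOIN meters   AS m ON m.meter_id = r.meter_id
        WHERE strftime('%m', r.ts) = '07'
        GROUP BY m.household, year
        ORDER BY m.household, year
    """
    return connection.execute(query).fetchall()


for row in july_usage(conn):
    print(row)
```

The point of the sketch is only the shape of the query: time-series rows and relational rows are addressed through the same SQL statement, and the same-season filter reaches back across several years of online data.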
The key technologies that enable the features above include:
- Novel data structures and indexes for time-series data
- Data compression
- Query optimization
- In-database pattern matching

RODB has been applied to multiple IoT application scenarios, such as demand response in smart grids, gas well condition monitoring, bridge monitoring, and seismic data collection. The picture below shows the user interface of the gas well condition monitoring system built on RODB, together with its performance test result: a write throughput of 6 million data points per second.
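As an illustration of why data compression is listed among the key technologies, the sketch below applies a generic textbook technique, delta encoding followed by run-length encoding, to regularly sampled timestamps. This is an assumption chosen for illustration; the text does not say which compression scheme RODB actually uses, and the function names here are hypothetical.

```python
from itertools import groupby


def delta_rle_encode(values):
    """Encode integers as (first value, [(delta, run_length), ...])."""
    if not values:
        return None, []
    # Differences between consecutive samples.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Collapse consecutive identical deltas into (delta, count) pairs.
    runs = [(delta, sum(1 for _ in group)) for delta, group in groupby(deltas)]
    return values[0], runs


def delta_rle_decode(first, runs):
    """Rebuild the original integer sequence from the encoded form."""
    if first is None:
        return []
    out = [first]
    for delta, count in runs:
        for _ in range(count):
            out.append(out[-1] + delta)
    return out


# Hypothetical sensor that reports once every 10 seconds, 1000 times.
timestamps = list(range(1_400_000_000, 1_400_000_000 + 10 * 1000, 10))
first, runs = delta_rle_encode(timestamps)
print(first, runs)
assert delta_rle_decode(first, runs) == timestamps
```

Because a sensor that reports at a fixed interval produces a constant delta, the whole timestamp column here reduces to a start value and a single (delta, run length) pair. Real sensor values are less regular than this, so the sketch shows the idea rather than any particular compression ratio.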