Smarter Natural Resources - Natural Resources Data-driven Analytics


Data-driven decision making has become standard practice in many areas. Data mining, which extracts useful information from vast amounts of data, has benefited e-commerce, internet search, and internet marketing. Machine learning algorithms, which help predict unknown outcomes from limited information, have become fundamental to recommendation systems, natural language processing, search engines, and computer vision systems, among many others.

Natural resources companies generate massive amounts of data in many forms - primarily as byproducts of daily business activities (data about extraction of known resources) or byproducts of new resource exploration. Additional data from technical literature and Internet sources can be turned into insights to help decision makers and companies improve natural resources discovery and exploration.

The current availability of huge computational power, together with effective and efficient data mining and machine learning technologies, makes it possible to build systems that can explore the large volume of data generated by natural resources companies. However, building such intelligent systems for the natural resources area poses many technical challenges. For example, one challenge is to construct algorithms that can deal with diverse data types: structured, unstructured, geospatial, temporal, image, numerical and textual data. Another challenge is to construct scalable machine learning algorithms that can learn from very large datasets, as well as to develop algorithms that can learn with limited supervision.

The main research interests of the Natural Resources Data-driven Analytics focus area are:

  • To apply existing data mining and machine learning techniques to structured and unstructured complex data from natural resources industries
  • To create novel machine learning algorithms that can work with diverse data types: structured, unstructured, geospatial, temporal, image, numerical and textual data
  • To develop scalable machine learning algorithms that can work with very large amounts of data from natural resources industries
  • To develop algorithms that can learn from a small set of examples, as well as from data with missing values

Impact and Benefits

Many of the primary natural resources business processes can be impacted by converting collected data into useful information. Some of the main benefits for exploration and extraction processes are:

  • Exploration: data-driven approaches can help predict the quantity and quality of the resource at a new location. This information is extremely useful in justifying the high infrastructure costs of committing to a particular resource location
  • Production or Exploitation: using data-driven analytics, one can use historical data to better decide which technologies or strategies are most suitable for a particular resource location

Exemplary Projects:

  • Anomaly detection in sensor networks used in agriculture, mining and oil & gas areas
  • Condition-based maintenance applied to equipment in oil & gas, mining, and agriculture
  • Expanding the Deep QA technology (Watson Computer) with data from the oil & gas field (e.g., tech reports, papers, books) to build a system that can help reservoir engineers make better decisions
  • In agriculture, given information about the weather, type of land, futures contracts, etc., one could use machine learning algorithms to predict the best crop for a particular region
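
As an illustration of the first project type above, anomaly detection on a single sensor stream can be sketched with a robust outlier rule based on the median absolute deviation (MAD). This is a minimal sketch under stated assumptions and not the method used in these projects: the function name `mad_anomalies`, the threshold of 3.5, and the sample readings are all hypothetical.

```python
from statistics import median


def mad_anomalies(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only. The modified z-score uses the
    median and the median absolute deviation (MAD), so a few extreme values
    do not distort the baseline the way a mean and standard deviation would.
    """
    med = median(readings)
    abs_dev = [abs(x - med) for x in readings]
    mad = median(abs_dev)
    if mad == 0:
        # Simplifying assumption: with no measurable spread, flag nothing.
        return []
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    # when the data are roughly normally distributed.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical temperature readings from one sensor; index 4 is a spike.
readings = [20.1, 20.3, 19.9, 20.0, 35.7, 20.2, 20.1]
flagged = mad_anomalies(readings)
print(flagged)
```

A real sensor network would likely need a sliding window, handling of missing readings, and correlation across neighboring sensors, none of which this sketch attempts.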

Selected Publications:

Ana Cristina Bicharra Garcia, Cristiana Bentes, Rafael Heitor C. de Melo, Bianca Zadrozny, Thadeu J. P. Penna. Sensor data analysis for equipment monitoring. Knowledge and Information Systems. 28(2): 333-364 (2011)

Rafael B. Pereira, Alexandre Plastino, Bianca Zadrozny, Luiz Henrique de Campos Merschmann, Alex Alves Freitas. Lazy attribute selection: Choosing attributes at classification time. Intelligent Data Analysis 15(5): 715-732 (2011)

Rafael B. Pereira, Alexandre Plastino, Bianca Zadrozny, Luiz Henrique de Campos Merschmann, Alex Alves Freitas. Improving Lazy Attribute Selection. JIDM 2(3): 447-462 (2011)

Alina Beygelzimer, John Langford, Bianca Zadrozny. Tutorial summary: Reductions in machine learning. ICML 2009: 172

Bianca Zadrozny, Gisele L. Pappa, Wagner Meira Jr., Marcos André Gonçalves, Leonardo C. da Rocha, Thiago Salles. Exploiting contexts to deal with uncertainty in classification. KDD Workshop on Knowledge Discovery from Uncertain Data 2009: 19-22

Claudia Perlich, Saharon Rosset, Bianca Zadrozny. Modeling Quantiles. Encyclopedia of Data Warehousing and Mining 2009: 1324-1329

Claudia Perlich, Saharon Rosset, Richard D. Lawrence, Bianca Zadrozny. High-quantile modeling for customer wallet estimation and other applications. KDD 2007: 977-985

Saharon Rosset, Claudia Perlich, Bianca Zadrozny. Ranking-based evaluation of regression models. Knowledge and Information Systems. 12(3): 331-353 (2007)

Naoki Abe, Bianca Zadrozny, John Langford. Outlier detection by active learning. KDD 2006: 504-509

Bianca Zadrozny. Learning and evaluating classifiers under sample selection bias. ICML 2004

Cícero N. dos Santos and Ruy L. Milidiú. Entropy Guided Transformation Learning: Algorithms and Applications. 1. ed. London: Springer, 2012. v.1. 78p.

Eraldo R. Fernandes, Cícero N. dos Santos and Ruy L. Milidiú. Latent Structure Perceptron with Feature Induction for Unrestricted Coreference Resolution. Proceedings of the Sixteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, South Korea, 2012

Cícero N. dos Santos and Davi Carvalho. Rule and Tree Ensembles for Unrestricted Coreference Resolution. Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task. Portland, Oregon, p. 51-55, 2011.

Alex L. Ramos and Cícero N. dos Santos. Computer network intrusion detection using committees of classification algorithms. Proceedings of SBSeg'2011, Brasília, Brazil, 2011. (in Portuguese)

Cícero N. dos Santos, Ruy L. Milidiú, Carlos E. M. Crestana and Eraldo R. Fernandes. ETL Ensembles for Chunking, NER and SRL. Proceedings of CICLing 2010, Iasi, Romania, p. 100-112, 2010.

Ruy L. Milidiú, Leandro Alvim and Cícero N. dos Santos. Daily Volume Forecasting Using High-Frequency Predictors. Proceedings of IASTED'2010, Innsbruck, Austria, 2010.

Ruy L. Milidiú, Cícero N. dos Santos and Julio C. Duarte. Portuguese Corpus-Based Learning using ETL. Journal of the Brazilian Computer Society (ISSN 0104-6500), Number 4, Vol. 14, pp. 17–27, December 2008.

Ruy L. Milidiú, Cícero N. dos Santos and Julio C. Duarte. Phrase Chunking using Entropy Guided Transformation Learning. Proceedings of ACL 2008, Columbus, Ohio, USA, 2008.

Cícero N. dos Santos, Ruy L. Milidiú and Raúl P. Renteria. Portuguese Part-of-Speech Tagging using Entropy Guided Transformation Learning. Proceedings of PROPOR 2008, Aveiro, Portugal, 2008.

T. Lappas, M. Vieira, D. Gunopulos, V. Tsotras. On The Spatiotemporal Burstiness of Terms. In the Proc. of the VLDB Endowment (PVLDB), 2012

M. Vieira, H. Razente, M. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina Jr, V. Tsotras. DivDB: A System for Diversifying Query Results. In the 37th Int'l Conf. on Very Large Data Bases (VLDB) [Demo], 2011

M. Vieira, H. Razente, M. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina Jr, V. Tsotras. On Query Result Diversification. In the 27th IEEE Int'l Conf. on Data Engineering (ICDE), 2011

M. Vieira, E. Frias-Martinez, P. Bakalov, V. Frias-Martinez, V. Tsotras. Querying Spatio-Temporal Patterns in Mobile Phone-Call Databases. In the 11th IEEE Int'l Conf. on Mobile Data Management (MDM), 2010

M. Vieira, E. Frias-Martinez, N. Oliver, V. Frias-Martinez. Characterizing Dense Urban Areas from Mobile Phone-Call Data: Discovery and Social Dynamics. In the 2nd IEEE Int'l Conf. on Social Computing (SocialCom), 2010

M. Vieira, P. Bakalov, V. Tsotras. Querying Trajectories Using Flexible Patterns. In the 13th Int'l Conf. on Extending Database Technology (EDBT), 2010

M. Vieira, P. Bakalov, V. Tsotras. On-Line Discovery of Flock Patterns in Spatio-Temporal Data. In the 17th ACM Int'l Symp. on Advances in Geographic Information Systems (SIGSPATIAL), 2009