My research addresses a variety of data quality, information extraction (NLP), and ETL (for data warehousing) problems, with the goal of answering keyword, natural language, or SQL/SPARQL queries over high-quality (trustworthy) structured data. Please find my publications at DBLP. (For Brazilians, here's my CNPq Lattes: http://lattes.cnpq.br/3537386106760841.)
In my Ph.D. at LNCC, Brazil, supervised by Prof. Fabio Porto, I developed a technique, named Y-DB, to extract synthetic scientific datasets from competing mathematical models (seen as alternative hypotheses, and given in MathML), and then generate a probabilistic relational database whose structure is defined automatically with correctness guarantees. The "magic" here is to unveil the implicit structure in a mathematical model and translate it automatically into a probabilistic relational model (so-called U-relations). As part of that research, I fixed the status of a classical AI algorithm on causal reasoning proposed by Nobel laureate Herbert Simon. In my postdoc at the University of Michigan, Ann Arbor, supervised by Prof. H. V. Jagadish, I developed a Bayesian smoothing algorithm, Bsmooth, for the disambiguation of search and natural language queries issued against a relational database, building on information available from the database schema and a user-interaction log.
Currently, in addition to my earlier work on the management of large-scale hypotheses and models, an important part of my research is aligned with the broad field of automatic construction of knowledge bases (so-called AKBC), which relies on natural language processing and generally available information sources like Wikidata and WordNet. My focus is on data quality, specifically new abstractions of integrity constraints and their scalable checking. This is particularly important, as KBs are increasingly used to answer web search queries in the public domain. For me, keeping KBs under a rigorous, scientific notion of integrity is key to explaining query results in a data-driven world, and is therefore also an important topic in AI ethics.
At IBM Research Brazil, I am also working on "cognitive computing" applications, mostly serving the natural resources industry (agriculture, oil and gas). I am part of both the data-driven analytics research group led by Bianca Zadrozny and the software engineering research group led by Renato Cerqueira.
I earned my Ph.D. in Computational Modeling (with a focus on Data Science) in January 2015, as mentioned, from the National Laboratory for Scientific Computing (LNCC) in Brazil. Recently, my thesis was nominated for the 2016 nationwide edition of the 'Prêmio CAPES de Teses'. During my Ph.D., I was awarded an IBM PhD Fellowship (2013-14) and a FAPERJ 'Bolsa Nota 10' PhD Distinguished Scholarship (2013-15). I also hold an M.Sc. and a B.Sc. in Computer Science from the Federal University of Espirito Santo (UFES) in Brazil.