Douglas R. (Doug) Burdick, Mauricio A. Hernandez, Yunyao Li, Lucian Popa

Midas - HIL

HIL is a high-level scripting language that combines, within a single declarative framework, the multiple types of operations needed in entity integration flows. These operations include: 1) mapping of the data (e.g., from the various extracted facts to a target model or any intermediate schema), 2) resolving and merging references to the same real-world entity (i.e., entity resolution or entity linking), and 3) fusion and transformation of data (including temporal analysis, that is, the ability to transform a collection of unprocessed but time-stamped facts into objects with a clearly defined timeline or history).
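To make the temporal analysis step more concrete, the following is a minimal Python sketch (not HIL syntax) of turning unordered, time-stamped facts into a per-entity timeline. The function name `build_timeline`, the tuple layout `(entity_id, date, value)`, and the sample data are all hypothetical, and the sketch assumes dates are ISO-8601 strings so that they sort correctly as text.

```python
from itertools import groupby

def build_timeline(facts):
    """Fuse (entity_id, iso_date, value) facts into per-entity intervals.

    Returns {entity_id: [(start, end, value), ...]}, where end is None
    for the interval that is still open.
    """
    timelines = {}
    # Sort by entity, then by date (ISO-8601 strings sort chronologically).
    ordered = sorted(facts, key=lambda f: (f[0], f[1]))
    for entity, group in groupby(ordered, key=lambda f: f[0]):
        intervals = []
        for _, date, value in group:
            if intervals and intervals[-1][2] == value:
                continue  # repeated observation of the same value: no change
            if intervals:
                # Close the previous interval at the date the value changed.
                start, _, previous = intervals[-1]
                intervals[-1] = (start, date, previous)
            intervals.append((date, None, value))
        timelines[entity] = intervals
    return timelines

# Hypothetical facts, deliberately out of order.
facts = [
    ("p1", "2012-06-01", "CFO"),
    ("p1", "2010-01-15", "VP"),
    ("p1", "2011-03-01", "VP"),
    ("p2", "2011-01-01", "CEO"),
]
timeline = build_timeline(facts)
```

Here the two `VP` observations for `p1` collapse into a single interval that ends when the `CFO` fact begins. In HIL this kind of fusion would be expressed declaratively rather than as an explicit loop.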

The entity resolution fragment of HIL can express both deterministic matching conditions (returning true/false) and probabilistic matching algorithms (where various matching functions return numerical scores that can then be aggregated or used for thresholding). In addition, HIL includes constructs for blocking to deal with large data sizes, constructs for expressing 1:1 or 1:N matching constraints, and more advanced constructs for expressing policies that disambiguate between conflicting links.
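The pieces named above (blocking, deterministic and score-based matching, thresholding, and a 1:1 constraint) can be illustrated with a small Python sketch. This is not HIL code and does not reflect how HIL implements these constructs. The record fields, the blocking key, the similarity function, the threshold of 0.8, and the greedy highest-score-first policy for resolving conflicting links are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(record):
    # Hypothetical blocking key: first letter of the last name.
    return record["last"][0].lower()

def deterministic_match(a, b):
    # True/false condition: both records carry the same non-empty email.
    return a.get("email") is not None and a.get("email") == b.get("email")

def score(a, b):
    # Score-based matcher: average of two string similarities in [0, 1].
    first = SequenceMatcher(None, a["first"].lower(), b["first"].lower()).ratio()
    last = SequenceMatcher(None, a["last"].lower(), b["last"].lower()).ratio()
    return (first + last) / 2

def link(left, right, threshold=0.8):
    # Blocking: only compare records that share a blocking key.
    blocks = defaultdict(list)
    for r in right:
        blocks[block_key(r)].append(r)

    candidates = []
    for l in left:
        for r in blocks.get(block_key(l), []):
            s = 1.0 if deterministic_match(l, r) else score(l, r)
            if s >= threshold:  # thresholding on the numerical score
                candidates.append((s, l["id"], r["id"]))

    # 1:1 constraint: keep the best-scoring link per record, greedily.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_left, used_right, links = set(), set(), []
    for _, lid, rid in candidates:
        if lid not in used_left and rid not in used_right:
            links.append((lid, rid))
            used_left.add(lid)
            used_right.add(rid)
    return links

# Hypothetical records: x1 and x2 both resemble y1, but only one may link.
left = [
    {"id": "x1", "first": "Ann", "last": "Lee", "email": None},
    {"id": "x2", "first": "Anne", "last": "Lee", "email": None},
]
right = [{"id": "y1", "first": "Ann", "last": "Lee", "email": None}]
links = link(left, right)
```

Both `x1` and `x2` score above the threshold against `y1`, so the 1:1 policy keeps only the higher-scoring pair. A 1:N constraint would instead drop the `used_right` check and allow several left records to link to the same right record.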

HIL lets users define only part of the types or schema of the entities being manipulated, and uses polymorphic type inference so that users are shielded from having to specify or know all the attributes of the data.

A salient design feature of HIL is that it shields its users from the lower-level details of particular runtimes: it decouples the high-level specification of the entity resolution and integration operations from the actual runtime operations. Once expressed in HIL, entity resolution and integration algorithms can be compiled into various runtimes for distributed computation, including Jaql on Map/Reduce and, more recently, Scalding (a Scala-based framework on top of Cascading) and Spark.

More complete information about the HIL language, including the language documentation and its use in Social Master Data Management, can be found here: HIL in InfoSphere Master Data Management