Jey Han Lau

contact information

Research Scientist
Melbourne Research Laboratory, Melbourne, Australia
  +61-3-9626-6490

links

Professional Associations: Association for Computational Linguistics

profile


Jey Han obtained his PhD in Computer Science from the University of Melbourne in 2013, with a thesis on LDA topic models. Before joining IBM, he was a research associate at King's College London, working on a project to develop stochastic models that represent the syntactic knowledge that all native speakers share.

 

Jey Han's general interest is in unsupervised learning, an area that develops algorithms to discover structure in language with minimal or zero supervision. He has worked on applying these algorithms to a variety of natural language problems, from discovering word meanings to detecting novel events in social media to predicting the well-formedness of natural language sentences.

 

RESEARCH INTERESTS

  • Unsupervised Learning
  • Deep Learning
  • Language Models
  • Bayesian Graphical Models
  • Cognition and Linguistic Knowledge

 

SOFTWARE

  • Predominant Sense: Program for learning predominant senses, described in Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models.
  • On-line Topic Model: A Python implementation of the on-line topic model described in On-line Trend Analysis with Topic Models: #twitter trends detection topic model online.
  • HDP WSI: A Hierarchical Dirichlet Process (HDP) model for word sense induction, described in  Word Sense Induction for Novel Sense Detection.
  • Topic Interpretability: Evaluation of semantic interpretability of topics, described in Automatic Evaluation of Topic Coherence.
  • Acceptability Prediction: Unsupervised prediction of sentence acceptability, described in Unsupervised Prediction of Acceptability Judgements.
  • Topic Coherence Sensitivity: Improved scripts to compute topic coherence, described in The Sensitivity of Topic Coherence Evaluation to Topic Cardinality.
  • Doc2Vec: Python scripts to train and infer paragraph vectors, described in An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation (a minimal usage sketch follows this list).
  • Topically Driven Language Model: A neural language model driven by topical information, described in Topically Driven Neural Language Model.
  • DeepGeo: Twitter geolocation prediction, described in End-to-end Network for Twitter Geolocation Prediction and Hashing.
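
As an illustration of the paragraph-vector workflow the Doc2Vec scripts cover, here is a minimal sketch assuming the gensim Python library; the toy corpus, hyperparameter values, and variable names are illustrative placeholders, not the released scripts themselves.

    # Minimal, illustrative doc2vec sketch using gensim (assumed available);
    # the toy corpus and hyperparameters below are placeholders, not the released scripts.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each training document is wrapped as a TaggedDocument with a unique tag.
    corpus = [
        TaggedDocument(words=["unsupervised", "learning", "of", "document", "vectors"], tags=["doc0"]),
        TaggedDocument(words=["detecting", "novel", "events", "in", "social", "media"], tags=["doc1"]),
    ]

    # Train a small dbow-style model (dm=0); realistic use needs a far larger corpus.
    model = Doc2Vec(corpus, dm=0, vector_size=50, window=5, min_count=1, epochs=40)

    # Infer a paragraph vector for an unseen document.
    vector = model.infer_vector(["predicting", "sentence", "acceptability"])
    print(vector[:5])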

 

STUDENTS

Masters

  • Steven Xu (MSc student; completed 2017; co-supervised with Timothy Baldwin at The University of Melbourne)
  • Shraey Bhatia (MSc student; completed 2017; co-supervised with Timothy Baldwin at The University of Melbourne)
  • Andrew Bennett (MSc student; completed 2016; co-supervised with Timothy Baldwin at The University of Melbourne)

PhD

  • Adel Foda (PhD student; current; co-supervised with Timothy Baldwin at The University of Melbourne)
  • Shraey Bhatia (PhD student; current; co-supervised with Timothy Baldwin at The University of Melbourne)