On efficient Viterbi decoding for hidden semi-Markov models
Ritendra Datta, Jianying Hu, et al.
ICPR 2008
In this paper, we present strategies for incorporating long-context information directly into first-pass decoding, as well as into second-pass lattice re-scoring, in speech recognition systems. Long-span language models that capture complex syntactic and/or semantic information are seldom used in the first pass of large-vocabulary continuous speech recognition systems because of the prohibitive growth in the size of the sentence-hypothesis search space. Typically, n-gram language models are used in the first pass to produce N-best lists, which are then re-scored with long-span models. Such a pipeline produces biased first-pass output, resulting in suboptimal re-scoring performance. We show that computationally tractable variational approximations of long-span, complex language models are a better choice than the standard n-gram model, both for first-pass decoding and for lattice re-scoring.
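As a minimal sketch of the two-pass pipeline the abstract describes (first-pass decoding with an n-gram model, followed by N-best re-scoring with a long-span model), the following Python fragment re-ranks an N-best list under a long-span score. All names here (Hypothesis, longspan_logprob, lm_weight) are illustrative assumptions, not identifiers from the paper.

```python
# Sketch of N-best re-scoring with a long-span LM, assuming per-hypothesis
# acoustic and first-pass n-gram log-probabilities are available.
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    acoustic_logprob: float  # from the first-pass decoder
    ngram_logprob: float     # first-pass n-gram LM score


def longspan_logprob(words: list[str]) -> float:
    """Placeholder for a long-span (e.g., syntactic or recurrent) LM score."""
    raise NotImplementedError


def rescore_nbest(nbest: list[Hypothesis], lm_weight: float = 0.7) -> Hypothesis:
    """Swap in the long-span LM score and return the best re-ranked hypothesis.

    Because the N-best list was generated under the n-gram model, hypotheses
    the long-span model would prefer may be missing from the list entirely;
    this is the first-pass bias the abstract refers to, and the motivation
    for using a tractable approximation of the long-span model in the first
    pass instead.
    """
    def combined(h: Hypothesis) -> float:
        return h.acoustic_logprob + lm_weight * longspan_logprob(h.words)

    return max(nbest, key=combined)
```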