HMM-BiMM: Hidden Markov Model-based word segmentation via improved Bi-directional Maximal Matching algorithm (SCI)
A development team with both strength and technical expertise.
Project overview: HMM-BiMM is a new word segmentation algorithm that combines the Hidden Markov Model (HMM) with the Bi-directional Maximal Matching (BiMM) algorithm. By matching against sub-dictionaries, it segments text quickly. The text is first segmented by BiMM; the leftover text, formed by joining the remaining single-character words, is then segmented again with the HMM. The HMM pass also lets the algorithm update its dictionary dynamically with the newly segmented words, which in turn improves segmentation accuracy. In a comparison with five representative algorithms on real-world clinical (symptom) text, HMM-BiMM achieves the highest efficiency and accuracy for symptom text segmentation. Specifically, relative to BiMM it improves precision by about 3% and running time by about 70%.
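The first stage described above can be sketched in Python as follows. This is a minimal illustration of plain bi-directional maximal matching plus the collection of leftover single-character runs, not the project's implementation: the function names (`forward_mm`, `backward_mm`, `bimm`, `leftover_runs`), the tie-breaking rule (fewer tokens, then fewer single characters, then the backward result), the toy dictionaries, and the Chinese sample strings are all assumptions made for the example. The sub-dictionary lookup, the HMM pass itself, and the dynamic dictionary update are omitted.

```python
def forward_mm(text, dictionary, max_len):
    """Forward maximal matching: scan left to right, longest match first."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_mm(text, dictionary, max_len):
    """Backward maximal matching: scan right to left, longest match first."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bimm(text, dictionary):
    """Run both directions and pick one result (assumed heuristic):
    fewer tokens wins, then fewer single characters, then backward."""
    max_len = max(map(len, dictionary), default=1)
    fwd = forward_mm(text, dictionary, max_len)
    bwd = backward_mm(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(tokens):
        return sum(len(t) == 1 for t in tokens)

    return fwd if singles(fwd) < singles(bwd) else bwd


def leftover_runs(tokens):
    """Join adjacent single-character tokens into runs; in HMM-BiMM these
    runs would be handed to the HMM for a second segmentation pass."""
    runs, current = [], ""
    for tok in tokens:
        if len(tok) == 1:
            current += tok
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


# Toy dictionaries, hypothetical and for illustration only.
general_dict = {"研究", "研究生", "生命", "起源"}
symptom_dict = {"头痛", "发热"}

ambiguous = bimm("研究生命的起源", general_dict)
symptom_tokens = bimm("头痛伴恶心发热", symptom_dict)
symptom_runs = leftover_runs(symptom_tokens)
```

In the first sample the forward and backward passes disagree, and the tie-break selects the backward result because it contains fewer single characters. In the second sample the out-of-dictionary characters between the two known symptoms come back as one run, which is the kind of span the HMM stage would re-segment and could add to the dictionary.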