############################################################################# ## Copyright (c) 1996, Carnegie Mellon University, Cambridge University, ## Ronald Rosenfeld and Philip Clarkson ############################################################################# ============================================================================= =============== This file was produced by the CMU-Cambridge =============== =============== Statistical Language Modeling Toolkit =============== ============================================================================= This is a 1-gram language model, based on a vocabulary of 28 words, which begins "", "a", "b"... This is an OPEN-vocabulary model (type 1) (OOVs were mapped to UNK, which is treated as any other vocabulary word) Good-Turing discounting was applied. 1-gram frequency of frequency : 4 1-gram discounting ratios : 0.67 This file is in the ARPA-standard format introduced by Doug Paul. p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3) else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(wd3|wd2) else p(wd3|w2) p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2) else bo_wt_1(wd1)*p_1(wd2) All probs and back-off weights (bo_wt) are given in log10 form. Data formats: Beginning of data mark: \data\ ngram 1=nr # number of 1-grams \1-grams: p_1 wd_1 end of data mark: \end\ \data\ ngram 1=29 \1-grams: -3.5087 0.0000 -98.9999 0.0000 -1.1422 a 0.0000 -1.4032 b 0.0000 -1.6336 c 0.0000 -1.4937 d 0.0000 -1.0272 e 0.0000 -1.9176 f 0.0000 -1.4518 g 0.0000 -1.2873 h 0.0000 -1.1834 i 0.0000 -2.1864 j 0.0000 -1.8011 k 0.0000 -1.2951 l 0.0000 -1.5922 m 0.0000 -1.2186 n 0.0000 -1.2288 o 0.0000 -1.4461 p 0.0000 -3.5087 q 0.0000 -1.0538 r 0.0000 -1.2288 s 0.0000 -1.0996 t 0.0000 -1.4349 u 0.0000 -1.8141 v 0.0000 -1.6891 w 0.0000 -3.5087 x 0.0000 -1.7885 y 0.0000 -2.4295 z 0.0000 -3.5087 0.0000 \end\