72 lines
2 KiB
Text
72 lines
2 KiB
Text
|
|
#############################################################################
|
||
|
|
## Copyright (c) 1996, Carnegie Mellon University, Cambridge University,
|
||
|
|
## Ronald Rosenfeld and Philip Clarkson
|
||
|
|
#############################################################################
|
||
|
|
=============================================================================
|
||
|
|
=============== This file was produced by the CMU-Cambridge ===============
|
||
|
|
=============== Statistical Language Modeling Toolkit ===============
|
||
|
|
=============================================================================
|
||
|
|
This is a 1-gram language model, based on a vocabulary of 28 words,
|
||
|
|
which begins "<s>", "a", "b"...
|
||
|
|
This is an OPEN-vocabulary model (type 1)
|
||
|
|
(OOVs were mapped to UNK, which is treated as any other vocabulary word)
|
||
|
|
Good-Turing discounting was applied.
|
||
|
|
1-gram frequency of frequency : 4
|
||
|
|
1-gram discounting ratios : 0.67
|
||
|
|
This file is in the ARPA-standard format introduced by Doug Paul.
|
||
|
|
|
||
|
|
p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3)
|
||
|
|
else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(wd3|wd2)
|
||
|
|
else p(wd3|w2)
|
||
|
|
|
||
|
|
p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2)
|
||
|
|
else bo_wt_1(wd1)*p_1(wd2)
|
||
|
|
|
||
|
|
All probs and back-off weights (bo_wt) are given in log10 form.
|
||
|
|
|
||
|
|
Data formats:
|
||
|
|
|
||
|
|
Beginning of data mark: \data\
|
||
|
|
ngram 1=nr # number of 1-grams
|
||
|
|
|
||
|
|
\1-grams:
|
||
|
|
p_1 wd_1
|
||
|
|
|
||
|
|
end of data mark: \end\
|
||
|
|
|
||
|
|
\data\
|
||
|
|
ngram 1=29
|
||
|
|
|
||
|
|
\1-grams:
|
||
|
|
-3.5087 <UNK> 0.0000
|
||
|
|
-98.9999 <s> 0.0000
|
||
|
|
-1.1422 a 0.0000
|
||
|
|
-1.4032 b 0.0000
|
||
|
|
-1.6336 c 0.0000
|
||
|
|
-1.4937 d 0.0000
|
||
|
|
-1.0272 e 0.0000
|
||
|
|
-1.9176 f 0.0000
|
||
|
|
-1.4518 g 0.0000
|
||
|
|
-1.2873 h 0.0000
|
||
|
|
-1.1834 i 0.0000
|
||
|
|
-2.1864 j 0.0000
|
||
|
|
-1.8011 k 0.0000
|
||
|
|
-1.2951 l 0.0000
|
||
|
|
-1.5922 m 0.0000
|
||
|
|
-1.2186 n 0.0000
|
||
|
|
-1.2288 o 0.0000
|
||
|
|
-1.4461 p 0.0000
|
||
|
|
-3.5087 q 0.0000
|
||
|
|
-1.0538 r 0.0000
|
||
|
|
-1.2288 s 0.0000
|
||
|
|
-1.0996 t 0.0000
|
||
|
|
-1.4349 u 0.0000
|
||
|
|
-1.8141 v 0.0000
|
||
|
|
-1.6891 w 0.0000
|
||
|
|
-3.5087 x 0.0000
|
||
|
|
-1.7885 y 0.0000
|
||
|
|
-2.4295 z 0.0000
|
||
|
|
-3.5087 </s> 0.0000
|
||
|
|
|
||
|
|
\end\
|