114 lines
3 KiB
Text
Executable file
114 lines
3 KiB
Text
Executable file
#############################################################################
|
|
## Copyright (c) 1996, Carnegie Mellon University, Cambridge University,
|
|
## Ronald Rosenfeld and Philip Clarkson
|
|
#############################################################################
|
|
=============================================================================
|
|
=============== This file was produced by the CMU-Cambridge ===============
|
|
=============== Statistical Language Modeling Toolkit ===============
|
|
=============================================================================
|
|
This is a 1-gram language model, based on a vocabulary of 71 words,
|
|
which begins "<s>", "</s>", "a"...
|
|
This is an OPEN-vocabulary model (type 1)
|
|
(OOVs were mapped to UNK, which is treated as any other vocabulary word)
|
|
Good-Turing discounting was applied.
|
|
1-gram frequency of frequency : 16
|
|
1-gram discounting ratios : 0.89
|
|
This file is in the ARPA-standard format introduced by Doug Paul.
|
|
|
|
p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3)
|
|
else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(wd3|wd2)
|
|
else p(wd3|w2)
|
|
|
|
p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2)
|
|
else bo_wt_1(wd1)*p_1(wd2)
|
|
|
|
All probs and back-off weights (bo_wt) are given in log10 form.
|
|
|
|
Data formats:
|
|
|
|
Beginning of data mark: \data\
|
|
ngram 1=nr # number of 1-grams
|
|
|
|
\1-grams:
|
|
p_1 wd_1
|
|
|
|
end of data mark: \end\
|
|
|
|
\data\
|
|
ngram 1=72
|
|
|
|
\1-grams:
|
|
-3.3174 <UNK> 0.0000
|
|
-99.0000 <s> 0.0000
|
|
-3.3754 </s> 0.0000
|
|
-3.3174 a 0.0000
|
|
-3.3174 april 0.0000
|
|
-3.3174 area 0.0000
|
|
-2.6642 august 0.0000
|
|
-3.3174 code 0.0000
|
|
-3.3174 december 0.0000
|
|
-1.1980 eight 0.0000
|
|
-3.3174 eighteen 0.0000
|
|
-3.3174 eighteenth 0.0000
|
|
-3.3174 eighth 0.0000
|
|
-2.6642 eighty 0.0000
|
|
-2.6642 eleven 0.0000
|
|
-3.3174 eleventh 0.0000
|
|
-1.3420 enter 0.0000
|
|
-2.9652 february 0.0000
|
|
-2.6642 fifteen 0.0000
|
|
-2.9652 fifteenth 0.0000
|
|
-3.3174 fifth 0.0000
|
|
-1.8513 fifty 0.0000
|
|
-2.7891 first 0.0000
|
|
-1.0621 five 0.0000
|
|
-2.0621 forty 0.0000
|
|
-1.1659 four 0.0000
|
|
-2.4211 fourteen 0.0000
|
|
-3.3174 fourth 0.0000
|
|
-2.2662 go 0.0000
|
|
-2.1871 help 0.0000
|
|
-2.9652 hundred 0.0000
|
|
-2.9652 january 0.0000
|
|
-2.7891 july 0.0000
|
|
-2.9652 june 0.0000
|
|
-2.7891 march 0.0000
|
|
-2.7891 may 0.0000
|
|
-1.4274 nine 0.0000
|
|
-1.7891 nineteen 0.0000
|
|
-2.6642 ninety 0.0000
|
|
-2.9652 ninth 0.0000
|
|
-2.0621 no 0.0000
|
|
-2.7891 october 0.0000
|
|
-2.9652 of 0.0000
|
|
-1.5180 oh 0.0000
|
|
-0.8842 one 0.0000
|
|
-2.0358 repeat 0.0000
|
|
-2.9652 second 0.0000
|
|
-2.5673 september 0.0000
|
|
-1.2619 seven 0.0000
|
|
-2.7891 seventeen 0.0000
|
|
-3.3174 seventh 0.0000
|
|
-2.1871 seventy 0.0000
|
|
-1.1556 six 0.0000
|
|
-2.7891 sixteen 0.0000
|
|
-3.3174 sixteenth 0.0000
|
|
-3.3174 sixth 0.0000
|
|
-1.4954 sixty 0.0000
|
|
-2.2248 start 0.0000
|
|
-2.0621 stop 0.0000
|
|
-2.2248 ten 0.0000
|
|
-2.7891 third 0.0000
|
|
-2.7891 thirtieth 0.0000
|
|
-2.0110 thirty 0.0000
|
|
-2.7891 thousand 0.0000
|
|
-1.2368 three 0.0000
|
|
-2.9652 twelfth 0.0000
|
|
-2.3631 twelve 0.0000
|
|
-3.3174 twentieth 0.0000
|
|
-1.7891 twenty 0.0000
|
|
-0.9199 two 0.0000
|
|
-1.9440 yes 0.0000
|
|
-1.8513 zero 0.0000
|
|
|
|
\end\
|