This paper presents a co-occurrence dictionary based on Thai language phenomena. The theoretical background, the data structure, the dictionary development, and the word collocation information are described in detail. At present, 75,000 word collocations have been added to the co-occurrence dictionary with the help of linguists, who put considerable effort into encoding the linguistic information. We hope that the word collocation information presented in this paper will be a useful resource for natural language processing studies and second language acquisition.
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usage [1]. Recently, the significance of word collocations in language has attracted the interest of lexicographers, who have made great efforts to study word combinations systematically. Cognitive science, psychology, and linguistics have been brought together to describe lexical combination. As a result, co-occurrence dictionaries containing collocation information have been constructed for many languages.
In Thai language processing, one of the main obstacles is interpreting the relationships between collocated words, both syntactic and semantic. Even though linguists have tried to abstract formal rules to explain how words combine, the results are still unsatisfactory. Another approach has therefore been discussed: many lexicographers, linguists, and computer scientists agree on extracting co-occurrence information from a large corpus. Stochastic methods serve as a tool for calculating the probability of co-occurrence.
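As an illustration of how co-occurrence can be scored from a corpus, the following is a minimal sketch using pointwise mutual information (PMI), a common association measure. It is not stated in this paper that PMI is the measure used for the dictionary. The function name `pmi_scores`, the `min_count` threshold, and the toy English corpus are assumptions made for the example, and the input is assumed to be already segmented into words, which for Thai text requires a separate word segmentation step.

```python
import math
from collections import Counter


def pmi_scores(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    sentences: iterable of token lists (already word-segmented).
    min_count: hypothetical cutoff; pairs seen fewer times are skipped
               because PMI is unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs only; a wider window is another possible choice.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus, invented for illustration only.
corpus = [
    ["drink", "strong", "tea"],
    ["drink", "strong", "tea"],
    ["drink", "hot", "water"],
    ["buy", "strong", "rope"],
]

for pair, score in sorted(pmi_scores(corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A positive score marks a pair that appears together more often than its individual word frequencies would predict, which matches the definition of collocation given above. On a real corpus the frequency cutoff and the window size would need tuning.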
Virach Sornlertlamvanich virach@nwg.nectec.or.th
Wantanee Phantachat
Linguistics and Knowledge Science Laboratory (LINKS)
National Electronics and Computer Technology Center
National Science and Technology Development Agency
Ministry of Science, Technology and Environment
22F Gypsum Metropolitan Tower
539/2 Sriayudhya Rd., Rajthevi
Bangkok 10400, Thailand
Center of the International Cooperation for Computerization (CICC)
Machine Translation System Laboratory
Fuji bldg. 5F, 30-9, Shiba 5-Chome, Minato-ku,
Tokyo 108, Japan
| Attachment | Size |
|---|---|
| thai-co-occurrence-dictionary.pdf | 183.98 KB |
Comments
Looking for meaning of FIXP
Although somewhat dated, I found this report only recently while working on a new Thai - English dictionary. I was looking for the meaning of "FIXP" which is used in this document but not defined or explained. If anyone knows the meaning of "FIXP" I would like to hear from you.