This paper addresses the construction of a Chinese corpus that serves as a specialized search system for extracting terms from texts in the field of teaching Chinese as a foreign language. The corpus also provides the basis for a terminology database. The corpus is built in three main steps: (1) selecting Chinese texts in the field of teaching Chinese as a foreign language; (2) segmenting the texts into words, tagging each word with its part of speech (POS), and supplying basic linguistic information; and (3) extracting terms from the corpus and compiling a list of terms.
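The three steps can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicon, the forward maximum matching segmenter, the noun-run heuristic for term candidates, and all function names are assumptions chosen only to make the pipeline concrete. A real system would likely use a full segmenter and tagger and a more careful termhood measure.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> POS tag (n = noun, v = verb, u = particle).
LEXICON = {
    "汉语": "n",
    "教学": "n",
    "语法": "n",
    "研究": "v",
    "的": "u",
    "方法": "n",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment_and_tag(sentence):
    """Step 2 (assumed method): forward maximum matching against LEXICON."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest possible lexicon word first, then shorter ones.
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in LEXICON:
                tokens.append((piece, LEXICON[piece]))
                i += size
                break
        else:
            # Unknown character: emit it alone with the placeholder tag "x".
            tokens.append((sentence[i], "x"))
            i += 1
    return tokens


def candidate_terms(tagged):
    """Step 3 (assumed heuristic): maximal runs of consecutive nouns."""
    terms, run = [], []
    for word, tag in tagged:
        if tag == "n":
            run.append(word)
        else:
            if run:
                terms.append("".join(run))
            run = []
    if run:
        terms.append("".join(run))
    return terms


def build_term_list(texts):
    """Count candidate terms over all texts, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(candidate_terms(segment_and_tag(text)))
    return counts.most_common()


# Step 1 stand-in: two hand-picked sentences play the role of selected texts.
corpus = ["研究汉语教学的方法", "汉语教学语法"]
term_list = build_term_list(corpus)
```

Here `term_list` holds (term, frequency) pairs that a terminologist could then review before adding entries to the terminology database.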