The largest electronic collection of texts in the Crimean Tatar language has been created
Ukraine has launched the National Corpus of the Crimean Tatar Language (NCCTL), the most extensive electronic collection of Crimean Tatar texts, covering various genres and historical epochs. According to the press service of the Ministry of Reintegration, the electronic database will support philological research and projects that use the Crimean Tatar language.
"The corpus serves as a comprehensive tool for language research and a foundation for implementing the Crimean Tatar language in operating systems, online translators, and spell-check programs. It is a practical tool for linguists, students, and developers who will create systems and projects using the Crimean Tatar language," the Ministry of Reintegration noted.
The creation of the NCCTL was initiated by the Ministry of Reintegration as part of the implementation of the Strategy for the Development of the Crimean Tatar Language for 2022–2032. The project is implemented by the QIRI’M Young public organization with the support of the Swiss-Ukrainian EGAP Program, executed by the East Europe Foundation, and Kyiv National Taras Shevchenko University.
The work on creating the corpus took a year and involved around 30 participants from various parts of Ukraine and the world. Over 900 materials, including literary and scientific works and periodicals, were analyzed in the process.
"This project is intended to be a significant step in preserving and developing the Crimean Tatar language. With the NCCTL database, new electronic dictionaries can be created, as well as programs for text correction and machine translation in the Crimean Tatar language. Such developments will contribute to the popularization of the language both in everyday life and in scientific and literary spheres. Additionally, the linguistic base of the NCCTL will expand the possibilities of the Crimean Tatar language in international technical and educational forums," according to the National Corpus of the Crimean Tatar Language.
Thanks to the National Corpus of the Crimean Tatar Language, users can:
- Explore the context of word usage.
- Find archaisms/neologisms.
- Investigate the most common and rare collocations.
- Analyze the language features of a specific author, and much more.
Furthermore, the NCCTL is a powerful tool for lexicography development. It can be used:
- To search for the most illustrative examples of word usage when creating explanatory dictionaries.
- To analyze word usage over different epochs when creating historical dictionaries.
- For the automatic identification and processing of language by machine translation systems and spell-check programs.
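For developers, two of the uses listed above, exploring the context of a word and finding common collocations, can be illustrated with a minimal Python sketch. It does not use the NCCTL itself: the article does not describe the corpus interface or data format, so the function names (`tokenize`, `concordance`, `bigram_counts`) are hypothetical, and the English sample text is only a placeholder for corpus material.

```python
import re
from collections import Counter

# Placeholder text, NOT taken from the NCCTL; real corpus text would go here.
SAMPLE = (
    "The corpus helps linguists. The corpus helps students. "
    "A corpus supports developers."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    For Python str input, \\w is Unicode-aware, so letters outside ASCII
    (for example in Latin or Cyrillic Crimean Tatar spelling) are kept.
    """
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every hit."""
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def bigram_counts(tokens):
    """Count adjacent word pairs as a crude stand-in for collocations.

    Simplification: pairs are counted across sentence boundaries too.
    """
    return Counter(zip(tokens, tokens[1:]))


tokens = tokenize(SAMPLE)

# Keyword-in-context view for one word.
for left, word, right in concordance(tokens, "corpus", window=1):
    print(f"{left:>10} [{word}] {right}")

# Most frequent adjacent pairs.
for pair, count in bigram_counts(tokens).most_common(3):
    print(pair, count)
```

A real corpus tool would typically add lemmatization, metadata filters (author, genre, period), and statistical association measures rather than raw pair counts; this sketch only shows the basic idea.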