A Ukrainian LLM will be trained using Google’s Gemma model

The Ministry of Digital Transformation of Ukraine has announced that, together with Kyivstar, it has selected Google's Gemma 3 as the foundation for the national Ukrainian large language model (LLM).

"We are building a Ukrainian LLM based on a publicly available open-source model. The main task in its development is to pre-train it on our unique data. When selecting the model, the focus was on how well it already processes texts in Ukrainian and its controllability during training. This will help minimize linguistic and ethical risks in our LLM," said Danylo Tsivok, Chief AI Officer of the Ministry of Digital Transformation and CEO of WINWIN AI Center of Excellence.

The chosen model will be adapted for the Ukrainian language, including:

  • Improving the Ukrainian tokenizer, which will enhance the model's performance on Ukrainian texts, reduce errors in generated Ukrainian content, and lower computational costs;
  • Further training the model on unique Ukrainian-language texts currently being collected by experts;
  • Creating benchmarks (tests) for more precise fine-tuning and future use of the model.

"The choice of Gemma ensures an optimal balance between performance and resources, as well as high-quality training for the Ukrainian LLM. The model supports over 140 languages, including Ukrainian, handles up to 128,000 tokens, offers multimodal capabilities, and has a flexible architecture that allows adaptation for various tasks," said Mykhailo Nestor, Director of Digital Product Development at Kyivstar.

Key advantages of selecting this model include:

  • Optimal performance-to-resource ratio: Gemma delivers high-quality output with modest infrastructure requirements and is one of the strongest open models for its size.
  • Multilingual support: the model already supports Ukrainian and can be readily adapted through further training.
  • Multimodality: the model can process and analyze images as well as text.
  • Extended tokenizer: the large vocabulary allows accurate and efficient text processing and fine-tuning. The model also has a long context window of up to 128,000 tokens.
  • Multiple model sizes: the model size can be chosen flexibly to suit specific applications.
  • Proven track record: Gemma has already served as the base for Ukrainian LLMs, including Lapa LLM and MamayLM.

"It is a great honor that the Ministry of Digital Transformation of Ukraine and Kyivstar chose Gemma as the foundation for the national Ukrainian large language model (LLM). This decision highlights Gemma’s strategic value, offering an optimal balance of performance and resources, as well as strong multilingual support. Building on Gemma’s success as the base for leading Ukrainian LLMs, the Ministry and partners are committed to continuing to support this key initiative aimed at advancing the digital experience in Ukraine," said Krzysztof Kaziów, Head of Customer Engineering, Google Cloud, Central and Eastern Europe.

Gemma has already shown strong results as the base model for MamayLM and Lapa LLM, the first and currently leading Ukrainian LLMs, as well as for BgGPT by INSAIT, a modern LLM for the Bulgarian language.