The Peninsula

The Sovereign AI Debate and Prospects of “Korean” AI

Published January 8, 2025
Author: Haeyoon Kim
Category: Technology

The debate over sovereign artificial intelligence (AI) highlights the need and desire for nations to develop their own AI systems, including large language models (LLMs), tailored to their unique cultural and historical contexts. This idea has gained traction in Seoul, where US LLMs like ChatGPT have often fallen short in addressing questions that require a deep and nuanced understanding of the Korean context. These limitations of English-centric AI models have driven South Korea, alongside many other countries, to pursue AI ambitions that align more closely with Korea’s linguistic and cultural nuances. Yet, the rise of sovereign AI also raises serious concerns such as state-driven manipulation, most clearly evidenced by Chinese LLMs, as well as the risks of promoting techno-nationalism and protectionist AI policies amid escalating US-China competition. As the global AI race heats up, nations face the challenge of balancing national interests and technological survival with the potential dangers of perpetuating biases and control, all while competing for their identity and security as they enter the uncertain terrain of the Trump 2.0 era.

The Global Quest for Sovereign AI

It was NVIDIA founder and CEO Jensen Huang’s comment at the World Government Summit in February 2024 that sparked the concept of “sovereign” AI. He argued that “every country needs to own the production of their own intelligence” as the world faces the beginning of a new industrial revolution. Because sovereign AI “codifies your culture, your society’s intelligence, your common sense, your history,” he added, all countries “must take that data, refine that data, and own your own national intelligence” without allowing others to do it for them.

The issue of AI and data sovereignty codifying culture and history gained great attention in Seoul after news reports surfaced that when ChatGPT is asked, “Which country does Dokdo belong to,” OpenAI’s LLM responds that the island’s ownership is “disputed” between South Korea and Japan. When Meta’s LLM Llama and Google’s Gemini are asked the same question, they deliver similar responses, frustrating many Koreans. These results underscore concerns about leading generative AI models—mostly developed by US tech companies and predominantly trained on English data—struggling to accurately address context-sensitive, non-English content.

While ChatGPT and other Western LLMs can respond to questions in Korean, their heavy reliance on English-centric data can limit their ability to address Korea-specific contexts, as they lack a nuanced understanding of local culture, language intricacies, and specialized knowledge. This issue extends beyond merely perceiving ChatGPT as a poor translator; as a dominant US LLM, it could “perpetuate and even validate misinformation” to over 200 million weekly active users (as of August 2024). This challenge applies not only to South Korea but also to many other countries whose languages are less prominent, given the limited understanding that today’s leading generative AI models have of the non-English-speaking world.

With governments recognizing its importance, building homegrown AI has become a global competition, particularly among middle powers, while the United States and China continue to lead AI development by a significant margin. In Europe, French startup Mistral AI has emerged as a major player in this landscape, developing LLMs such as Mistral Large that serve as European alternatives to US models. In the Middle East, the UAE’s Technology Innovation Institute introduced its first LLM, Falcon, and Saudi Arabia is also making strides under its Project Transcendence initiative, pledging 100 billion USD to establish itself as a regional AI powerhouse.

Meanwhile, India, home to a vast pool of AI talent, has launched several initiatives aligned with Prime Minister Narendra Modi’s IndiaAI Mission. A notable example is BharatGen, recently introduced as “the world’s first government-funded multimodal LLM project,” which aims to collect and curate India-centric data to represent the nation’s diverse languages, dialects, and cultural contexts. In Southeast Asia, the region’s first LLM, Sea-Lion, claims to differ from Western and Chinese models by being “trained on content produced in Southeast Asian languages,” and, therefore, “better understands Southeast Asia’s diverse contexts, languages, and cultures.”

South Korea’s Efforts for “Korean” AI at Home and Abroad

Pursuing this effort in South Korea, the Yoon Suk Yeol administration declared a “national all-out effort” in September 2024 to position Seoul as a global leader in the AI race. At the inaugural meeting of the Presidential Committee on AI, Yoon outlined his vision for South Korea to become one of the “top three AI powerhouses” by 2027. This goal aims to elevate the country’s position from seventh in Stanford’s latest global AI vibrancy ranking to third, behind the United States and China. To achieve this, he has pledged to establish a National AI Computing Center, with an investment of 2 trillion KRW (1.52 billion USD). While the ongoing political turmoil in Seoul following Yoon’s abortive martial law declaration and subsequent impeachments could present immediate obstacles, these efforts are expected to remain an integral part of South Korea’s long-term strategy for AI development.

In this environment, South Korea’s tech giant, Naver, has been the most vocal about its AI ambitions, showcased through the development of a leading Korean LLM, HyperCLOVA X. While the model is presented as multilingual, it emphasizes optimization for the Korean language and, more importantly, its familiarity with Korean norms and values. To support this, Naver disclosed that its model was trained on 6,500 times more Korean data than the latest version of ChatGPT and uses a Korean-optimized tokenizer, enabling it to encode Korean-language data more efficiently than English-centric models.

The competition for more advanced LLMs is multifaceted, but securing the right data is one of the most critical components. This is because large, high-quality, and diverse datasets are essential for training these generative AI models. While some big tech companies publicly release or provide access to certain LLMs, they often remain tight-lipped about the specific details of data collection, model architecture, and training processes by citing such information as valuable assets that reflect their technological expertise and competitive advantage. As the country’s leading search engine, Naver has access to extensive data on Korean users’ behavior and language patterns. While Google has long been the dominant search engine worldwide, Naver has maintained its lead in South Korea by adapting to local preferences and offering a range of data-driven services, including online shopping, blogs, community platforms, and map services.

Building on its strong domestic foundation, including its rich data sources, advanced technology, and cloud capabilities, Naver is broadening its attention to sovereign AI initiatives outside South Korea. Highlighting its potential to lead in non-English-speaking markets, CEO Choi Soo-yeon emphasized the importance of global AI diversity, stating, “Naver will provide necessary technologies for building sovereign AI for different nations and institutions to add more diversity in the era of AI.” Her remarks reflect the realistic challenges of competing with US and Chinese AI leaders and the crucial need for partnerships in other diverse regions with larger populations and untapped user bases. This approach addresses concerns that relying solely on the domestic market may not be sustainable for Korean AI services in the long term. In September 2024, Naver took a significant step in this direction by signing an initial agreement with Saudi Arabia’s AI agency to jointly develop an Arabic-based LLM, marking a move beyond its traditional domestic focus.

Pitfalls of Sovereign AI and Challenges Ahead

Despite the rapidly growing demand for sovereign AI worldwide, one of the key concerns is the potential for exploitation by governments or a small number of national AI champions using LLMs to promote specific ideologies or narratives. This means that sovereign AI could serve as a tool to push certain government agendas, manipulate public opinion, and ultimately foster state-driven information control. In this context, Russian President Vladimir Putin has justified the development of Russian sovereign AI by claiming that Western models are biased against Moscow. Similarly, ChatGPT’s Chinese rival, ERNIE Bot, which is primarily trained on Chinese data, reportedly “echoes the talking points of Chinese officials and state media.” Beijing has even introduced “Chat Xi PT,” fed by Chinese President Xi Jinping’s political philosophy, to “disseminate Xi’s ideas on politics, economics and culture.”

At the same time, concerns surrounding sovereign AI reinforce the legitimacy of Taiwan’s development of TAIDE, the nation’s self-built LLM designed to distance itself from ERNIE Bot. Despite ERNIE’s Chinese-language friendliness and purported user base of over 200 million within a year of its launch, Taiwan is committed to developing a local generative AI more aligned with its values and customs. As the race for sovereign AI escalates and nations strive to preserve their identity and security, the prevailing sentiment is that unless a nation owns sovereign AI, it will be controlled by others who already do. However, this raises a critical question of how nations can develop their own trustworthy AI without replicating the biases and control they seek to avoid.

Furthermore, advocating for sovereign AI can contribute to techno-nationalism, which has become increasingly prevalent amid the intensifying US-China competition. This trend risks encouraging protectionist AI policies, potentially limiting opportunities for global cooperation. In this context, some critics in Seoul have linked Korean AI efforts to hyper-nationalism, drawing comparisons to the movement promoting domestic product consumption that emerged following the Asian Financial Crisis in the late 1990s. Similarly, they suggest that exhorting Korean users to adopt HyperCLOVA X over ChatGPT is driven by appeals to nationalistic sentiments rather than by an objective evaluation of its performance and global competitiveness.

Against this backdrop, discussions surrounding sovereign AI are poised to enter a new phase with Donald Trump’s return to the White House and his approach to generative AI. With the establishment of a new White House AI and Crypto Czar role, his strategy is expected to involve restricting US AI technologies from being used by or transferred to China through heightened export controls—a move that could significantly impact countries developing their own AI systems, including South Korea. While the details are unclear, Trump’s America-First agenda will likely intensify national drives to build sovereign AI, ultimately fueling greater global competition in AI development. The outcome of this competition will have lasting implications for countries determined to shape their technological futures and, by extension, their destinies in the age of AI.

Haeyoon Kim is a Non-Resident Fellow at the Korea Economic Institute. The views expressed here are the author’s alone.

KEI is registered under the FARA as an agent of the Korea Institute for International Economic Policy, a public corporation established by the government of the Republic of Korea. Additional information is available at the Department of Justice, Washington, DC.
