A double-edged sword in the AI race

While DeepSeek’s open-source nature triggers privacy risks, India accelerates the development of its own AI framework to harness local datasets for growth.

Voice&Data Bureau

On 20 January this year, a little-known Chinese startup named DeepSeek released what it claimed to be a foundational Artificial Intelligence (AI) model called R1, seemingly designed at the same scale as OpenAI's o1 Pro, until then the world's largest and most complex commercially released foundational large language model (LLM). Almost overnight, DeepSeek became the most sought-after model worldwide for the way it handled data and restructured compute requirements.


Before anything else, some added context: foundational LLMs are massively complex AI algorithms trained by a company on vast troves of data. This training corpus includes as diverse a range of data as possible; US firm OpenAI is notoriously embroiled in legal battles around the world over its use of 'copyrighted' data scraped from across the Internet.


“What should worry us is that China is investing heavily in AI, with 50 tier-II labs and nearly 10 anthropic labs focusing on related technologies.”- AJAI CHOWDHRY, Founder of HCL Infosystems


This diversity of data gives LLMs a lot of context to understand most things around them in a natural language format. No wonder, then, that ever since the public release of the now-famous ChatGPT, people have grown wary, predicting that AI would soon "take away our jobs." Case in point: Generative AI (GenAI), the application layer sitting atop the LLMs, uses the foundational models to understand, think, and speak like humans.


“It will be critical for companies to make a thorough risk assessment of products and suppliers that may incorporate DeepSeek or any future LLM.”- CHESTER WISNIEWSKI, Director & Global Field Chief Technology Officer, Sophos


So far, most GenAI applications have required mammoth models that use vast amounts of computing to draw on all of their training data for every query. DeepSeek, on the other hand, transformed this approach. As many stakeholders highlighted, DeepSeek uses a simple algorithm to understand a query's broad topic and then retrieves only the specific subset of data required to answer it.

By DeepSeek's own admission in its technical paper, this can reduce the compute cost of GenAI applications by 90% while producing results of similar depth to, say, OpenAI's o1 Pro.
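To make the idea concrete, here is a deliberately minimal sketch, in Python, of the routing principle described above: classify a query's broad topic first, then consult only the data for that topic instead of the whole corpus. All names here (`EXPERTS`, `route_query`) are hypothetical illustrations, not DeepSeek's actual code or architecture, which is far more sophisticated.

```python
# Hypothetical illustration of topic-based routing: a cheap first pass
# picks one "expert" bucket, so the expensive work touches only a
# fraction of the data. EXPERTS and route_query are invented names.

EXPERTS = {
    "finance": ["loan", "interest", "bank"],
    "health": ["vaccine", "symptom", "doctor"],
    "tech": ["gpu", "model", "compute"],
}

def route_query(query: str) -> str:
    """Return the topic whose keyword list best overlaps the query."""
    words = set(query.lower().split())
    scores = {
        topic: len(words & set(keywords))
        for topic, keywords in EXPERTS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a general bucket when nothing matches.
    return best if scores[best] > 0 else "general"

print(route_query("how much gpu compute does the model need"))  # tech
print(route_query("hello there"))  # general
```

The cost saving comes from the same shape of decision in real systems: a lightweight router runs on every query, while the heavyweight resources behind each bucket are activated only when selected.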

What Does it Mean for India?


Datasets play an imperative role here. What DeepSeek puts at stake is the very act of structuring data and creating data cohorts that ensure a more organised operational structure. This would also give AI firms greater clarity over their datasets and over the 'explainability' of the data that underpins an AI system.

It is this aspect that industry executives have voiced as a concern. According to media reports, Union IT Minister Ashwini Vaishnaw said on 30 January that India will run its proprietary Indic-language datasets on open-source AI models, using the country's own compute, to develop its foundational model soon. The move will likely also take cues from the Ministry of Electronics and IT (MeitY)'s public datasets venture, Bhashini, which has been building organised local-language repositories for years.



“CIOs must establish rigorous governance and validation frameworks to ensure their GenAI solutions meet performance, security, and ethical benchmarks.”- RYAN COX, Global Head – AI, Synechron

Industry stakeholders, however, warn that this may not be enough. Ajai Chowdhry, Founder of HCL Infosystems, said the concerns are deep-rooted in today’s global geopolitical balance. “What should worry us is that China is investing heavily in AI, with 50 tier-II labs and nearly 10 anthropic labs, including lesser-known companies like Minimax, Quin, and Kimi.ai focusing on related technologies. This should be a huge wake-up call for India, and we must work in emergency mode to win the AI race,” he said.

Chowdhry added that regarding the role of data underpinning the AI model, “every country is on its own, and strategic autonomy is extremely critical.”


“Our aspiration, along with our extraordinary talents and resources, can be the driving force. The intelligence, skill, and talent of the Indian people are well-known. Imported chips and enormous data centres are not necessary for innovation. The capacity exists within the country, and we must create our GPUs and develop state-of-the-art models, such as LLMs, using Indian language datasets—considering that we possess an abundance of such data that can be employed as a tactical advantage,” he said.

What are the Concerns?

Several experts have highlighted that countries such as India should be careful while adopting such a model. The onus will be on implementing datasets with an AI model that is yet to be vetted.


Chester Wisniewski, Director and Global Field Chief Technology Officer at managed technology services provider Sophos, said, “DeepSeek’s ‘open source’ nature opens it up for exploration—by both adversaries and enthusiasts. Like Llama (by Meta Platforms), it can be played with to have the guardrails removed.” This could lead to abuse by cybercriminals, although it is important to note that running DeepSeek still requires far more resources than the average cybercriminal has.

He further stated that the more pressing issue for companies could be the likelihood of various products and companies adopting DeepSeek due to its cost-effectiveness, leading to potentially significant privacy risks.

“As with any other AI model, it will be critical for companies to make a thorough risk assessment, which extends to any products and suppliers that may incorporate DeepSeek or any future LLM. They also need to be certain they have the right expertise to make an informed decision,” Wisniewski added.

Consultants also noted that exposing the model to India’s data could lead to the data itself being exposed and exploited. Ryan Cox, Global Head of AI at Synechron, said that the real challenge of this ground-breaking AI model is “governance.”

“Some models are developed with some level of censorship, particularly concerning sensitive political topics, cultural issues, or content that might be considered inappropriate or against Chinese government policies. Open-weight models lack built-in security certifications, placing the compliance burden on the deploying organisation. CIOs and technology leaders must establish rigorous governance and validation frameworks to qualitatively ensure their generative AI solutions meet performance, security, and ethical benchmarks,” Cox said.

Cox, Wisniewski, and Chowdhry agree that the fundamental role of data in generative AI will now come to the fore and could pose challenges and opportunities in equal measure for developing foundational capabilities in India.

Going forward, therefore, exposing data to a model such as DeepSeek could disrupt India’s AI data goals. A senior data consultant to MeitY said on condition of anonymity: “The Centre is aware of the data challenge in front of us. India will start using any model only after the country’s brightest minds properly vet it—who will then validate the security factor of our indigenous AI model. Much will remain to be seen regarding a long-term picture—which we are working on right now for making the model and the AI dataset in question.”

By Vernika Awal

feedbackvnd@cybermedia.co.in