In July 2024, at the World Artificial Intelligence Conference in Shanghai, Dataocean AI released the Belt and Road Data Flywheel Intelligent Agent and Corpus. (Photo provided by Dataocean AI)
The interface of the multilingual database for the Belt and Road countries (Provided by Dataocean AI)
Thonglaw Sayavong, an entrepreneur from Vientiane, Laos, felt unprecedented excitement when he first used his smartphone to introduce his handwoven textiles to potential clients in Russia. "Before, my handmade scarves could only be sold to tourists," he said. "Now, through a mobile app, overseas customers not only understand me but are moved to tears when they hear me share weaving stories in Lao. It lets the world discover our culture," he added, his face beaming with pride.
Previously, language barriers stood like insurmountable walls for small business owners like Sayavong, hindering their integration into global markets. Today, empowered by a business app with multilingual AI capabilities, he can present products and negotiate with buyers at any time. Behind this technology lies the Belt and Road Data Flywheel intelligent agent developed by Beijing Haitian Ruisheng Science Technology Ltd. (Dataocean AI, 688787.SH), which is quietly transforming cross-lingual communication for millions worldwide.
Decoding the Tower of Babel: How AI Overcomes Multilingual Barriers in "Going Global"
People in different regions of the world have distinct facial features, linguistic expressions, behavioral habits, and writing systems. For AI companies expanding globally, enabling localized interaction is paramount: AI systems must be able to recognize individuals by voice, identify people by their facial features, and interpret textual content.
Of the world's 7,000-plus languages, only a dozen see frequent use. For these, many translation devices, whether online or offline, achieve professional-grade capability. Lesser-spoken languages, however, face compounded problems, including inadequate research, scarce training data, and diverse, complicated application scenarios. As a result, developing multilingual AI systems for them is highly challenging, and speech recognition, speech synthesis, and the integration of related technologies are particularly arduous.
Clear and accurate communication among people from different countries and regions is the cornerstone for AI applications entering overseas markets. To address industry Gordian knots such as multilingual complexity and under-resourced languages, Dataocean AI has extended its multilingual data products and services to intelligent speech, computer vision, and natural language processing (NLP). The company operates more than 1,100 speech databases covering 205 languages and dialects across the globe, and its self-built pronunciation lexicon systems support 1,400-plus languages.
Forging Secure and Efficient Data Engines
Advancing domestic AI computing power and intelligent computing centers, and guiding enterprises to actively engage in the data field and explore application scenarios for large models, is of great significance for the transformation and upgrading of manufacturing and related sectors in China. "Dataocean AI specializes in AI and large model data, plus data-element R&D and marketing," stated Huang Yukai, CTO of Dataocean AI. Relying on 100% domestically developed technical and production systems, Dataocean AI has built the Belt and Road Data Flywheel intelligent agent. The agent adopts localized deployment and domain enhancement technologies and integrates multilingual large models with an automated Retrieval-Augmented Generation (RAG) engine, and it can be connected to a user's system within 30 minutes. It also integrates multilingual knowledge bases from industries such as infrastructure, trade, and finance, supporting accurate, data-driven decision-making in cross-border cooperation.
At present, the Data Flywheel intelligent agent provides technical support for localization projects such as data collection, transcription, and pronunciation dictionary production. It has achieved "out-of-the-box usability" in industry application scenarios including digital government and enterprises, smart healthcare, and intelligent manufacturing, significantly reducing the cost of applying large models.
The interactive interface of the Belt and Road Data Flywheel intelligent agent (Photo provided by Dataocean AI)
Building Infrastructure for AI Inclusiveness, Technological Equity
To date, Dataocean AI has helped more than 200 Chinese AI enterprises take their products overseas. It has played a significant supporting role in the development of AI technologies such as speech recognition, speech synthesis, natural language processing, machine translation, handwriting recognition, and OCR, mainly for Eastern European languages. It has provided this support to many leading Chinese AI technology companies and research institutions, including Huawei, Alibaba, Tencent, Baidu, ByteDance, Xiaomi, iFlytek, the Chinese Academy of Sciences, the University of Science and Technology of China, and the Pujiang National Laboratory.
Dataocean AI also works with partners from Shanghai Cooperation Organization (SCO) countries on AI localization, advancing intelligent industry analysis, training, and application, and jointly promoting the use of AI to drive sustainable development across economic, social, and cultural sectors.
"Our linguistic corpus collaborations with Indian and Russian enterprises provide speech data collection, recognition, and synthesis services to regional AI developers," noted a Dataocean AI representative. Meanwhile, the company's localized European team, which covers data delivery and marketing operations in multiple countries in the region, works in tandem with China-based R&D teams, forming an all-around data capability matrix.
Data plays a pivotal role in artificial intelligence. As the fundamental infrastructure for large AI models, it determines how advanced, accurate, secure, and equitable AI systems are. Dataocean AI's Belt and Road Data Flywheel intelligent agent represents critical infrastructure for global AI inclusiveness and technological equity, and it is significant for bridging the digital divide and fostering shared prosperity. In the future, the company will continue to explore the use of AI-powered capabilities to establish inclusive, open, and diverse platforms and mechanisms for cross-cultural online exchanges. (By Liu Wanqiu)