When it comes to artificial intelligence, the same names tend to come to mind: ChatGPT, and perhaps Google’s Bard. Although the United States is the global hub of the industry and China its direct competitor, the rest of the world is also trying to stay on the cutting edge.
Such is the case in the United Arab Emirates, where a group linked to Abu Dhabi’s ruling family has launched what it described as the world’s most advanced Arabic artificial intelligence software.
The product is called Jais and is an open-source model available for use by the more than four hundred million Arabic speakers in the world, built on a data set in Arabic and English.
Jais, a regional model
Unveiled in late August, the model is a collaboration between G42, an artificial intelligence company chaired by the UAE’s national security adviser Sheikh Tahnoon bin Zayed al-Nahyan; Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras, a California-based company in the field.
The launch comes as the United Arab Emirates and Saudi Arabia have purchased thousands of high-performance Nvidia chips needed to develop artificial intelligence as part of a global race to secure the supplies needed to power the industry.
Previously, the United Arab Emirates had developed an open-source large language model (LLM), known as Falcon, at the Technology Innovation Institute in Masdar City, Abu Dhabi, using more than three hundred Nvidia chips. Earlier this year, Cerebras signed a $100 million deal to supply nine supercomputers to G42, one of the largest such contracts.
“The UAE has been a pioneer in this area. We are ahead of our time, hopefully. We see it as a global race,” said Andrew Jackson, managing director of Inception, G42’s AI applied research unit. “Most LLMs are focused on English. Arabic is one of the most widely spoken languages in the world. Why shouldn’t the Arabic-speaking community have an LLM?”
The global challenge
Today’s most advanced LLMs include GPT-4, which powers OpenAI’s ChatGPT, as well as Google’s PaLM, which supports the Bard chatbot, and Meta’s open-source LLaMA model. All are capable of understanding and generating Arabic text. But experts say that these models, which can work in up to 100 languages, handle Arabic ineffectively.
According to its creators, Jais performs better than Falcon and other open-source models such as LLaMA when benchmarked on accuracy in Arabic. Falcon’s developers have also disclosed that their software was not pre-trained on Arabic.
Jais was also designed to have a more accurate understanding of the culture and context of the region, unlike most U.S.-centric models. Prior to launch, extensive testing was conducted to eliminate harmful or sensitive content as well as offensive or inappropriate materials that do not represent the values of the organizations involved in developing the model.
Jais is named after the highest mountain in the Emirates and was trained for twenty-one days on a subset of Cerebras’ Condor Galaxy 1 AI supercomputer by a team in Abu Dhabi. G42 collaborated with other Abu Dhabi entities as launch partners to use the technology, including the Abu Dhabi National Oil Company, the Mubadala wealth fund, and Etihad Airways.
One of the challenges in training the model was the lack of high-quality Arabic-language data available online compared with English. Jais uses both Modern Standard Arabic, understood throughout the Middle East, and the various dialects spoken in the region, drawing on media, social media, and code.