If you are an English speaker surfing Wikipedia, there are more than 5 million articles for your perusal. English Wikipedia is a labyrinth of links that could have you clicking for years, but if you speak a different language, you might click through everything in just a couple of minutes.
In Swedish, the second most popular language on the crowd-sourced encyclopedia, there are nearly 3 million articles. But if you live in neighboring Norway, there are fewer than half a million entries. If you are one of the 10 million people who speak Zulu, there are fewer than a thousand. And if your only tongue is Hiri Motu, one of the official languages of Papua New Guinea, well, Wikipedia probably just isn’t for you: Wikipedia indexes just three articles in the language, so Wikipedians closed the language’s encyclopedia.
Depending on whether you speak English, Norwegian or Hiri Motu, the internet is a very different place.
The internet is global but it is also regional. Cats are to the U.S. and Japan what goats are to Brazil and Uganda. If you speak an uncommon language, the internet can feel downright rural. The problem isn’t just getting online, but whether there will be anything for people who get online to actually do.
“What’s critical to understand is that, with the next billion users coming online, we’re going to see a wide variety of new languages represented online,” said An Xiao Mina, a co-founder of the Civic Beat and a technologist at Meedan working to build a platform to translate social media. “We live in a world of many internets, where even if you reduce the limits of geography, censorship and connectivity, language prevents large swaths of people from connecting with each other.”
But it’s not just ‘obscure’ languages that are discriminated against on the web.
Even use of Arabic, the sixth most commonly spoken language in the world and the fourth most common language among internet users, was until recently limited on many mobile phones. In some places on the internet, it still is. To cope, Arabic speakers developed “Arabizi”, a combination of Roman letters and numbers that makes it easier to chat. Arabizi is essentially a transliteration of Arabic into Roman characters, using numbers to stand in for some of the letters that don’t have direct counterparts in sound, like 7 for ح (ha), which sounds a bit like a guttural “h.”
It’s an ingenious solution, but one that shouldn’t have to exist. When emoji exploded in popularity, developers across all platforms worked quickly to make them easy to use on their devices. Why so slow with Arabic?
Arabic Wikipedia, by the way, has just 400,000 articles. A language spoken by more than 400 million people is less represented than Swedish, a language spoken by just 9 million. The demographics of the internet have historically been very different from those of the offline world, and those colonization effects are dramatic.
Recent research has shown that speaking English is a significant factor in determining whether someone starts using the web. Some languages are not well represented online, but others, like Tibetan, are completely invisible: unusable on browsers, operating systems, and keyboards.
The Tibetan blogger Dechen Pemba recently wrote about the frustrations of not being able to access the Tibetan language on a phone. Google, he wrote, failed to develop a Tibetan language interface and only recently incorporated the Tibetan language font on some Android phones. (That’s one way for Apple, which does support Tibetan, to win customers from Android.)
“Given that the Tibetan literary tradition goes back to the 7th century … my pet hate is when Tibetan language is described as ‘obscure,’” he wrote. “I wonder how it is possible that the language of Tibetan Buddhism and Tibetan Buddhists, comprising of as many as 60 million people, can be wilfully left behind in terms of modern technology?”
Facebook’s Free Basics program was controversial in India in large part because it limited the internet resources the digitally disadvantaged would have access to. Would it include access to domestic violence protection programs, or would it be a walled ghetto devoted to social media and online shopping? Language barriers can also force internet users into digital ghettos, or force them to forsake their mother tongue (and its culture) to escape them.
“The fact that a lot of groups have very little local-language content is problematic because it can contribute to a global homogenization of ideas and culture, and perhaps even knowledge itself,” said Mark Graham, a research fellow at the Oxford Internet Institute.
Graham predicts negative impacts on cultural diversity if the internet’s language is predominantly English, Chinese, and Spanish. A version of this, for example, is happening right now in Iceland, where the packaging on so many imported goods is in English that English is becoming more common than Icelandic in everyday life.
A linguistically divided internet can also lead to the creation of monocultural bubbles. Wikipedia provides a good example: one study showed that most content on Wikipedia is available in only one language. Even English Wikipedia has articles corresponding to only about half the topics covered by German Wikipedia.
“The Chinese internet is a good example of this,” Graham said. “There are more Chinese internet users online than internet users from any other country. So, this has meant that there is a lot of content out there in Chinese. Which, in turn, means that it is easy for Chinese internet users to exist in their own ‘filter bubble’—not really exposed to different content on the broader Web.”
Mina pointed out that the web’s prioritization of mainstream languages also leaves many tools for political organization and speaking out off-limits to marginalized groups.
“If you don’t speak a top ten language, the internet you have access to is extremely limited,” Mina told me. “Imagine going to a Chinese restaurant and just trying to order based on pictures.”
Graham told me he’d like to see more online spaces like Wikipedia that are digital commons where users can contribute content in any language they like, allowing local internet users to essentially build their own web. But getting those digital commons filled with content first requires creating incentives to get people online in the first place. And part of that means making content that is already out there accessible across the boundaries of language.
Mina is interested in chipping away at those boundaries by creating technology that translates social media content from one language to another.
Scott Hale, a data scientist focused on bilingualism at the Oxford Internet Institute, told me that user interfaces could help break down language barriers by allowing users to interact with them in multiple languages at once. Most online interfaces, Google and Facebook among them, are designed with monolingual users in mind, only surfacing content in one language at a time. Allowing people to easily toggle between languages is one way to break down the linguistic silos that online life creates.
“You can’t just put a bunch of people in the network and expect that they connect,” Mina said.
The internet was supposed to be the thing that made all of our differences irrelevant, that erased borders and boundaries by translating everything into 1s and 0s. But borders definitely exist online, and language boundaries can be impenetrable.