The Internet: A Linguistic Revolution

David Crystal

A linguist can’t help but be impressed by the Internet. It is an extraordinarily diverse medium, holding a mirror up to many sides of our linguistic nature. The World Wide Web, in particular, offers a home to virtually all the styles which have so far developed in the written language – newspapers, scientific reports, bulletins, novels, poems, prayers – you name it, you’ll find a page on it. Indeed, it is introducing us to styles of written expression which none of us have ever seen before. It has often been said that the Internet is a revolution – yes, indeed, but it is also a linguistic revolution.

The Internet is not a single thing. It consists of several domains – e-mails, the World Wide Web, chatrooms (those which exist in real time and those which do not), and the world of fantasy games. Each offers us novel possibilities of human communication which I think can genuinely be called revolutionary.

In e-mails, what is revolutionary is not the way some of its users are cavalier about their typing accuracy, permitting misspellings, and omitting capitalization and punctuation. This is a rather minor effect, which rarely interferes with intelligibility. It is patently a special style arising out of the pressures operating on users of the medium, plus a natural desire (especially among younger – or younger-minded – users) to be idiosyncratic and daring. And that is how it is perceived. If I receive an e-mail from Smith in which he misspells a word, I do not conclude from this that ‘Smith can’t spell’. I simply conclude that he was in a hurry. I know this because I do the same thing myself, when I am in a hurry. There is nothing truly revolutionary here.

What is revolutionary about e-mails is the way the medium permits what is called framing. You receive a message which contains, say, three different points in a single paragraph. You can, if you want, reply to each of these points by taking the paragraph, splitting it up into three parts, and then responding to each part separately, so that the message you send back then looks a bit like a play dialogue. Then, your sender can do the same thing to your responses, and when you get the message back, you see his replies to your replies. You can then send the lot on to someone else for further comments, and when it comes back, there are now three voices framed on the screen. And so it can go on – replies within replies within replies – and all unified within the same screen typography. There’s never been anything like this in the history of human written communication.

The pages of the Web offer a different kind of revolutionary development. The one thing we can say about traditional writing is that it is permanent. You open a book at page 6, close the book, then open it at page 6 again. You expect to see the same thing. You would be more than a little surprised if the page had changed in the interim. But this kind of impermanence is perfectly normal on the Web – where indeed you can see the page changing in front of your eyes. Words appear and disappear, in varying colours. Sentences slide onto the screen and off again. Letters dance before your eyes. The Web is truly part of a new, animated linguistic channel – more dynamic than traditional writing, and more permanent than traditional speech. It is neither speech nor writing. It is a new medium.

Real-time Internet discussion groups – chatrooms – also offer a revolutionary set of possibilities. You see on your screen messages coming in from all over the world. If there are 30 people in the room, then you could be seeing 30 different messages, all making various contributions to the theme, but often clustering into half a dozen or more sub-conversations. It’s like being at a cocktail party where there are other conversations going on all around you. At the party, of course, you can’t pay attention to them. In a chatroom you can’t avoid them. It has never been possible before, in the history of human communication, to ‘listen’ to 30 people at once. Now you can. Moreover, you can respond to as many of them as your mental powers and typing speed permit. This too is a revolutionary state of affairs, as far as speech is concerned.

But there’s a further reason for the revolutionary status of the Internet – the fact that it offers a home to all languages – as soon as their communities have a functioning computer technology, of course. Its increasingly multilingual character has been the most notable change since it started out – not very long ago – as a totally English medium. There’s a story the former US vice-president Al Gore tells. He was reporting the remark of the 8-year-old son of Kyrgyzstan’s President Akayev, who told his father that he had to learn English. When asked why, the child apparently replied: ‘Because, daddy, the computer speaks English.’

For many, indeed, the language of the Internet ‘is’ English. There was a headline in The New York Times in 1996 which said simply: ‘World, Wide, Web: 3 English Words’. The article, by Michael Specter, went on to say: ‘if you want to take full advantage of the Internet there is only one real way to do it: learn English’. He did acknowledge the arrival of other languages: ‘As the Web grows’, he said, ‘the number of people on it who speak French, say, or Russian will become more varied and that variety will be expressed on the Web. That is why it is a fundamentally democratic technology’, he said, ‘but it won’t necessarily happen soon.’

Well, the evidence is growing that this conclusion was wrong. With the Internet’s globalization, the presence of other languages has steadily risen. By the mid-1990s, a widely quoted figure was that about 80% of the Net was in English. This figure derived from the first major study of language distribution on the Internet, carried out in 1997 by Babel, a joint initiative of the Internet Society and Alis Technologies. This showed English well ahead, but with several other languages entering the ring – notably German, Japanese, French, and Spanish.

Since then, the estimates for English have been steadily falling. Some commentators are now predicting that before long the Web (and the Internet as a whole) will be predominantly non-English, as communications infrastructure develops in Europe, Asia, Africa, and South America. A recent Global Reach survey estimated that people with Internet access in non-English-speaking countries increased between 1995 and 2000 from 7 million to 136 million. In 1998, there was another surprise: the number of newly created Web sites not in English passed the total for newly created sites that were in English. And at a conference on Search Engine Strategies in London in 2000, a representative of Alta Vista was predicting that by 2002 less than 50% of the Web would be in English. In certain parts of the world, the local language is already dominant. According to one Japanese Internet author, Yoshi Mikami, 90% of Web pages in Japan are already in Japanese.

Spend an hour hunting for languages on the World Wide Web and you’ll find hundreds. Last year I spent a few days tracking down as many examples as I could find, for my book Language and the Internet. I found one site, called World Language Resources, which lists products for 728 languages. I found an African resource list which covered several local languages; Yoruba, for example, was illustrated by some 5000 words, along with proverbs, naming patterns, and greetings. Another site dealt with no fewer than 87 European minority languages. Some of the sites were very small in content, of course, but nonetheless extensive in range: one gave the Lord’s Prayer in nearly 500 languages.

Nobody has yet worked out just how many languages have obtained a modicum of presence on the Web. I found over 1000 quite quickly. It’s not difficult to find evidence of a Net presence for all the more frequently used languages in the world, and for a large number of minority languages too. I’d guess that about a quarter of the world’s languages – that’s about 1500 – have some sort of cyber existence now.

It’s important to point out that in all these examples I’m talking about language presence in a real sense. These aren’t sites which only analyse or talk about languages, from the point of view of linguistics or some other academic subject. They’re sites which allow us to see languages as they are. In many cases, the total Web presence, in terms of number of pages, is quite small. The crucial point is that the languages are out there, even if they’re represented by only a sprinkling of sites.

The Internet is the ideal medium for minority languages. If you are a speaker or supporter of an endangered language – an aboriginal language, say, or one of the Celtic languages – you’re keen to give the language some publicity, to draw its plight to the attention of the world. Previously, this was very difficult to do. It was hard to attract a newspaper article on the subject, and the cost of a newspaper advertisement was prohibitive. It was virtually impossible to get a radio or television programme devoted to it. But now, with Web pages and e-mail waiting to be used, you can get your message out in next to no time, in your own language – with a translation as well, if you want – and in front of a global audience whose potential size makes traditional media audiences look minuscule by comparison. Chatrooms, moreover, are a boon to speakers living in isolation from each other, as now there can be a virtual speech community to which they can belong.

On the other hand, I have to recognize that developing a significant cyber-presence for a language is not easy. There’s a sort of ‘critical mass’ of Internet penetration which has to build up in a country, before a language develops a vibrant cyber-life. It’s not much use, really, to have just one or two sites in a local language on the Web. People wanting to use or find out about the language would soon get bored. The number of sites has to build up until, suddenly, everybody’s using them and adding to them and talking about them. That’s a magic moment, and only a few hundred languages have so far reached it. In the jargon of the Internet, there needs to be lots of good ‘content’ in the local languages out there, and until there is, people will keep using the languages that have managed to accumulate content – English, in particular.

So the character of a multilingual Internet isn’t entirely clear. It will all depend on how quickly new sites can build up a local language momentum. There are also a number of practical difficulties. Until quite recently there were real problems in using the characters of the keyboard to cope with the alphabetical diversity of the world’s languages. Because it was the English alphabet that was the standard, only a very few non-English accents and diacritics could be handled. If it was a foreign word with some strange-looking accent marks, the Internet software would simply ignore them, and assume they weren’t important. This can still happen – but things have moved on a great deal since then. First, the basic set of keyboard characters, the so-called ASCII set, was extended, so that the commoner non-English diacritics could be included. But even then it only allowed up to 256 characters – and there are far more letter shapes in the world than that. Just think of the array of shapes you find in Arabic, Hindi, Chinese, Korean, and the many other languages which don’t use the Latin alphabet. Today, a new coding system, Unicode, is much more sophisticated: it allows the representation on screen of over 65,000 characters. That should be plenty – but the implementation of this system is still in its infancy.
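
The progression from basic ASCII, to an extended 256-character set, to Unicode can be made concrete with a minimal Python sketch. The helper names `fits` and `strip_to_ascii` and the sample words are assumptions chosen purely for illustration; `latin-1` stands in for just one of several 8-bit extensions of ASCII, and dropping the accent marks is only one plausible way in which older software may have mishandled diacritics.

```python
import unicodedata


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be stored in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def strip_to_ascii(text: str) -> str:
    """Hypothetical stand-in for software that ignores accent marks.

    The text is decomposed so that each accented letter becomes a base
    letter plus a separate combining mark, and then everything outside
    basic ASCII is silently discarded.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


# Sample words: plain English, French, Czech, Hindi, Korean.
samples = ["cafe", "café", "Čeština", "हिन्दी", "한국어"]

for word in samples:
    print(
        word,
        "| ASCII:", fits(word, "ascii"),
        "| 8-bit (latin-1):", fits(word, "latin-1"),
        "| Unicode (utf-8):", fits(word, "utf-8"),
        "| accents ignored:", repr(strip_to_ascii(word)),
    )
```

Under these assumptions, the French word survives the 8-bit extension but not basic ASCII, the Czech word needs Unicode, and the Hindi and Korean words vanish entirely once everything outside ASCII is ignored. The Unicode code space has also grown well beyond the 65,000 figure, though nothing in the sketch depends on its exact size.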

My feeling is that the future looks good for Web multilingualism, and a number of influential people seem to share this view. Ned Thomas, for instance, is editor of a bulletin called Contact – the quarterly publication of the European Bureau of Lesser Used Languages. In an editorial last year he said: ‘It is not the case … that all languages will be marginalized on the Net by English. On the contrary, there will be a great demand for multilingual Web sites, for multilingual data retrieval, for machine translation, for voice recognition systems to be multilingual.’ And Tyler Chambers, the creator of various Web language projects, agrees: ‘the future of the Internet’, he says, ‘is even more multilingualism and cross-cultural exploration and understanding than we’ve already seen.’ I agree. The Web offers a World Wide Welcome for global linguistic diversity.