Research has shown that the language spoken around babies at a very early age is, in most cases, the language they will prefer over any other language they hear or speak for the rest of their lives. Thought processes are language-based, and this is most evident when one speaks a language that is not one's native tongue. For example, an English speaker trying to speak Hebrew is far more likely to get the gender of a familiar Hebrew noun wrong, since English nouns have no grammatical gender, than to get the conjugation of an English verb wrong. However, almost all mainstream programming languages and frameworks are based on English, since it is spoken internationally and serves as the lowest common denominator. This forces programmers whose native language is not English to translate terms from their native tongue into English or, when no suitable translation exists, to transliterate them, which mostly results in broken English in the codebase.
For almost all non-native English speakers confronted with their first programming language, this is a serious deterrent: they have to learn both the programming language and the English terms it uses. Take the word char, one of the simplest examples: many people, myself included, were taught to pronounce it tch-ar (rhyming with far) rather than k-ar, even though the latter follows readily from the fact that char is an abbreviation of character. Of all the programmers I know, most of whom have years of experience in the field, about half still pronounce it incorrectly.
This problem was addressed in Israel with the introduction of a pseudo-code language for students called Ivrit Mivnit (עברית מבנית, literally Structured Hebrew), which is similar in concept to Visual Basic in that it is very verbose. However, apart from a single compiler implementation I know of, written by a friend and me as a school project along with our own attempt to standardize the syntax, Structured Hebrew remains, quite literally, on paper.
Today Oren Eini, better known as Ayende Rahien, wrote about why you can’t code in Hebrew using C#, especially in Visual Studio. He describes the unreadable mix of Hebrew, English and mathematical notation that results when one tries to code this way, as he has, mostly because Hebrew and English are written in opposite directions. He also notes IntelliSense’s shortcomings when applied to right-to-left tokens.
One can only imagine trying to write Chinese in its traditional form (top to bottom, right to left) in such an editor. One can only imagine, because it is simply not possible.
For years, many companies have internationalized their applications by translating every text string into numerous languages, some even building a complete right-to-left interface alongside the default left-to-right one for languages such as Arabic and Hebrew. The same idea could be applied to the design of programming languages and frameworks, allowing international versions of them to be created.
Let’s take the classic Hello World program as an example, written in Visual Basic because of its verbosity:
Class HelloWorldApp
    Shared Sub Main()
        System.Console.WriteLine("Hello, world!")
    End Sub
End Class
Written with Dutch or Hebrew identifiers and strings, the same application would look like this:
Class HalloWereldProg
    Shared Sub Main()
        System.Console.WriteLine("Hallo, wereld!")
    End Sub
End Class
Class אפליקצית_שלום_עולם
    Shared Sub Main()
        System.Console.WriteLine("שלום, עולם!")
    End Sub
End Class
Clearly this is not enough, since only a few of the tokens (the class name and the string) can be translated; the keywords and framework names remain in English. Let’s try translating the entire code, including the language’s syntax:
Klasse HalloWereldProg
    Gedeelde Procedure Hoofd()
        Systeem.Console.SchrijfRegel("Hallo, wereld!")
    Einde Procedure
Einde Klasse
מחלקה אפליקצית_שלום_עולם
    משותפת פרוצדורה ראשית()
        מערכת.מסך.כתוב_שורה("שלום, עולם!")
    סיים פרוצדורה
סיים מחלקה
Although not perfect (the word order Shared Subroutine is natural in English, whereas the correct Hebrew form is Subroutine Shared), this solution would remove the language barrier for those who struggle with English.
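One way a translator could cope with the word-order difference is to match keyword phrases rather than single words. The sketch below is a minimal illustration under stated assumptions: the tables HEBREW_PHRASES and HEBREW_WORDS and the function translate_keywords are hypothetical, not part of any existing Structured Hebrew standard; the natural Hebrew order פרוצדורה משותפת ("procedure shared") is mapped to Visual Basic's Shared Sub; and the input is assumed to be already split into whitespace-separated words.

```python
# Hypothetical phrase table (assumption, not a standard): two-word keyword
# sequences in natural Hebrew order (noun before adjective) mapped to the
# order Visual Basic expects.
HEBREW_PHRASES = {
    ("פרוצדורה", "משותפת"): ["Shared", "Sub"],  # "procedure shared"
    ("סיים", "פרוצדורה"): ["End", "Sub"],       # "end procedure"
    ("סיים", "מחלקה"): ["End", "Class"],         # "end class"
}

# Hypothetical single-word keyword table, used when no phrase matches.
HEBREW_WORDS = {
    "מחלקה": "Class",    # "class"
    "פרוצדורה": "Sub",   # "procedure"
}


def translate_keywords(words: list[str]) -> list[str]:
    """Translate words to VB keywords, preferring two-word phrases."""
    out: list[str] = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in HEBREW_PHRASES:
            out.extend(HEBREW_PHRASES[pair])  # emit in Visual Basic's order
            i += 2
        else:
            # Unknown words (e.g. user identifiers) pass through unchanged.
            out.append(HEBREW_WORDS.get(words[i], words[i]))
            i += 1
    return out
```

For example, translate_keywords("פרוצדורה משותפת ראשית()".split()) yields ['Shared', 'Sub', 'ראשית()']. A real implementation would work on lexer tokens rather than whitespace-split words, but the idea of longest-phrase-first matching carries over.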
How can this be done?
- Language Syntax – There are already quite a few automatic Visual Basic <-> C# translators, some based on regular expressions and others on a lexical analyzer, but those convert between two different syntaxes, whereas the problem in question concerns keywords alone. Comments are a different matter, since translating them requires an actual natural-language translation engine.
- Framework – This involves creating an entire system of mappings between English terms and their equivalents in other languages, using either translation or transliteration. I pity the poor sod who has to find a translation for XML and other technical terms.
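Since only keywords and framework names need to change, a translator can work at the token level instead of parsing the whole program. The following is a minimal sketch under stated assumptions: the Dutch tables DUTCH_KEYWORDS and DUTCH_FRAMEWORK are hypothetical and cover only the Hello World example, Hoofd is assumed to map to the entry-point name Main, and the tokenizer is a simple regular expression rather than a real lexical analyzer.

```python
import re

# Hypothetical Dutch -> Visual Basic keyword table (illustrative only).
DUTCH_KEYWORDS = {
    "Klasse": "Class",
    "Gedeelde": "Shared",
    "Procedure": "Sub",
    "Einde": "End",
}

# Hypothetical framework table: localized names -> .NET names.
# "Hoofd" is assumed to map to "Main", the program entry point.
DUTCH_FRAMEWORK = {
    "Systeem": "System",
    "SchrijfRegel": "WriteLine",
    "Hoofd": "Main",
}

# One token per match: a string literal, a word, a whitespace run,
# or any other single character (dots, parentheses, ...).
TOKEN = re.compile(r'"[^"]*"|\w+|\s+|.')


def translate(source: str, keywords: dict[str, str], framework: dict[str, str]) -> str:
    """Map localized keywords and framework names to English, token by token."""
    out = []
    for tok in TOKEN.findall(source):
        if tok.startswith('"'):
            out.append(tok)  # string literals are user text: leave them alone
        elif tok in keywords:
            out.append(keywords[tok])
        elif tok in framework:
            out.append(framework[tok])
        else:
            out.append(tok)  # user identifiers and punctuation pass through
    return "".join(out)


DUTCH_SOURCE = '''Klasse HalloWereldProg
    Gedeelde Procedure Hoofd()
        Systeem.Console.SchrijfRegel("Hallo, wereld!")
    Einde Procedure
Einde Klasse'''

if __name__ == "__main__":
    print(translate(DUTCH_SOURCE, DUTCH_KEYWORDS, DUTCH_FRAMEWORK))
```

Running the script prints a Visual Basic program in which only the class name HalloWereldProg and the string literal remain Dutch. One limitation of this naive approach is that a user identifier that happens to match a localized keyword would also be translated; a real implementation would reserve the localized keywords, just as Visual Basic reserves its English ones.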
Should it be done? Yes.
Can it be done? Yes.
Will it be done? Not commercially. There is no money in this, since the customer base is already used to writing code in English.
I have been thinking for quite a while about writing a CLR language based on Structured Hebrew, but since the framework has not been, and will not be, translated, the prospects of such a language succeeding are pretty bleak.
[Update: Dutch translation has been fixed.]