Internationalization of Programming

Research has shown that the language spoken around babies at a very early age is, in most cases, the language they will prefer over any other they hear or speak for the rest of their lives. Thought processes are language-based, and this is most evident when one speaks a language that is not one's native tongue. For example, a native English speaker trying to speak Hebrew will mistake the gender of a Hebrew noun they know, since they are not used to nouns having gender, much more easily than they would mistake the conjugation of a verb in English. However, almost all mainstream programming languages and frameworks are based on English, since it is spoken internationally and serves as the lowest common denominator. This forces programmers who are not native English speakers to either translate terms from their native tongue to English or, when no appropriate translation is found, transliterate them, which mostly results in broken English in the codebase.

For almost all non-native English speakers confronted with their first programming language, this poses a serious deterrent: they have to learn both the programming language and the English terms used in it. Take the word char as one of the simplest examples: many people, including myself, were taught to pronounce it tch-ar rather than k-ar (both rhyming with far), even though the latter form follows directly from the fact that it is an abbreviation of character. Out of all the programmers I know, most of whom have years of experience in the field, about half still pronounce it incorrectly.

This problem was addressed in Israel with the introduction of a pseudo-code language for students called Ivrit Mivnit (עברית מבנית, literally: Structured Hebrew), which is similar in concept to Visual Basic in that it is very verbose. However, except for a single compiler implementation that I know of, written by a friend and myself as a school project along with our own attempt to standardize the syntax, Structured Hebrew remains, quite literally, on paper.

    Today Oren Eini, better known as Ayende Rahien, wrote about why you can’t code in Hebrew using C#, especially under Visual Studio. He describes the unreadable mix of Hebrew, English and mathematical notation that resulted when he tried it, mostly because Hebrew and English are written in opposite directions. He also notes IntelliSense’s faults when applied to right-to-left tokens.
    One can only imagine trying to write Chinese in its traditional form (top to bottom, right to left). One can only imagine, because it is simply not possible.
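A rough way to see the mix Ayende describes is to label every character of a line of code by its writing direction. The sketch below is in Python and uses the standard `unicodedata.bidirectional` function; the helper names `strong_direction` and `direction_runs` are made up for this illustration. It does not implement the Unicode Bidirectional Algorithm, it only shows how often a single line switches between left-to-right text, right-to-left text and direction-less punctuation.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch):
    """Classify one character as 'L', 'R' or 'N' (no strong direction)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"   # strongly left-to-right, e.g. Latin letters
    if bidi in ("R", "AL"):
        return "R"   # strongly right-to-left, e.g. Hebrew and Arabic letters
    return "N"       # digits, spaces, dots, quotes, brackets: decided by context


def direction_runs(line):
    """Split a line into consecutive runs that share the same direction label."""
    return [(kind, "".join(chars)) for kind, chars in groupby(line, key=strong_direction)]


if __name__ == "__main__":
    for line in ('System.Console.WriteLine("Hello, world!")',
                 'מערכת.מסך.כתוב_שורה("שלום, עולם!")'):
        # Print only the labels; printing the runs themselves would be
        # reordered again by whatever terminal displays them.
        print(" ".join(kind for kind, _ in direction_runs(line)))
```

Every `N` run (a dot, a parenthesis, a quotation mark) has no direction of its own, so an editor has to guess where to draw it from the neighbouring text. In an all-English line that guess is trivial; once right-to-left identifiers sit next to left-to-right ones, it is not.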

    It has been standard practice for years at many companies to internationalize their applications by translating every single text string into numerous languages, some even creating a complete right-to-left interface, to complement the native left-to-right interface, for languages such as Arabic and Hebrew. This notion can be adapted to language and framework designs, allowing international versions to be created.

    Let’s take an example of a Hello World program, originally written in Visual Basic for verbosity’s sake:

    Class HelloWorldApp
        Shared Sub Main()
            System.Console.WriteLine("Hello, world!")
        End Sub
    End Class

    This application, when written with Dutch or Hebrew words and lettering, would look like this:

    Class HalloWereldProg
        Shared Sub Main()
            System.Console.WriteLine("Hallo, wereld!")
        End Sub
    End Class

    Class אפליקצית_שלום_עולם
        Shared Sub Main()
            System.Console.WriteLine("שלום, עולם!")
        End Sub
    End Class

    It is pretty evident that this will not work, since only a few of the tokens, namely the class name and the string literal, can be translated. Let’s try to translate the entire code, including the language’s syntax:

    Klasse HalloWereldProg
        Gedeelde Procedure Hoofd()
            Systeem.Console.SchrijfRegel("Hallo, wereld!")
        Einde Procedure
    Einde Klasse

    מחלקה אפליקצית_שלום_עולם
        משותפת פרוצדורה ראשית()
            מערכת.מסך.כתוב_שורה("שלום, עולם!")
        סיים פרוצדורה
    סיים מחלקה

    Although not perfect (the word order Shared Subroutine is valid in English, whereas the valid Hebrew form is Subroutine Shared), this solution would eliminate the language barrier for those who struggle with English.

    How can this be done?

    • Language Syntax – There are already quite a few automatic Visual Basic <-> C# translators out there, some using regular expressions, others a lexical analyzer. Those translate between two different syntaxes, however, while the problem in question concerns keywords alone. Comments are a different matter, as they require an actual natural-language translation engine.
    • Framework – This involves creating an entire system of translations between English terms and their counterparts in other languages, requiring either translation or transliteration. I pity the poor sod who needs to find a translation for XML and other technical terms.
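To make the two points above concrete, here is a minimal sketch in Python of a keyword-level translator. It is only an illustration under stated assumptions: the `KEYWORDS` and `FRAMEWORK` tables, the `TOKEN` pattern and the `translate_line` function are all hypothetical, and the Dutch words are simply the ones used in the example above. It splits a line into string literals, comments and words, and swaps only the words it finds in a table.

```python
import re

# Hypothetical Dutch -> Visual Basic keyword table (words taken from the example above).
KEYWORDS = {
    "Klasse": "Class",
    "Gedeelde": "Shared",
    "Procedure": "Sub",
    "Einde": "End",
}

# Framework names need their own, far larger table; these are only the ones in the example.
FRAMEWORK = {
    "Hoofd": "Main",
    "Systeem": "System",
    "SchrijfRegel": "WriteLine",
}

# One alternative per token kind. REM-style comments are not handled in this sketch.
TOKEN = re.compile(r'''
    (?P<string>"(?:[^"]|"")*")   # string literal; a doubled quote is an escaped quote
  | (?P<comment>'.*)             # apostrophe comment, runs to the end of the line
  | (?P<word>\w+)                # keyword or identifier; \w also matches Hebrew letters
''', re.VERBOSE)


def translate_line(line, table):
    """Replace every word found in `table`; leave strings and comments untouched."""
    def swap(match):
        if match.lastgroup == "word":
            return table.get(match.group(), match.group())
        return match.group()
    return TOKEN.sub(swap, line)


if __name__ == "__main__":
    table = {**KEYWORDS, **FRAMEWORK}
    source = [
        "Klasse HalloWereldProg",
        "Gedeelde Procedure Hoofd()",
        'Systeem.Console.SchrijfRegel("Hallo, wereld!")',
        "Einde Procedure",
        "Einde Klasse ' Einde van de klasse",
    ]
    for line in source:
        print(translate_line(line, table))
```

A word-for-word table like this cannot fix word order, so the Hebrew Subroutine Shared case would still need a real parser, and the comment in the last line comes out untranslated, which is exactly the part that needs a natural-language translation engine.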

    Should it be done? Yes.
    Can it be done? Yes.
    Will it be done? Not commercially. There is no money in this, since the customer base is already used to writing code in English.

    I have been thinking for quite a while of writing a language for the CLR based on Structured Hebrew, but since the framework has not been and will not be translated, the prospect of such a language succeeding is pretty bleak.

    [Update: Dutch translation has been fixed.]


    3 thoughts on “Internationalization of Programming”

    1. I altered the Dutch translation a bit. You didn’t use the correct words.
      Klasse HelloWereldProg
      Gedeelde Procedure Hoofd()
      Systeem.Console.SchrijfRegel("Hello, wereld!")
      Einde Procedure
      Einde Klasse
      1. App -> Prog, as in Prog(ramma), which I prefer over App(likatie)
      2. Gedeeld -> Gedeelde
      3. Lijn -> Regel, because the translation should be based on (Text)Line.
      4. Eind -> Einde
      btw. Very weird to use both Dutch and English in the same sentence.

    2. Where I work, char is usually pronounced /xAr/ (khar), but that’s probably because ch in Dutch is pronounced that way. Pretty close to kar though.
      I remember a Basic clone in the eighties that was completely in Dutch. Not as a serious product, mind you, more like a mental exercise I guess.
      Voor i = 1 Tot 3 : Schrijf i : Volgende
      Als x > 0 Dan : GaNaar Einde : Anders : KeerTerug
