All the programming languages I know about use English keywords like 'for', 'while', 'with', 'class', etc. I've never seen a programming language with keywords in a non-English language. Is it just because many of them were first developed in the English-speaking world (US, mostly)? Is it weird for non-English speakers (especially people who speak a language that doesn't share the same alphabet, e.g. Chinese) to have to memorise all these foreign words while learning programming? Are there any programming languages based on non-English keywords?
If your system allows for three-letter codes, the ISO 639-2 standard says:
If there is language content, but the specific language cannot be determined, a special identifier is provided by ISO 639-2:
und (Undetermined)
The language codes you're using look like ISO 639-1. There is no ISO 639-1 code for an indeterminate or unknown language; however, "xx" could be used as a reasonable placeholder.
I think the primary reason for this is the same that motivates any other lingua franca: the desire to exchange ideas across different groups.
We can group the places where natural language is used in programming into a few different categories:
- Keywords - terms baked into the syntax of the language ("if", "return", "while", etc). These are what linguists would call a "closed word class"; they may be reserved from use as identifiers, or marked out with stropping.
- Built-in identifiers - names of functions, namespaces, classes, etc, which are part of the "standard library" of the language. These are generally an "open class", in the sense that new items are added reasonably regularly, and can move somewhat freely between this and the next category.
- User-defined identifiers - this is by far the largest list, and also the one over which language designers have the least direct influence. The main limitation placed is the available character set - if identifiers can only use the Latin letters in ASCII, some languages will be harder to use (though not impossible).
- Error messages and other output. This generally depends not on the design of the language, but its implementation. These may be the same thing (e.g. PHP has only one widely used implementation), or entirely separate and widely varied (e.g. C compilers).
- Documentation. On the face of it, this is the easiest to translate, and for example the PHP manual is currently "fully" translated into 9 languages.
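As a rough illustration of the first and third categories, here is a minimal Python sketch. It assumes Python 3, where identifiers may contain non-ASCII letters; the function and variable names are invented for the example and are not taken from any real library.

```python
import keyword
import math

# Keywords are a closed set fixed by the language version.
is_reserved = keyword.iskeyword("while")        # an actual keyword
is_reserved_es = keyword.iskeyword("mientras")  # Spanish for "while": an ordinary name

# User-defined identifiers are open-ended and can be written in the
# author's own language (hypothetical names, Russian and French here).
def площадь_круга(радиус):
    """Area of a circle; parameter name means 'radius'."""
    return math.pi * радиус ** 2

longueur = len("bonjour")  # 'longueur' means 'length'
```

Only the keywords and the built-in `len` stay in English here; everything the programmer named is in another language.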
Of these, the vast majority of code that looks like language is user-defined identifiers. These can be, and regularly are, written in the user's first language - except when they're shared. If a new online service wants to publish an SDK on a package repository such as CPAN, NPM, Packagist, Nuget, etc, they need to define a public API for that package, and that involves choosing identifiers. If the majority of packages on that repository use identifiers in the same language, that is a lingua franca in exactly the same way as the trade languages of the ancient Mediterranean, or the Latin of Renaissance scientists.
This then leads back to the choice of built-in identifiers. The language designer could, in principle, provide multiple aliases for every built in function, giving the user a choice of languages. However, this becomes a wasted effort if the user community picks one as the lingua franca for sharing code samples and libraries.
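To make the alias idea concrete, here is a small sketch in Python. The Spanish names and the `install_aliases` helper are hypothetical, chosen only for illustration; no real language ships this table.

```python
import builtins

# Hypothetical Spanish aliases for a few built-ins (illustration only).
SPANISH_ALIASES = {
    "imprimir": "print",
    "longitud": "len",
    "ordenado": "sorted",
}

def install_aliases(namespace, aliases):
    """Bind each localized name to the existing built-in object."""
    for local_name, builtin_name in aliases.items():
        namespace[local_name] = getattr(builtins, builtin_name)

# Install into a plain dict standing in for a module namespace.
ns = {}
install_aliases(ns, SPANISH_ALIASES)
```

This only works for built-in identifiers, which are ordinary names. Keywords such as `while` cannot be aliased this way, because they are part of the grammar rather than the namespace.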
As we get deeper into the core of the language, and particularly with keywords, we get a similar effect between languages: if you want new users to pick up your language, there is an advantage to them recognising parts of it. This leads to certain terms becoming somewhat standardised in their meaning - the lingua franca is based on a particular natural language, but becomes its own thing. For instance, the "return" keyword originated from the intransitive English verb meaning "go back", but has acquired programming-specific meanings: "returned value", "return type", and so on.
To circle back to error messages and documentation, like private identifiers these can be and often are translated; but they will still have to incorporate large parts of the lingua franca, to mention keywords and identifiers which are not translated.
All of that leaves us with a few situations where not using the lingua franca would be reasonable:
- Where the language has no "open word class". For instance, Microsoft Excel has localized names for all its built-in formula functions.
- Where the language isn't intended for writing code to share. The biggest example of this is educational languages: the main aim is to make it easy for users to write their own programs, to learn programming concepts.
- Where the audience for the language is limited to a particular community. The macro language of an internal tool might exclusively use the local language of its developers, although the draw of the lingua franca will still be there from exposure to other languages.
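The spreadsheet case works because the set of function names is fixed, so a lookup table is enough to move between languages. The sketch below is an assumption about how such a mapping could be applied, not a description of how Excel does it; the table is a tiny English-to-German subset, `localize_formula` is a hypothetical helper, and locale-specific argument separators are ignored.

```python
import re

# Small illustrative subset of English -> German function names.
EN_TO_DE = {"SUM": "SUMME", "IF": "WENN", "AVERAGE": "MITTELWERT"}

def localize_formula(formula, table):
    """Replace function names (a word directly followed by '(') using table."""
    def swap(match):
        name = match.group(1)
        # Unknown names are left untouched.
        return table.get(name.upper(), name) + "("
    return re.sub(r"([A-Za-z_][A-Za-z0-9_.]*)\(", swap, formula)

localized = localize_formula("=IF(SUM(A1:A3)>10,AVERAGE(A1:A3),0)", EN_TO_DE)
# localized == "=WENN(SUMME(A1:A3)>10,MITTELWERT(A1:A3),0)"
```

A language with user-defined functions could not be handled this simply, since new names would keep appearing that the table does not cover.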
Throughout this, I've deliberately talked about the lingua franca in the abstract, because the fact that it is based on English is largely historical accident: English was the language in the UK and USA where major early computer science work happened; and it's also a lingua franca in other contexts, meaning it is accessible to a lot of users as a second language.
The influence of momentum
The first "compiled" programming language was Autocode, developed at the University of Manchester in England in 1952. The first widely used programming language, FORTRAN, was developed at IBM in 1957, in the USA. The first example of a non-English programming language I can find is the ALGOL 68 standard, which was published in several European languages, and eventually Japanese.
By this time, however, the USA and UK had already established themselves as the pioneers of programming language development, with languages like LISP, BASIC, COBOL, and ALGOL 58. From there, every notable language was in English because the people who made them spoke (possibly amongst other things) English. If you want your language to be both useful and popular (as many do), it makes sense to target the demographic with the largest chance of using your language; for a long time, this was English[1], and even as countries like India and China have become large centers of development in the tech world, this continues to be English. I have no doubt that at some point we will see a popular non-English language, but momentum rolls on.
[1]: By "English", I mean "People willing to program in English"