I am in my third programming class in college, and we have only done C++ up to this point. My current class is on Java, and we are learning how C++ is compiled while Java uses a mixture of both compilation and interpretation. Some people are having a lot of difficulty with this idea, especially coming from JavaScript.
I understand that for a language to be interpreted means it is read line by line, converted, and executed right away; for a language to be compiled means the entire source code is translated first and then the resulting file is executed.
My questions:
But in practice (i.e. as in actual writing code and developing software), how are compiled languages and interpreted languages different?
Also, how is Java both? I haven't noticed any differences from using Java to using C++.
A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition "+" operation in your source code could be translated directly to the "ADD" instruction in machine code.
An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same "+" operation would be recognised by the interpreter at run time, which would then call its own "add(a,b)" function with the appropriate arguments, which would then execute the machine code "ADD" instruction.
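The run-time dispatch described above can be sketched in a few lines. This is a toy illustration (no real interpreter works exactly like this): the interpreter walks a tiny expression tree, recognises "+" at run time, and calls its own add function.

```python
def interp_add(a, b):
    # The interpreter's own add function; ultimately the host CPU's ADD
    # instruction runs, but only after this run-time dispatch.
    return a + b

def interpret(node):
    # 'node' is a tiny made-up AST: either a number or ("+", left, right).
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    if op == "+":
        return interp_add(interpret(left), interpret(right))
    raise ValueError(f"unknown operator: {op}")

print(interpret(("+", 1, ("+", 2, 3))))  # prints 6
```

A compiler, by contrast, would emit the ADD instruction once, ahead of time, instead of re-discovering the "+" on every execution.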
You can do anything that you can do in an interpreted language in a compiled language and vice-versa - they are both Turing complete. Both however have advantages and disadvantages for implementation and use.
I'm going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:
- Faster performance by directly using the native code of the target machine
- Opportunity to apply quite powerful optimisations during the compile stage
And here are the advantages of interpreted languages:
- Easier to implement (writing good compilers is very hard!!)
- No need to run a compilation stage: can execute code directly "on the fly"
- Can be more convenient for dynamic languages
Note that modern techniques such as bytecode compilation add some extra complexity - what happens here is that the compiler targets a "virtual machine" which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
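CPython makes the bytecode idea concrete: it compiles source code to instructions for its virtual machine, and the standard-library dis module can show them. (Exact opcode names vary between Python versions; addition shows up as BINARY_ADD or BINARY_OP depending on the version.)

```python
# CPython compiles source to virtual-machine bytecode before
# interpreting it; `dis` disassembles that bytecode.
import dis

def add(a, b):
    return a + b

dis.dis(add)  # shows VM instructions such as BINARY_ADD / BINARY_OP
```

These instructions target the CPython VM, not the hardware, which is exactly the "virtual machine" indirection described above.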
A language itself is neither compiled nor interpreted, only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpreter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?
Other languages, which are compiled as well as interpreted, are Scala, Haskell or Ocaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.
So generally categorizing languages by "compiled" and "interpreted" doesn't make much sense.
There's (to my knowledge) no such thing as an interpreted "language" or a compiled "language".
Languages specify the syntax and meaning of the code's keywords, flow constructs and various other things, but I am aware of no language whose spec dictates that it must be compiled or that it must be interpreted.
Now if your question is when to use a language compiler vs. a language interpreter, it really comes down to the pros and cons of each and the purpose of the project.
For instance, you may use the JRuby compiler for easier integration with Java libraries instead of the MRI Ruby interpreter. There are likely also reasons to use the MRI interpreter over JRuby, but I'm not familiar enough with the language to speak to them.
Touted benefits of interpreters:
- No compilation step means the time from editing code to testing the app is shorter
- No need to generate binaries for multiple architectures, because the interpreter handles the architecture abstraction (though you may still need to worry about the scripts handling integer sizes correctly, just not about binary distribution)
Touted benefits of compilers:
- Compiled native code does not have the overhead of an interpreter and is therefore usually more efficient on time and space
- Interoperability is usually better: compiled code can expose a standard FFI, whereas the only way to interoperate in-process with scripts is through their interpreter
- Ability to support architectures the interpreter hasn't been compiled for (such as embedded systems)
However, I would bet in 90% of cases it goes something more like this: I want to write this software in blub because I know it well and it should do a good job. I'll use the blub interpreter (or compiler) because it is the generally accepted canonical method for writing software in blub.
So the TL;DR is basically: compare the interpreters and compilers available for your language on a case-by-case basis for your particular use case.
Also, FFI: Foreign Function Interface, in other words an interface for interoperating with other languages. More reading at Wikipedia.
An important point here is that many language implementations actually do some sort of hybrid of both. Many commonly used languages today work by compiling a program into an intermediate format such as bytecode, and then executing that in an interpreter. This is how Java, C#, Python, Ruby, and Lua are typically implemented. In fact, this is arguably how most languages in use today are implemented. So, the fact is, languages today both interpret and compile their code. Some of these languages have an additional JIT compiler to convert the bytecode to native code for execution.
In my opinion, we should stop talking about interpreted and compiled languages because they are no longer useful categories for distinguishing the complexities of today's language implementations.
When you ask about the merits of interpreted and compiled languages, you probably mean something else. You may be asking about the merit of static/dynamic typing, the merits of distributing native executables, the relative advantages of JIT and AOT compilation. These are all issues which get conflated with interpretation/compilation but are different issues.
It's important to remember that interpreting and compiling are not just alternatives to each other. In the end, any program that you write (including one compiled to machine code) gets interpreted. Interpreting code simply means taking a set of instructions and returning an answer.
Compiling, on the other hand, means converting a program in one language to another language. Usually it is assumed that when compilation takes place, the code is compiled to a "lower-level" language (eg. machine code, some kind of VM bytecode, etc.). This compiled code is still interpreted later on.
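As a toy illustration of "compile to a lower-level language, then interpret the result" (all names here are made up for this sketch): a tiny expression tree is compiled into stack-machine instructions, and a separate loop then interprets those instructions.

```python
def compile_expr(node, code):
    # Translate ("+", left, right) trees into PUSH/ADD instructions
    # for a made-up stack machine.
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        _, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append(("ADD", None))
    return code

def run(code):
    # The compiled code does not run by magic: this stack machine is
    # an interpreter for the lower-level language.
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:  # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

program = compile_expr(("+", 1, ("+", 2, 3)), [])
print(run(program))  # prints 6
```

Note that compilation happens once, but the resulting instruction list is still interpreted every time it runs.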
With regards to your question of whether there is a useful distinction between interpreted and compiled languages, my personal opinion is that everyone should have a basic understanding of what is happening to the code they write during interpretation. So, if their code is being JIT compiled, or bytecode-cached, etc., the programmer should at least have a basic understanding of what that means.
The distinction is deeply meaningful because compiled languages restrict the semantics in ways that interpreted languages do not necessarily. Some interpretive techniques are very hard (practically impossible) to compile.
Interpreted code can do things like generate code at run time, and give that code visibility into lexical bindings of an existing scope. That's one example. Another is that interpreters can be extended with interpreted code which can control how code is evaluated. This is the basis for ancient Lisp "fexprs": functions that are called with unevaluated arguments and decide what to do with them (having full access to the necessary environment to walk the code and evaluate variables, etc). In compiled languages, you can't really use that technique; you use macros instead: functions that are called at compile time with unevaluated arguments, and translate the code rather than interpreting.
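Python's built-in eval gives a weak flavour of this kind of run-time code generation (it is far less powerful than Lisp fexprs, but it does show code being constructed and evaluated at run time against a chosen environment):

```python
def make_adder(n):
    # The source text is built and evaluated at run time; the
    # environment dict gives the generated code visibility into `n`.
    source = "lambda x: x + n"
    return eval(source, {"n": n})

add5 = make_adder(5)
print(add5(3))  # prints 8
```

A compiler for a language without eval would have no natural place to put this: the source text does not exist until the program is already running.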
Some language implementations are built around these techniques; their authors reject compiling as being an important goal, and rather embrace this kind of flexibility.
Interpreting will always be useful as a technique for bootstrapping a compiler. For a concrete example, look at CLISP (a popular implementation of Common Lisp). CLISP has a compiler that is written in itself. When you build CLISP, that compiler is being interpreted during the early building steps. It is used to compile itself, and then once it is compiled, compiling is then done using the compiled compiler.
Without an interpreter kernel, you would need to bootstrap with some existing Lisp, like SBCL does.
With interpretation, you can develop a language from absolute scratch, starting with assembly language. Develop the basic I/O and core routines, then write an eval, still in machine language. Once you have eval, write in the high-level language; the machine-code kernel does the evaluating. Use this facility to extend the library with many more routines and to write a compiler as well. Use the compiler to compile those routines and the compiler itself.
Interpretation: an important stepping stone in the path leading to compilation!
The answer to your question:
Can every language be categorized as either compiled or interpreted?
Is "No", but not for the reason you think it is. The reason is not that there is a third missing category, the reason is that the categorization itself is nonsensical.
There is no such thing as a "compiled language" or an "interpreted language". Those terms are not even wrong, they are nonsensical.
Programming languages are sets of abstract mathematical rules, definitions, and restrictions. Programming languages aren't compiled or interpreted. Programming languages just are. [Credit goes to Shriram Krishnamurthi who said this in an interview on Channel 9 years ago (at about 51:37-52:20).]
In fact, a programming language can perfectly well exist without having any interpreter or compiler! For example, Konrad Zuse's Plankalkül, which he designed in the 1940s, was never implemented during his lifetime. You could still write programs in it, you could analyze those programs, reason about them, prove properties about them … you just couldn't execute them. (Well, actually, even that is wrong: you can of course run them in your head or with pen and paper.)
Compilation and interpretation are traits of the compiler or interpreter (duh!), not the programming language. Compilation and interpretation live on a different level of abstraction than programming languages: a programming language is an abstract concept, a specification, a piece of paper. A compiler or interpreter is a concrete piece of software (or hardware) that implements that specification. If English were a typed language, the terms "compiled language" and "interpreted language" would be type errors. [Again, credit to Shriram Krishnamurthi.]
Every programming language can be implemented by a compiler. Every programming language can be implemented by an interpreter. Many modern mainstream programming languages have both interpreted and compiled implementations. Many modern mainstream high-performance programming language implementations have both compilers and interpreters.
There are interpreters for C and for C++. On the other hand, every single current major mainstream implementation of ECMAScript, PHP, Python, Ruby, and Lua has a compiler. The original version of Google's V8 ECMAScript engine was a pure native machine code compiler. (They went through several different designs, and the current version does have an interpreter, but for many years, it didn't have one.) XRuby and Ruby.NET were purely compiled Ruby implementations. IronRuby started out as a purely compiled Ruby implementation, then added an interpreter later in order to improve performance. Opal is a purely compiled Ruby implementation.
Some people might say that the terms "interpreted language" or "compiled language" make sense to apply to programming languages that can only be implemented by an interpreter or by a compiler. But, no such programming language exists. Every programming language can be implemented by an interpreter and by a compiler.
For example, you can automatically and mechanically derive a compiler from an interpreter using the Second Futamura Projection. It was first described by Prof. Yoshihiko Futamura in his 1971 paper Partial Evaluation of Computation Process – An approach to a Compiler-Compiler (Japanese), an English version of which was republished 28 years later. It uses Partial Evaluation, by partially evaluating the partial evaluator itself with respect to the interpreter, thus yielding a compiler.
But even without such complex highly-academic transformations, you can create something that is functionally indistinguishable from compilation in a much simpler way: just bundle together the interpreter with the program to be interpreted into a single executable.
Another possibility is the idea of a "meta-JIT". (This is related in spirit to the Futamura Projections.) This is e.g. used in the RPython framework for implementing programming languages. In RPython, you write an interpreter for your language, and then the RPython framework will JIT-compile your interpreter while it is interpreting the program, thus producing a specialized compiled version of the interpreter which can only interpret that one single program – which is again indistinguishable from compiling that program. So, in some sense, RPython dynamically generates JIT compilers from interpreters.
The other way around, you can wrap a compiler into a wrapper that first compiles the program and then directly executes it, making this wrapped compiler indistinguishable from an interpreter. This is, in fact, how the Scala REPL, the C♯ REPL (both in Mono and .NET), the Clojure REPL, the interactive GHC REPL, and many other REPLs are implemented. They simply take one line / one statement / one expression, compile it, immediately run it, and print the result. This mode of interacting with the compiler is so indistinguishable from an interpreter, that some people actually use the existence of a REPL for the programming language as the defining characteristic of what it means to be an "interpreted programming language".
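A minimal sketch of that REPL pattern, using Python's built-in compile() and eval(): each line is genuinely compiled (to CPython bytecode) and then immediately executed, which from the user's point of view is indistinguishable from interpretation.

```python
def repl_eval(line, env):
    # Compile one expression to bytecode, then run it right away.
    code = compile(line, "<repl>", "eval")
    return eval(code, env)

env = {}
print(repl_eval("2 + 3", env))        # prints 5
env["x"] = repl_eval("10 * 4", env)
print(repl_eval("x + 2", env))        # prints 42
```

A real REPL adds a read loop, statement handling, and error reporting around exactly this compile-then-execute core.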
Note, however, that you can't run a program without an interpreter. A compiler simply translates a program from one language to another. But that's it. Now you have the same program, just in a different language. The only way to actually get a result of the program is to interpret it. Sometimes, the language is an extremely simple binary machine language, and the interpreter is actually hard-coded in silicon (and we call it a "CPU"), but that's still interpretation.
Some people say that you can call a programming language "interpreted" if the majority of its implementations are interpreters. Well, let's just look at a very popular programming language: ECMAScript. There are a number of production-ready, widely-used, high-performance mainstream implementations of ECMAScript, and every single one of them includes at least one compiler, some even multiple compilers. So, according to this definition, ECMAScript is clearly a compiled language.
You might also be interested in this answer of mine, which explains the differences and the different means of combining interpreters, JIT compilers and AOT compilers and this answer dealing with the differences between an AOT compiler and a JIT compiler.
It is possible to categorize language implementations to some degree. In general, we have the distinction between
- compilers and
- interpreters (if the interpreter interprets a language that is not meant for humans, it is also often called a virtual machine)
Within the group of compilers, we have the temporal distinction when the compiler is run:
- Just-In-Time compilers run while the program is executing
- Ahead-Of-Time compilers run before the program starts
And then we have implementations which combine interpreters and compilers, or combine multiple compilers, or (much more rare) multiple interpreters. Some typical combinations are
- mixed-mode execution engines which combine an interpreter and a JIT compiler that both process the same program at the same time (examples: Oracle HotSpot JVM, IBM J9 JVM)
- multi-phase [I invented that term; I don't know of a widely-used one] execution engines, where the first phase is a compiler that compiles the program to a language more suitable for the next phase, and then a second phase which processes that language. (There could be more phases, but two is typical.) As you can probably guess, the second phase can again use different implementation strategies:
- an interpreter: this is a typical implementation strategy. Often, the language that is interpreted is some form of bytecode that is optimized for "interpretability". Examples: CPython, YARV (pre-2.6), Zend Engine
- a compiler, which makes this a combination of two compilers. Typically, the first compiler translates the language into some form of bytecode that is optimized for "compilability" and the second compiler is an optimizing compiler that is specific to the target platform
- a mixed-mode VM. Examples: YARV post-2.6, Rubinius, SpiderMonkey, SquirrelFish Extreme, Chakra
But, there are still others. Some implementations use two compilers instead of a compiler and an interpreter to get the same benefits as a mixed-mode engine (e.g. for its first few years, V8 worked this way).
RPython combines a bytecode interpreter and a JIT, but the JIT does not compile the user program, it compiles the bytecode interpreter while it interprets the user program! The reason for this is that RPython is a framework for implementing languages, and in this way, a language implementor only has to write the bytecode interpreter and gets a JIT for free. (The most well-known user of RPython is of course PyPy.)
The Truffle framework interprets a language-agnostic AST, but at the same time it specializes itself to the specific AST, which is kind-of like compilation but also kind-of not. The end result is that Truffle can execute the code extremely fast, without knowing too much about the language-specifics. (Truffle is also a generic framework for language implementations.) Because the AST is language-agnostic, you can mix and match multiple languages in the same program, and Truffle is able to perform optimizations across languages, such as inlining a Ruby method into an ECMAScript function etc.
Macros and eval are sometimes cited as features that cannot possibly be compiled. But that is wrong. There are two simple ways of compiling macros and eval. (Note that for the purpose of compilation, macros and eval are somewhat dual to each other, and can be handled using similar means.)
- Using an interpreter: for macros, you embed an interpreter into the compiler. For eval, you embed an interpreter into the compiled program or into the runtime support libraries.
- Using a compiler: for macros, you compile the macro first, then embed the compiled macro into your compiler and compile the program using this "extended" compiler. For eval, you embed a compiler into the compiled program or into the runtime support libraries.
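The second eval strategy (embedding a compiler into the runtime) is exactly what CPython does: its bytecode compiler ships with the runtime, so even a program loaded from a compiled .pyc file can compile and run new source text.

```python
# Compile new source text at run time, then execute the resulting
# bytecode; the definitions land in `namespace`.
namespace = {}
source = "def square(x):\n    return x * x\n"
exec(compile(source, "<generated>", "exec"), namespace)
print(namespace["square"](7))  # prints 49
```

The "compiled" program and the compiler coexist in the same process, which is why eval does not force an implementation to be an interpreter.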
If we are being pedantic, there is no such thing as a compiled or interpreted language, since any language could in principle be implemented either by a compiler or an interpreter. However, most languages follow a relatively consistent implementation strategy. C++ is almost always compiled to native code. Python is almost always run via a bytecode interpreter. Java is almost always run via a JIT compiler. So, if we don't insist on obtuse pedantry, it does make sense to talk about compiled or interpreted languages.
However, language implementation strategy does not neatly fit into the compiled/interpreted dichotomy. Essentially no languages are strictly interpreted, executed directly from the source. This would be very slow. Instead, virtually all "interpreted" language implementations compile the source into something (often bytecode) which can be more efficiently executed. On top of this, some implementations JIT-compile that bytecode into native code at run time. Even languages that we think of as being compiled often contain some amount of interpretation. For example, printf in C is effectively an interpreter of the format string.
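The printf point can be made concrete with a simplified sketch (real printf handles many more directives): the format string is a tiny program, and the formatting routine is its interpreter, walking the string and dispatching on each directive.

```python
def tiny_printf(fmt, *args):
    # A toy interpreter for a printf-style format "language":
    # %d, %s and %% only.
    out, args = [], list(args)
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            if spec == "d":
                out.append(str(int(args.pop(0))))
            elif spec == "s":
                out.append(str(args.pop(0)))
            elif spec == "%":
                out.append("%")
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)

print(tiny_printf("%s is %d%%", "coverage", 97))  # prints: coverage is 97%
```

Even when this routine is compiled to native code, the format string itself is still being interpreted on every call.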
So, I would argue that it doesn't make sense to try to categorize languages into compiled or interpreted. Pretty much any language is some degree of hybrid of the two approaches. (And yes, if we are being pedantic, it is language implementations, not languages, which are compiled/interpreted.)
What’s the difference between compiled and interpreted language?
The difference is not in the language; it is in the implementation.
Having got that out of my system, here's an answer:
In a compiled implementation, the original program is translated into native machine instructions, which are executed directly by the hardware.
In an interpreted implementation, the original program is translated into something else. Another program, called "the interpreter", then examines "something else" and performs whatever actions are called for. Depending on the language and its implementation, there are a variety of forms of "something else". From more popular to less popular, "something else" might be
- Binary instructions for a virtual machine, often called bytecode, as is done in Lua, Python, Ruby, Smalltalk, and many other systems (the approach was popularized in the 1970s by the UCSD P-system and UCSD Pascal)
- A tree-like representation of the original program, such as an abstract-syntax tree, as is done for many prototype or educational interpreters
- A tokenized representation of the source program, similar to Tcl
- The characters of the source program, as was done in MINT and TRAC
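The tree-like representation is easy to see with Python's standard ast module, which parses source into an abstract syntax tree that a prototype interpreter could walk directly (the tree can also be handed to the bytecode compiler):

```python
import ast

# Parse source text into an abstract syntax tree.
tree = ast.parse("1 + 2 * 3", mode="eval")
print(ast.dump(tree))  # shows the BinOp nodes of the tree

# The same tree can be compiled to bytecode and executed.
print(eval(compile(tree, "<ast>", "eval")))  # prints 7
```

One parse, two downstream options: walk the tree in an interpreter, or compile it further, which is exactly the choice discussed above.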
One thing that complicates the issue is that it is possible to translate (compile) bytecode into native machine instructions. Thus, a successful interpreted implementation might eventually acquire a compiler. If the compiler runs dynamically, behind the scenes, it is often called a just-in-time compiler or JIT compiler. JITs have been developed for Java, JavaScript, Lua, and I daresay many other languages. At that point you can have a hybrid implementation in which some code is interpreted and some code is compiled.
~~Java and JavaScript are a fairly bad example to demonstrate this difference, because both are interpreted languages. Java (interpreted) and C (or C++) (compiled) might have been a better example.~~
Why the struck-through text? As this answer correctly points out, interpreted/compiled is about a concrete implementation of a language, not about the language per se. While statements like "C is a compiled language" are generally true, there's nothing to stop someone from writing a C language interpreter. In fact, interpreters for C do exist.
Basically, compiled code can be executed directly by the computer's CPU. That is, the executable code is specified in the CPU's "native" language: machine code (whose human-readable form is assembly language).
The code of interpreted languages, however, must be translated at run time from whatever form it is in into CPU machine instructions. This translation is done by an interpreter.

Another way of putting it: with interpreted languages, the code is translated to machine instructions step by step while the program is executing, whereas with compiled languages the code was translated before the program started.