This flag enables Profile-Guided Optimization (PGO) and Link-Time Optimization (LTO).
Both are expensive optimizations that slow down the build process but yield a significant speed boost (around 10-20% from what I remember reading).
The discussion of what these exactly do is beyond my knowledge and probably too broad for a single question. Either way, you can read a bit about LTO from the docs on GCC (which has an implementation of it) and get a start on PGO by reading its wiki page.
Also, see the relevant issues opened on the Python Bug Tracker that added these:
- Issue 24915: Profile Guided Optimization improvements (better training, llvm support, etc) (Added PGO.)
- Issue 25702: Link Time Optimizations support for GCC and CLANG (Added LTO.)
- Issue 26359: CPython build options for out-of-the box performance (Adds the --enable-optimizations flag to the configure script which enables the aforementioned optimizations.)
As pointed out by @Shuo in a comment and stated in Issue 28032, LTO isn't always enabled with the --enable-optimizations flag. Some platforms (depending on the supported version of gcc) will disable it in the configuration script.
Future versions will probably always enable LTO alongside this flag, though, so it's pretty safe to talk about them both here.
Answer from Dimitris Fasarakis Hilliard on Stack Overflow
What is the use of Python's basic optimizations mode? (python -O) - Stack Overflow
Another use for the -O flag is that the value of the __debug__ builtin variable is set to False.
So, basically, your code can have a lot of "debugging" paths like:
if __debug__:
    # output all your favourite debugging information
    # and then more
which, when running under -O, won't even be included as bytecode in the .pyo file; a poor man's C-ish #ifdef.
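A quick way to see the `__debug__` switch in action is to run the same one-liner with and without `-O` in a subprocess; this is just an illustrative sketch, not part of the original answer:

```python
import subprocess
import sys

# A tiny script whose output depends on __debug__, which is True normally
# and False when the interpreter is started with -O.
snippet = "print('debug' if __debug__ else 'optimized')"

normal = subprocess.run([sys.executable, "-c", snippet],
                        capture_output=True, text=True).stdout.strip()
optimized = subprocess.run([sys.executable, "-O", "-c", snippet],
                           capture_output=True, text=True).stdout.strip()

print(normal)     # debug
print(optimized)  # optimized
```

Because the `if __debug__:` branch is dead code under `-O`, the compiler drops it entirely rather than testing the flag at runtime.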
Remember that docstrings are dropped only when the flag is -OO.
On stripping assert statements: this is a standard option in the C world, where many people believe part of the definition of ASSERT is that it doesn't run in production code. Whether stripping them out or not makes a difference depends less on how many asserts there are than on how much work those asserts do:
def foo(x):
    assert x in huge_global_computation_to_check_all_possible_x_values()
    # ok, go ahead and use x...
Most asserts are not like that, of course, but it's important to remember that you can do stuff like that.
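You can watch the stripping happen directly; here's a minimal sketch (using the standard subprocess module, not anything from the original answer) running the same failing assert with and without -O:

```python
import subprocess
import sys

# Under -O the assert is removed at compile time, so the line after it runs;
# without -O the AssertionError aborts the script.
code = "assert False, 'expensive check'; print('reached')"

plain = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True)
opt = subprocess.run([sys.executable, "-O", "-c", code],
                     capture_output=True, text=True)

print(plain.returncode)    # non-zero: the assert fired
print(opt.stdout.strip())  # reached
```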
As for stripping docstrings, it does seem like a quaint holdover from a simpler time, though I guess there are memory-constrained environments where it could make a difference.
Hi Pythonistas,
I'm interested in learning what optimization techniques you know for Python code. I know it's a general statement, but I'm interested in really pushing execution to the maximum.
I use the following:
- I declare __slots__ in custom classes
- I use typing blocks for typing imports
- I use builtins when possible
- I try to reduce function calls
- I use set lookups wherever possible
- I prefer iteration to recursion
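Two of those techniques in a minimal sketch (the class and variable names are just for illustration):

```python
# __slots__ removes the per-instance __dict__, giving smaller objects and
# slightly faster attribute access; set membership is O(1) on average
# versus O(n) for a list.

class Plain:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Slotted:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Plain(1, 2), Slotted(1, 2)
print(hasattr(p, "__dict__"))  # True
print(hasattr(s, "__dict__"))  # False -- no per-instance dict

allowed_list = list(range(10_000))
allowed_set = set(allowed_list)
print(9_999 in allowed_set)    # True, via a hash lookup instead of a scan
```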
Edit: I am using a profiler and benchmarks. I'm working on a library - an ASGI API framework. The code is async. It's not data science. It's compatible with neither PyPy nor Numba.
What else?
assert statements are completely eliminated, as are statement blocks of the form if __debug__: ... (so you can put your debug code in such statement blocks and just run with -O to avoid that debug code).
With -OO, in addition, docstrings are also eliminated.
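To see the docstring elimination under -OO, here's a small sketch (again just subprocess-based illustration, not from the original answer):

```python
import subprocess
import sys

# Under -OO, docstrings are not compiled in, so __doc__ is None.
code = '''
def f():
    "a docstring"
print(f.__doc__)
'''

normal = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True).stdout.strip()
stripped = subprocess.run([sys.executable, "-OO", "-c", code],
                          capture_output=True, text=True).stdout.strip()

print(normal)    # a docstring
print(stripped)  # None
```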
From the docs:
- You can use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, the -OO switch removes both assert statements and __doc__ strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
- A program doesn’t run any faster when it is read from a .pyc file than when it is read from a .py file; the only thing that’s faster about .pyc files is the speed with which they are loaded.
So in other words, almost nothing.
Regarding "Secondly: When writing a program from scratch in python, what are some good ways to greatly improve performance?"
Remember the Jackson rules of optimization:
- Rule 1: Don't do it.
- Rule 2 (for experts only): Don't do it yet.
And the Knuth rule:
- "Premature optimization is the root of all evil."
The more useful rules are in the General Rules for Optimization.
Don't optimize as you go. First get it right. Then get it fast. Optimizing a wrong program is still wrong.
Remember the 80/20 rule.
Always run "before" and "after" benchmarks. Otherwise, you won't know if you've found the 80%.
Use the right algorithms and data structures. This rule should be first. Nothing matters as much as algorithm and data structure.
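A before/after benchmark can be as small as a few lines with the standard timeit module; the list-versus-set comparison below is just an illustration, and the absolute numbers will vary by machine:

```python
from timeit import timeit

# "Before": membership test against a list (linear scan).
# "After": the same test against a set (hash lookup).
items_list = list(range(10_000))
items_set = set(items_list)

before = timeit(lambda: 9_999 in items_list, number=1_000)
after = timeit(lambda: 9_999 in items_set, number=1_000)

print(f"list lookup: {before:.4f}s  set lookup: {after:.4f}s")
```

Running both measurements side by side is what tells you whether the change actually found the 80%.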
Bottom Line
You can't prevent or avoid the "optimize this program" effort. It's part of the job. You have to plan for it and do it carefully, just like the design, code and test activities.
Rather than just punting to C, I'd suggest:
Make your code count. Do more with fewer executions of lines:
- Change the algorithm to a faster one. It doesn't need to be fancy to be faster in many cases.
- Use Python primitives that happen to be written in C. Some things will force an interpreter dispatch where some won't; the latter is preferable.
- Beware of code that first constructs a big data structure followed by its consumption. Think of the difference between range and xrange. In general it is often worth thinking about memory usage of the program. Using generators can sometimes bring O(n) memory use down to O(1).
- Python is generally non-optimizing. Hoist invariant code out of loops, eliminate common subexpressions where possible in tight loops.
- If something is expensive, then precompute or memoize it. Regular expressions can be compiled for instance.
- Need to crunch numbers? You might want to check numpy out.
- Many Python programs are slow because they are bound by disk I/O or database access. Make sure you have something worthwhile to do while you wait on the data to arrive rather than just blocking. A weapon could be something like the Twisted framework.
- Note that many crucial data-processing libraries have C versions, be it XML, JSON or whatnot. They are often considerably faster than the Python interpreter.
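Two of those points in a minimal sketch (the sizes and the regex are purely illustrative):

```python
import re
import sys

# A list comprehension materializes all 100k squares at once (O(n) memory);
# the generator expression holds only its iteration state (O(1) memory).
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Precompute where you can: compile a regex once, reuse it many times.
word = re.compile(r"\w+")
print(word.findall("pay the cost up front"))
```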
If all of the above fails for profiled and measured code, then begin thinking about the C-rewrite path.
-O is an interpreter flag; you can't set it at runtime because the script has already been compiled by then.
Python has nothing comparable to compiler macros like #if.
Simply write a start_my_project.sh script that sets these flags.
#!/usr/bin/env python
def main():
    assert 0
    print("tada")

if __name__ == "__main__":
    import os, sys
    if '--optimize' in sys.argv:
        sys.argv.remove('--optimize')
        os.execl(sys.executable, sys.executable, '-O', *sys.argv)
    else:
        main()