Optimizations which made Python 3.6 faster than Python 3.5

PyCon US 2017, Portland, OR

Victor Stinner

Agenda

(1) Benchmarks
(2) Benchmark results
(3) Python 3.5 optimizations
(4) Python 3.6 optimizations
(5) Python 3.7 optimizations

(1) Benchmarks

Unstable benchmarks

March 2016: no developer trusted the Python benchmark suite
Many benchmarks were unstable
It wasn’t possible to decide if an optimization makes CPython faster or not...

New perf module

Calibrate the number of loops
Spawn 20 processes sequentially, 3 values per process, total: 60 values
Compute average (mean) and standard deviation
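The measurement scheme above can be sketched with the standard library alone. This is not the perf module's implementation: the worker script, the run_benchmark() and summarize() names and the sum(range(100)) workload are assumptions for illustration, and the loop count is fixed here rather than calibrated.

```python
import statistics
import subprocess
import sys

# Hypothetical worker: times a trivial workload and prints one value per line.
WORKER = """
import time
loops = {loops}
for _ in range({values}):
    start = time.perf_counter()
    for _ in range(loops):
        sum(range(100))
    print((time.perf_counter() - start) / loops)
"""


def run_benchmark(processes=20, values=3, loops=10000):
    """Spawn worker processes one after another and collect every timing."""
    timings = []
    for _ in range(processes):
        proc = subprocess.run(
            [sys.executable, "-c", WORKER.format(loops=loops, values=values)],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        timings.extend(float(line) for line in proc.stdout.split())
    return timings


def summarize(timings):
    """Return (mean, standard deviation) of the collected values."""
    return statistics.mean(timings), statistics.stdev(timings)
```

With the defaults, run_benchmark() returns 20 x 3 = 60 values. Running each batch in a fresh process means that per-process effects (hash randomization, memory layout) show up in the standard deviation instead of silently biasing a single run.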

performance project

Benchmarks rewritten using perf: new project performance on GitHub
http://speed.python.org now runs performance
CPython is now compiled with Link Time Optimization (LTO) and Profile Guided Optimization (PGO)

Linux and CPUs

sudo python3 -m perf system tune

Use fixed CPU frequency
Disable Intel Turbo Boost
If CPU isolation is enabled (Linux kernel options isolcpus and rcu_nocbs), use CPU pinning
CPU isolation helps a lot to reduce operating system jitter
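CPU pinning can also be done by hand from Python. The sketch below uses os.sched_setaffinity(), which exists only on some platforms (Linux in particular); the pin_to_cpu() helper name is an assumption, and this is not necessarily how perf applies pinning internally.

```python
import os


def pin_to_cpu(cpu):
    """Pin the current process to one CPU and return the resulting affinity set.

    Returns None where the affinity API is unavailable (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

On a machine booted with isolcpus, the CPU passed in would be one of the isolated ones, so the benchmark does not share it with other processes.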

Spot perf regression

python_startup: 20 ms => 27 ms, fix: 17 ms

Timeline

April, 2014 – May, 2017: 3 years

(2) Benchmark results

3.6 faster than 3.5

Results normalized to Python 3.5 (lower = faster)

3.6 faster than 2.7

Results normalized to Python 2.7 (lower = faster)

3.6 faster than 2.7

Sympy: 22% - 42% faster

telco: 3.6 vs 2.7

Python 3.6 is 40x faster than Python 2.7 (decimal module rewritten in C by Stefan Krah in Python 3.3)

3.7 faster than 3.6

Results normalized to Python 3.6 (lower = faster)

(3) Python 3.5 optimizations

lru_cache()

Matt Joiner, Alexey Kachayev and Serhiy Storchaka reimplemented functools.lru_cache() in C
sympy: 20% faster
scimark_lu: 5% faster
Tricky C code, hard to get right: 3½ years to close bpo-14373
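For readers unfamiliar with the decorator, here is a minimal usage sketch. The fib() function is only an illustrative workload, not one of the benchmarks named above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Naive Fibonacci; the cache turns the exponential recursion into a linear one."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

Every call to a decorated function goes through the cache lookup, which is why moving that lookup from Python to C speeds up code that relies heavily on memoization.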

OrderedDict

Eric Snow reimplemented collections.OrderedDict in C
html5lib: 20% faster
Reuse C implementation of dict
Again, tricky C code: 2½ years to close bpo-16991
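The operations that benefit are the ordering-specific ones. A small sketch, with a hypothetical touch() helper that keeps a bounded most-recently-used mapping:

```python
from collections import OrderedDict


def touch(cache, key, value, maxsize=2):
    """Store key as the most recent entry and evict the oldest one if needed."""
    cache[key] = value
    cache.move_to_end(key)         # mark as most recently used
    if len(cache) > maxsize:
        cache.popitem(last=False)  # drop the least recently used entry
    return cache
```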

(4) Python 3.6 optimizations

PyMem_Malloc()

Victor Stinner changed PyMem_Malloc() to use Python's fast memory allocator
Many benchmarks: 5% - 22% faster
Debug hooks now check that the GIL is held
Only numpy misused the API (fixed)
PYTHONMALLOC=debug is now available in release builds to detect memory corruption, bpo-26249
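The debug hooks are enabled through an environment variable, so they can be switched on for a child interpreter without rebuilding Python. A minimal sketch, assuming Python 3.6 or newer; the run_with_debug_hooks() name is hypothetical.

```python
import os
import subprocess
import sys


def run_with_debug_hooks(code):
    """Run a code snippet in a child interpreter with PYTHONMALLOC=debug set."""
    env = dict(os.environ, PYTHONMALLOC="debug")
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Well-behaved code runs normally under the hooks. They matter when a C extension corrupts memory or calls the allocator without holding the GIL, in which case the child process is expected to abort with a diagnostic instead of continuing silently.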

ElementTree parse

Serhiy Storchaka optimized ElementTree.iterparse()
2x faster
Follow-up of Brett Cannon’s PyCon Canada 2015 keynote :-)
bpo-25638
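A short sketch of the API in question: iterparse() yields elements incrementally instead of building the whole tree first. The count_tags() helper is an illustrative assumption.

```python
import io
import xml.etree.ElementTree as ET


def count_tags(xml_bytes):
    """Count elements per tag while streaming through the document."""
    counts = {}
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()  # release children that are no longer needed
    return counts
```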

PGO uses tests

Brett Cannon modified the Profile Guided Optimization (PGO) build
The Python test suite is now used, rather than pidigits, to guide the compiler
Many benchmarks: 5% – 27% faster
bpo-24915
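Whether a given interpreter was configured for PGO can sometimes be read back from its build configuration. The sketch below only looks for the --enable-optimizations configure flag in CONFIG_ARGS; that variable is not available on every platform (Windows builds, for instance), and a missing flag does not prove the binary was built without PGO. The built_with_optimizations() name is an assumption.

```python
import sysconfig


def built_with_optimizations():
    """Best-effort check for the --enable-optimizations configure flag."""
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--enable-optimizations" in config_args
```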

Wordcode

Demur Rumed and Serhiy Storchaka modified the bytecode to always use 2 bytes per instruction
Before: 1 byte (no arg) or 3 bytes (with arg)
Removed an if from the ceval.c hot code for better CPU branch prediction:
if (HAS_ARG(opcode)) oparg = NEXTARG();

bpo-26647
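The fixed instruction width can be observed from Python with the dis module. A minimal sketch, assuming CPython 3.6 or newer; sample() and instruction_offsets() are illustrative names.

```python
import dis


def sample(a, b):
    return a + b


def instruction_offsets(func):
    """Return the byte offset of every instruction in func's bytecode."""
    return [ins.offset for ins in dis.get_instructions(func)]
```

With wordcode, every offset is a multiple of 2 and the length of sample.__code__.co_code is even. On Python 3.5 and earlier, offsets could be odd because instructions took 1 or 3 bytes.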

FASTCALL

Victor Stinner wrote a new C API to avoid the creation of temporary tuples to pass function arguments
Many microbenchmarks: 12% – 50% faster
obj[0], getattr(obj, "attr"), {1: 2}.get(1), list.count(0), str.replace("a","b"), …

Avoid 20 ns per modified function call
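The calls listed above can be timed with the standard timeit module. This is a rough sketch for comparing two interpreters, not the methodology behind the numbers on the slide; the STATEMENTS table and the measure() helper are assumptions.

```python
import timeit

# label -> (setup code, statement to time)
STATEMENTS = {
    "obj[0]": ("obj = [1, 2, 3]", "obj[0]"),
    'getattr(obj, "attr")': ("class C: attr = 1\nobj = C()", 'getattr(obj, "attr")'),
    "{1: 2}.get(1)": ("d = {1: 2}", "d.get(1)"),
    "list.count(0)": ("lst = [0, 1, 0]", "lst.count(0)"),
    'str.replace("a","b")': ('s = "banana"', 's.replace("a", "b")'),
}


def measure(number=100000, repeat=3):
    """Return the best time per call, in nanoseconds, for each statement."""
    results = {}
    for label, (setup, stmt) in STATEMENTS.items():
        best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
        results[label] = best / number * 1e9
    return results
```

Running measure() under Python 3.5 and 3.6 on the same tuned machine gives a per-call difference that can be compared with the 20 ns figure.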

Unicode codecs

Victor Stinner optimized the ASCII and UTF-8 codecs for the ignore, replace, surrogateescape and surrogatepass error handlers
UTF-8: decoder 15x faster, encoder 75x faster
ASCII: decoder 60x faster, encoder 3x faster
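The four error handlers behave as follows on invalid input. The demo_error_handlers() function and its sample strings are illustrative.

```python
def demo_error_handlers():
    """Show what each error handler does with undecodable or unencodable data."""
    bad = b"abc\xff"  # 0xff is never valid in UTF-8
    return {
        "ignore": bad.decode("utf-8", "ignore"),
        "replace": bad.decode("utf-8", "replace"),
        "surrogateescape": bad.decode("utf-8", "surrogateescape"),
        "surrogatepass": "\ud800".encode("utf-8", "surrogatepass"),
        "ascii_replace": "caf\xe9".encode("ascii", "replace"),
    }
```

ignore drops the bad byte, replace substitutes U+FFFD (or ? when encoding), surrogateescape maps the byte to a lone surrogate so it can be round-tripped, and surrogatepass lets lone surrogates through the UTF-8 codec.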

bytes % args

PEP 461 added back bytes % args to Python 3.5
Victor Stinner wrote a new _PyBytesWriter API to optimize functions creating bytes and bytearray strings
bytes % args: 2x faster
bytes.fromhex(): 3x faster
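Both operations in use, as a minimal sketch requiring Python 3.5 or newer; build_request() and parse_hex() are hypothetical helpers.

```python
def build_request(host, port):
    """Format a bytes template with the % operator (PEP 461)."""
    return b"CONNECT %s:%d HTTP/1.1\r\n\r\n" % (host, port)


def parse_hex(text):
    """Convert a hexadecimal string to bytes."""
    return bytes.fromhex(text)
```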

Globbing

Serhiy Storchaka optimized glob.glob(), glob.iglob() and pathlib globbing using os.scandir() (new in Python 3.5)
glob: 3x - 6x faster
pathlib glob: 1.5x - 4x faster
Avoid one stat() per directory entry
bpo-25596, bpo-26032
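The idea behind the speedup can be sketched in a few lines: os.scandir() returns entries whose file type is usually known from the directory listing itself, so is_file() often needs no extra system call. The scandir_glob() helper is a simplified illustration, not the code used by the glob module, and it assumes Python 3.6 or newer for the with statement.

```python
import fnmatch
import os


def scandir_glob(directory, pattern):
    """Return sorted names of regular files in directory that match pattern."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            if fnmatch.fnmatch(entry.name, pattern) and entry.is_file()
        )
```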

asyncio

Yury Selivanov and Naoki INADA reimplemented the asyncio Future and Task classes in C
asyncio programs: 30% faster
bpo-26081, bpo-28544
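Almost every asyncio program creates Task and Future objects, which is why reimplementing them in C helps across the board. A minimal sketch of code that exercises both; it uses asyncio.run(), which needs Python 3.7 or newer, and the double() coroutine is only an illustrative workload.

```python
import asyncio


async def double(x):
    await asyncio.sleep(0)  # yield to the event loop once
    return 2 * x


async def main():
    # Each ensure_future() call wraps a coroutine in a Task.
    tasks = [asyncio.ensure_future(double(i)) for i in range(3)]
    return await asyncio.gather(*tasks)


def run():
    return asyncio.run(main())
```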

(5) Python 3.7 optimizations

Method calls

Yury Selivanov and Naoki INADA added LOAD_METHOD and CALL_METHOD opcodes
Method calls: 10% - 20% faster
Idea coming from PyPy, bpo-26110
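The opcodes generated for a method call can be inspected with the dis module. On the Python 3.7 described here the listing should include LOAD_METHOD and CALL_METHOD; later releases changed the opcode set, so the names returned depend on the interpreter version. call_method() and opnames() are illustrative names.

```python
import dis


def call_method(text):
    return text.upper()


def opnames(func):
    """Return the opcode names compiled for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]
```

The optimization avoids creating a temporary bound-method object for text.upper before calling it.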

Future optimizations

More optimizations are coming in Python 3.7… Stay tuned!

3.7 slower than 2.7 :-(

Results normalized to Python 2.7 (higher = slower)

Questions?

http://speed.python.org/
http://faster-cpython.readthedocs.io/
