numexpr

Description: Fast numerical array expression evaluator for Python and NumPy.
Last commit: 9 years ago
Last release: 5 years ago

What It Is

The numexpr package evaluates multiple-operator array expressions several times faster than NumPy can (4.2x in the timing example below). It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it on the fly into a program for its own virtual machine. It's the next best thing to writing the expression in C and compiling it, but because numexpr does the compilation itself, just in time, it does not require a C compiler at runtime.

numexpr also supports the Intel VML (Vector Math Library), which is part of Intel MKL (Math Kernel Library). VML gives good speed-ups when computing transcendental functions (trigonometric, exponential, ...) on Intel-compatible platforms. This support also lets your computations use multiple cores.

Why It Works

There are two extremes to array expression evaluation. At one extreme, each binary operation runs separately over the array elements and returns a temporary array. This is what NumPy does: 2*a + 3*b uses three temporary arrays, each as large as a or b. This strategy wastes memory (a problem if the arrays are large). It also makes poor use of the CPU cache, because the results of 2*a and 3*b will no longer be in cache for the final addition if the arrays are large.
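
To make the three temporaries concrete, here is a minimal sketch that spells out the steps NumPy takes for 2*a + 3*b. The names t1, t2 and temp_bytes are illustrative only, and the array size is an arbitrary choice.

```python
import numpy as np

a = np.arange(1e6)   # float64 arrays of one million elements
b = np.arange(1e6)

# NumPy evaluates 2*a + 3*b one operation at a time,
# allocating a full-size array for every intermediate result.
t1 = 2 * a       # temporary 1
t2 = 3 * b       # temporary 2
c = t1 + t2      # temporary 3, which becomes the result

# Total memory allocated for the three arrays, relative to one input.
temp_bytes = t1.nbytes + t2.nbytes + c.nbytes
print(temp_bytes / a.nbytes)
```

Each of the three arrays is as large as an input array, so the expression allocates three times the size of a before it is done.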

The other extreme is to loop over each element:

```
for i in range(len(a)):
    c[i] = 2*a[i] + 3*b[i]
```

This conserves memory and is good for the cache, but on each iteration Python must check the type of each operand and select the correct routine for each operation. All but the first such checks are wasted, as the input arrays are not changing.

numexpr uses an in-between approach. Arrays are handled in chunks (the first pass uses 256 elements). As Python code, it looks something like this:

```
for i in range(0, len(a), 256):
    r0 = a[i:i+256]
    r1 = b[i:i+256]
    multiply(r0, 2, r2)   # r2 = 2 * r0
    multiply(r1, 3, r3)   # r3 = 3 * r1
    add(r2, r3, r2)       # r2 = r2 + r3
    c[i:i+256] = r2
```

The 3-argument form of add() stores the result in the third argument, instead of allocating a new array. This achieves a good balance between cache and branch prediction. The virtual machine is written entirely in C, which makes it faster than the Python above.
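
The chunked loop can be tried in plain NumPy. The sketch below is only an illustration of the idea, not numexpr's actual implementation: the function name chunked_2a_plus_3b is hypothetical, it assumes a and b are one-dimensional arrays of the same length and dtype, and it uses NumPy's out= argument in place of the 3-argument calls.

```python
import numpy as np

def chunked_2a_plus_3b(a, b, chunk=256):
    """Evaluate 2*a + 3*b chunk by chunk, reusing two small scratch buffers."""
    c = np.empty_like(a)
    r2 = np.empty(chunk, dtype=a.dtype)   # scratch buffers, allocated once
    r3 = np.empty(chunk, dtype=a.dtype)
    for i in range(0, len(a), chunk):
        r0 = a[i:i + chunk]               # slices are views, not copies
        r1 = b[i:i + chunk]
        n = len(r0)                       # the last chunk may be shorter
        np.multiply(r0, 2, out=r2[:n])
        np.multiply(r1, 3, out=r3[:n])
        np.add(r2[:n], r3[:n], out=r2[:n])
        c[i:i + chunk] = r2[:n]
    return c

a = np.arange(1000, dtype=np.float64)
b = np.arange(1000, dtype=np.float64)
result = chunked_2a_plus_3b(a, b)
print(np.array_equal(result, 2 * a + 3 * b))
```

Only the two 256-element buffers are allocated besides the output, so the working set stays small enough to remain in cache no matter how large a and b are.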

Examples of Use

Using it is simple:

```
>>> import numpy as np
>>> import numexpr as ne

>>> a = np.arange(1e6)   # Choose large arrays for high performance
>>> b = np.arange(1e6)

>>> ne.evaluate("a + 1")   # a simple expression
array([  1.00000000e+00,   2.00000000e+00,   3.00000000e+00, ...,
         9.99998000e+05,   9.99999000e+05,   1.00000000e+06])

>>> ne.evaluate('a*b-4.1*a > 2.5*b')   # a more complex one
array([False, False, False, ...,  True,  True,  True], dtype=bool)
```

and fast... :-)

```
>>> timeit a**2 + b**2 + 2*a*b
10 loops, best of 3: 33.3 ms per loop
>>> timeit ne.evaluate("a**2 + b**2 + 2*a*b")
100 loops, best of 3: 7.96 ms per loop   # 4.2x faster than NumPy
```
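
Outside an interactive shell, a similar comparison can be scripted with the standard timeit module. This is a rough sketch rather than a benchmark: the helper names with_numpy and with_numexpr are made up for the example, the repeat counts are arbitrary, and the numexpr import is guarded so that the script still runs where the package is not installed. Timings depend on the machine, so none are shown.

```python
import timeit
import numpy as np

try:
    import numexpr as ne
except ImportError:        # numexpr is optional for this sketch
    ne = None

a = np.arange(1e6)
b = np.arange(1e6)

def with_numpy():
    return a**2 + b**2 + 2*a*b

def with_numexpr():
    # Pass the operands explicitly instead of relying on frame lookup.
    return ne.evaluate("a**2 + b**2 + 2*a*b", local_dict={"a": a, "b": b})

numpy_result = with_numpy()

if ne is not None:
    # Both evaluators should agree before their speed is compared.
    assert np.allclose(numpy_result, with_numexpr())
    t_numpy = min(timeit.repeat(with_numpy, number=10, repeat=3))
    t_numexpr = min(timeit.repeat(with_numexpr, number=10, repeat=3))
    print("NumPy / numexpr time ratio: %.1f" % (t_numpy / t_numexpr))
else:
    print("numexpr is not installed; only the NumPy result was computed")
```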

Source Code Commits

• 148. Merged in r147 from 1.3 branch (Small output beautification). (9 years ago)
