Python Numeric Tutorial

(typeset 22 October 1996)

Copyright (C) 1996 David Ascher

All Rights Reserved

Preface

This is a tutorial (1) which covers in detail the new "multidimensional array objects" introduced to the latest version of Python by the Numeric (a.k.a. NumPy) extension, written by Jim Hugunin. Python's home is @url{http://www.python.org}, and Numeric's current home is @url{http://www.lcs.mit.edu/jjh/python/numeric}. This tutorial was written for Python 1.4 and Numeric 1.0.

These new arrays are distinct from the array type described in the Python tutorial and reference manuals in that these new arrays support multiple dimensions, and are much more full-featured than the previous type. They differ from "list" objects in that they can be multidimensional and in that all the elements of the new arrays have to be of the same type (e.g. Integer, Float, Complex). The new arrays were designed for "serious scientific computation," and hopefully this tutorial will make some of their power obvious -- with these arrays, Python becomes an efficient tool for many kinds of scientific and engineering projects.

The aim of this tutorial is to teach the novice Python user how to use these powerful new features. If you are familiar with numerical software like MATLAB, Basis, Yorick, J, S+, IDL, etc. you will recognize some features, and this should make learning NumPy that much easier. Keep in mind however that NumPy was designed because none of those systems solved all the problems that the author wanted to solve. The most blatant example of that is MATLAB -- while it's a very popular system, it is unable to deal "naturally" with arrays of dimension greater than two -- I don't know a single MATLAB user who hasn't been frustrated at least once by this limitation. Clearly, Numeric has its own limitations, especially at this early stage in its development. We like to think that that is a temporary state of being. Please keep in mind that while the "core" functionality of the arrays as described in this tutorial is fairly stable (after all, it has been in serious alpha testing for many months), we expect many many more modules will build on the core, just like the LinearAlgebra, FFT and RandomArray modules described later in this tutorial.

If you are used to a mathematical package such as those named above, and don't know Python, some things may seem odd. Chances are, oddities are due to the fact that Numeric is a part of Python, which is a general purpose language -- Python's generality means that some mathematical expressions are a bit more verbose in Numeric than in some other, more specialized packages (especially if one compares it to a language like APL). Users of Numeric generally feel that this verbosity is more than offset by the power Python gives them. This tutorial will not go over the non-Numeric parts of Python, so if you don't know any Python, I'd recommend reading the Tutorial to Python at @url{http://www.python.org/doc/tut/tut.html} before this document -- it'll make things much clearer. If you've never used a numerical tool, and you need to manipulate large sets of numbers, you're in for a treat.

The tutorial is divided into N parts. First, some definitions of commonly used terms are presented. The bulk of "real" section covers the array objects and things one can do with and to them. The next section will describe each of the "utility" modules which come with NumPy: The linear algebra module, the Fast Fourier Transform (FFT) module, the MLab (Matlab compatibility) module and the random number module.

Terminology and Setting up

Discussions of arrays and matrices and vectors can get confusing due to disagreements on the nomenclature. Here is a brief definition of the terms used in this tutorial:

The python objects under discussion are called array objects. These supersede the old array objects defined in the array module.

These array objects are arrays in the computer science meaning of homogeneous block of identical items. This is quite different from most Python container objects, which can contain heterogeneous sets. Imposing this restriction was a necessary step for a fast implementation, and in practice, it is not a cumbersome restriction.

Any given array object has a rank, which is the number of "dimensions" or "axes" it has. For example, a point in 3D space [1, 2, 1] is an `array' of rank 1 -- it has one dimension. That dimension has a length of 3.

As another example, the array

	1.0  0.0  0.0
	0.0  1.0  2.0
	

is an array of rank 2 (it is 2-dimensional). The first dimension has a length of 2, the second dimension has a length of 3. Because the word "dimension" has many different meanings to different folks, in general the word "axis" is used instead. Axes are numbered just like Python list indices: they start at 0, and can also be counted from the "end", so that axis -1 is the last axis of an array, axis -2 is the penultimate axis, etc.

One important fact to keep in mind is that by default, operations on arrays are performed element-wise. This means that when adding two arrays, the resulting array has as elements the pairwise sums of the two operand arrays. This is true for all operations, including multiplication. Thus array multiplication using the * operator will default to element-wise multiplication, not matrix multiplication as used in linear algebra. Many people will want to use arrays as linear algebra-type matrices (including their rank-1 homolog, vectors). For those users, the Matrix module built using the Numeric module provides a more intuitive interface. This module is documented at the end of this tutorial.

Now that all of these definitions are laid out, let's see what we can do with these arrays.

Before one can start creating or using arrays, the Numeric module must be loaded. This can be done, like all Python modules, either using:

	from Numeric import *
	

or

	import Numeric
	

If you do the latter, then you have to prepend Numeric. to all the function calls. There are good reasons for doing so, but we'll use the former just because it makes the code in this Tutorial easier to read. Throughout the tutorial, examples will be displayed as if you were in the Python interpreter. This is to encourage you to "play along" with the tutorial -- doing so is the best way to gain a feeling for the power of the tool. So, first type from Numeric import *:

	>>> from Numeric import *
	

Creating arrays

Creating arrays from existing numbers

There are many ways to create arrays. The most basic one is the array() function:

	>>> a = array([1.2, 3.5, -1])
	

to make sure this worked, do:

	>>> a   
	1.2 3.5 -1.
	

Remember that in the interpreter typing a variable on a line by itself is synonymous with printing it. The exact format of the printout may be slightly different depending on what version you are using -- regardless, it should work.

The array(numbers, typecode=None) function takes two arguments -- the first one is the values, which have to be in a python sequence object (either a list or a tuple). The second one is the typecode of the elements, and is optional. If it is omitted, as in the example above, Python tries to find the one type which can represent all the elements. Since the elements we gave our example were two floats and one integer, it chose `float' as the type of the resulting array. If one specifies the typecode, one can specify unequivocally the type of the elements -- this is especially useful when, for example, you want to make sure that an array contains floats even though in some cases all of its elements could be represented as integers:

	>>> x,y,z = 1,2,3
	>>> a = array([x,y,z])
	>>> a
	1 2 3 
	>>> a = array([x,y,z], Float)
	>>> a
	1.0000 2.0000 3.0000
	

Pop Quiz: What will be the type of an array defined as follows:

	>>> mystery = array([1, 2.0, -3j])
	

Hint: -3j is an imaginary number

Answer: try it out!

The possible values for the second argument to the array creator function (and indeed to any function which accepts a so-called typecode for arrays are:

If you use Int, Float, or Complex (that is, without a bit length qualifier), then you will get the largest version of the given type available on your hardware. On Linux, the equivalences are:

	Int     <==>  Int32
	Float   <==>  Float64
	Complex <==>  Complex64
	

The last typecode deserves a little comment. Indeed, it seems to indicate that arrays can be filled with any Python objects. This appears to violate the notion that arrays have homogeneous elements. In fact, the typecode PyObject does allow heterogeneous arrays. However, if you plan to do numerical computation, you're much better off with a homogeneous array with a potentially "large" type than with a heterogeneous array. This is because a heterogeneous array stores references to objects, which incur a memory cost, and because the speed of computation is much slower with arrays of PyObject's than with uniform number arrays. Why does it exist, then? Well, it turns out that a very useful set of features of arrays is the ability to slice them, dice them, select and choose from them, and repeat their elements. This feature is so nice that sometimes one wants to do the same operations with, e.g., arrays of strings. In such cases, computation speed is not as important as convenience. There, a mystery is exaplined (2).

The following example shows one way of creating multidimensional arrays:

	>>> ma = array([[1,2,3],[4,5,6],[7,8,9]])
	>>> ma
	1 2 3
	4 5 6
	7 8 9
	

Note that the first argument to array() in this case is a single list containing three lists, each containing three elements. If we wanted floats instead, we could use:

	>>> ma_floats = array([[1,2,3],[4,5,6],[7,8,9]], Float)
	>>> ma_floats
	1. 2. 3.
	4. 5. 6.
	7. 8. 9.
	

This brings up the notion of `shape'. The shape of an array, as you might think, is the set of numbers which define its dimensions. So, the shape of the array ma defined above is:

	>>> ma.shape
	(3, 3)
	

Using the definitions presented above, this is a shape of rank 2, with each dimension equal to 3. The rank of an array A is always equal to len(A.shape).

Note that shape is an attribute of the array objects. It is the first of several which we will see throughout this tutorial. If you're not used to object-oriented programming, you can think of it as a "feature" or a "quality" of arrays. The relation between an array and its shape is similar to the relation between a person and their hair color. In Python, it's called an object/attribute relation.

What if one wants to change the dimensions of an array? For now, let us consider changing the shape of an array without making it "grow". Say, for example, we want to make that 3x3 array above (ma) an array of rank 1. All we need to do is:

	>>> flattened_ma = reshape(ma, (9,))
	>>> flattened_ma
	1 2 3 4 5 6 7 8 9
	

One can change the shape of arrays as long as the product of all the lengths of all the dimensions is kept constant:

	>>> a = array([1,2,3,4,5,6,7,8])
	>>> a
	1 2 3 4 5 6 7 8
	>>> b = reshape(a, (2,4))    #  2*4 == 8
	>>> b
	1 2 3 4
	5 6 6 7
	>>> c = reshape(b, (4,2))    #  4*2 == 8
	>>> c
	1 2
	3 4 
	5 6
	7 8
	

Notice that we used a new function, reshape(). It is a function defined in the Numeric module (so in general you may want to refer to it as Numeric.reshape()). It expects an array as its first argument, and a shape as its second argument. The shape has to be a sequence object (a list or a tuple). Keep in mind that a tuple with a single element needs a comma at the end, e.g.: (5,).

One nice feature of shape tuples, is that one entry in the shape tuple can be -1 -- it will be replaced by whatever number is needed to build a shape which does not change the size of the array. Thus:

	>>> a = reshape(arrayrange(25), (5,-1))
	>>> print a, a.shape
	 0  1  2  3  4
	 5  6  7  8  9
	10 11 12 13 14
	15 16 17 18 19
	20 21 22 23 24
	

Furthermore, as we mentioned earlier, the shape of an array is an attribute of the array. This attribute is not write-protected. This means that you can change the shape of an array simply by assigning a new shape to it:

	>>> a = array([1,2,3,4,5,6,7,8,9,10])
	>>> a.shape
	(10,)
	>>> a.shape = (2,5)
	>>> a
	 1  2  3  4  5
	 6  7  8  9 10
	>>> a.shape  = (10,1)
	1
	2
	3
	4
	5
	6       
	7
	8       
	9
	10
	>>> a.shape  = (5,-1)   # note the -1 trick described above
	 1  2
	 3  4
	 5  6
	 7  8
	 9  10
	

Fun, fun, fun.

Aside about printing

The default printing routine provided by the Numeric module prints arrays as follows: the last dimension is always printed left to right, the next-to-last top to bottom, and the remaining ones top to bottom with increasingly thick separators.

This explains why rank-1 arrays are printed from left to right, rank-2 arrays have the first dimension going down the screen and the second dimension going from left to right, etc.

If you want to change the shape of an array so that it has more elements than it started with (i.e. grow it), then you have many options: One solution is to use the concat() method defined towards the end of this tutorial. An alternative is to use the array() creator function with existing arrays as arguments:

	>>> a
	0 1 2 
	3 4 5
	6 6 7
	>>> b = array([a,a])
	>>> b
	0 1 2
	3 4 5
	6 7 8
	
	0 1 2
	3 4 5
	6 7 8
	>>> b.shape
	(2, 3, 3)
	
A final possibility is the resize() function, which is quite powerful: it takes a "base" array as its first argument and the desired shape as the second argument. So starting with a simple array:

	>>> base = array([0,1])
	

One can quickly build:

	>>> big = resize(base, (9,9))
	>>> print big
	0 1 0 1 0 1 0 1 0
	1 0 1 0 1 0 1 0 1
	0 1 0 1 0 1 0 1 0
	1 0 1 0 1 0 1 0 1
	0 1 0 1 0 1 0 1 0
	1 0 1 0 1 0 1 0 1
	0 1 0 1 0 1 0 1 0
	1 0 1 0 1 0 1 0 1
	0 1 0 1 0 1 0 1 0
	

I recommend reading the reference entry on resize() before going too far with it -- it can be tricky if you're not careful.

Creating arrays with values specified `on-the-fly'

zeros(shape, typecode=None) and ones(shape, typecode=None)

Often, one needs to manipulate arrays filled with numbers which aren't available beforehand. The Numeric module provides a few functions which come in quite handy:

The first two simply create arrays of a given shape filled with zeros and ones respectively:

	>>> z = zeros((3,3))
	>>> z
	0 0 0
	0 0 0
	0 0 0
	
	>>> o = ones([2,3])
	>>> o
	1 1 1
	1 1 1
	

Note that the first argument is a shape -- again, this means that it needs to be a list or a tuple. Also note that the default type for these is Int (since that is the smallest type which can represent 0's and 1's), which you can feel free to override using something like:

	>>> o = ones((2,3), Float)
	>>> o
	1. 1. 1.
	1. 1. 1.
	

arrayrange(start=0,stop,step=1,typecode=None)

A slightly more subtle utility function is the arrayrange() function, which is just like the range() function in Python, except that it returns an array as opposed to a list.

	>>> r = arrayrange(10)
	>>> r
	0 1 2 3 4 5 6 7 8 9
	

Combining the arrayrange() with the reshape() function, we can get:

	>>> big = reshape(arrayrange(100),(10,10))
	>>> big
	 0  1  2  3  4  5  6  7  8  9
	10 11 12 13 14 15 16 17 18 19
	20 21 22 23 24 25 26 27 28 29
	30 31 32 33 34 35 36 37 38 39
	40 41 42 43 44 45 46 47 48 49
	50 51 52 53 54 55 56 57 58 59
	60 61 62 63 64 65 66 67 68 69
	70 71 72 73 74 75 76 77 78 79
	80 81 82 83 84 85 86 87 88 89
	90 91 92 93 94 95 96 97 98 99
	

One useful shorthand is arange, which is synonymous with arrayrange (Python has few of them on purpose, but this one couldn't be helped -- it's much too useful)

Also note that you can set the start, stop and step arguments, which allow more varied ranges:

	>>> print arrayrange(10,-10,-2)
	10  8  6  4  2  0 -2 -4 -6 -8
	

If you want to create an array with just one value, repeated over and over, you can use the * operator applied to lists

	>>> a = array([[3]*5]*5)
	>>> a
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	

but that is fairly slow, since the expansion is done on Python lists. A quicker way would be to start with 0's and add:

	>>> a = 3+zeros([5,5],Int)
	>>> a
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	3 3 3 3 3
	

Finally, one may want to create an array from a function. This is done using the fromfunction function, which takes two arguments, a shape and a function. The function can of course be a lambda expression:

	>>> def dist(x,y):
	...    return (x-5)**2+(y-5)**2
	...
	>>> m = fromfunction(dist, (10,10))
	>>> m
	50 41 34 29 26 25 26 29 34 41
	41 32 25 20 17 16 17 20 25 32
	34 25 18 13 10  9 10 13 18 25
	29 20 13  8  5  4  5  8 13 20
	26 17 10  5  2  1  2  5 10 17
	25 16  9  4  1  0  1  4  9 16
	26 17 10  5  2  1  2  5 10 17
	29 20 13  8  5  4  5  8 13 20
	34 25 18 13 10  9 10 13 18 25
	41 32 25 20 17 16 17 20 25 32
	
	>>> m = fromfunction(lambda i,j,k: 100*i+10*j+k, (4,2,3))
	>>> m
	  0   1   2
	 10  11  12
	
	100 101 102
	110 111 112
	
	200 201 202
	210 211 212
	
	300 301 302
	310 311 312
	

Currently, the function which is called by fromfunction can only use numerical operations and ufuncs on its arguments. It cannot, for example, use if ... else: ... constructs. This is due to the fact that fromfunction does operations "en masse" to speed up evaluation. If you need to fill an array with the result of a function which does not meet these criteria, you can always use a function like:

	XXXX Need code here
	

One more array constructor is the asarray(a, typecode=None) function. It is used if you want to have an array of a specific typecode and you don't know what typecode array you have (for example, in a generic function which can operate on all kinds of arrays, but needs them to be converted to complex arrays). If the array it gets as an argument is of the right typecode, it will get sent back unchanged. If the array is not of the right typecode, each element of the new array will be the result of the coercion to the new type of the old elements. Keep in mind that the int() coercion returns the neighboring integer closest to 0, not the nearest integer.

The last array constructor we'll cover here is the identity(n) function, which takes a single integer argument and returns a square identity array of that size:

	>>> print identity(5)
	1 0 0 0 0
	0 1 0 0 0
	0 0 1 0 0
	0 0 0 1 0
	0 0 0 0 1
	

Manipulating Arrays

Simple operations

If you have a keen eye, you have noticed that one of the last examples did something new. It added a number to an array. Indeed, most Python operations applicable to numbers are directly applicable to arrays:

	>>> a = array([1,2,3])
	>>> a * 3
	3 6 9
	>>> sin(a)
	0.84147096 0.90929741 0.14112000
	>>> a + a
	2 4 6
	>>> a + 3
	4 5 6
	

Note that the + operator, for example, behaves differently depending on the type of its operands. When one of the operands is an array and the other is a number, the number is added to all the elements of the array and the resulting array is returned. When both elements are arrays, then a new array is created, where each element is the sum of the corresponding elements in the original arrays. If the shapes of the two arrays do not match exactly, then an exception is generated:

	>>> a = array([1,2,3])
	>>> a
	1 2 3
	>>> b = array([4,5,6,7])  # note this has four elements
	>>> a + b
	Traceback (innermost last):
	  File "<stdin>", line 1, in ?
	ArrayError: frames are not aligned
	

Note what happens for arrays with `swapped' shapes:

	>>> a
	1 2 3
	>>> c = array([[4],[5],[6]])
	>>> c
	4
	5
	6
	>>> a + c
	5 6 7
	6 7 8
	7 8 9
	

To understand this, one needs to look carefully at the shapes of a and c:

	>>> a.shape
	(3,)
	>>> c.shape
	(3,1)
	

It is important to remember that an array of shape (5,) is not the same as an array of shape (5,1).

[XXX more here about the duplication of the matrix...]

At this point, a boring but important issue comes up. How does Numeric deal with type coercion? Can one add integers and floats? The answer is that types get coerced "up" to fit the circumstances, never down. [XXX examples]

Indexing Arrays

Ok, now that we have arrays, how do we get and set their values?

The simple answer is: "just like lists, use the [] notation". So, if you have a rank-1 array, the notation is just like with a list:

	>>> a = arrayrange(10)
	>>> a[0]
	0
	>>> a[1:5]
	1 2 3 4
	>>> a[:-1]
	9
	

The first difference with lists comes with multidimensional indexing:

	>>> a = arrayrange(9)
	>>> a.shape = (3,3)
	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> a[0]
	0 1 2
	>>> a[1]
	3 4 5
	>>> a[0,0]
	0
	>>> a[0,1]
	1
	>>> a[1,0]
	3
	

Of course, the [] notation can be used to set values as well:

	>>> a[0,0] = 123
	>>> a
	123  1  2
	  3  4  5
	  6  7  8
	>>> a[1] = [10,11,12]
	123  1  2
	 10 11 12
	  6  7  8
	

Slicing Arrays

Hopefully you know about the details of slicing operations on Python lists. Briefly, if L is a list, then L[0] is the first element of L, L[:2] is the first two elements of the list, L[-1:] is the last element of the list, etc.

The same style of slicing applies to arrays, on a per-dimension basis. Assuming a 3x3 array:

	>>> a = reshape(arrayrange(9),(3,3))
	>>> a
	0 1 2
	3 4 5
	6 7 8
	

First, let's look at the plain [:] operator:

	>>> a[:,:]
	0 1 2
	3 4 5
	6 7 8
	

In other words, [:] with no arguments is the same as [:] for lists -- it can be read "all indices along this axis. So, to get the second row along the second dimension:

	>>> a[:,1]
	1 4 7
	

If one does not specify as many slices as there are dimensions in an array, then the remaining slices are assumed to be "all". In other words, if A is a rank-3 array, then A[1] == A[1,:] == A[1,:,:].

There is one addition to the slice notation for arrays which does not exist for lists, and that is the optional third argument, meaning the "step size" also called stride or increment:

	>>> a = arange(12)
	>>> a
	 0  1  2  3  4  5  6  7  8  9  10 11
	>>> a[::2]                 # return every *other* element
	 0  2  4  6  8  10
	
	>>> a = reshape(arrayrange(9),(3,3))
	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> a[:, 0]
	0 3 6
	>>> a[::-1, 0]
	6 3 0
	>>> a[::-1]       # note this reverses only the first axis
	6 7 8
	3 4 5
	0 1 2
	>>> a[::-1,::-1]  # this reverses both axes
	8 7 6
	5 4 3
	2 1 0
	

In summary:

	      List Slicing             Array Slicing
	 [3]                      [3]
	 [3:]                     [3:]  
	 [:3]                     [:3]
	 [:-3]                    [:-3]
	 [2:4]                    [2:4]
	 [:]                      [:]
	  loosely .reverse()              [::-1]
	

Thus, if one has a rank-2 array A, A[:,:] is the same thing as A.

One more interesting way of slicing arrays is with the keyword .... This keyword is somewhat complicated. It stands for "however many `:' I need depending on the rank of the object I'm indexing, so that the indices I *do* specify are at the end of the index list as opposed to the usual beginning."

So, if one has a rank-3 array A, then

	A[...,0] 
	

is the same thing as

	A[:,:,0]
	

but if B is rank-4, then

	B[...,0]
	

is the same thing as:

	B[:,:,:,0]
	

This "stretching" only works from the left: In other words, if ... is used after a non-pseudo index, then it is equivalent to :, and does not stretch. If C is a rank-5 array, then

	C[...,0,...]
	

is the same thing as

	C[...,0,:]
	

and

	C[:,:,:,0,:]
	

but not:

	C[:,:,0,:,:]
	

Ufuncs

reduce

If you don't know about the reduce() command in Python, go check out section 10.5.2 of the Tutorial.

Arrays have an efficient (read `fast') version of reduce, which can be useful, for example, to add up all the elements of an array:

	>>> a = array([1,2,3,4])
	>>> add.reduce(a)
	10
	
	>>> b = array([[1,2,3,4],[6,7,8,9]])
	>>> add.reduce(b)
	7 9 11 13
	

add is just one of many related functions. These are called ufuncs, which stands for universal functions. They are not plain functions but in fact quite sophisticated replacements for the builtin operators (such as +, -, *, etc.) and for the functions defined in the math module. They are:

	add, subtract, multiply, divide, remainder, power, arccos,
	arccosh, arcsin, arcsinh, arctan, arctanh, cos, cosh, exp,
	log, log10, sin, sinh, sqrt, tan, tanh, maximum, minimum,
	conjugate, equal, not_equal, greater, greater_equal, less,
	less_equal, logical_and, logical_or, logical_xor, logical_not,
	boolean_and, boolean_or, boolean_xor, boolean_not
	

The first feature that they have is that they can operate on arrays (well, clearly, otherwise I wouldn't be mentioning them here). The reason they are called "universal" is that they can operate on any Python sequence (e.g. lists and tuples). Witness:

	>>> a = arange(10)
	>>> a
	0 1 2 3 4 5 6 7 8 9
	>>> b = range(10)
	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
	>>> add(a,b)
	 0  2  4  6  7  8  10  12  14  16  18
	

The second useful feature of these ufuncs is that they have four methods: reduce(), accumulate(), outer(), and reduceat. Reduce performs the method on each element in the argument in succession, just like the builtin reduce() Python function. For example:

	>>> a = arange(10)
	>>> a
	0 1 2 3 4 5 6 7 8 9
	>>> add.reduce(a)
	45                    # that's 1 + 2 + 3 + ... + 9
	

The accumulate ufunc is just like add, except that it returns an array with the intermediate results:

	>>> a = arange(10)
	>>> a
	0 1 2 3 4 5 6 7 8 9
	>>> add.accumulate(a)
	 0  1  3  6 10 15 21 28 36 45   # 0, 0+1, 0+1+2, 0+1+2+3, ...
	

outer takes two arrays as arguments and returns the outer ufunc of the two arguments. If one uses the multiply ufunc, then one gets the outer product. For obvious reasons, the outer method is only supported for binary methods (what would sin.outer(a,b) mean, anyway?).

	>>> a = arange(5)
	>>> a
	0 1 2 3 4
	>>> b = arange(5)
	>>> b
	0 1 2 3 4
	>>> add.outer(a,b)
	0 1 2 3 4
	1 2 3 4 5
	2 3 4 5 6
	3 4 5 6 7
	4 5 6 7 8
	>>> multiply.outer(a,b)
	 0  0  0  0  0
	 0  1  2  3  4
	 0  2  4  6  8
	 0  3  6  9 12
	 0  4  8 12 16
	>>> power.outer(a,b)
	  1   0   0   0   0
	  1   1   1   1   1
	  1   2   4   8  16
	  1   3   9  27  81
	  1   4  16  64 256
	

Now, a delicate point must be addressed. Because arrays can be multidimensional, ufunc methods (except outer) take an optional argument which is the axis along which they should iterate. The default value for this axis is 0, meaning the first axis.

Finally, all ufuncs (and not ufunc methods, notice) can take as a final argument an array which will be the "return array" for the ufunc. This array has to already exist, and be have the proper typecode and shape. For example:

	>>> a = arange(5)
	>>> b = arange(5)
	>>> c = arange(5)
	>>> add(a,b,c)
	0 2 4 6 8
	>>> c
	0 2 4 6 8
	

Some of the ufuncs are clear enough, but some may require a little explanation. Well understood, they will become your dearest friends (well, maybe I exaggerate a little, but still...).

Simple Ufuncs

I'll skip these, since they're easily understood:

Unary Ufuncs (take only one argument)

Binary Ufuncs (take two arguments)

Also simple is the conjugate unary ufunc, which returns an array with as elements the complex conjugates of the elements in the sequence it gets as an argument.

Comparison ufuncs

We've already discussed how to find out about the contents of arrays based on the indices in the arrays -- that's what the various slice mechanisms are for. Often, especially when dealing with the result of computations or data analysis, one needs to "pick out" parts of matrices based on the content of those matrices. For example, it might be useful to find out which elements of an array are negative, and which are positive. The comparison ufuncs are designed for just this type of operation. Assume an array with various positive and negative numbers in it (for the sake of the example we'll generate it from scratch):

	>>> a = reshape(arange(25), (5,5))
	>>> a
	 0  1  2  3  4
	 5  6  7  8  9
	10 11 12 13 14
	15 16 17 18 19
	20 21 22 23 24
	>>> b = sin(a)
	>>> b
	 0.          0.84147098  0.90929743  0.14112001 -0.7568025 
	-0.95892427 -0.2794155   0.6569866   0.98935825  0.41211849
	-0.54402111 -0.99999021 -0.53657292  0.42016704  0.99060736
	 0.65028784 -0.28790332 -0.96139749 -0.75098725  0.14987721
	 0.91294525  0.83665564 -0.00885131 -0.8462204  -0.90557836
	

The comparison ufuncs (less, greater, greater_equal, less_equal, equal, not_equal) return arrays with the same shape as the arrays they get as first arguments and with values which correspond to the result of the pairwise comparison between the elements and the second argument. This is complicated to explain in natural language but quite simple to illustrate:

	>>> less_equal(b, 0)
	0 0 0 0 1
	1 1 0 0 0
	1 1 1 0 0
	0 1 1 1 0
	0 0 1 1 1
	

If you examine the result, you will see that the positions in the array which are set to 1 correspond to the positions in the array b which have negative numbers in them. You can play with the others and verify that they work the way you'd expect.

A similar pair of ufuncs is the maximum, minimum pair which returns an array with elements which are the maxima (or minima) of the element and the second argument.

	>>> maximum(b, 0)
	 0.          0.84147098  0.90929743  0.14112001  0.        
	 0.          0.          0.6569866   0.98935825  0.41211849
	 0.          0.          0.          0.42016704  0.99060736
	 0.65028784  0.          0.          0.          0.14987721
	 0.91294525  0.83665564  0.          0.          0.        
	

While these examples have used single numbers as second arguments, ufuncs are often quite useful when applied on two arrays.

[XXX EXAMPLES]

Numeric defines a few functions which correspond to often-used uses of ufuncs: for example, add.reduce() is synonymous with the sum() utility function:

	>>> a = arange(5)
	>>> sum(a)
	10
	

Related is the cumsum(), equivalent to add.accumulate (for "cumulative sum"):

	>>> cumsum(a)
	 1  3  6 10
	>>> b = reshape(arange(10),(2,5))
	>>> b
	0 1 2 3 4
	5 6 7 8 9
	>>> cumsum(b)
	0 1 2  3  4
	5 7 9 11 13
	

which adds each element to the next one. Note that in all of these cases, the process follows the first dimension. Compare the last example with:

	>>> c = reshape(arange(10),(5,2))
	>>> c
	0 1
	2 3
	4 5
	6 7
	8 9
	>>> cumsum(c)
	 0  1
	 2  4
	 6  9
	12 16
	20 25
	

You can easily find out what product() and cumproduct() do. Additional "utility" functions which are often useful are:

	>>> a = arange(5)
	>>> greater(a,0)
	0 1 1 1 1
	>>> alltrue(greater(a,0))
	0
	>>> sometrue(greater(a,0))
	1
	

Pseudo Indices and Broadcasting

As we've already mentioned, this returns a reshaped version of the input array. Also, the shape tuple can contain a -1 which will be replaced by the appropriate dimension (assuming it's possible -- resizing a (5,5) array using a tuple like (13,-1) isn't going to work since 25 is not divisible by 13...

All this is old hat to you by now. Here's something which isn't. You might have noticed that one can go from a rank-1 array to a rank-2 array by changing the shape:

	>>> a = arange(6)
	0 1 2 3 4 5
	>>> a.shape
	(6,)
	>>> a.shape = (6,1)
	>>> a
	0
	1
	2
	3
	4
	5
	

This example shows that while the product of all of the elements in an array's shape tuple has to be a constant, one can freely add axes with unit length. Why would one want to do this? Well, the answer has to do with broadcasting of arrays to higher dimensions. Clear as mud, isn't it? Let's try again.

Consider multiplication of a rank-1 array by a scalar:

	>>> a = array([1,2,3])
	>>> a * 2
	2 4 6
	

This should be trivial to you by now. But notice that we've just multiplied a rank-1 array by a rank-0 array. In other words, the rank-0 array was broadcast to the next rank. This works for adding two rank-1 arrays as well:

	>>> a
	1 2 3
	>>> a + array([4])
	5 6 7
	

but it won't work if either of the two rank-1 arrays have non-matching dimensions which aren't 1 -- put another way, broadcast only works for dimensions which are either missing (e.g. a lower-rank array) or for dimensions of 1.

Why do all this? Well, the classic example is matrix multiplication. Suppose one wants to multiply the row vector [10,20] by the column vector [1,2,3].

	>>> a = array([10,20])
	>>> b = array([1,2,3])
	>>> a * b
	Traceback (innermost last):
	  File "<stdin>", line 1, in ?
	ValueError: frames are not aligned
	

Well, this makes sense -- we're trying to multiply a rank-1 array of shape (2,) with a rank-1 array of shape (3,). This violates the laws of broadcast. What we really want to do is make the second vector a vector of shape (3,1), so that the first vector can be broadcast accross the second axis of the second vector. One way to do this is to use the reshape function:

	>>> a.shape
	(2,)
	>>> b.shape
	(3,)
	>>> b2 = reshape(b, (3,1))
	>>> b2
	1
	2
	3
	>>> b2.shape
	(3, 1)
	>>> a * b2
	10 20
	20 40
	30 60
	

This is such a common operation that a special feature was added (it turns out to be useful in many other places as well) -- the NewAxis "pseudo-index" (3). Here's how it works. It is an index, so it's used inside of the slice brackets []. It can be thought of as "add a new axis here," in much the same ways as adding a 1 to an array's shape adds an axis. Again, examples help clarify the situation:

	>>> b
	1 2 3
	>>> b.shape
	(3,)
	>>> c = b[:, NewAxis]
	>>> c
	1
	2
	3
	>>> c.shape
	(3,1)
	

Why use such a pseudo-index over the reshape function or shape assignments? The answer is that often one doesn't really want a new array with a new axis, one just wants it for an intermediate computation. Witness the array multiplication mentioned above, without and with pseudo-indices:

	>>> without = a * reshape(b, (3,1))
	
	>>> with = a * b[:,NewAxis]
	

The second is much more readable (once you understand how NewAxis works), and it's much closer to the intended meaning. Also, it's independent of the dimensions of the array b (4). It also works nicely with higher rank arrays, and with the ellipses "rubber index" mentioned earlier. Adding an axis before the last axis in an array can be done simply with:

	>>> a[...,NewAxis,:]
	

If you haven't been convinced of the power of such constructs yet, try thinking of the C or FORTRAN code needed to do the equivalent work... It's of course doable, but it is not going to be this concise or elegant. operations

Array Attributes

We've already seen a very useful attribute of arrays, the shape attribute. There are three more, flat, real and imaginary.

flat: an array, flattened

a.flat returns the flattened, or ravel()'ed version of an array, without having to do a function call. It is a read-only attribute:

	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> a.flat
	0 1 2 3 4 5 6 7 8 
	

real and imaginary

These attributes exist only for complex arrays. They return respectively arrays filled with the real and imaginary parts of their elements. These, unlike flat, and like shape, are read/write attributes:

	>>> b
	 (0.54030231+0.84147098j) (-0.41614684+0.90929743j)
	(-0.9899925 +0.14112001j) (-0.65364362-0.7568025j) 
	>>> b.real
	 0.54030231 -0.41614684
	-0.9899925  -0.65364362
	>>> b.imaginary
	 0.84147098  0.90929743
	 0.14112001 -0.7568025 
	>>> b.imaginary[0,0] = 1.0
	>>> b
	 (0.54030231+1.j)         (-0.41614684+0.90929743j)
	(-0.9899925 +0.14112001j) (-0.65364362-0.7568025j) 
	

Array Functions

Most of the useful manipulations with arrays is done with functions. This might be surprising given Python's object-oriented framework, and the fact that many of these functions could have been done with methods instead (and in fact they were in an earlier release of Numeric). The choice of functions was deliberate -- it allows these functions to work on arbitrary python sequences, not just arrays. I.e. transpose([[1,2],[3,4]]) works just fine, but [[1,2],[3,4]].transpose() couldn't possibly work. This approach also allows uniformity in interface between functions defined in the base system whether in C or in Python, and functions defined on array objects in extensions. The only functions which are array methods are those which depend critically on the implementation details of array objects, and they are discussed in the next section.

We've already seen two functions which operate on arrays, namely reshape() and resize(). The first has been covered in detail, now is let's have some fun with the resize() function and the rest of the arsenal.

The resize() function takes an array and a shape and returns a new array of the given shape. Sounds like reshape(), you say? Well, yes, except that the shape doesn't need to correspond in any way to the shape of the input array. If it's smaller, the new array will be smaller, if it's bigger, the new array will be bigger. The only points left to explain is how resize() fills the new array based on the input array. The answer is that the new array is filled in the order of the ravel()'ed or flattened elements of the input array. If the new array is smaller, then the elements at the "end" of the old array will be dropped. If it is bigger, then the raveled array elements are reduplicated:

	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> resize(a, (2,2))
	0 1
	2 3
	>>> resize(a, (5,))
	0 1 2 3 4
	>>> resize (a, (5,5))
	0 1 2 3 4
	5 6 7 8 0
	1 2 3 4 5
	6 7 8 0 1
	2 3 4 5 6
	

take(a, indices, axis=0)

take() is in some ways like the slice operations. It selects the elements of the array it gets as first argument based on the indices it gets as a second argument. This is again something much easier to understand with an illustration:

	>>> a
	 0  1  2  3  4
	 5  6  7  8  9
	10 11 12 13 14
	15 16 17 18 19
	20 21 22 23 24
	>>> take(a, (0,))    # first row
	0 1 2 3 4
	>>> take(a, (0,1))   # first and second row
	0 1 2 3 4
	5 6 7 8 9
	>>> take(a, (0,-1))  # first and second row
	 0  1  2  3  4
	20 21 22 23 24
	

Notice that the second argument specifies the axis along which the selection occurs, and the default value (as in the examples above) is 0, the first axis. If you want another axis, then you can specify it:

	>>> take(a, (0,), 1)   # take first column
	 0
	 5
	 10
	 15
	 20
	>>> take(a, (0,1), 1)  # take first and second column
	 0  1
	 5  6
	10 11
	15 16
	20 21
	>>> take(a, (0,-1), 1) # take first and last column
	 0  4
	 5  9
	10 14    
	15 19
	20 24
	

This is considered to be a "structural" operation, because its result does not depend on the content of the arrays or the result of a computation on those contents but uniquely on the structure of the array. Like all such structural operations, the default axis is 0 (the first rank). I mention it here because later in this tutorial, we will see functions which have a default axis of -1.

transpose(a, axes=None)

transpose() takes an array and returns a new array which corresponds to a with the order of axes specified by the second argument. The default corresponds to flipping the order of all the axes (thus it is equivalent to a.shape[::-1]).

	>>> a
	 0  1  2  3  4
	 5  6  7  8  9
	10 11 12 13 14
	15 16 17 18 19
	20 21 22 23 24
	>>> transpose(a)
	 0  5 10 15 20
	 1  6 11 16 21
	 2  7 12 17 22
	 3  8 13 18 23
	 4  9 14 19 24
	

repeat(a, repeats, axis=0)

repeat() takes an array and returns an array with each element in the input array repeated as often as indicated by the corresponding elements in the second array.

[XXX Don't understand axis argument for this one yet]

	>>> d
	0 1 2 3
	>>> f
	1 2 3 4
	>>> repeat(d, f)
	0 1 1 2 2 2 3 3 3 3
	

choose(a, (b0, ..., bn))

a is an array of integers between 0 and n. The resulting array will have the same shape as a, with element selected from b0,...,bn as indicating by the value of the corresponding element in a.

Assume a is an array that you want to "clip" so that no values are greater than 100.0.

	(greater(a, 100.0)).choose(a, 100.0)
	

Everywhere that greater(a, 100.0) is false (ie. 0) this will "choose" the corresponding value in a. Everywhere else it will "choose" 100.0.

concatenate( (a0,...,an), axis=0)

The arrays will be concatenated along the given axis. All arrays must have the same shape along every axis except for the one given. To concatenate arrays along a newly created axis, you can use array( (a0,...,an) ) so long as all arrays have the same shape.

diagonal(a, k=0)

Returns the entries along the k th diagonal of a (k is an offset from the main diagonal). This is designed for 2d arrays. For larger arrays, it will return the diagonal of each 2d sub-array.

	>>> a
	 0  1  2  3  4
	 5  6  7  8  9
	10 11 12 13 14
	15 16 17 18 19
	20 21 22 23 24
	>>> diagonal(a,0)
	 0  6 12 18 24
	>>> diagonal(a,1)
	 1  7 13 19
	>>> diagonal(a,-1)
	 5 11 17 23
	

ravel(a)

ravel() returns the argument array a as a 1d array. It is equivalent to reshape(a, (-1,)) or a.flat.

	>>> a
	0 1 2 3 4
	5 6 7 8 9
	>>> ravel(a)
	0 1 2 3 4 5 6 7 8 9
	

nonzero(a)

nonzero() returns an array containing the indices of the elements in a that are nonzero. These indices only make sense for 1d arrays, so the function refuses to act on anything else. As of 1.0a5 this function does not work for complex arrays.

	>>> a
	 0 -2  0  0  0  2  0  4  0  6
	>>> nonzero(a)
	1 5 7 9
	

where(condition, x, y)

where(condition,x,y) returns an array shaped like condition and has elements of x and y where condition is respectively true or false

compress(condition, a, axis=0)

returns those elements of a corresponding to those elements of condition that are nonzero. condition must be the same size as the given axis of a.

	>>> XXX
	

trace(a, k=0)

returns the sum of the elements in a along the k th diagonal.

	>>> XXX
	

sort(a, axis=-1)

Will sort the elements of a along the given axis.

	>>> XXX
	

argsort(a, axis=-1)

Will return the indices of the elements of a needed to produce sort(a). ie. take(a, argsort(a)) == sort(a).

	>>> XXX
	

binarysearch(a, values)

a must be sorted in ascending order and 1d. This will return the indices of the positions in a where the corresponding values would fit.

	>>> XXX
	

argmax(a, axis=-1), argmin(a, axis=-1)

The argmax() function returns an array with the arguments of the maximum values of its input array a along the given axis. The returned array will have one less dimension than a. argmin() is just like argmax(), except that it returns the indices of the minima along the given axis.

	>>> a             # first we'll do it with rows (default axis = last axis)
	 0  1  2  3  4    # the biggest element in this row is at index 4
	 5  6  7  8  9    # the biggest element in this row is at index 4
	30 11 12 13 14    # the biggest element in this row is at index 0
	15 16 17 30 19    # the biggest element in this row is at index 3
	20 21 22 23  1    # the biggest element in this row is at index 4
	>>> argmax(a)     
	4 4 0 3 4
	>>> argmax(a,0)   # now do it for columns instead of rows (first axis)
	2 4 4 3 4
	>>> argmin(a)
	0 0 1 0 4
	>>> argmin(a,0)
	0 0 0 0 4
	

concatenate((m1, m2, ..., mn), axis=0)

The concatenate() function returns an array which is the result of appending all of the arrays in its first argument. Every dimension except the first must be the same in both matrices.

	>>> a
	0 1 2 3 4 
	5 6 7 8 9
	>>> b = reshape(arange(9,-1,-1),(2,5))
	>>> b
	9 8 7 6 5
	4 3 2 1 0
	>>> concatenate((a,b))
	0 1 2 3 4
	5 6 7 8 9
	9 8 7 6 5 
	4 3 2 1 0
	

fromstring(string, typecode)

Will return the array formed by the binary data given in string of the specified typecode. This is mainly used for reading binary data to and from files, it can also be used to exchange binary data with other modules that use python strings as storage (ie PIL).

	
	

dot(m1, m2)

Will return the dot product of a and b. This is equivalent to matrix multiply for 2d arrays (without the transpose). Somebody who does more linear algebra really needs to do this function right some day!

matrixmultiply(m1, m2)

The matrixmultiply() function is..

	>>>
	

ravel(m)

The ravel() function takes a single argument, which is an array. It returns the same array reshaped to be a rank-1 array.

	>>> a
	0 1 2 
	3 4 5
	6 7 8
	>>> ravel(a)
	0 1 2 3 4 5 6 7 8
	

clip(m, m_min, m_max)

The clip function creates an array with the same shape and typecode as m, but where every entry in m that is less than m_min is replaced by m_min, and every entry greater than m_max is replaced by m_max. Entries within the range [m_min, m_max] are left unchanged.

	>>> a = arange(9, Float)
	>>> clip(a, 1.5, 7.5)
	1.5000 1.5000 2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 7.5000
	
	

reverse(m)

The reverse function returns a version of its argument m where the outermost dimension is reversed:

	>>> a
	0 1 2 
	3 4 5
	6 7 8
	>>> reverse(a)
	6 7 8
	3 4 5
	0 1 2
	

copyAndReshape(m)

	>>>
	

indices(*dimensions)

This function returns a set of indices to use to create a multidimensional array from a nice functional form. The arguments are the dimensions of the array along each of its dimensions, and it returns the indices for that array, arranged in a tuple.

	>>> indices(3,3)
	(0
	1
	2, 0 1 2)
	
	

maximum.reduce, minimum.reduce, (m)

	>>>
	

where(m)

where(condition,x,y) is shaped like condition and has elements of x and where condition is respectively true or false

compress(m)

compress(condition, x, dimension=-1) = those elements of x corresponding to those elements of condition that are "true". condition must be the same size as the given dimension of x.

	>>>
	

Methods on array objects

As we discussed at the beginning of the last section, there are very few array methods for good reasons, and these all depend on the the implementation details. They're worth knowing, though:

a.astype(typecode)

Calling an array's astype() method returns a new array with the same shape as A but with the specified typecode, and all its elements (necessarily) cast to the new type. This may result in rounding off, etc.

Warning: this cast is the straight C cast from one type to another and contains no range checks at all! You can always add youre own range checks if youre concerned.

	>>> a
	0 1 2 3 4 5 6 7 8
	>>> b = a.astype(Float) + .1
	>>> b
	 0.1  1.1  2.1  3.1  4.1  5.1  6.1  7.1  8.1
	>>> c = b.astype(Int)
	>>> c
	0 1 2 3 4 5 6 7 8
	

a.transpose(new_dimensions)

The transpose method applied to an array A resorts the dimensions of A, to correspond to the order given in new_dimensions. None can be given to indicate a range to remain unchanged. If no argument given, then the order of the dimensions is completely reversed. As an example:

	>>> a
	0 1 2
	3 4 5
	>>> a.transpose()
	0 3
	1 4
	2 5
	

a.itemsize()

The itemsize() method applied to an array returns the number of bytes used by any one of its elements.

	>>> a= arange(10)
	>>> a.itemsize()
	4
	>>> a = array([1.0])
	>>> a.itemsize()
	8
	>>> a = array([1], Complex)
	>>> a.itemsize()
	16
	

a.iscontiguous()

Calling an array's iscontiguous() method returns true if the memory used by A is contiguous. A non-contiguous array can be converted to a contiguous one by the copy() method. [This is useful for XXXX ]

	>>>  # NEED GOOD EXAMPLE HERE XXX
	

a.typecode()

The `typecode()' method returns the typecode of the array it is applied to. While we've been talking about them as Float, Int, etc., they are represented internally as characters, so this is what you'll get:

	>>> a = array([1,2,3])
	>>> a.typecode()
	'l'
	>>> a = array([1], Complex())
	>>> a.typecode()
	'D'
	

a.byteswapped()

The byteswap method performs a byte swapping operation on all the elements in the array. This is useful if you have an array which was written on a machine with a different byte order.

	>>> a
	0 0 1
	0 0 0
	1 2 3
	>>> a.byteswap()
	       0        0 16777216
	       0        0        0
	16777216 33554432 50331648
	

a.tostring()

The `tostring()' method returns a string representation of the data portion of the array it is applied to.

If a is discontiguous, this data will still be faked to look as if it came from a contiguous array.

	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> a.tostring()  # output truncated for printing
	'\000\000\000\000\001\000\000\000\002\000\000\000\003\000\000
	\000\004\000\000\000\005\000\000\000\006\000\000\000\007\000
	\000\000\010\000\000\000'
	

a.tolist()

Calling an array's tolist() method returns a hierarchical python list version of the same array:

	>>> a
	0 1 2
	3 4 5
	6 7 8
	>>> a.tolist()
	[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
	

Saving and Loading arrays

The recommended way of storing arrays on disk and loading them up again is to use the pickling mechanism provided by python. As an example, pickling an array can be done by:

	>>> a 
	0 0 1
	1 2 3
	4 5 6
	>>> f = open('myarray', 'w')
	>>> p = pickle.Pickler(f)
	>>> p.dump(a)
	>>> f.close()
	

And the equivalent `reading' operation to read an array back from a file is:

	>>> f = open('myarray', 'r')
	>>> p = pickle.Unpickler(f)
	>>> a = p.load()
	>>> f.close()
	>>> a 
	0 0 1
	1 2 3
	4 5 6
	

Note that as of Python 1.4final, arrays can be pickled when they are part of other Python objects.