NumPy is the core numerical library for Python. It provides multi-dimensional arrays together with many high-level mathematical functions that operate on them. It is implemented largely in C, so its operations are typically much faster than equivalent code written with standard Python containers.
The main actor of NumPy is the array, which we can mentally map to a standard Python list, since the two have several similarities:
Access by position: you can access an array by position (0-based), just like a regular list.
Slicing: arrays are sliced like regular lists.
For loops: a for loop iterates over each element of the array.
import numpy as np
arr=np.array([1,4,9])
arr[0]
> 1
arr[1:]
> array([4, 9])
for element in arr:
    print(element)
> 1
> 4
> 9
Yet the important features lie in their differences:
Same type: all the elements of a NumPy array share one data type. A list can be defined as l=[1,'cat',3/4], but a NumPy array cannot hold such a mix as it is: the values are converted to a common type (as in C, a single element type allows a compact memory layout).
Methods: you can compute basic statistics such as means and standard deviations, along with many other operations, with a single call (see the docs for a full list).
Multi-dimensionality: of course a list of lists can be thought of as a multi-dimensional list, but arrays (the exact type is ndarray, N-dimensional array) are built to be multi-dimensional. They have attributes and methods that are specifically designed with this in mind (see shape manipulation in the official documentation).
arr=np.array([[1, 4, 9],[3,7,10]])
arr
> array([[ 1,  4,  9],
>        [ 3,  7, 10]])
np.array([1, 4, 'chocobo'])
> array(['1','4','chocobo'])
np.array([1, 4, 'chocobo']).dtype
> dtype('<U...') #a string type: the numbers were converted to text
arr.mean()
> 5.666...
arr.std()
> 3.248...
arr.reshape(6,1)
> array([[ 1],
>        [ 4],
>        [ 9],
>        [ 3],
>        [ 7],
>        [10]])
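To make the point about attributes and methods designed for several dimensions more concrete, here is a small sketch. The variable names (total_mean, column_means, row_means, reshaped) are only illustrative, and the axis argument is one of several options described in the official documentation.

```python
import numpy as np

arr = np.array([[1, 4, 9], [3, 7, 10]])

# Attributes describing the layout of the array
print(arr.ndim)    # number of dimensions: 2
print(arr.shape)   # (rows, columns): (2, 3)
print(arr.size)    # total number of elements: 6

# The same statistic over the whole array or along a single axis
total_mean = arr.mean()           # mean of all six values
column_means = arr.mean(axis=0)   # one mean per column: 2.0, 5.5, 9.5
row_means = arr.mean(axis=1)      # one mean per row

# reshape arranges the same six values with a different shape
reshaped = arr.reshape(3, 2)      # rows: [1, 4], [9, 3], [7, 10]
print(reshaped)
```

Reading axis=0 as "collapse the rows" and axis=1 as "collapse the columns" is usually enough to predict the shape of the result.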
You may have noticed by now that every ndarray so far has been created from a list. This is not strictly necessary (any array-like object will do, for example a tuple or an object that exposes the buffer interface, but this is beyond our scope), but it is probably the most common way you are going to create one.
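For the curious, a few other ways to build an array can be sketched as follows. This is only a sample, not a complete list, and the variable names are illustrative. The last lines use the standard-library array module as an example of an object that exposes the buffer interface.

```python
import numpy as np
from array import array  # standard-library typed array, exposes the buffer interface

from_tuple = np.array((1, 4, 9))    # any sequence works, not only lists
from_range = np.arange(1, 10, 4)    # values 1, 5, 9 generated by NumPy itself
zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0

buf = array('d', [1.0, 4.0, 9.0])   # three C doubles
# frombuffer reads the memory of buf directly instead of copying it
from_buffer = np.frombuffer(buf, dtype=np.float64)

print(from_tuple, from_range, zeros.shape, from_buffer)
```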
Next, two more important differences will be described, Vectorized Operations and Boolean Indexing.
Vectorized operations are one of the most important differences. Operations between arrays are carried out with a different logic than that of standard lists. For example, the operator ‘+’ on lists concatenates the two lists, while two ndarrays are summed element by element.
l=[1, 4, 9]
arr=np.array(l)
l+l
> [1, 4, 9, 1, 4, 9]
arr+arr
> array([2, 8, 18])
##the same operation without ndarrays would have required e.g. a list comprehension and a zip
[x+y for x,y in zip(l,l)]
> [2, 8, 18]
Question: What do I get if I try to sum two ndarrays with different shapes?
e.g. A1=np.array([0,0,0]), A2=np.array([1,1]) -> A1+A2=?
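If you want to check your answer, one way to explore the question is the sketch below. The try/except wrapper and the names outcome and stretched are only there for illustration. The last line shows a case where the shapes differ but are still compatible.

```python
import numpy as np

A1 = np.array([0, 0, 0])
A2 = np.array([1, 1])

try:
    A1 + A2
    outcome = "summed"
except ValueError as err:
    # shapes (3,) and (2,) cannot be paired element by element
    outcome = "ValueError"
    print(err)

# A one-element array (or a plain scalar) is stretched to match instead
stretched = A1 + np.array([1])
print(outcome, stretched)
```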
The operator ‘*’ performs element-wise multiplication. If a matrix product is desired instead, the function np.dot (or the equivalent method arr.dot) should be used:
arr=np.array([[1, 4, 9], [2, 7, 10]])
arr * arr
> array([[  1,  16,  81],
>        [  4,  49, 100]])
np.dot(arr,arr.T)
> array([[ 98, 120],
>        [120, 153]])
np.dot computes the dot (inner) product of two 1D arrays and the matrix product of two 2D arrays.
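The two behaviours can be put side by side in a short sketch. The names v, inner, product and same_product are illustrative, and the @ operator is an addition not used elsewhere in this text: for 2D arrays it gives the same result as np.dot.

```python
import numpy as np

v = np.array([1, 4, 9])
arr = np.array([[1, 4, 9], [2, 7, 10]])

inner = np.dot(v, v)           # 1D with 1D: 1*1 + 4*4 + 9*9 = 98, a single number
product = np.dot(arr, arr.T)   # 2D with 2D: matrix product with shape (2, 2)
same_product = arr @ arr.T     # @ is the matrix-product operator

print(inner)
print(product)
```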
Question 1: Why is the second argument of np.dot in the example arr.T?
Question 2: What happens if instead you write np.dot(arr.T, arr)? Why?
Boolean indexing: NumPy arrays can be indexed with a vector of booleans (a list or another ndarray) of the same shape, and only the elements at the positions marked True are returned.
A boolean vector is usually created starting from the ndarray itself by applying a condition.
arr=np.array([1, 4, 9])
arr>3
> array([False, True, True], dtype=bool)
#by requesting arr>3 I get a boolean array that states if the elements
#in the same position are >3. This vector can be used to slice my
#original array to get all elements that are >3.
arr[arr>3]
> array([4, 9])
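A few variations on the same idea are sketched below. The names from_list, between and capped are illustrative, and combining conditions with & is an addition to what is described above.

```python
import numpy as np

arr = np.array([1, 4, 9])

# A plain list of booleans works as a mask, as long as its length matches
from_list = arr[[False, True, True]]     # array([4, 9])

# Conditions can be combined element-wise with & (and) and | (or);
# the parentheses are needed because & binds tighter than the comparisons
between = arr[(arr > 3) & (arr < 9)]     # array([4])

# A mask can also select the positions to overwrite
capped = arr.copy()
capped[capped > 3] = 3                   # array([1, 3, 3])

print(from_list, between, capped)
```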
Note that applying boolean indexing to a multidimensional array flattens the output to a one-dimensional array, since the selected elements do not in general fit a regular multidimensional shape (try it).
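A quick way to try it is the sketch below, where mask and selected are illustrative names:

```python
import numpy as np

arr = np.array([[1, 4, 9], [3, 7, 10]])

mask = arr > 3         # boolean array with the same shape as arr, (2, 3)
selected = arr[mask]   # the four matching values, row by row: 4, 9, 7, 10

print(mask.shape)      # (2, 3)
print(selected.shape)  # (4,)
```

Here each row happens to contribute two elements, but with a different condition the rows could contribute different numbers of elements, which is why the result is always returned flat.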