Definition
The array
Data Structure
Installing NumPy
Create NumPy arrays
Benefits of NumPy
Comparison of NumPy arrays vs Python Lists
Indexing and Slicing
Array Operations
Vectorization
Broadcasting
Aggregation
Linear Algebra
Summary
NumPy (short for Numerical Python) is “the fundamental package for scientific computing with Python”, and it is the library that Pandas, Matplotlib and Scikit-learn build on.
It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
An array is a collection of items stored at contiguous memory locations. The idea is to store multiple items of the same type together. Some examples of arrays are vectors (an array with a single column) and matrices (an array with multiple columns).
A NumPy array is a homogeneous grid of values. In NumPy, the dimensions of an array are called axes, and the number of axes is called the rank (available as the ndim attribute). A tuple of non-negative integers giving the size of the array along each dimension is called its shape.
The NumPy array data structure is also called ndarray, short for n-dimensional array. An array with one dimension is called a vector and an array with two dimensions is called a matrix.
NumPy does not come with Python by default, so it needs to be installed; that is, it is not part of the standard library.
You can install NumPy with the pip command.
Open your command prompt and enter the following command:
pip install numpy
After you’ve installed NumPy, you need to import it every time you want to use it in your Python IDE.
Importing a library means loading it into memory so that it is available for you to work with.
To import NumPy you need to write the following code:
import numpy as np
Here numpy is the library we are loading, and np is the alias we give it.
Remember, you will need to do this every time you start a new Jupyter Notebook or a new Python module.
Once numpy is loaded, you can start working with it.
There are several ways to create arrays.
For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.
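To illustrate how the element type is deduced, here is a minimal sketch. The variable names are made up for illustration, and the exact integer dtype (int32 or int64) depends on your platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])         # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])      # a single float upcasts every element to float64
from_tuple = np.array((1.0, 2.0))  # tuples work the same way as lists

print(ints.dtype, mixed.dtype, from_tuple.dtype)
```

Because an array is homogeneous, NumPy picks one dtype that can represent every element in the sequence.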
import numpy as np
We can create an array from a Python list by passing the list to np.array(), as shown below.
# create an array from a list
a = np.array([1,2,3])
print(a)
[1 2 3]
type(a)
numpy.ndarray
a.dtype # type of elements in the array
dtype('int32')
b = np.array([1.2, 3.5, 5.1])
b
array([1.2, 3.5, 5.1])
b.dtype
dtype('float64')
A frequent error consists in calling array with multiple arguments, rather than providing a single sequence as an argument.
a = np.array(1,2,3) # WRONG
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [7], in <module>
----> 1 a = np.array(1,2,3)

TypeError: array() takes from 1 to 2 positional arguments but 3 were given
a = np.array([1,2,3]) # RIGHT
a
array([1, 2, 3])
# Creating 2D array
b = np.array([[1,2,3],[4,5,6]])
b
array([[1, 2, 3], [4, 5, 6]])
print(f"array a has dimensions: {a.ndim}")
print(f"array b has dimensions: {b.ndim}")
array a has dimensions: 1
array b has dimensions: 2
size
Now that you have your array loaded, you can check its size (number of elements) by typing array.size.
print(f"array a has size: {a.size}") # number of items
print(f"array b has size: {b.size}")
array a has size: 3
array b has size: 6
shape
You can check its shape (the dimensions: rows and columns) by typing array.shape.
print(f"array a has shape: {a.shape}") # number of rows and columns
print(f"array b has shape: {b.shape}")
array a has shape: (3,)
array b has shape: (2, 3)
dtype
You can use array.dtype to get the data type of the array's elements (floats, integers, etc.; see the NumPy documentation for more).
print(f"array a contains: {a.dtype}") # data types contained in the array
print(f"array b contains: {b.dtype}")
array a contains: int32
array b contains: int32
astype()
If you need to convert the data type, you can use the array.astype(dtype) method.
print(f"array a as a float: {a.astype(np.float64)}") # convert the elements to float64
print("")
print(f"array b as complex numbers: \n{b.astype(np.complex128)}")
array a as a float: [1. 2. 3.]

array b as complex numbers:
[[1.+0.j 2.+0.j 3.+0.j]
 [4.+0.j 5.+0.j 6.+0.j]]
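Converting in the other direction, from floats to integers, loses information. The sketch below uses made-up sample values to show that astype drops the fractional part and returns a new array rather than modifying the original.

```python
import numpy as np

prices = np.array([1.9, 2.5, -3.7])  # hypothetical sample data
as_int = prices.astype(np.int64)     # fractional part is dropped (truncation toward zero)

print(as_int)        # the values 1, 2 and -3
print(prices.dtype)  # still float64: astype returned a new array
```

If you want rounding instead of truncation, apply np.round before converting.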
A = np.arange(0, 30, 5) # Creates [ 0, 5, 10, 15, 20, 25]
A
array([ 0, 5, 10, 15, 20, 25])
np.linspace()
linspace receives as an argument the number of elements that we want, instead of the step.
A = np.linspace(0, 30, 5) # 5 numbers from 0 to 30 inclusive
A
array([ 0. , 7.5, 15. , 22.5, 30. ])
arange: returns evenly spaced values within a given interval. The step size is specified, and the stop value is excluded.
linspace: returns evenly spaced values within a given interval. The number of elements (num) is specified, and the stop value is included by default.
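The difference between the two functions can be sketched side by side. The variable names here are illustrative only; endpoint is a real keyword argument of np.linspace.

```python
import numpy as np

by_step = np.arange(0, 10, 2.5)                      # step given; stop (10) is excluded
by_count = np.linspace(0, 10, 5)                     # count given; stop (10) is included
no_endpoint = np.linspace(0, 10, 4, endpoint=False)  # endpoint=False excludes the stop value

print(by_step)      # 0, 2.5, 5 and 7.5
print(by_count)     # 0, 2.5, 5, 7.5 and 10
print(no_endpoint)  # same values as by_step
```

With a non-integer step, linspace is usually the safer choice, because the number of elements is fixed in advance rather than depending on floating-point rounding.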
np.zeros()
The function zeros creates an array full of zeros.
a = np.zeros((2,2)) # Create an array of all zeros
a
array([[0., 0.], [0., 0.]])
np.ones()
The function ones creates an array full of ones.
b = np.ones((2,2)) # Create an array of all ones
b
array([[1., 1.], [1., 1.]])
np.empty()
The function empty creates an array without initializing its contents: the values are arbitrary and depend on the state of the memory.
e = np.empty((2, 2)) # uninitialized (arbitrary) values
e
array([[ 7.5, 15. ], [22.5, 30. ]])
By default, the dtype of the created array is float64, but it can be specified via the keyword argument dtype.
# create an array of ones but integers
np.ones((3, 3), dtype=np.int16)
array([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=int16)
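One practical reason to choose a dtype explicitly is memory use. As a small sketch, the itemsize and nbytes attributes report the bytes per element and the total bytes of an array:

```python
import numpy as np

small = np.ones((3, 3), dtype=np.int16)  # 16-bit integers
default = np.ones((3, 3))                # float64 by default

print(small.itemsize, small.nbytes)      # 2 bytes per element, 18 bytes in total
print(default.itemsize, default.nbytes)  # 8 bytes per element, 72 bytes in total
```

A smaller dtype saves memory but also narrows the range of values the array can hold.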
The primary benefit of arrays lies in their ability to quickly manipulate and process data. When you’re working with numbers, you often need to conduct arithmetic operations or produce summary data quickly.
More speed: NumPy uses algorithms written in C that complete in nanoseconds rather than seconds.
Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration indices.
Clearer code: Without loops, your code will look more like the equations you’re trying to calculate.
Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and bug free.
Because of these benefits, NumPy is the de facto standard for multidimensional arrays in Python data science, and many of the most popular libraries are built on top of it. Learning NumPy is a great way to set down a solid foundation as you expand your knowledge into more specific areas of data science.
To benchmark the performance of arrays vs lists, we create a list that contains the integers from 0 to 999,999. We also create an array that contains the same elements.
We then multiply each element of the list by 5 and measure the time it took to complete the operation in seconds. We also multiply each element of the array by 5 and measure the time it took to complete the operation.
We then compare the two. The difference is staggering.
my_list = [i for i in range(1000000)] # create a list using list comprehension
list_size = len(my_list)
print(my_list[:10])
print("list size:",list_size)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list size: 1000000
my_arr = np.array(my_list)
print(my_arr[:10])
print("array size",my_arr.size)
[0 1 2 3 4 5 6 7 8 9]
array size 1000000
import time
t1 = time.time()
new_list = [i*5 for i in my_list]
t2 = time.time()
diff_list = t2 - t1
print(new_list[:10])
print("Time it took to multiply each element of the list with 5: ", diff_list)
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
Time it took to multiply each element of the list with 5: 0.1309194564819336
t1 = time.time()
new_arr = my_arr * 5
t2 = time.time()
diff_arr = t2 - t1
print(new_arr[:10])
print("Time it took to multiply each element of the array with 5: ", diff_arr)
[ 0 5 10 15 20 25 30 35 40 45]
Time it took to multiply each element of the array with 5: 0.0039980411529541016
times = diff_list / diff_arr
print(f"The array operation is {times} times faster than the list operation!!")
The array operation is 32.74590017293816 times faster than the list operation!!
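A single time.time() measurement can be noisy. As an alternative sketch, the standard-library timeit module can repeat each measurement and keep the best run. The repeat count of 3 is an arbitrary choice, and the ratio you see will vary by machine.

```python
import timeit
import numpy as np

n = 1_000_000
my_list = list(range(n))
my_arr = np.arange(n)

# Run each operation several times and keep the fastest run
list_time = min(timeit.repeat(lambda: [i * 5 for i in my_list], number=1, repeat=3))
arr_time = min(timeit.repeat(lambda: my_arr * 5, number=1, repeat=3))

print(f"list: {list_time:.4f} s, array: {arr_time:.4f} s, ratio: {list_time / arr_time:.1f}x")
```

Taking the minimum over several runs reduces the influence of other processes competing for the CPU.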
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
array[5] returns the element at index 5, which is the sixth element because indexing starts at 0.
You can also select a range of elements by using a colon (:). For example, array[0:5] selects the first 5 values; the element at index 5 is not included.
# 1D array
a = np.arange(10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a[2] # element at index 2 or third element in the array
2
a[2:5] # slice from index 2 up to, but not including, index 5
array([2, 3, 4])
a[:6:2] # from the start up to, but not including, index 6, with a step size of 2
array([0, 2, 4])
a[-1] # negative indexing
9
a[::-1] # reversed a
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]])
# let's access 10 from our array
# the element is in row 2, column 1 (counting from 0)
arr[2, 1]
10
# access all elements in the first row
arr[0]
array([1, 2, 3, 4])
# access all elements in the last column
arr[:,-1]
array([ 4, 8, 12])
# each column in the second and third row
arr[1:3, :]
array([[ 5, 6, 7, 8], [ 9, 10, 11, 12]])
# select the first and second columns
arr[:, 0:2]
array([[ 1, 2], [ 5, 6], [ 9, 10]])
# change all elements in the first and second columns to zero
arr[:, 0:2] = np.zeros((3,2))
arr
array([[ 0, 0, 3, 4], [ 0, 0, 7, 8], [ 0, 0, 11, 12]])
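Assignment through a slice changes the original array because basic slicing returns a view, not a copy. The following sketch, with illustrative variable names, shows the difference and how .copy() gives you independent data.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = arr[0, :2]         # basic slicing returns a view into arr's memory
view[:] = 0               # writing through the view changes arr as well
print(arr[0])             # the first row now starts with two zeros

safe = arr[1, :2].copy()  # .copy() creates independent data
safe[:] = -1
print(arr[1])             # the second row is unchanged
```

This differs from Python lists, where slicing always produces a new list.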
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
a = np.array([20, 30, 40, 50])
b = np.arange(4)
print("a",a)
print("b",b)
a [20 30 40 50]
b [0 1 2 3]
c = a + b
c
array([20, 31, 42, 53])
b**2
array([0, 1, 4, 9])
The product operator * operates elementwise on NumPy arrays.
The matrix product can be performed using the @ operator or the dot() function or method:
# Create matrices A and B
A = np.array([[4,0],[0,2]])
print(A)
[[4 0] [0 2]]
B = np.array([[0,1],[3,2]])
print(B)
[[0 1] [3 2]]
# Matrix product of A and B
print(np.dot(A,B))
[[0 4] [6 4]]
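To make the distinction concrete, this short sketch applies both products to the same two matrices:

```python
import numpy as np

A = np.array([[4, 0], [0, 2]])
B = np.array([[0, 1], [3, 2]])

elementwise = A * B  # multiplies matching entries
matrix = A @ B       # rows of A combined with columns of B

print(elementwise)
print(matrix)
print(np.array_equal(matrix, A.dot(B)))  # @ and dot agree for 2-D arrays
```

The two results differ: the elementwise product keeps the zeros of A in place, while the matrix product mixes rows and columns.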
Vectorization is a powerful ability within NumPy to express operations as occurring on entire arrays rather than their individual elements.
This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact seen in any kind of numerical computations.
Vectorized operations let NumPy apply optimized, pre-compiled functions to array objects and data sequences, so they run faster than equivalent non-vectorized operations.
As an example, we compare the vectorized sum method on a NumPy array with a simple non-vectorized approach, an iterative loop, to calculate the sum of the numbers from 0 to 14,999.
# vectorized sum
t1 = time.time()
print(np.sum(np.arange(15000)))
t2 = time.time()
print("Time taken by vectorized sum : ", end = "")
print(t2 - t1)
# iterative sum
t1 = time.time()
total = 0
for item in range(0, 15000):
    total += item
a = total
print("\n" + str(a))
t2 = time.time()
print("Time taken by iterative sum : ", end = "")
print(t2-t1)
112492500
Time taken by vectorized sum : 0.0

112492500
Time taken by iterative sum : 0.004997730255126953
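Vectorization applies to elementwise arithmetic as well as to sums. The sketch below uses made-up temperature values to write the same Celsius-to-Fahrenheit conversion first as an explicit loop and then as a single array expression.

```python
import numpy as np

celsius = np.array([-40.0, 0.0, 37.0, 100.0])  # hypothetical sample temperatures

# Loop version: one Python-level operation per element
fahrenheit_loop = np.empty_like(celsius)
for i in range(celsius.size):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized version: one expression over the whole array
fahrenheit_vec = celsius * 9 / 5 + 32

print(np.allclose(fahrenheit_loop, fahrenheit_vec))
```

The vectorized line reads like the formula itself, which is the "clearer code" benefit described earlier.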
NumPy provides a mechanism for performing mathematical operations on arrays of unequal shapes.
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations.
# a shape-(3, 4) array
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(a)
print('shape',a.shape)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
shape (3, 4)
# a shape-(4,) array
b = np.arange(4)
print(b)
print('shape',b.shape)
[0 1 2 3]
shape (4,)
# multiplying a shape-(4,) array with a shape-(3, 4) array
# `b` is multiplied by each row of `a`
a * b
array([[ 0, 2, 6, 12], [ 0, 6, 14, 24], [ 0, 10, 22, 36]])
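The "certain constraints" can be sketched as follows: shapes are compared from the last axis backwards, and two sizes are compatible when they are equal or when one of them is 1. The variable names below are illustrative.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)  # shape (3, 4)
col = np.array([[10], [20], [30]])  # shape (3, 1): stretched across the 4 columns
shifted = a + col
print(shifted.shape)                # (3, 4)

bad = np.arange(3)                  # shape (3,): trailing sizes 4 and 3 do not match
try:
    a * bad
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("cannot broadcast:", err)
```

When shapes are incompatible, NumPy raises a ValueError instead of guessing how to align the arrays.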
Index-based selection is great, but what if you want to filter your data based on more complicated nonuniform or nonsequential criteria? This is where the concept of a mask comes into play.
A mask is an array that has the exact same shape as your data, but instead of your values, it holds Boolean values: either True or False. You can use this mask array to index into your data array in nonlinear and complex ways. It will return all of the elements where the Boolean array has a True value.
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
a
array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]])
mask = (a % 3 == 0)
mask
array([[False, False, True, False], [False, True, False, False], [ True, False, False, True]])
a[mask] # only numbers that are divisible by 3
array([ 3, 6, 9, 12])
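Masks can also be combined and used for assignment. In this sketch the conditions are arbitrary examples; note that each condition needs its own parentheses when combined with & (and), | (or) or ~ (not).

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Even numbers that are also greater than 6
even_and_large = (a % 2 == 0) & (a > 6)
selected = a[even_and_large]
print(selected)           # the values 8, 10 and 12

capped = a.copy()
capped[capped > 10] = 10  # a mask can also select the elements to assign to
print(capped)
```

Python's and/or keywords do not work here, because they try to reduce a whole array to a single truth value.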
data = np.array([
[7, 1, 4],
[8, 6, 5],
[1, 2, 3]
])
data
array([[7, 1, 4], [8, 6, 5], [1, 2, 3]])
print(np.sort(data))
[[1 4 7] [5 6 8] [1 2 3]]
np.max()
np.max(data)
8
np.mean()
np.mean(data)
4.111111111111111
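The aggregations above reduce the whole array to one number. As a sketch of the axis argument, the same functions can instead aggregate per column or per row:

```python
import numpy as np

data = np.array([[7, 1, 4], [8, 6, 5], [1, 2, 3]])

col_max = data.max(axis=0)  # collapse the rows: one maximum per column
row_sum = data.sum(axis=1)  # collapse the columns: one sum per row

print(col_max)  # 8, 6 and 5
print(row_sum)  # 12, 19 and 6
```

The axis you pass is the one that disappears from the result's shape.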
a = np.array([[1., 2.], [3., 4.]])
a
array([[1., 2.], [3., 4.]])
# Invert the matrix
print(np.linalg.inv(a))
[[-2. 1. ] [ 1.5 -0.5]]
Compute the determinant of an array.
a = np.array([[1, 2], [3, 4]], dtype=np.int64)
a
array([[1, 2], [3, 4]], dtype=int64)
np.linalg.det(a)
-2.0000000000000004
a = np.array([[1, 2], [3, 5]])
a
array([[1, 2], [3, 5]])
b = np.array([1, 2])
b
array([1, 2])
x = np.linalg.solve(a, b)
x
array([-1., 1.])
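A quick way to check a solution is to substitute it back into the system. This sketch verifies that a @ x reproduces b, and that the explicit inverse gives the same answer:

```python
import numpy as np

a = np.array([[1, 2], [3, 5]])
b = np.array([1, 2])
x = np.linalg.solve(a, b)

residual_ok = np.allclose(a @ x, b)                  # True when x satisfies a @ x = b
same_as_inv = np.allclose(x, np.linalg.inv(a) @ b)   # same answer via the explicit inverse
print(residual_ok, same_as_inv)
```

np.allclose is used instead of == because floating-point results are rarely exact, as the determinant output of -2.0000000000000004 shows.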
These are just a few examples of what can be done with NumPy, and there are many more that are not covered in the workshops. If you ever need a function for a mathematical operation not covered here, you can always try searching for it online. If NumPy doesn't have it, you may need to look at another package such as SciPy, or simply code it yourself.
# challenge 1 solution
a = np.array([[1,2,3],
[4,5,6]])
b = np.array([[10,11,12],
[13,14,15]])
c = a + b
print(c)
[[11 13 15] [17 19 21]]
a = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
b = np.array([[2,3,4],
[5,6,7],
[8,9,10]])
Expected solution for challenge 2:
[[ 36  42  48]
 [ 81  96 111]
 [126 150 174]]
Expected output:
[
2 3 4
5 6 7
8 9 10
]
# challenge 3 solution
import numpy as np
a = np.array([
[1,1,1],
[0,2,5],
[2,5,-1]
])
b = np.array(
[6,-4,27]
)
x, y, z = np.linalg.solve(a,b)
print(f"x -> {x} y -> {y} z -> {z}")
# x = 5; y = 3; z = -2
x -> 5.0 y -> 3.0 z -> -2.0