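1.3.1. The NumPy array object
1.3.1.1. What are NumPy and NumPy arrays?
NumPy arrays
- Python objects:
  - high-level number objects: integers, floating point
  - containers: lists (low-cost insertion and append), dictionaries (fast lookup)
- NumPy provides:
  - an extension package to Python for multi-dimensional arrays
  - closer to hardware (efficiency)
  - designed for scientific computation (convenience)
  - also known as array-oriented computing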
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])
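Tip
For example, an array containing:
- values of an experiment/simulation at discrete time steps
- a signal recorded by a measurement device, e.g. a sound wave
- the pixels of an image, grey-level or colour
- 3-D data measured at different X-Y-Z positions, e.g. an MRI scan
- …
Why it is useful: a memory-efficient container that provides fast numerical operations.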
In [1]: L = range(1000)
In [2]: %timeit [i**2 for i in L]
50.6 us +- 725 ns per loop (mean +- std. dev. of 7 runs, 10,000 loops each)
In [3]: a = np.arange(1000)
In [4]: %timeit a**2
920 ns +- 7.16 ns per loop (mean +- std. dev. of 7 runs, 1,000,000 loops each)
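The IPython `%timeit` comparison above can also be reproduced in a plain Python script. The following is a minimal sketch using the standard-library `timeit` module; the variable names and the repetition count of 1000 are illustrative assumptions, and the measured timings will differ from machine to machine.

```python
import timeit

import numpy as np

values = range(1000)
a = np.arange(1000)

# Both approaches compute the same squares.
list_squares = [i**2 for i in values]
array_squares = a**2

# Total time, in seconds, for 1000 repetitions of each approach.
t_list = timeit.timeit(lambda: [i**2 for i in values], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)

print(f"list comprehension: {t_list:.4f} s, NumPy: {t_array:.4f} s")
```

The script first checks nothing about speed by itself; it only prints both totals so they can be compared on the machine at hand.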
NumPy reference documentation
Interactive help:
In [5]: np.array?
Docstring:
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
copy : bool, optional
    If ``True`` (default), then the array data is copied. If ``None``,
    a copy will only be made if ``__array__`` returns a copy, if obj is
    a nested sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.). Note that any copy of
    the data is shallow, i.e., for arrays with object dtype, the new
    array will point to the same objects. See Examples for `ndarray.copy`.
    For ``False`` it raises a ``ValueError`` if a copy cannot be avoided.
    Default: ``True``.
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be in C order (row major) unless 'F' is
    specified, in which case it will be in Fortran order (column major).
    If object is an array the following holds.

    ===== ========= ===================================================
    order  no copy                     copy=True
    ===== ========= ===================================================
    'K'   unchanged F & C order preserved, otherwise most similar order
    'A'   unchanged F order if input is F and not C, otherwise C order
    'C'   C order   C order
    'F'   F order   F order
    ===== ========= ===================================================

    When ``copy=None`` and a copy is made for other reasons, the result is
    the same as if ``copy=True``, with some exceptions for 'A', see the
    Notes section. The default order is 'K'.
subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise
    the returned array will be forced to be a base-class array (default).
ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting
    array should have. Ones will be prepended to the shape as
    needed to meet this requirement.
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

Returns
-------
out : ndarray
    An array object satisfying the specified requirements.

See Also
--------
empty_like : Return an empty array with shape and type of input.
ones_like : Return an array of ones with shape and type of input.
zeros_like : Return an array of zeros with shape and type of input.
full_like : Return a new array with shape of input filled with value.
empty : Return a new uninitialized array.
ones : Return a new array setting values to one.
zeros : Return a new array setting values to zero.
full : Return a new array of given shape filled with value.
copy: Return an array copy of the given object.

Notes
-----
When order is 'A' and ``object`` is an array in neither 'C' nor 'F' order,
and a copy is forced by a change in dtype, then the order of the result is
not necessarily 'C' as expected. This is likely a bug.

Examples
--------
>>> import numpy as np
>>> np.array([1, 2, 3])
array([1, 2, 3])

Upcasting:

>>> np.array([1, 2, 3.0])
array([ 1.,  2.,  3.])

More than one dimension:

>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])

Minimum dimensions 2:

>>> np.array([1, 2, 3], ndmin=2)
array([[1, 2, 3]])

Type provided:

>>> np.array([1, 2, 3], dtype=complex)
array([ 1.+0.j,  2.+0.j,  3.+0.j])

Data-type consisting of more than one element:

>>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
>>> x['a']
array([1, 3])

Creating an array from sub-classes:

>>> np.array(np.asmatrix('1 2; 3 4'))
array([[1, 2],
       [3, 4]])

>>> np.array(np.asmatrix('1 2; 3 4'), subok=True)
matrix([[1, 2],
        [3, 4]])
Type:      builtin_function_or_method
Tip
>>> help(np.array)
Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, ...
Looking for something:
In [6]: np.con*?
np.concat
np.concatenate
np.conj
np.conjugate
np.convolve
Import conventions
The recommended convention to import NumPy is:
>>> import numpy as np
1.3.1.2. Creating arrays
Manual construction of arrays
1-D:
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])
>>> a.ndim
1
>>> a.shape
(4,)
>>> len(a)
4
2-D, 3-D, …:
>>> b = np.array([[0, 1, 2], [3, 4, 5]])  # 2 x 3 array
>>> b
array([[0, 1, 2],
       [3, 4, 5]])
>>> b.ndim
2
>>> b.shape
(2, 3)
>>> len(b)  # returns the size of the first dimension
2
>>> c = np.array([[[1], [2]], [[3], [4]]])
>>> c
array([[[1],
        [2]],

       [[3],
        [4]]])
>>> c.shape
(2, 2, 1)
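The attributes used above are related to each other, and a short sketch may make that relation easier to see. The variable names `n_dims`, `n_items` and `first_dim` are illustrative and not part of the original examples.

```python
import numpy as np

c = np.array([[[1], [2]], [[3], [4]]])

n_dims = c.ndim      # number of dimensions, i.e. the length of the shape tuple
n_items = c.size     # total number of elements, i.e. the product of the shape
first_dim = len(c)   # size of the first dimension only

print(c.shape, n_dims, n_items, first_dim)
```

Note that `len` only reports the first dimension, so `size` is the attribute to use when the total number of elements is needed.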
Functions for creating arrays
Tip
In practice, we rarely enter items one by one…
Evenly spaced:
>>> a = np.arange(10)  # 0 .. n-1 (!)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.arange(1, 9, 2)  # start, end (exclusive), step
>>> b
array([1, 3, 5, 7])
or by number of points:
>>> c = np.linspace(0, 1, 6)  # start, end, num-points
>>> c
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
>>> d = np.linspace(0, 1, 5, endpoint=False)
>>> d
array([0. , 0.2, 0.4, 0.6, 0.8])
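To see how the number of points relates to the spacing, `np.linspace` can also return the step it used. The sketch below relies on the `retstep` keyword of `np.linspace`; the variable names are illustrative.

```python
import numpy as np

# retstep=True additionally returns the spacing between consecutive points.
points, step = np.linspace(0, 1, 6, retstep=True)

# With endpoint=False the end value is excluded, so 5 points
# over the same interval end up with the same spacing.
points_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)

print(step, step_open)
```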
Common arrays:
>>> a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
>>> a
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
>>> b = np.zeros((2, 2))
>>> b
array([[0., 0.],
       [0., 0.]])
>>> c = np.eye(3)
>>> c
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> d = np.diag(np.array([1, 2, 3, 4]))
>>> d
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
np.random: random numbers (default_rng is backed by the PCG64 generator, not by the Mersenne Twister of the legacy interface):
>>> rng = np.random.default_rng(27446968)
>>> a = rng.random(4)  # uniform in [0, 1)
>>> a
array([0.64613018, 0.48984931, 0.50851229, 0.22563948])
>>> b = rng.standard_normal(4)  # Gaussian
>>> b
array([-0.38250769, -0.61536465,  0.98131732,  0.59353096])
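Passing a seed to `np.random.default_rng` makes the random sequence reproducible. The following sketch illustrates this with two generators built from the same seed; the names `rng_1`, `rng_2`, `sample_1` and `sample_2` are illustrative.

```python
import numpy as np

seed = 27446968  # same seed as in the example above

rng_1 = np.random.default_rng(seed)
rng_2 = np.random.default_rng(seed)

# Two generators created from the same seed yield identical sequences.
sample_1 = rng_1.random(4)
sample_2 = rng_2.random(4)
print(np.array_equal(sample_1, sample_2))

# The underlying bit generator can be inspected on the generator object.
bit_generator_name = type(rng_1.bit_generator).__name__
print(bit_generator_name)
```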
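1.3.1.3. Basic data types
You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2). This is due to a difference in the data-type used: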
>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')
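Tip
Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.
You can explicitly specify which data-type you want: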
>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')
The default data type of array-creation functions such as np.ones is floating point (float64):
>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')
There are also other types:
- Complex:
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')
- Bool:
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')
- Strings:
>>> f = np.array(['Bonjour', 'Hello', 'Hallo'])
>>> f.dtype  # <--- strings containing max. 7 letters
dtype('<U7')
- Much more:
  - int32
  - int64
  - uint32
  - uint64
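The remark that different data-types allow more compact storage can be checked with the `itemsize` and `nbytes` attributes. This is a minimal sketch; the choice of 1000 values and of the three dtypes is an illustrative assumption.

```python
import numpy as np

values = np.arange(1000)

as_float64 = values.astype(np.float64)
as_float32 = values.astype(np.float32)
as_uint16 = values.astype(np.uint16)  # the values 0..999 fit in 16 unsigned bits

for arr in (as_float64, as_float32, as_uint16):
    # itemsize: bytes per element, nbytes: bytes used by all elements
    print(arr.dtype, arr.itemsize, arr.nbytes)
```

A smaller dtype is only safe when every value fits in it, which is why the conversion to `uint16` is limited here to values below 65536.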
1.3.1.4. Basic visualization
Now that we have our first data arrays, we are going to visualize them.
Start by launching IPython:
$ ipython # or ipython3 depending on your install
Or the notebook:
$ jupyter notebook
Once IPython has started, enable interactive plots:
>>> %matplotlib
Or, from the notebook, enable plots inside the notebook:
>>> %matplotlib inline
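The inline option is important for the notebook: plots are then displayed in the notebook and not in a new window.
Matplotlib is a 2D plotting package. We can import its functions as below: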
>>> import matplotlib.pyplot as plt # the tidy way
And then use (note that you have to call show explicitly if you have not enabled interactive plots with %matplotlib):
>>> plt.plot(x, y) # line plot
>>> plt.show() # <-- shows the plot (not needed with interactive plots)
Or, if you have enabled interactive plots with %matplotlib:
>>> plt.plot(x, y) # line plot
1D plotting:
>>> x = np.linspace(0, 3, 20)
>>> y = np.linspace(0, 9, 20)
>>> plt.plot(x, y) # line plot
[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, y, 'o') # dot plot
[<matplotlib.lines.Line2D object at ...>]
2D arrays (such as images):
>>> rng = np.random.default_rng(27446968)
>>> image = rng.random((30, 30))
>>> plt.imshow(image, cmap=plt.cm.hot)
<matplotlib.image.AxesImage object at ...>
>>> plt.colorbar()
<matplotlib.colorbar.Colorbar object at ...>
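See also
More in the matplotlib chapter.
1.3.1.5. Indexing and slicing
The items of an array can be accessed and assigned to in the same way as other Python sequences (e.g. lists):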
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(np.int64(0), np.int64(2), np.int64(9))
Warning
Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.
The usual Python idiom for reversing a sequence is supported:
>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
For multidimensional arrays, indices are tuples of integers:
>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])
>>> a[1, 1]
np.int64(1)
>>> a[2, 1] = 10  # third row, second column
>>> a
array([[ 0,  0,  0],
       [ 0,  1,  0],
       [ 0, 10,  2]])
>>> a[1]
array([0, 1, 0])
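Note
In 2D, the first dimension corresponds to rows, the second to columns.
For a multidimensional array a, a[0] is interpreted by taking all elements in the unspecified dimensions.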
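The note above can be illustrated with a short sketch comparing an index that leaves a dimension unspecified with its explicit form. The names `row`, `row_explicit` and `column` are illustrative.

```python
import numpy as np

a = np.diag(np.arange(3))

row = a[0]              # only the first index is given
row_explicit = a[0, :]  # the second dimension is spelled out with a full slice
column = a[:, 1]        # all rows, second column

print(row, row_explicit, column)
```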
Slicing: arrays, like other Python sequences, can also be sliced:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])
Note that the last index is not included!
>>> a[:4]
array([0, 1, 2, 3])
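None of the three slice components is required: by default, start is 0, end is the length of the array (so the slice runs through the last element), and step is 1: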
>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
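As a quick check of these defaults, the sketch below compares each shortened slice with its fully written form and also shows the built-in `slice` object, which expresses the same start, end and step explicitly. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# Omitted components fall back to start=0, end=len(a), step=1.
same_head = np.array_equal(a[:4], a[0:4:1])
same_tail = np.array_equal(a[3:], a[3:len(a):1])
same_step = np.array_equal(a[::2], a[0:len(a):2])

# a[2:9:3] is shorthand for indexing with a slice object.
s = slice(2, 9, 3)
picked = a[s]

print(same_head, same_tail, same_step, picked)
```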
A small illustrated summary of NumPy indexing and slicing…
You can also combine assignment and slicing:
>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0, 1, 2, 3, 4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])
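1.3.1.6. Copies and views
A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use np.may_share_memory() to check if two arrays share the same memory block. Note however that this uses heuristics and may give you false positives.
When modifying the view, the original array is modified as well: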
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12, 2, 4, 6, 8])
>>> a # (!)
array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.may_share_memory(a, c)
False
This behavior can be surprising at first sight… but it saves both memory and time.
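The false positives mentioned earlier in this section can be made concrete. In the sketch below, two interleaved views never touch the same element, yet the heuristic check reports possible sharing; `np.shares_memory` performs an exact check instead, and the `base` attribute tells a view apart from a copy. The names `evens` and `odds` are illustrative.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # view on elements 0, 2, 4, ...
odds = a[1::2]   # view on elements 1, 3, 5, ...

# The two views interleave in memory but have no element in common.
heuristic = np.may_share_memory(evens, odds)  # only compares memory bounds
exact = np.shares_memory(evens, odds)         # exact check, can be slower

# A view keeps a reference to the array that owns the data; a copy does not.
view_base_is_a = evens.base is a
copy_base = a[::2].copy().base

print(heuristic, exact, view_base_is_a, copy_base)
```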
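1.3.1.7. Fancy indexing
Tip
NumPy arrays can be indexed with slices, but also with boolean or integer arrays (**masks**). This method is called *fancy indexing*. It creates **copies not views**.
Using boolean masks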
>>> rng = np.random.default_rng(27446968)
>>> a = rng.integers(0, 21, 15)
>>> a
array([ 3, 13, 12, 10, 10, 10, 18, 4, 8, 5, 6, 11, 12, 17, 3])
>>> (a % 3 == 0)
array([ True, False,  True, False, False, False,  True, False, False,
       False,  True, False,  True, False,  True])
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 12, 18, 6, 12, 3])
Indexing with a mask can be very useful to assign a new value to a sub-array:
>>> a[a % 3 == 0] = -1
>>> a
array([-1, 13, -1, 10, 10, 10, -1, 4, 8, 5, -1, 11, -1, 17, -1])
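The statement that fancy indexing creates copies rather than views can be checked directly. This sketch contrasts an array extracted with a mask with one obtained by plain slicing; the names `selected` and `sliced` are illustrative.

```python
import numpy as np

a = np.arange(10)
mask = a % 3 == 0

selected = a[mask]  # fancy indexing: a new array holding a copy of the data
selected[0] = 99    # modifying the extracted array leaves a untouched

sliced = a[:4]      # plain slicing: a view on a
sliced[1] = -5      # this writes through to a

print(a)
print(np.shares_memory(a, selected), np.shares_memory(a, sliced))
```

Assigning through the mask, as in `a[mask] = -1`, is different: it is an assignment into `a` itself and therefore does modify the original array.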
Indexing with an array of integers
>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
Indexing can be done with an array of integers, where the same index may be repeated several times:
>>> a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list
array([20, 30, 20, 40, 20])
New values can be assigned with this kind of indexing:
>>> a[[9, 7]] = -100
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, -100, 80, -100])
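One point worth checking when indices repeat is in-place arithmetic. As a hedged illustration that goes slightly beyond the examples above, the sketch below shows that `+=` with a repeated index applies the increment only once per distinct index, while `np.add.at` accumulates every occurrence. The index list `idx` is an illustrative choice.

```python
import numpy as np

idx = [0, 0, 2]

a = np.zeros(4, dtype=int)
a[idx] += 1           # buffered: index 0 is incremented only once

b = np.zeros(4, dtype=int)
np.add.at(b, idx, 1)  # unbuffered: every occurrence of an index counts

print(a, b)
```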
Tip
When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:
>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
       [9, 7]])
The image below illustrates various fancy indexing applications