1.3.1. The NumPy array object

1.3.1.1. What are NumPy and NumPy arrays?

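NumPy arrays

Python objects:
  • high-level number objects: integers, floating point

  • containers: lists (cheap appending), dictionaries (fast lookup)

NumPy provides:
  • an extension package to Python for multi-dimensional arrays

  • closer to hardware (efficiency)

  • designed for scientific computation (convenience)

  • also known as array-oriented computing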


>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

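Tip

For example, an array could contain:

  • values of an experiment/simulation at discrete time steps

  • a signal recorded by a measurement device, e.g. a sound wave

  • the pixels of an image, grey-level or colour

  • 3-D data measured at different X-Y-Z positions, e.g. an MRI scan

Why it is useful: a memory-efficient container that provides fast numerical operations.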

In [1]: L = range(1000)
In [2]: %timeit [i**2 for i in L]
50.6 us +- 725 ns per loop (mean +- std. dev. of 7 runs, 10,000 loops each)
In [3]: a = np.arange(1000)
In [4]: %timeit a**2
920 ns +- 7.16 ns per loop (mean +- std. dev. of 7 runs, 1,000,000 loops each)
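The timing above compares a Python-level loop with one vectorized operation. A minimal sketch of the same comparison as a plain script, assuming only the standard library's timeit module (absolute timings will differ from machine to machine). It also checks that both approaches agree and reports the array's memory footprint; the explicit int64 dtype is an assumption made so that the byte counts do not depend on the platform.

```python
import timeit

import numpy as np

L = range(1000)
a = np.arange(1000, dtype=np.int64)  # explicit dtype so nbytes is platform-independent

# Both approaches compute the same squares
squares_list = [i**2 for i in L]
squares_array = a**2
assert squares_array.tolist() == squares_list

# The array stores 1000 fixed-size 8-byte integers in one contiguous block
print(a.itemsize, a.nbytes)  # 8 8000

# Time each approach; results vary with hardware and NumPy version
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)
print(f"list comprehension: {t_list:.4f} s, vectorized: {t_array:.4f} s")
```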

NumPy reference documentation

  • On the web: https://numpy.com.cn/doc/

  • Interactive help:

    In [5]: np.array?
    
    Docstring:
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
    like=None)
    Create an array.
    Parameters
    ----------
    object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
    dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
    copy : bool, optional
    If ``True`` (default), then the array data is copied. If ``None``,
    a copy will only be made if ``__array__`` returns a copy, if obj is
    a nested sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.). Note that any copy of
    the data is shallow, i.e., for arrays with object dtype, the new
    array will point to the same objects. See Examples for `ndarray.copy`.
    For ``False`` it raises a ``ValueError`` if a copy cannot be avoided.
    Default: ``True``.
    order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be in C order (row major) unless 'F' is
    specified, in which case it will be in Fortran order (column major).
    If object is an array the following holds.
    ===== ========= ===================================================
    order no copy   copy=True
    ===== ========= ===================================================
    'K'   unchanged F & C order preserved, otherwise most similar order
    'A'   unchanged F order if input is F and not C, otherwise C order
    'C'   C order   C order
    'F'   F order   F order
    ===== ========= ===================================================
    When ``copy=None`` and a copy is made for other reasons, the result is
    the same as if ``copy=True``, with some exceptions for 'A', see the
    Notes section. The default order is 'K'.
    subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise
    the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting
    array should have. Ones will be prepended to the shape as
    needed to meet this requirement.
    like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.
    .. versionadded:: 1.20.0
    Returns
    -------
    out : ndarray
    An array object satisfying the specified requirements.
    See Also
    --------
    empty_like : Return an empty array with shape and type of input.
    ones_like : Return an array of ones with shape and type of input.
    zeros_like : Return an array of zeros with shape and type of input.
    full_like : Return a new array with shape of input filled with value.
    empty : Return a new uninitialized array.
    ones : Return a new array setting values to one.
    zeros : Return a new array setting values to zero.
    full : Return a new array of given shape filled with value.
    copy: Return an array copy of the given object.
    Notes
    -----
    When order is 'A' and ``object`` is an array in neither 'C' nor 'F' order,
    and a copy is forced by a change in dtype, then the order of the result is
    not necessarily 'C' as expected. This is likely a bug.
    Examples
    --------
    >>> import numpy as np
    >>> np.array([1, 2, 3])
    array([1, 2, 3])
    Upcasting:
    >>> np.array([1, 2, 3.0])
    array([ 1., 2., 3.])
    More than one dimension:
    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])
    Minimum dimensions 2:
    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])
    Type provided:
    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j, 2.+0.j, 3.+0.j])
    Data-type consisting of more than one element:
    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])
    Creating an array from sub-classes:
    >>> np.array(np.asmatrix('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])
    >>> np.array(np.asmatrix('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])
    Type: builtin_function_or_method

    Tip

    >>> help(np.array)
    
    Help on built-in function array in module numpy:
    array(...)
    array(object, dtype=None, ...
  • Looking for something:

    In [6]: np.con*?
    
    np.concat
    np.concatenate
    np.conj
    np.conjugate
    np.convolve

Import conventions

The recommended convention to import NumPy is:

>>> import numpy as np

1.3.1.2. Creating arrays

Manual construction of arrays

  • 1-D:

    >>> a = np.array([0, 1, 2, 3])
    
    >>> a
    array([0, 1, 2, 3])
    >>> a.ndim
    1
    >>> a.shape
    (4,)
    >>> len(a)
    4
  • 2-D, 3-D, …:

    >>> b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
    
    >>> b
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> b.ndim
    2
    >>> b.shape
    (2, 3)
    >>> len(b) # returns the size of the first dimension
    2
    >>> c = np.array([[[1], [2]], [[3], [4]]])
    >>> c
    array([[[1],
            [2]],

           [[3],
            [4]]])
    >>> c.shape
    (2, 2, 1)
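The attributes used above are related: ndim is the length of the shape tuple, size is the product of its entries, and len() reports only the first entry. A short check on the arrays b and c from this section:

```python
import numpy as np

b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
c = np.array([[[1], [2]], [[3], [4]]])  # 2 x 2 x 1 array

for arr in (b, c):
    # ndim is the number of axes, i.e. the length of the shape tuple
    assert arr.ndim == len(arr.shape)
    # size is the total number of elements: the product of the shape
    assert arr.size == np.prod(arr.shape)
    # len() only reports the size of the first dimension
    assert len(arr) == arr.shape[0]

print(b.size, c.size)  # 6 4
```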

Functions for creating arrays

Tip

In practice, we rarely enter items one by one…

  • Evenly spaced:

    >>> a = np.arange(10) # 0 .. n-1  (!)
    
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> b = np.arange(1, 9, 2) # start, end (exclusive), step
    >>> b
    array([1, 3, 5, 7])
  • or by number of points:

    >>> c = np.linspace(0, 1, 6)   # start, end, num-points
    
    >>> c
    array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
    >>> d = np.linspace(0, 1, 5, endpoint=False)
    >>> d
    array([0. , 0.2, 0.4, 0.6, 0.8])
  • Common arrays:

    >>> a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
    
    >>> a
    array([[1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.]])
    >>> b = np.zeros((2, 2))
    >>> b
    array([[0., 0.],
           [0., 0.]])
    >>> c = np.eye(3)
    >>> c
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])
    >>> d = np.diag(np.array([1, 2, 3, 4]))
    >>> d
    array([[1, 0, 0, 0],
           [0, 2, 0, 0],
           [0, 0, 3, 0],
           [0, 0, 0, 4]])
  • np.random: random numbers (default_rng returns a Generator backed by the PCG64 bit generator)

    >>> rng = np.random.default_rng(27446968)
    
    >>> a = rng.random(4) # uniform in [0, 1)
    >>> a
    array([0.64613018, 0.48984931, 0.50851229, 0.22563948])
    >>> b = rng.standard_normal(4) # Gaussian
    >>> b
    array([-0.38250769, -0.61536465,  0.98131732,  0.59353096])
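arange takes a step while linspace takes a number of points; with floating-point steps, linspace is usually the safer choice because the number of points is fixed up front. A small sketch of these relationships, also checking that two Generators created with the same seed reproduce the same draws (the seed 0 used for the range check is an arbitrary choice):

```python
import numpy as np

# linspace can also return the spacing it used
c, step = np.linspace(0, 1, 6, retstep=True)
print(step)  # 0.2
assert np.allclose(c, np.arange(6) * step)

# endpoint=False drops the stop value, so the spacing becomes 1/5
d = np.linspace(0, 1, 5, endpoint=False)
assert np.allclose(d, np.arange(5) / 5)

# An identity matrix is a diagonal matrix of ones
assert np.array_equal(np.eye(3), np.diag(np.ones(3)))

# Two generators seeded identically produce identical streams
rng1 = np.random.default_rng(27446968)
rng2 = np.random.default_rng(27446968)
assert np.array_equal(rng1.random(4), rng2.random(4))

# random() draws from the half-open interval [0, 1)
samples = np.random.default_rng(0).random(10_000)
assert samples.min() >= 0.0 and samples.max() < 1.0
```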

1.3.1.3. Basic data types

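You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2). This is due to a difference in the data type used: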

>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')

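Tip

Different data types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data type from the input.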


You can explicitly specify which data type you want:

>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')
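Different data types trade precision for memory. A minimal sketch comparing the per-element size of float32 and float64 arrays, and showing astype, which converts an existing array into a new one. Note that float-to-integer conversion truncates toward zero rather than rounding:

```python
import numpy as np

x64 = np.array([1.7, -1.7, 2.5])                    # float64 by default
x32 = np.array([1.7, -1.7, 2.5], dtype=np.float32)

# itemsize is the number of bytes per element
print(x64.itemsize, x32.itemsize)  # 8 4
print(x64.nbytes, x32.nbytes)      # 24 12

# astype returns a new array with the requested dtype;
# the fractional part is discarded (truncation toward zero)
as_int = x64.astype(np.int64)
print(as_int)  # [ 1 -1  2]

# Round first if rounding is what you want
# (NumPy rounds halves to even, so 2.5 becomes 2.0)
print(np.round(x64).astype(np.int64))  # [ 2 -2  2]
```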

The default data type of array-creation functions such as np.ones is floating point:

>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

There are also other types:

Complex:
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')
Bool:
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')
Strings:
>>> f = np.array(['Bonjour', 'Hello', 'Hallo'])
>>> f.dtype # <--- strings containing max. 7 letters
dtype('<U7')
Much more:
  • int32

  • int64

  • uint32

  • uint64
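Each of these types has a fixed size, which has consequences worth knowing. A small sketch, using np.iinfo to query integer ranges: integer array arithmetic wraps around on overflow instead of growing, and assigning a longer string into a '<U7' array silently truncates it. The replacement string 'Good morning' is only an illustration.

```python
import numpy as np

# Ranges of the integer types listed above
for t in (np.int32, np.int64, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Array arithmetic on fixed-size integers wraps around silently
u = np.array([255], dtype=np.uint8)
print(u + u)  # [254]  (510 modulo 256)

# String arrays have a fixed maximum length set at creation
f = np.array(['Bonjour', 'Hello', 'Hallo'])  # at most 7 characters
f[1] = 'Good morning'                        # 12 characters
print(f[1])  # Good mo  (truncated to 7 characters)
```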

1.3.1.4. Basic visualization

Now that we have our first data arrays, we are going to visualize them.

Start by launching IPython:

$ ipython # or ipython3 depending on your install

Or the notebook:

$ jupyter notebook

Once IPython has started, enable interactive plots:

>>> %matplotlib  

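Or, from the notebook, enable plots in the notebook:

>>> %matplotlib inline

The inline option is important for the notebook, so that plots are displayed in the notebook and not in a new window.

Matplotlib is a 2D plotting package. We can import its functions as below: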

>>> import matplotlib.pyplot as plt  # the tidy way

And then use (note that you have to call show explicitly if you have not enabled interactive plots with %matplotlib):

>>> plt.plot(x, y)       # line plot    
>>> plt.show() # <-- shows the plot (not needed with interactive plots)

Or, if you have enabled interactive plots with %matplotlib:

>>> plt.plot(x, y)       # line plot    
  • 1D plotting:

>>> x = np.linspace(0, 3, 20)
>>> y = np.linspace(0, 9, 20)
>>> plt.plot(x, y) # line plot
[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, y, 'o') # dot plot
[<matplotlib.lines.Line2D object at ...>]
[Figure: line plot and dot plot of y against x]
  • 2D arrays (such as images):

>>> rng = np.random.default_rng(27446968)
>>> image = rng.random((30, 30))
>>> plt.imshow(image, cmap=plt.cm.hot)
<matplotlib.image.AxesImage object at ...>
>>> plt.colorbar()
<matplotlib.colorbar.Colorbar object at ...>
[Figure: 30 x 30 random image displayed with the hot colormap, with a colorbar]
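Outside IPython or a notebook, the same plots can be produced by a standalone script. A minimal sketch, assuming a non-interactive environment: it selects matplotlib's Agg backend and writes the figure to a file instead of calling plt.show(). The output filename basic_plots.png is a hypothetical choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)

rng = np.random.default_rng(27446968)
image = rng.random((30, 30))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.plot(x, y)        # line plot
ax1.plot(x, y, 'o')   # dot plot
im = ax2.imshow(image, cmap=plt.cm.hot)
fig.colorbar(im, ax=ax2)

# Hypothetical output location; adjust as needed
out_path = os.path.join(tempfile.gettempdir(), "basic_plots.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```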

See also

More in the matplotlib chapter.

1.3.1.5. Indexing and slicing

The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists):

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(np.int64(0), np.int64(2), np.int64(9))

Warning

Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.

The usual Python idiom for reversing a sequence is supported:

>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

For multidimensional arrays, indices are tuples of integers:

>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])
>>> a[1, 1]
np.int64(1)
>>> a[2, 1] = 10 # third row, second column
>>> a
array([[ 0,  0,  0],
       [ 0,  1,  0],
       [ 0, 10,  2]])
>>> a[1]
array([0, 1, 0])

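Note

  • In 2D, the first dimension corresponds to rows, the second to columns.

  • For a multidimensional a, a[0] is interpreted by taking all elements in the unspecified dimensions.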
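Because the first index selects rows, a[1] returns a whole row, while a column needs an explicit ':' for the first dimension. A short sketch on the same diagonal array:

```python
import numpy as np

a = np.diag(np.arange(3))
a[2, 1] = 10

row = a[1]        # same as a[1, :]: second row
col = a[:, 1]     # second column
print(row)  # [0 1 0]
print(col)  # [ 0  1 10]

# a[i, j] and a[i][j] give the same element, but a[i][j] first
# builds the intermediate row a[i], so a[i, j] is preferred
assert a[2, 1] == a[2][1] == 10
assert np.array_equal(a[1], a[1, :])
```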

Slicing: arrays, like other Python sequences, can also be sliced:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])

Note that the last index is not included!

>>> a[:4]
array([0, 1, 2, 3])

None of the three slice components is required: by default, start is 0, end is the last and step is 1:

>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
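Each start:end:step expression is shorthand for a Python slice object, and negative values count from the end. A short sketch:

```python
import numpy as np

a = np.arange(10)

# a[2:9:3] is equivalent to indexing with an explicit slice object
assert np.array_equal(a[2:9:3], a[slice(2, 9, 3)])

print(a[-3:])     # [7 8 9]            last three elements
print(a[:-3])     # [0 1 2 3 4 5 6]    everything except the last three
print(a[8:2:-2])  # [8 6 4]            negative step walks backwards; end still excluded
```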

A small illustrated summary of NumPy indexing and slicing…

[Figure: illustrated summary of NumPy indexing and slicing]

You can also combine assignment and slicing:

>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0,  1,  2,  3,  4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])
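Assignment into a slice requires the right-hand side to have the slice's shape or to be broadcastable to it (a scalar always is). A sketch of what happens when the shapes disagree:

```python
import numpy as np

a = np.arange(10)
a[5:] = 10                   # scalar: broadcast to all 5 positions
a[5:] = np.arange(5)[::-1]   # 5 values for 5 positions: fine

try:
    a[5:] = np.arange(3)     # 3 values for 5 positions
except ValueError as err:
    print("ValueError:", err)

print(a)  # unchanged by the failed assignment
```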

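1.3.1.6. Copies and views

A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use np.may_share_memory() to check if two arrays share the same memory block. Note however that this uses heuristics and may give you false positives.

When modifying the view, the original array is modified as well: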

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12,  2,  4,  6,  8])
>>> a # (!)
array([12,  1,  2,  3,  4,  5,  6,  7,  8,  9])
>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.may_share_memory(a, c)
False

This behavior can be surprising at first sight… but it allows saving both memory and time.
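Besides np.may_share_memory(), two other tools help tell views and copies apart: the base attribute of a view refers to the array it was taken from (it is None for an array that owns its data), and np.shares_memory() gives an exact answer instead of a heuristic one, at the price of being potentially much slower. A short sketch:

```python
import numpy as np

a = np.arange(10)
b = a[::2]           # slicing: a view
c = a[::2].copy()    # explicit copy

print(b.base is a)     # True: b looks into a's memory
print(c.base is None)  # True: c owns its data

# Exact check (may be slow for complicated overlaps)
print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False
```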

1.3.1.7. Fancy indexing

Tip

NumPy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This method is called fancy indexing. It creates copies, not views.

Using boolean masks:

>>> rng = np.random.default_rng(27446968)
>>> a = rng.integers(0, 21, 15)
>>> a
array([ 3, 13, 12, 10, 10, 10, 18,  4,  8,  5,  6, 11, 12, 17,  3])
>>> (a % 3 == 0)
array([ True, False,  True, False, False, False,  True, False, False,
       False,  True, False,  True, False,  True])
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 12, 18,  6, 12,  3])
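Boolean indexing returns a copy. A minimal check of that behaviour, reusing the values of a from this example: modifying the extracted sub-array leaves a untouched.

```python
import numpy as np

a = np.array([3, 13, 12, 10, 10, 10, 18, 4, 8, 5, 6, 11, 12, 17, 3])
mask = (a % 3 == 0)

extract_from_a = a[mask]       # boolean indexing: a new array (copy)
extract_from_a[0] = 99         # modify the copy

print(a[0])                                    # 3: unchanged
print(np.may_share_memory(a, extract_from_a))  # False
```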

Indexing with a mask can be very useful to assign a new value to a sub-array:

>>> a[a % 3 == 0] = -1
>>> a
array([-1, 13, -1, 10, 10, 10, -1,  4,  8,  5, -1, 11, -1, 17, -1])
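Assignment through a mask modifies a in place. When the original array must be kept, np.where builds a new array instead, taking values from its second argument where the condition is true and from its third where it is false. A short sketch with the same values:

```python
import numpy as np

a = np.array([3, 13, 12, 10, 10, 10, 18, 4, 8, 5, 6, 11, 12, 17, 3])

# New array: -1 where the element is divisible by 3, the element otherwise
b = np.where(a % 3 == 0, -1, a)
print(b)     # [-1 13 -1 10 10 10 -1  4  8  5 -1 11 -1 17 -1]
print(a[0])  # 3: a itself is unchanged
```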

Indexing with an array of integers:

>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Indexing can be done with an array of integers, where the same index is repeated several times:

>>> a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list
array([20, 30, 20, 40, 20])

New values can be assigned with this kind of indexing:

>>> a[[9, 7]] = -100
>>> a
array([   0,   10,   20,   30,   40,   50,   60, -100,   80, -100])
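Repeated indices deserve care when they are combined with in-place operations: an expression such as x[idx] += 1 is applied only once per distinct index, because NumPy reads, updates and writes back the selected values as a whole. np.add.at performs the unbuffered version, which accumulates every occurrence. A short sketch:

```python
import numpy as np

idx = [0, 0, 1]   # index 0 appears twice

x = np.zeros(3)
x[idx] += 1       # buffered: index 0 is incremented only once
print(x)  # [1. 1. 0.]

y = np.zeros(3)
np.add.at(y, idx, 1)  # unbuffered: every occurrence counts
print(y)  # [2. 1. 0.]
```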

Tip

When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:

>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
       [9, 7]])

The image below illustrates various fancy indexing applications:

[Figure: examples of fancy indexing applications]