第三讲复合类型与函数 DONE

1. Python 复合类型
2. Python 的运行模式
- 2.1. 调试
- 2.2. Jupyter 环境
3. Python 函数
4. 文档

1 Python 复合类型

Python 的基本数据类型包括整型、浮点型、布尔型与字符串。这些类型都可以组合起来。

1.1 列表

列表用 [] 表达，元素用 , 分离。元素类型任意，甚至可以不同。

[1,2,3], ["天","地","人"], ["物理",3.14]

([1, 2, 3], ['天', '地', '人'], ['物理', 3.14000000000000])

也可以嵌套，我们仿照集合论的自然数构造方法，构造一系列合法的列表：

[], [[]], [[], [[]]], [[], [[], [[]]]]

([], [[]], [[], [[]]], [[], [[], [[]]]])

在 Python 看来，这些个列表都各不相同。

1.1.1 汇总

列表常用来汇总。生成空列表，使用 .append() 方法逐步加入元素，例如：

li = []
li.append("手机")
li.append("身份证")
li.append("钥匙")

print(li)

['手机', '身份证', '钥匙']

列表可用作迭代器，

for x in li:
    print(f"出门之前，记得带{x}！")

出门之前，记得带手机！
出门之前，记得带身份证！
出门之前，记得带钥匙！

也可以用下标取出特定的元素，用法与字符串一样：

li[0], li[1:3], li[-1]

('手机', ['身份证', '钥匙'], '钥匙')

可以当成一个集合来判断元素的归属：

"手机" in li, "眼镜" in li

(True, False)

1.2 字典

字典是 Python 标志性的数据结构。顾名思义，单词放进字典，它个单词（key）的解释对应字典中的值（value）。词与值之间用 : 分隔，词与词之间用 , 分隔。我们把教室里的学生人数创建一个字典，字典可通过赋值加新词，也可以判断词的归属：

sc = {"工物": 20, "物理": 40}
print(sc["工物"], sc["物理"])

sc["上海交大"] = 2 # 创建了新的词条
print(sc["上海交大"])
print("牛津" in sc, "工物" in sc)

20 40
2
False True

1.2.1 条件语句字典化

字典构建了从词到值的映射关系，当条件语句有这样的特点时，可用字典方便地替代。体会下面的例子，已经学生群体的变量名 aff ，找出学生人数：

aff = '工物'
if aff == '工物':
    print(20)
elif aff == '物理':
    print(40)
elif aff == "上海交大":
    print(2)
else:
    print(1)

# 用字典查询更加方便
print(sc[aff])

20
20

"字典查询"替代了多级的条件，更适合直觉。

1.2.2 字典的使用

字典中的词或者值都可以转化为列表，或者迭代器，

list(sc.keys()), list(sc.values())

(['工物', '物理', '上海交大'], [20, 40, 2])

for k in sc:
    print(k)
for v in sc.values():
    print(v)

工物
物理
上海交大
20
40
2

更常用是把词与值一起迭代循环，

for k, v in sc.items():
    print(f"教室里有{v}名{k}的学生。")

教室里有20名工物的学生。
教室里有40名物理的学生。
教室里有2名上海交大的学生。

1.2.3 Python 内部的字典

字典是 Python 的核心数据结构，它的命名空间（namespace）就是用字典实现的。Python 环境中的变量都中某个字典的词。往往字典的妙用可以给程序带来神来之笔的重构。字典的内部数据结构是哈希表，可以保持插入和查询的效率。

1.2.4 构造字典快捷方法

任何输出序对的迭代器，都可以快速构造出字典。如，

dict(enumerate("abcd"))

{0: 'a', 1: 'b', 2: 'c', 3: 'd'}

从中可见 Python 简单语句的表现力。一般的思维会这样写：

d = {} # 生成一个空字典
for k, v in enumerate("abcd"):
    d[k] = v # 通过赋值添加 k:v 组
print(d)

{0: 'a', 1: 'b', 2: 'c', 3: 'd'}

显得很冗长。有一种中间态的写法是

{k:v for k, v in enumerate("abcd")}

{0: 'a', 1: 'b', 2: 'c', 3: 'd'}

可以用来把值与词对换

{v:k for k, v in enumerate("abcd")}

{'a': 0, 'b': 1, 'c': 2, 'd': 3}

1.2.5 词的数据类型

字典的原理要求词是不可变类型。字典创建后，如果词变了，内部的哈希方案会失效。列表可变，所以不能成为字典的词。与列表对应的不可改类型是元组（tuple），可以用作词。字典的值可以是任何数据类型，与变量等价，如

{(0, 0): 6, (0, 1): "您", (1, 0): ["Python", "Bash"]}

{(0, 0): 6, (0, 1): '您', (1, 0): ['Python', 'Bash']}

这不奇怪，变量本身就是由内部的字典实现的！

1.3 删除元素

1.3.1 列表

remove ，效果为删除特定元素，使用方式为 list.remove(element) 。例如对以下样例：

ls = [1, "f", 3.0]
ls.remove("f")
print(ls)

[1, 3.0]

pop ，效果为删除指定索引对应的元素，使用方式为 list.pop(index) 。例如对以下样例：

ls = [1, "f", 3.0]
ls.pop(0)
print(ls)

['f', 3.0]

切片，效果为删除连续索引对应的多个元素，使用方式为 list[index_begin:index_end] = [] 。例如对以下样例：

ls = [1, "f", 3.0]
ls[0:2] = []
print(ls)

[3.0]

（注意如 0:9 表示从 index 为 0 到 index 为 8 的元素）

clear ，大规模杀伤性武器，效果为清空列表中所有元素，得到空列表，使用方式为 list.clear() 。例如对以下样例：

ls = [1, "f", 3.0]
ls.clear()
print(ls)

[]

使用 list = [] 可实现同样效果。

del ，核武器，直接删除了列表对象，使用方式为 del list 。例如对以下样例：

ls = [1, "f", 3.0]
del ls
print(ls)

会报错：

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ls' is not defined

1.3.2 元组

由于元组具有不可变原则，故想通过类似 remove 和 pop 这样的方式删除元素无法实现。但由于课上介绍过元组可以实现加法（类似并集），故可以通过切片的方式进行删除操作。例如对以下样例：

tup = (1, 2, 3, 4)

#删除元素3
tup = tup[:2] + tup[3:]

print(tup)

(1, 2, 4)

与列表类似，可以使用 tuple = () 实现全部元素的清理，也可以使用 del tuple 删除整个元组。

1.3.3 字典

不同于列表，字典的索引是通过keys实现的，是无序的，但仍可以通过对keys的索引进行元素的删除，只是不支持切片操作。字典元素的删除主要有以下几种方式：

pop ,效果为删除指定索引对应的键值对，使用方式为 dict.pop(keys) 。例如对以下样例：

dictionary = {"A":1, "B":2, "C":3}
dictionary.pop("A")
print(dictionary)

{'B': 2, 'C': 3}

popitem ，效果为随机返回并删除一对键值对，但似乎从 Python3.6 以后一般认为删除的是最后插入的键值对（类似于栈，FILO），使用方式为 dict.popitem() 。例如对以下样例：

dictionary = {"A":1, "B":2, "C":3, "D":4}
x = dictionary.popitem()
print(x, dictionary)

('D', 4) {'A': 1, 'B': 2, 'C': 3}

clear ，与列表类似，在此不再赘述，同样可以使用 dict = {} 替代。

del ，与列表类似，在此不再赘述。

2 Python 的运行模式

在命令行输入 python3 进入的是交互模式，每输入一个指令，都即时得到结果。这个环境称为“REPL”，即 “Read-Evaluate-Print-Loop”，命令行界面也属此类。交互界面更适合探索和试验。批处理模式可以批量执行命令，组成批量命令的文件叫做“脚本”，也是广义上的程序。因为脚本运行时没有 REPL 部分，可以不受人干涉，适用于处理大量数据。在实践中，经常在交互模式中探索出正确的指令，再把这些指令收集起来，形成脚本，自动地批处理运行。因为脚本应用于无人值守的环境，输入与输出就应当自动进行，只是根据提示交互地输入就远远不够了。另一种从外界给程序传递信息的方式是调用参数。Python 把调用时的参数传递给了特定的字符串列表 sys.argv 。通过 sys.argv[1] ， sys.argv[2] 取得第一、第二个参数。

2.1 调试

在 REPL 与 Python 的解释器进行交互，是个区别于编译型语言很有效的调试环境。分析程序中可疑的程序片断，放在 REPL 中观察它的行为，是找到问题的好方法。在有问题的脚本运行时给 python3 加上 -i 选项，含义是“iteractive”，例如 python3 -i prime.py 。这可让脚本出错或结束后不退出，留在 REPL 的环境中，提供给我们程序恰好出问题时的“案发现场”来寻找诱因。

2.2 Jupyter 环境

Python 有一个新兴的运行环境，叫做 Jupyter。本课之所以没有以 Jupyter 起步，是因为它只在教学和交互探索中有效，不适合大数据处理。Jupyter 对用户隐藏的细节，在科学数据中是必要的。但是 Jupyter 有它的优点，是图片、文字与程序的混排，对非专业用户直观友好。它是 literate programming ，是一次原则的体现，我们在 4 一节深入讨论。 Jupyter 从 IPython 项目起步，把交互境放到了网页上。在直观的优点之上，它带的困扰是写长的程序时，网页是很糟糕的环境，远远不如完整的编辑器的功能。 Emacs 和 VSCode 的 Jupyter 扩展，在一定程序上缓解了这一问题，使我们写 Jupyter 的程序时，可以用强大的编辑功能替代网页。 Jupyter 支持 Python 之外的运行环境，是一款值得探索的辅助工具。

3 Python 函数

一直以来，函数都是程序的基本组成部分，是由多段指令组成的功能单元，与数学函数有类似之处。函数给程序划分了逻辑层次，在函数内我们关注函数的实现方法，在函数外我们关注它的使用方法并重复调用，有助于实践“一次”原则。

Python 定义函数的语法如下：

def add(x, y):
    print(f"x is {x} and y is {y}")
    return x + y  # Return values with a return statement
add(3, 5)

x is 3 and y is 5
8

关键字 def 跟一个带 (...): 的表达式。返回值用 return 给出。外部调用 add(3,5) 在函数的入口， x 和 y 被赋予 3 和 5。

再举一个例子，

def swap(x, y):
    return y, x
a = '左'
b = '右'

print(a,b)
a, b = swap(a,b)
print(a,b)

左 右
右 左

在调用 swap() 函数的一步， x 被赋予 a 的值“左”， y 被赋予“右”。经过函数计算，得到的返回值是交换的结果“右”、“左”，用于覆盖 a 和 b 的值。这个函数返回了两个值，在 Python 的内部表示为一个元组。在函数的内与外，变量有各自的定义范围，这使得我们不论从内还是从外看来，都有完整的逻辑。

3.0.1 函数命名空间

函数内与函数外的命名空间相互独立。看下面的例子，

x = 1
def scope():
    x = 2
scope()
print(x)

scope() 函数内的变量 x ，与外部独立。变量是由字典实现的，函数内与函数外的变量分属于不同的字典。函数从内部操作它外部的变量，是不鼓励的，一定要做时，使用 global 标识符：

x = 1
def scope():
     global x
     x = 2
scope()
print(x)

3.0.2 元组赋值

值得注意一点， Python 可以通过对元组赋值的形式，实现元组各元素赋值。

print(a, b)
a, b = b, a # 是 (a, b) = (b, a) 的简写
print(a, b)

左 右
右 左

这使得置换变量的值非常简单直接，因为否则要引入中间变量，这样写，

print(a, b)
tmp = a
a = b
b = tmp
print(a, b)

左 右
右 左

十分不便。

3.0.3 函数的调用

函数可供大批量调用。使用 map 来自然定义一个作用在迭代器上的函数，是分别把它作用到每个元素后返回值组成的迭代器。例如把列表看作迭代器，

def squared(x):
    return x*x

list(map(squared, [1, 2, 3, 4]))

[1, 4, 9, 16]

它把一列数字逐个取平方。 map() 返回的迭代器，交给 list() 转换成了列表输出。

等价于直接用迭代器输入，

list(map(squared, range(1, 5)))

[1, 4, 9, 16]

3.0.4 无名函数

如果函数名不重要，可以直接把函数无名化定义嵌入到语句中。

list(map(lambda x: x*x, range(1,5)))

[1, 4, 9, 16]

省去了函数名， return 等。 lambda 的名字来自理论计算机科学的 lambda calculus 理论，函数式程序的基础。

4 文档

4.1 注释

注释由半角的“#”引出，多行注释用多个“#”：

# 高精度整数举例
#
2**1000

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

要让程序易于被人理解，一方面应提升和打磨代码风格，让程序本身的逻辑易懂，另一方面，对不明显的程序段落，通过注释来解释。

4.2 函数的文档

函数是代码复用的单元，这意味着我们经常会用到别人创作的函数，以节省精力，站在巨人或者一群小矮人的肩膀上。函数定义后紧接的字符串是它的文档，它被特殊对待，由 help() 读取输出：

def spherical_harmonic_fitter(grid, order):
    "求球谐函数拟合的系数"

    # 具体实现省略
    pass

help(spherical_harmonic_fitter)

Help on function spherical_harmonic_fitter in module __main__:

spherical_harmonic_fitter(grid, order)
    求球谐函数拟合的系数

帮助告诉我们，在 “__main__” 模块（程序默认环境）的名字空间里，有这个函数。

一行的帮助有些单薄。函数的文档，由他人使用，文档写得越详细越好。对复杂的函数而言，函数的帮助需要长篇大论。Python 的多行字符串，正好胜任这一点。多行字符串以三个引号开始，三个引号结束，单引号双引号皆可。三引号设计恰好不与人类语言冲突。

def spherical_harmonic_fitter(grid, order):
    '''
    求球谐函数拟合的系数

    输入
    ~~~
    grid: 球面上连续函数在固定格点上的取值
    order: 拟合时球谐函数近似截断的阶数

    输出
    ~~~
    拟合系数矩阵
    '''

    # 具体实现省略
    pass

help(spherical_harmonic_fitter)

Help on function spherical_harmonic_fitter in module __main__:

spherical_harmonic_fitter(grid, order)
    求球谐函数拟合的系数

    输入
    ~~~
    grid: 球面上连续函数在固定格点上的取值
    order: 拟合时球谐函数近似截断的阶数

    输出
    ~~~
    拟合系数矩阵

这个例子里，我们把文档写得更加详细。不仅有标题，还详细注明了输入和输出的含义。调用函数的人——可能是队友也可能是未来的自己——应当在不阅读的原代码的前提下，顺利使用它们。即使在写作程序时，感觉很显然，也应该认真撰写文档。在例子中，我们用多行字符串写出，函数在固定格点上取值，定义了输入变量 order 的含义，输出的意义是系统。

4.3 标准库中的文档

Python 的标准库非常重视文档，几乎所有的函数都带有详细的排版精美的多行字符串说明。

help(int)

Help on class int in module builtins:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __ceil__(...)
 |      Ceiling of an Integral returns itself.
 |  
 |  __divmod__(self, value, /)
 |      Return divmod(self, value).
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __float__(self, /)
 |      float(self)
 |  
 |  __floor__(...)
 |      Flooring an Integral returns itself.
 |  
 |  __floordiv__(self, value, /)
 |      Return self//value.
 |  
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getnewargs__(self, /)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __index__(self, /)
 |      Return self converted to an integer, if self is suitable for use as an index into a list.
 |  
 |  __int__(self, /)
 |      int(self)
 |  
 |  __invert__(self, /)
 |      ~self
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __lshift__(self, value, /)
 |      Return self<<value.
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mod__(self, value, /)
 |      Return self%value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __neg__(self, /)
 |      -self
 |  
 |  __or__(self, value, /)
 |      Return self|value.
 |  
 |  __pos__(self, /)
 |      +self
 |  
 |  __pow__(self, value, mod=None, /)
 |      Return pow(self, value, mod).
 |  
 |  __radd__(self, value, /)
 |      Return value+self.
 |  
 |  __rand__(self, value, /)
 |      Return value&self.
 |  
 |  __rdivmod__(self, value, /)
 |      Return divmod(value, self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __rfloordiv__(self, value, /)
 |      Return value//self.
 |  
 |  __rlshift__(self, value, /)
 |      Return value<<self.
 |  
 |  __rmod__(self, value, /)
 |      Return value%self.
 |  
 |  __rmul__(self, value, /)
 |      Return value*self.
 |  
 |  __ror__(self, value, /)
 |      Return value|self.
 |  
 |  __round__(...)
 |      Rounding an Integral returns itself.
 |      Rounding with an ndigits argument also returns an integer.
 |  
 |  __rpow__(self, value, mod=None, /)
 |      Return pow(value, self, mod).
 |  
 |  __rrshift__(self, value, /)
 |      Return value>>self.
 |  
 |  __rshift__(self, value, /)
 |      Return self>>value.
 |  
 |  __rsub__(self, value, /)
 |      Return value-self.
 |  
 |  __rtruediv__(self, value, /)
 |      Return value/self.
 |  
 |  __rxor__(self, value, /)
 |      Return value^self.
 |  
 |  __sizeof__(self, /)
 |      Returns size in memory, in bytes.
 |  
 |  __sub__(self, value, /)
 |      Return self-value.
 |  
 |  __truediv__(self, value, /)
 |      Return self/value.
 |  
 |  __trunc__(...)
 |      Truncating an Integral returns itself.
 |  
 |  __xor__(self, value, /)
 |      Return self^value.
 |  
 |  as_integer_ratio(self, /)
 |      Return integer ratio.
 |      
 |      Return a pair of integers, whose ratio is exactly equal to the original int
 |      and with a positive denominator.
 |      
 |      >>> (10).as_integer_ratio()
 |      (10, 1)
 |      >>> (-10).as_integer_ratio()
 |      (-10, 1)
 |      >>> (0).as_integer_ratio()
 |      (0, 1)
 |  
 |  bit_length(self, /)
 |      Number of bits necessary to represent self in binary.
 |      
 |      >>> bin(37)
 |      '0b100101'
 |      >>> (37).bit_length()
 |      6
 |  
 |  conjugate(...)
 |      Returns self, the complex conjugate of any int.
 |  
 |  to_bytes(self, /, length, byteorder, *, signed=False)
 |      Return an array of bytes representing an integer.
 |      
 |      length
 |        Length of bytes object to use.  An OverflowError is raised if the
 |        integer is not representable with the given number of bytes.
 |      byteorder
 |        The byte order used to represent the integer.  If byteorder is 'big',
 |        the most significant byte is at the beginning of the byte array.  If
 |        byteorder is 'little', the most significant byte is at the end of the
 |        byte array.  To request the native byte order of the host system, use
 |        `sys.byteorder' as the byte order value.
 |      signed
 |        Determines whether two's complement is used to represent the integer.
 |        If signed is False and a negative integer is given, an OverflowError
 |        is raised.
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  from_bytes(bytes, byteorder, *, signed=False) from builtins.type
 |      Return the integer represented by the given array of bytes.
 |      
 |      bytes
 |        Holds the array of bytes to convert.  The argument must either
 |        support the buffer protocol or be an iterable object producing bytes.
 |        Bytes and bytearray are examples of built-in objects that support the
 |        buffer protocol.
 |      byteorder
 |        The byte order used to represent the integer.  If byteorder is 'big',
 |        the most significant byte is at the beginning of the byte array.  If
 |        byteorder is 'little', the most significant byte is at the end of the
 |        byte array.  To request the native byte order of the host system, use
 |        `sys.byteorder' as the byte order value.
 |      signed
 |        Indicates whether two's complement is used to represent the integer.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  denominator
 |      the denominator of a rational number in lowest terms
 |  
 |  imag
 |      the imaginary part of a complex number
 |  
 |  numerator
 |      the numerator of a rational number in lowest terms
 |  
 |  real
 |      the real part of a complex number

Python 的文档网站的内容，就是由这些代码中的函数文档生成。这种把人类可读和机器可读的文字写在一起的思想，叫做“literate programming”，目标是让程序既适合被机器执行，也适合被人类阅读。修改程序与修改文档要保持同步。相反，如果程序与文档写在不同地方，甚至由不同的人来撰写，那么大概率经年累月，它们会有很大出入，使用文档失去了应有的价值。因此从一开始贯彻 literate programming 的原则，有助于长远的程序可读性和易用性，注意体会其中的“一次”原则：文档和程序在说同一件事情，我们只在一个地方把它们全都写出来。在通过书籍或课程系统性地对 Python 语言和环境的整形把握之后，随手查阅 help() 所得的在线帮助非常实用，是灵活的“工具书”。我们有了基础之后，可以借助这个强大的帮助系统边学边用，学习和工作效率都会很高。

第三讲 复合类型与函数 DONE

Table of Contents