Python 101 · Iterators, Generators, Comprehensions, Iterable, Hashable

Jan 30, 2016 | Tech Software

Index
Index

Iterators

An object that represents a stream of data. It fetches one element at a time using the __next__() method and maintains its internal state to know "what's next."

  • range

    : generates a Sequence (not an Iterator actually) of numbers for loops.
  • enumerate

    : pairs each element with its index counter.
  • zip

    : combines multiple iterables element-wise into tuples.
  • map

    : applies a specific function to every item in an iterable.
  • filter

    : keeps only the items in an iterable that satisfy a condition.
  • reversed

    : returns a reverse iterator that accesses elements from end to start without modifying the original sequence.
  • Iterator vs. Sequence

    :
Function Return Type Reusable? Indexing/Slicing?
range() Sequence Yes Yes
enumerate() Iterator No No
zip() Iterator No No
map() Iterator No No
filter() Iterator No No
reversed() Iterator No No
Note:
  • Sequence: you can loop through it multiple times, and it supports indexing and slicing, while being memory efficient.
  • Iterator: you can loop through it only once, and it's consumed and becomes empty; if you need the data again, you must convert it into a lasting type or recreate the object.

Generators

A special type of iterator that allows you to loop over data without storing the entire dataset in memory. It generates values on the fly (lazily) using yield, making it extremely memory-efficient for processing large files or streams.

  • yield

    : makes a generator function.
  • Expressions

    : make generator instances.

Iterators vs. Generators

特性 Iterator Generator
定义 遵循迭代协议的对象。 基于函数的特殊迭代器。
实现方式 基于类(使用 __iter____next__)。 基于函数(使用 yield 关键字)。
状态管理 手动管理(通过类实例变量记录位置)。 自动管理(在 yield 处自动暂停/恢复)。
内存占用 (惰性求值,按需生成数据)。 (惰性求值,按需生成数据)。
复杂度 (需要编写较多样板代码)。 (语法简洁,可读性强)。
典型用途 自定义复杂对象或数据结构。 处理数据流、简单序列或流水线计算。

Comprehensions

A concise, readable syntax for creating new collections by iterating over existing ones.

  • list

  • dict

  • set

Comprehensions vs. Generators

Features

列表推导式 [x for x in data] 生成器表达式 (x for x in data)
内存占用 随数据量 线性增长(一次性存入内存) 恒定不变(仅保存计算逻辑)
执行速度 遍历小规模数据时更快 在需要提前退出(如 any)时效率更高
可重复性 可以多次访问、切片、索引 一次性对象,遍历完就枯竭
结果类型 返回一个 list 对象 返回一个 generator 对象

Situations

列表推导式 [] 生成器表达式 ()
需要反复使用结果 是(列表可多次访问) 否(一次性消耗)
需要索引或切片 (如 res[0])
处理超大数据集(百万级以上) 否(容易内存溢出) 是(内存占用极低)
仅作为函数的参数(如 sum 可选,但效率低 最佳实践

Itertools

A standard Python library module providing a collection of efficient, memory-friendly tools for creating and manipulating iterators.

  • count

    : generates an infinite sequence of incrementing numbers.
  • cycle

    : repeats an iterable indefinitely in a loop.
  • repeat

    : returns the same object repeatedly, either infinitely or a fixed number of times.
  • chain

    : combines multiple iterables into one continuous sequence.
  • islice

    : returns selected elements from an iterable, similar to list slicing but works on iterators.

Iterable and Classes

  • __iter__ + yield

    : automatically handles iterating protocols, no need for __next__.
  • __iter__ + __next__

    : gives you more control over the iteration process.
  • __getitem__

    : allows you to access elements by index and iterate over them.
  • __iter__ + __reversed__

    : provides a reverse iterator.

Hashable and Classes

Concept

In Python, an object is considered Hashable if it meets the following three conditions:

  1. Immutable hash: its hash() value, via __hash__ implementation, remains constant during its entire lifetime.
  2. Comparability: it implements the __eq__ method to allow equality checks.
  3. Equality consistency: if a == b is true, then hash(a) must equal hash(b).
Key Usage: Only hashable objects can be used as dict keys or set elements.

Built-in Types

数据类型 是否可哈希 原因 / 备注
int, float, str, bool 原子级不可变类型。对象创建后值无法修改。
tuple 取决于内容 只有当元组内的所有元素也都是可哈希时,元组才可哈希。
list, dict, set 可变容器。内容可以随时修改,这会导致哈希值失效。
frozenset 集合的不可变版本,专门设计为可哈希。

Hashable Classes

  • By default, custom classes are hashable based on their memory address. However, if you override __eq__, you must also override __hash__; otherwise, the class becomes unhashable.

Hashable via dataclass

  • To avoid manual implementation, use @dataclass(frozen=True) to automatically make a class hashable.

Pitfalls

  • Never modify the attributes of a hashable object, as it leads to catastrophic consequences.