Python Data Model #3
Understanding Python's Data Model - Part 3: Practical Patterns, Pitfalls & Performance
Now that you understand what special methods exist, let's explore how to use them effectively in real-world scenarios, avoid common mistakes, and understand their performance implications.
Practical Patterns & Real Examples
Pattern 1: Building a Custom Collection
Let's build a UniqueList that acts like a list but only stores unique elements:
class UniqueList:
    """A list that automatically removes duplicates"""
    def __init__(self, items=None):
        self._items = []
        self._seen = set()
        if items:
            for item in items:
                self.append(item)
    def append(self, item):
        """Add item only if not already present"""
        if item not in self._seen:
            self._items.append(item)
            self._seen.add(item)
    def __len__(self):
        return len(self._items)
    def __getitem__(self, index):
        return self._items[index]
    def __setitem__(self, index, value):
        old_value = self._items[index]
        if value not in self._seen or value == old_value:
            self._seen.discard(old_value)
            self._items[index] = value
            self._seen.add(value)
        else:
            raise ValueError(f"{value} already exists in list")
    def __delitem__(self, index):
        value = self._items[index]
        del self._items[index]
        self._seen.discard(value)
    def __iter__(self):
        return iter(self._items)
    def __contains__(self, item):
        return item in self._seen  # O(1) instead of O(n)
    def __repr__(self):
        return f"UniqueList({self._items})"
    def __eq__(self, other):
        if isinstance(other, UniqueList):
            return self._items == other._items
        return self._items == other
# Usage:
unique = UniqueList([1, 2, 3, 2, 1, 4])
print(unique)       # UniqueList([1, 2, 3, 4])
print(len(unique))  # 4
print(2 in unique)  # True - O(1) lookup!
print(unique[1])    # 2
unique.append(5)    # Adds 5
unique.append(5)    # Silently ignores duplicate
print(unique)       # UniqueList([1, 2, 3, 4, 5])
Key techniques:
- Combines list interface with set for efficient lookups
- Implements full sequence protocol (__len__, __getitem__, __iter__)
- Overrides __contains__ for O(1) membership testing
- Maintains invariants in __setitem__ and __delitem__
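One way to get the rest of the list interface without writing each method by hand is `collections.abc.MutableSequence`, which derives `append`, `extend`, `remove`, `index` and others from five required methods. The following is a minimal sketch, not a drop-in replacement: the class name `UniqueSequence` is made up for this example, items are assumed to be hashable, only integer indices are handled, and having `insert` silently drop duplicates is a design choice.

```python
from collections.abc import MutableSequence


class UniqueSequence(MutableSequence):
    """Hypothetical variant of UniqueList built on MutableSequence."""

    def __init__(self, items=()):
        self._items = []
        self._seen = set()
        self.extend(items)  # extend() is provided by the mixin

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        old_value = self._items[index]
        if value in self._seen and value != old_value:
            raise ValueError(f"{value!r} already exists")
        self._seen.discard(old_value)
        self._items[index] = value
        self._seen.add(value)

    def __delitem__(self, index):
        self._seen.discard(self._items[index])
        del self._items[index]

    def insert(self, index, value):
        # Required by MutableSequence; the inherited append() calls this.
        if value not in self._seen:
            self._items.insert(index, value)
            self._seen.add(value)

    def __contains__(self, item):
        return item in self._seen  # keep the O(1) membership test


unique = UniqueSequence([1, 2, 3, 2, 1])
unique.extend([3, 4])  # inherited from the mixin
unique.remove(1)       # also inherited
print(list(unique))    # [2, 3, 4]
```

The abstract base class also refuses to instantiate if one of the five required methods is missing, which catches an incomplete protocol early.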
Pattern 2: Creating a Fluent API
Fluent APIs allow method chaining for readable code:
class QueryBuilder:
    """SQL query builder with fluent interface"""
    def __init__(self, table):
        self.table = table
        self._select = ["*"]
        self._where = []
        self._order = None
        self._limit = None
    def select(self, *fields):
        """SELECT fields"""
        self._select = fields
        return self  # Enable chaining
    def where(self, condition):
        """WHERE condition"""
        self._where.append(condition)
        return self  # Enable chaining
    def order_by(self, field):
        """ORDER BY field"""
        self._order = field
        return self  # Enable chaining
    def limit(self, n):
        """LIMIT n"""
        self._limit = n
        return self  # Enable chaining
    def __str__(self):
        """Convert to SQL string"""
        query = f"SELECT {', '.join(self._select)} FROM {self.table}"
        if self._where:
            query += f" WHERE {' AND '.join(self._where)}"
        if self._order:
            query += f" ORDER BY {self._order}"
        if self._limit is not None:  # "is not None" so that limit(0) is honored
            query += f" LIMIT {self._limit}"
        return query
    def __repr__(self):
        return f"QueryBuilder({self.table!r})"
# Usage - beautiful chaining:
query = (QueryBuilder("users")
    .select("name", "email")
    .where("age > 18")
    .where("active = true")
    .order_by("name")
    .limit(10))
print(query)
# SELECT name, email FROM users WHERE age > 18 AND active = true ORDER BY name LIMIT 10
Key techniques:
- Every method returns self for chaining
- __str__ provides the final conversion
- Parentheses allow multi-line chaining
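Because `where()` pastes condition strings straight into the SQL text, user input should not be formatted into them. Here is a small sketch of one alternative, in which a hypothetical `ParamQuery` class collects placeholder values separately. The `?` placeholder style is an assumption that matches `sqlite3`; other drivers use different markers.

```python
class ParamQuery:
    """Hypothetical fluent builder that keeps values out of the SQL text."""

    def __init__(self, table):
        self.table = table
        self._where = []
        self._params = []

    def where(self, condition, *params):
        """Add a condition such as 'age > ?' plus the values it needs."""
        self._where.append(condition)
        self._params.extend(params)
        return self  # enable chaining

    def build(self):
        """Return (sql, params), the pair a DB-API execute() call expects."""
        sql = f"SELECT * FROM {self.table}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        return sql, tuple(self._params)


sql, params = (ParamQuery("users")
               .where("age > ?", 18)
               .where("name = ?", "Ann")
               .build())
print(sql)     # SELECT * FROM users WHERE age > ? AND name = ?
print(params)  # (18, 'Ann')
```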
Pattern 3: Domain-Specific Language (DSL)
Use operator overloading to create intuitive APIs:
class Validator:
    """Validation DSL using operator overloading"""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.errors = []
    def __gt__(self, other):
        """value > other"""
        if not self.value > other:
            self.errors.append(f"{self.name} must be greater than {other}")
        return self
    def __lt__(self, other):
        """value < other"""
        if not self.value < other:
            self.errors.append(f"{self.name} must be less than {other}")
        return self
    def __and__(self, other):
        """Combine validators"""
        self.errors.extend(other.errors)
        return self
    def is_valid(self):
        return len(self.errors) == 0
    def __bool__(self):
        return self.is_valid()
# Usage - reads almost like English:
age = Validator("Age", 15)
score = Validator("Score", 95)
validation = (age > 18) & (score < 100)
if not validation:
    print("Errors:", validation.errors)
    # Errors: ['Age must be greater than 18']
Pattern 4: Smart Configuration Objects
Callable objects for flexible configuration:
class Config:
    """Configuration with callable interface"""
    def __init__(self, **defaults):
        self._config = defaults
    def __call__(self, **updates):
        """Update config and return new instance"""
        new_config = self._config.copy()
        new_config.update(updates)
        return Config(**new_config)
    def __getattr__(self, name):
        """Access config values as attributes"""
        # Read through __dict__ so that a missing _config (for example while
        # copy or pickle rebuilds the object) raises AttributeError instead
        # of recursing back into __getattr__
        config = self.__dict__.get("_config", {})
        if name in config:
            return config[name]
        raise AttributeError(f"No config key: {name}")
    def __getitem__(self, key):
        """Access config values as dictionary"""
        return self._config[key]
    def __repr__(self):
        items = ", ".join(f"{k}={v}" for k, v in self._config.items())
        return f"Config({items})"
# Usage:
base_config = Config(host="localhost", port=8000, debug=False)
print(base_config.host)  # localhost
# Create variations without mutating original:
dev_config = base_config(debug=True, port=3000)
print(dev_config)  # Config(host=localhost, port=3000, debug=True)
print(base_config.debug)  # False - original unchanged
Pattern 5: Resource Management with Context Managers
Practical context manager for database transactions:
class Transaction:
    """Database transaction context manager"""
    def __init__(self, connection):
        self.connection = connection
        self.transaction_active = False
    def __enter__(self):
        """Start transaction"""
        self.connection.execute("BEGIN TRANSACTION")
        self.transaction_active = True
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        """Commit or rollback based on exceptions"""
        if self.transaction_active:
            if exc_type is None:
                # No exception - commit
                self.connection.execute("COMMIT")
            else:
                # Exception occurred - rollback
                self.connection.execute("ROLLBACK")
            self.transaction_active = False
        # Return False so any exception propagates to the caller
        return False
    def rollback(self):
        """Manual rollback"""
        if self.transaction_active:
            self.connection.execute("ROLLBACK")
            self.transaction_active = False
# Usage:
# with Transaction(db_connection) as txn:
#     db_connection.execute("INSERT INTO users ...")
#     db_connection.execute("UPDATE accounts ...")
#     # Automatically commits on success, rolls back on error
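To see the commit-or-rollback behavior run end to end, here is a small sketch against an in-memory SQLite database. It uses `contextlib.contextmanager` rather than a class, purely as an alternative spelling of the same idea. The `users` table is invented for the example, and `isolation_level=None` is passed so that `sqlite3` leaves transaction control to the explicit `BEGIN`/`COMMIT`/`ROLLBACK` statements.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(connection):
    """Commit if the block succeeds, roll back if it raises."""
    connection.execute("BEGIN")
    try:
        yield connection
    except BaseException:
        connection.execute("ROLLBACK")
        raise  # re-raise so the caller still sees the error
    else:
        connection.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT)")

with transaction(conn):
    conn.execute("INSERT INTO users VALUES ('alice')")

try:
    with transaction(conn):
        conn.execute("INSERT INTO users VALUES ('bob')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the insert of 'bob' was rolled back

names = [row[0] for row in conn.execute("SELECT name FROM users")]
print(names)  # ['alice']
```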
Pattern 6: Immutable Objects with Tuple-like Behavior
Create immutable records:
class Point:
    """Immutable 2D point"""
    __slots__ = ('_x', '_y')  # Memory optimization
    def __init__(self, x, y):
        # Use object.__setattr__ to bypass immutability in __init__
        object.__setattr__(self, '_x', x)
        object.__setattr__(self, '_y', y)
    @property
    def x(self):
        return self._x
    @property
    def y(self):
        return self._y
    def __setattr__(self, name, value):
        """Prevent attribute modification"""
        raise AttributeError("Point is immutable")
    def __delattr__(self, name):
        """Prevent attribute deletion"""
        raise AttributeError("Point is immutable")
    def __hash__(self):
        """Immutable objects should be hashable"""
        return hash((self._x, self._y))
    def __eq__(self, other):
        if isinstance(other, Point):
            return self._x == other._x and self._y == other._y
        return NotImplemented
    def __repr__(self):
        return f"Point({self._x}, {self._y})"
    def __iter__(self):
        """Act like a tuple"""
        return iter((self._x, self._y))
    def __getitem__(self, index):
        """Support indexing like a tuple"""
        return (self._x, self._y)[index]
# Usage:
p = Point(3, 4)
x, y = p  # Unpacking works!
print(p[0], p[1])  # Indexing works!
# p.x = 5  # Error: Point is immutable
# Can use as dict keys:
distances = {Point(0, 0): 0, Point(1, 1): 1.41}
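When the hand-written control is not needed, the standard library can generate most of this behavior. A short sketch using `typing.NamedTuple` is shown here as an alternative, not as what the class above does internally:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


p = Point(3, 4)
x, y = p           # unpacking
print(p[0], p.y)   # 3 4
print(p == Point(3, 4), hash(p) == hash(Point(3, 4)))  # True True

try:
    p.x = 5
except AttributeError as exc:
    print("immutable:", type(exc).__name__)  # immutable: AttributeError
```

One difference worth knowing: a `NamedTuple` instance is a real tuple, so `Point(3, 4) == (3, 4)` is `True`, whereas the hand-written `__eq__` above only accepts other `Point` objects.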
Common Pitfalls & Best Practices
Pitfall 1: Implementing __eq__ Without __hash__
Problem:
class BadPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    # Missing __hash__!
p1 = BadPoint(1, 2)
p2 = BadPoint(1, 2)
print(p1 == p2)  # True
# But this fails:
# points = {p1, p2}  # TypeError: unhashable type: 'BadPoint'
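Solution: When instances need to live in sets or serve as dict keys, implement __hash__ together with __eq__ and base both on the same fields. If those fields can change after creation, leave the class unhashable instead:
class GoodPoint:
    def __init__(self, x, y):
        # Treat x and y as read-only once the object is in a set or dict
        self.x = x
        self.y = y
    def __eq__(self, other):
        if isinstance(other, GoodPoint):
            return self.x == other.x and self.y == other.y
        return NotImplemented
    def __hash__(self):
        return hash((self.x, self.y))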
Rule: Objects that compare equal must have the same hash value.
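The rule has a practical consequence: if a field that feeds `__hash__` changes while the object sits in a set or dict, the container can no longer find it. Here is a small sketch of that failure mode; `MutablePoint` is a made-up class for illustration.

```python
class MutablePoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if isinstance(other, MutablePoint):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))


p = MutablePoint(1, 2)
seen = {p}
print(p in seen)  # True

p.x = 99  # changes the hash after the object was stored
print(p in seen)                   # False: looked up under the new hash
print(MutablePoint(1, 2) in seen)  # False: old hash matches, equality does not
print(len(seen))                   # 1: the entry is still there but unreachable
```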
Pitfall 2: Confusing __repr__ and __str__
Problem:
class BadBook:
    def __init__(self, title):
        self.title = title
    def __str__(self):
        return self.title  # User-friendly but not debuggable
    # No __repr__!
book = BadBook("1984")
print(book)    # "1984" - looks nice
print([book])  # [<__main__.BadBook object at 0x...>] - ugly!
Solution: Always implement __repr__, optionally add __str__:
class GoodBook:
    def __init__(self, title):
        self.title = title
    def __repr__(self):
        # !r applies repr() to the title, so quotes inside it are escaped
        return f"Book({self.title!r})"  # Unambiguous
    def __str__(self):
        return self.title  # User-friendly
Best practice:
- __repr__ should be unambiguous: Book('1984')
- __str__ should be readable: 1984
- When in doubt, just implement __repr__
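A quick sketch of which method Python picks in common situations. The `Book` class here is a stand-in written for this example.

```python
class Book:
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return f"Book({self.title!r})"

    def __str__(self):
        return self.title


book = Book("1984")
print(str(book))    # 1984
print(repr(book))   # Book('1984')
print([book])       # [Book('1984')]  (containers use repr for their items)
print(f"{book}")    # 1984            (f-strings use str by default)
print(f"{book!r}")  # Book('1984')    (!r asks for repr)
```

If only `__repr__` is defined, `str()` and `print()` fall back to it, which is why implementing `__repr__` first is the safer default.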
Pitfall 3: Returning Wrong Type from Operators
Problem:
class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        return (self.x + other.x, self.y + other.y)  # Returns tuple!
v1 = BadVector(1, 2)
v2 = BadVector(3, 4)
v3 = v1 + v2
print(type(v3))  # <class 'tuple'> - Not a BadVector!
# v4 = v3 + v1  # Error: tuples don't have x, y attributes
Solution: Always return the same type:
class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        return GoodVector(self.x + other.x, self.y + other.y)
Pitfall 4: Not Returning NotImplemented for Unsupported Operations
Problem:
class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        # Crashes if other doesn't have x, y:
        return BadVector(self.x + other.x, self.y + other.y)
v = BadVector(1, 2)
# result = v + 5  # AttributeError: 'int' object has no attribute 'x'
Solution: Return NotImplemented for unsupported types:
class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        if isinstance(other, GoodVector):
            return GoodVector(self.x + other.x, self.y + other.y)
        return NotImplemented  # Lets Python try other.__radd__(self)
v = GoodVector(1, 2)
# result = v + 5  # TypeError: unsupported operand type(s) - better error!
Why NotImplemented? It tells Python "I don't know how to do this, try something else" (like reverse operators or raising a proper TypeError).
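The "try something else" step is easiest to see with a reflected operator. In this sketch a stand-in `Vector` class adds `__radd__` so that `sum()`, which starts from the integer `0`, works. Accepting `0` as a neutral element is a choice made for the example, not a general rule.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented

    def __radd__(self, other):
        # Called for `other + self` after other's own __add__ gave up.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


print(sum([Vector(1, 2), Vector(3, 4)]))  # Vector(4, 6)

try:
    Vector(1, 2) + "a"
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'Vector' and 'str'
```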
Pitfall 5: Mutable Default Arguments in __init__
Problem:
class BadList:
    def __init__(self, items=[]):  # Mutable default!
        self.items = items
list1 = BadList()
list2 = BadList()
list1.items.append("A")
print(list2.items)  # ['A'] - both instances share the same list!
Solution: Use None as default:
class GoodList:
    def __init__(self, items=None):
        self.items = items if items is not None else []
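For classes written with `dataclasses`, the same fix is spelled `field(default_factory=...)`. A brief sketch follows; `Playlist` is a made-up example class.

```python
from dataclasses import dataclass, field


@dataclass
class Playlist:
    name: str
    tracks: list = field(default_factory=list)  # a new list per instance


a = Playlist("a")
b = Playlist("b")
a.tracks.append("song")
print(a.tracks, b.tracks)  # ['song'] []
```

Writing `tracks: list = []` in a dataclass raises `ValueError` at class definition time, so this particular pitfall is caught early there.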
Pitfall 6: Forgetting to Return self in In-Place Operators
Problem:
class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        # Forgot to return self!
v = BadVector(1, 2)
v += BadVector(3, 4)
print(v)  # None - __iadd__ returned None, so v was rebound to None!
Solution: Always return self:
class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        return self  # Critical!
Pitfall 7: Infinite Recursion in __getattribute__
Problem:
class Bad:
    def __getattribute__(self, name):
        # Accessing self.name triggers __getattribute__ again!
        return self.name  # Infinite recursion!
Solution: Use super() or object.__getattribute__:
class Good:
    def __getattribute__(self, name):
        # Use super() to access attributes safely:
        return super().__getattribute__(name)
Pitfall 8: Modifying Objects During Iteration
Problem:
class BadCollection:
    def __init__(self):
        self.items = [1, 2, 3, 4, 5]
    def __iter__(self):
        return iter(self.items)
collection = BadCollection()
for item in collection:
    if item % 2 == 0:
        collection.items.remove(item)  # Modifying during iteration!
# The list iterator skips the element that follows each removal (3 and 5 are
# never examined here); dicts and sets raise RuntimeError instead
Solution: Iterate over a copy or collect items to remove:
class GoodCollection:
    def __init__(self):
        self.items = [1, 2, 3, 4, 5]
    def remove_evens(self):
        # Iterate over a copy:
        for item in list(self.items):
            if item % 2 == 0:
                self.items.remove(item)
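Another common way to sidestep the problem is to rebuild the list in a single step. The sketch below also includes a plain-list loop with adjacent even numbers, where the skipped element becomes visible in the result; the `Collection` class and its sample data are invented for the example.

```python
class Collection:
    def __init__(self):
        self.items = [1, 2, 2, 4, 5]

    def remove_evens(self):
        # Slice assignment replaces the contents in place, so other
        # references to self.items see the filtered list too.
        self.items[:] = [item for item in self.items if item % 2 != 0]


good = Collection()
good.remove_evens()
print(good.items)  # [1, 5]

# For contrast: removing while iterating skips the element after each removal.
buggy = [1, 2, 2, 4, 5]
for item in buggy:
    if item % 2 == 0:
        buggy.remove(item)
print(buggy)  # [1, 2, 5]  (one even number survived)
```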
Performance Considerations
1. Special Method Overhead
Special methods have a cost. Python calls them frequently, so implementation matters:
Example: Comparison operators
class SlowPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        # Slow: creates intermediate objects
        return (self.x, self.y) == (other.x, other.y)
class FastPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        # Fast: direct comparison
        return self.x == other.x and self.y == other.y
# The author reports FastPoint as roughly 30% faster in equality checks;
# the gap varies by Python version, so measure on your own interpreter
Tip: Keep special methods simple and fast. They're called often.
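Claims like this are cheap to check with `timeit`. Below is a minimal benchmarking sketch; the class names, the `bench` helper and the repeat counts are arbitrary choices for this example, and no particular result is promised because timings depend on the interpreter and the machine.

```python
import timeit


class TuplePoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


class DirectPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


def bench(cls, number=100_000):
    """Return the best of five timings for `number` equality checks."""
    a, b = cls(1, 2), cls(1, 2)
    # Taking min() over several repeats reduces noise from other processes.
    return min(timeit.repeat(lambda: a == b, number=number, repeat=5))


tuple_time = bench(TuplePoint)
direct_time = bench(DirectPoint)
print(f"tuple compare:  {tuple_time:.4f}s")
print(f"direct compare: {direct_time:.4f}s")
```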
2. __slots__ for Memory Optimization
By default, Python stores attributes in a __dict__, which uses more memory. Use __slots__ for memory-critical classes:
class WithoutSlots:
    def __init__(self, x, y):
        self.x = x
        self.y = y
class WithSlots:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y
# Memory comparison (approximate; exact sizes depend on Python version and platform):
# WithoutSlots: ~240 bytes per instance
# WithSlots: ~64 bytes per instance (roughly 70% less)
Tradeoffs:
- ✅ Significant memory savings (especially with many instances)
- ✅ Slightly faster attribute access
- ❌ Can't add new attributes dynamically
- ❌ Can't use weak references (unless '__weakref__' is in slots)
When to use: Classes with many instances (thousands+) and fixed attributes.
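The exact byte counts differ between Python versions, so it helps to measure them directly. A rough sketch with `sys.getsizeof` follows. Since `getsizeof` is shallow, the size of the instance `__dict__` is added by hand for the class without slots, and the attribute values themselves are not counted on either side.

```python
import sys


class WithoutSlots:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = WithoutSlots(1, 2)
slotted = WithSlots(1, 2)

# Instance plus its attribute dict versus the slotted instance alone.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(f"without slots: {plain_size} bytes, with slots: {slotted_size} bytes")

print(hasattr(slotted, "__dict__"))  # False
try:
    slotted.z = 3
except AttributeError:
    print("cannot add new attributes to a slotted instance")
```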
3. __getattribute__ Performance Impact
__getattribute__ is called for every attribute access. This can be slow:
class LoggedAccess:
    some_attribute = 42  # Defined so the lookup below succeeds
    def __getattribute__(self, name):
        print(f"Accessing {name}")  # I/O on every attribute access!
        return super().__getattribute__(name)
obj = LoggedAccess()
# This is very slow if you access attributes frequently:
for _ in range(1000):
    x = obj.some_attribute  # Prints 1000 times
Tip: Avoid __getattribute__ unless absolutely necessary. Use __getattr__ instead (only called when attribute not found).
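A small sketch that makes the difference observable by counting how often the fallback runs. `Defaults` and its `fallback_calls` counter are invented for this example.

```python
class Defaults:
    """Hypothetical class that returns None for unknown public names."""

    def __init__(self):
        self.host = "localhost"
        self.fallback_calls = 0

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed.
        if name.startswith("_"):
            raise AttributeError(name)
        self.fallback_calls += 1
        return None


cfg = Defaults()
for _ in range(1000):
    cfg.host  # found normally, __getattr__ is not involved

print(cfg.fallback_calls)  # 0
print(cfg.port)            # None (missing, so __getattr__ ran)
print(cfg.fallback_calls)  # 1
```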
4. Generator-Based __iter__
For large sequences, use generators to save memory:
class HugeRange:
    """Memory-efficient range using generator"""
    def __init__(self, start, end):
        self.start = start
        self.end = end
    def __iter__(self):
        # Generator - doesn't create list in memory
        current = self.start
        while current < self.end:
            yield current
            current += 1
    def __len__(self):
        # __len__ must not return a negative number
        return max(0, self.end - self.start)
# Memory efficient even with millions of numbers:
huge = HugeRange(0, 1_000_000)
for num in huge:
    if num > 10:
        break  # Only generates what we need
5. Caching Hash Values
If __hash__ is expensive, cache the result:
class ExpensiveHash:
    def __init__(self, data):
        self.data = data
        self._hash = None  # Cache
    def __hash__(self):
        if self._hash is None:
            # Expensive computation only happens once:
            self._hash = hash(tuple(self.data))
        return self._hash
    def __eq__(self, other):
        return self.data == other.data
Important: Only cache if objects are immutable!
6. When NOT to Use Operator Overloading
Operator overloading can hurt performance if used carelessly:
class MatrixBad:
    # Sketch only: assumes rows, cols, a (rows, cols) constructor and row indexing
    def __add__(self, other):
        # Creates a new matrix every time
        result = MatrixBad(self.rows, self.cols)
        for i in range(self.rows):
            for j in range(self.cols):
                result[i][j] = self[i][j] + other[i][j]
        return result
# This creates many intermediate objects:
result = m1 + m2 + m3 + m4  # 3 new matrices created!
Better: Use explicit methods for complex operations:
class MatrixGood:
    def add_inplace(self, other):
        # Modifies in place - no allocation
        for i in range(self.rows):
            for j in range(self.cols):
                self[i][j] += other[i][j]
        return self
# More efficient, but note that m1 itself is modified:
m1.add_inplace(m2).add_inplace(m3).add_inplace(m4)
Rule of thumb: Use operators for simple, expected operations. Use explicit methods for complex operations or when performance matters.
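The in-place variant can also keep operator syntax through `__iadd__`, so callers choose between `a + b` (new object) and `a += b` (reuse storage). Here is a minimal runnable sketch under stated assumptions: this `Matrix` class is hypothetical, stores rows as plain lists, and does no shape checking.

```python
class Matrix:
    """Hypothetical minimal matrix stored as a list of row lists."""

    def __init__(self, data):
        self.data = [list(row) for row in data]

    def __add__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        return Matrix(
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.data, other.data)
        )

    def __iadd__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        for r1, r2 in zip(self.data, other.data):
            for j, value in enumerate(r2):
                r1[j] += value
        return self  # in-place operators must return the result


m1 = Matrix([[1, 2], [3, 4]])
m2 = Matrix([[10, 20], [30, 40]])

total = Matrix([[0, 0], [0, 0]])
for m in (m1, m2):
    total += m  # reuses total's storage, no new Matrix per step

print(total.data)      # [[11, 22], [33, 44]]
print((m1 + m2).data)  # [[11, 22], [33, 44]]
print(m1.data)         # [[1, 2], [3, 4]]  (the operands are untouched)
```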
Quick Reference: When to Use What
| Pattern | Use When | Avoid When |
|---|---|---|
| __repr__ | Always implement | Never skip this |
| __str__ | User-facing output differs from repr | repr is sufficient |
| __eq__ and __hash__ | Objects need equality comparison | Object is mutable |
| Arithmetic operators | Natural mathematical meaning | Confusing or unexpected |
| __getitem__ | Object is logically a sequence/mapping | Object isn't a container |
| __iter__ | Object contains items to iterate | Object is single-valued |
| __call__ | Object acts as function or callback | Standard methods are clearer |
| __enter__/__exit__ | Resource management needed | Simple function is enough |
| __slots__ | Many instances, fixed attributes | Need dynamic attributes |
| __getattribute__ | Must intercept all attribute access | __getattr__ would work |
Resources & Further Reading
- Python Performance Tips - Official performance guide
- Memory Management in Python - Understanding Python's memory model
- Python Patterns - Design patterns in Python
- Effective Python by Brett Slatkin - Best practices and pitfalls
- Python Cookbook by David Beazley - Advanced recipes and patterns
- Raymond Hettinger's talks - Python core developer with excellent talks on internals
- Python's __slots__ explained - Memory optimization details