Python Data Model #3

34 days ago · 36 views

Understanding Python's Data Model - Part 3: Practical Patterns, Pitfalls & Performance

Now that you understand what special methods exist, let's explore how to use them effectively in real-world scenarios, avoid common mistakes, and understand their performance implications.

Practical Patterns & Real Examples

Pattern 1: Building a Custom Collection

Let's build a UniqueList that acts like a list but only stores unique elements:

class UniqueList:
    """A list that automatically removes duplicates"""

    def __init__(self, items=None):
        self._items = []
        self._seen = set()
        if items:
            for item in items:
                self.append(item)

    def append(self, item):
        """Add item only if not already present"""
        if item not in self._seen:
            self._items.append(item)
            self._seen.add(item)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        old_value = self._items[index]
        if value not in self._seen or value == old_value:
            self._seen.discard(old_value)
            self._items[index] = value
            self._seen.add(value)
        else:
            raise ValueError(f"{value} already exists in list")

    def __delitem__(self, index):
        value = self._items[index]
        del self._items[index]
        self._seen.discard(value)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._seen  # O(1) instead of O(n)

    def __repr__(self):
        return f"UniqueList({self._items})"

    def __eq__(self, other):
        if isinstance(other, UniqueList):
            return self._items == other._items
        return self._items == other

# Usage:
unique = UniqueList([1, 2, 3, 2, 1, 4])
print(unique)           # UniqueList([1, 2, 3, 4])
print(len(unique))      # 4
print(2 in unique)      # True - O(1) lookup!
print(unique[1])        # 2
unique.append(5)        # Adds 5
unique.append(5)        # Silently ignores duplicate
print(unique)           # UniqueList([1, 2, 3, 4, 5])

Key techniques:
- Combines a list (for order) with a set (for efficient lookups)
- Implements the core sequence methods (__len__, __getitem__, __iter__)
- Overrides __contains__ for O(1) average-case membership testing
- Maintains the uniqueness invariant in __setitem__ and __delitem__
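For comparison, the standard library's collections.abc.MutableSequence can fill in much of the remaining list interface (extend, index, count, pop, remove) once five methods are defined. The sketch below is one possible way to do that, not a drop-in replacement: the name UniqueSequence is made up for this example, and it assumes integer indexes only (no slices).

```python
from collections.abc import MutableSequence


class UniqueSequence(MutableSequence):
    """Hypothetical variant of UniqueList built on MutableSequence."""

    def __init__(self, items=()):
        self._items = []
        self._seen = set()
        self.extend(items)  # mixin method: calls append(), which calls insert()

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        old_value = self._items[index]
        if value in self._seen and value != old_value:
            raise ValueError(f"{value!r} already exists")
        self._seen.discard(old_value)
        self._items[index] = value
        self._seen.add(value)

    def __delitem__(self, index):
        self._seen.discard(self._items.pop(index))

    def insert(self, index, value):
        # Duplicates are silently ignored, mirroring UniqueList.append
        if value not in self._seen:
            self._items.insert(index, value)
            self._seen.add(value)

    def __contains__(self, value):
        return value in self._seen  # set lookup instead of the mixin's linear scan

    def reverse(self):
        # The mixin's reverse() swaps items through __setitem__, which would
        # trip the duplicate check, so reverse the backing list directly
        self._items.reverse()


u = UniqueSequence([1, 2, 3, 2, 1, 4])
u.extend([4, 5])        # inherited from MutableSequence
print(list(u))          # [1, 2, 3, 4, 5]
print(u.index(3))       # 2 (inherited from Sequence)
print(u.pop())          # 5 (inherited from MutableSequence)
```

The trade-off is that inherited methods route through your own hooks, so an invariant like uniqueness can interact with them in surprising ways, as the reverse() override shows.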

Pattern 2: Creating a Fluent API

Fluent APIs allow method chaining for readable code:

class QueryBuilder:
    """SQL query builder with fluent interface"""

    def __init__(self, table):
        self.table = table
        self._select = ["*"]
        self._where = []
        self._order = None
        self._limit = None

    def select(self, *fields):
        """SELECT fields"""
        self._select = fields
        return self  # Enable chaining

    def where(self, condition):
        """WHERE condition"""
        self._where.append(condition)
        return self  # Enable chaining

    def order_by(self, field):
        """ORDER BY field"""
        self._order = field
        return self  # Enable chaining

    def limit(self, n):
        """LIMIT n"""
        self._limit = n
        return self  # Enable chaining

    def __str__(self):
        """Convert to SQL string"""
        query = f"SELECT {', '.join(self._select)} FROM {self.table}"

        if self._where:
            query += f" WHERE {' AND '.join(self._where)}"

        if self._order:
            query += f" ORDER BY {self._order}"

        if self._limit is not None:  # "is not None" so that limit(0) is honored
            query += f" LIMIT {self._limit}"

        return query

    def __repr__(self):
        return f"QueryBuilder('{self.table}')"

# Usage - beautiful chaining:
query = (QueryBuilder("users")
         .select("name", "email")
         .where("age > 18")
         .where("active = true")
         .order_by("name")
         .limit(10))

print(query)
# SELECT name, email FROM users WHERE age > 18 AND active = true ORDER BY name LIMIT 10

Key techniques:
- Every method returns self for chaining
- __str__ provides the final conversion
- Parentheses allow multi-line chaining

Pattern 3: Domain-Specific Language (DSL)

Use operator overloading to create intuitive APIs:

class Validator:
    """Validation DSL using operator overloading"""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.errors = []

    def __gt__(self, other):
        """value > other"""
        if not self.value > other:
            self.errors.append(f"{self.name} must be greater than {other}")
        return self

    def __lt__(self, other):
        """value < other"""
        if not self.value < other:
            self.errors.append(f"{self.name} must be less than {other}")
        return self

    def __and__(self, other):
        """Combine validators"""
        self.errors.extend(other.errors)
        return self

    def is_valid(self):
        return len(self.errors) == 0

    def __bool__(self):
        return self.is_valid()

# Usage - reads almost like English:
age = Validator("Age", 15)
score = Validator("Score", 95)

validation = (age > 18) & (score < 100)

if not validation:
    print("Errors:", validation.errors)
    # Errors: ['Age must be greater than 18']

Pattern 4: Smart Configuration Objects

Callable objects for flexible configuration:

class Config:
    """Configuration with callable interface"""

    def __init__(self, **defaults):
        self._config = defaults

    def __call__(self, **updates):
        """Update config and return new instance"""
        new_config = self._config.copy()
        new_config.update(updates)
        return Config(**new_config)

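    def __getattr__(self, name):
        """Access config values as attributes"""
        # Guard: if _config itself is missing (e.g. during copy or unpickling),
        # looking it up below would call __getattr__ again and recurse forever
        if name == "_config":
            raise AttributeError(name)
        if name in self._config:
            return self._config[name]
        raise AttributeError(f"No config key: {name}")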

    def __getitem__(self, key):
        """Access config values as dictionary"""
        return self._config[key]

    def __repr__(self):
        items = ", ".join(f"{k}={v}" for k, v in self._config.items())
        return f"Config({items})"

# Usage:
base_config = Config(host="localhost", port=8000, debug=False)
print(base_config.host)      # localhost

# Create variations without mutating original:
dev_config = base_config(debug=True, port=3000)
print(dev_config)            # Config(host=localhost, port=3000, debug=True)
print(base_config.debug)     # False - original unchanged

Pattern 5: Resource Management with Context Managers

Practical context manager for database transactions:

class Transaction:
    """Database transaction context manager"""

    def __init__(self, connection):
        self.connection = connection
        self.transaction_active = False

    def __enter__(self):
        """Start transaction"""
        self.connection.execute("BEGIN TRANSACTION")
        self.transaction_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        """Commit or rollback based on exceptions"""
        if self.transaction_active:
            if exc_type is None:
                # No exception - commit
                self.connection.execute("COMMIT")
            else:
                # Exception occurred - rollback
                self.connection.execute("ROLLBACK")
            self.transaction_active = False  # Transaction is finished either way
        # Return False to propagate any exception
        return False

    def rollback(self):
        """Manual rollback"""
        if self.transaction_active:
            self.connection.execute("ROLLBACK")
            self.transaction_active = False

# Usage:
# with Transaction(db_connection) as txn:
#     db_connection.execute("INSERT INTO users ...")
#     db_connection.execute("UPDATE accounts ...")
#     # Automatically commits on success, rolls back on error
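
To try the pattern end to end, here is a minimal sketch against an in-memory SQLite database. It assumes the sqlite3 connection is opened with isolation_level=None so the module does not start transactions on its own. The users table and the compact re-declaration of Transaction exist only to keep the example self-contained.

```python
import sqlite3


class Transaction:
    """Compact copy of the context manager above, so this example runs on its own."""

    def __init__(self, connection):
        self.connection = connection

    def __enter__(self):
        self.connection.execute("BEGIN TRANSACTION")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.connection.execute("COMMIT" if exc_type is None else "ROLLBACK")
        return False  # never swallow the exception


# isolation_level=None puts sqlite3 in autocommit mode, leaving BEGIN/COMMIT to us
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT)")  # hypothetical table

with Transaction(conn):
    conn.execute("INSERT INTO users VALUES ('alice')")  # committed on exit

try:
    with Transaction(conn):
        conn.execute("INSERT INTO users VALUES ('bob')")
        raise RuntimeError("simulated failure")  # triggers ROLLBACK
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM users")]
print(names)  # ['alice']
```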

Pattern 6: Immutable Objects with Tuple-like Behavior

Create immutable records:

class Point:
    """Immutable 2D point"""

    __slots__ = ('_x', '_y')  # Memory optimization

    def __init__(self, x, y):
        # Use object.__setattr__ to bypass immutability in __init__
        object.__setattr__(self, '_x', x)
        object.__setattr__(self, '_y', y)

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __setattr__(self, name, value):
        """Prevent attribute modification"""
        raise AttributeError("Point is immutable")

    def __delattr__(self, name):
        """Prevent attribute deletion"""
        raise AttributeError("Point is immutable")

    def __hash__(self):
        """Immutable objects should be hashable"""
        return hash((self._x, self._y))

    def __eq__(self, other):
        if isinstance(other, Point):
            return self._x == other._x and self._y == other._y
        return NotImplemented

    def __repr__(self):
        return f"Point({self._x}, {self._y})"

    def __iter__(self):
        """Act like a tuple"""
        return iter((self._x, self._y))

    def __getitem__(self, index):
        """Support indexing like a tuple"""
        return (self._x, self._y)[index]

# Usage:
p = Point(3, 4)
x, y = p                # Unpacking works!
print(p[0], p[1])       # Indexing works!
# p.x = 5               # Error: Point is immutable

# Can use as dict keys:
distances = {Point(0, 0): 0, Point(1, 1): 1.41}

Common Pitfalls & Best Practices

Pitfall 1: Implementing __eq__ Without __hash__

Problem:

class BadPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    # Missing __hash__!

p1 = BadPoint(1, 2)
p2 = BadPoint(1, 2)

print(p1 == p2)  # True

# But this fails:
# points = {p1, p2}  # TypeError: unhashable type: 'BadPoint'

Solution: If instances need to work in sets or as dict keys, implement __hash__ alongside __eq__, and hash only attributes that do not change after creation:

class GoodPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if isinstance(other, GoodPoint):
            return self.x == other.x and self.y == other.y
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))

Rule: Objects that compare equal must have the same hash value, and that value must not change while the object is stored in a set or dict.
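
One consequence of this rule is worth seeing in action: if the hash depends on attributes that later change, a set can lose track of the object. The sketch below uses a made-up MutablePoint to show the failure, and a frozen dataclass (which generates __eq__ and __hash__ and blocks assignment) as one possible remedy. Both class names are illustrative.

```python
from dataclasses import dataclass, FrozenInstanceError


class MutablePoint:
    """Hypothetical class: hashable, but its hash inputs can change."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if isinstance(other, MutablePoint):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __hash__(self):
        return hash((self.x, self.y))


p = MutablePoint(1, 2)
points = {p}
p.x = 99                 # hash(p) changes after insertion
lost = p not in points   # True: the set searches under the new hash and misses


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


q = FrozenPoint(1, 2)
found = FrozenPoint(1, 2) in {q}  # True: equal objects, equal hashes
blocked = False
try:
    q.x = 99
except FrozenInstanceError:
    blocked = True  # frozen dataclasses refuse attribute assignment
```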

Pitfall 2: Confusing __repr__ and __str__

Problem:

class BadBook:
    def __init__(self, title):
        self.title = title

    def __str__(self):
        return self.title  # User-friendly but not debuggable

    # No __repr__!

book = BadBook("1984")
print(book)          # 1984 - looks nice
print([book])        # [<__main__.BadBook object at 0x...>] - ugly!

Solution: Always implement __repr__, optionally add __str__:

class GoodBook:
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return f"GoodBook({self.title!r})"  # Unambiguous, quotes handled by !r

    def __str__(self):
        return self.title  # User-friendly

Best practice:
- __repr__ should be unambiguous: GoodBook('1984')
- __str__ should be readable: 1984
- When in doubt, implement only __repr__; str() falls back to it
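
It can help to see which contexts pick which method. A small sketch, using a hypothetical Book class that defines both:

```python
class Book:
    """Hypothetical class defining both __repr__ and __str__."""

    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return f"Book({self.title!r})"

    def __str__(self):
        return self.title


book = Book("1984")
print(str(book))      # 1984
print(repr(book))     # Book('1984')
print(f"{book}")      # 1984 (default formatting uses __str__)
print(f"{book!r}")    # Book('1984')
print([book])         # [Book('1984')] (containers use repr for their items)
```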

Pitfall 3: Returning Wrong Type from Operators

Problem:

class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return (self.x + other.x, self.y + other.y)  # Returns tuple!

v1 = BadVector(1, 2)
v2 = BadVector(3, 4)
v3 = v1 + v2
print(type(v3))  # <class 'tuple'> - Not a BadVector!
# v4 = v3 + v1   # TypeError: can only concatenate tuple (not "BadVector") to tuple

Solution: Always return the same type:

class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return GoodVector(self.x + other.x, self.y + other.y)

Pitfall 4: Not Returning NotImplemented for Unsupported Operations

Problem:

class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Crashes if other doesn't have x, y:
        return BadVector(self.x + other.x, self.y + other.y)

v = BadVector(1, 2)
# result = v + 5  # AttributeError: 'int' object has no attribute 'x'

Solution: Return NotImplemented for unsupported types:

class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, GoodVector):
            return GoodVector(self.x + other.x, self.y + other.y)
        return NotImplemented  # Lets Python try other.__radd__(self)

v = GoodVector(1, 2)
# result = v + 5  # TypeError: unsupported operand type(s) for +: 'GoodVector' and 'int' - better error!

Why NotImplemented? It tells Python "I don't know how to do this, try something else" (like reverse operators or raising a proper TypeError).
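
The "try something else" step can be observed with the built-in sum(), which starts from 0. In the sketch below, int's addition does not know about the vector and returns NotImplemented, so Python calls the vector's __radd__. The Vector class and the choice to treat 0 as an additive identity are assumptions made for this illustration.

```python
class Vector:
    """Hypothetical 2D vector used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented

    def __radd__(self, other):
        # Reached when the left operand's __add__ returned NotImplemented.
        # sum() starts from 0, so treat 0 as the additive identity.
        if other == 0:
            return self
        return NotImplemented

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


total = sum([Vector(1, 2), Vector(3, 4)])
print(total)  # Vector(4, 6)

try:
    Vector(1, 2) + "oops"
except TypeError as exc:
    message = str(exc)
    print(message)  # unsupported operand type(s) for +: 'Vector' and 'str'
```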

Pitfall 5: Mutable Default Arguments in __init__

Problem:

class BadList:
    def __init__(self, items=[]):  # Mutable default!
        self.items = items

list1 = BadList()
list2 = BadList()
list1.items.append("A")
print(list2.items)  # ["A"] - WTF?! They share the same list!

Solution: Use None as default:

class GoodList:
    def __init__(self, items=None):
        self.items = items if items is not None else []

Pitfall 6: Forgetting to Return self in In-Place Operators

Problem:

class BadVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        # Forgot to return self!

v = BadVector(1, 2)
v += BadVector(3, 4)
print(v)  # None - v was rebound to __iadd__'s return value, which is None!

Solution: Always return self:

class GoodVector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        return self  # Critical!

Pitfall 7: Infinite Recursion in __getattribute__

Problem:

class Bad:
    def __getattribute__(self, name):
        # Accessing self.name triggers __getattribute__ again!
        return self.name  # Infinite recursion!

Solution: Use super() or object.__getattribute__:

class Good:
    def __getattribute__(self, name):
        # Use super() to access attributes safely:
        return super().__getattribute__(name)

Pitfall 8: Modifying Objects During Iteration

Problem:

class BadCollection:
    def __init__(self):
        self.items = [1, 2, 3, 4, 5]

    def __iter__(self):
        return iter(self.items)

collection = BadCollection()
for item in collection:
    if item % 2 == 0:
        collection.items.remove(item)  # Modifying during iteration!
# Lists silently skip elements (here 3 and 5 are never visited); dicts and sets raise RuntimeError

Solution: Iterate over a copy or collect items to remove:

class GoodCollection:
    def __init__(self):
        self.items = [1, 2, 3, 4, 5]

    def remove_evens(self):
        # Iterate over a copy:
        for item in list(self.items):
            if item % 2 == 0:
                self.items.remove(item)
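
The two failure modes can be reproduced in a few lines, along with a rebuild-based alternative to copying. This is a sketch with made-up variable names; the list comprehension is one option, not the only safe approach.

```python
# A list that is mutated while being iterated skips elements silently
evens = [2, 4, 6]
for n in evens:
    if n % 2 == 0:
        evens.remove(n)
print(evens)  # [4] - the 4 was never visited

# A dict (or set) that changes size mid-iteration raises instead
counts = {"a": 1, "b": 2}
error = None
try:
    for key in counts:
        del counts[key]
except RuntimeError as exc:
    error = str(exc)
print(error)  # dictionary changed size during iteration

# Alternative: build the filtered list, then replace the contents in place
items = [1, 2, 3, 4, 5]
items[:] = [item for item in items if item % 2 != 0]
print(items)  # [1, 3, 5]
```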

Performance Considerations

1. Special Method Overhead

Special methods have a cost. Python calls them frequently, so implementation matters:

Example: Comparison operators

import timeit

class SlowPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Slow: creates intermediate objects
        return (self.x, self.y) == (other.x, other.y)

class FastPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Fast: direct comparison
        return self.x == other.x and self.y == other.y

# FastPoint is roughly 30% faster in equality checks (the exact figure varies by Python version and machine)

Tip: Keep special methods simple and fast. They're called often.
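
Claims like this are easy to check on your own interpreter with timeit. The sketch below re-declares the two styles under hypothetical names (TuplePoint, DirectPoint) so it runs standalone. No expected timings are given because they depend on the machine and Python version.

```python
import timeit


class TuplePoint:
    """Same idea as SlowPoint: compares via temporary tuples."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


class DirectPoint:
    """Same idea as FastPoint: compares attributes directly."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


a1, a2 = TuplePoint(1, 2), TuplePoint(1, 2)
b1, b2 = DirectPoint(1, 2), DirectPoint(1, 2)

# timeit accepts a callable; number is kept small so the script finishes quickly
tuple_time = timeit.timeit(lambda: a1 == a2, number=50_000)
direct_time = timeit.timeit(lambda: b1 == b2, number=50_000)
print(f"tuple-based: {tuple_time:.4f}s")
print(f"direct:      {direct_time:.4f}s")
```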

2. __slots__ for Memory Optimization

By default, Python stores attributes in a __dict__, which uses more memory. Use __slots__ for memory-critical classes:

class WithoutSlots:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Memory comparison (illustrative; exact sizes depend on the Python version and platform):
# WithoutSlots: ~240 bytes per instance
# WithSlots: ~64 bytes per instance (roughly 70% less)

Tradeoffs:
- ✅ Significant memory savings (especially with many instances)
- ✅ Slightly faster attribute access
- ❌ Can't add new attributes dynamically
- ❌ Can't use weak references (unless '__weakref__' is in slots)

When to use: Classes with many instances (thousands+) and fixed attributes.
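
A rough way to compare the two layouts yourself is sys.getsizeof. The sketch below is an approximation: getsizeof does not follow references, so the instance __dict__ is added by hand, and the numbers it prints differ between Python versions. The classes are re-declared so the snippet runs standalone.

```python
import sys


class WithoutSlots:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = WithoutSlots(1, 2)
slotted = WithSlots(1, 2)

# Count the per-instance __dict__ for the plain object; slotted objects have none
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(plain_size, slotted_size)  # exact values vary by version and platform

rejected = False
try:
    slotted.z = 3  # not listed in __slots__
except AttributeError:
    rejected = True
print(rejected)  # True
```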

3. __getattribute__ Performance Impact

__getattribute__ is called for every attribute access. This can be slow:

class LoggedAccess:
    def __getattribute__(self, name):
        print(f"Accessing {name}")  # I/O on every attribute access!
        return super().__getattribute__(name)

obj = LoggedAccess()
obj.some_attribute = 42  # Define the attribute first so the lookups succeed
# This is very slow if you access attributes frequently:
for _ in range(1000):
    x = obj.some_attribute  # Prints 1000 times

Tip: Avoid __getattribute__ unless absolutely necessary. Use __getattr__ instead (only called when attribute not found).
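
The difference is easy to observe by counting calls. In this sketch (the Defaults class and its counter are hypothetical), __getattr__ never runs for an attribute that exists, no matter how often it is read:

```python
class Defaults:
    """Hypothetical class that counts how often the fallback hook runs."""

    def __init__(self):
        self.present = "real value"
        self.fallback_calls = 0

    def __getattr__(self, name):
        # Only reached after normal attribute lookup has failed
        if name.startswith("_"):
            raise AttributeError(name)  # leave private/dunder lookups alone
        self.fallback_calls += 1
        return f"<default for {name}>"


obj = Defaults()
for _ in range(1000):
    obj.present                # found normally, hook is not involved
print(obj.fallback_calls)      # 0
print(obj.missing)             # <default for missing>
print(obj.fallback_calls)      # 1
```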

4. Generator-Based __iter__

For large sequences, use generators to save memory:

class HugeRange:
    """Memory-efficient range using generator"""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        # Generator - doesn't create list in memory
        current = self.start
        while current < self.end:
            yield current
            current += 1

    def __len__(self):
        return self.end - self.start

# Memory efficient even with millions of numbers:
huge = HugeRange(0, 1_000_000)
for num in huge:
    if num > 10:
        break  # Only generates what we need

5. Caching Hash Values

If __hash__ is expensive, cache the result:

class ExpensiveHash:
    def __init__(self, data):
        self.data = data
        self._hash = None  # Cache

    def __hash__(self):
        if self._hash is None:
            # Expensive computation only happens once:
            self._hash = hash(tuple(self.data))
        return self._hash

    def __eq__(self, other):
        if isinstance(other, ExpensiveHash):
            return self.data == other.data
        return NotImplemented

Important: Only cache if the data being hashed never changes after construction! A stale cached hash breaks set and dict lookups.

6. When NOT to Use Operator Overloading

Operator overloading can hurt performance if used carelessly:

class MatrixBad:
    # Sketch: assumes rows, cols, and row indexing (__getitem__) are defined elsewhere
    def __add__(self, other):
        # Creates a new matrix every time
        result = MatrixBad(self.rows, self.cols)
        for i in range(self.rows):
            for j in range(self.cols):
                result[i][j] = self[i][j] + other[i][j]
        return result

# This creates many intermediate objects:
result = m1 + m2 + m3 + m4  # 3 new matrices created!

Better: Use explicit methods for complex operations:

class MatrixGood:
    def add_inplace(self, other):
        # Modifies in place - no allocation
        for i in range(self.rows):
            for j in range(self.cols):
                self[i][j] += other[i][j]
        return self

# More efficient, but note that m1 itself is modified:
m1.add_inplace(m2).add_inplace(m3).add_inplace(m4)

Rule of thumb: Use operators for simple, expected operations. Use explicit methods for complex operations or when performance matters.

Quick Reference: When to Use What

| Pattern | Use When | Avoid When |
|---|---|---|
| __repr__ | Always implement | Never skip this |
| __str__ | User-facing output differs from repr | repr is sufficient |
| __eq__ and __hash__ | Objects need equality comparison | Object is mutable |
| Arithmetic operators | Natural mathematical meaning | Confusing or unexpected |
| __getitem__ | Object is logically a sequence/mapping | Object isn't a container |
| __iter__ | Object contains items to iterate | Object is single-valued |
| __call__ | Object acts as function or callback | Standard methods are clearer |
| __enter__/__exit__ | Resource management needed | Simple function is enough |
| __slots__ | Many instances, fixed attributes | Need dynamic attributes |
| __getattribute__ | Must intercept all attribute access | __getattr__ would work |

Resources & Further Reading