Design Philosophy
This document explains the design decisions, principles, and rationale behind HETorch's architecture. Understanding these choices will help you work effectively with HETorch and make informed decisions when extending or customizing the framework.
Table of Contents
- Overview
- Core Principles
- Key Design Decisions
- Architectural Trade-offs
- Design Patterns
- Evolution and Lessons Learned
- Future Directions
- See Also
Overview
HETorch is designed to bridge the gap between PyTorch's familiar tensor-based programming model and the specialized requirements of homomorphic encryption (HE). The framework's design reflects several years of experience with HE compilation, balancing theoretical elegance with practical usability.
Design Goals
- Accessibility: Make HE compilation accessible to ML practitioners without deep cryptography expertise
- Flexibility: Support diverse HE schemes, backends, and optimization strategies
- Performance: Enable efficient compilation and execution of encrypted computations
- Maintainability: Create a codebase that's easy to understand, extend, and debug
- Correctness: Ensure transformations preserve semantics and produce correct results
Non-Goals
- Universal HE support: Not trying to support every possible HE scheme or operation
- Production deployment: Focus on compilation, not key management or deployment infrastructure
- Automatic optimization: Not attempting to automatically find optimal parameters
- Backend implementation: Not implementing HE backends (use existing libraries like SEAL)
Core Principles
1. Tensor-Centric Approach
Principle: Treat encrypted data as tensors, preserving PyTorch's programming model.
Rationale:
- ML practitioners think in terms of tensors and operations
- PyTorch's tensor abstraction is powerful and well-understood
- Enables reuse of existing PyTorch models with minimal modification
- Simplifies reasoning about encrypted computation
Implementation:
# Users write standard PyTorch
class Model(nn.Module):
    def forward(self, x):
        return self.linear(x) + self.bias

# HETorch compiles to HE operations
compiled = compiler.compile(model, example_input)
Benefits:
- Low learning curve for PyTorch users
- Familiar debugging and development workflow
- Easy integration with existing PyTorch ecosystem
- Natural expression of ML computations
Challenges:
- Some HE operations don't map cleanly to tensor operations
- Ciphertext metadata (level, scale) doesn't fit the tensor model
- Batching semantics differ between plaintext and encrypted computation
Resolution:
- Use metadata attached to graph nodes for HE-specific information
- Provide HE-aware operations (rescale, relinearize, bootstrap) as functions, as sketched below
- Document differences between plaintext and encrypted execution
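To make the second resolution point concrete, here is a minimal sketch of HE-aware operations exposed as plain functions; the rescale and relinearize helpers below are placeholders for illustration, not the actual HETorch API.
# Minimal sketch (placeholder helpers, not the actual HETorch API):
# HE-aware steps written as plain functions so the tracer records them
# as ordinary graph nodes alongside tensor operations.
import torch.nn as nn

def rescale(x):
    # Placeholder: a real backend would drop one ciphertext level here
    return x

def relinearize(x):
    # Placeholder: a real backend would shrink the ciphertext here
    return x

class EncryptedLinearBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
    def forward(self, x):
        y = self.linear(x)
        y = rescale(y)       # explicit rescaling appears as a graph node
        y = relinearize(y)   # explicit relinearization appears as a graph node
        return y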
2. Backend-Agnostic Design
Principle: Separate compilation logic from backend-specific implementation details.
Rationale:
- HE libraries (SEAL, HElib, PALISADE) have different APIs and capabilities
- Users may want to switch backends without rewriting compilation logic
- Testing and development are easier with lightweight fake backends
- Future HE libraries can be integrated without major refactoring
Implementation:
# Backend interface
class HEBackend(ABC):
    @abstractmethod
    def encrypt(self, plaintext: Tensor) -> Ciphertext: ...

    @abstractmethod
    def cadd(self, ct1: Ciphertext, ct2: Ciphertext) -> Ciphertext: ...

    # ... other operations

# Multiple implementations
backend = SEALBackend()    # Real HE
backend = FakeBackend()    # Fast simulation
backend = CustomBackend()  # User-defined
Benefits:
- Flexibility to use different HE libraries
- Easy testing with fake backends
- Clear separation of concerns
- Extensibility for new backends
Challenges:
- Lowest common denominator API may limit backend-specific optimizations
- Abstraction overhead in performance-critical paths
- Different backends have different capabilities and constraints
Resolution:
- Define core operations all backends must support
- Allow backend-specific extensions via metadata
- Provide backend capability queries (see the sketch after this list)
- Document backend-specific behavior
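As an illustration of capability queries, a small sketch; the supports() method and the "bootstrapping" capability string are assumptions for this example, not the documented HEBackend interface.
# Sketch only: 'supports' and the "bootstrapping" capability string are
# hypothetical, not part of the documented HEBackend interface.
def choose_noise_passes(context):
    if context.backend.supports("bootstrapping"):
        # Backend can refresh noise at runtime, so insert bootstrap points
        return [BootstrappingInsertionPass(threshold=20.0)]
    # Otherwise rely on rescaling alone
    return [RescalingInsertionPass(strategy="lazy")]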
3. Modularity and Extensibility
Principle: Design the system as composable, independent modules that can be extended without modifying core code.
Rationale:
- Research in HE optimization is rapidly evolving
- Users have domain-specific optimization needs
- Core framework should be stable while allowing experimentation
- Open-source contributors need clear extension points
Implementation:
# Pass-based architecture
class TransformationPass(ABC):
    @abstractmethod
    def transform(self, graph, context): ...

# Users can add custom passes
class MyCustomPass(TransformationPass):
    def transform(self, graph, context):
        # Custom optimization logic
        return transformed_graph

# Compose passes into pipelines
pipeline = PassPipeline([
    InputPackingPass(),
    MyCustomPass(),  # User-defined
    RescalingInsertionPass(),
])
Benefits:
- Easy to add new optimizations without touching core code
- Clear boundaries between modules
- Testable in isolation
- Encourages community contributions
Challenges:
- Pass ordering dependencies can be complex
- Inter-pass communication requires careful design
- Performance overhead from modularity
- Debugging across multiple passes can be difficult
Resolution:
- Explicit pass dependencies (requires/provides), sketched below
- Shared compilation context for inter-pass communication
- Pass validation to catch errors early
- Debugging tools (PrintGraphPass, verbose mode)
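One way the requires/provides idea can be expressed, sketched with hypothetical class attributes and a simple validation loop:
# Sketch: explicit pass dependencies via 'requires'/'provides' sets.
# The attribute names and validation loop are illustrative, not the exact API.
class RescalingInsertionPass(TransformationPass):
    requires = {"polynomial_only"}   # must run after nonlinearities are removed
    provides = {"rescaled"}

def validate_pipeline(passes):
    available = set()
    for p in passes:
        missing = getattr(p, "requires", set()) - available
        if missing:
            raise ValueError(f"{type(p).__name__} requires {missing}")
        available |= getattr(p, "provides", set())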
4. Performance-Aware Compilation
Principle: Make performance characteristics visible and controllable throughout the compilation process.
Rationale:
- HE operations have vastly different costs (addition vs multiplication vs bootstrapping)
- Users need to understand and control performance trade-offs
- Automatic optimization is difficult without domain knowledge
- Transparency helps users make informed decisions
Implementation:
# Cost analysis
cost = he_pass.analyze_cost(graph, context)  # he_pass: any TransformationPass instance ('pass' is a reserved word)
print(f"Operations: {cost.total_operations}")
print(f"Estimated latency: {cost.estimated_latency}")
print(f"Noise budget consumed: {cost.noise_consumed}")

# Explicit control
pipeline = PassPipeline([
    NonlinearToPolynomialPass(degree=8),         # User controls accuracy
    RescalingInsertionPass(strategy="lazy"),     # User controls strategy
    BootstrappingInsertionPass(threshold=20.0),  # User controls threshold
])
Benefits:
- Users understand performance implications
- Explicit control over trade-offs
- Easier to debug performance issues
- Enables informed optimization decisions
Challenges:
- Requires users to understand HE performance characteristics
- Cost models may not match actual backend performance
- More configuration options increase complexity
Resolution:
- Provide sensible defaults for common cases (see the sketch below)
- Document performance characteristics clearly
- Offer cost analysis tools
- Provide performance tuning guides
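As an example of sensible defaults, a hypothetical helper that bundles the commonly shown settings into a ready-made pipeline:
# Hypothetical helper illustrating 'sensible defaults': users who accept
# the defaults never need to touch these parameters.
def default_pipeline():
    return PassPipeline([
        InputPackingPass(strategy="row_major"),
        NonlinearToPolynomialPass(degree=8),
        RescalingInsertionPass(strategy="lazy"),
        BootstrappingInsertionPass(threshold=20.0),
    ])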
Key Design Decisions
Decision 1: Why torch.fx?
Decision: Use PyTorch's torch.fx for graph representation and manipulation.
Alternatives Considered:
- Custom IR: Build a custom intermediate representation
- TorchScript: Use PyTorch's TorchScript
- ONNX: Use ONNX as intermediate format
- Direct AST manipulation: Manipulate Python AST directly
Why torch.fx:
Pros:
- Native PyTorch integration (no conversion overhead)
- Symbolic tracing captures computation graph automatically
- Graph manipulation API is powerful and well-documented
- Active development and community support
- Preserves Python semantics (unlike TorchScript)
- Easy debugging (graphs are readable Python code)
Cons:
- Relatively new (introduced in PyTorch 1.8)
- Some limitations on traceable code (dynamic control flow)
- Learning curve for graph manipulation
- Less mature than TorchScript
Rationale:
# torch.fx makes graph manipulation straightforward
import torch.fx as fx

traced = fx.symbolic_trace(model)
graph = traced.graph

# Easy to inspect
print(graph)  # Readable Python-like code

# Easy to modify
for node in graph.nodes:
    if node.target == torch.relu:
        # Replace ReLU with polynomial approximation
        node.target = polynomial_relu
traced.recompile()
The decision to use torch.fx was driven by:
- Native integration: No conversion between PyTorch and custom IR
- Ease of use: Graph manipulation is intuitive
- Debugging: Graphs are human-readable
- Future-proof: Active PyTorch development
Trade-offs Accepted:
- Limited to traceable PyTorch code (no dynamic control flow)
- Tied to PyTorch's release cycle
- Some overhead from graph representation
Decision 2: Why Pass-Based Architecture?
Decision: Structure compilation as a pipeline of transformation passes.
Alternatives Considered:
- Monolithic compiler: Single compilation function
- Rule-based rewriting: Pattern matching and rewriting rules
- Dataflow analysis: Traditional compiler dataflow framework
- ML-based optimization: Learn optimal transformations
Why pass-based:
Pros:
- Modularity: Each pass has single responsibility
- Composability: Passes can be combined in different orders
- Extensibility: Easy to add new passes
- Testability: Passes can be tested in isolation
- Understandability: Each pass is simple and focused
Cons:
- Pass ordering dependencies can be complex
- Multiple graph traversals (performance overhead)
- Inter-pass optimization opportunities may be missed
- Requires careful design of pass interfaces
Rationale:
# Each pass does one thing well
class NonlinearToPolynomialPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles nonlinear → polynomial
        return graph_with_polynomials

class RescalingInsertionPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles rescaling insertion
        return graph_with_rescaling

# Compose into pipeline
pipeline = PassPipeline([
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
    DeadCodeEliminationPass(),
])
The pass-based architecture enables:
- Incremental development: Add passes one at a time
- Experimentation: Try different pass combinations
- Maintenance: Fix bugs in isolated passes
- Community contributions: Clear extension points
Trade-offs Accepted:
- Some performance overhead from multiple traversals
- Pass ordering requires careful thought
- Global optimizations are harder to implement
Decision 3: Why Metadata-Driven?
Decision: Store HE-specific information (level, scale, noise) as metadata on graph nodes rather than in the type system.
Alternatives Considered:
- Type system: Encode HE properties in tensor types
- Separate data structures: Maintain parallel data structures for HE info
- Wrapper classes: Wrap tensors in HE-aware classes
- Implicit tracking: Infer properties from operations
Why metadata:
Pros:
- Flexible: Can attach arbitrary information to nodes
- Non-invasive: Doesn't change PyTorch's type system
- Easy to add/remove: Metadata can be added by passes as needed
- Debugging-friendly: Can inspect metadata without affecting execution
Cons:
- Not type-safe: Metadata access isn't checked at compile time
- Can be lost: Metadata must be explicitly preserved during transformations
- Less discoverable: Metadata keys aren't documented in types
- Runtime overhead: Dictionary lookups for metadata access
Rationale:
# Metadata attached to graph nodes
node.meta['ciphertext_info'] = CiphertextInfo(
    level=2,
    scale=2**40,
    noise_budget=85.0,
)

# Passes can read and update metadata
if 'ciphertext_info' in node.meta:
    info = node.meta['ciphertext_info']
    if info.noise_budget < 20.0:
        # Insert bootstrapping
        pass
The metadata approach provides:
- Flexibility: Can add new properties without changing core types
- Compatibility: Works with existing PyTorch infrastructure
- Simplicity: No complex type system extensions needed
- Extensibility: Users can add custom metadata
Trade-offs Accepted:
- Metadata must be explicitly preserved during transformations (illustrated below)
- No compile-time type checking for metadata
- Potential for metadata inconsistencies
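To illustrate the first trade-off, a pass that rewrites a node must copy the metadata onto its replacement explicitly; a rough torch.fx sketch (the helper name is illustrative):
# Sketch: replacing a node while preserving its HE metadata. Omitting the
# explicit copy is exactly how metadata gets lost during transformations.
def replace_with_polynomial(graph, node, polynomial_relu):
    with graph.inserting_after(node):
        new_node = graph.call_function(polynomial_relu, node.args, node.kwargs)
    node.replace_all_uses_with(new_node)
    new_node.meta = dict(node.meta)   # explicit metadata copy
    graph.erase_node(node)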
Decision 4: Why Fake Backend First?
Decision: Implement a fast simulation backend before real HE backends.
Alternatives Considered:
- Real backend first: Start with SEAL or HElib integration
- Mock backend: Simple mocks for testing only
- No backend: Just graph transformations
- Multiple backends simultaneously: Implement several backends in parallel
Why fake backend first:
Pros:
- Fast development: No HE library dependencies during development
- Easy testing: Tests run quickly without actual encryption
- Clear interface: Defines backend API before implementation
- Debugging: Easier to debug without encryption overhead
- Noise simulation: Can simulate noise without real HE
Cons:
- Not production-ready: Can't use for real encrypted computation
- May not match real backend behavior: Simulation isn't perfect
- Extra work: Need to implement both fake and real backends
- False confidence: Tests may pass with fake backend but fail with real HE
Rationale:
# FakeBackend for development and testing
class FakeBackend(HEBackend):
    def encrypt(self, plaintext):
        # Just wrap tensor, no actual encryption
        return FakeCiphertext(plaintext)

    def cadd(self, ct1, ct2):
        # Plain addition, track noise
        result = ct1.data + ct2.data
        noise = ct1.noise + ct2.noise + ADD_NOISE
        return FakeCiphertext(result, noise)

# Fast tests
def test_compilation():
    backend = FakeBackend()  # Instant, no HE overhead
    compiled = compiler.compile(model, backend)
    assert torch.allclose(compiled(example_input), expected)
The fake backend enables:
- Rapid iteration: Develop and test without HE overhead
- Clear contracts: Define backend interface before implementation
- Noise simulation: Test noise management without real HE
- Debugging: Easier to debug compilation issues
Trade-offs Accepted:
- Need to implement real backends eventually
- Fake backend behavior may diverge from real HE
- Tests with fake backend don't guarantee real HE correctness
Architectural Trade-offs
Trade-off 1: Flexibility vs Simplicity
Tension: Making the framework flexible enough for research vs keeping it simple for users.
Flexibility side:
- Support multiple HE schemes (CKKS, BFV, BGV)
- Allow custom passes and backends
- Configurable optimization strategies
- Extensible metadata system
Simplicity side:
- Single recommended workflow
- Sensible defaults
- Minimal configuration required
- Clear documentation
Resolution:
- Layered API: Simple high-level API, powerful low-level API
- Defaults: Provide good defaults for common cases
- Progressive disclosure: Advanced features available but not required
- Examples: Show simple cases first, advanced cases later
# Simple API (most users)
compiled = hetorch.compile(model, example_input)

# Advanced API (power users)
pipeline = PassPipeline([
    CustomPass(param1=value1),
    # ... detailed configuration
])
compiler = HETorchCompiler(context, pipeline)
compiled = compiler.compile(model, example_input)
Trade-off 2: Performance vs Generality
Tension: Optimizing for specific cases vs supporting general computations.
Performance side:
- Specialize for neural networks
- Assume CKKS scheme
- Optimize for common patterns
- Backend-specific optimizations
Generality side:
- Support arbitrary PyTorch models
- Support multiple HE schemes
- Work with any backend
- No assumptions about computation patterns
Resolution:
- Core generality: Core framework supports general computations
- Specialized passes: Optional passes for specific patterns (e.g., neural networks)
- Backend capabilities: Backends can expose capabilities for optimization
- User choice: Users choose between general and specialized pipelines
# General pipeline
general_pipeline = PassPipeline([
    InputPackingPass(),
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
])

# Specialized for neural networks
nn_pipeline = PassPipeline([
    InputPackingPass(),
    BatchNormFoldingPass(),     # NN-specific
    NonlinearToPolynomialPass(),
    LinearLayerBSGSPass(),      # NN-specific optimization
    RescalingInsertionPass(),
])
Trade-off 3: Abstraction vs Control
Tension: Hiding HE complexity vs giving users control over HE operations.
Abstraction side:
- Hide ciphertext management
- Automatic rescaling and relinearization
- Automatic bootstrapping insertion
- Abstract away HE schemes
Control side:
- Explicit ciphertext operations
- Manual rescaling and relinearization
- User-controlled bootstrapping
- Expose HE scheme details
Resolution:
- Default abstraction: Automatic management by default
- Opt-in control: Users can take control when needed
- Visibility: Make automatic decisions visible (logging, cost analysis)
- Override mechanisms: Allow users to override automatic decisions
# Automatic (default)
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="auto"),      # Automatic
    BootstrappingInsertionPass(strategy="auto"),  # Automatic
])

# Manual control
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="manual", locations=[...]),
    BootstrappingInsertionPass(strategy="manual", locations=[...]),
])
Design Patterns
Pattern 1: Context Object
Pattern: Pass a shared CompilationContext through all passes.
Purpose: Share configuration and state across passes without global variables.
Implementation:
@dataclass
class CompilationContext:
    scheme: HEScheme
    params: CKKSParameters
    backend: HEBackend
    metadata: Dict[str, Any]  # Shared state

# All passes receive context
class TransformationPass(ABC):
    def transform(self, graph: fx.GraphModule, context: CompilationContext):
        # Access shared configuration
        if context.scheme == HEScheme.CKKS:
            # CKKS-specific logic
            pass
Benefits:
- No global state
- Easy to test (inject a test context, as shown below)
- Clear dependencies
- Thread-safe
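For example, a unit test can inject a lightweight context built around the fake backend; the constructor arguments below are placeholders:
# Sketch of injecting a test context (constructor arguments are placeholders).
def test_rescaling_pass_runs_with_fake_backend():
    context = CompilationContext(
        scheme=HEScheme.CKKS,
        params=CKKSParameters(),   # placeholder parameters
        backend=FakeBackend(),     # no real encryption needed
        metadata={},
    )
    graph = fx.symbolic_trace(Model())
    transformed = RescalingInsertionPass().transform(graph, context)
    assert transformed is not None   # the pass ran against the injected context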
Pattern 2: Visitor Pattern for Graph Traversal
Pattern: Use visitor pattern for graph node processing.
Purpose: Separate graph traversal from node processing logic.
Implementation:
class GraphVisitor:
    def visit(self, node: fx.Node):
        method_name = f'visit_{node.op}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def visit_call_function(self, node: fx.Node):
        # Handle function calls
        pass

    def visit_call_module(self, node: fx.Node):
        # Handle module calls
        pass

    def generic_visit(self, node: fx.Node):
        # Fallback for node types without a dedicated handler
        pass
Benefits:
- Clean separation of concerns
- Easy to extend with new node types
- Reusable traversal logic
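To show the pattern in use, a small sketch of a visitor that counts call_function nodes while the traversal loop stays generic:
# Sketch: a concrete visitor; the traversal loop is independent of what
# each handler does with the node.
class CountCallsVisitor(GraphVisitor):
    def __init__(self):
        self.function_calls = 0
    def visit_call_function(self, node):
        self.function_calls += 1
    def generic_visit(self, node):
        pass   # ignore placeholders, get_attr, output, etc.

traced = fx.symbolic_trace(Model())
visitor = CountCallsVisitor()
for node in traced.graph.nodes:
    visitor.visit(node)
print(f"call_function nodes: {visitor.function_calls}")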
Pattern 3: Builder Pattern for Pipelines
Pattern: Use builder pattern for constructing pass pipelines.
Purpose: Make pipeline construction flexible and readable.
Implementation:
class PipelineBuilder:
    def __init__(self):
        self.passes = []

    def add_packing(self, strategy="row_major"):
        self.passes.append(InputPackingPass(strategy))
        return self

    def add_polynomial_approximation(self, degree=8):
        self.passes.append(NonlinearToPolynomialPass(degree))
        return self

    def add_rescaling(self, strategy="lazy"):
        self.passes.append(RescalingInsertionPass(strategy))
        return self

    def build(self):
        return PassPipeline(self.passes)

# Fluent API
pipeline = (PipelineBuilder()
    .add_packing()
    .add_polynomial_approximation(degree=8)
    .add_rescaling()
    .build())
Benefits:
- Readable pipeline construction
- Validation during building
- Reusable pipeline templates
Evolution and Lessons Learned
Lesson 1: Start with Fake Backend
Initial approach: Tried to integrate SEAL immediately.
Problem: Development was slow, debugging was difficult, tests took forever.
Solution: Built FakeBackend first for rapid development.
Lesson: Simulation backends are invaluable for development and testing.
Lesson 2: Metadata is Better Than Types
Initial approach: Tried to encode HE properties in custom tensor types.
Problem: Type system became complex, didn't integrate well with PyTorch.
Solution: Use metadata attached to graph nodes.
Lesson: Metadata provides flexibility without fighting the framework.
Lesson 3: Passes Should Be Small
Initial approach: Large passes that did multiple transformations.
Problem: Hard to test, debug, and maintain. Difficult to reuse.
Solution: Break into small, focused passes.
Lesson: Single-responsibility principle applies to compiler passes.
Lesson 4: Make Performance Visible
Initial approach: Automatic optimization without user visibility.
Problem: Users couldn't understand or control performance.
Solution: Add cost analysis and explicit configuration.
Lesson: Transparency is more valuable than magic in HE compilation.
Future Directions
Direction 1: Advanced Optimizations
Goal: Implement more sophisticated optimization passes.
Examples:
- Loop fusion and tiling
- Operator fusion (beyond consecutive additions)
- Memory layout optimization
- Backend-specific optimizations
Challenges:
- Complexity of analysis
- Maintaining correctness
- Balancing generality and specialization
Direction 2: Automatic Parameter Selection
Goal: Help users choose CKKS parameters automatically.
Approach:
- Analyze model to determine required depth (a rough sketch follows this list)
- Suggest parameters based on security requirements
- Provide parameter search tools
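A rough sketch of what depth analysis over a traced graph could look like; the set of multiplicative targets and the per-node bookkeeping are simplifying assumptions, not HETorch's actual analysis:
# Simplified sketch: estimate multiplicative depth from a traced fx graph.
# Only call_function nodes are counted here; call_module layers (e.g. nn.Linear)
# would also need to be handled in a real analysis.
import operator
import torch

MULTIPLICATIVE = {operator.mul, torch.mul, torch.matmul}

def estimate_depth(traced):
    depth = {}
    for node in traced.graph.nodes:  # graph.nodes is topologically ordered
        input_depth = max((depth.get(a, 0) for a in node.all_input_nodes), default=0)
        is_mul = node.op == "call_function" and node.target in MULTIPLICATIVE
        depth[node] = input_depth + (1 if is_mul else 0)
    return max(depth.values(), default=0)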
Challenges:
- Parameter selection is complex
- Trade-offs between security, performance, and accuracy
- Different backends have different constraints
Direction 3: Production Deployment Support
Goal: Add features for production deployment.
Examples:
- Key management integration
- Serialization and deserialization
- Client-server protocols
- Monitoring and logging
Challenges:
- Out of scope for core compilation
- Many deployment-specific requirements
- Security considerations
See Also
- Architecture - Detailed architecture documentation
- Custom Passes - Guide to writing custom passes
- Custom Backends - Guide to implementing backends
- Basic Concepts - Introduction to HETorch concepts
- Phase Summaries - Development history and evolution
- Future Work - Roadmap and planned features