Design Philosophy
This document explains the design decisions, principles, and rationale behind HETorch's architecture. Understanding these choices will help you work effectively with HETorch and make informed decisions when extending or customizing the framework.
Table of Contents
- Overview
- Core Principles
- Key Design Decisions
- Architectural Trade-offs
- Design Patterns
- Evolution and Lessons Learned
- Future Directions
- See Also
Overview
HETorch is designed to bridge the gap between PyTorch's familiar tensor-based programming model and the specialized requirements of homomorphic encryption (HE). The framework's design reflects several years of experience with HE compilation, balancing theoretical elegance with practical usability.
Design Goals
- Accessibility: Make HE compilation accessible to ML practitioners without deep cryptography expertise
- Flexibility: Support diverse HE schemes, backends, and optimization strategies
- Performance: Enable efficient compilation and execution of encrypted computations
- Maintainability: Create a codebase that's easy to understand, extend, and debug
- Correctness: Ensure transformations preserve semantics and produce correct results
Non-Goals
- Universal HE support: Not trying to support every possible HE scheme or operation
- Production deployment: Focus on compilation, not key management or deployment infrastructure
- Automatic optimization: Not attempting to automatically find optimal parameters
- Backend implementation: Not implementing HE backends (use existing libraries like SEAL)
Core Principles
1. Tensor-Centric Approach
Principle: Treat encrypted data as tensors, preserving PyTorch's programming model.
Rationale:
- ML practitioners think in terms of tensors and operations
- PyTorch's tensor abstraction is powerful and well-understood
- Enables reuse of existing PyTorch models with minimal modification
- Simplifies reasoning about encrypted computation
Implementation:
# Users write standard PyTorch
class Model(nn.Module):
    def forward(self, x):
        return self.linear(x) + self.bias

# HETorch compiles to HE operations
compiled = compiler.compile(model, example_input)
Benefits:
- Low learning curve for PyTorch users
- Familiar debugging and development workflow
- Easy integration with existing PyTorch ecosystem
- Natural expression of ML computations
Challenges:
- Some HE operations don't map cleanly to tensor operations
- Ciphertext metadata (level, scale) doesn't fit the tensor model
- Batching semantics differ between plaintext and encrypted computation
Resolution:
- Use metadata attached to graph nodes for HE-specific information
- Provide HE-aware operations (rescale, relinearize, bootstrap) as functions, as sketched below
- Document differences between plaintext and encrypted execution
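To make the second resolution point concrete, here is a minimal sketch of HE-aware operations exposed as plain functions; the rescale and relinearize helpers below are placeholders for illustration, not the actual HETorch API.
# Minimal sketch (placeholder helpers, not the actual HETorch API):
# HE-aware steps written as plain functions so the tracer records them
# as ordinary graph nodes alongside tensor operations.
import torch.nn as nn

def rescale(x):
    # Placeholder: a real backend would drop one ciphertext level here
    return x

def relinearize(x):
    # Placeholder: a real backend would shrink the ciphertext here
    return x

class EncryptedLinearBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
    def forward(self, x):
        y = self.linear(x)
        y = rescale(y)       # explicit rescaling appears as a graph node
        y = relinearize(y)   # explicit relinearization appears as a graph node
        return y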
2. Backend-Agnostic Design
Principle: Separate compilation logic from backend-specific implementation details.
Rationale:
- HE libraries (SEAL, HElib, PALISADE) have different APIs and capabilities
- Users may want to switch backends without rewriting compilation logic
- Testing and development are easier with lightweight fake backends
- Future HE libraries can be integrated without major refactoring
Implementation:
# Backend interface
class HEBackend(ABC):
    @abstractmethod
    def encrypt(self, plaintext: Tensor) -> Ciphertext: ...

    @abstractmethod
    def cadd(self, ct1: Ciphertext, ct2: Ciphertext) -> Ciphertext: ...

    # ... other operations

# Multiple implementations
backend = SEALBackend()    # Real HE
backend = FakeBackend()    # Fast simulation
backend = CustomBackend()  # User-defined
Benefits:
- Flexibility to use different HE libraries
- Easy testing with fake backends
- Clear separation of concerns
- Extensibility for new backends
Challenges:
- Lowest common denominator API may limit backend-specific optimizations
- Abstraction overhead in performance-critical paths
- Different backends have different capabilities and constraints
Resolution:
- Define core operations all backends must support
- Allow backend-specific extensions via metadata
- Provide backend capability queries (see the sketch after this list)
- Document backend-specific behavior
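As an illustration of capability queries, a small sketch; the supports() method and the "bootstrapping" capability string are assumptions for this example, not the documented HEBackend interface.
# Sketch only: 'supports' and the "bootstrapping" capability string are
# hypothetical, not part of the documented HEBackend interface.
def choose_noise_passes(context):
    if context.backend.supports("bootstrapping"):
        # Backend can refresh noise at runtime, so insert bootstrap points
        return [BootstrappingInsertionPass(threshold=20.0)]
    # Otherwise rely on rescaling alone
    return [RescalingInsertionPass(strategy="lazy")]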
3. Modularity and Extensibility
Principle: Design the system as composable, independent modules that can be extended without modifying core code.
Rationale:
- Research in HE optimization is rapidly evolving
- Users have domain-specific optimization needs
- Core framework should be stable while allowing experimentation
- Open-source contributors need clear extension points
Implementation:
# Pass-based architecture
class TransformationPass(ABC):
    @abstractmethod
    def transform(self, graph, context): ...

# Users can add custom passes
class MyCustomPass(TransformationPass):
    def transform(self, graph, context):
        # Custom optimization logic
        return transformed_graph

# Compose passes into pipelines
pipeline = PassPipeline([
    InputPackingPass(),
    MyCustomPass(),  # User-defined
    RescalingInsertionPass(),
])
Benefits:
- Easy to add new optimizations without touching core code
- Clear boundaries between modules
- Testable in isolation
- Encourages community contributions
Challenges:
- Pass ordering dependencies can be complex
- Inter-pass communication requires careful design
- Performance overhead from modularity
- Debugging across multiple passes can be difficult
Resolution:
- Explicit pass dependencies (requires/provides), sketched below
- Shared compilation context for inter-pass communication
- Pass validation to catch errors early
- Debugging tools (PrintGraphPass, verbose mode)
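One way the requires/provides idea can be expressed, sketched with hypothetical class attributes and a simple validation loop:
# Sketch: explicit pass dependencies via 'requires'/'provides' sets.
# The attribute names and validation loop are illustrative, not the exact API.
class RescalingInsertionPass(TransformationPass):
    requires = {"polynomial_only"}   # must run after nonlinearities are removed
    provides = {"rescaled"}

def validate_pipeline(passes):
    available = set()
    for p in passes:
        missing = getattr(p, "requires", set()) - available
        if missing:
            raise ValueError(f"{type(p).__name__} requires {missing}")
        available |= getattr(p, "provides", set())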
4. Performance-Aware Compilation
Principle: Make performance characteristics visible and controllable throughout the compilation process.
Rationale:
- HE operations have vastly different costs (addition vs multiplication vs bootstrapping)
- Users need to understand and control performance trade-offs
- Automatic optimization is difficult without domain knowledge
- Transparency helps users make informed decisions
Implementation:
# Cost analysis
cost = he_pass.analyze_cost(graph, context)  # he_pass: any TransformationPass instance ('pass' is a reserved word)
print(f"Operations: {cost.total_operations}")
print(f"Estimated latency: {cost.estimated_latency}")
print(f"Noise budget consumed: {cost.noise_consumed}")

# Explicit control
pipeline = PassPipeline([
    NonlinearToPolynomialPass(degree=8),         # User controls accuracy
    RescalingInsertionPass(strategy="lazy"),     # User controls strategy
    BootstrappingInsertionPass(threshold=20.0),  # User controls threshold
])
Benefits:
- Users understand performance implications
- Explicit control over trade-offs
- Easier to debug performance issues
- Enables informed optimization decisions
Challenges:
- Requires users to understand HE performance characteristics
- Cost models may not match actual backend performance
- More configuration options increase complexity
Resolution:
- Provide sensible defaults for common cases (see the sketch below)
- Document performance characteristics clearly
- Offer cost analysis tools
- Provide performance tuning guides
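As an example of sensible defaults, a hypothetical helper that bundles the commonly shown settings into a ready-made pipeline:
# Hypothetical helper illustrating 'sensible defaults': users who accept
# the defaults never need to touch these parameters.
def default_pipeline():
    return PassPipeline([
        InputPackingPass(strategy="row_major"),
        NonlinearToPolynomialPass(degree=8),
        RescalingInsertionPass(strategy="lazy"),
        BootstrappingInsertionPass(threshold=20.0),
    ])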
Key Design Decisions
Decision 1: Why torch.fx?
Decision: Use PyTorch's torch.fx for graph representation and manipulation.
Alternatives Considered:
- Custom IR: Build a custom intermediate representation
- TorchScript: Use PyTorch's TorchScript
- ONNX: Use ONNX as intermediate format
- Direct AST manipulation: Manipulate Python AST directly
Why torch.fx:
Pros:
- Native PyTorch integration (no conversion overhead)
- Symbolic tracing captures computation graph automatically
- Graph manipulation API is powerful and well-documented
- Active development and community support
- Preserves Python semantics (unlike TorchScript)
- Easy debugging (graphs are readable Python code)
Cons:
- Relatively new (introduced in PyTorch 1.8)
- Some limitations on traceable code (dynamic control flow)
- Learning curve for graph manipulation
- Less mature than TorchScript
Rationale:
# torch.fx makes graph manipulation straightforward
import torch.fx as fx

traced = fx.symbolic_trace(model)
graph = traced.graph

# Easy to inspect
print(graph)  # Readable Python-like code

# Easy to modify
for node in graph.nodes:
    if node.target == torch.relu:
        # Replace ReLU with polynomial approximation
        node.target = polynomial_relu
traced.recompile()
The decision to use torch.fx was driven by:
- Native integration: No conversion between PyTorch and custom IR
- Ease of use: Graph manipulation is intuitive
- Debugging: Graphs are human-readable
- Future-proof: Active PyTorch development
Trade-offs Accepted:
- Limited to traceable PyTorch code (no dynamic control flow)
- Tied to PyTorch's release cycle
- Some overhead from graph representation
Decision 2: Why Pass-Based Architecture?
Decision: Structure compilation as a pipeline of transformation passes.
Alternatives Considered:
- Monolithic compiler: Single compilation function
- Rule-based rewriting: Pattern matching and rewriting rules
- Dataflow analysis: Traditional compiler dataflow framework
- ML-based optimization: Learn optimal transformations
Why pass-based:
Pros:
- Modularity: Each pass has single responsibility
- Composability: Passes can be combined in different orders
- Extensibility: Easy to add new passes
- Testability: Passes can be tested in isolation
- Understandability: Each pass is simple and focused
Cons:
- Pass ordering dependencies can be complex
- Multiple graph traversals (performance overhead)
- Inter-pass optimization opportunities may be missed
- Requires careful design of pass interfaces
Rationale:
# Each pass does one thing well
class NonlinearToPolynomialPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles nonlinear → polynomial
        return graph_with_polynomials

class RescalingInsertionPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles rescaling insertion
        return graph_with_rescaling

# Compose into pipeline
pipeline = PassPipeline([
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
    DeadCodeEliminationPass(),
])
The pass-based architecture enables:
- Incremental development: Add passes one at a time
- Experimentation: Try different pass combinations
- Maintenance: Fix bugs in isolated passes
- Community contributions: Clear extension points
Trade-offs Accepted:
- Some performance overhead from multiple traversals
- Pass ordering requires careful thought
- Global optimizations are harder to implement
Decision 3: Why Metadata-Driven?
Decision: Store HE-specific information (level, scale, noise) as metadata on graph nodes rather than in the type system.
Alternatives Considered:
- Type system: Encode HE properties in tensor types
- Separate data structures: Maintain parallel data structures for HE info
- Wrapper classes: Wrap tensors in HE-aware classes
- Implicit tracking: Infer properties from operations
Why metadata:
Pros:
- Flexible: Can attach arbitrary information to nodes
- Non-invasive: Doesn't change PyTorch's type system
- Easy to add/remove: Metadata can be added by passes as needed
- Debugging-friendly: Can inspect metadata without affecting execution
Cons:
- Not type-safe: Metadata access isn't checked at compile time
- Can be lost: Metadata must be explicitly preserved during transformations
- Less discoverable: Metadata keys aren't documented in types
- Runtime overhead: Dictionary lookups for metadata access
Rationale:
# Metadata attached to graph nodes
node.meta['ciphertext_info'] = CiphertextInfo(
    level=2,
    scale=2**40,
    noise_budget=85.0,
)

# Passes can read and update metadata
if 'ciphertext_info' in node.meta:
    info = node.meta['ciphertext_info']
    if info.noise_budget < 20.0:
        # Insert bootstrapping
        pass
The metadata approach provides:
- Flexibility: Can add new properties without changing core types
- Compatibility: Works with existing PyTorch infrastructure
- Simplicity: No complex type system extensions needed
- Extensibility: Users can add custom metadata
Trade-offs Accepted:
- Metadata must be explicitly preserved during transformations (illustrated below)
- No compile-time type checking for metadata
- Potential for metadata inconsistencies
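To illustrate the first trade-off, a pass that rewrites a node must copy the metadata onto its replacement explicitly; a rough torch.fx sketch (the helper name is illustrative):
# Sketch: replacing a node while preserving its HE metadata. Omitting the
# explicit copy is exactly how metadata gets lost during transformations.
def replace_with_polynomial(graph, node, polynomial_relu):
    with graph.inserting_after(node):
        new_node = graph.call_function(polynomial_relu, node.args, node.kwargs)
    node.replace_all_uses_with(new_node)
    new_node.meta = dict(node.meta)   # explicit metadata copy
    graph.erase_node(node)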
Decision 4: Why Fake Backend First?
Decision: Implement a fast simulation backend before real HE backends.
Alternatives Considered:
- Real backend first: Start with SEAL or HElib integration
- Mock backend: Simple mocks for testing only
- No backend: Just graph transformations
- Multiple backends simultaneously: Implement several backends in parallel
Why fake backend first:
Pros:
- Fast development: No HE library dependencies during development
- Easy testing: Tests run quickly without actual encryption
- Clear interface: Defines backend API before implementation
- Debugging: Easier to debug without encryption overhead
- Noise simulation: Can simulate noise without real HE
Cons:
- Not production-ready: Can't use for real encrypted computation
- May not match real backend behavior: Simulation isn't perfect
- Extra work: Need to implement both fake and real backends
- False confidence: Tests may pass with fake backend but fail with real HE
Rationale:
# FakeBackend for development and testing
class FakeBackend(HEBackend):
    def encrypt(self, plaintext):
        # Just wrap tensor, no actual encryption
        return FakeCiphertext(plaintext)

    def cadd(self, ct1, ct2):
        # Plain addition, track noise
        result = ct1.data + ct2.data
        noise = ct1.noise + ct2.noise + ADD_NOISE
        return FakeCiphertext(result, noise)

# Fast tests
def test_compilation():
    backend = FakeBackend()  # Instant, no HE overhead
    compiled = compiler.compile(model, backend)
    assert torch.allclose(compiled(example_input), expected)
The fake backend enables:
- Rapid iteration: Develop and test without HE overhead
- Clear contracts: Define backend interface before implementation
- Noise simulation: Test noise management without real HE
- Debugging: Easier to debug compilation issues
Trade-offs Accepted:
- Need to implement real backends eventually
- Fake backend behavior may diverge from real HE
- Tests with fake backend don't guarantee real HE correctness
Architectural Trade-offs
Trade-off 1: Flexibility vs Simplicity
Tension: Making the framework flexible enough for research vs keeping it simple for users.
Flexibility side:
- Support multiple HE schemes (CKKS, BFV, BGV)
- Allow custom passes and backends
- Configurable optimization strategies
- Extensible metadata system
Simplicity side:
- Single recommended workflow
- Sensible defaults
- Minimal configuration required
- Clear documentation
Resolution:
- Layered API: Simple high-level API, powerful low-level API
- Defaults: Provide good defaults for common cases
- Progressive disclosure: Advanced features available but not required
- Examples: Show simple cases first, advanced cases later
# Simple API (most users)
compiled = hetorch.compile(model, example_input)

# Advanced API (power users)
pipeline = PassPipeline([
    CustomPass(param1=value1),
    # ... detailed configuration
])
compiler = HETorchCompiler(context, pipeline)
compiled = compiler.compile(model, example_input)
Trade-off 2: Performance vs Generality
Tension: Optimizing for specific cases vs supporting general computations.
Performance side:
- Specialize for neural networks
- Assume CKKS scheme
- Optimize for common patterns
- Backend-specific optimizations
Generality side:
- Support arbitrary PyTorch models
- Support multiple HE schemes
- Work with any backend
- No assumptions about computation patterns
Resolution:
- Core generality: Core framework supports general computations
- Specialized passes: Optional passes for specific patterns (e.g., neural networks)
- Backend capabilities: Backends can expose capabilities for optimization
- User choice: Users choose between general and specialized pipelines
# General pipeline
general_pipeline = PassPipeline([
    InputPackingPass(),
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
])

# Specialized for neural networks
nn_pipeline = PassPipeline([
    InputPackingPass(),
    BatchNormFoldingPass(),     # NN-specific
    NonlinearToPolynomialPass(),
    LinearLayerBSGSPass(),      # NN-specific optimization
    RescalingInsertionPass(),
])
Trade-off 3: Abstraction vs Control
Tension: Hiding HE complexity vs giving users control over HE operations.
Abstraction side:
- Hide ciphertext management
- Automatic rescaling and relinearization
- Automatic bootstrapping insertion
- Abstract away HE schemes
Control side:
- Explicit ciphertext operations
- Manual rescaling and relinearization
- User-controlled bootstrapping
- Expose HE scheme details
Resolution:
- Default abstraction: Automatic management by default
- Opt-in control: Users can take control when needed
- Visibility: Make automatic decisions visible (logging, cost analysis)
- Override mechanisms: Allow users to override automatic decisions
# Automatic (default)
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="auto"),      # Automatic
    BootstrappingInsertionPass(strategy="auto"),  # Automatic
])

# Manual control
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="manual", locations=[...]),
    BootstrappingInsertionPass(strategy="manual", locations=[...]),
])
Design Patterns
Pattern 1: Context Object
Pattern: Pass a shared CompilationContext through all passes.
Purpose: Share configuration and state across passes without global variables.
Implementation:
@dataclass
class CompilationContext:
    scheme: HEScheme
    params: CKKSParameters
    backend: HEBackend
    metadata: Dict[str, Any]  # Shared state

# All passes receive context
class TransformationPass(ABC):
    def transform(self, graph: fx.GraphModule, context: CompilationContext):
        # Access shared configuration
        if context.scheme == HEScheme.CKKS:
            # CKKS-specific logic
            pass
Benefits:
- No global state
- Easy to test (inject a test context, as shown below)
- Clear dependencies
- Thread-safe
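For example, a unit test can inject a lightweight context built around the fake backend; the constructor arguments below are placeholders:
# Sketch of injecting a test context (constructor arguments are placeholders).
def test_rescaling_pass_runs_with_fake_backend():
    context = CompilationContext(
        scheme=HEScheme.CKKS,
        params=CKKSParameters(),   # placeholder parameters
        backend=FakeBackend(),     # no real encryption needed
        metadata={},
    )
    graph = fx.symbolic_trace(Model())
    transformed = RescalingInsertionPass().transform(graph, context)
    assert transformed is not None   # the pass ran against the injected context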
Pattern 2: Visitor Pattern for Graph Traversal
Pattern: Use visitor pattern for graph node processing.
Purpose: Separate graph traversal from node processing logic.
Implementation:
class GraphVisitor:
    def visit(self, node: fx.Node):
        method_name = f'visit_{node.op}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def visit_call_function(self, node: fx.Node):
        # Handle function calls
        pass

    def visit_call_module(self, node: fx.Node):
        # Handle module calls
        pass

    def generic_visit(self, node: fx.Node):
        # Fallback for node types without a dedicated handler
        pass
Benefits:
- Clean separation of concerns
- Easy to extend with new node types
- Reusable traversal logic
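To show the pattern in use, a small sketch of a visitor that counts call_function nodes while the traversal loop stays generic:
# Sketch: a concrete visitor; the traversal loop is independent of what
# each handler does with the node.
class CountCallsVisitor(GraphVisitor):
    def __init__(self):
        self.function_calls = 0
    def visit_call_function(self, node):
        self.function_calls += 1
    def generic_visit(self, node):
        pass   # ignore placeholders, get_attr, output, etc.

traced = fx.symbolic_trace(Model())
visitor = CountCallsVisitor()
for node in traced.graph.nodes:
    visitor.visit(node)
print(f"call_function nodes: {visitor.function_calls}")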
Pattern 3: Builder Pattern for Pipelines
Pattern: Use builder pattern for constructing pass pipelines.
Purpose: Make pipeline construction flexible and readable.
Implementation:
class PipelineBuilder:
    def __init__(self):
        self.passes = []

    def add_packing(self, strategy="row_major"):
        self.passes.append(InputPackingPass(strategy))
        return self

    def add_polynomial_approximation(self, degree=8):
        self.passes.append(NonlinearToPolynomialPass(degree))
        return self

    def add_rescaling(self, strategy="lazy"):
        self.passes.append(RescalingInsertionPass(strategy))
        return self

    def build(self):
        return PassPipeline(self.passes)

# Fluent API
pipeline = (PipelineBuilder()
    .add_packing()
    .add_polynomial_approximation(degree=8)
    .add_rescaling()
    .build())
Benefits:
- Readable pipeline construction
- Validation during building
- Reusable pipeline templates
Evolution and Lessons Learned
Lesson 1: Start with Fake Backend
Initial approach: Tried to integrate SEAL immediately.
Problem: Development was slow, debugging was difficult, tests took forever.
Solution: Built FakeBackend first for rapid development.
Lesson: Simulation backends are invaluable for development and testing.
Lesson 2: Metadata is Better Than Types
Initial approach: Tried to encode HE properties in custom tensor types.
Problem: Type system became complex, didn't integrate well with PyTorch.
Solution: Use metadata attached to graph nodes.
Lesson: Metadata provides flexibility without fighting the framework.
Lesson 3: Passes Should Be Small
Initial approach: Large passes that did multiple transformations.
Problem: Hard to test, debug, and maintain. Difficult to reuse.
Solution: Break into small, focused passes.
Lesson: Single-responsibility principle applies to compiler passes.
Lesson 4: Make Performance Visible
Initial approach: Automatic optimization without user visibility.
Problem: Users couldn't understand or control performance.
Solution: Add cost analysis and explicit configuration.
Lesson: Transparency is more valuable than magic in HE compilation.
Future Directions
Direction 1: Advanced Optimizations
Goal: Implement more sophisticated optimization passes.
Examples:
- Loop fusion and tiling
- Operator fusion (beyond consecutive additions)
- Memory layout optimization
- Backend-specific optimizations
Challenges:
- Complexity of analysis
- Maintaining correctness
- Balancing generality and specialization
Direction 2: Automatic Parameter Selection
Goal: Help users choose CKKS parameters automatically.
Approach:
- Analyze model to determine required depth (a rough sketch follows this list)
- Suggest parameters based on security requirements
- Provide parameter search tools
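A rough sketch of what depth analysis over a traced graph could look like; the set of multiplicative targets and the per-node bookkeeping are simplifying assumptions, not HETorch's actual analysis:
# Simplified sketch: estimate multiplicative depth from a traced fx graph.
# Only call_function nodes are counted here; call_module layers (e.g. nn.Linear)
# would also need to be handled in a real analysis.
import operator
import torch

MULTIPLICATIVE = {operator.mul, torch.mul, torch.matmul}

def estimate_depth(traced):
    depth = {}
    for node in traced.graph.nodes:  # graph.nodes is topologically ordered
        input_depth = max((depth.get(a, 0) for a in node.all_input_nodes), default=0)
        is_mul = node.op == "call_function" and node.target in MULTIPLICATIVE
        depth[node] = input_depth + (1 if is_mul else 0)
    return max(depth.values(), default=0)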
Challenges:
- Parameter selection is complex
- Trade-offs between security, performance, and accuracy
- Different backends have different constraints
Direction 3: Production Deployment Support
Goal: Add features for production deployment.
Examples:
- Key management integration
- Serialization and deserialization
- Client-server protocols
- Monitoring and logging
Challenges:
- Out of scope for core compilation
- Many deployment-specific requirements
- Security considerations
See Also
- Architecture - Detailed architecture documentation
- Custom Passes - Guide to writing custom passes
- Custom Backends - Guide to implementing backends
- Basic Concepts - Introduction to HETorch concepts
- Phase Summaries - Development history and evolution
- Future Work - Roadmap and planned features