
Design Philosophy

This document explains the design decisions, principles, and rationale behind HETorch's architecture. Understanding these choices will help you work effectively with HETorch and make informed decisions when extending or customizing the framework.

Overview

HETorch is designed to bridge the gap between PyTorch's familiar tensor-based programming model and the specialized requirements of homomorphic encryption (HE). The framework's design reflects several years of experience with HE compilation, balancing theoretical elegance with practical usability.

Design Goals

  1. Accessibility: Make HE compilation accessible to ML practitioners without deep cryptography expertise
  2. Flexibility: Support diverse HE schemes, backends, and optimization strategies
  3. Performance: Enable efficient compilation and execution of encrypted computations
  4. Maintainability: Create a codebase that's easy to understand, extend, and debug
  5. Correctness: Ensure transformations preserve semantics and produce correct results

Non-Goals

  1. Universal HE support: Not trying to support every possible HE scheme or operation
  2. Production deployment: Focus on compilation, not key management or deployment infrastructure
  3. Automatic optimization: Not attempting to automatically find optimal parameters
  4. Backend implementation: Not implementing HE backends (use existing libraries like SEAL)

Core Principles

1. Tensor-Centric Approach

Principle: Treat encrypted data as tensors, preserving PyTorch's programming model.

Rationale:

  • ML practitioners think in terms of tensors and operations
  • PyTorch's tensor abstraction is powerful and well-understood
  • Enables reuse of existing PyTorch models with minimal modification
  • Simplifies reasoning about encrypted computation

Implementation:

# Users write standard PyTorch
class Model(nn.Module):
    def forward(self, x):
        return self.linear(x) + self.bias

# HETorch compiles to HE operations
compiled = compiler.compile(model, example_input)

Benefits:

  • Low learning curve for PyTorch users
  • Familiar debugging and development workflow
  • Easy integration with existing PyTorch ecosystem
  • Natural expression of ML computations

Challenges:

  • Some HE operations don't map cleanly to tensor operations
  • Ciphertext metadata (level, scale) doesn't fit tensor model
  • Batching semantics differ between plaintext and encrypted computation

Resolution:

  • Use metadata attached to graph nodes for HE-specific information
  • Provide HE-aware operations (rescale, relinearize, bootstrap) as functions, as sketched below
  • Document differences between plaintext and encrypted execution
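
As a rough sketch of how these HE-aware functions might appear in user code (the hetorch.ops module name and the signatures here are illustrative assumptions, not the actual API):

import hetorch.ops as ops  # hypothetical module, for illustration

def forward(self, x):
    y = self.linear(x)      # standard PyTorch operation
    y = ops.rescale(y)      # HE-aware: drop a level to keep the scale in check
    y = ops.relinearize(y)  # HE-aware: shrink the ciphertext after multiplication
    return y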

2. Backend-Agnostic Design

Principle: Separate compilation logic from backend-specific implementation details.

Rationale:

  • HE libraries (SEAL, HElib, PALISADE) have different APIs and capabilities
  • Users may want to switch backends without rewriting compilation logic
  • Testing and development are easier with lightweight fake backends
  • Future HE libraries can be integrated without major refactoring

Implementation:

# Backend interface
class HEBackend(ABC):
    @abstractmethod
    def encrypt(self, plaintext: Tensor) -> Ciphertext: ...

    @abstractmethod
    def cadd(self, ct1: Ciphertext, ct2: Ciphertext) -> Ciphertext: ...

    # ... other operations

# Multiple implementations
backend = SEALBackend()    # Real HE
backend = FakeBackend()    # Fast simulation
backend = CustomBackend() # User-defined

Benefits:

  • Flexibility to use different HE libraries
  • Easy testing with fake backends
  • Clear separation of concerns
  • Extensibility for new backends

Challenges:

  • A lowest-common-denominator API may limit backend-specific optimizations
  • Abstraction overhead in performance-critical paths
  • Different backends have different capabilities and constraints

Resolution:

  • Define core operations all backends must support
  • Allow backend-specific extensions via metadata
  • Provide backend capability queries (see the sketch below)
  • Document backend-specific behavior
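
One plausible shape for capability queries, as a sketch (the capabilities() method and the BackendCapabilities fields are illustrative assumptions):

from dataclasses import dataclass

@dataclass(frozen=True)
class BackendCapabilities:
    # Illustrative fields; the real set depends on the backend
    supports_bootstrapping: bool
    supports_rotation: bool
    max_multiplicative_depth: int

# A pass can then adapt to what the backend offers
caps = backend.capabilities()  # hypothetical query method
if not caps.supports_bootstrapping:
    # Fall back to deeper parameters instead of inserting bootstraps
    ...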

3. Modularity and Extensibility

Principle: Design the system as composable, independent modules that can be extended without modifying core code.

Rationale:

  • Research in HE optimization is rapidly evolving
  • Users have domain-specific optimization needs
  • Core framework should be stable while allowing experimentation
  • Open-source contributors need clear extension points

Implementation:

# Pass-based architecture
class TransformationPass(ABC):
    @abstractmethod
    def transform(self, graph, context): ...

# Users can add custom passes
class MyCustomPass(TransformationPass):
    def transform(self, graph, context):
        # Custom optimization logic
        return transformed_graph

# Compose passes into pipelines
pipeline = PassPipeline([
    InputPackingPass(),
    MyCustomPass(),  # User-defined
    RescalingInsertionPass(),
])

Benefits:

  • Easy to add new optimizations without touching core code
  • Clear boundaries between modules
  • Testable in isolation
  • Encourages community contributions

Challenges:

  • Pass ordering dependencies can be complex
  • Inter-pass communication requires careful design
  • Performance overhead from modularity
  • Debugging across multiple passes can be difficult

Resolution:

  • Explicit pass dependencies (requires/provides), sketched below
  • Shared compilation context for inter-pass communication
  • Pass validation to catch errors early
  • Debugging tools (PrintGraphPass, verbose mode)
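
For instance, declared dependencies could let the pipeline validate ordering before running anything. A sketch, where the requires/provides attribute names mirror the idea above but are not necessarily the real API:

class RescalingInsertionPass(TransformationPass):
    # Declared dependencies make ordering constraints explicit
    requires = {"packed_layout"}       # must run after a packing pass
    provides = {"rescaling_inserted"}

    def transform(self, graph, context):
        ...

def validate_ordering(pipeline):
    # Check that each pass's requirements are met by earlier passes
    available = set()
    for p in pipeline.passes:
        missing = set(getattr(p, "requires", ())) - available
        if missing:
            raise ValueError(f"{type(p).__name__} is missing {missing}")
        available |= set(getattr(p, "provides", ()))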

4. Performance-Aware Compilation

Principle: Make performance characteristics visible and controllable throughout the compilation process.

Rationale:

  • HE operations have vastly different costs (addition vs multiplication vs bootstrapping)
  • Users need to understand and control performance trade-offs
  • Automatic optimization is difficult without domain knowledge
  • Transparency helps users make informed decisions

Implementation:

# Cost analysis ("pass" is a Python keyword, so bind the pass to another name)
my_pass = RescalingInsertionPass()
cost = my_pass.analyze_cost(graph, context)
print(f"Operations: {cost.total_operations}")
print(f"Estimated latency: {cost.estimated_latency}")
print(f"Noise budget consumed: {cost.noise_consumed}")

# Explicit control
pipeline = PassPipeline([
    NonlinearToPolynomialPass(degree=8),         # User controls accuracy
    RescalingInsertionPass(strategy="lazy"),     # User controls strategy
    BootstrappingInsertionPass(threshold=20.0),  # User controls threshold
])

Benefits:

  • Users understand performance implications
  • Explicit control over trade-offs
  • Easier to debug performance issues
  • Enables informed optimization decisions

Challenges:

  • Requires users to understand HE performance characteristics
  • Cost models may not match actual backend performance
  • More configuration options increase complexity

Resolution:

  • Provide sensible defaults for common cases
  • Document performance characteristics clearly
  • Offer cost analysis tools
  • Provide performance tuning guides

Key Design Decisions

Decision 1: Why torch.fx?

Decision: Use PyTorch's torch.fx for graph representation and manipulation.

Alternatives Considered:

  1. Custom IR: Build a custom intermediate representation
  2. TorchScript: Use PyTorch's TorchScript
  3. ONNX: Use ONNX as intermediate format
  4. Direct AST manipulation: Manipulate Python AST directly

Why torch.fx:

Pros:

  • Native PyTorch integration (no conversion overhead)
  • Symbolic tracing captures computation graph automatically
  • Graph manipulation API is powerful and well-documented
  • Active development and community support
  • Preserves Python semantics (unlike TorchScript)
  • Easy debugging (graphs are readable Python code)

Cons:

  • Relatively new (introduced in PyTorch 1.8)
  • Some limitations on traceable code (dynamic control flow)
  • Learning curve for graph manipulation
  • Less mature than TorchScript

Rationale:

# torch.fx makes graph manipulation straightforward
import torch
import torch.fx as fx

traced = fx.symbolic_trace(model)
graph = traced.graph

# Easy to inspect
print(graph)  # Readable Python-like code

# Easy to modify
for node in graph.nodes:
    if node.op == 'call_function' and node.target == torch.relu:
        # Replace ReLU with a polynomial approximation (user-defined function)
        node.target = polynomial_relu

traced.recompile()

The decision to use torch.fx was driven by:

  1. Native integration: No conversion between PyTorch and custom IR
  2. Ease of use: Graph manipulation is intuitive
  3. Debugging: Graphs are human-readable
  4. Future-proof: Active PyTorch development

Trade-offs Accepted:

  • Limited to traceable PyTorch code (no dynamic control flow)
  • Tied to PyTorch's release cycle
  • Some overhead from graph representation

Decision 2: Why Pass-Based Architecture?

Decision: Structure compilation as a pipeline of transformation passes.

Alternatives Considered:

  1. Monolithic compiler: Single compilation function
  2. Rule-based rewriting: Pattern matching and rewriting rules
  3. Dataflow analysis: Traditional compiler dataflow framework
  4. ML-based optimization: Learn optimal transformations

Why pass-based:

Pros:

  • Modularity: Each pass has single responsibility
  • Composability: Passes can be combined in different orders
  • Extensibility: Easy to add new passes
  • Testability: Passes can be tested in isolation
  • Understandability: Each pass is simple and focused

Cons:

  • Pass ordering dependencies can be complex
  • Multiple graph traversals (performance overhead)
  • Inter-pass optimization opportunities may be missed
  • Requires careful design of pass interfaces

Rationale:

# Each pass does one thing well
class NonlinearToPolynomialPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles nonlinear → polynomial
        return graph_with_polynomials

class RescalingInsertionPass(TransformationPass):
    def transform(self, graph, context):
        # Only handles rescaling insertion
        return graph_with_rescaling

# Compose into pipeline
pipeline = PassPipeline([
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
    DeadCodeEliminationPass(),
])

The pass-based architecture enables:

  1. Incremental development: Add passes one at a time
  2. Experimentation: Try different pass combinations
  3. Maintenance: Fix bugs in isolated passes
  4. Community contributions: Clear extension points

Trade-offs Accepted:

  • Some performance overhead from multiple traversals
  • Pass ordering requires careful thought
  • Global optimizations are harder to implement

Decision 3: Why Metadata-Driven?

Decision: Store HE-specific information (level, scale, noise) as metadata on graph nodes rather than in the type system.

Alternatives Considered:

  1. Type system: Encode HE properties in tensor types
  2. Separate data structures: Maintain parallel data structures for HE info
  3. Wrapper classes: Wrap tensors in HE-aware classes
  4. Implicit tracking: Infer properties from operations

Why metadata:

Pros:

  • Flexible: Can attach arbitrary information to nodes
  • Non-invasive: Doesn't change PyTorch's type system
  • Easy to add/remove: Metadata can be added by passes as needed
  • Debugging-friendly: Can inspect metadata without affecting execution

Cons:

  • Not type-safe: Metadata access isn't checked at compile time
  • Can be lost: Metadata must be explicitly preserved during transformations
  • Less discoverable: Metadata keys aren't documented in types
  • Runtime overhead: Dictionary lookups for metadata access

Rationale:

# Metadata attached to graph nodes
node.meta['ciphertext_info'] = CiphertextInfo(
    level=2,
    scale=2**40,
    noise_budget=85.0,
)

# Passes can read and update metadata
if 'ciphertext_info' in node.meta:
    info = node.meta['ciphertext_info']
    if info.noise_budget < 20.0:
        # Insert bootstrapping
        pass

The metadata approach provides:

  1. Flexibility: Can add new properties without changing core types
  2. Compatibility: Works with existing PyTorch infrastructure
  3. Simplicity: No complex type system extensions needed
  4. Extensibility: Users can add custom metadata

Trade-offs Accepted:

  • Metadata must be explicitly preserved during transformations (see the helper sketched below)
  • No compile-time type checking for metadata
  • Potential for metadata inconsistencies
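
In practice, any pass that rewrites a node should copy that node's metadata onto its replacement. A minimal helper, assuming torch.fx-style nodes and graphs:

def replace_node_preserving_meta(graph, old_node, new_node):
    # Copy level/scale/noise metadata before swapping the node out
    new_node.meta.update(old_node.meta)
    old_node.replace_all_uses_with(new_node)
    graph.erase_node(old_node)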

Decision 4: Why Fake Backend First?

Decision: Implement a fast simulation backend before real HE backends.

Alternatives Considered:

  1. Real backend first: Start with SEAL or HElib integration
  2. Mock backend: Simple mocks for testing only
  3. No backend: Just graph transformations
  4. Multiple backends simultaneously: Implement several backends in parallel

Why fake backend first:

Pros:

  • Fast development: No HE library dependencies during development
  • Easy testing: Tests run quickly without actual encryption
  • Clear interface: Defines backend API before implementation
  • Debugging: Easier to debug without encryption overhead
  • Noise simulation: Can simulate noise without real HE

Cons:

  • Not production-ready: Can't use for real encrypted computation
  • May not match real backend behavior: Simulation isn't perfect
  • Extra work: Need to implement both fake and real backends
  • False confidence: Tests may pass with the fake backend but fail with real HE

Rationale:

# FakeBackend for development and testing
class FakeBackend(HEBackend):
    def encrypt(self, plaintext):
        # Just wrap the tensor, no actual encryption
        return FakeCiphertext(plaintext)

    def cadd(self, ct1, ct2):
        # Plain addition, track noise
        result = ct1.data + ct2.data
        noise = ct1.noise + ct2.noise + ADD_NOISE
        return FakeCiphertext(result, noise)

# Fast tests
def test_compilation():
    backend = FakeBackend()  # Instant, no HE overhead
    compiled = compiler.compile(model, backend)
    assert torch.allclose(compiled(example_input), expected)

The fake backend enables:

  1. Rapid iteration: Develop and test without HE overhead
  2. Clear contracts: Define backend interface before implementation
  3. Noise simulation: Test noise management without real HE
  4. Debugging: Easier to debug compilation issues

Trade-offs Accepted:

  • Need to implement real backends eventually
  • Fake backend behavior may diverge from real HE
  • Tests with fake backend don't guarantee real HE correctness

Architectural Trade-offs

Trade-off 1: Flexibility vs Simplicity

Tension: Making the framework flexible enough for research vs keeping it simple for users.

Flexibility side:

  • Support multiple HE schemes (CKKS, BFV, BGV)
  • Allow custom passes and backends
  • Configurable optimization strategies
  • Extensible metadata system

Simplicity side:

  • Single recommended workflow
  • Sensible defaults
  • Minimal configuration required
  • Clear documentation

Resolution:

  • Layered API: Simple high-level API, powerful low-level API
  • Defaults: Provide good defaults for common cases
  • Progressive disclosure: Advanced features available but not required
  • Examples: Show simple cases first, advanced cases later

# Simple API (most users)
compiled = hetorch.compile(model, example_input)

# Advanced API (power users)
pipeline = PassPipeline([
    CustomPass(param1=value1),
    # ... detailed configuration
])
compiler = HETorchCompiler(context, pipeline)
compiled = compiler.compile(model, example_input)

Trade-off 2: Performance vs Generality

Tension: Optimizing for specific cases vs supporting general computations.

Performance side:

  • Specialize for neural networks
  • Assume CKKS scheme
  • Optimize for common patterns
  • Backend-specific optimizations

Generality side:

  • Support arbitrary PyTorch models
  • Support multiple HE schemes
  • Work with any backend
  • No assumptions about computation patterns

Resolution:

  • Core generality: Core framework supports general computations
  • Specialized passes: Optional passes for specific patterns (e.g., neural networks)
  • Backend capabilities: Backends can expose capabilities for optimization
  • User choice: Users choose between general and specialized pipelines

# General pipeline
general_pipeline = PassPipeline([
    InputPackingPass(),
    NonlinearToPolynomialPass(),
    RescalingInsertionPass(),
])

# Specialized for neural networks
nn_pipeline = PassPipeline([
    InputPackingPass(),
    BatchNormFoldingPass(),      # NN-specific
    NonlinearToPolynomialPass(),
    LinearLayerBSGSPass(),       # NN-specific optimization
    RescalingInsertionPass(),
])

Trade-off 3: Abstraction vs Control

Tension: Hiding HE complexity vs giving users control over HE operations.

Abstraction side:

  • Hide ciphertext management
  • Automatic rescaling and relinearization
  • Automatic bootstrapping insertion
  • Abstract away HE schemes

Control side:

  • Explicit ciphertext operations
  • Manual rescaling and relinearization
  • User-controlled bootstrapping
  • Expose HE scheme details

Resolution:

  • Default abstraction: Automatic management by default
  • Opt-in control: Users can take control when needed
  • Visibility: Make automatic decisions visible (logging, cost analysis)
  • Override mechanisms: Allow users to override automatic decisions

# Automatic (default)
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="auto"),      # Automatic
    BootstrappingInsertionPass(strategy="auto"),  # Automatic
])

# Manual control
pipeline = PassPipeline([
    RescalingInsertionPass(strategy="manual", locations=[...]),
    BootstrappingInsertionPass(strategy="manual", locations=[...]),
])

Design Patterns

Pattern 1: Context Object

Pattern: Pass a shared CompilationContext through all passes.

Purpose: Share configuration and state across passes without global variables.

Implementation:

@dataclass
class CompilationContext:
    scheme: HEScheme
    params: CKKSParameters
    backend: HEBackend
    metadata: Dict[str, Any]  # Shared state

# All passes receive context
class TransformationPass(ABC):
    def transform(self, graph: fx.GraphModule, context: CompilationContext):
        # Access shared configuration
        if context.scheme == HEScheme.CKKS:
            # CKKS-specific logic
            pass

Benefits:

  • No global state
  • Easy to test (inject a test context, as shown below)
  • Clear dependencies
  • Thread-safe
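
For example, a test can inject a lightweight context built on the FakeBackend from Decision 4 (the parameter value here is illustrative):

def make_test_context():
    return CompilationContext(
        scheme=HEScheme.CKKS,
        params=CKKSParameters(poly_modulus_degree=8192),  # illustrative value
        backend=FakeBackend(),  # fast simulation, no real encryption
        metadata={},
    )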

Pattern 2: Visitor Pattern for Graph Traversal

Pattern: Use visitor pattern for graph node processing.

Purpose: Separate graph traversal from node processing logic.

Implementation:

class GraphVisitor:
    def visit(self, node: fx.Node):
        method_name = f'visit_{node.op}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def visit_call_function(self, node: fx.Node):
        # Handle function calls
        pass

    def visit_call_module(self, node: fx.Node):
        # Handle module calls
        pass

    def generic_visit(self, node: fx.Node):
        # Fallback for node types without a dedicated handler
        pass
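
Driving the visitor is then a simple loop over the traced graph's nodes:

visitor = GraphVisitor()
for node in fx.symbolic_trace(model).graph.nodes:
    visitor.visit(node)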

Benefits:

  • Clean separation of concerns
  • Easy to extend with new node types
  • Reusable traversal logic

Pattern 3: Builder Pattern for Pipelines

Pattern: Use builder pattern for constructing pass pipelines.

Purpose: Make pipeline construction flexible and readable.

Implementation:

class PipelineBuilder:
    def __init__(self):
        self.passes = []

    def add_packing(self, strategy="row_major"):
        self.passes.append(InputPackingPass(strategy))
        return self

    def add_polynomial_approximation(self, degree=8):
        self.passes.append(NonlinearToPolynomialPass(degree))
        return self

    def add_rescaling(self, strategy="lazy"):
        self.passes.append(RescalingInsertionPass(strategy))
        return self

    def build(self):
        return PassPipeline(self.passes)

# Fluent API
pipeline = (PipelineBuilder()
    .add_packing()
    .add_polynomial_approximation(degree=8)
    .add_rescaling()
    .build())

Benefits:

  • Readable pipeline construction
  • Validation during building
  • Reusable pipeline templates

Evolution and Lessons Learned

Lesson 1: Start with Fake Backend

Initial approach: Tried to integrate SEAL immediately.

Problem: Development was slow, debugging was difficult, tests took forever.

Solution: Built FakeBackend first for rapid development.

Lesson: Simulation backends are invaluable for development and testing.

Lesson 2: Metadata is Better Than Types

Initial approach: Tried to encode HE properties in custom tensor types.

Problem: Type system became complex, didn't integrate well with PyTorch.

Solution: Use metadata attached to graph nodes.

Lesson: Metadata provides flexibility without fighting the framework.

Lesson 3: Passes Should Be Small

Initial approach: Large passes that did multiple transformations.

Problem: Hard to test, debug, and maintain. Difficult to reuse.

Solution: Break into small, focused passes.

Lesson: Single-responsibility principle applies to compiler passes.

Lesson 4: Make Performance Visible

Initial approach: Automatic optimization without user visibility.

Problem: Users couldn't understand or control performance.

Solution: Add cost analysis and explicit configuration.

Lesson: Transparency is more valuable than magic in HE compilation.


Future Directions

Direction 1: Advanced Optimizations

Goal: Implement more sophisticated optimization passes.

Examples:

  • Loop fusion and tiling
  • Operator fusion (beyond consecutive additions)
  • Memory layout optimization
  • Backend-specific optimizations

Challenges:

  • Complexity of analysis
  • Maintaining correctness
  • Balancing generality and specialization

Direction 2: Automatic Parameter Selection

Goal: Help users choose CKKS parameters automatically.

Approach:

  • Analyze the model to determine the required multiplicative depth (a sketch follows this list)
  • Suggest parameters based on security requirements
  • Provide parameter search tools
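
A first cut at depth analysis could walk the traced graph in topological order and count multiplications along the longest path. A sketch, with the set of multiplicative ops deliberately simplified for illustration:

import operator
import torch
import torch.fx as fx

MUL_OPS = {operator.mul, torch.mul, torch.matmul}  # simplified for illustration

def multiplicative_depth(graph: fx.Graph) -> int:
    # Longest chain of multiplications from any input to any output;
    # fx graphs iterate in topological order, so one pass suffices.
    depth = {}
    for node in graph.nodes:
        base = max((depth.get(a, 0) for a in node.all_input_nodes), default=0)
        is_mul = node.op == 'call_function' and node.target in MUL_OPS
        depth[node] = base + (1 if is_mul else 0)
    return max(depth.values(), default=0)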

Challenges:

  • Parameter selection is complex
  • Trade-offs between security, performance, and accuracy
  • Different backends have different constraints

Direction 3: Production Deployment Support

Goal: Add features for production deployment.

Examples:

  • Key management integration
  • Serialization and deserialization
  • Client-server protocols
  • Monitoring and logging

Challenges:

  • Out of scope for core compilation
  • Many deployment-specific requirements
  • Security considerations
