Architecture
Overview
Version: v0.5.1
Polyglot FFI uses a multi-stage pipeline to generate FFI bindings:
flowchart LR
A["Source File (.mli)"] --> B["Parser"]
B --> C["IR"]
C --> D["Type Registry"]
D --> E["Generators"]
E --> F["Output Files"]
C --> G["Type Mappings<br/>(OCaml / Python / C / Rust)"]
Design Principles
- Language-Agnostic IR - Intermediate representation decouples source and target languages
- Pluggable Generators - Easy to add new target languages
- Type-Safe - Preserves type information throughout pipeline
- Memory-Safe - Generated C stubs pair CAMLparam with CAMLreturn and are designed to avoid leaks
- Testable - Each component independently unit tested
- Extensible Type System - Type Registry allows custom type mappings
Components
1. Parser (src/polyglot_ffi/parsers/ocaml.py)
Purpose: Parse OCaml .mli files into AST
Features:
- Regex-based parsing
- Multi-line signature support (partial)
- Documentation comment extraction
- Error messages with line numbers
Example:
from polyglot_ffi.parsers.ocaml import parse_mli_file
module = parse_mli_file("crypto.mli")
# Returns IRModule with functions
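To make the regex-based approach concrete, here is a minimal, self-contained sketch of how single-line `val` signatures might be extracted together with their line numbers. It does not use the library: `VAL_PATTERN`, `ParsedSignature`, and `parse_signatures` are hypothetical names, and the sketch ignores multi-line signatures, parenthesized types, and doc comments.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern for single-line signatures such as
#   val encrypt : string -> string
VAL_PATTERN = re.compile(r"^\s*val\s+([a-z_][A-Za-z0-9_']*)\s*:\s*(.+?)\s*$")


@dataclass
class ParsedSignature:
    name: str
    param_types: list[str]
    return_type: str
    line: int


def parse_signatures(source: str) -> list[ParsedSignature]:
    """Extract `val` signatures, recording the line number of each one."""
    signatures = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        match = VAL_PATTERN.match(text)
        if match is None:
            continue  # comments, blank lines, and anything else are skipped
        name, type_expr = match.groups()
        # Naive split on arrows: does not handle parentheses or higher-order types.
        parts = [part.strip() for part in type_expr.split("->")]
        if any(not part for part in parts):
            raise ValueError(f"line {lineno}: malformed signature for '{name}'")
        signatures.append(ParsedSignature(name, parts[:-1], parts[-1], lineno))
    return signatures


example = """(* Encrypt a string *)
val encrypt : string -> string
val add : int -> int -> int
"""
signatures = parse_signatures(example)
```

Keeping the line number on each parsed item is what lets later stages report errors against the original `.mli` file.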
2. Intermediate Representation (src/polyglot_ffi/ir/types.py)
Purpose: Language-agnostic type system
Key Types:
- IRModule - Top-level module
- IRFunction - Function with params and return type
- IRParameter - Function parameter
- IRType - Type representation (primitive, option, list, tuple, record, variant)
- IRTypeDefinition - Custom type definitions (records, variants)
- TypeKind - Enum of type categories
Enhancements:
- Added OPTION, LIST, TUPLE type kinds
- Added RECORD and VARIANT type kinds
- Support for nested and combined types
- Type variables for polymorphic functions
Example:
from polyglot_ffi.ir.types import (
    IRFunction, IRParameter, STRING, INT,
    ir_option, ir_list, ir_tuple
)

# Simple function
func = IRFunction(
    name="add",
    params=[IRParameter("x", INT), IRParameter("y", INT)],
    return_type=INT
)

# Function with complex types
find_func = IRFunction(
    name="find_user",
    params=[IRParameter("name", STRING)],
    return_type=ir_option(STRING)  # Returns string option
)
3. Type Registry (src/polyglot_ffi/type_system/)
Purpose: Manage type mappings between languages
The Type Registry provides centralized, extensible type mapping management.
Key Components:
A. TypeRegistry (registry.py)
- Registers primitive type mappings
- Handles complex type conversions
- Supports custom type converters
- Validates type mappings
Example:
from polyglot_ffi.ir.types import STRING
from polyglot_ffi.type_system import TypeRegistry

registry = TypeRegistry()

# Register a primitive type
registry.register_primitive("string", {
    "ocaml": "string",
    "python": "str",
    "c": "char*",
    "rust": "String"
})

# Get the mapping of an IR type for a target language
python_type = registry.get_mapping(STRING, "python")
B. Built-in Types (builtin.py)
Pre-registered mappings for all standard types:
- Primitives: string, int, float, bool, unit
- Complex: option, list, tuple types
- Multi-language support: OCaml, Python, C, Rust
Features:
- Automatic handling of Optional[T], List[T], Tuple[T1, T2]
- Consistent naming conventions across languages
- Extensible via custom converters
Example:
from polyglot_ffi.type_system import get_default_registry
from polyglot_ffi.ir.types import ir_option, STRING
registry = get_default_registry()
# Automatic complex type mapping
option_str = ir_option(STRING)
python_type = registry.get_mapping(option_str, "python")
# Returns: "Optional[str]"
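The mapping of nested types can be pictured as a recursive lookup. The sketch below is a toy model, not the library's implementation: `SketchType` and `SketchRegistry` are hypothetical stand-ins, and only the Python target is handled for complex types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchType:
    """Toy stand-in for an IR type: a kind name plus nested type parameters."""
    name: str
    params: tuple = ()


class SketchRegistry:
    """Hypothetical registry: primitives are table lookups, complex types recurse."""

    def __init__(self):
        self._primitives = {}

    def register_primitive(self, name, mappings):
        self._primitives[name] = dict(mappings)

    def get_mapping(self, ir_type, target):
        if ir_type.name in self._primitives:
            try:
                return self._primitives[ir_type.name][target]
            except KeyError:
                raise KeyError(f"no {target} mapping for '{ir_type.name}'") from None
        if target != "python":
            raise NotImplementedError("this sketch maps complex types to Python only")
        # Resolve the inner types first, then wrap them in the container name.
        inner = [self.get_mapping(param, target) for param in ir_type.params]
        if ir_type.name == "option":
            return f"Optional[{inner[0]}]"
        if ir_type.name == "list":
            return f"List[{inner[0]}]"
        if ir_type.name == "tuple":
            return f"Tuple[{', '.join(inner)}]"
        raise KeyError(f"unknown type kind '{ir_type.name}'")


sketch_registry = SketchRegistry()
sketch_registry.register_primitive("string", {"python": "str", "c": "char*"})
sketch_registry.register_primitive("int", {"python": "int", "c": "int"})

string_t = SketchType("string")
nested = SketchType("list", (SketchType("option", (string_t,)),))
nested_hint = sketch_registry.get_mapping(nested, "python")
```

Because each level only wraps the result of the level below, arbitrarily deep combinations such as a list of options need no special cases.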
4. Generators (src/polyglot_ffi/generators/)
A. Ctypes Generator (ctypes_gen.py)
Generates OCaml ctypes bindings:
- type_description.ml - Type definitions module
- function_description.ml - Foreign function declarations
B. C Stub Generator (c_stubs_gen.py)
Generates C wrapper code:
- OCaml runtime initialization (ml_init() function with guard)
- Proper CAMLparam/CAMLlocal/CAMLreturn macros
- Type conversions (OCaml ↔ C)
- Memory management (strdup for strings)
- Multi-parameter callback support
C. Python Generator (python_gen.py)
Generates Python wrapper:
- Automatic runtime initialization (calls ml_init() at import)
- Platform detection (macOS .dylib, Linux .so, Windows .dll)
- Type hints
- Error handling with custom exceptions
- UTF-8 encoding/decoding
- Pythonic API
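Platform detection of the kind listed above usually comes down to choosing a shared-library extension from `sys.platform`. The following is a minimal sketch under that assumption; `shared_library_name` is a hypothetical helper, and the file naming (bare stem plus extension) is illustrative rather than what the generated wrapper necessarily uses.

```python
import sys

# sys.platform prefixes mapped to shared-library extensions.
_EXTENSIONS = {"darwin": ".dylib", "linux": ".so", "win32": ".dll"}


def shared_library_name(stem, platform=None):
    """Return the library file name for the given (or current) platform."""
    platform = sys.platform if platform is None else platform
    for prefix, extension in _EXTENSIONS.items():
        if platform.startswith(prefix):
            return f"{stem}{extension}"
    raise OSError(f"unsupported platform: {platform}")


names = {p: shared_library_name("crypto", p) for p in ("darwin", "linux", "win32")}
```

Passing the platform explicitly keeps the helper testable on any machine; in real use the default picks up the running interpreter's platform.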
D. Dune Generator (dune_gen.py)
Generates build configuration:
- dune - Library and rule definitions with threading support
- dune-project - Project metadata
- Shared library creation rules for macOS and Linux
- OCaml library support - Automatically links additional OCaml libraries (str, unix, threads, etc.)
- C library flags - Adds appropriate C linker flags for OCaml libraries (e.g., -lcamlstr for str)
5. Commands (src/polyglot_ffi/commands/)
High-level command implementations:
- init.py - Project scaffolding with templates
- generate.py - Binding generation orchestration
- check.py - Project validation and dependency checking
- clean.py - Generated file cleanup
- watch.py - Auto-regenerate on file changes
6. CLI (src/polyglot_ffi/cli/main.py)
Click-based command-line interface with rich output:
Features:
- Progress indicators and spinners
- Colored output for better UX
- Shell completions (Bash, Zsh, Fish)
- Verbose mode for debugging
- Dry-run mode for preview
- Force regeneration option
Commands:
- polyglot-ffi init - Initialize new project
- polyglot-ffi generate - Generate bindings
- polyglot-ffi check - Validate configuration
- polyglot-ffi clean - Clean generated files
- polyglot-ffi watch - Watch for changes
Data Flow
Generation Pipeline
- Parse .mli file → AST
- Convert AST → IR (IRModule)
- Validate IR types
- Generate target code from IR
- Write files to output directory
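The five steps above can be sketched as a small orchestration function. This is an illustration of the data flow only: `run_pipeline` and the `toy_*` stages are hypothetical stand-ins, and each real stage is far richer than the one-liners used here.

```python
import tempfile
from pathlib import Path


def run_pipeline(source_text, parse, to_ir, validate, generators, output_dir):
    """Run parse -> IR -> validate -> generate -> write, returning written paths."""
    ast = parse(source_text)
    ir_module = to_ir(ast)
    validate(ir_module)  # raises before anything is written to disk
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for generate in generators:
        for filename, content in generate(ir_module).items():
            path = output_dir / filename
            path.write_text(content, encoding="utf-8")
            written.append(path)
    return written


# Toy stages, just enough to exercise the flow.
def toy_parse(text):
    return [line[4:].split(":")[0].strip() for line in text.splitlines() if line.startswith("val ")]


def toy_to_ir(names):
    return {"functions": names}


def toy_validate(ir_module):
    if not ir_module["functions"]:
        raise ValueError("module declares no functions")


def toy_generate(ir_module):
    return {"names.txt": "\n".join(ir_module["functions"])}


with tempfile.TemporaryDirectory() as tmp:
    written = run_pipeline(
        "val encrypt : string -> string\n",
        toy_parse, toy_to_ir, toy_validate, [toy_generate],
        Path(tmp) / "generated",
    )
    written_names = [path.name for path in written]
    first_content = written[0].read_text(encoding="utf-8")
```

Validating before the first write means a malformed input leaves the output directory untouched.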
Type Mapping
OCaml Type → IR Type → C Type → Python Type
----------- ---------- -------- ------------
string → STRING → char* → str
int → INT → int → int
float → FLOAT → double → float
bool → BOOL → int → bool
unit → UNIT → void → None
Generated Code Structure
For crypto.mli:
generated/
├── type_description.ml # OCaml: module Types (F : TYPE) = ...
├── function_description.ml # OCaml: module Functions (F : FOREIGN) = ...
├── crypto_stubs.c # C: ml_encrypt, ml_decrypt, ml_hash
├── crypto_stubs.h # C: function declarations
├── dune # Dune: library + rule
├── dune-project # Dune: project metadata
└── crypto_py.py # Python: encrypt(), decrypt(), hash()
Memory Safety
OCaml Side
- Functions registered with Callback.register
- GC-managed memory
C Side
- CAMLparam0() - Declare no parameters
- CAMLlocal2(ml_x, ml_y) - Declare local GC roots
- caml_copy_string() - Copy C string to OCaml
- String_val() - Get C string from OCaml
- strdup() - Duplicate string for C ownership
- CAMLreturnT(type, value) - Return with GC awareness
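To show how these macros and calls fit together in one stub, the sketch below renders a string-to-string C wrapper from a Python template. The template is an assumption about what such a stub could look like, not the generator's actual output; `STUB_TEMPLATE` and `render_string_stub` are hypothetical names, and the required OCaml headers are omitted.

```python
# C fragment for a string -> string function; braces are doubled for str.format.
STUB_TEMPLATE = """\
char* ml_{name}(char* input) {{
    CAMLparam0();                     /* no OCaml values are passed in */
    CAMLlocal2(ml_input, ml_result);  /* register locals as GC roots */
    static const value* closure = NULL;
    if (closure == NULL) {{
        closure = caml_named_value("{name}");  /* set via Callback.register */
        if (closure == NULL) {{
            CAMLreturnT(char*, NULL);  /* function was never registered */
        }}
    }}
    ml_input = caml_copy_string(input);
    ml_result = caml_callback(*closure, ml_input);
    char* result = strdup(String_val(ml_result));  /* C-owned copy */
    CAMLreturnT(char*, result);
}}
"""


def render_string_stub(name):
    """Render the C stub for an OCaml function registered under `name`."""
    if not name.isidentifier():
        raise ValueError(f"not a valid C identifier: {name!r}")
    return STUB_TEMPLATE.format(name=name)


stub = render_string_stub("encrypt")
```

The `strdup` copy matters because the pointer returned by `String_val` refers to memory the OCaml GC may move or reclaim once the stub returns; the caller then owns the duplicated buffer.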
Python Side
- encode('utf-8') - Convert Python str to bytes
- decode('utf-8') - Convert bytes to Python str
- Error handling prevents NULL pointer dereferences
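A minimal sketch of the Python-side conversion follows. It relies on the ctypes behavior that a function whose `restype` is `c_char_p` returns `bytes`, or `None` for a NULL pointer. `FFIError` and `call_string_function` are hypothetical names, and plain Python callables stand in for the native functions so the example runs without a compiled library.

```python
class FFIError(Exception):
    """Hypothetical exception raised when the native call returns NULL."""


def call_string_function(native_func, text):
    """Encode the argument, call the native function, and decode the result."""
    raw = native_func(text.encode("utf-8"))
    if raw is None:  # ctypes reports a NULL char* result as None
        raise FFIError("native function returned NULL")
    return raw.decode("utf-8")


# Stand-ins for ctypes functions declared with restype = c_char_p.
def fake_encrypt(data):
    return b"enc:" + data


def fake_failure(data):
    return None


encrypted = call_string_function(fake_encrypt, "héllo")
```

Checking for `None` before decoding turns what would be an `AttributeError` deep in wrapper code into a clear, catchable error.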
Extensibility
Adding a New Target Language
- Create src/polyglot_ffi/generators/rust_gen.py
- Implement generator class with generate() method
- Map IR types to Rust types
- Register in __init__.py
- Add tests
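As an illustration of these steps, here is a toy generator that maps IR-like type names to Rust types and exposes a `generate()` method. Everything in it is hypothetical: `SketchFunction`, `RustGeneratorSketch`, the `GENERATORS` table, and the output file name are invented for the example, and apart from `string` → `String` the Rust type choices are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SketchFunction:
    """Toy stand-in for an IR function; types are plain names."""
    name: str
    params: list  # list of (parameter name, type name) pairs
    return_type: str


class RustGeneratorSketch:
    """Hypothetical generator that renders IR-like functions as a Rust trait."""

    # Only string -> String comes from the document; the rest are assumptions.
    TYPE_MAP = {"string": "String", "int": "i64", "float": "f64", "bool": "bool", "unit": "()"}

    def map_type(self, type_name):
        try:
            return self.TYPE_MAP[type_name]
        except KeyError:
            raise ValueError(f"no Rust mapping for IR type '{type_name}'") from None

    def generate(self, functions):
        """Return a mapping of output file name to generated source text."""
        lines = ["pub trait Bindings {"]
        for func in functions:
            params = ", ".join(f"{name}: {self.map_type(kind)}" for name, kind in func.params)
            lines.append(f"    fn {func.name}({params}) -> {self.map_type(func.return_type)};")
        lines.append("}")
        return {"bindings.rs": "\n".join(lines) + "\n"}


# A table keyed by target name is one way to make generators pluggable.
GENERATORS = {"rust": RustGeneratorSketch}

add = SketchFunction("add", [("x", "int"), ("y", "int")], "int")
output = GENERATORS["rust"]().generate([add])
```

Because the generator only consumes the IR, adding it requires no changes to the parser or to the other generators.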
Adding a New Source Language
- Create src/polyglot_ffi/parsers/rust.py
- Implement parser class with parse() method
- Convert to IR
- Add tests
Testing Strategy
Current Status (v0.5.1):
- 364 tests passing
- 71% code coverage (20 modules at 100%)
- Comprehensive test suite covering all components
Test Categories:
- Unit tests (tests/unit/) - Test each component independently
  - Parser tests (OCaml .mli parsing)
  - Generator tests (ctypes, C stubs, Python, Dune)
  - Type system tests (primitives, complex types, registry)
  - IR tests (type definitions, functions, modules)
- Integration tests (tests/integration/) - Test end-to-end generation
  - Full generation pipeline
  - CLI command testing
  - Multi-file projects
  - Error handling
- Fixtures (tests/fixtures/) - Example .mli files for testing
  - Simple functions (primitives)
  - Complex types (options, lists, tuples)
  - Real-world examples
Coverage Configuration:
- Excludes CLI entry points (tested via integration tests)
- Targets business logic and core components
- Standard production coverage practices
Performance
- Parsing: < 10ms for typical files
- Generation: < 100ms total
- No code generation at runtime: all bindings are generated at build time