Architecture Layers¶

Engineering Documentation

This is internal engineering documentation. For general usage, see the Architecture page.

Package Structure¶

JoyfulJay is organized into clear architectural layers:

Text Only

src/joyfuljay/
├── core/           # Core abstractions (Flow, Packet, Pipeline, Config)
├── capture/        # Packet capture backends (Scapy, dpkt, remote)
├── extractors/     # Feature extraction modules
├── schema/         # Feature registry, profiles, schema generation
├── resources/      # Runtime resources (profiles, schemas)
├── output/         # Output formats (CSV, JSON, Parquet, database)
├── remote/         # Remote capture protocol and server
├── monitoring/     # Prometheus metrics integration
├── utils/          # Shared utilities
├── cli/            # Command-line interface
└── extensions/     # Optional Cython accelerators

Layer Responsibilities¶

Core Layer (`core/`)¶

The foundation of JoyfulJay:

packet.py: Packet dataclass representing parsed network packets
flow.py: Flow and FlowKey for bidirectional flow tracking
pipeline.py: Pipeline orchestrates extraction from capture to output
config.py: Config dataclass for all configuration options

Dependency rule: Core depends only on stdlib and typing.

Capture Layer (`capture/`)¶

Packet capture backends:

scapy_backend.py: Primary backend using Scapy (default)
dpkt_backend.py: Alternative backend using dpkt (faster for some workloads)
remote_backend.py: Client for remote capture from remote/ servers

Dependency rule: Capture depends on Core and external capture libraries.

Extractors Layer (`extractors/`)¶

Feature extraction modules:

base.py: BaseExtractor abstract class defining the extractor interface
Each extractor file implements one feature group (timing, tls, quic, etc.)

Extractor contract:

Python

class MyExtractor(BaseExtractor):
    @staticmethod
    def feature_ids() -> list[str]:
        """Return list of feature IDs this extractor produces."""

    @staticmethod
    def feature_meta() -> list[FeatureMeta]:
        """Return metadata for all features."""

    def extract(self, flow: Flow) -> dict[str, Any]:
        """Extract features from a flow."""

Dependency rule: Extractors depend on Core and Schema (for FeatureMeta).

Schema Layer (`schema/`)¶

Feature registry and profiles:

registry.py: FeatureMeta dataclass, feature ID collection
profiles.py: Profile loading (JJ-CORE, JJ-EXTENDED, JJ-EXPERIMENTAL)
tiering.py: Validation that all features are assigned to profiles
generate.py: Schema JSON generation

Dependency rule: Schema depends only on Core and resources.

Resources Layer (`resources/`)¶

Runtime resources shipped with the package:

profiles/: Profile text files listing feature IDs
schema/v1.0/: Generated feature schema JSON

Loaded via importlib.resources for reliable access in installed packages.

Remote Layer (`remote/`)¶

Remote capture protocol:

protocol.py: Message types and serialization (msgpack)
server.py: WebSocket server for streaming packets
discovery.py: mDNS/Bonjour service discovery

Relationship to capture: - remote/ defines the transport protocol and server - capture/remote_backend.py is a capture backend that consumes remote/

Text Only

┌─────────────────┐     WebSocket      ┌─────────────────┐
│ capture/        │ ◄────────────────► │ remote/         │
│ remote_backend  │   (msgpack proto)  │ server          │
└─────────────────┘                    └─────────────────┘

Output Layer (`output/`)¶

Output format handlers:

formats.py: CSV, JSON, Parquet writers
database.py: SQLite, PostgreSQL writers
kafka.py: Kafka streaming output
schema.py: Feature documentation generation

Extensions Layer (`extensions/`)¶

Optional Cython accelerators:

_fast_entropy.pyx: Accelerated entropy calculation
_fast_stats.pyx: Accelerated statistical computations
build_extensions.py: Build script for Cython modules

Pure Python fallback: All accelerated code has pure Python alternatives. If Cython modules aren't available, the library works with slightly lower performance.

Dependency Graph¶

Text Only

                    ┌──────────┐
                    │   CLI    │
                    └────┬─────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
         ▼               ▼               ▼
    ┌─────────┐    ┌──────────┐    ┌──────────┐
    │ Output  │    │ Pipeline │    │ Monitoring│
    └────┬────┘    └────┬─────┘    └────┬─────┘
         │              │               │
         │    ┌─────────┴─────────┐     │
         │    │                   │     │
         ▼    ▼                   ▼     │
    ┌─────────┐              ┌─────────┐│
    │ Schema  │◄─────────────│Extractors││
    └────┬────┘              └────┬────┘│
         │                        │     │
         ├────────────────────────┤     │
         │                        │     │
         ▼                        ▼     │
    ┌─────────┐              ┌─────────┐│
    │Resources│              │ Capture │◄┘
    └─────────┘              └────┬────┘
                                  │
                             ┌────┴────┐
                             │  Core   │
                             └─────────┘

Extension Points¶

Adding a New Extractor¶

Create extractors/my_feature.py
Implement BaseExtractor with feature_ids(), feature_meta(), extract()
Register in extractors/__init__.py
Add feature IDs to appropriate profile in resources/profiles/
Add tests in tests/unit/extractors/test_my_feature.py

Adding a New Capture Backend¶

Create capture/my_backend.py
Implement the backend interface (iter_packets_offline, iter_packets_live)
Register in capture/__init__.py
Add to Config.backend choices

Adding a New Output Format¶

Add writer function to output/formats.py or create new module
Register in CLI --format choices
Add tests

Architecture Layers¶

Package Structure¶

Layer Responsibilities¶

Core Layer (core/)¶

Capture Layer (capture/)¶

Extractors Layer (extractors/)¶

Schema Layer (schema/)¶

Resources Layer (resources/)¶

Remote Layer (remote/)¶

Output Layer (output/)¶

Extensions Layer (extensions/)¶

Dependency Graph¶

Extension Points¶

Adding a New Extractor¶

Adding a New Capture Backend¶

Adding a New Output Format¶

Core Layer (`core/`)¶

Capture Layer (`capture/`)¶

Extractors Layer (`extractors/`)¶

Schema Layer (`schema/`)¶

Resources Layer (`resources/`)¶

Remote Layer (`remote/`)¶

Output Layer (`output/`)¶

Extensions Layer (`extensions/`)¶