Feature Extractors

JoyfulJay uses a modular architecture of feature extractors to turn network traffic into features. Each extractor focuses on one aspect of network behavior and produces ML-ready features without decrypting the traffic.

How Extractors Work

Extractors process network flows (bidirectional connections) and produce feature vectors:

```mermaid
graph LR
    A[PCAP/Live] --> B[Flow Assembly]
    B --> C[Packet Processing]
    C --> D[Feature Extractors]
    D --> E[ML-Ready Output]
```
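
The flow assembly stage groups packets that belong to the same connection. The sketch below illustrates one common approach, not JoyfulJay's actual implementation: each packet is keyed on its protocol and its two (IP, port) endpoints sorted into a fixed order, so packets travelling in both directions land in the same flow. The `Packet` record and the `flow_key` and `assemble_flows` helpers are hypothetical, and the sketch omits flow timeouts and TCP connection teardown, which a real assembler would need.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (not a JoyfulJay type)."""

    ts: float      # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int     # IP protocol number, e.g. 6 for TCP
    length: int    # packet length in bytes


def flow_key(pkt: Packet) -> tuple:
    """Direction-independent key: both directions of a connection share it."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (pkt.proto, lo, hi)


def assemble_flows(packets: list[Packet]) -> dict[tuple, list[Packet]]:
    """Group packets into bidirectional flows, each kept in timestamp order."""
    flows: dict[tuple, list[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.ts):
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


packets = [
    Packet(0.00, "192.0.2.10", 50000, "198.51.100.7", 443, 6, 74),
    Packet(0.03, "198.51.100.7", 443, "192.0.2.10", 50000, 6, 74),
    Packet(0.05, "192.0.2.10", 50001, "198.51.100.7", 443, 6, 74),
]
flows = assemble_flows(packets)
print(len(flows))  # 2: the first two packets are opposite directions of one flow
```

Sorting the endpoints, rather than trusting the source/destination order of each packet, is what makes the key bidirectional.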

Each extractor implements a simple interface:

- `extract(flow)`: process a flow and return its features
- `get_feature_names()`: return the list of feature names
- `reset()`: clear state before the next flow
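
To illustrate that three-method contract, here is a minimal sketch using `typing.Protocol`. The `Extractor` protocol, the toy `PacketCountExtractor`, and the `run` loop are hypothetical stand-ins, not JoyfulJay classes, and a flow is modeled as a plain list of packet lengths. The loop calls `reset()` after each flow so that per-flow state does not leak into the next one.

```python
from typing import Any, Protocol


class Extractor(Protocol):
    """Structural stand-in for the three-method extractor interface."""

    def extract(self, flow: Any) -> dict[str, Any]: ...

    def get_feature_names(self) -> list[str]: ...

    def reset(self) -> None: ...


class PacketCountExtractor:
    """Toy extractor; a 'flow' here is just a list of packet lengths."""

    def __init__(self) -> None:
        self._count = 0  # per-flow state that reset() must clear

    def get_feature_names(self) -> list[str]:
        return ["toy_pkt_count", "toy_total_bytes"]

    def extract(self, flow: list[int]) -> dict[str, Any]:
        self._count += len(flow)
        return {"toy_pkt_count": self._count, "toy_total_bytes": sum(flow)}

    def reset(self) -> None:
        self._count = 0


def run(extractors: list[Extractor], flows: list[list[int]]) -> list[dict[str, Any]]:
    """Apply every extractor to every flow, resetting state between flows."""
    rows = []
    for flow in flows:
        row: dict[str, Any] = {}
        for ex in extractors:
            row.update(ex.extract(flow))
            ex.reset()
        rows.append(row)
    return rows


rows = run([PacketCountExtractor()], [[60, 1500, 52], [40]])
print(rows)
# [{'toy_pkt_count': 3, 'toy_total_bytes': 1612}, {'toy_pkt_count': 1, 'toy_total_bytes': 40}]
```

Without the `reset()` call, the second flow would report a packet count of 4 instead of 1.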

Available Extractors

Core Extractors

These extractors handle fundamental network traffic analysis:

| Extractor | Group | Feature count | Description |
|---|---|---|---|
| `FlowMetaExtractor` | `flow_meta` | 22 | Flow identification, duration, packet/byte counts |
| `TimingExtractor` | `timing` | 35 | Inter-arrival times, bursts, idle periods |
| `SizeExtractor` | `size` | 15 | Packet length statistics |
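
To make the core groups concrete, the sketch below computes a few statistics of the kind these extractors report, from one flow's packet timestamps (seconds) and lengths (bytes). Names such as `pkt_count` and `iat_max` are illustrative, and the exact definitions (population standard deviation, the largest gap as a hint of an idle period) are assumptions rather than JoyfulJay's formulas.

```python
import statistics


def core_features(timestamps: list[float], lengths: list[int]) -> dict[str, float]:
    """Illustrative flow_meta/timing/size-style statistics for a single flow.

    timestamps: packet arrival times in seconds, in order; lengths: bytes per packet.
    """
    iats = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "duration": timestamps[-1] - timestamps[0],          # flow_meta-style
        "pkt_count": len(lengths),                           # flow_meta-style
        "total_bytes": sum(lengths),                         # flow_meta-style
        "iat_mean": statistics.mean(iats) if iats else 0.0,  # timing-style
        # Largest gap between packets; a long gap hints at an idle period.
        "iat_max": max(iats) if iats else 0.0,
        "pkt_len_mean": statistics.mean(lengths),            # size-style
        "pkt_len_std": statistics.pstdev(lengths),           # population std (assumption)
    }


feats = core_features([0.0, 0.1, 0.3, 1.3], [100, 200, 300, 400])
print(feats["pkt_count"], feats["total_bytes"], feats["pkt_len_mean"])  # 4 1000 250
```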

Protocol Extractors

Specialized extractors that parse protocol metadata (TLS, QUIC, SSH, and DNS):

| Extractor | Group | Feature count | Description |
|---|---|---|---|
| `TLSExtractor` | `tls` | 30+ | TLS metadata, JA3/JA3S fingerprints, certificates |
| `QUICExtractor` | `quic` | 10 | QUIC version, connection IDs, SNI |
| `SSHExtractor` | `ssh` | 10 | SSH version, HASSH fingerprints |
| `DNSExtractor` | `dns` | 15 | DNS queries, response codes, TTLs |
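
JA3 is a published fingerprinting method: it joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and EC point formats) into one string, writing values in decimal, separating list items with `-` and fields with `,`, and dropping GREASE values (RFC 8701); the fingerprint is the MD5 hash of that string. The sketch below assumes the fields have already been parsed out of a ClientHello and uses made-up values. It illustrates the method, not how `TLSExtractor` implements it.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 drops them.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3(version: int, ciphers: list[int], extensions: list[int],
        curves: list[int], point_formats: list[int]) -> tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) from already-parsed ClientHello fields."""

    def field(values: list[int]) -> str:
        # Decimal values joined by '-', with GREASE entries removed.
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Made-up values: legacy version 771 (0x0303), two TLS 1.3 cipher suites plus
# one GREASE suite, three extensions, two curves, one point format.
s, digest = ja3(771, [0x1301, 0x1302, 0x0A0A], [0, 10, 11], [29, 23], [0])
print(s)  # 771,4865-4866,0-10-11,29-23,0
```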

TCP Analysis Extractors

Detailed TCP behavior analysis:

| Extractor | Group | Feature count | Description |
|---|---|---|---|
| `TCPExtractor` | `tcp` | 26 | TCP flags, handshake, retransmissions |
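
Flag counts such as `tcp_syn_count` come down to testing bits in each packet's TCP flags byte. The sketch below counts the standard flag bits (FIN 0x01, SYN 0x02, RST 0x04, PSH 0x08, ACK 0x10, URG 0x20) across a flow and checks whether the first three packets look like a SYN, SYN-ACK, ACK handshake. It assumes raw flag bytes as input; names other than `tcp_syn_count` are illustrative, and retransmission detection, which needs sequence-number tracking, is left out.

```python
# Standard TCP header flag bits.
TCP_FLAGS = {"fin": 0x01, "syn": 0x02, "rst": 0x04, "psh": 0x08, "ack": 0x10, "urg": 0x20}
SYN, ACK = TCP_FLAGS["syn"], TCP_FLAGS["ack"]


def tcp_flag_counts(flag_bytes: list[int]) -> dict[str, int]:
    """Count how many packets in the flow have each flag bit set."""
    return {
        f"tcp_{name}_count": sum(1 for flags in flag_bytes if flags & bit)
        for name, bit in TCP_FLAGS.items()
    }


def has_handshake(flag_bytes: list[int]) -> bool:
    """True if the first three packets look like SYN, SYN-ACK, ACK."""
    if len(flag_bytes) < 3:
        return False
    first, second, third = (f & (SYN | ACK) for f in flag_bytes[:3])
    return first == SYN and second == (SYN | ACK) and third == ACK


# Flags per packet: SYN, SYN-ACK, ACK, PSH-ACK, FIN-ACK.
flow_flags = [0x02, 0x12, 0x10, 0x18, 0x11]
counts = tcp_flag_counts(flow_flags)
print(counts["tcp_syn_count"], counts["tcp_ack_count"])  # 2 4
print(has_handshake(flow_flags))  # True
```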

Traffic Classification Extractors

Pattern detection for traffic fingerprinting:

| Extractor | Group | Feature count | Description |
|---|---|---|---|
| `FingerprintExtractor` | `fingerprint` | 8 | Tor, VPN, and DoH detection |
| `EntropyExtractor` | `entropy` | 6 | Payload entropy analysis |
| `PaddingExtractor` | `padding` | 14 | Constant-size and rate detection |
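
Payload entropy is commonly measured as Shannon entropy over byte values, in bits per byte: constant data scores 0, and encrypted or compressed payloads tend to score close to the maximum of 8. The sketch below shows that standard formula; it is not necessarily the exact definition `EntropyExtractor` uses, nor does it show how values might be aggregated across packets.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    # H = sum(p * log2(1/p)), equivalent to -sum(p * log2(p)).
    return sum((c / n) * math.log2(n / c) for c in Counter(payload).values())


print(shannon_entropy(b"aaaa"))            # 0.0
print(shannon_entropy(b"abab"))            # 1.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```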

Selecting Feature Groups

Extract All Features (Default)

```python
import joyfuljay as jj

# All extractors enabled by default
df = jj.extract("capture.pcap")
```

Select Specific Groups

Choose only the feature groups you need:

```python
import joyfuljay as jj

# Only timing, TLS, and fingerprint features
df = jj.extract("capture.pcap", features=["timing", "tls", "fingerprint"])
```

Using Configuration

For fine-grained control:

```python
import joyfuljay as jj

config = jj.Config(
    features=["flow_meta", "timing", "size", "tls"],
    bidirectional_split=True,  # Separate forward/backward features
    include_raw_sequences=True,  # Include SPLT sequences
    max_sequence_length=100,
)

pipeline = jj.Pipeline(config)
df = pipeline.process_pcap("capture.pcap")
```
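
SPLT stands for Sequence of Packet Lengths and Times. The sketch below shows what a raw sequence capped by a setting like `max_sequence_length` plausibly looks like, assuming each packet's length is paired with its gap from the previous packet and only the first N packets are kept. The `splt` helper is hypothetical, and JoyfulJay's real output format (for example, whether lengths are signed by direction or times are absolute) may differ.

```python
def splt(timestamps: list[float], lengths: list[int],
         max_sequence_length: int = 100) -> tuple[list[int], list[float]]:
    """Hypothetical SPLT builder: per-packet lengths and inter-arrival times.

    Only the first max_sequence_length packets are kept; the first packet's
    inter-arrival time is recorded as 0.0 (an assumption of this sketch).
    """
    n = min(len(lengths), max_sequence_length)
    if n == 0:
        return [], []
    lengths_seq = lengths[:n]
    iat_seq = [0.0] + [timestamps[i] - timestamps[i - 1] for i in range(1, n)]
    return lengths_seq, iat_seq


lens, iats = splt([0.0, 0.5, 0.75, 2.0], [60, 1500, 52, 40], max_sequence_length=3)
print(lens)  # [60, 1500, 52]
print(iats)  # [0.0, 0.5, 0.25]
```

Truncating to a fixed maximum keeps the sequence columns bounded in size, which matters when flows have thousands of packets.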

Command Line

```bash
# Select specific feature groups
jj extract capture.pcap --features timing tls fingerprint -o features.csv

# List all available features
jj features
```

Feature Naming Convention

Feature names follow a common pattern:

```text
{category}_{metric}[_{direction}]
```

Examples:

| Feature | Meaning |
|---|---|
| `iat_mean` | Mean inter-arrival time |
| `pkt_len_std` | Packet length standard deviation |
| `tcp_syn_count` | Count of TCP SYN flags |
| `tls_version` | TLS protocol version |
| `ja3_hash` | JA3 client fingerprint |

Directional Features

When `bidirectional_split=True`, features are computed separately for each direction:

| Base feature | Forward | Backward |
|---|---|---|
| `iat_mean` | `iat_mean_fwd` | `iat_mean_bwd` |
| `pkt_len_std` | `pkt_len_std_fwd` | `pkt_len_std_bwd` |
| `total_bytes` | `bytes_fwd` | `bytes_bwd` |

- Forward: client to server (packets sent by the flow initiator)
- Backward: server to client (packets sent by the flow responder)
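
A rough sketch of how a directional split can work: packets are partitioned into forward and backward lists, and each statistic is computed per list with a `_fwd` or `_bwd` suffix. This assumes the client is identified by the initiator's IP address alone (a real implementation would compare full IP and port endpoints); the `Pkt` record and helpers are hypothetical. The output names follow the table above, including `bytes_fwd` and `bytes_bwd` for the split of `total_bytes`.

```python
import statistics
from typing import NamedTuple


class Pkt(NamedTuple):
    """Hypothetical packet record: timestamp (s), source IP, length (bytes)."""

    ts: float
    src_ip: str
    length: int


def _iat_mean(timestamps: list[float]) -> float:
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.mean(gaps) if gaps else 0.0


def directional_features(packets: list[Pkt], initiator_ip: str) -> dict[str, float]:
    """Split a flow by direction and compute per-direction statistics."""
    forward = [p for p in packets if p.src_ip == initiator_ip]   # client to server
    backward = [p for p in packets if p.src_ip != initiator_ip]  # server to client
    features: dict[str, float] = {}
    for suffix, pkts in (("fwd", forward), ("bwd", backward)):
        features[f"iat_mean_{suffix}"] = _iat_mean([p.ts for p in pkts])
        features[f"bytes_{suffix}"] = sum(p.length for p in pkts)
    return features


flow = [
    Pkt(0.00, "192.0.2.10", 100),
    Pkt(0.25, "198.51.100.7", 1500),
    Pkt(0.50, "192.0.2.10", 60),
    Pkt(0.75, "198.51.100.7", 1500),
]
print(directional_features(flow, initiator_ip="192.0.2.10"))
# {'iat_mean_fwd': 0.5, 'bytes_fwd': 160, 'iat_mean_bwd': 0.5, 'bytes_bwd': 3000}
```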

Feature Types

| Type | Python type | Examples |
|---|---|---|
| Integer | `int` | Packet counts, flag counts |
| Float | `float` | Statistics, ratios, durations |
| String | `str` | Hashes, IP addresses, SNI |
| Boolean | `bool` | Detection flags (`likely_tor`) |
| List | `list[int]` | Sequences (SPLT) |
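
When preparing features for a model, it can help to sort columns by type: numeric and boolean features can usually be used directly, while strings (hashes, IP addresses, SNI) need encoding or dropping and lists (SPLT) need flattening or padding. The sketch below groups one feature row by Python type; the row is made up, and column names other than `likely_tor` are hypothetical. Note that `bool` is checked before `int`, because `isinstance(True, int)` is `True` in Python.

```python
def group_by_type(row: dict[str, object]) -> dict[str, list[str]]:
    """Group feature names by the Python type of their value."""
    groups: dict[str, list[str]] = {
        "bool": [], "int": [], "float": [], "str": [], "list": [], "other": []
    }
    for name, value in row.items():
        # Check bool before int: bool is a subclass of int in Python.
        if isinstance(value, bool):
            groups["bool"].append(name)
        elif isinstance(value, int):
            groups["int"].append(name)
        elif isinstance(value, float):
            groups["float"].append(name)
        elif isinstance(value, str):
            groups["str"].append(name)
        elif isinstance(value, list):
            groups["list"].append(name)
        else:
            groups["other"].append(name)  # e.g. missing values, if represented as None
    return groups


# Made-up feature row for illustration.
row = {
    "pkt_count": 12,
    "iat_mean": 0.05,
    "sni": "example.com",
    "likely_tor": False,
    "splt_lengths": [60, 1500, 52],
}
groups = group_by_type(row)
print(groups["bool"], groups["int"])  # ['likely_tor'] ['pkt_count']
```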

Performance Characteristics

Fast Extractors

These extractors work on packet headers only:

- `flow_meta`: basic flow statistics
- `timing`: inter-arrival times
- `size`: packet lengths
- `tcp`: TCP header analysis

Deep Inspection Extractors

These extractors require payload access:

- `tls`: TLS handshake parsing
- `quic`: QUIC header parsing
- `ssh`: SSH banner parsing
- `dns`: DNS message parsing
- `entropy`: payload entropy calculation

The pipeline automatically enables payload capture when these groups are selected.
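
The rule in the sentence above can be expressed as a set check. This sketch hard-codes the two lists from this section and is an illustration, not the pipeline's actual logic. The `fingerprint` and `padding` groups appear in neither list, so the sketch reports them separately rather than guessing which category they belong to.

```python
# Group lists copied from this section; fingerprint and padding appear in neither.
HEADER_ONLY_GROUPS = {"flow_meta", "timing", "size", "tcp"}
PAYLOAD_GROUPS = {"tls", "quic", "ssh", "dns", "entropy"}


def classify_groups(features: list[str]) -> dict[str, list[str]]:
    """Sort selected feature groups by how much packet data they need."""
    selected = set(features)
    return {
        "header_only": sorted(selected & HEADER_ONLY_GROUPS),
        "payload": sorted(selected & PAYLOAD_GROUPS),
        "unlisted": sorted(selected - HEADER_ONLY_GROUPS - PAYLOAD_GROUPS),
    }


def needs_payload(features: list[str]) -> bool:
    """True if any selected group requires payload bytes."""
    return bool(set(features) & PAYLOAD_GROUPS)


print(classify_groups(["timing", "tls", "fingerprint"]))
# {'header_only': ['timing'], 'payload': ['tls'], 'unlisted': ['fingerprint']}
print(needs_payload(["flow_meta", "timing", "size"]))  # False
```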

Recommended Feature Sets

For Traffic Classification

```python
import joyfuljay as jj

config = jj.Config(
    features=["timing", "size", "tls", "fingerprint"],
    bidirectional_split=True,
)
```

For Anomaly Detection

```python
import joyfuljay as jj

config = jj.Config(
    features=["timing", "size", "entropy", "tcp"],
    include_raw_sequences=True,
    max_sequence_length=50,
)
```

For Application Identification

```python
import joyfuljay as jj

config = jj.Config(
    features=["tls", "quic", "dns", "flow_meta"],
)
```

Minimal Feature Set (Fast)

```python
import joyfuljay as jj

config = jj.Config(
    features=["flow_meta", "timing", "size"],
)
```

Creating Custom Extractors

See the Custom Extractors Tutorial for a complete guide to creating your own extractors.

Basic structure:

```python
from joyfuljay.extractors.base import FeatureExtractor


class MyExtractor(FeatureExtractor):
    """Custom feature extractor."""

    name = "my_features"

    def get_feature_names(self) -> list[str]:
        return ["my_feature_1", "my_feature_2"]

    def extract(self, flow) -> dict:
        return {
            "my_feature_1": self._compute_feature_1(flow),
            "my_feature_2": self._compute_feature_2(flow),
        }

    # Placeholder helpers so the class is complete; replace with real logic.
    def _compute_feature_1(self, flow) -> float:
        return 0.0

    def _compute_feature_2(self, flow) -> float:
        return 0.0
```
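
Before wiring a custom extractor into a pipeline, a quick sanity check is that `extract()` returns exactly the keys that `get_feature_names()` advertises. The sketch below performs that check on any object with those two methods. The `extractor_problems` helper and the stand-in `ToyExtractor` and `MisnamedExtractor` classes are hypothetical; they do not subclass `FeatureExtractor`, and a flow is modeled as a list of packet lengths only so the example runs without JoyfulJay installed.

```python
from typing import Any


def extractor_problems(extractor: Any, flow: Any) -> list[str]:
    """Return a list of mismatches between extract() and get_feature_names()."""
    names = list(extractor.get_feature_names())
    produced = set(extractor.extract(flow))
    problems = []
    if len(names) != len(set(names)):
        problems.append("duplicate names in get_feature_names()")
    missing = sorted(set(names) - produced)
    unexpected = sorted(produced - set(names))
    if missing:
        problems.append(f"missing: {missing}")
    if unexpected:
        problems.append(f"unexpected: {unexpected}")
    return problems


class ToyExtractor:
    """Hypothetical stand-in; a 'flow' is a list of packet lengths here."""

    name = "my_features"

    def get_feature_names(self) -> list[str]:
        return ["my_feature_1", "my_feature_2"]

    def extract(self, flow: list[int]) -> dict[str, Any]:
        return {"my_feature_1": len(flow), "my_feature_2": sum(flow)}


class MisnamedExtractor(ToyExtractor):
    """Advertises my_feature_2 but returns my_feature_3 instead."""

    def extract(self, flow: list[int]) -> dict[str, Any]:
        return {"my_feature_1": len(flow), "my_feature_3": sum(flow)}


print(extractor_problems(ToyExtractor(), [60, 1500, 52]))  # []
print(extractor_problems(MisnamedExtractor(), [60]))
# ["missing: ['my_feature_2']", "unexpected: ['my_feature_3']"]
```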

See Also