Benchmark Protocol¶

This document describes the exact methodology for reproducing JoyfulJay benchmark results. It is written in the style of a methods section to enable precise replication.

Test Environment¶

Hardware Configuration¶

Component	Specification
CPU	Apple M4 (ARM64)
RAM	24 GB
Storage	NVMe SSD
OS	macOS 15.1 (Darwin 25.1.0)
Python	3.14.0a3

Software Versions¶

Package	Version
JoyfulJay	0.1.0
dpkt	1.9.8
scapy	2.5.0
nfstream	6.5.3
pandas	2.2.0
numpy	1.26.3

Datasets¶

Primary Test Files¶

Dataset	Size	Packets	Flows	Description
`smallFlows.pcap`	9 MB	14,261	641	Mixed web traffic (ISCX)
`bigFlows.pcap`	351 MB	791,615	28,471	Enterprise network (CIC)

Data Sources¶

Datasets are from the Canadian Institute for Cybersecurity (CIC) public datasets: - ISCX-IDS-2012 - CICIDS-2017

Feature Profile¶

All benchmarks use the JJ-CORE v1.0 feature profile unless otherwise specified:

Profile: JJ-CORE
Schema Version: v1.0
Feature Count: 151 features
Stability: Frozen (no breaking changes)

Extraction Configuration¶

Default Configuration¶

Python

import joyfuljay as jj

config = jj.Config(
    flow_timeout=120.0,           # Flow timeout in seconds
    terminate_on_fin_rst=False,   # Do not terminate flows on FIN/RST
    profile="JJ-CORE",            # Feature profile
    backend="dpkt",               # Packet parsing backend
)

CLI Equivalent¶

Bash

jj extract bigFlows.pcap \
    --profile JJ-CORE \
    --backend dpkt \
    --flow-timeout 120 \
    --no-terminate-on-fin-rst \
    -o features.csv

Measurement Protocol¶

Metrics Collected¶

Metric	Description	Method
Wall-clock time	Total extraction time	`time.perf_counter()`
Peak memory	Maximum memory usage	`tracemalloc.get_traced_memory()`
Throughput (MB/s)	File size / time	`file_size_mb / elapsed_seconds`
Throughput (PPS)	Packets per second	`packet_count / elapsed_seconds`
Flow count	Extracted flows	`len(df)`
Feature count	Output columns	`len(df.columns)`

Measurement Procedure¶

Warmup: Run garbage collection before each measurement
Single run: Each configuration is run once (not averaged)
Isolation: No other intensive processes during measurement
Memory tracking: tracemalloc enabled before extraction

Python

import gc
import time
import tracemalloc

gc.collect()
tracemalloc.start()

start = time.perf_counter()
df = jj.extract(pcap_path, config=config)
elapsed = time.perf_counter() - start

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

Reproducibility¶

Random Seeds¶

JoyfulJay feature extraction is deterministic and does not use random seeds. Given the same: - PCAP file - JoyfulJay version - Configuration

The output is byte-for-byte identical.

Configuration Hash¶

Each benchmark includes a configuration hash for verification:

Python

from joyfuljay.provenance import compute_config_hash

config_hash = compute_config_hash(config.to_dict())
# Returns: "sha256:abc123..."

Provenance Sidecar¶

Full provenance metadata is written alongside benchmark results:

Bash

ls benchmarks/results/
# comprehensive_benchmark_results.json
# comprehensive_benchmark_results.json.provenance.json

Running Benchmarks¶

Full Benchmark Suite¶

Bash

# Clone repository
git clone https://github.com/cenab/joyfuljay.git
cd joyfuljay

# Create virtual environment
python -m venv benchmarks/.venv
source benchmarks/.venv/bin/activate

# Install dependencies
pip install -e .
pip install nfstream pandas dpkt scapy

# Run comprehensive benchmarks
python benchmarks/comprehensive_benchmark.py

Individual Configuration¶

Bash

# Run with specific backend
python -m benchmarks.run \
    --pcap data/bigFlows.pcap \
    --backend dpkt \
    --profile JJ-CORE \
    --output results/benchmark.json

Comparison Methodology¶

Fair Comparison¶

When comparing with other tools:

Same flow definition: Bidirectional 5-tuple
Same timeout: 120 seconds idle timeout
Same dataset: Identical PCAP file
Same hardware: All measurements on same machine

NFStream Comparison Settings¶

Python

from nfstream import NFStreamer

streamer = NFStreamer(
    source="bigFlows.pcap",
    idle_timeout=120,
    active_timeout=1800,
)
df = streamer.to_pandas()

Results Reporting¶

Required Information¶

When citing benchmark results, include:

JoyfulJay version (e.g., 0.1.0)
Feature profile (e.g., JJ-CORE v1.0)
Schema version (e.g., v1.0)
Backend (e.g., dpkt)
Hardware specification
Dataset description
Configuration hash

Example Citation¶

Benchmarks were performed using JoyfulJay v0.1.0 with the JJ-CORE v1.0 feature profile (151 features) on an Apple M4 system with 24 GB RAM. The bigFlows.pcap dataset (351 MB, 791,615 packets) was processed with a 120-second flow timeout using the dpkt backend, achieving 3.6 MB/s throughput and extracting 28,471 flows with 387 features each.

Benchmark Protocol¶

Test Environment¶

Hardware Configuration¶

Software Versions¶

Datasets¶

Primary Test Files¶

Data Sources¶

Feature Profile¶

Extraction Configuration¶

Default Configuration¶

CLI Equivalent¶

Measurement Protocol¶

Metrics Collected¶

Measurement Procedure¶

Reproducibility¶

Random Seeds¶

Configuration Hash¶

Provenance Sidecar¶

Running Benchmarks¶

Full Benchmark Suite¶

Individual Configuration¶

Comparison Methodology¶

Fair Comparison¶

NFStream Comparison Settings¶

Results Reporting¶

Required Information¶

Example Citation¶

Known Limitations¶

Platform-Specific Notes¶

Comparison Caveats¶