Skip to content

JoyfulJay Documentation

JoyfulJay Logo

Extract ML-Ready Features from Encrypted Network Traffic

Analyze TLS, QUIC, SSH, and more - without decryption

Python 3.10+ License: MIT

JoyfulJay ML Ready Encrypted Traffic Research Tool

Quick Start | Installation | Features | Tutorials


The Problem: Encrypted Traffic is Hard to Analyze

Over 95% of web traffic is now encrypted with TLS. Traditional deep packet inspection can't see inside encrypted connections, yet you still need to:

  • Classify traffic (streaming vs. browsing vs. malicious)
  • Detect threats (malware C2, data exfiltration, policy violations)
  • Identify applications (what's running on your network?)
  • Detect anonymization (Tor, VPNs, DNS-over-HTTPS)

The solution? Analyze the behavior of encrypted traffic - timing patterns, packet sizes, protocol handshakes - without ever needing to decrypt it.


How JoyfulJay Works

JoyfulJay extracts 200+ behavioral features from network flows that reveal traffic characteristics without exposing content:

Python
import joyfuljay as jj

# Extract features from encrypted traffic
df = jj.extract("https_traffic.pcap")

# You now have ML-ready features like:
# - Timing: iat_mean, iat_std, burstiness_index
# - Size: pkt_len_mean, pkt_len_std, byte_asymmetry
# - TLS: ja3_hash, tls_sni, tls_cipher_count
# - Detection: likely_tor, vpn_type, likely_doh

These features can distinguish a Tor connection from a VPN connection from regular HTTPS - all without seeing the encrypted payload.


What Can You Do With JoyfulJay?

Traffic Classification

Train ML models to identify application types (Netflix vs. YouTube vs. Zoom) based on behavioral patterns.

Threat Detection

Detect malware command-and-control traffic, data exfiltration attempts, and suspicious communication patterns.

Network Forensics

Analyze captured traffic to understand what happened during a security incident.

Privacy Research

Study anonymization tools and encrypted DNS to understand their fingerprints.

QoS Monitoring

Classify traffic for quality-of-service policies without inspecting content.


Quick Start

Installation

Bash
pip install joyfuljay

Extract Features from PCAP

Python
import joyfuljay as jj

# Extract all features (200+)
df = jj.extract("traffic.pcap")
print(f"Extracted {len(df)} flows with {len(df.columns)} features")

# Select specific feature groups
df = jj.extract("traffic.pcap", features=["timing", "tls", "fingerprint"])

# Save for ML training
df.to_csv("features.csv", index=False)

Command Line Interface

Bash
# Extract to CSV
jj extract capture.pcap -o features.csv

# Select features
jj extract capture.pcap --features timing tls -o features.csv

# Live capture from network interface
jj live eth0 --duration 60 -o live.csv

# View available features
jj features

Detect Tor Traffic

Python
import joyfuljay as jj

df = jj.extract("capture.pcap", features=["fingerprint", "padding"])

# Find Tor flows
tor_flows = df[df['likely_tor'] == True]
print(f"Detected {len(tor_flows)} potential Tor connections")

# Check confidence scores
print(tor_flows[['src_ip', 'dst_ip', 'tor_confidence', 'padding_score']])

Train a Traffic Classifier

Python
import joyfuljay as jj
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Extract features
df = jj.extract("labeled_traffic.pcap", features=["timing", "size", "tls"])

# Prepare for ML (numeric features only)
X = df.select_dtypes(include=['number']).fillna(0)
y = df['label']  # Assuming you have labels

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train classifier
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)

# Evaluate
print(f"Accuracy: {clf.score(X_test, y_test):.2%}")

Key Capabilities

Capability Description
200+ Features Timing, size, TLS, QUIC, SSH, DNS, TCP, entropy, padding, graph metrics
12x Faster Process 10+ GB/s with optimized DPKT backend
Protocol Analysis Parse TLS handshakes, extract JA3/JA3S fingerprints, SNI, certificates
Traffic Fingerprinting Detect Tor, VPN (WireGuard, OpenVPN, IPSec), DNS-over-HTTPS
Enterprise Ready Kafka streaming, Prometheus metrics, parallel processing
Privacy Preserving Anonymize IPs, exclude identifiers for research datasets

Feature Groups

JoyfulJay organizes 200+ features into logical groups:

Group Description Key Features
flow_meta Flow identification src_ip, dst_ip, duration, packet counts
timing Temporal patterns iat_mean, iat_std, burstiness, burst_count
size Packet sizes pkt_len_mean, pkt_len_std, byte_asymmetry
tls TLS analysis ja3_hash, tls_sni, tls_version, cipher_count
quic QUIC protocol quic_version, quic_alpn, quic_sni
ssh SSH fingerprinting ssh_hassh, ssh_version, ssh_software
tcp TCP behavior syn_count, rst_count, handshake_complete
fingerprint Traffic detection likely_tor, likely_vpn, vpn_type, likely_doh
entropy Payload analysis entropy_payload, printable_ratio
padding Obfuscation detection is_constant_size, padding_score
connection Graph analysis unique_dsts, betweenness, community_id

Select groups when extracting:

Python
# Only what you need (faster)
df = jj.extract("capture.pcap", features=["timing", "tls"])

# Everything (comprehensive)
df = jj.extract("capture.pcap", features=["all"])

See Complete Feature Reference for detailed documentation of every feature.


Documentation Guide

Getting Started

Document Description
Installation Install JoyfulJay and optional dependencies
Quick Start Your first feature extraction in 5 minutes
Why JoyfulJay? Comparison with alternatives

Core Concepts

Document Description
Features Reference Complete guide to all 200+ features
Configuration All configuration options explained
CLI Reference Command-line interface
Architecture System design and data flow

Feature Extractors

Document Description
Extractors Overview All extractors at a glance
Flow Metadata Basic flow identification
Timing Inter-arrival times and bursts
Size Packet size analysis
TLS TLS/SSL analysis and JA3
Fingerprint Tor/VPN/DoH detection

Tutorials

Tutorial Level Description
Traffic Classification Beginner Train ML models on network traffic
Encrypted Traffic Analysis Beginner Detect Tor, VPN, DoH
Batch Processing Intermediate Process large datasets
Real-time Monitoring Advanced Kafka + Prometheus + Grafana
Custom Extractors Advanced Create your own extractors

Advanced

Document Description
Kafka Streaming Real-time feature pipelines
Prometheus Monitoring Metrics and dashboards
Remote Capture Distributed packet capture
Benchmarks Performance data

Development

Document Description
Developer Guide Contribute to JoyfulJay
Testing Run tests, add coverage
API Reference Python API documentation

System Requirements

  • Python: 3.10, 3.11, or 3.12
  • OS: Linux, macOS, Windows
  • Dependencies: Auto-installed (scapy, pandas, numpy)

Optional Extras

Bash
# Fast PCAP parsing (recommended for large files)
pip install joyfuljay[fast]

# Kafka streaming output
pip install joyfuljay[kafka]

# Prometheus metrics
pip install joyfuljay[monitoring]

# Connection graph analysis
pip install joyfuljay[graphs]

# Everything
pip install joyfuljay[all]

Example: Complete Traffic Analysis

Python
import joyfuljay as jj

# Configure for comprehensive analysis
config = jj.Config(
    features=["flow_meta", "timing", "size", "tls", "fingerprint"],
    include_raw_sequences=True,
    max_sequence_length=100,
)

# Create pipeline
pipeline = jj.Pipeline(config)

# Process PCAP
df = pipeline.process_pcap("enterprise_traffic.pcap")

# Analyze results
print(f"Total flows: {len(df)}")
print(f"TLS flows: {df['tls_detected'].sum()}")
print(f"Tor flows: {df['likely_tor'].sum()}")
print(f"VPN flows: {df['likely_vpn'].sum()}")

# Top destinations by volume
top_dsts = df.groupby('dst_ip')['total_bytes'].sum().nlargest(10)
print("Top destinations:")
print(top_dsts)

# Export for ML
df.to_parquet("features.parquet")

Getting Help

License

MIT License - see LICENSE