Quick Start Guide¶

Get started with JoyfulJay in 5 minutes.

Installation¶

Bash

pip install joyfuljay

Verify installation:

Bash

jj status

Your First Extraction¶

Python¶

Python

import joyfuljay as jj

# Extract features from a PCAP file
df = jj.extract("capture.pcap")

# View results
print(f"Flows: {len(df)}")
print(f"Features: {len(df.columns)}")
print(df.head())

Command Line¶

Bash

# Extract to CSV
jj extract capture.pcap -o features.csv

# View PCAP info
jj info capture.pcap

Select Feature Groups¶

Don't need all 387 features? Select specific groups:

Python

import joyfuljay as jj

# Only timing and TLS features
df = jj.extract("capture.pcap", features=["timing", "tls"])

# Only flow metadata
df = jj.extract("capture.pcap", features=["flow_meta"])

Available groups:

Group	Description
`flow_meta`	IPs, ports, duration, packet counts
`timing`	Inter-arrival times, burst metrics
`size`	Packet length statistics
`tls`	TLS version, ciphers, JA3 fingerprints
`quic`	QUIC protocol metadata
`ssh`	SSH version, HASSH fingerprints
`dns`	DNS queries and responses
`tcp`	TCP flags, handshake analysis
`fingerprint`	Tor/VPN/DoH detection
`entropy`	Payload entropy
`padding`	Padding detection

Live Capture¶

Capture traffic from a network interface:

Python¶

Python

import joyfuljay as jj

# Capture for 30 seconds
df = jj.extract_live("eth0", duration=30)
print(f"Captured {len(df)} flows")

Command Line¶

Bash

# Capture with BPF filter
jj live eth0 --duration 60 --filter "port 443" -o tls_traffic.csv

Note: Live capture requires root/admin privileges.

Configure Extraction¶

Use Config for fine-grained control:

Python

import joyfuljay as jj

# Create custom configuration
config = jj.Config(
    features=["timing", "tls", "fingerprint"],
    flow_timeout=30.0,          # 30 second flow timeout
    include_ip_addresses=True,  # Include IPs in output
    anonymize_ips=False,        # Don't hash IPs
)

# Create pipeline
pipeline = jj.Pipeline(config)

# Process PCAP
df = pipeline.process_pcap("capture.pcap")

# Or iterate over flows (memory efficient)
for features in pipeline.iter_features("large_capture.pcap"):
    print(features)

Output Formats¶

DataFrame (Default)¶

Python

df = jj.extract("capture.pcap")
# Returns pandas DataFrame

NumPy Array¶

Python

config = jj.Config(features=["timing", "size"])
pipeline = jj.Pipeline(config)
array = pipeline.process_pcap("capture.pcap", output_format="numpy")
# Returns numpy array with numeric features only

CSV File¶

Bash

jj extract capture.pcap -o features.csv

JSON File¶

Bash

jj extract capture.pcap -o features.json --format json

Use with Machine Learning¶

JoyfulJay output is designed for ML:

Python

import joyfuljay as jj
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Extract features
df = jj.extract("labeled_traffic.pcap", features=["timing", "size"])

# Prepare data
X = df.select_dtypes(include=['number']).fillna(0)
y = df['label']  # Your labels

# Train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Evaluate
print(f"Accuracy: {clf.score(X_test, y_test):.2%}")

Detect Encrypted Traffic Types¶

Use the fingerprint extractor:

Python

import joyfuljay as jj

df = jj.extract("capture.pcap", features=["fingerprint"])

# Check for Tor
tor_flows = df[df['likely_tor'] == True]
print(f"Tor flows: {len(tor_flows)}")

# Check for VPN
vpn_flows = df[df['likely_vpn'] == True]
print(f"VPN flows: {len(vpn_flows)}")

# Check for DoH
doh_flows = df[df['likely_doh'] == True]
print(f"DoH flows: {len(doh_flows)}")

Process Multiple Files¶

Python

import joyfuljay as jj
from pathlib import Path

# Create pipeline once
config = jj.Config(features=["timing", "tls"])
pipeline = jj.Pipeline(config)

# Process multiple files
pcap_dir = Path("./captures")
all_features = []

for pcap_file in pcap_dir.glob("*.pcap"):
    df = pipeline.process_pcap(str(pcap_file))
    df['source_file'] = pcap_file.name
    all_features.append(df)

# Combine
import pandas as pd
combined = pd.concat(all_features, ignore_index=True)
combined.to_csv("all_features.csv", index=False)

Command Line Cheat Sheet¶

Bash

# Basic extraction
jj extract capture.pcap -o features.csv

# Select features
jj extract capture.pcap -o features.csv --features timing tls

# Live capture
jj live eth0 --duration 60 -o live.csv

# With BPF filter
jj live eth0 --filter "tcp port 443" -o tls.csv

# View PCAP info
jj info capture.pcap

# Check system status
jj status

# List available features
jj features

# Export feature schema
jj schema -o schema.json

# Start remote capture server
jj serve eth0 --port 8765

# Discover remote servers
jj discover

# Connect to remote server
jj connect jj://192.168.1.100:8765 -o features.csv

Next Steps¶

Configuration - All configuration options
Features - Complete feature reference
CLI Reference - All CLI commands
Extractors - Extractor documentation
Remote Capture - Distributed capture setup