Skip to content

Compression

Compression settings and performance tuning for event archival.

Related: Archival - Setup, commands, and troubleshooting

Compression Levels

LevelCompressionSpeedUse Case
0NoneFastestTesting
1-3LowFastReal-time archival
4-6MediumBalancedMost use cases
7-9HighSlowStorage-constrained

Level 6 (default) provides a good balance of compression and speed.

Configuration

php
// config/machine.php
'archival' => [
    'level' => env('MACHINE_EVENTS_COMPRESSION_LEVEL', 6),
    'threshold' => env('MACHINE_EVENTS_ARCHIVAL_THRESHOLD', 1000),
],
OptionDefaultDescription
level6gzip compression level (0-9)
threshold1000Minimum bytes before applying compression

Level Selection Guide

ScenarioRecommended LevelRationale
Real-time archival1-3Speed matters more than size
Nightly batch jobs6Good balance (default)
Storage-constrained9Maximum compression
SSD storage4-6Fast I/O compensates
HDD storage7-9Minimize disk usage

Performance Tuning

Dispatch Limit Optimization

The dispatch_limit controls how many workflows (unique root_event_ids) are found and dispatched per scheduler run:

php
// Conservative - fewer jobs per run, more runs
'dispatch_limit' => 25,   // Memory-constrained environments

// Aggressive - more jobs per run, faster archival
'dispatch_limit' => 200,  // High-capacity environments

Throughput estimation:

  • dispatch_limit × runs_per_hour × workers = workflows/hour
  • Example: 50 × 12 × 4 = 2400 workflows/hour

Index Strategy

Ensure these indexes exist for optimal performance:

sql
-- For finding archival candidates
CREATE INDEX idx_machine_events_last_activity
ON machine_events (root_event_id, created_at);

-- For archive lookups
CREATE INDEX idx_archives_machine_archived
ON machine_event_archives (machine_id, archived_at);

Query Optimization for Large Tables

php
// Instead of loading all eligible machines at once
$service = new ArchiveService();

// Process in chunks
$eligible = $service->getEligibleInstances(limit: 100);

while ($eligible->isNotEmpty()) {
    $rootIds = $eligible->pluck('root_event_id')->toArray();
    $service->batchArchive($rootIds);

    $eligible = $service->getEligibleInstances(limit: 100);
}

Storage Estimation

Calculate Current Storage

php
use Tarfinlabs\EventMachine\Models\MachineEvent;
use Tarfinlabs\EventMachine\Support\CompressionManager;

// Sample 100 machines for estimation
$samples = MachineEvent::selectRaw('root_event_id, COUNT(*) as count')
    ->groupBy('root_event_id')
    ->limit(100)
    ->get();

$totalOriginal = 0;
$totalCompressed = 0;

foreach ($samples as $sample) {
    $events = MachineEvent::where('root_event_id', $sample->root_event_id)
        ->get()
        ->toArray();

    $jsonData = json_encode($events);
    $originalSize = strlen($jsonData);
    $totalOriginal += $originalSize;

    if (CompressionManager::shouldCompress($jsonData)) {
        $compressed = CompressionManager::compressJson($jsonData);
        $totalCompressed += strlen($compressed);
    } else {
        $totalCompressed += $originalSize;
    }
}

$ratio = $totalCompressed / $totalOriginal;
echo "Estimated compression ratio: " . round($ratio * 100) . "%";
echo "Potential savings: " . round((1 - $ratio) * 100) . "%";

Project Future Storage

php
// Current stats
$currentEvents = MachineEvent::count();
$avgEventSize = 500; // bytes, estimate from sampling
$growthRate = 10000; // events per day

// Project 30 days
$futureEvents = $currentEvents + ($growthRate * 30);
$uncompressedSize = $futureEvents * $avgEventSize;
$compressedSize = $uncompressedSize * 0.15; // 85% compression

echo "30-day projection without archival: " .
    round($uncompressedSize / 1024 / 1024 / 1024, 2) . " GB";
echo "30-day projection with archival: " .
    round($compressedSize / 1024 / 1024 / 1024, 2) . " GB";

Compression Statistics

Query archive statistics to monitor compression effectiveness:

php
use Tarfinlabs\EventMachine\Models\MachineEventArchive;

// Average compression ratio by machine type
$stats = MachineEventArchive::selectRaw('
    machine_id,
    AVG(compressed_size::float / original_size) as avg_ratio,
    SUM(original_size) as total_original,
    SUM(compressed_size) as total_compressed
')
    ->groupBy('machine_id')
    ->get();

foreach ($stats as $stat) {
    echo "{$stat->machine_id}: " . round((1 - $stat->avg_ratio) * 100) . "% savings\n";
}

Adjusting Compression for Specific Machines

If certain machine types compress poorly or are frequently accessed:

php
use Tarfinlabs\EventMachine\Services\ArchiveService;

// Custom configuration for specific machine
$service = new ArchiveService([
    'level' => 3,  // Lower compression for frequently restored machines
]);

$service->archiveMachine($rootEventId);

CPU vs Storage Trade-offs

ScenarioRecommendation
Many small eventsLevel 1-3 (overhead not worth it)
Few large eventsLevel 7-9 (compression pays off)
High restore frequencyLevel 1-3 (decompress faster)
Archive rarely accessedLevel 7-9 (storage savings)
Limited CPULevel 1-3
Limited storageLevel 7-9

Released under the MIT License.