High-Precision Fractional Bucketing for Sub-Percent Traffic Allocation

This ADR proposes enhancing the fractional operation to support high-precision traffic allocation down to 0.001% granularity by increasing the internal bucket count from 100 to 100,000 while maintaining the existing weight-based API.

Background

The current fractional operation in flagd uses a 100-bucket system that maps hash values to percentages in the range [0, 100]. This approach works well for most use cases but has significant limitations in high-throughput environments where precise sub-percent traffic allocation is required.

Currently, the smallest allocation possible is 1%, which is insufficient for:

Gradual rollouts in ultra-high-traffic systems where 1% could represent millions of users
A/B testing scenarios requiring precise control over small experimental groups
Canary deployments where operators need to start with very small traffic percentages (e.g., 0.1% or 0.01%)

The current implementation in fractional.go calculates bucket assignment using:

bucket := hashRatio * 100 // in range [0, 100]

This limits granularity to 1% increments, making it impossible to achieve the precision required for sophisticated traffic management strategies.
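To make the granularity limit concrete, the following minimal sketch maps a hash ratio to a bucket index for two bucket counts. The helper `bucketFor` and the sample ratios are hypothetical and only illustrate the arithmetic; they are not taken from fractional.go.

```go
package main

import "fmt"

// bucketFor maps a hash ratio in [0, 1) to a bucket index.
// Hypothetical helper for illustration only.
func bucketFor(hashRatio float64, buckets int) int {
	return int(hashRatio * float64(buckets))
}

func main() {
	// Two users whose hash ratios differ by less than 1%.
	a, b := 0.000125, 0.000875

	// With 100 buckets both users share a bucket, so they cannot be split.
	fmt.Println(bucketFor(a, 100), bucketFor(b, 100))

	// With 100,000 buckets they land in different buckets.
	fmt.Println(bucketFor(a, 100000), bucketFor(b, 100000))
}
```

Any two users that share a bucket always receive the same variant, which is why the bucket count sets the smallest slice of traffic that can be targeted.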

Requirements

Support traffic allocation precision down to 0.001% (3 decimal places)
Maintain backwards compatibility with existing weight-based API
Preserve deterministic bucketing behavior (same hash input always produces same bucket)
Ensure consistent bucket assignment across different programming languages
Support weight values up to a reasonable maximum that works across multiple languages
Maintain current performance characteristics
Prevent users from being moved between buckets when only distribution percentages change
Guarantee that any variant with weight > 0 receives some traffic allocation
Handle edge cases gracefully without silent failures
Validate weight configurations and provide clear error messages for invalid inputs

Considered Options

Option 1: 10,000 buckets (0.01% precision) - 1 in every 10,000 users, better but still not sufficient for many high-throughput use cases
Option 2: 100,000 buckets (0.001% precision) - 1 in every 100,000 users, meets most high-precision needs
Option 3: 1,000,000 buckets (0.0001% precision) - 1 in every 1,000,000 users, likely overkill and could impact performance
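As a rough way to compare the options, the sketch below computes the smallest addressable user group for each bucket count. The population of 50 million users is an assumed figure chosen only for illustration, and `usersPerBucket` is a hypothetical helper.

```go
package main

import "fmt"

// usersPerBucket returns how many users fall into a single bucket when a
// population is spread evenly across the given number of buckets.
func usersPerBucket(population, buckets int) int {
	return population / buckets
}

func main() {
	const population = 50_000_000 // assumed user base, for illustration only

	for _, buckets := range []int{100, 10_000, 100_000, 1_000_000} {
		fmt.Printf("%9d buckets: smallest group is about %d users\n",
			buckets, usersPerBucket(population, buckets))
	}
}
```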

Proposal

Implement a 100,000-bucket system that provides 0.001% precision while maintaining the existing integer weight-based API.

API Changes

No API changes are required. The existing fractional operation syntax remains unchanged:

"fractional": [
  { "cat": [{ "var": "$flagd.flagKey" }, { "var": "email" }] },
  ["red", 50],
  ["blue", 30],
  ["green", 20]
]
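Because the syntax is unchanged, sub-percent splits are expressed simply by choosing weights with a larger total. The sketch below, which assumes a hypothetical canary rollout with weights 1 and 999, shows how such weights would translate into buckets under the proposed 100,000-bucket scheme. The helper `bucketsFor` is illustrative and not part of flagd.

```go
package main

import "fmt"

const bucketCount = 100000

// bucketsFor returns the proportional bucket share for one weight.
func bucketsFor(weight, totalWeight int) int {
	return int(int64(weight) * bucketCount / int64(totalWeight))
}

func main() {
	// Hypothetical rollout: weights 1 and 999 express a 0.1% / 99.9% split.
	names := []string{"canary", "stable"}
	weights := []int{1, 999}
	total := 1000

	for i, name := range names {
		b := bucketsFor(weights[i], total)
		fmt.Printf("%s: %d buckets (%.3f%%)\n", name, b, float64(b)*100/bucketCount)
	}
}
```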

Implementation Changes

Bucket Count: Change from 100 to 100,000 buckets by modifying bucket calculation from hashRatio * 100 to hashRatio * 100000
Minimum Allocation Guarantee: Any variant with weight > 0 receives at least 1 bucket (0.001%)
Excess Bucket Handling: Remove excess buckets from the largest variant to maintain exactly 100,000 total buckets
Weight Sum Validation: Reject configurations where the total weight exceeds the maximum 32-bit signed integer value (2,147,483,647)
Maximum Weight Sum: Use language-specific maximum 32-bit signed integer constants for cross-platform compatibility

Minimum Allocation Guarantee

To prevent silent configuration failures, any variant with a positive weight will receive at least 0.001% allocation (1 bucket), even if the calculated percentage would round to zero. This ensures predictable behavior where positive weights always result in some traffic allocation.

Example: Configuration ["variant-a", 1], ["variant-b", 1000000]

Without guarantee: variant-a gets 0% (never selected)
With guarantee: variant-a gets 0.001%, variant-b gets 99.999%
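The example above can be reproduced with a small sketch. The function `allocate` is a simplified, hypothetical stand-in for the allocation step and handles only the proportional share and the one-bucket floor.

```go
package main

import "fmt"

const bucketCount = 100000

// allocate converts weights to bucket counts. When guarantee is true, every
// positive weight receives at least one bucket.
func allocate(weights []int, guarantee bool) []int {
	total := int64(0)
	for _, w := range weights {
		total += int64(w)
	}
	buckets := make([]int, len(weights))
	if total == 0 {
		return buckets
	}
	for i, w := range weights {
		b := int(int64(w) * bucketCount / total) // integer division truncates
		if guarantee && w > 0 && b < 1 {
			b = 1
		}
		buckets[i] = b
	}
	return buckets
}

func main() {
	weights := []int{1, 1000000} // variant-a, variant-b
	fmt.Println("without guarantee:", allocate(weights, false))
	fmt.Println("with guarantee:   ", allocate(weights, true))
}
```

Without the floor, variant-a's share truncates to zero buckets, so it would never be selected even though its weight is positive.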

Excess Bucket Management

When minimum allocations cause the total to exceed 100,000 buckets, excess buckets are removed from the variant with the largest allocation. This approach:

Maintains the minimum guarantee for small variants
Has minimal impact on large variants (small relative reduction)
Preserves deterministic behavior
Prevents bucket count overflow
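A minimal sketch of the trimming step follows. It assumes every entry belongs to a variant with positive weight, and it breaks ties by position rather than by variant name, so it is a simplification and not the proposed implementation. The name `trimExcess` is hypothetical.

```go
package main

import "fmt"

const bucketCount = 100000

// trimExcess removes buckets from the largest allocation until the total is
// at most bucketCount, never reducing an allocation below one bucket.
func trimExcess(buckets []int) []int {
	out := append([]int(nil), buckets...) // work on a copy
	total := 0
	for _, b := range out {
		total += b
	}
	excess := total - bucketCount
	for excess > 0 {
		largest := 0
		for i, b := range out {
			if b > out[largest] {
				largest = i
			}
		}
		removable := out[largest] - 1 // keep at least one bucket
		if removable <= 0 {
			break // every allocation is already at the minimum
		}
		if removable > excess {
			removable = excess
		}
		out[largest] -= removable
		excess -= removable
	}
	return out
}

func main() {
	// Weights 1, 1, 1000000 yield 1, 1, 99999 buckets after the minimum
	// guarantee, which is one bucket too many.
	fmt.Println(trimExcess([]int{1, 1, 99999}))
}
```

In this case the single excess bucket comes out of the large variant, which changes its share by 0.001 percentage points while the two small variants keep their guaranteed bucket.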

Weight Sum Validation

When the total weight sum exceeds the maximum 32-bit signed integer value (2,147,483,647), validation fails with a clear error message and the fractional evaluation does not return a variant. This prevents integer overflow issues and provides immediate feedback to users about invalid configurations.

import (
    "fmt"
    "math"
)

func validateWeightSum(variants []fractionalEvaluationVariant) error {
    var totalWeight int64 = 0
    for _, variant := range variants {
        totalWeight += int64(variant.weight)
        if totalWeight > math.MaxInt32 {
            return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d", 
                totalWeight, math.MaxInt32)
        }
    }
    return nil
}

Implementations should prefer built-in language constants (e.g., math.MaxInt32 in Go, Integer.MAX_VALUE in Java, int.MaxValue in C#) rather than hardcoded values to ensure maintainability and clarity.
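The reason for accumulating in a wider type before checking the limit can be seen in a short sketch. The functions `sumInt32` and `sumInt64` are hypothetical and exist only to contrast the two behaviors in Go, where signed integer overflow wraps around silently.

```go
package main

import (
	"fmt"
	"math"
)

// sumInt32 accumulates in 32 bits and silently wraps on overflow.
func sumInt32(weights []int32) int32 {
	var total int32
	for _, w := range weights {
		total += w
	}
	return total
}

// sumInt64 accumulates in 64 bits, so the 32-bit limit can be checked safely.
func sumInt64(weights []int32) int64 {
	var total int64
	for _, w := range weights {
		total += int64(w)
	}
	return total
}

func main() {
	weights := []int32{math.MaxInt32, 1}

	// The 32-bit sum wraps to a negative value and hides the problem.
	fmt.Println(sumInt32(weights))

	// The 64-bit sum exceeds the limit and can be rejected explicitly.
	fmt.Println(sumInt64(weights) > math.MaxInt32)
}
```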

Edge Case Handling

The implementation addresses several edge cases:

All weights are 0: Returns empty string (maintains current behavior)
Negative weights: Treated as 0 (maintains current validation behavior)
Single variant: Receives all 100,000 buckets for any positive weight
Empty variants: Returns error (maintains current validation behavior)
Weight sum overflow: Returns validation error with clear message
Multiple variants with minimum allocation: Excess buckets are removed from the largest allocation first, then from the next largest if needed, with ties broken by variant name

Maximum Weight Considerations

To ensure cross-language compatibility, we establish a maximum total weight sum equal to the maximum 32-bit signed integer value (2,147,483,647). This limit:

Works reliably across all target languages (Go, Java, .NET, JavaScript, Python)
Provides more than sufficient range for any practical use case
Prevents integer overflow issues in 32-bit signed integer systems
Allows for extremely fine-grained control (individual weights can be 1 out of 2+ billion)
Uses language-native constants for better maintainability
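One consequence of this limit is worth checking: the largest intermediate product computed during allocation, a weight multiplied by the bucket count, stays well inside both the 64-bit integer range and the range in which IEEE-754 doubles (the number type used by JavaScript) represent integers exactly. The sketch below verifies that arithmetic; the constant names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

const bucketCount = 100000

// maxProduct is the largest weight * bucketCount under the proposed limit.
const maxProduct = int64(math.MaxInt32) * bucketCount

// maxExactDouble is 2^53, the bound below which doubles hold integers exactly.
const maxExactDouble = int64(1) << 53

func main() {
	fmt.Println("largest product:", maxProduct)
	fmt.Println("fits in exact double range:", maxProduct < maxExactDouble)
}
```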

Code Changes

The following shows how the core logic in fractional.go would be modified.

const bucketCount = 100000

// bucketAllocation represents the number of buckets allocated to a variant
type bucketAllocation struct {
    variant string
    buckets int
}

func (fe *Fractional) Evaluate(values, data any) any {
    valueToDistribute, feDistributions, err := parseFractionalEvaluationData(values, data)
    if err != nil {
        fe.Logger.Warn(fmt.Sprintf("parse fractional evaluation data: %v", err))
        return nil
    }

    if err := validateWeightSum(feDistributions.weightedVariants); err != nil {
        fe.Logger.Warn(fmt.Sprintf("weight validation failed: %v", err))
        return nil
    }

    return distributeValue(valueToDistribute, feDistributions)
}

func validateWeightSum(variants []fractionalEvaluationVariant) error {
    var totalWeight int64 = 0
    for _, variant := range variants {
        totalWeight += int64(variant.weight)
        if totalWeight > math.MaxInt32 {
            return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d", 
                totalWeight, math.MaxInt32)
        }
    }
    return nil
}

func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation {
    allocations := make([]bucketAllocation, len(variants))
    totalAllocated := 0

    // Calculate initial allocations
    for i, variant := range variants {
        if variant.weight == 0 {
            allocations[i] = bucketAllocation{variant: variant.variant, buckets: 0}
        } else {
            // Calculate proportional allocation
            proportional := int((int64(variant.weight) * bucketCount) / int64(totalWeight))
            // Ensure minimum allocation of 1 bucket for any positive weight
            buckets := max(1, proportional)
            allocations[i] = bucketAllocation{variant: variant.variant, buckets: buckets}
        }
        totalAllocated += allocations[i].buckets
    }

    // Handle excess buckets by removing from largest allocation
    excess := totalAllocated - bucketCount
    if excess > 0 {
        // Sort indices by bucket count (descending) to find largest allocation
        indices := make([]int, len(allocations))
        for i := range indices {
            indices[i] = i
        }
        sort.Slice(indices, func(i, j int) bool {
            if allocations[indices[i]].buckets == allocations[indices[j]].buckets {
                return allocations[indices[i]].variant < allocations[indices[j]].variant // Tie-break by variant name
            }
            return allocations[indices[i]].buckets > allocations[indices[j]].buckets
        })

        // Remove excess from largest allocation, respecting minimum guarantee
        for _, idx := range indices {
            if excess <= 0 {
                break
            }

            // Don't reduce below 1 bucket if original weight > 0
            minAllowed := 0
            if variants[idx].weight > 0 {
                minAllowed = 1
            }

            canRemove := allocations[idx].buckets - minAllowed
            toRemove := min(excess, canRemove)
            allocations[idx].buckets -= toRemove
            excess -= toRemove
        }
    }

    return allocations
}

Replace the distribution logic:

func distributeValue(value string, feDistribution *fractionalEvaluationDistribution) string {
    if feDistribution.totalWeight == 0 {
        return ""
    }

    allocations := calculateBucketAllocations(feDistribution.weightedVariants, feDistribution.totalWeight)

    hashValue := int32(murmur3.StringSum32(value))
    hashRatio := math.Abs(float64(hashValue)) / math.MaxInt32
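    // hashRatio reaches 1 when hashValue is math.MaxInt32 and slightly exceeds 1
    // when it is math.MinInt32, so clamp to keep the bucket in range [0, bucketCount).
    bucket := min(int(hashRatio*bucketCount), bucketCount-1)

    currentBucket := 0
    for _, allocation := range allocations {
        currentBucket += allocation.buckets
        if bucket < currentBucket {
            return allocation.variant
        }
    }

    // Reached only if the allocations cover fewer than bucketCount buckets,
    // for example when integer truncation leaves a shortfall.
    return ""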
}
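The lookup performed by distributeValue can be exercised in isolation with the sketch below. It uses FNV-1a from the Go standard library as a stand-in for murmur3 so that the example has no external dependencies, which means its bucket assignments will not match flagd's. The names `allocation`, `bucketOf`, and `pickVariant` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

const bucketCount = 100000

type allocation struct {
	variant string
	buckets int
}

// bucketOf hashes a key into [0, bucketCount). FNV-1a is a stand-in here;
// the proposal itself hashes with murmur3.
func bucketOf(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	ratio := float64(h.Sum32()) / (float64(math.MaxUint32) + 1)
	return int(ratio * bucketCount)
}

// pickVariant walks the cumulative bucket ranges and returns the variant
// that owns the given bucket, or "" if no range covers it.
func pickVariant(bucket int, allocations []allocation) string {
	end := 0
	for _, a := range allocations {
		end += a.buckets
		if bucket < end {
			return a.variant
		}
	}
	return ""
}

func main() {
	// Weights 50, 30, 20 translate to these bucket counts.
	allocations := []allocation{{"red", 50000}, {"blue", 30000}, {"green", 20000}}

	// Buckets on either side of the first boundary.
	fmt.Println(pickVariant(49999, allocations), pickVariant(50000, allocations))

	// The same key always hashes to the same bucket, hence the same variant.
	key := "my-flag" + "user@example.com"
	fmt.Println(pickVariant(bucketOf(key), allocations))
}
```

Because each variant owns a contiguous range of buckets in declaration order, the order of variants in the configuration affects which users land in which variant.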

Consequences

Good, because it enables precise traffic control for high-throughput environments
Good, because it matches industry-standard precision offered by leading vendors
Good, because it maintains API backwards compatibility
Good, because integer weights remain simple to understand and configure
Good, because it prevents silent configuration failures through minimum allocation guarantee
Good, because excess handling is predictable and fair
Good, because weight validation provides clear error messages for invalid configurations
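Bad, because it represents a behavioral breaking change: under the new bucket boundaries and minimum allocation guarantee, existing configurations may assign some users to a different variant than before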
Bad, because it slightly increases memory usage for bucket calculations
Bad, because actual percentages may differ slightly from configured weights due to minimum allocations

Implementation Plan

Update flagd-testbed with comprehensive test cases for high-precision fractional bucketing across all evaluation modes
Implement core logic in flagd to support 100,000-bucket system with minimum allocation guarantee and excess handling
Update flagd providers to ensure consistent behavior and testing across language implementations
Documentation updates, migration guides, and example configurations to demonstrate the new precision capabilities