When building modern applications, dealing with failures in external dependencies is inevitable. Network issues, service outages, temporary overloads—these problems are part of the distributed systems landscape. The key isn’t preventing these failures (you can’t), but rather handling them gracefully. Polly provides several reactive resilience strategies that respond to failures as they occur. In this post, we’ll explore four essential reactive strategies: Retry, Circuit Breaker, Fallback, and Hedging. Each addresses different failure scenarios and can be combined to create robust resilience pipelines.
Understanding Reactive vs Proactive Strategies
Before diving in, it’s important to understand what makes a strategy “reactive.” Reactive strategies respond to failures after they occur. They detect problems and take corrective action. This contrasts with proactive strategies like Rate Limiting and Timeout, which prevent problems before they happen by constraining resource usage or limiting execution time.
The Retry Strategy
The Retry strategy is the most straightforward resilience pattern: if something fails, try it again. This works well for transient failures—temporary problems that resolve themselves quickly, like brief network hiccups or momentary service unavailability.
When to Use Retry
Retry is ideal for:
- Transient network failures
- Temporary service unavailability (503 Service Unavailable)
- Database deadlocks or connection timeouts
- Any failure that’s likely to succeed if you just try again
Basic Retry Configuration
builder.Services.AddResiliencePipeline("retry-pipeline", pipelineBuilder =>
{
pipelineBuilder.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(2),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true
});
});This configuration will retry up to 3 times. The backoff type setting causes the delay between requests to increase exponentaily. Intially there will be a delay of 2 seconds before the first retry, 4 seconds before the second, and 8 seconds before the third. A random jitter to prevent thundering herd problems
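Once registered, the pipeline can be resolved by name and wrapped around any operation. Here's a minimal sketch of that usage, assuming a ResiliencePipelineProvider<string> injected from DI (it lives in the Polly.Registry namespace); CallMyServiceAsync is just a placeholder for whatever call you're protecting:
public class WeatherClient
{
    private readonly ResiliencePipelineProvider<string> _pipelineProvider;

    public WeatherClient(ResiliencePipelineProvider<string> pipelineProvider)
    {
        _pipelineProvider = pipelineProvider;
    }

    public async Task<string> GetForecastAsync(CancellationToken cancellationToken)
    {
        // Resolve the pipeline registered under "retry-pipeline"
        var pipeline = _pipelineProvider.GetPipeline("retry-pipeline");

        // The callback is executed again on every retry attempt
        return await pipeline.ExecuteAsync(
            async token => await CallMyServiceAsync(token),
            cancellationToken);
    }

    // Placeholder for the real downstream call
    private Task<string> CallMyServiceAsync(CancellationToken token)
        => Task.FromResult("forecast data");
}
Because every retry re-invokes the callback passed to ExecuteAsync, the operation you wrap should be safe to repeat.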
Handling Specific Exceptions
Not all failures are worth retrying. Some errors, like authentication failures or bad requests, won’t be resolved by simply trying again. That’s why Polly allows you to configure exactly which failures should trigger a retry through the ShouldHandle predicate.
The PredicateBuilder provides a fluent API for defining retry conditions. You can chain multiple exception types together, or even combine exception handling with result inspection. This gives you fine-grained control over your retry logic.
For example, you might want to retry on transient network errors (HttpRequestException) or timeouts (TimeoutException), and you'd also want to retry when a service returns a 503 Service Unavailable status, indicating temporary overload. Here’s how you configure that:
// Requires a pipeline builder typed for HttpResponseMessage results
pipelineBuilder.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
{
    MaxRetryAttempts = 3,
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .Handle<TimeoutException>()
        .HandleResult(response =>
            response.StatusCode == HttpStatusCode.ServiceUnavailable)
});
The Danger of Naive Retries
While retries are powerful, they must be implemented carefully. Retrying too aggressively can actually make problems worse. When a service is already struggling under heavy load, a flood of retry attempts can overwhelm it further, preventing it from recovering. This can trigger cascading failures that ripple through your entire system.
One particularly problematic scenario is the “thundering herd” problem. Imagine hundreds or thousands of clients all experiencing the same failure at the same moment. If they all immediately retry, and then retry again at the same intervals, you’ve effectively amplified the load on the struggling service rather than giving it room to recover.
This is why exponential backoff and jitter are critical components of a well-designed retry strategy. Exponential backoff increases the delay between successive retry attempts, while jitter adds randomness to those delays, spreading out retry attempts across time. These techniques prevent clients from synchronizing their retries and give downstream services breathing room to stabilize.
For even better protection, combining a Retry strategy with a Circuit Breaker creates a robust defense mechanism. The circuit breaker can detect when a service is persistently failing and stop sending requests altogether, preventing your retry logic from contributing to the problem.
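As a quick sketch of that pairing (the "guarded-retry" name is hypothetical, and the circuit breaker options are explained in the next section), the circuit breaker is added first so it wraps the retry strategy and can cut off attempts once the service is persistently failing:
builder.Services.AddResiliencePipeline("guarded-retry", pipelineBuilder =>
{
    pipelineBuilder
        // Outer: stop calling a persistently failing service
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(30)
        })
        // Inner: absorb short-lived, transient failures
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });
});
With this ordering, retries absorb brief transient faults, while the circuit breaker stops the whole operation, retries included, once failures persist.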
The Circuit Breaker Strategy
The Circuit Breaker pattern prevents your application from repeatedly trying operations that are likely to fail. It “opens” (stops executing requests) when failure rates exceed a threshold, giving the failing service time to recover.
When to Use Circuit Breaker
- Calling external services that might be down
- You want to fail fast instead of waiting for timeouts
- Protecting downstream services from overload
- Preventing cascading failures in microservices
A circuit breaker operates through three distinct states. In the Closed state, the circuit breaker allows requests to pass through normally while monitoring for failures. When the failure threshold is exceeded, it transitions to the Open state, where all requests fail immediately without even attempting to call the downstream service. This prevents additional load on an already struggling service and gives it time to recover. After a configured delay period, the circuit breaker moves to the Half-Open state, where it allows a limited number of test requests through to determine if the service has recovered. If these test requests succeed, the circuit breaker returns to the Closed state and resumes normal operation. However, if the test requests fail, it immediately returns to the Open state to continue protecting the system.
| State | Behavior | Transition |
|---|---|---|
| Closed | Requests pass through normally | Opens when failure threshold is exceeded |
| Open | Requests fail immediately without attempting | Transitions to Half-Open after a delay |
| Half-Open | Allows a limited number of test requests | Closes if tests succeed, reopens if they fail |
Basic Circuit Breaker Configuration
// Requires a pipeline builder typed for HttpResponseMessage results
pipelineBuilder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
{
    FailureRatio = 0.5,
    SamplingDuration = TimeSpan.FromSeconds(30),
    MinimumThroughput = 10,
    BreakDuration = TimeSpan.FromSeconds(30),
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .HandleResult(r => !r.IsSuccessStatusCode)
});
This circuit breaker monitors incoming requests and opens when 50% of them fail within a 30-second sampling window. To avoid premature triggering on low traffic, it requires at least 10 requests before evaluating the failure ratio. Once opened, the circuit breaker remains in that state for 30 seconds, blocking all requests to give the downstream service time to recover. After this break duration expires, it transitions to a half-open state to test if the service has stabilized. The circuit breaker is configured to handle both HTTP request exceptions and responses with non-success status codes, treating either condition as a failure that counts toward the threshold.
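One practical consequence: while the circuit is open, executing through the pipeline fails immediately with a BrokenCircuitException instead of calling the service. Here's a minimal sketch of handling that at the call site; the pipeline, productsClient, and GetCachedProducts pieces are placeholders for your own code:
public async Task<ProductData> GetProductsAsync(CancellationToken cancellationToken)
{
    try
    {
        // When the circuit is open this throws BrokenCircuitException
        // without ever calling the downstream service
        return await pipeline.ExecuteAsync(
            async token => await productsClient.GetProductsAsync(token),
            cancellationToken);
    }
    catch (BrokenCircuitException)
    {
        // Fail fast: serve cached data instead of waiting on a known-bad dependency
        return GetCachedProducts();
    }
}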
Monitoring Circuit Breaker State
You can also monitor changes in the circuit’s state.
pipelineBuilder.AddCircuitBreaker(new CircuitBreakerStrategyOptions
{
    // ... configuration ...
    OnOpened = args =>
    {
        logger.LogWarning("Circuit breaker opened for {BreakDuration}",
            args.BreakDuration);
        return ValueTask.CompletedTask;
    },
    OnClosed = args =>
    {
        logger.LogInformation("Circuit breaker closed");
        return ValueTask.CompletedTask;
    },
    OnHalfOpened = args =>
    {
        logger.LogInformation("Circuit breaker testing recovery");
        return ValueTask.CompletedTask;
    }
});
The Fallback Strategy
Imagine your e-commerce site relies on a product recommendation service to show personalized suggestions to customers. When that service goes down, you have a choice: display an error message and lose potential sales, or gracefully fall back to showing your bestselling products instead. The Fallback strategy enables the second option, allowing your application to provide degraded but functional service rather than complete failure.
The Fallback strategy provides an alternative when the primary operation fails. Instead of propagating errors to the caller, you return a default value, call a backup service, or return cached data. This keeps your application running and maintains a positive user experience even when dependencies fail.
When to Use Fallback
Fallback is perfect for:
- Providing degraded functionality when services are unavailable
- Returning cached or stale data
- Using default values for non-critical operations
- Switching to a backup service
Fallback shines in scenarios where some data is better than no data, or where alternative sources can temporarily substitute for your primary service. Consider using fallback when you need to provide degraded functionality during service outages—for example, showing static content when your content management system is down, or displaying last week’s product catalog when your inventory service is unavailable.
Returning cached or stale data is one of the most common fallback patterns. A news site might show articles from its cache when the database is unreachable, or a weather app might display the last successful forecast when the weather API fails. While this data isn’t current, it’s often better than showing nothing at all.
For non-critical operations, using default values can maintain functionality without disrupting the user experience. If a personalization service fails, falling back to a generic homepage keeps users engaged rather than blocking them with an error. Similarly, when you have redundancy built into your architecture, fallback enables automatic switching to backup services—like routing to a secondary data center when the primary one becomes unavailable.
Basic Fallback Configuration
pipelineBuilder.AddFallback(new FallbackStrategyOptions<WeatherForecast>
{
    ShouldHandle = new PredicateBuilder<WeatherForecast>()
        .Handle<HttpRequestException>()
        .Handle<BrokenCircuitException>(),
    FallbackAction = args =>
    {
        // Return cached or default data
        var cachedForecast = cache.Get<WeatherForecast>("weather");
        return Outcome.FromResultAsValueTask(cachedForecast ?? GetDefaultForecast());
    }
});
Fallback to Alternative Service
pipelineBuilder.AddFallback(new FallbackStrategyOptions<ProductData>
{
    ShouldHandle = new PredicateBuilder<ProductData>()
        .Handle<HttpRequestException>(),
    FallbackAction = async args =>
    {
        // Try the backup service; backupClient (an IBackupProductService) is
        // assumed to be resolved from DI when the pipeline is registered
        var result = await backupClient.GetProductAsync(args.Context.CancellationToken);
        return Outcome.FromResult(result);
    }
});
Informing Users of Degraded Service
FallbackAction = args =>
{
    var response = GetCachedData();
    response.IsCached = true;
    response.CacheAge = DateTime.UtcNow - response.CachedAt;
    return Outcome.FromResultAsValueTask(response);
}
The Hedging Strategy
Hedging (also called “parallel requests” or “backup requests”) executes multiple requests in parallel and uses the first successful response. This strategy addresses a common challenge in distributed systems: unpredictable latency. Even when services are healthy, individual requests can experience unexpected delays due to factors like garbage collection pauses, network congestion, or thread pool exhaustion on the server.
The hedging strategy works by sending an initial request and, if it doesn’t complete within a specified time window, launching additional parallel requests to the same endpoint (or alternative endpoints). Whichever request completes first wins, and the others are canceled. This approach dramatically reduces tail latency: those frustratingly slow requests that fall in the 95th or 99th percentile of response times.
Consider a search service where most queries return in 100ms, but 5% take over 2 seconds due to cache misses or complex queries. Without hedging, users occasionally experience those painful 2-second delays. With hedging configured to send a second request after 200ms, you can cut those worst-case scenarios significantly. If the first request is slow, the second one likely won’t encounter the same bottleneck and will return quickly.
This strategy is particularly valuable for read operations against replicated data stores, time-sensitive APIs where user experience depends on fast responses, and systems where predictable performance is more important than minimizing resource usage. However, hedging does come with a trade-off: you’re increasing load on downstream services by making redundant requests, so it should be used thoughtfully and typically only for the slowest percentile of requests.
When to Use Hedging
Hedging is valuable when:
- Tail latencies are a problem (some requests are much slower than others)
- You have multiple equivalent service endpoints
- Latency is more important than resource consumption
- Idempotent operations where duplicate requests are safe
Basic Hedging Configuration
pipelineBuilder.AddHedging(new HedgingStrategyOptions<HttpResponseMessage>
{
    MaxHedgedAttempts = 2,
    Delay = TimeSpan.FromSeconds(1),
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .HandleResult(r => !r.IsSuccessStatusCode)
});
This configuration waits 1 second for the primary request to complete before launching a hedged attempt, allows up to 2 hedged attempts, and treats non-success status codes as failures worth hedging against.
Combining Strategies
The real power of Polly comes from combining strategies. Here’s a complete resilience pipeline using all four reactive strategies:
builder.Services.AddResiliencePipeline("comprehensive-pipeline", pipelineBuilder =>
{
pipelineBuilder
// Fallback provides the outer safety net
.AddFallback(new FallbackStrategyOptions<WeatherForecast>
{
ShouldHandle = new PredicateBuilder<WeatherForecast>()
.Handle<BrokenCircuitException>()
.Handle<HttpRequestException>(),
FallbackAction = args => Outcome.FromResult(GetCachedWeather())
})
// Circuit breaker prevents overwhelming failing services
.AddCircuitBreaker(new CircuitBreakerStrategyOptions
{
FailureRatio = 0.5,
SamplingDuration = TimeSpan.FromSeconds(30),
MinimumThroughput = 10,
BreakDuration = TimeSpan.FromSeconds(30)
})
// Retry handles transient failures
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true
})
// Hedging reduces tail latency
.AddHedging(new HedgingStrategyOptions<WeatherForecast>
{
MaxHedgedAttempts = 1,
Delay = TimeSpan.FromMilliseconds(500)
});
});Strategy Order Matters
The order you add strategies determines how they interact:
- Outer strategies wrap inner strategies: Fallback (outer) catches exceptions from Circuit Breaker (inner)
- Circuit Breaker should wrap Retry: This prevents retries from overwhelming an already-failing service
- Hedging is typically innermost: Individual hedged attempts should go through retry/circuit breaker logic
Best Practices
- Start simple: Begin with just Retry or Circuit Breaker, then add complexity as needed
- Monitor everything: Use telemetry to understand when and why strategies activate
- Test your resilience: Simulate failures to verify your strategies work as expected
- Consider the user experience: Fast failures (Circuit Breaker) are often better than slow retries
- Be mindful of downstream services: Aggressive retries can make problems worse
- Use exponential backoff and jitter: Prevent thundering herd problems
- Make fallbacks meaningful: Don’t just return null—provide useful degraded functionality
Conclusion
Reactive resilience strategies are essential tools for building reliable distributed systems. Each strategy addresses different failure scenarios:
- Retry handles transient failures
- Circuit Breaker prevents cascading failures and allows recovery
- Fallback provides graceful degradation
- Hedging reduces tail latency
By understanding when and how to use each strategy, and how to combine them effectively, you can build applications that gracefully handle the inevitable failures in distributed systems. The key is finding the right balance between resilience, performance, and resource consumption for your specific use case.


