C# Retry Mechanisms for Handling Transient Failures in Cloud Apps

March 25, 2025 · 2407 words · 13 min read
C#
ai generated

Retry patterns in C#

Cloud environments are prone to transient failures due to their distributed nature, shared resources, hardware replacements, and infrastructure scaling. These temporary failures include network connectivity issues, service unavailability, timeouts, and database connection problems that typically resolve themselves after a short period. Implementing proper retry logic is essential for building resilient cloud applications in C#.

Common Retry Patterns

Basic Retry Pattern

The simplest retry mechanism involves attempting an operation again after it fails, up to a maximum number of attempts.

async Task<HttpResponseMessage> RequestWithRetry(HttpClient client, string url, int maxRetryAttempts = 5)
{
    // repeat the request up to maxRetryAttempts times
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException e)
        {
            if (attempt == maxRetryAttempts)
                throw new Exception($"Request to '{url}' failed after {maxRetryAttempts} attempts!", e);
            
            // Wait before next retry
            await Task.Delay(1000);
        }
    }
    
    throw new Exception("Should not reach here");
}

While simple, this approach lacks sophistication for handling different types of failures and can potentially overload already stressed services.

Exponential Backoff

Exponential backoff increases the delay between retry attempts exponentially, giving the service more time to recover.

async Task<HttpResponseMessage> RequestWithExponentialBackoff(HttpClient client, string url, int maxRetryAttempts = 3)
{
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException)
        {
            if (attempt == maxRetryAttempts)
                throw;
                
            // Calculate exponential backoff delay
            int delayMilliseconds = (int)Math.Pow(2, attempt) * 500;
            await Task.Delay(delayMilliseconds);
        }
    }
    
    throw new Exception("Should not reach here");
}

This approach gives services breathing time to recover, especially useful when failures occur due to service overload.

Exponential Backoff with Jitter

Adding randomness to backoff times prevents the “thundering herd” problem where all clients retry simultaneously.

async Task<HttpResponseMessage> RequestWithJitter(HttpClient client, string url, int maxRetryAttempts = 3)
{
    Random random = new Random();
    
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException)
        {
            if (attempt == maxRetryAttempts)
                throw;
                
            // Calculate exponential backoff with jitter
            int baseDelay = (int)Math.Pow(2, attempt) * 500;
            int jitteredDelay = baseDelay + random.Next(0, baseDelay / 2);
            await Task.Delay(jitteredDelay);
        }
    }
    
    throw new Exception("Should not reach here");
}

Jitter (a small random variation added to each delay) helps spread retry attempts across clients, reducing the risk of overwhelming the service during recovery.

Implementing Retry Logic with Polly

Polly is a .NET resilience library that provides a clean, fluent API for implementing retry patterns. It’s widely used in C# applications for handling transient failures.

using Polly;
using Polly.Retry;

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .Or<TimeoutException>()
    .RetryAsync(3, (exception, retryCount) =>
    {
        Console.WriteLine($"Retrying due to {exception.GetType().Name}... Attempt {retryCount}");
    });

// Execute with the retry policy; the async policy awaits the HTTP call,
// so transient failures are actually caught and retried
var response = await retryPolicy.ExecuteAsync(() =>
{
    // Your operation that might fail transiently
    return httpClient.GetAsync("https://api.example.com");
});

This creates a policy that retries up to 3 times when encountering an HttpRequestException or TimeoutException.

Wait and Retry with Polly

var waitAndRetryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetry(
        retryCount: 3,
        sleepDurationProvider: retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        onRetry: (exception, timeSpan, retryCount, context) =>
        {
            Console.WriteLine($"Retry {retryCount} after {timeSpan.TotalSeconds} seconds due to: {exception.Message}");
        });

waitAndRetryPolicy.Execute(() => 
{
    // Your operation that might fail transiently
});

This policy waits between retries with an exponential backoff strategy.

Circuit Breaker Pattern with Polly

The circuit breaker prevents continuous retries when a service is consistently failing.

var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreaker(
        exceptionsAllowedBeforeBreaking: 2,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (ex, timespan) => Console.WriteLine($"Circuit broken for {timespan.TotalSeconds} seconds due to: {ex.Message}"),
        onReset: () => Console.WriteLine("Circuit reset, calls allowed again")
    );

circuitBreakerPolicy.Execute(() => 
{
    // Your operation that might fail transiently
});

After two consecutive failures, the circuit “opens” and fails fast for 30 seconds before allowing a test request.
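
While the circuit is open, any call made through the policy fails immediately with Polly's BrokenCircuitException (from the Polly.CircuitBreaker namespace). A minimal sketch of handling that case, where CallService stands in for your own operation:

try
{
    circuitBreakerPolicy.Execute(() => CallService());
}
catch (BrokenCircuitException ex)
{
    // The circuit is currently open: fail fast or serve a fallback
    // instead of hammering the struggling service.
    Console.WriteLine($"Circuit is open, skipping the call: {ex.Message}");
}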

Polly allows combining multiple policies for comprehensive resilience:

var combinedPolicy = Policy.Wrap(
    waitAndRetryPolicy,
    circuitBreakerPolicy
);

combinedPolicy.Execute(() => 
{
    // Your operation that might fail transiently
});

Here the retry policy is the outermost policy: each retry attempt passes through the circuit breaker, and once the breaker opens, further calls fail fast instead of reaching the struggling service.

Integration with ASP.NET Core and HttpClientFactory

Modern ASP.NET Core applications can leverage IHttpClientFactory with Polly for built-in retry capabilities.

builder.Services.AddHttpClient("ResilientClient")
    .AddTransientHttpErrorPolicy(policyBuilder =>
        policyBuilder.WaitAndRetryAsync(
            3,
            retryAttempt => TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100),
            onRetry: (outcome, timespan, retryCount, context) =>
            {
                Console.WriteLine($"Retry {retryCount} after {timespan.TotalMilliseconds}ms due to {outcome.Exception?.Message}");
            }
        )
    )
    .AddTransientHttpErrorPolicy(policyBuilder =>
        policyBuilder.CircuitBreakerAsync(
            5,
            TimeSpan.FromSeconds(30)
        )
    );

This registers an HttpClient with both retry and circuit breaker policies that handle transient HTTP errors (5xx responses, 408 responses, and network failures).
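
A minimal sketch of consuming the registered client; WeatherService and the endpoint URL are hypothetical, and the retry and circuit breaker policies run inside the handler pipeline, so no extra retry code is needed at the call site:

public class WeatherService
{
    private readonly HttpClient _client;

    // IHttpClientFactory is injected by the DI container
    public WeatherService(IHttpClientFactory httpClientFactory) =>
        _client = httpClientFactory.CreateClient("ResilientClient");

    public async Task<string> GetForecastAsync()
    {
        // Transient failures in this call are retried by the configured policies
        var response = await _client.GetAsync("https://api.example.com/forecast");
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}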

Azure SDK Retry Mechanisms

Azure SDK includes built-in retry mechanisms for Azure services.

// Configure Azure clients with custom retry settings
builder.Services.AddAzureClients(clientBuilder =>
{
    // Global defaults
    clientBuilder.ConfigureDefaults(builder.Configuration.GetSection("AzureDefaults"));
    
    // Service-specific retry policy
    clientBuilder.AddBlobServiceClient(builder.Configuration.GetSection("Storage"))
        .ConfigureOptions(options => 
        {
            options.Retry.Mode = Azure.Core.RetryMode.Exponential;
            options.Retry.MaxRetries = 5;
            options.Retry.MaxDelay = TimeSpan.FromSeconds(120);
        });
});

Azure SDK supports configuring retry behavior through code or configuration files.
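
Once registered, the client can be injected and used directly; the configured retry policy applies to each call without any extra retry code. A minimal sketch (DocumentStore is a hypothetical consumer):

using Azure.Storage.Blobs;

public class DocumentStore
{
    private readonly BlobServiceClient _blobServiceClient;

    // The BlobServiceClient registered above is resolved from DI
    public DocumentStore(BlobServiceClient blobServiceClient) =>
        _blobServiceClient = blobServiceClient;

    public async Task UploadAsync(string containerName, string blobName, Stream content)
    {
        var container = _blobServiceClient.GetBlobContainerClient(containerName);
        // Transient failures here are retried automatically using the
        // exponential policy configured at registration (up to 5 retries)
        await container.GetBlobClient(blobName).UploadAsync(content, overwrite: true);
    }
}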

Deferred Processing Queue Pattern

For long-running operations, consider using a deferred processing queue:

// When a database operation fails
async Task ProcessOperation(object data)
{
    try
    {
        await database.SaveDataAsync(data);
    }
    catch (Exception ex) when (IsTransient(ex))
    {
        // Instead of immediate retry, add to a deferred queue
        await deferredQueue.EnqueueAsync(new DeferredOperation
        {
            Data = data,
            OperationType = OperationType.DatabaseSave,
            AttemptCount = 1
        });
    }
}

// A scheduled job processes the deferred queue
async Task ProcessDeferredQueue()
{
    var operations = await deferredQueue.DequeueAsync(batchSize: 10);
    foreach (var operation in operations)
    {
        try
        {
            switch (operation.OperationType)
            {
                case OperationType.DatabaseSave:
                    await database.SaveDataAsync(operation.Data);
                    break;
                // Handle other operation types
            }
        }
        catch (Exception ex) when (IsTransient(ex) && operation.AttemptCount < maxAttempts)
        {
            // Requeue with increased attempt count
            operation.AttemptCount++;
            await deferredQueue.EnqueueAsync(operation);
        }
    }
}

This pattern is useful for background processes where immediate success isn’t required.
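
The snippet above assumes a couple of supporting types alongside the deferred queue itself; a minimal sketch of what they might look like (the names and shape are illustrative, not a prescribed design):

// Illustrative supporting types assumed by the deferred-queue snippet
public enum OperationType
{
    DatabaseSave
    // Add other operation types as needed
}

public class DeferredOperation
{
    public object Data { get; set; }
    public OperationType OperationType { get; set; }
    public int AttemptCount { get; set; }
}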

Best Practices for Retry Logic

1.  Properly classify exceptions: Only retry operations for truly transient failures (network issues, temporary service unavailability), not for permanent failures (authentication issues, validation errors); a sketch of such a check follows this list.
2.  Use idempotent operations: Ensure operations can be safely retried without side effects.
3.  Consider operation timeout: Set appropriate timeouts to avoid long-running operations.
4.  Adjust retry strategy based on context: Use different retry policies for background vs. interactive operations.
5.  Implement proper logging: Track retry attempts to identify recurring issues.
6.  Use jitter for distributed systems: Add randomness to prevent synchronized retry storms.
7.  Combine with circuit breakers: Stop retrying when a service is consistently failing.
8.  Don’t overdo retries: Excessive retries can worsen service overload conditions.
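
A minimal sketch of the classification helper mentioned in point 1 and used in the deferred-queue example (which exceptions are truly transient depends on your dependencies, so treat this as an assumption to adapt):

// Illustrative: treat timeouts and network-level failures as transient
static bool IsTransient(Exception ex) => ex switch
{
    TimeoutException => true,
    HttpRequestException => true,
    TaskCanceledException => true, // HttpClient timeouts surface as TaskCanceledException
    _ => false
};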

Using these retry mechanisms and best practices will significantly improve your C# cloud application’s resilience against transient failures, resulting in better user experience and system stability.

Generated with Perplexity.

If you want to get in touch and hear more about this topic, feel free to contact me.

© 2025 Andrei Bodea