C# Retry Mechanisms for Handling Transient Failures in Cloud Apps
Cloud environments are prone to transient failures due to their distributed nature, shared resources, hardware replacements, and infrastructure scaling. These temporary failures include network connectivity issues, service unavailability, timeouts, and database connection problems that typically resolve themselves after a short period. Implementing proper retry logic is essential for building resilient cloud applications in C#.
Common Retry Patterns
- Basic Retry Pattern
- Exponential Backoff
- Exponential Backoff with Jitter
- Implementing Retry Logic with Polly
- Wait and Retry with Polly
- Circuit Breaker Pattern with Polly
- Integration with ASP.NET Core and HttpClientFactory
- Azure SDK Retry Mechanisms
- Deferred Processing Queue Pattern
- Best Practices for Retry Logic
Basic Retry Pattern
The simplest retry mechanism involves attempting an operation again after it fails, up to a maximum number of attempts.
async Task<HttpResponseMessage> RequestWithRetry(HttpClient client, string url, int maxRetryAttempts = 5)
{
    // Repeat the request up to maxRetryAttempts times
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException e)
        {
            // Give up on the last attempt, preserving the original exception
            if (attempt == maxRetryAttempts)
                throw new Exception($"Request to '{url}' failed after {maxRetryAttempts} attempts!", e);
            // Wait before the next retry
            await Task.Delay(1000);
        }
    }
    throw new Exception("Should not reach here");
}
While simple, this approach lacks sophistication for handling different types of failures and can potentially overload already stressed services.
Exponential Backoff
Exponential backoff increases the delay between retry attempts exponentially, giving the service more time to recover.
async Task<HttpResponseMessage> RequestWithExponentialBackoff(HttpClient client, string url, int maxRetryAttempts = 3)
{
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException)
        {
            if (attempt == maxRetryAttempts)
                throw;
            // Calculate exponential backoff delay: 1s, 2s, 4s, ...
            int delayMilliseconds = (int)Math.Pow(2, attempt) * 500;
            await Task.Delay(delayMilliseconds);
        }
    }
    throw new Exception("Should not reach here");
}
This approach gives services breathing room to recover, which is especially useful when failures are caused by service overload.
Exponential Backoff with Jitter
Adding randomness to backoff times prevents the “thundering herd” problem where all clients retry simultaneously.
async Task<HttpResponseMessage> RequestWithJitter(HttpClient client, string url, int maxRetryAttempts = 3)
{
    Random random = new Random();
    for (int attempt = 1; attempt <= maxRetryAttempts; attempt++)
    {
        try
        {
            return await client.GetAsync(url);
        }
        catch (HttpRequestException)
        {
            if (attempt == maxRetryAttempts)
                throw;
            // Calculate exponential backoff with jitter
            int baseDelay = (int)Math.Pow(2, attempt) * 500;
            int jitteredDelay = baseDelay + random.Next(0, baseDelay / 2);
            await Task.Delay(jitteredDelay);
        }
    }
    throw new Exception("Should not reach here");
}
Jitter (a random variation added to each delay) helps spread out retry attempts across clients, reducing the risk of overwhelming the service during recovery.
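A common variation, often called "full jitter", draws the entire delay at random between zero and an exponentially growing cap instead of adding randomness on top of a fixed backoff. The sketch below is one possible formulation; the helper name and 500 ms base are illustrative, not part of the code above.

// A sketch of the "full jitter" variation: the whole delay is picked
// uniformly in [0, cap), where the cap doubles with each attempt.
static readonly Random _random = new Random();

static TimeSpan FullJitterDelay(int attempt, int baseMilliseconds = 500)
{
    int cap = (int)Math.Pow(2, attempt) * baseMilliseconds;
    return TimeSpan.FromMilliseconds(_random.Next(0, cap));
}

// Usage inside the retry loop: await Task.Delay(FullJitterDelay(attempt));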
Implementing Retry Logic with Polly
Polly is a .NET resilience library that provides a clean, fluent API for implementing retry patterns. It’s widely used in C# applications for handling transient failures.
using Polly;
using Polly.Retry;

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .Or<TimeoutException>()
    .RetryAsync(3, (exception, retryCount) =>
    {
        Console.WriteLine($"Retrying due to {exception.GetType().Name}... Attempt {retryCount}");
    });

// Execute the async operation through the retry policy
var response = await retryPolicy.ExecuteAsync(() =>
    httpClient.GetAsync("https://api.example.com"));
This creates a policy that retries up to 3 times when encountering an HttpRequestException or TimeoutException.
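Note that HttpClient.GetAsync does not throw for error status codes such as 500 or 503, so an exception-only policy will not retry them. Polly can also retry based on the result; here is a minimal sketch, where the choice of status codes treated as transient is an assumption you should adapt:

// Retry when the call throws, or when the response is a 5xx or 408 status.
var transientHttpPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r =>
        (int)r.StatusCode >= 500 || (int)r.StatusCode == 408)
    .RetryAsync(3);

var apiResponse = await transientHttpPolicy.ExecuteAsync(() =>
    httpClient.GetAsync("https://api.example.com"));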
Wait and Retry with Polly
var waitAndRetryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetry(
        retryCount: 3,
        sleepDurationProvider: retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        onRetry: (exception, timeSpan, retryCount, context) =>
        {
            Console.WriteLine($"Retry {retryCount} after {timeSpan.TotalSeconds} seconds due to: {exception.Message}");
        });

waitAndRetryPolicy.Execute(() =>
{
    // Your operation that might fail transiently
});
This policy waits between retries with an exponential backoff strategy.
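For asynchronous work, the same policy shape is available as WaitAndRetryAsync and executed with ExecuteAsync. A minimal sketch follows; CallServiceAsync is a placeholder for your own operation:

var waitAndRetryAsyncPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        onRetry: (exception, timeSpan, retryCount, context) =>
        {
            Console.WriteLine($"Retry {retryCount} after {timeSpan.TotalSeconds} seconds due to: {exception.Message}");
        });

// CallServiceAsync is a hypothetical async operation that may fail transiently.
await waitAndRetryAsyncPolicy.ExecuteAsync(() => CallServiceAsync());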
Circuit Breaker Pattern with Polly
The circuit breaker prevents continuous retries when a service is consistently failing.
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreaker(
        exceptionsAllowedBeforeBreaking: 2,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (ex, timespan) => Console.WriteLine($"Circuit broken for {timespan.TotalSeconds} seconds due to: {ex.Message}"),
        onReset: () => Console.WriteLine("Circuit reset, calls allowed again")
    );

circuitBreakerPolicy.Execute(() =>
{
    // Your operation that might fail transiently
});
After two consecutive failures, the circuit “opens” and fails fast for 30 seconds before allowing a test request.
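While the circuit is open, calls through the policy throw a BrokenCircuitException immediately instead of invoking the operation. Callers can catch it to fail fast or fall back; a minimal sketch, where CallRemoteService is a placeholder for your own call:

using Polly.CircuitBreaker;

try
{
    circuitBreakerPolicy.Execute(() => CallRemoteService());
}
catch (BrokenCircuitException)
{
    // The breaker is open: skip the call and use cached data or a fallback.
    Console.WriteLine("Circuit is open; serving fallback response.");
}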
Polly allows combining multiple policies for comprehensive resilience:
var combinedPolicy = Policy.Wrap(
    waitAndRetryPolicy,
    circuitBreakerPolicy
);

combinedPolicy.Execute(() =>
{
    // Your operation that might fail transiently
});
Policy.Wrap applies policies from the outside in: here the retry policy is outermost, so each attempt passes through the circuit breaker, and once the breaker opens, the remaining retries fail fast instead of hitting the struggling service.
Integration with ASP.NET Core and HttpClientFactory
Modern ASP.NET Core applications can leverage IHttpClientFactory with Polly for built-in retry capabilities.
builder.Services.AddHttpClient("ResilientClient")
    .AddTransientHttpErrorPolicy(policyBuilder =>
        policyBuilder.WaitAndRetryAsync(
            3,
            retryAttempt => TimeSpan.FromMilliseconds(Math.Pow(2, retryAttempt) * 100),
            onRetry: (outcome, timespan, retryCount, context) =>
            {
                Console.WriteLine($"Retry {retryCount} after {timespan.TotalMilliseconds}ms due to {outcome.Exception?.Message}");
            }
        )
    )
    .AddTransientHttpErrorPolicy(policyBuilder =>
        policyBuilder.CircuitBreakerAsync(
            5,
            TimeSpan.FromSeconds(30)
        )
    );
This registers an HttpClient with both retry and circuit breaker policies that handle transient HTTP errors (5xx responses, 408 responses, and network failures).
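Consumers then request the named client from IHttpClientFactory and use it normally; the policies run transparently on every request. A minimal sketch, where CatalogService and the endpoint URL are illustrative:

public class CatalogService
{
    private readonly HttpClient _client;

    public CatalogService(IHttpClientFactory httpClientFactory)
    {
        // The returned client already carries the retry and circuit breaker policies.
        _client = httpClientFactory.CreateClient("ResilientClient");
    }

    public async Task<string> GetCatalogAsync()
    {
        var response = await _client.GetAsync("https://api.example.com/catalog");
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}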
Azure SDK Retry Mechanisms
Azure SDK includes built-in retry mechanisms for Azure services.
// Configure Azure clients with custom retry settings
builder.Services.AddAzureClients(clientBuilder =>
{
    // Global defaults
    clientBuilder.ConfigureDefaults(builder.Configuration.GetSection("AzureDefaults"));

    // Service-specific retry policy
    clientBuilder.AddBlobServiceClient(builder.Configuration.GetSection("Storage"))
        .ConfigureOptions(options =>
        {
            options.Retry.Mode = Azure.Core.RetryMode.Exponential;
            options.Retry.MaxRetries = 5;
            options.Retry.MaxDelay = TimeSpan.FromSeconds(120);
        });
});
Azure SDK supports configuring retry behavior through code or configuration files.
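The same RetryOptions can also be set when constructing a client directly, outside of dependency injection. A minimal sketch assuming the Azure.Storage.Blobs package and a connection string of your own:

using Azure.Core;
using Azure.Storage.Blobs;

// connectionString is a placeholder for your storage connection string.
var options = new BlobClientOptions();
options.Retry.Mode = RetryMode.Exponential;
options.Retry.MaxRetries = 5;
options.Retry.Delay = TimeSpan.FromSeconds(2);       // initial delay between attempts
options.Retry.MaxDelay = TimeSpan.FromSeconds(120);  // cap on the backoff

var blobServiceClient = new BlobServiceClient(connectionString, options);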
Deferred Processing Queue Pattern
For long-running operations, consider using a deferred processing queue:
// When a database operation fails
async Task ProcessOperation(object data)
{
    try
    {
        await database.SaveDataAsync(data);
    }
    catch (Exception ex) when (IsTransient(ex))
    {
        // Instead of an immediate retry, add the work to a deferred queue
        await deferredQueue.EnqueueAsync(new DeferredOperation
        {
            Data = data,
            OperationType = OperationType.DatabaseSave,
            AttemptCount = 1
        });
    }
}

// A scheduled job processes the deferred queue
async Task ProcessDeferredQueue()
{
    var operations = await deferredQueue.DequeueAsync(batchSize: 10);
    foreach (var operation in operations)
    {
        try
        {
            switch (operation.OperationType)
            {
                case OperationType.DatabaseSave:
                    await database.SaveDataAsync(operation.Data);
                    break;
                // Handle other operation types
            }
        }
        catch (Exception ex) when (IsTransient(ex) && operation.AttemptCount < maxAttempts)
        {
            // Requeue with an increased attempt count
            operation.AttemptCount++;
            await deferredQueue.EnqueueAsync(operation);
        }
    }
}
This pattern is useful for background processes where immediate success isn’t required.
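The snippets above call an IsTransient helper that decides whether an exception is worth retrying. A minimal sketch is shown below; which exception types and error codes count as transient depends on your data store, so treat these checks as illustrative assumptions rather than a complete list (the SQL error numbers are ones commonly treated as transient for Azure SQL):

using Microsoft.Data.SqlClient;

static bool IsTransient(Exception ex) =>
    ex is TimeoutException
    || ex is HttpRequestException
    || (ex is SqlException sqlEx &&
        (sqlEx.Number == -2 || sqlEx.Number is 1205 or 40197 or 40501 or 40613));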
Best Practices for Retry Logic
1. Properly classify exceptions: Only retry truly transient failures (network issues, temporary service unavailability), not permanent failures (authentication errors, validation errors).
2. Use idempotent operations: Ensure operations can be safely retried without side effects.
3. Consider operation timeout: Set appropriate timeouts to avoid long-running operations.
4. Adjust retry strategy based on context: Use different retry policies for background vs. interactive operations.
5. Implement proper logging: Track retry attempts to identify recurring issues (see the sketch after this list).
6. Use jitter for distributed systems: Add randomness to prevent synchronized retry storms.
7. Combine with circuit breakers: Stop retrying when a service is consistently failing.
8. Don’t overdo retries: Excessive retries can worsen service overload conditions.
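As an example of practice 5, the onRetry hooks shown earlier can write to the application's logger instead of the console. A minimal sketch assuming an ILogger instance (here called logger) is injected elsewhere:

var loggedRetryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        3,
        retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        onRetry: (exception, delay, retryCount, context) =>
        {
            // logger is an ILogger<YourService> provided by dependency injection.
            logger.LogWarning(exception,
                "Retry {RetryCount} after {DelaySeconds}s", retryCount, delay.TotalSeconds);
        });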
Using these retry mechanisms and best practices will significantly improve your C# cloud application’s resilience against transient failures, resulting in better user experience and system stability.