
I know nothing. But I'd imagine the number of 'events' generated during this period of downtime will eclipse that number every minute.




"I felt a great disturbance in us-east-1, as if millions of outage events suddenly cried out in terror and were suddenly silenced"

(Be interesting to see how many events currently going to DynamoDB are actually outage information.)


I wonder how many companies have properly designed their clients, so that the delay before each re-attempt is randomised and the back-off between attempts grows exponentially.
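
Something like this is what I mean; a rough Python sketch (the base delay, cap, and attempt count are made-up numbers):

    import random
    import time

    def call_with_backoff(request, max_attempts=8, base=0.5, cap=30.0):
        # Retry with exponential backoff and full jitter.
        for attempt in range(max_attempts):
            try:
                return request()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                # Delay doubles each attempt (capped); the actual sleep is
                # drawn uniformly from [0, delay) so clients don't line up.
                delay = min(cap, base * (2 ** attempt))
                time.sleep(random.uniform(0, delay))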

Nowadays I think a single immediate retry is preferred over exponential backoff with jitter.

If you ran into a problem that an instant retry can't fix, chances are you'll be waiting so long that your own customer doesn't care anymore.


Most companies will use the AWS SDK client's default retry policy.
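
For what it's worth, in boto3 that policy is configurable in one place; a minimal sketch (the attempt count is just an example):

    import boto3
    from botocore.config import Config

    # 'standard' retry mode already does exponential backoff with jitter;
    # 'adaptive' adds client-side rate limiting on top of it.
    cfg = Config(retries={"max_attempts": 5, "mode": "standard"})
    dynamodb = boto3.client("dynamodb", config=cfg)

Most code never changes this and just inherits whatever the SDK default happens to be.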

Why randomized?

It’s the Thundering Herd Problem.

See https://en.wikipedia.org/wiki/Thundering_herd_problem

In short, if everything retries on the same schedule you'll end up with surges of requests followed by lulls. You want that evened out to reduce stress on the server end.
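
Back-of-the-envelope illustration (Python; the client count and retry window are made-up numbers):

    import random
    from collections import Counter

    clients = 1_000_000
    window = 30  # seconds until some particular retry attempt

    # Fixed schedule: every client retries at exactly t = 30s.
    fixed = Counter(window for _ in range(clients))

    # Full jitter: each client instead sleeps uniform(0, 30).
    jittered = Counter(int(random.uniform(0, window)) for _ in range(clients))

    print("fixed peak:   ", max(fixed.values()))     # all 1,000,000 at once
    print("jittered peak:", max(jittered.values()))  # roughly 1/30th of that

Same total load on the server, but the jittered version never hits it with the whole herd in a single second.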


Thank you. Bonsai and adzm as well. :)

It's just a safe pattern that's easy to implement. If your services' back-off attempts happen to be synced, for whatever reason, then even though they're backing off and not slamming AWS with retries, they might all slam your backend at once when AWS comes back online.

It's also polite to external services but at the scale of something like AWS that's not a concern for most.


> they might slam your backend

Heh


Helps distribute retries rather than having millions synchronize


