It all started when I was trying to buy something from Mercado Livre, one of the biggest portals here in Brazil. I couldn't load account specifics or my cart, or change other profile settings, like adding a credit card.
So I decided to buy it from Amazon, same behavior. Went to Brazil's Down Detector and it seems to me that all services that rely on AWS are failing.
Went to the US Down Detector site and I am seeing what seems to be the same cascading failure right now.
Any1 facing similar problems?
AWS Global Infrastructure
The AWS Cloud spans 120 Availability Zones within 38 Geographic Regions, with announced plans for 10 more Availability Zones and 3 more AWS Regions in the Kingdom of Saudi Arabia, Chile, and the AWS European Sovereign Cloud.
I thought that for companies like Amazon, Delta, Snapchat, Google and Venmo a multi-region setup was standard. One of the main premises of cloud services is resilience to the outage of a single region or node. And yet, once us-east-1 is down, it's all over.
Was that the fault of AWS or those who used AWS tied to one region?
Edit: from the responses I came to the conclusion that I'm gonna have my own resiliency with blackjack and hookers nginx and multiple cloud providers, and it's probably gonna work better than AWS.
Hi All,
As title suggests, I just popped in as a non-technical non-user aside from knowing that Flickr is down and has been all day long now, and apparently many other large sites, Reddit included.
Anyone here know the real deal and what's what and can explain it to me like I'm 5?
This AWS outage reminded me of how reliant many shops are on the platform. Do you think anyone will move towards a different cloud provider or a multi-cloud approach to ensure stability? Or will they just chalk it up to a black swan event and move on?
Love my Immich instance on a $15/month VDS. Still going strong when half the internet is down.
Isn't the point of availability zones to prevent shit like this from happening?
I've heard a ton about "Well everything's on the cloud, so a server goes down, and there goes the whole internet" which does not really make sense to me on some level. Isn't this stuff multiple-times redundant? Aren't there fallbacks, safeties, etc?
I thought modern networks are de-centralized and redundant. Why wasn't AWS?
If I lose my 436 day streak over this, I'm going to be really pissed. I've tried commenting. I've tried up votes on Posts and Comments. Now I'm trying a new Post.
EDIT EDIT: This is a past event although it looks like there are still errors trickling in. Leaving this up for a week and then potting it.
EDIT: AWS now appears to be largely working.
In terms of possible root causes, as hypothesised by u/tiredITguy42:
So what most likely happened:
The DNS entry for the DynamoDB API was bad.
Services can't access DynamoDB.
It seems AWS is storing IAM rules in DynamoDB.
Users can't access services, as the services can't get access to resources resolved.
It seems that systems with main operation in other regions were OK even if some are running stuff in us-east-1 as well. It seems that they maintained access to DynamoDB in their region, so they could resolve access to resources in us-east-1.
These are just pieces I put together; we need to wait for a proper postmortem analysis.
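To make the hypothesis above a bit more concrete, here is a rough Python sketch of a client-side check that looks up the regional DynamoDB hostnames in order and picks the first region whose name still resolves. The region list, the function names and the injectable `can_resolve` parameter are all my own assumptions for illustration, not anything AWS ships. It only tests DNS resolution, so it says nothing about whether your data is actually replicated to the fallback region.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical preference order; a real deployment would pick its own regions.
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]


def endpoint_for(region: str) -> str:
    """Build the regional DynamoDB hostname for a region."""
    return f"dynamodb.{region}.amazonaws.com"


def resolves(host: str) -> bool:
    """Return True if the hostname currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, 443)) > 0
    except socket.gaierror:
        return False


def pick_region(
    regions: Iterable[str],
    can_resolve: Callable[[str], bool] = resolves,
) -> Optional[str]:
    """Return the first region whose DynamoDB endpoint resolves, else None.

    can_resolve is injectable so the logic can be exercised without network
    access (an assumption made purely for testability).
    """
    for region in regions:
        if can_resolve(endpoint_for(region)):
            return region
    return None


if __name__ == "__main__":
    # Simulate the hypothesised failure: only the us-east-1 name is broken.
    def broken_us_east_1(host: str) -> bool:
        return "us-east-1" not in host

    print(pick_region(REGIONS, broken_us_east_1))
```

With the simulated resolver, the us-east-1 lookup fails and the next region in the list is chosen, which matches the observation that systems with a working path to DynamoDB in another region stayed up. Whether that fallback is actually useful depends on the data being there too.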
As some of you can tell, AWS is currently experiencing outages.
In order to keep the subreddit a bit cleaner, post your gripes, stories, theories, memes etc. into here.
We salute all those on call getting shouted at.
The Amazon Web Services (AWS) outage on Monday doesn’t seem to have been caused by a cyberattack. But critics are already pointing to the fact that Amazon has laid off at least 27,000 employees, including – surprise – senior engineers, since 2022.
It's so weird. The main reason people use AWS is for safety and stability; this fails, and fails massively, but somehow it doesn't move the stock even a little bit?
What is going on with this market? Does a CEO need to commit war crimes on pandas or smth for the stock to go down? (Of course, if this CEO is Elon, then Tesla would go to a new ATH.)
Edit: Ok, I clearly don't understand the stock or psychology. Crowdstrike created a massive outage: the stock gets a massive hit, people say that they are evil, their monopoly is bad and will be broken, everybody panics and the company will probably go under. Amazon creates a massive outage: yeah, bullish af, they are evil, their monopoly is great and we hope it continues, everybody cheers that so many companies can be affected.
Short summary: https://www.bleepingcomputer.com/news/technology/amazon-explains-the-cause-behind-tuesday-s-massive-aws-outage/
Full summary: https://aws.amazon.com/message/12721/