By the looks of it, you are well aware of what you are trying to achieve. Even though you say it is not the NACLs, I would check them one more time, as it is easy to overlook something minor. Take into account the excerpt below, taken from this AWS troubleshooting article, and make sure that you have the right S3 CIDRs in your rules for the respective region:
Make sure that the network ACLs associated with your EC2 instance's subnet allow the following:
- Egress on port 80 (HTTP) and 443 (HTTPS) to the Regional S3 service.
- Ingress on ephemeral TCP ports from the Regional S3 service. Ephemeral ports are 1024-65535.
The Regional S3 service is the CIDR for the subnet containing your S3 interface endpoint. Or, if you're using an S3 gateway, the Regional S3 service is the public IP CIDR for the S3 service. Network ACLs don't support prefix lists. To add the S3 CIDR to your network ACL, use 0.0.0.0/0 as the S3 CIDR. You can also add the actual S3 CIDRs into the ACL. However, keep in mind that the S3 CIDRs can change at any time.
Your S3 endpoint policy looks good to me at first glance, but you are right that the policy, or the endpoint configuration in general, is a likely cause, so I would re-check it too.
One additional thing that I have observed before is that, depending on the AMI you use and your VPC settings (DHCP options set, DNS, etc.), the EC2 instance sometimes cannot properly set its default region in the yum config. Please check whether the files awsregion and awsdomain exist within the /etc/yum/vars directory and what they contain. In your use case, awsregion should contain:
$ cat /etc/yum/vars/awsregion
ap-southeast-2
You can check whether DNS resolution on your instance is working properly with:
dig amazonlinux.ap-southeast-2.amazonaws.com
If DNS seems to be working fine, check whether the IP address in the output falls within the ranges you have allowed in your NACLs.
EDIT:
After a second look, this resource line is not quite right. It uses the bucket's hostname where the ARN expects just the bucket name:
arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/*
According to the docs it should be something like:
arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2/*
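To make the difference concrete, here is a minimal sketch of building an endpoint policy with the bucket-name form of the ARN. The function name, the Sid, and the choice of s3:GetObject as the only action are my own assumptions for illustration, not taken from your template, and this single bucket may not be everything yum needs in your setup.

```python
import json


def al2_repo_endpoint_policy(region):
    """Return a minimal S3 endpoint policy for the Amazon Linux 2 repo bucket.

    Assumption: read-only access (s3:GetObject) to this one bucket is enough.
    """
    # The ARN takes the bucket *name*, not the bucket's DNS hostname.
    bucket = "amazonlinux-2-repos-{}".format(region)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAmazonLinux2Repos",  # hypothetical label
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(al2_repo_endpoint_policy("ap-southeast-2"), indent=2))
```

The printed JSON can be pasted into the endpoint's policy editor or adapted into the template's policy document.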
Hi @nick (https://stackoverflow.com/users/9405602/nick), these are excellent suggestions. I'm writing this as an 'answer' because the troubleshooting will be valuable for others, and because of the character limit on comments.
The problem is definitely the policy.
sh-4.2$ cat /etc/yum/vars/awsregion
ap-southeast-2sh-4.2$
dig:
sh-4.2$ dig amazonlinux.ap-southeast-2.amazonaws.com
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> amazonlinux.ap-southeast-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 598
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;amazonlinux.ap-southeast-2.amazonaws.com. IN A
;; ANSWER SECTION:
amazonlinux.ap-southeast-2.amazonaws.com. 278 IN CNAME s3.dualstack.ap-southeast-2.amazonaws.com.
s3.dualstack.ap-southeast-2.amazonaws.com. 2 IN A 52.95.134.91
;; Query time: 4 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Mon Sep 20 00:03:36 UTC 2021
;; MSG SIZE rcvd: 112
Let's check the NACLs:
NACL outbound rules:
Rule  Type         Protocol  Port range  Destination     Allow/Deny
100   All traffic  All       All         0.0.0.0/0       Allow
101   All traffic  All       All         52.95.128.0/21  Allow
150   All traffic  All       All         3.5.164.0/22    Allow
200   All traffic  All       All         3.5.168.0/23    Allow
250   All traffic  All       All         3.26.88.0/28    Allow
300   All traffic  All       All         3.26.88.16/28   Allow
*     All traffic  All       All         0.0.0.0/0       Deny
NACL inbound rules:
Rule  Type         Protocol  Port range  Source          Allow/Deny
100   All traffic  All       All         10.0.0.0/24     Allow
150   All traffic  All       All         10.0.1.0/24     Allow
200   All traffic  All       All         10.0.2.0/24     Allow
250   All traffic  All       All         10.0.3.0/24     Allow
400   All traffic  All       All         52.95.128.0/21  Allow
450   All traffic  All       All         3.5.164.0/22    Allow
500   All traffic  All       All         3.5.168.0/23    Allow
550   All traffic  All       All         3.26.88.0/28    Allow
600   All traffic  All       All         3.26.88.16/28   Allow
*     All traffic  All       All         0.0.0.0/0       Deny
So 52.95.134.91 is covered by outbound rule 101 and inbound rule 400, which looks good NACL-wise. (For future readers troubleshooting this: that is what you should look for.)
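For anyone who would rather not do the CIDR arithmetic by eye, here is a small sketch of the same check using Python's standard ipaddress module. The list holds the S3 CIDRs from my NACL rules above; the function and variable names are just illustrative.

```python
import ipaddress

# S3 CIDRs taken from the NACL rules listed above.
S3_CIDRS = [
    "52.95.128.0/21",
    "3.5.164.0/22",
    "3.5.168.0/23",
    "3.26.88.0/28",
    "3.26.88.16/28",
]


def matching_cidrs(ip, cidrs):
    """Return every CIDR in `cidrs` that contains the address `ip`."""
    addr = ipaddress.ip_address(ip)
    return [c for c in cidrs if addr in ipaddress.ip_network(c)]


if __name__ == "__main__":
    # IP address returned by dig for the repo hostname.
    print(matching_cidrs("52.95.134.91", S3_CIDRS))  # ['52.95.128.0/21']
```

An empty list would mean the resolved address is not covered by any of the listed rules.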
Also, regarding those CIDR blocks: the deploy script pulls them from the current published list, picks out the S3 ones for ap-southeast-2 with jq, and passes those as parameters to the CF deploy.
Docs on how to do that, for others: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html#aws-ip-download
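My script uses jq, but the same filtering can be sketched in Python for those who prefer it. This is not my actual deploy script; the function names are made up for illustration, and it assumes the published ip-ranges.json layout of a top-level "prefixes" list whose entries carry "ip_prefix", "region" and "service" keys.

```python
import json
from urllib.request import urlopen

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the published AWS IP ranges document."""
    with urlopen(url) as resp:
        return json.load(resp)


def s3_cidrs(ip_ranges, region):
    """Return the sorted IPv4 CIDRs tagged as S3 for the given region."""
    return sorted(
        p["ip_prefix"]
        for p in ip_ranges["prefixes"]
        if p["service"] == "S3" and p["region"] == region
    )


if __name__ == "__main__":
    # Requires network access; the result changes as AWS updates the list.
    for cidr in s3_cidrs(fetch_ip_ranges(), "ap-southeast-2"):
        print(cidr)
```

Because the list can change at any time, the output should be regenerated on each deploy rather than hard-coded.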
Another note: you might notice the outbound allow on 0.0.0.0/0. I realize (and other readers, please note) that this makes the other outbound rules redundant. I just put it in 'in case' while fiddling (and removed the outbound rules to the public subnets). Outbound 0.0.0.0/0 traffic from the private subnets is routed to the respective NATs in the public subnets. I'll add outbound rules for my public subnets and remove this rule at some point.
Subnetting at the moment is simply:
10.0.0.0/16
pub a  : 10.0.0.0/24
pub b  : 10.0.1.0/24
priv a : 10.0.2.0/24
priv b : 10.0.3.0/24
So outbound rules for the pub a and pub b blocks will be re-introduced, which lets me remove the allow on 0.0.0.0/0.
I am now sure it is the policy.
I just amended the policy in the console (click-ops) to 'full access' to give that a crack, and had success.
My guess is the mirror list makes it hard to pin-down what to explicitly allow, so even though I cast the net broad I wasn't capturing the required bucket. But I don't know much about how aws mirrors work so that's a guess.
I probably don't want a super duper permissive policy, so this isn't really a fix but it confirms where the issue is.