Today we stumbled upon an interesting case that I want to share, as it might help you in your own debugging journey.
Let's assume you have the following infrastructure setup:
- A Kubernetes cluster on Google Kubernetes Engine (GKE)
- Workload Identity enabled
- Egress `NetworkPolicy` resources in use
You might think everything is fine, but your service that communicates with Google APIs (for example, via the Google Cloud Storage client libraries) complains with something like:

```
Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
```
The first reaction is: how can this be when Workload Identity is configured exactly as recommended? Looking behind the curtain, however, the culprit is likely not Workload Identity at all.
In our case, the root cause was a restrictive network policy that blocked egress traffic to the GKE metadata server.
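A quick way to confirm this theory from inside an affected pod is to test raw TCP connectivity to the metadata server. Here is a minimal sketch (the helper name is my own; the address and port match the 1.21.0-gke.1000-and-later setup shown below, and older clusters use `127.0.0.1:988` instead):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # GKE metadata server address on 1.21.0-gke.1000 and later.
    # If this prints False inside your pod, egress is being blocked.
    print("metadata server reachable:", can_reach("169.254.169.252", 988))
```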
The solution depends on the GKE version you're running.
**1.21.0-gke.1000 and later**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: your-network-policy
  namespace: default
spec:
  egress:
  - ports:
    - port: 988
      protocol: TCP
    to:
    - ipBlock:
        cidr: 169.254.169.252/32
  policyTypes:
  - Egress
```
**Prior to 1.21.0-gke.1000**

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: your-network-policy
  namespace: default
spec:
  egress:
  - ports:
    - port: 988
      protocol: TCP
    to:
    - ipBlock:
        cidr: 127.0.0.1/32
  policyTypes:
  - Egress
```
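If you template these policies across clusters on mixed versions, the cutoff can be encoded in a small helper. A sketch under my own assumptions (function name and ad-hoc parsing of the `MAJOR.MINOR.PATCH-gke.N` version string are mine):

```python
def gke_metadata_endpoint(gke_version: str) -> tuple[str, int]:
    """Return the (ip, port) a NetworkPolicy must allow egress to
    for the GKE metadata server, based on the version cutoff above."""
    base, _, gke_patch = gke_version.partition("-gke.")
    key = tuple(int(part) for part in base.split(".")) + (int(gke_patch),)
    # 1.21.0-gke.1000 and later: link-local address; earlier: localhost.
    if key >= (1, 21, 0, 1000):
        return ("169.254.169.252", 988)
    return ("127.0.0.1", 988)

print(gke_metadata_endpoint("1.22.4-gke.1501"))  # -> ('169.254.169.252', 988)
```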