Version: 3.0.0

Self-hosted Kubernetes network debugging

When connecting to external services from your Ascend Instance, several network issues can occur. This guide helps you diagnose and troubleshoot common connectivity problems. While Ascend cannot directly debug your specific network setup, we've compiled these troubleshooting techniques based on common issues users have encountered with self-hosted Kubernetes.

Set up a network test pod

To test network connectivity, you'll need to exec into a pod with network utilities available. Since our application images don't include additional shell network utilities (to minimize security vulnerabilities and image size), we recommend using the amouat/network-utils image, which includes common tools like nc and dig.

Option 1: Debug from existing pod context (most accurate)

kubectl debug -n $INSTANCE_NAMESPACE instance-backend-84fd85f558-hgztn -it --image=amouat/network-utils

This provides the most accurate test results because it performs networking requests from the same network context as your workload.

Option 2: Create temporary test pod

If the debug command doesn't work in your cluster, try:

kubectl run network-tester-shell --rm -i --tty --image amouat/network-utils -- bash

Option 3: Custom pod

If neither option works (potentially due to strict admission control policies), create a custom pod with network utilities and exec into it:

kubectl exec $POD_NAME -it -n $POD_NAMESPACE -- bash
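
As a minimal sketch, a custom test pod could look like the following (the pod name network-tester is a placeholder; adjust labels, resources, and securityContext to satisfy your cluster's admission policies):

cat <<EOF | kubectl apply -n $POD_NAMESPACE -f -
apiVersion: v1
kind: Pod
metadata:
  name: network-tester
spec:
  containers:
    - name: network-utils
      image: amouat/network-utils
      # Keep the container running so you can exec into it
      command: ["sleep", "infinity"]
EOF

Once the pod is Running, exec into it with the command above (using network-tester as $POD_NAME), and clean up afterwards with kubectl delete pod network-tester -n $POD_NAMESPACE.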

Diagnose the issue

Every network connection follows these core steps:

  1. Resolve DNS (if connecting to a hostname)
  2. Connect to the target IP and port

Let's narrow down the failure step-by-step:

DNS Resolution

This step only applies when connecting to a hostname (like google.com). If you're connecting directly to an IP address (like 10.2.54.73), skip to the next section.

Test DNS resolution:

dig $TARGET_HOSTNAME

Successful DNS resolution looks like this:

root@network-tester-shell:/# dig google.com

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55662
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 30 IN A 74.125.201.100
google.com. 30 IN A 74.125.201.102
google.com. 30 IN A 74.125.201.101
google.com. 30 IN A 74.125.201.138
google.com. 30 IN A 74.125.201.113
google.com. 30 IN A 74.125.201.139

;; Query time: 3 msec
;; SERVER: 192.168.2.10#53(192.168.2.10)
;; WHEN: Wed Jun 04 23:04:40 UTC 2025
;; MSG SIZE rcvd: 195

Look for the ANSWER section with a count greater than 0 (in this example, ANSWER: 6 indicates 6 IP addresses were found).
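
If you only want the resolved addresses, dig +short prints just the ANSWER records; an empty result means no records were returned:

dig +short $TARGET_HOSTNAME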

No DNS record found

If no DNS record exists, you'll see:

root@network-tester-shell:/# dig this-is-a-fake-domain.com

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> this-is-a-fake-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10031
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;this-is-a-fake-domain.com. IN A

;; AUTHORITY SECTION:
com. 30 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 1749078367 1800 900 604800 900

;; Query time: 22 msec
;; SERVER: 192.168.2.10#53(192.168.2.10)
;; WHEN: Wed Jun 04 23:06:39 UTC 2025
;; MSG SIZE rcvd: 126

Notice ANSWER: 0 and only an AUTHORITY section, indicating the domain doesn't exist.

Common causes:

  • Typo in the hostname/URL
  • Private domain not in your workload's DNS resolution path
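
To check whether a private domain only resolves through an internal resolver, query a specific DNS server directly and compare it with your cluster DNS result. In this sketch, internal.example.com and 10.0.0.2 are placeholders for your private domain and internal DNS server:

# Ask the cluster DNS (whatever is in /etc/resolv.conf)
dig internal.example.com

# Ask a specific internal DNS server directly
dig @10.0.0.2 internal.example.com

If the name only resolves when you query the internal server directly, your cluster DNS is not forwarding that zone; add a stub domain or forwarder for it (for example in the CoreDNS Corefile).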

Failed to reach DNS servers

If DNS requests fail completely:

root@network-tester-shell:/# dig google.com

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> google.com
;; global options: +cmd
;; connection timed out; no servers could be reached

This means the pod cannot reach upstream DNS servers. Get your DNS server IP:

root@network-tester-shell:/# cat /etc/resolv.conf | grep nameserver
nameserver 192.168.2.10

Test DNS server connectivity:

root@network-tester-shell:/# nc -uzv 192.168.2.10 53
Connection to 192.168.2.10 53 port [udp/domain] succeeded!

Common causes if this fails:

  • Misconfigured Network Policy (often TCP is allowed but UDP is not, or vice versa; see the example policy below)
  • Routing issues
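
If an egress Network Policy applies to your pods, make sure it allows DNS on both UDP and TCP port 53. The sketch below is an illustration, not a drop-in policy: the policy name is a placeholder, your CNI must enforce NetworkPolicy for it to have any effect, and because policies are additive, applying this on its own to pods with no other egress policy would restrict them to DNS-only egress, so merge the port 53 rules into your existing policy instead.

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $INSTANCE_NAMESPACE
spec:
  podSelector: {}            # selects all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF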

For additional troubleshooting, consult your DNS or networking provider's documentation, or refer to Google's kube-dns troubleshooting guide.
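
It can also help to confirm that the cluster DNS pods themselves are healthy. Most distributions label them k8s-app=kube-dns, which applies to both kube-dns and CoreDNS deployments (adjust the label or namespace if your cluster differs):

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50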

Test target connectivity

Once DNS resolution works (or if connecting directly to an IP), test raw connectivity:

nc -zv $TARGET_IP_OR_HOSTNAME $TARGET_PORT

This checks TCP connectivity. For UDP (less common), use -uzv instead of -zv.

Successful connection:

# Connecting to an IP
root@network-tester-shell:/# nc -zv 74.125.201.100 80
Connection to 74.125.201.100 80 port [tcp/http] succeeded!

# Connecting to a hostname
root@network-tester-shell:/# nc -zv google.com 80
Connection to google.com 80 port [tcp/http] succeeded!

Connection failures with helpful errors

Sometimes nc provides useful error messages:

Connection refused:

nc: connect to 192.168.1.1 port 80 (tcp) failed: Connection refused

This means something actively rejected your traffic: either nothing is listening on that port on the target, or something along the path sent an explicit rejection. The error does not tell you which, and possible culprits include:

  • No service listening on that port on the target system
  • An egress policy on your pod
  • A firewall on the target system
  • A firewall on an intermediate network hop

This is not a comprehensive list, but the error does at least tell you the connection was explicitly refused rather than silently dropped. To narrow it down, check whether requests from your pod IP are actually reaching the target system (see Understanding pod traffic origins below).

No route to host:

nc: connect to 192.168.1.1 port 80 (tcp) failed: No route to host

This indicates a routing problem. While uncommon within Kubernetes clusters, this can occur when connecting to on-premises networks with multiple network hops.
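
Checking the pod's routing table can show where the path breaks down. This assumes the iproute2 tools are present in the debug image; 192.168.1.1 is the placeholder target from the error above:

# Show which route and source address would be used for the target
ip route get 192.168.1.1

# Show the pod's full routing table
ip route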

Connection timed out:

nc: connect to 192.168.1.1 port 80 (tcp) failed: Connection timed out

This is the most common failure we encounter. Unlike a refusal, a timeout means your packets are disappearing silently, which points to either broken routing or a firewall/policy that drops traffic without responding; you'll need to investigate both possibilities.
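
When investigating timeouts, an explicit timeout keeps nc from hanging, and a traceroute can show how far packets get before they are dropped. This assumes traceroute is available in the debug image; 192.168.1.1 and port 80 are placeholders:

# Fail fast after 5 seconds instead of waiting for the default timeout
nc -w 5 -zv 192.168.1.1 80

# See how many hops respond on the way to the target
traceroute 192.168.1.1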

Understanding pod traffic origins

Pod traffic may not originate from the IP address you expect. Understanding this is crucial for configuring firewall rules and access policies.

Background

When cloud providers developed managed Kubernetes services (GKE, EKS, AKS), they needed to integrate with existing cloud-native networking models that typically granted access based on VM network identities. Since Kubernetes runs multiple workloads per node, traditional VM-based access models no longer worked.

Cloud providers initially solved this by assigning first-class cloud IPs to pods:

  • GKE and AKS: Allocated /24 IP ranges from selected subnets to each node
  • EKS: Attached additional ENIs (network interfaces) to nodes and assigned their secondary IPs to pods

However, these approaches consumed large numbers of IP addresses. To address this, providers introduced NAT features that translate pod traffic to use node IPs when leaving the cluster, reducing IP space requirements.

Identifying traffic origin IPs

Your pod traffic may originate from either the pod IP or the node IP. You'll likely need to allow traffic from the entire IP range for one of these.

Find the pod IP

kubectl get pods -n $NAMESPACE -o wide

Example output:

❯ kubectl get pods -n test-ns -o wide
NAME                                                  READY   STATUS    RESTARTS   AGE   IP             NODE                            NOMINATED NODE   READINESS GATES
instance-backend-78fb4c4c54-ncrx7                     1/1     Running   0          15h   10.4.214.151   ip-192-168-0-156.ec2.internal   <none>           <none>
runtime-3hc7oa-018fcb05-a214-7734-b3a4-8036a5e7f932   1/1     Running   0          15h   10.10.219.28   ip-192-168-8-207.ec2.internal   <none>           <none>
runtime-3hc7oa-019272e8-43d9-74f0-b147-682e4ab8e6e3   1/1     Running   0          15h   10.8.72.97     ip-192-168-8-8.ec2.internal     <none>           <none>

Pod IPs are in the sixth column (e.g., 10.4.214.151).
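
If you want just the pod IP (for scripting rather than reading the table), a jsonpath query works too; the pod name and namespace here are taken from the example output above:

kubectl get pod instance-backend-78fb4c4c54-ncrx7 -n test-ns -o jsonpath='{.status.podIP}'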

Find the node IP

Using the node name from the previous output (e.g., ip-192-168-0-156.ec2.internal):

kubectl describe node ip-192-168-0-156.ec2.internal | grep InternalIP
InternalIP: 192.168.0.156
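
Equivalently, a jsonpath query returns only the InternalIP:

kubectl get node ip-192-168-0-156.ec2.internal -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'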

Final step

Once you've figured out which IP the traffic is coming from, find the Pod or Node range from your cluster or network provider, and ensure traffic is permitted from that range.
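
As a starting point, you can often read the per-node pod CIDRs and node IPs straight from the cluster; not every network provider populates spec.podCIDR, so treat this as a hint and confirm the actual ranges with your cluster or cloud provider:

# Per-node pod CIDRs (may be empty depending on your CNI)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# Node names and internal IPs
kubectl get nodes -o wide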