* [kubectl - BASH autocompletion](#kubectl-bash-autocompletion)
* [Install k3s](#install-k3s)
  * [On premises/IaaS](#install-k3s-on-premises)
    * [Configure upstream DNS-resolver](#upstream-dns-resolver)
    * [Change NodePort range](#nodeport-range)
    * [Clustering](#clustering)
  * [On Docker with k3d](#install-k3s-on-docker-k3d)
* [Namespaces and resource limits](#namespaces-limits)
* [Persistent volumes (StorageClass - dynamic provisioning)](#pv)
  * [Rancher Local (k3s default)](#pv-local)
  * [NFS](#pv-nfs)
  * [Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)](#pv-longhorn)
    * [Custom StorageClass](#pv-longhorn-custom-storageclass)
    * [Volume backups with S3 (compatible) storage](#pv-longhorn-s3-backup)
* [Ingress controller](#ingress-controller)
  * [Disable Traefik-ingress](#disable-traefik-ingress)
  * [Enable NGINX-ingress with OCSP stapling](#enable-nginx-ingress)
    * [Installation](#install-nginx-ingress)
* [Cert-Manager (references ingress controller)](#cert-manager)
  * [Installation](#cert-manager-install)
  * [Cluster-internal CA issuer](#cert-manager-cluster-ca-issuer)
  * [Let´s Encrypt (HTTP-01/DNS-01) issuer](#cert-manager-le-issuer)
  * [Deploying a LE-certificate with ingress](#cert-manager-ingress)
  * [Deploying a LE-certificate by CRD](#cert-manager-crd)
  * [Troubleshooting](#cert-manager-troubleshooting)
* [HELM charts](#helm)
  * [Create a chart](#helm-create)
  * [Install local chart without packaging](#helm-install-without-packaging)
  * [List deployed helm charts](#helm-list)
  * [Upgrade local chart without packaging](#helm-upgrade)
  * [Get status of deployed chart](#helm-status)
  * [Get deployment history](#helm-history)
  * [Rollback](#helm-rollback)
* [Kubernetes in action](#kubernetes-in-action)
  * [Running DaemonSets with `hostNetwork: true`](#running-daemonsets)
  * [Running StatefulSet with NFS storage](#running-statefulset-nfs)
  * [Services](#services)
    * [Client-IP transparency and loadbalancing](#services-client-ip-transparency)
    * [Session affinity/persistence](#services-session-persistence)
  * [Keeping the cluster balanced](#keep-cluster-balanced)
  * [Node maintenance](#node-maintenance)
  * [What happens if a node goes down?](#what-happens-node-down)
  * [Dealing with disruptions](#disruptions)
* [Troubleshooting](#troubleshooting)
  * [Deleting a stuck namespace](#ts-delete-stuck-namespace)
  * [Deleting stuck CRDs](#ts-delete-stuck-crd)

# kubectl - BASH autocompletion

For the current shell only:
```
source <(kubectl completion bash)
```
Persistent:
```
echo "source <(kubectl completion bash)" >> ~/.bashrc
```

# Install k3s

## On premises/IaaS

https://k3s.io/:
```
curl -sfL https://get.k3s.io | sh -
```

### Upstream DNS-resolver

Docs: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/

Default: 8.8.8.8 => does not resolve local domains!

1. Create a local `/etc/resolv.k3s.conf` pointing to the IP of your DNS resolver (127.0.0.1 **does not work!**)
2. `vi /etc/systemd/system/k3s.service`:
   ```
   [...]
   ExecStart=/usr/local/bin/k3s \
       server \
       [...]
       --resolv-conf /etc/resolv.k3s.conf \
   ```
3. Re-load the systemd config: `systemctl daemon-reload`
4. Re-start k3s: `systemctl restart k3s.service`
5. Re-deploy the CoreDNS pods: `kubectl -n kube-system delete pod name-of-coredns-pods`

### Change NodePort range to 1 - 65535

1. `vi /etc/systemd/system/k3s.service`:
   ```
   [...]
   ExecStart=/usr/local/bin/k3s \
       server \
       [...]
       --kube-apiserver-arg service-node-port-range=1-65535 \
   ```
2. Re-load the systemd config: `systemctl daemon-reload`
3. Re-start k3s: `systemctl restart k3s.service`
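To verify that the extended NodePort range is active, a privileged port can be requested directly in a Service manifest. A minimal sketch — the service name and the `app: some-app` selector are placeholders:
```
apiVersion: v1
kind: Service
metadata:
  name: nodeport-range-test
spec:
  type: NodePort
  selector:
    app: some-app          # placeholder - point this at one of your workloads
  ports:
    - port: 80
      targetPort: 80
      nodePort: 443        # far below the default 30000-32767 range
```
If the API server still runs with the default range, creating this service is rejected with a validation error ("provided port is not in the valid range").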
### Clustering

If you want to build a k3s cluster, the default networking model is *overlay@VXLAN*. In this case make sure that

* all of your nodes can reach (ping) each other over the underlying network (local, routed/VPN). This is required for the overlay network to work properly; VXLAN spans a meshed network over all k3s nodes.
* if your nodes are spread over public networks (like the internet), you use a VPN (like IPsec or OpenVPN) to secure the traffic between the nodes. **VXLAN uses plain UDP for transport!**
* if your nodes are connected through a VPN, `flannel` (the overlay network daemon) should explicitly communicate via the VPN network interface instead of the public network interface. The following settings should be made on the nodes:

```
/etc/systemd/system/k3s-agent.service:
[...]
ExecStartPre=sleep 60
ExecStart=/usr/local/bin/k3s \
    agent \
    --flannel-iface name-of-vpn-interface \
```

## On Docker with K3d

K3d is a lightweight orchestrator which deploys a k3s cluster (masters and nodes) directly on Docker, without the need for a virtual machine per node (master/worker).

* Prerequisites: a local Docker installation **without user-namespaces enabled**.
* **Warning**: K3d deploys privileged containers!

https://k3d.io/:
```
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
```

Create a k3s cluster without `traefik` as well as `metrics-server`:
```
k3d cluster create cluster1 \
  --agents 2 \
  --k3s-server-arg '--disable=traefik' \
  --k3s-server-arg '--disable=metrics-server' \
  --k3s-server-arg '--kube-apiserver-arg=service-node-port-range=1-65535'
```

If you encounter `helm` throwing errors like this one:
```
Error: Kubernetes cluster unreachable
```
... just do:
```
$ kubectl config view --raw > ~/kubeconfig-k3d.yaml
$ export KUBECONFIG=~/kubeconfig-k3d.yaml
```

If you need to change the upstream DNS-resolver:
```
kubectl -n kube-system edit configmap coredns
```
Find the line containing
```
forward . /etc/resolv.conf
```
and change it to
```
forward . ipaddr.of.your.dns-resolver
```
Finally re-deploy the CoreDNS deployment with: `kubectl -n kube-system rollout restart deployment coredns`

**Note:** If you restart the cluster (`k3d cluster stop your-cluster` and `k3d cluster start your-cluster`), the changes will be gone!
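One quick way to check that CoreDNS actually forwards to the new upstream resolver is a throw-away lookup pod. A sketch — `host.in.your.local.domain` is a placeholder for a name only your internal DNS can resolve:
```
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup host.in.your.local.domain
```
If the lookup fails, re-check the `forward` line in the coredns ConfigMap and make sure the CoreDNS pods were actually re-deployed.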
# Namespaces and resource limits

```
kubectl apply -f https://gitea.zwackl.de/dominik/k3s/raw/branch/master/namespaces_limits.yaml
```

# Persistent Volumes (StorageClass - dynamic provisioning)

Read more about [AccessModes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)

## Rancher Local (k3s default)

https://rancher.com/docs/k3s/latest/en/storage/

Only supports *AccessMode*: ReadWriteOnce (RWO)

## NFS

For testing purposes as well as simplicity you may use the following [NFS container image](https://hub.docker.com/r/itsthenetwork/nfs-server-alpine):
```
mkdir -p /data/docker/nfs-server/data/
docker run -d --name nfs-server \
  --net=host \
  --privileged \
  -v /data/docker/nfs-server/data/:/nfsshare \
  -e SHARED_DIRECTORY=/nfsshare \
  itsthenetwork/nfs-server-alpine:latest
```

**All nodes need to have the NFS client package (Ubuntu: `nfs-common`) installed.**

```
helm repo add ckotzbauer https://ckotzbauer.github.io/helm-charts
helm install my-nfs-client-provisioner \
  --set nfs.server=ip-of-nfs-server \
  --set nfs.path=/exported/path \
  ckotzbauer/nfs-client-provisioner
```

Check if the NFS *StorageClass* is available:
```
$ kubectl get sc
NAME                   PROVISIONER                               RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path                     Delete          WaitForFirstConsumer   false                  101d
nfs-client             cluster.local/my-nfs-client-provisioner   Delete          Immediate              true                   172m
```

Now you can use `nfs-client` as StorageClass like so:
```
apiVersion: apps/v1
kind: StatefulSet
[...]
  volumeClaimTemplates:
  - metadata:
      name: nfs-backend
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: "nfs-client"
      resources:
        requests:
          storage: 32Mi
```
or so:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-1
  namespace: your-namespace
spec:
  storageClassName: "nfs-client"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 32Mi
```

## Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)

* Requirements: https://longhorn.io/docs/0.8.0/install/requirements/
  * Debian/Ubuntu: `apt install open-iscsi`
  * Ubuntu: uninstall multipathd as it can interfere with iscsid: `apt purge multipath-tools`
* Install: https://rancher.com/docs/k3s/latest/en/storage/

### Custom StorageClass

The following StorageClass `longhorn-2r` defines 2 `replicas`, no `dataLocality` and EXT4 as the filesystem:
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"
```
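A PersistentVolumeClaim that consumes the custom class could then look like this — a sketch with placeholder name/namespace and an arbitrary size:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-pvc-1
  namespace: your-namespace      # placeholder
spec:
  storageClassName: "longhorn-2r"
  accessModes:
    - ReadWriteOnce              # Longhorn block volumes are RWO
  resources:
    requests:
      storage: 1Gi
```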
### Volume backups with S3 (compatible) storage

If you do not want to expose your volume backups to a public cloud (e.g. AWS), you need to provide a local S3 storage server. This can easily be done with [minio](https://longhorn.io/docs/1.2.2/snapshots-and-backups/backup-and-restore/set-backup-target/#set-up-a-local-testing-backupstore):
```
apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-secret
  namespace: longhorn-system
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bG9uZ2hvcm4=                    # Base64 of: longhorn
  AWS_SECRET_ACCESS_KEY: Qmx1YmIxMjM0IQ==            # Base64 of: Blubb1234!
  AWS_ENDPOINTS: aHR0cHM6Ly95b3VyLnMzLmVuZHBvaW50    # Base64 of: https://your.s3.endpoint
```

# Ingress controller

## Disable Traefik-ingress

Edit /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
    server \
    [...]
    --disable traefik \
[...]
```
Finally `systemctl daemon-reload` and `systemctl restart k3s`

## Enable K8s own NGINX-ingress with OCSP stapling

### Installation

This is the helm chart of the Kubernetes project's own nginx ingress controller: https://kubernetes.github.io/ingress-nginx/deploy/#using-helm
```
$ kubectl create ns ingress-nginx
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm install nginx-ingress ingress-nginx/ingress-nginx -n ingress-nginx
```
```
$ kubectl -n ingress-nginx get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/svclb-nginx-ingress-controller-m6gxl        2/2     Running   0          110s
pod/nginx-ingress-controller-695774d99c-t794f   1/1     Running   0          110s

NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                      AGE
service/nginx-ingress-controller-admission   ClusterIP      10.43.116.191   <none>            443/TCP                      110s
service/nginx-ingress-controller             LoadBalancer   10.43.55.41     192.168.178.116   80:31110/TCP,443:31476/TCP   110s

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/svclb-nginx-ingress-controller   1         1         1       1            1           <none>          110s

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-ingress-controller   1/1     1            1           110s

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-ingress-controller-695774d99c   1         1         1       110s
```

The nginx ingress global configuration can be modified as follows:
```
$ kubectl -n ingress-nginx edit configmap ingress-nginx-controller

apiVersion: v1
[...]
data:
  enable-ocsp: "true"
  use-gzip: "true"
  worker-processes: "1"
[...]
kind: ConfigMap
[...]
```
Finally the deployment needs to be restarted: `kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller`

**If you are facing deployment problems like the following one**
```
Error: UPGRADE FAILED: cannot patch "gitea-ingress-staging" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-ingress-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded
```
a possible fix is: `kubectl -n ingress-nginx delete ValidatingWebhookConfiguration ingress-nginx-admission`
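Whether OCSP stapling is actually active can be checked from outside once an ingress serves a real certificate (e.g. from Let's Encrypt, see below). A sketch — `your.ingress.host` is a placeholder:
```
echo | openssl s_client -connect your.ingress.host:443 -status 2>/dev/null | grep -i -A 2 "OCSP"
```
With stapling working the output should contain an OCSP response status of "successful"; without it, no OCSP response is sent.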
# Cert-Manager (references ingress controller)

## Installation

Docs: https://hub.helm.sh/charts/jetstack/cert-manager

**Note on split-horizon DNS**: If you are planning to use DNS-01 validation in terms of [split-horizon DNS](https://en.wikipedia.org/wiki/Split-horizon_DNS), you will need to specify an external DNS resolver (Google, Cloudflare or your ISP's resolver) instead of your internal upstream DNS resolver for the DNS-01 self-checks! Read [this](https://cert-manager.io/docs/configuration/acme/dns01/#setting-nameservers-for-dns01-self-check) for further details.

```
helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.0.2/cert-manager.crds.yaml
kubectl create namespace cert-manager
helm install cert-manager --namespace cert-manager --set 'extraArgs={--dns01-recursive-nameservers-only,--dns01-recursive-nameservers=8.8.8.8:53\,1.1.1.1:53}' jetstack/cert-manager
kubectl -n cert-manager get all
```

## Cluster-internal CA Issuer

Docs: https://cert-manager.io/docs/configuration/ca/

## Let´s Encrypt (HTTP-01/DNS-01) issuer

Docs: https://cert-manager.io/docs/tutorials/acme/ingress/#step-6-configure-let-s-encrypt-issuer

```
ClusterIssuers are a resource type similar to Issuers. They are specified in exactly the same way,
but they do not belong to a single namespace and can be referenced by Certificate resources from
multiple different namespaces.
```

lets-encrypt-cluster-issuers.yaml:
```
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-staging-issuer
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: user@example.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-staging-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-prod-issuer
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: v1
kind: Secret
metadata:
  name: tsig-dyn-update-secret
  namespace: cert-manager
type: Opaque
data:
  key: base64-of-the-already-base64-encoded-TSIG-key   # double-base64!
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-issuer
spec:
  acme:
    email: user@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-dns01-account-key
    # Add a single challenge solver, DNS-01 via RFC 2136 (dynamic update)
    solvers:
    - dns01:
        rfc2136:
          nameserver: ip_address_of_your_authoritative_nameserver:nameserver_port
          tsigKeyName: name_of_tsig_key_in_your_authoritative_nameserver
          tsigAlgorithm: HMACSHA512
          tsigSecretSecretRef:
            name: tsig-dyn-update-secret
            key: key
      selector:
        dnsZones:
        - 'int.example.org'
```

`kubectl apply -f lets-encrypt-cluster-issuers.yaml`
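Instead of double-encoding the TSIG key by hand, the Secret can also be created with `kubectl`, which performs the outer base64 encoding itself. A sketch — the key material is a placeholder, the secret name matches the manifest above:
```
# The value passed here is the (already base64-encoded) TSIG key from your nameserver;
# kubectl base64-encodes it once more when storing it in the Secret ("double-base64").
kubectl -n cert-manager create secret generic tsig-dyn-update-secret \
  --from-literal=key='PASTE-YOUR-BASE64-TSIG-KEY-HERE'
```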
## Deploying a LE-certificate with ingress

All you need is an `Ingress` resource of class `nginx` which references a ClusterIssuer (e.g. `letsencrypt-http01-prod-issuer`) resource.

HTTP-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: your-namespace
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```

DNS-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: your-namespace
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```
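While the certificate is being issued, cert-manager creates a chain of intermediate resources (CertificateRequest, Order, Challenge) that can be watched. A quick sketch with a placeholder namespace:
```
kubectl -n your-namespace get certificate,certificaterequest,order,challenge
kubectl -n your-namespace describe challenge
```
Once the Certificate reports `Ready: True`, the referenced `secretName` contains the usual `tls.crt` and `tls.key` keys.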
## Deploying a LE-certificate by CRD

All you need is a `Certificate` custom resource (CRD) like this one:
```
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: some-certificate
  namespace: staging
spec:
  # Secret names are always required.
  secretName: some-secret
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: some.fully.qualified.domain.name
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - some.fully.qualified.domain.name
  # Issuer references are always required.
  issuerRef:
    name: name-of-your-issuer
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer).
    kind: ClusterIssuer
```

After the certificate was issued, you can reference it as a volume within a deployment:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-ssl
  name: nginx-ssl
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-ssl
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx-ssl
    spec:
      volumes:
      - name: nginx-ssl-volume
        secret:
          secretName: some-secret
      containers:
      - image: nginx
        name: nginx-ssl
        volumeMounts:
        - mountPath: "/etc/nginx/ssl"
          name: nginx-ssl-volume
          readOnly: true
        ports:
        - containerPort: 80
      restartPolicy: Always
```

## Troubleshooting

Docs: https://cert-manager.io/docs/faq/acme/

ClusterIssuers are *visible* in all namespaces:
```
kubectl get clusterissuer
kubectl describe clusterissuer
```

All other ingress-specific cert-manager resources live in their specific namespaces:
```
kubectl -n your-namespace get certificaterequest
kubectl -n your-namespace describe certificaterequest
kubectl -n your-namespace get certificate
kubectl -n your-namespace describe certificate
kubectl -n your-namespace get secret
kubectl -n your-namespace describe secret
kubectl -n your-namespace get challenge
kubectl -n your-namespace describe challenge
```

After successful setup perform a TLS test:
* https://testssl.sh/ (`apt install testssl.sh`)
* https://www.ssllabs.com/ssltest/index.html

# HELM charts

Docs:
* https://helm.sh/docs/intro/using_helm/

Prerequisites:
* running kubernetes installation
* kubectl with ENV[KUBECONFIG] pointing to the appropriate config file
* helm

## Create a chart

`helm create helm-test`
```
~/kubernetes/helm$ tree helm-test/
helm-test/
├── charts
├── Chart.yaml
├── templates
│   ├── deployment.yaml
│   ├── _helpers.tpl
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── NOTES.txt
│   ├── serviceaccount.yaml
│   ├── service.yaml
│   └── tests
│       └── test-connection.yaml
└── values.yaml
```

## Install local chart without packaging

`helm install helm-test-dev helm-test/ --set image.tag=latest --debug --wait`

or just a *dry-run*:

`helm install helm-test-dev helm-test/ --set image.tag=latest --debug --dry-run`

```
--wait: Waits until all Pods are in a ready state, PVCs are bound, Deployments have minimum
(Desired minus maxUnavailable) Pods in ready state and Services have an IP address (and Ingress
if a LoadBalancer) before marking the release as successful. It will wait for as long as the
--timeout value. If timeout is reached, the release will be marked as FAILED.
Note: In scenarios where Deployment has replicas set to 1 and maxUnavailable is not set to 0 as
part of rolling update strategy, --wait will return as ready as it has satisfied the minimum Pod
in ready condition.
```

## List deployed helm charts

```
~/kubernetes/helm$ helm list
NAME            NAMESPACE   REVISION   UPDATED                                   STATUS     CHART             APP VERSION
helm-test-dev   default     4          2020-08-27 12:30:38.98457042 +0200 CEST   deployed   helm-test-0.1.0   1.16.0
```
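To inspect what a listed release actually contains, `helm get` is handy; `helm-test-dev` is the release from the examples above:
```
helm get values helm-test-dev      # values the release was installed/upgraded with
helm get manifest helm-test-dev    # rendered Kubernetes manifests of the release
```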
## Upgrade local chart without packaging

```
~/kubernetes/helm$ helm upgrade helm-test-dev helm-test/ --set image.tag=latest --wait --timeout 60s
Release "helm-test-dev" has been upgraded. Happy Helming!
NAME: helm-test-dev
LAST DEPLOYED: Thu Aug 27 12:47:09 2020
NAMESPACE: default
STATUS: deployed
REVISION: 7
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=helm-test,app.kubernetes.io/instance=helm-test-dev" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl --namespace default port-forward $POD_NAME 8080:80
```

`helm upgrade [...] --wait` is synchronous: it exits with 0 on success and with >0 on failure. By default it waits up to 5 minutes; the `--timeout` flag changes this. This makes it usable for CI/CD deployments, e.g. with Jenkins.

## Get status of deployed chart

```
~/kubernetes/helm$ helm status helm-test-dev
NAME: helm-test-dev
LAST DEPLOYED: Thu Aug 27 12:47:09 2020
NAMESPACE: default
STATUS: deployed
REVISION: 7
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=helm-test,app.kubernetes.io/instance=helm-test-dev" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl --namespace default port-forward $POD_NAME 8080:80
```

## Get deployment history

```
~/kubernetes/helm$ helm history helm-test-dev
REVISION   UPDATED                    STATUS            CHART             APP VERSION   DESCRIPTION
10         Thu Aug 27 12:56:33 2020   failed            helm-test-0.1.0   1.16.0        Upgrade "helm-test-dev" failed: timed out waiting for the condition
11         Thu Aug 27 13:08:34 2020   superseded        helm-test-0.1.0   1.16.0        Upgrade complete
12         Thu Aug 27 13:09:59 2020   superseded        helm-test-0.1.0   1.16.0        Upgrade complete
13         Thu Aug 27 13:10:24 2020   superseded        helm-test-0.1.0   1.16.0        Rollback to 11
14         Thu Aug 27 13:23:22 2020   failed            helm-test-0.1.1   blubb         Upgrade "helm-test-dev" failed: timed out waiting for the condition
15         Thu Aug 27 13:26:43 2020   pending-upgrade   helm-test-0.1.1   blubb         Preparing upgrade
16         Thu Aug 27 13:27:12 2020   superseded        helm-test-0.1.1   blubb         Upgrade complete
17         Thu Aug 27 14:32:32 2020   superseded        helm-test-0.1.1                 Upgrade complete
18         Thu Aug 27 14:33:58 2020   superseded        helm-test-0.1.1                 Upgrade complete
19         Thu Aug 27 14:36:49 2020   failed            helm-test-0.1.1   cosmetics     Upgrade "helm-test-dev" failed: timed out waiting for the condition
```

## Rollback

`helm rollback helm-test-dev 18 --wait`

```
~/kubernetes/helm$ helm history helm-test-dev
REVISION   UPDATED                    STATUS            CHART             APP VERSION   DESCRIPTION
10         Thu Aug 27 12:56:33 2020   failed            helm-test-0.1.0   1.16.0        Upgrade "helm-test-dev" failed: timed out waiting for the condition
11         Thu Aug 27 13:08:34 2020   superseded        helm-test-0.1.0   1.16.0        Upgrade complete
12         Thu Aug 27 13:09:59 2020   superseded        helm-test-0.1.0   1.16.0        Upgrade complete
13         Thu Aug 27 13:10:24 2020   superseded        helm-test-0.1.0   1.16.0        Rollback to 11
14         Thu Aug 27 13:23:22 2020   failed            helm-test-0.1.1   blubb         Upgrade "helm-test-dev" failed: timed out waiting for the condition
15         Thu Aug 27 13:26:43 2020   pending-upgrade   helm-test-0.1.1   blubb         Preparing upgrade
16         Thu Aug 27 13:27:12 2020   superseded        helm-test-0.1.1   blubb         Upgrade complete
17         Thu Aug 27 14:32:32 2020   superseded        helm-test-0.1.1                 Upgrade complete
18         Thu Aug 27 14:33:58 2020   superseded        helm-test-0.1.1                 Upgrade complete
19         Thu Aug 27 14:36:49 2020   failed            helm-test-0.1.1   cosmetics     Upgrade "helm-test-dev" failed: timed out waiting for the condition
20         Thu Aug 27 14:37:36 2020   deployed          helm-test-0.1.1                 Rollback to 18
```

```
~/kubernetes/helm$ helm status helm-test-dev
NAME: helm-test-dev
LAST DEPLOYED: Thu Aug 27 14:37:36 2020
NAMESPACE: default
STATUS: deployed
REVISION: 20
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=helm-test,app.kubernetes.io/instance=helm-test-dev" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl --namespace default port-forward $POD_NAME 8080:80
```
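For CI/CD (e.g. Jenkins) the upgrade-or-rollback dance can be collapsed into a single call; a sketch, where `BUILD_TAG` is a placeholder for whatever tag your pipeline produces:
```
# --install deploys the chart if the release does not exist yet,
# --atomic rolls back automatically if the upgrade does not become ready within --timeout.
helm upgrade helm-test-dev helm-test/ \
  --install \
  --atomic \
  --wait \
  --timeout 120s \
  --set image.tag="${BUILD_TAG}"
```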
# Kubernetes in action

## Running DaemonSets with `hostNetwork: true`

* [Docs: DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/)
* [(Security) hints on using `hostNetwork`](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces)
* [Pod´s DNS policy](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy)

This setup is suitable for scenarios where the kubernetes nodes are already running with a dual IP stack (IPv4 and IPv6) and the Pod needs IPv6 too, but k3s was deployed in IPv4-only mode. In this case the Pod can be deployed in the network namespace of the kubernetes node.

```
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: netcat-daemonset
  labels:
    app: netcat-daemonset
spec:
  selector:
    matchLabels:
      app: netcat-daemonset
  template:
    metadata:
      labels:
        app: netcat-daemonset
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      containers:
      - name: alpine-netcat-daemonset
        image: alpine
        imagePullPolicy: IfNotPresent
        command: ["nc", "-lk", "-p", "23456", "-v", "-e", "/bin/true"]
```

## Running StatefulSet with NFS storage

* [Docs: StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
* [NFS dynamic volume provisioning deployed](#pv-nfs)

StatefulSets are designed for stateful applications (like databases). To avoid split-brain scenarios StatefulSets behave as statically as possible: if a node goes down, the StatefulSet controller will **not** automatically reschedule its pods to another node that meets the required conditions!

If you want to force a re-scheduling: `kubectl delete pod web-1 --grace-period=0 --force`

More details on this can be found [here](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/)

If you want DaemonSet-like node affinity with StatefulSets then read [this](https://medium.com/@johnjjung/building-a-kubernetes-daemonstatefulset-30ad0592d8cb)

```
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: nfs-backend
          mountPath: /nfs-backend
  volumeClaimTemplates:
  - metadata:
      name: nfs-backend
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: "nfs-client"
      resources:
        requests:
          storage: 32Mi
```

## Services

### Client-IP transparency and loadbalancing

```
apiVersion: v1
kind: Service
[...]
spec:
  type: NodePort
  externalTrafficPolicy: Cluster-or-Local
[...]
```

`externalTrafficPolicy: Cluster` (default) spreads the incoming traffic evenly over all pods. To achieve this, the client IP address must be source-NATted and is therefore not *visible* to the pods.

`externalTrafficPolicy: Local` preserves the original client IP address, which is visible to the pods. In this case, regardless of the workload type (`DaemonSet` or `StatefulSet`), traffic stays on the node that received it. For a `StatefulSet`, if more than one pod of the set is scheduled on the same node, the workload gets balanced over all pods on that node.
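As a complete example of the snippet above, a NodePort service for the nginx StatefulSet from the previous section that keeps the client IP visible; the service name and nodePort are arbitrary:
```
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  externalTrafficPolicy: Local   # preserve the original client IP
  selector:
    app: nginx                   # matches the StatefulSet example above
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080
```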
### Session affinity/persistence

```
apiVersion: v1
kind: Service
[...]
spec:
  type: NodePort
  sessionAffinity: None-or-ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10
[...]
```

Session persistence (`None` by default) is only supported based on the client IP address. Cookie stickiness or stickiness on any other/higher layer is not supported yet.

## What happens if a node goes down?

If a node goes down, kubernetes marks this node as *NotReady*, but nothing else happens until [Pod tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) take effect, which are configured by default with a *timeout* of `300s` (5 minutes!).

```
$ kubectl get node
NAME         STATUS     ROLES    AGE    VERSION
k3s-node2    Ready      <none>   103d   v1.19.5+k3s2
k3s-master   Ready      master   103d   v1.19.5+k3s2
k3s-node1    NotReady   <none>   103d   v1.19.5+k3s2

$ kubectl get pod
NAME                                         READY   STATUS    RESTARTS   AGE
ds-test-5mlkt                                1/1     Running   14         28h
my-nfs-client-provisioner-57ff8c84c7-p75ck   1/1     Running   0          31m
web-1                                        1/1     Running   0          26m
web-2                                        1/1     Running   0          26m
ds-test-c6xx8                                1/1     Running   0          18m
ds-test-w45dv                                1/1     Running   5          28h
```

Pod tolerations can be determined like this:
```
$ kubectl -n your-namespace describe pod your-pod
[...]
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
[...]
```

To react faster to node failures, Pod tolerations can be adjusted within the Pod template as follows:
```
kind: Deployment # or StatefulSet
apiVersion: apps/v1
metadata:
  [...]
spec:
  [...]
  template:
    [...]
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30
      [...]
```

## Keeping the cluster balanced

Kubernetes primarily takes care of high availability, not of an even distribution of pods across the nodes. For a `Deployment` or `StatefulSet` a [`topologySpreadConstraint`](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/) needs to be specified:

```
kind: Deployment # or StatefulSet
apiVersion: apps/v1
metadata:
  [...]
spec:
  [...]
  template:
    [...]
    spec:
      # Prevent scheduling more than one pod per node
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: the-app
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      [...]
```

`DaemonSet` workloads do not support `topologySpreadConstraints` at all.

## Node maintenance

*Mark* a node for maintenance:
```
$ kubectl drain k3s-node2 --ignore-daemonsets

$ kubectl get node
NAME         STATUS                     ROLES    AGE    VERSION
k3s-node1    Ready                      <none>   105d   v1.19.5+k3s2
k3s-master   Ready                      master   105d   v1.19.5+k3s2
k3s-node2    Ready,SchedulingDisabled   <none>   105d   v1.19.5+k3s2
```

All Deployment as well as StatefulSet pods have been rescheduled on the remaining nodes; DaemonSet pods were not touched! Node maintenance can be performed now.

To bring the maintained node back into the cluster:
```
$ kubectl uncordon k3s-node2
node/k3s-node2 uncordoned
```
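Note that uncordoning does not move any pods back by itself. With `topologySpreadConstraints` in place, a rolling restart re-spreads the workload over the uncordoned node; a sketch with placeholder names:
```
kubectl get pods -o wide                                          # see how pods are currently spread
kubectl -n your-namespace rollout restart deployment your-app     # re-spread a Deployment
kubectl -n your-namespace rollout restart statefulset your-app    # ... or a StatefulSet
```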
## Dealing with disruptions

* https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
* https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/

# Troubleshooting

## Deleting a stuck namespace

```
kubectl get namespace "stuck-namespace" -o json \
  | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
  | kubectl replace --raw /api/v1/namespaces/stuck-namespace/finalize -f -
```

## Deleting stuck CRDs

https://github.com/kubernetes/kubernetes/issues/60538#issuecomment-369099998
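Deleting a CRD that is stuck in `Terminating` usually comes down to clearing its finalizers (see the linked issue). A sketch with a placeholder CRD name — use with care, since skipping finalizers can leave orphaned resources behind:
```
kubectl patch crd stuck-crd-name.example.com \
  --type=merge -p '{"metadata":{"finalizers":[]}}'
```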