From 2fc6180697fcc507692d72ba914f3aedc66b0912 Mon Sep 17 00:00:00 2001 From: Dominik Chilla Date: Sat, 27 Nov 2021 20:56:14 +0100 Subject: [PATCH] refinement --- README.md | 189 ++++++++++++++++++++---------------------------------- 1 file changed, 68 insertions(+), 121 deletions(-) diff --git a/README.md b/README.md index b375a0d..be4c055 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ * [kubectl - BASH autocompletion](#kubectl-bash-autocompletion) * [Install k3s](#install-k3s) - * [On on-premises](#install-k3s-on-premises) + * [On premises/IaaS](#install-k3s-on-premises) * [Configure upstream DNS-resolver](#upstream-dns-resolver) * [Change NodePort range](#nodeport-range) * [Clustering](#clustering) @@ -8,9 +8,8 @@ * [Namespaces and resource limits](#namespaces-limits) * [Persistent volumes (StorageClass - dynamic provisioning)](#pv) * [Rancher Local](#pv-local) - * [Rancher Longhorn - distributed in local cluster](#pv-longhorn) + * [Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)](#pv-longhorn) * [NFS](#pv-nfs) - * [Seaweedfs](#pv-seaweedfs) * [Ingress controller](#ingress-controller) * [Disable Traefik-ingress](#disable-traefik-ingress) * [Enable NGINX-ingress with OCSP stapling](#enable-nginx-ingress) @@ -31,12 +30,12 @@ * [Get deployment history](#helm-history) * [Rollback](#helm-rollback) * [Kubernetes in action](#kubernetes-in-action) - * [Running DaemonSets on `hostPort`](#running-daemonsets) + * [Running DaemonSets with `hostNetwork: true`](#running-daemonsets) * [Running StatefulSet with NFS storage](#running-statefulset-nfs) * [Services](#services) * [Client-IP transparency and loadbalancing](#services-client-ip-transparency) * [Session affinity/persistence](#services-session-persistence) - * [Keep your cluster balanced](#keep-cluster-balanced) + * [Keeping the cluster balanced](#keep-cluster-balanced) * [Node maintenance](#node-maintenance) * [What happens if a node goes down?](#what-happens-node-down) * [Dealing with disruptions](#disruptions) @@ -55,36 +54,12 @@ echo "source <(kubectl completion bash)" >> ~/.bashrc ``` # Install k3s -## On premises +## On premises/IaaS https://k3s.io/: ``` curl -sfL https://get.k3s.io | sh - ``` -If disired, set a memory consumption limit of the systemd-unit like so: -``` -root#> mkdir /etc/systemd/system/k3s.service.d -root#> vi /etc/systemd/system/k3s.service.d/limits.conf -[Service] -MemoryMax=1024M -root#> systemctl daemon-reload -root#> systemctl restart k3s - -root#> systemctl status k3s -k3s.service - Lightweight Kubernetes - Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled) - Drop-In: /etc/systemd/system/k3s.service.d - └─limits.conf - Active: active (running) since Thu 2020-11-26 10:46:26 CET; 13min ago - Docs: https://k3s.io - Process: 9618 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS) - Process: 9619 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS) - Main PID: 9620 (k3s-server) - Tasks: 229 - Memory: 510.6M (max: 1.0G) - CGroup: /system.slice/k3s.service - -``` ### Upstream DNS-resolver Docs: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/ @@ -114,7 +89,7 @@ ExecStart=/usr/local/bin/k3s \ If you want to build a K3s-cluster the default networking model is *overlay@VXLAN*. In this case make sure that * all of your nodes can reach (ping) each other over the underlying network (local, routed/vpn). This is required for the overlay network to work properly. VXLAN spans a mashed network over all K3s-nodes. 
* if your nodes are spread over public networks (like the internet) use a VPN (like IPSec or OpenVPN) to secure the traffic between the nodes. **VXLAN uses plain UDP for transport!** -* if your nodes are connected through VPN, `flannel` (overlay network daemon) should explicitly communicate over the vpn network interface instead of the public network interface. Following settings should be made on the nodes: +* if your nodes are connected through VPN, `flannel` (overlay network daemon) should explicitly communicate via the vpn network interface instead of the public network interface. Following settings should be made on the nodes: ``` /etc/systemd/system/k3s-agent.service: @@ -164,7 +139,7 @@ and change the content to ``` forward . ipaddr.of.your.dns-resolver ``` -Finally redeploy the CoreDNS deployment with: +Finally re-deploy the CoreDNS deployment with: `kubectl -n kube-system rollout restart deployment coredns` **Note:** If you restart the cluster (`k3d cluster stop your-cluster` and `k3d cluster start your-cluster`), the changes will be gone! @@ -180,7 +155,7 @@ Read more about [AccessModes](https://kubernetes.io/docs/concepts/storage/persis https://rancher.com/docs/k3s/latest/en/storage/ Only supports *AccessMode*: ReadWriteOnce (RWO) -## Longhorn (distributed in local cluster) +## Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-) * Requirements: https://longhorn.io/docs/0.8.0/install/requirements/ * Debian: `apt install open-iscsi` * Install: https://rancher.com/docs/k3s/latest/en/storage/ @@ -239,11 +214,6 @@ spec: requests: storage: 32Mi ``` -## Seaweedfs -Docs: https://github.com/seaweedfs -Docs: https://github.com/seaweedfs/seaweedfs-csi-driver - -In order to use the CSI driver you need to have a working seaweedfs-cluster. As seaweedfs is really lightweight it can be deployed on a bunch (at least three) of raspberries (min. version 3) as well as on the K3s cluster too. # Ingress controller ## Disable Traefik-ingress @@ -251,7 +221,7 @@ edit /etc/systemd/system/k3s.service: ``` [...] ExecStart=/usr/local/bin/k3s \ - server --disable traefik --resolv-conf /etc/resolv.conf \ + server [...] --disable traefik \ [...] 
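# Hint (not part of the original unit file): other bundled K3s components can be
# disabled the same way with additional flags, e.g. "--disable servicelb" or
# "--disable metrics-server".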
``` Finally `systemctl daemon-reload` and `systemctl restart k3s` @@ -264,36 +234,37 @@ https://kubernetes.github.io/ingress-nginx/deploy/#using-helm ``` kubectl create ns ingress-nginx helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx -helm install my-release ingress-nginx/ingress-nginx -n ingress-nginx +helm install nginx-ingress ingress-nginx/ingress-nginx -n ingress-nginx ``` `kubectl -n ingress-nginx get all`: ``` NAME READY STATUS RESTARTS AGE -pod/svclb-my-release-ingress-nginx-controller-m6gxl 2/2 Running 0 110s -pod/my-release-ingress-nginx-controller-695774d99c-t794f 1/1 Running 0 110s +pod/svclb-nginx-ingress-controller-m6gxl 2/2 Running 0 110s +pod/nginx-ingress-controller-695774d99c-t794f 1/1 Running 0 110s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -service/my-release-ingress-nginx-controller-admission ClusterIP 10.43.116.191 443/TCP 110s -service/my-release-ingress-nginx-controller LoadBalancer 10.43.55.41 192.168.178.116 80:31110/TCP,443:31476/TCP 110s +service/nginx-ingress-controller-admission ClusterIP 10.43.116.191 443/TCP 110s +service/nginx-ingress-controller LoadBalancer 10.43.55.41 192.168.178.116 80:31110/TCP,443:31476/TCP 110s NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE -daemonset.apps/svclb-my-release-ingress-nginx-controller 1 1 1 1 1 110s +daemonset.apps/svclb-nginx-ingress-controller 1 1 1 1 1 110s NAME READY UP-TO-DATE AVAILABLE AGE -deployment.apps/my-release-ingress-nginx-controller 1/1 1 1 110s +deployment.apps/nginx-ingress-controller 1/1 1 1 110s NAME DESIRED CURRENT READY AGE -replicaset.apps/my-release-ingress-nginx-controller-695774d99c 1 1 1 110s +replicaset.apps/nginx-ingress-controller-695774d99c 1 1 1 110s ``` -As nginx ingress is hungry for memory, let´s reduce the number of workers to 1: +The nginx ingress global configuration can be modified as follows: ``` -kubectl -n ingress-nginx edit configmap my-release-ingress-nginx-controller +kubectl -n ingress-nginx edit configmap ingress-nginx-controller apiVersion: v1 <<>> data: enable-ocsp: "true" + use-gzip: "true" worker-processes: "1" <<>> kind: ConfigMap @@ -301,13 +272,13 @@ kind: ConfigMap ``` Finally the deployment needs to be restarted: -`kubectl -n ingress-nginx rollout restart deployment my-release-ingress-nginx-controller` +`kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller` **If you are facing deployment problems like the following one** ``` -Error: UPGRADE FAILED: cannot patch "gitea-ingress-staging" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://my-release-ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded +Error: UPGRADE FAILED: cannot patch "gitea-ingress-staging" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-ingress-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded ``` -A possible fix: `kubectl -n ingress-nginx delete ValidatingWebhookConfiguration my-release-ingress-nginx-admission` +A possible fix: `kubectl -n ingress-nginx delete ValidatingWebhookConfiguration ingress-nginx-admission` # Cert-Manager (references ingress controller) ## Installation @@ -325,7 +296,7 @@ kubectl -n cert-manager get all ## Cluster-internal CA Issuer Docs: https://cert-manager.io/docs/configuration/ca/ -## Let´s Encrypt issuer +## Let´s Encrypt 
HTTP-01 issuer Docs: https://cert-manager.io/docs/tutorials/acme/ingress/#step-6-configure-let-s-encrypt-issuer ``` ClusterIssuers are a resource type similar to Issuers. They are specified in exactly the same way, @@ -338,7 +309,7 @@ lets-encrypt-cluster-issuers.yaml: apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: - name: letsencrypt-staging-issuer + name: letsencrypt-http01-staging-issuer spec: acme: # You must replace this email address with your own. @@ -358,7 +329,7 @@ spec: apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: - name: letsencrypt-prod-issuer + name: letsencrypt-http01-prod-issuer spec: acme: # The ACME server URL @@ -386,7 +357,7 @@ data: apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: - name: letsencrypt-dns01-issuer + name: letsencrypt-dns01-prod-issuer spec: acme: email: user@example.com @@ -411,9 +382,9 @@ spec: `kubectl apply -f lets-encrypt-cluster-issuers.yaml` ## Deploying a LE-certificate with ingress -All you need is an `Ingress` resource of class `nginx` which references a ClusterIssuer (`letsencrypt-prod-issuer`) resource. +All you need is an `Ingress` resource of class `nginx` which references a ClusterIssuer (`letsencrypt-http01-prod-issuer`) resource. -HTTP-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-prod-issuer"`): +HTTP-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"`): ``` apiVersion: networking.k8s.io/v1beta1 kind: Ingress @@ -423,7 +394,7 @@ metadata: annotations: # use the shared ingress-nginx kubernetes.io/ingress.class: "nginx" - cert-manager.io/cluster-issuer: "letsencrypt-prod-issuer" + cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer" spec: tls: - hosts: @@ -438,7 +409,7 @@ spec: serviceName: some-target-service servicePort: some-target-service-port ``` -DNS-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-dns01-issuer"`): +DNS-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"`): ``` apiVersion: networking.k8s.io/v1beta1 kind: Ingress @@ -448,7 +419,7 @@ metadata: annotations: # use the shared ingress-nginx kubernetes.io/ingress.class: "nginx" - cert-manager.io/cluster-issuer: "letsencrypt-dns01-issuer" + cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer" spec: tls: - hosts: @@ -538,7 +509,7 @@ spec: ## Troubleshooting Docs: https://cert-manager.io/docs/faq/acme/ -ClusterIssuer runs in default namespace: +ClusterIssuers are *visible* any namespaces: ``` kubectl get clusterissuer kubectl describe clusterissuer @@ -555,7 +526,9 @@ kubectl -n get challenge kubectl -n describe challenge ``` -After successfull setup perform a TLS-test: `https://www.ssllabs.com/ssltest/index.html` +After successfull setup perform a TLS-test: +* https://testssl.sh/ (`apt install testssl.sh`) +* https://www.ssllabs.com/ssltest/index.html # HELM charts Docs: @@ -690,13 +663,12 @@ NOTES: ``` # Kubernetes in action -## Running DaemonSets on `hostPort` -* Docs: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ -* Good article: https://medium.com/stakater/k8s-deployments-vs-statefulsets-vs-daemonsets-60582f0c62d4 +## Running DaemonSets with `hostNetwork: true` +* [Docs: DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) +* [(Security) hints on using `hostNetwork`](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces) +* [Pod´s DNS policy](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy) -In this case configuration of 
networking in context of services is not needed. - -This setup is suitable for legacy scenarios where static IP-address are required and a NodePort service is not an alternative: +This setup is suitable for scenarios where kubernetes nodes are already running with dual-IP-stack (IPv4 and IPv6) and the Pod needs IPv6 too, but k3s was deployed in ipv4-only mode. In this case the Pod can be deployed in the network namespace of the kubernetes node. ``` kind: DaemonSet apiVersion: apps/v1 @@ -704,54 +676,30 @@ metadata: name: netcat-daemonset labels: app: netcat-daemonset -spec: - selector: - matchLabels: - app: netcat-daemonset - template: - metadata: - labels: - app: netcat-daemonset - spec: - containers: - - command: - - nc - - -lk - - -p - - "23456" - - -v - - -e - - /bin/true - env: - - name: DEMO_GREETING - value: Hello from the environment - image: dockreg-zdf.int.zwackl.de/alpine/latest/amd64:prod - imagePullPolicy: IfNotPresent - name: netcat-daemonset - ports: - - containerPort: 23456 - hostPort: 23456 - protocol: TCP - resources: - limits: - cpu: 500m - memory: 64Mi - requests: - cpu: 50m - memory: 32Mi - restartPolicy: Always - securityContext: {} - terminationGracePeriodSeconds: 30 - updateStrategy: - rollingUpdate: - maxUnavailable: 1 - type: RollingUpdate +spec: + selector: + matchLabels: + app: netcat-daemonset + template: + metadata: + labels: + app: netcat-daemonset + spec: + hostNetwork: true + dnsPolicy: ClusterFirstWithHostNet + restartPolicy: Always + terminationGracePeriodSeconds: 10 + containers: + - name: alpine-netcat-daemonset + image: alpine + imagePullPolicy: IfNotPresent + command: ["nc", "-lk", "-p", "23456", "-v", "-e", "/bin/true"] ``` ## Running StatefulSet with NFS storage -* https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ +* [Docs: StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) * [NFS dynamic volume provisioning deployed](#pv-nfs) -**Be careful:** *StatefulSets* are designed for stateful applications (like databases). To avoid split-brain scenarios StatefulSets behave as static as possible. If a node goes down, the StatefulSet controller will reschedule the pods to another nodes, which can meet the requirements! If you want to force a re-scheduling: +StatefulSets are designed for stateful applications (like databases). To avoid split-brain scenarios StatefulSets behave as static as possible. If a node goes down, the StatefulSet controller will reschedule the pods to another node, that meets the required conditions! If you want to force a re-scheduling: `kubectl delete pod web-1 --grace-period=0 --force` More details on this can be found [here](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/) @@ -824,7 +772,6 @@ spec: `externalTrafficPolicy: Local` preserves the original client ip-address which is visible to the PODs. In any case (`DaemonSet` or `StatefulSet`) traffic remains on the Node which gets the traffic. In case of `StatefulSet` if more than one POD of a `ReplicaSet` is scheduled on the same Node, the workload gets balanced over all PODs on the same Node. - ### Session affinity/persistence ``` apiVersion: v1 @@ -841,7 +788,7 @@ spec: Session persistence (`None` by default) is only supported on client ip-address. Cookie-stickiness or stickiness on any other/higher layer is not supported yet. ## What happens if a node goes down? 
-If a node goes down kubernetes marks this node as *NotReady*, but nothing else:
+If a node goes down, Kubernetes marks this node as *NotReady*, but nothing else happens until [Pod tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) take effect, which by default are configured with a *timeout* of `300s` (5 minutes!).
```
$ kubectl get node
NAME STATUS ROLES AGE VERSION
kmaster1 Ready control-plane,etcd,master 28h v1.20.0+k3s2
kworker1 Ready 28h v1.20.0+k3s2
kworker2 NotReady 28h v1.20.0+k3s2

$ kubectl get pod -o wide
web-0 1/1 Running 0 26
web-2 1/1 Running 0 26
ds-test-c6xx8 1/1 Running 0 18m
ds-test-w45dv 1/1 Running 5 28h
```
-Kubernetes supports [Pod-tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration), which are per default configured with a *timeout* of `300s` (5 minutes!). This means, that affected Pods will *remain* for a timespan of 300s on a *broken* node before eviction takes place
+Pod tolerations can be inspected like this:
```
$ kubectl -n <namespace> describe pod <pod-name>

@@ -868,7 +815,7 @@
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists fo
[...]
```
-To be more reactive Pod-tolerations can be configured as follows:
+To be more reactive to node failures, Pod tolerations can be adjusted within the Pod template as follows:
```
kind: Deployment or StatefulSet
apiVersion: apps/v1
[...]
spec:
[...]
@@ -891,10 +838,10 @@ spec:
[...]

-## Keep your cluster balanced
-Kubernetes, in first place, takes care of high availability, but not of well balance of pod/node.
+## Keeping the cluster balanced
+First and foremost, Kubernetes takes care of high availability, not of balancing pods evenly across the nodes.

-In case of `Deployment` or `StatefulSet` a `topologySpreadConstraint` needs to be specified:
+In case of `Deployment` or `StatefulSet`, a [`topologySpreadConstraint`](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/) needs to be specified:
```
kind: Deployment or StatefulSet
apiVersion: apps/v1
@@ -909,7 +856,7 @@ spec:
[...]
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
-            app: {{ .Chart.Name }}-{{ .Values.stage }}
+            app: the-app
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
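        # Explanatory comments, not part of the original example:
        # maxSkew: 1 allows the number of matching pods to differ by at most 1
        #   between any two topology domains.
        # topologyKey: kubernetes.io/hostname makes every node its own topology domain,
        #   i.e. pods are spread across individual nodes.
        # whenUnsatisfiable: DoNotSchedule leaves a pod Pending rather than violating
        #   the constraint; ScheduleAnyway would turn the spreading into a soft preference.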