* [kubectl - BASH autocompletion](#kubectl-bash-autocompletion)
* [Install k3s](#install-k3s)
  * [On premises/IaaS](#install-k3s-on-premises)
    * [Configure upstream DNS-resolver](#upstream-dns-resolver)
    * [Change NodePort range](#nodeport-range)
    * [Clustering](#clustering)
    * [Upgrade manually](#upgrade-manually)
  * [On Docker with k3d](#install-k3s-on-docker-k3d)
* [Namespaces and resource limits](#namespaces-limits)
* [Persistent volumes (StorageClass - dynamic provisioning)](#pv)
  * [Rancher Local (k3s default)](#pv-local)
  * [Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)](#pv-longhorn)
    * [Custom StorageClass](#pv-longhorn-custom-storageclass)
    * [Volume backups with S3 (compatible) storage](#pv-longhorn-s3-backup)
* [Ingress controller](#ingress-controller)
  * [Disable Traefik-ingress](#disable-traefik-ingress)
  * [Enable NGINX-ingress with OCSP stapling](#enable-nginx-ingress)
    * [Installation](#install-nginx-ingress)
* [Cert-Manager (references ingress controller)](#cert-manager)
  * [Installation](#cert-manager-install)
  * [Cluster-internal CA issuer](#cert-manager-cluster-ca-issuer)
  * [Let´s Encrypt (HTTP-01/DNS-01) issuer](#cert-manager-le-issuer)
  * [Deploying a LE-certificate with ingress](#cert-manager-ingress)
  * [Deploying a LE-certificate by CRD](#cert-manager-crd)
  * [Troubleshooting](#cert-manager-troubleshooting)
* [Cluster monitoring](#cluster-monitoring)
  * [Log correlation with Loki-stack](#loki-stack)
  * [Metrics with Prometheus-stack + Grafana](#prometheus-grafana)
    * [Nginx ingress controller](#prometheus-nginx-ingress)
    * [Grafana dashboard for Node exporter full](#prometheus-node-exporter)
* [HELM charts](#helm)
  * [Create a chart](#helm-create)
  * [Install local chart without packaging](#helm-install-without-packaging)
  * [List deployed helm charts](#helm-list)
  * [Upgrade local chart without packaging](#helm-upgrade)
  * [Get status of deployed chart](#helm-status)
  * [Get deployment history](#helm-history)
  * [Rollback](#helm-rollback)
* [Kubernetes in action](#kubernetes-in-action)
  * [Running DaemonSets with `hostNetwork: true`](#running-daemonsets)
  * [Services](#services)
    * [Client-IP transparency and loadbalancing](#services-client-ip-transparency)
    * [Session affinity/persistence](#services-session-persistence)
  * [Keeping the cluster balanced](#keep-cluster-balanced)
    * [Node maintenance](#node-maintenance)
    * [What happens if a node goes down?](#what-happens-node-down)
    * [Dealing with disruptions](#disruptions)
* [Troubleshooting](#troubleshooting)
  * [Deleting a stuck namespace](#ts-delete-stuck-namespace)
  * [Deleting stuck CRDs](#ts-delete-stuck-crd)
  * [cgroup support with Odroid-M1 and Ubuntu 22.04 LTS](#ts-odroidm1-ubuntu2204-cgroups)
# kubectl - BASH autocompletion
For the current shell only:
```
source <(kubectl completion bash)
```
Persistent:
```
echo "source <(kubectl completion bash)" >> ~/.bashrc
```
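If you also use an alias for `kubectl` (here `k`, just as an example), completion can be attached to the alias as well:
```
echo "alias k=kubectl" >> ~/.bashrc
echo "complete -o default -F __start_kubectl k" >> ~/.bashrc
```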
# Install k3s
## On premises/IaaS
https://k3s.io/:
```
curl -sfL https://get.k3s.io | sh -
```
### Upstream DNS-resolver
Docs: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/
Default: 8.8.8.8 => does not resolve local domains!
1. Create a local `/etc/resolv.k3s.conf` pointing to the IP of your DNS resolver (127.0.0.1 **does not work!**) - see the sketch after this list
2. vi /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --resolv-conf /etc/resolv.k3s.conf \
```
3. Re-load systemd config: `systemctl daemon-reload`
4. Re-start k3s: `systemctl restart k3s.service`
5. Re-deploy coredns-pods: `kubectl -n kube-system delete pod name-of-coredns-pods`
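A minimal sketch of `/etc/resolv.k3s.conf` - the resolver IP and search domain are placeholders for your environment:
```
nameserver 192.168.1.53
search int.example.org
```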
### Change NodePort range to 1 - 65535
1. vi /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --kube-apiserver-arg service-node-port-range=1-65535 \
```
2. Re-load systemd config: `systemctl daemon-reload`
3. Re-start k3s: `systemctl restart k3s.service`
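With the extended range a `Service` can expose privileged ports directly via NodePort - a minimal sketch, all names and ports are examples:
```
apiVersion: v1
kind: Service
metadata:
  name: some-service
  namespace: staging
spec:
  type: NodePort
  selector:
    app: some-app
  ports:
  - port: 443
    targetPort: 8443
    nodePort: 443   # only accepted with service-node-port-range=1-65535
```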
### Clustering
If you want to build a K3s-cluster the default networking model is *overlay@VXLAN*. In this case make sure that
* all of your nodes can reach (ping) each other over the underlying network (local, routed/vpn). This is required for the overlay network to work properly. VXLAN spans a meshed network across all K3s nodes.
* if your nodes are spread over public networks (like the internet) use a VPN (like IPSec or OpenVPN) to secure the traffic between the nodes. **VXLAN uses plain UDP for transport!**
* if your nodes are connected through a VPN, `flannel` (the overlay network daemon) should explicitly communicate via the VPN network interface instead of the public network interface. The following settings should be applied on the nodes:
```
/etc/systemd/system/k3s-agent.service:
[...]
ExecStartPre=sleep 60
ExecStart=/usr/local/bin/k3s \
agent \
--flannel-iface \
```
* if your public/external nodes are connected through a VPN and you have configured [canal](https://github.com/projectcalico/canal) to manage NetworkPolicies, you will need to edit the node objects and change the public IP addresses (in this example: `1.2.3.4`) of your nodes to the internal VPN IPs (in this example: `172.16.1.2`). Otherwise canal will bypass the VPN and route VXLAN traffic through the public IP addresses (you can verify the advertised IP with the check after the following snippet):
```
kubectl edit node
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 172.16.1.2
    [...]
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ce:09:ce:de:4d:36"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
>> DEL >>   flannel.alpha.coreos.com/public-ip: 1.2.3.4
>> ADD >>   flannel.alpha.coreos.com/public-ip: 172.16.1.2
[...]
```
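To verify which IP address flannel actually advertises for VXLAN traffic, inspect the annotation on each node (the node name is a placeholder):
```
kubectl get node node1 -o yaml | grep 'flannel.alpha.coreos.com/public-ip'
```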
### Upgrade cluster manually
Check out the version you want to upgrade to: https://github.com/k3s-io/k3s/releases
On master node:
```
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION= sh -
```
On any worker nodes:
```
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION= K3S_URL=https://:6443 K3S_TOKEN= sh -
```
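`K3S_TOKEN` is the cluster join token; on a default installation it can be read on the master node. After the upgrade, check that every node reports the new version:
```
# on the master node (default location of the join token)
cat /var/lib/rancher/k3s/server/node-token
# after upgrading, every node should report the new k3s version
kubectl get nodes -o wide
```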
## On Docker with K3d
K3d is a lightweight wrapper that deploys a K3s cluster (masters and workers) directly on Docker, without the need for a virtual machine per node.
* Prerequisites: a local docker installation **without user-namespaces enabled**.
* **Warning**: K3d deploys privileged containers!
https://k3d.io/:
```
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
```
Create a K3s cluster without `traefik`:
```
k3d cluster create cluster1 \
--agents 2 \
--k3s-server-arg '--disable=traefik' \
--k3s-server-arg '--kube-apiserver-arg=service-node-port-range=1-65535'
```
If you encounter `helm` throwing errors like this one:
```
Error: Kubernetes cluster unreachable
```
... just do:
```
$ kubectl config view --raw > ~/kubeconfig-k3d.yaml
$ export KUBECONFIG=~/kubeconfig-k3d.yaml
```
If you need to change the upstream DNS-resolver:
```
kubectl -n kube-system edit configmap coredns
```
Find the line containing
```
forward . /etc/resolv.conf
```
and change the content to
```
forward . ipaddr.of.your.dns-resolver
```
Finally re-deploy the CoreDNS deployment with:
`kubectl -n kube-system rollout restart deployment coredns`
**Note:** If you restart the cluster (`k3d cluster stop your-cluster` and `k3d cluster start your-cluster`), the changes will be gone!
# Namespaces and resource limits
```
kubectl apply -f https://gitea.zwackl.de/dominik/k3s/raw/branch/master/namespaces_limits.yaml
```
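The linked manifest is not reproduced here; as a rough sketch (names and values are examples, not necessarily the content of the file above), a namespace with default container limits can be defined like this:
```
apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: staging
spec:
  limits:
  - type: Container
    default:             # limit applied to containers that define none
      cpu: 500m
      memory: 512Mi
    defaultRequest:      # request applied to containers that define none
      cpu: 100m
      memory: 128Mi
```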
# Persistent Volumes (StorageClass - dynamic provisioning)
Read more about [AccessModes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
## Rancher Local (k3s default)
https://rancher.com/docs/k3s/latest/en/storage/
Only supports *AccessMode*: ReadWriteOnce (RWO)
If you want to disable the local-path provisioner, start the K3s server with the `--disable local-storage` flag.
## Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)
* Requirements: https://longhorn.io/docs/0.8.0/install/requirements/
* Debian/Ubuntu: `apt install open-iscsi`
* Ubuntu: uninstall multipathd as it can interfere with iscsid: `apt purge multipath-tools`
* Install: https://rancher.com/docs/k3s/latest/en/storage/
### Custom StorageClass
The following StorageClass `longhorn-2r` defines 2 replicas, no `dataLocality` and ext4 as the filesystem:
```
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    longhorn.io/last-applied-configmap: |
      kind: StorageClass
      apiVersion: storage.k8s.io/v1
      metadata:
        name: longhorn-2r
        annotations:
          storageclass.kubernetes.io/is-default-class: "true"
      provisioner: driver.longhorn.io
      allowVolumeExpansion: true
      reclaimPolicy: "Delete"
      volumeBindingMode: Immediate
      parameters:
        numberOfReplicas: "2"
        staleReplicaTimeout: "30"
        fromBackup: ""
        fsType: "ext4"
        dataLocality: "disabled"
    storageclass.kubernetes.io/is-default-class: "true"
  name: longhorn-2r
parameters:
  fromBackup: ""
  fsType: ext4
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  dataLocality: "disabled"
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
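A PersistentVolumeClaim using this StorageClass could then look like this (name, namespace and size are examples):
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-data
  namespace: staging
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn-2r
  resources:
    requests:
      storage: 5Gi
```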
### Volume backups with S3 (compatible) storage
If you do not want to push your volume backups to a public cloud (e.g. AWS), you need to provide a local S3-compatible storage server. This can easily be done with [minio](https://longhorn.io/docs/1.2.2/snapshots-and-backups/backup-and-restore/set-backup-target/#set-up-a-local-testing-backupstore):
```
apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-secret
  namespace: longhorn-system
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bG9uZ2hvcm4=                  # Base64: longhorn
  AWS_SECRET_ACCESS_KEY: Qmx1YmIxMjM0IQ==          # Base64: Blubb1234!
  AWS_ENDPOINTS: aHR0cHM6Ly95b3VyLnMzLmVuZHBvaW50  # Base64: https://your.s3.endpoint
```
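The secret is then referenced from the Longhorn backup settings (UI: *Setting > General*); the bucket name and region below are examples:
```
Backup Target:                   s3://longhorn-backups@us-east-1/
Backup Target Credential Secret: s3-backup-secret
```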
# Ingress controller
## Disable Traefik-ingress
edit /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --disable traefik \
[...]
```
Finally `systemctl daemon-reload` and `systemctl restart k3s`
## Enable K8s own NGINX-ingress with OCSP stapling
### Installation
This is the Helm chart of the Kubernetes community's own NGINX ingress controller:
https://kubernetes.github.io/ingress-nginx/deploy/#using-helm
```
$ kubectl create ns ingress-nginx
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm -n ingress-nginx install ingress-nginx ingress-nginx/ingress-nginx
```
```
$ kubectl -n ingress-nginx get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/svclb-nginx-ingress-controller-m6gxl        2/2     Running   0          110s
pod/nginx-ingress-controller-695774d99c-t794f   1/1     Running   0          110s

NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                      AGE
service/nginx-ingress-controller-admission   ClusterIP      10.43.116.191   <none>            443/TCP                      110s
service/nginx-ingress-controller             LoadBalancer   10.43.55.41     192.168.178.116   80:31110/TCP,443:31476/TCP   110s

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/svclb-nginx-ingress-controller   1         1         1       1            1           <none>          110s

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-ingress-controller   1/1     1            1           110s

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-ingress-controller-695774d99c   1         1         1       110s
```
The nginx ingress global configuration can be modified as follows:
```
$ kubectl -n ingress-nginx edit configmap ingress-nginx-controller
apiVersion: v1
[...]
data:
  enable-ocsp: "true"
  use-gzip: "true"
  worker-processes: "1"
[...]
kind: ConfigMap
[...]
```
Finally the deployment needs to be restarted:
`kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller`
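Whether OCSP stapling is actually delivered can be checked from a client - a quick sketch, the hostname is a placeholder (nginx may only staple after the OCSP response has been fetched, so repeat the check if the first attempt shows no response):
```
openssl s_client -connect some.host.name:443 -servername some.host.name -status </dev/null 2>/dev/null | grep -A2 'OCSP Response Status'
```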
**If you are facing deployment problems like the following one:**
```
Error: UPGRADE FAILED: cannot patch "gitea-ingress-staging" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-ingress-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded
```
A possible fix: `kubectl -n ingress-nginx delete ValidatingWebhookConfiguration ingress-nginx-admission`
# Cert-Manager (references ingress controller)
## Installation
Docs: https://hub.helm.sh/charts/jetstack/cert-manager
**Note on split-horizon DNS**: If you are planning to use DNS-01 validation in combination with [split-horizon DNS](https://en.wikipedia.org/wiki/Split-horizon_DNS), you will need to specify an external DNS resolver (Google, Cloudflare or your ISP's resolver) instead of your internal upstream DNS resolver for DNS self-checks! Read [this](https://cert-manager.io/docs/configuration/acme/dns01/#setting-nameservers-for-dns01-self-check) for further details.
```
helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.0.2/cert-manager.crds.yaml
kubectl create namespace cert-manager
helm upgrade --install cert-manager \
--namespace cert-manager \
--set 'extraArgs={--dns01-recursive-nameservers-only,--dns01-recursive-nameservers=8.8.8.8:53\,1.1.1.1:53}' \
-f https://gitea.zwackl.de/dominik/k3s/raw/branch/master/cert-manager-values.yaml \
jetstack/cert-manager
kubectl -n cert-manager get all
```
**Note:** The [values file](https://gitea.zwackl.de/dominik/k3s/raw/branch/master/cert-manager-values.yaml) enables Prometheus metrics. It references Prometheus by its instance name `prom-stack`. If you want to go without Prometheus metrics, just use the helm command above without the `-f` argument. Further information regarding the cert-manager helm chart values can be found [here](https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/README.template.md#configuration)
## Cluster-internal CA Issuer
Docs: https://cert-manager.io/docs/configuration/ca/
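The docs boil down to a ClusterIssuer that signs with a CA key pair stored as a secret in the cert-manager namespace - a minimal sketch, secret and issuer names are examples:
```
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca-issuer
spec:
  ca:
    # secret of type kubernetes.io/tls containing the CA certificate and key,
    # created beforehand in the cert-manager namespace
    secretName: internal-ca-key-pair
```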
## Let´s Encrypt (HTTP-01/DNS-01) issuer
Docs: https://cert-manager.io/docs/tutorials/acme/ingress/#step-6-configure-let-s-encrypt-issuer
```
ClusterIssuers are a resource type similar to Issuers. They are specified in exactly the same way,
but they do not belong to a single namespace and can be referenced by Certificate resources from
multiple different namespaces.
```
lets-encrypt-cluster-issuers.yaml:
```
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-staging-issuer
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: user@example.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-staging-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-prod-issuer
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: v1
kind: Secret
metadata:
  name: tsig-dyn-update-secret
  namespace: cert-manager
type: Opaque
data:
  key: base64 encoding of the (already base64-encoded) TSIG key - i.e. double-base64
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-issuer
spec:
  acme:
    email: user@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-dns01-account-key
    # Add a single challenge solver, DNS01 using RFC2136 (dynamic updates against your own nameserver)
    solvers:
    - dns01:
        rfc2136:
          nameserver: ip_address_of_your_authoritative_nameserver:nameserver_port
          tsigKeyName: name_of_tsig_key_in_your_authoritative_nameserver
          tsigAlgorithm: HMACSHA512
          tsigSecretSecretRef:
            name: tsig-dyn-update-secret
            key: key
      selector:
        dnsZones:
        - 'int.example.org'
```
`kubectl apply -f lets-encrypt-cluster-issuers.yaml`
## Deploying a LE-certificate with ingress
All you need is an `Ingress` resource of class `nginx` which references a ClusterIssuer (`letsencrypt-http01-prod-issuer`) resource.
HTTP-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace:
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```
DNS-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace:
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```
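After applying the ingress, cert-manager handles the ACME challenge in the background; progress (and possible errors) can be followed on the intermediate resources (the namespace is a placeholder):
```
kubectl -n some-namespace get certificate,certificaterequest,order,challenge
kubectl -n some-namespace describe certificate
```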
## Deploying a LE-certificate by CRD
All you need is a `Certificate` custom resource (provided by cert-manager's CRDs) like this one:
```
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: some-certificate
  namespace: staging
spec:
  # Secret names are always required.
  secretName: some-secret
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: some.fully.qualified.domain.name
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - some.fully.qualified.domain.name
  # Issuer references are always required.
  issuerRef:
    name:
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: ClusterIssuer
```
After the certificate has been issued, you can reference it as a volume within a deployment:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-ssl
  name: nginx-ssl
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-ssl
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx-ssl
    spec:
      volumes:
        - name: nginx-ssl-volume
          secret:
            secretName: some-secret
      containers:
        - image: nginx
          name: nginx-ssl
          volumeMounts:
            - mountPath: "/etc/nginx/ssl"
              name: nginx-ssl-volume
              readOnly: true
          ports:
            - containerPort: 80
      restartPolicy: Always
```
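cert-manager stores the issued certificate in the secret under the keys `tls.crt` and `tls.key`, so they appear as files below the mount path (the pod name is a placeholder):
```
# should list tls.crt and tls.key (plus ca.crt, depending on the issuer)
kubectl -n staging exec name-of-nginx-ssl-pod -- ls /etc/nginx/ssl
```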
## Troubleshooting
Docs: https://cert-manager.io/docs/faq/acme/
ClusterIssuers are *visible* in all namespaces:
```
kubectl get clusterissuer
kubectl describe clusterissuer