* [kubectl - BASH autocompletion](#kubectl-bash-autocompletion)
* [Install k3s](#install-k3s)
  * [On premises/IaaS](#install-k3s-on-premises)
    * [Configure upstream DNS-resolver](#upstream-dns-resolver)
    * [Change NodePort range](#nodeport-range)
    * [Clustering](#clustering)
  * [On Docker with k3d](#install-k3s-on-docker-k3d)
* [Namespaces and resource limits](#namespaces-limits)
* [Persistent volumes (StorageClass - dynamic provisioning)](#pv)
  * [Rancher Local](#pv-local)
  * [Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)](#pv-longhorn)
  * [NFS](#pv-nfs)
* [Ingress controller](#ingress-controller)
  * [Disable Traefik-ingress](#disable-traefik-ingress)
  * [Enable NGINX-ingress with OCSP stapling](#enable-nginx-ingress)
    * [Installation](#install-nginx-ingress)
* [Cert-Manager (references ingress controller)](#cert-manager)
  * [Installation](#cert-manager-install)
  * [Cluster-internal CA issuer](#cert-manager-cluster-ca-issuer)
  * [Let's Encrypt issuer](#cert-manager-le-issuer)
  * [Deploying a LE-certificate with ingress](#cert-manager-ingress)
  * [Deploying a LE-certificate by CRD](#cert-manager-crd)
  * [Troubleshooting](#cert-manager-troubleshooting)
* [HELM charts](#helm)
  * [Create a chart](#helm-create)
  * [Install local chart without packaging](#helm-install-without-packaging)
  * [List deployed helm charts](#helm-list)
  * [Upgrade local chart without packaging](#helm-upgrade)
  * [Get status of deployed chart](#helm-status)
  * [Get deployment history](#helm-history)
  * [Rollback](#helm-rollback)
* [Kubernetes in action](#kubernetes-in-action)
* [Running DaemonSets with `hostNetwork: true`](#running-daemonsets)
* [Running StatefulSet with NFS storage](#running-statefulset-nfs)
* [Services](#services)
* [Client-IP transparency and loadbalancing](#services-client-ip-transparency)
* [Session affinity/persistence](#services-session-persistence)
* [Keeping the cluster balanced](#keep-cluster-balanced)
* [Node maintenance](#node-maintenance)
* [What happens if a node goes down?](#what-happens-node-down)
* [Dealing with disruptions](#disruptions)
* [Troubleshooting](#troubleshooting)
* [Deleting a stuck namespace](#ts-delete-stuck-namespace)
* [Deleting stuck CRDs](#ts-delete-stuck-crd)
# kubectl - BASH autocompletion
For current shell only:
```
source <(kubectl completion bash)
```
Persistent:
```
echo "source <(kubectl completion bash)" >> ~/.bashrc
```
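Running the `echo ... >> ~/.bashrc` line twice appends the hook twice. A minimal sketch that only appends it once; the helper name `append_once` is hypothetical, not part of kubectl:

```shell
#!/usr/bin/env bash
# Sketch: idempotent version of the "Persistent" step above.
set -euo pipefail

# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex interpretation)
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

append_once "${HOME}/.bashrc" 'source <(kubectl completion bash)'
```

The same helper works for any other line you want in your shell profile exactly once.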
# Install k3s
## On premises/IaaS
https://k3s.io/:
```
curl -sfL https://get.k3s.io | sh -
```
### Upstream DNS-resolver
Docs: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/
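Default: 8.8.8.8, which does not resolve local domains!
1. Create a local /etc/resolv.k3s.conf whose `nameserver` entry points to ip-of-dnsresolver (127.0.0.1 **does not work!**, because inside the CoreDNS pod the loopback address refers to the pod itself)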
2. vi /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --resolv-conf /etc/resolv.k3s.conf \
```
3. Re-load systemd config: `systemctl daemon-reload`
4. Re-start k3s: `systemctl restart k3s.service`
5. Re-deploy coredns-pods: `kubectl -n kube-system delete pod name-of-coredns-pods`
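Step 1 can be scripted so that a loopback resolver is rejected up front. A minimal sketch; the function name `write_k3s_resolv_conf` is hypothetical, and it only checks for the obvious loopback forms (`127.*`, `::1`):

```shell
#!/usr/bin/env bash
# Sketch: write the resolv.conf that k3s is pointed to via --resolv-conf.
set -euo pipefail

# write_k3s_resolv_conf RESOLVER_IP [TARGET]
write_k3s_resolv_conf() {
  local resolver_ip="$1"
  local target="${2:-/etc/resolv.k3s.conf}"   # default: the path used in step 1
  case "$resolver_ip" in
    127.*|::1)
      # Inside the CoreDNS pod a loopback address points at the pod itself.
      echo "refusing loopback resolver: $resolver_ip" >&2
      return 1
      ;;
  esac
  printf 'nameserver %s\n' "$resolver_ip" > "$target"
}
```

Run as root, e.g. `write_k3s_resolv_conf 192.0.2.53` (a placeholder IP), then continue with steps 2 to 5.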
### Change NodePort range to 1 - 65535
1. vi /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --kube-apiserver-arg service-node-port-range=1-65535 \
```
2. Re-load systemd config: `systemctl daemon-reload`
3. Re-start k3s: `systemctl restart k3s.service`
### Clustering
If you want to build a K3s cluster, note that the default networking model is *overlay@VXLAN*. In this case make sure that
* all of your nodes can reach (ping) each other over the underlying network (local, routed/VPN). This is required for the overlay network to work properly, because VXLAN spans a meshed network over all K3s nodes.
* if your nodes are spread over public networks (like the internet), you use a VPN (like IPSec or OpenVPN) to secure the traffic between the nodes. **VXLAN uses plain (unencrypted) UDP for transport!**
* if your nodes are connected through a VPN, `flannel` (the overlay network daemon) explicitly communicates via the VPN network interface instead of the public network interface. The following settings should be made on the nodes:
```
/etc/systemd/system/k3s-agent.service:
[...]
ExecStartPre=sleep 60
ExecStart=/usr/local/bin/k3s \
agent \
--flannel-iface \
```
## On Docker with K3d
K3d is a lightweight wrapper which deploys a K3s cluster (masters and workers) directly on Docker, without the need for a virtual machine per node.
* Prerequisites: a local docker installation **without user-namespaces enabled**.
* **Warning**: K3d deploys privileged containers!
https://k3d.io/:
```
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
```
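Create a K3s cluster with two agents, without `traefik` and `metrics-server`, and with the full NodePort range (k3d v5 and later replaced `--k3s-server-arg` with `--k3s-arg`, e.g. `--k3s-arg '--disable=traefik@server:0'`):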
```
k3d cluster create cluster1 \
--agents 2 \
--k3s-server-arg '--disable=traefik' \
--k3s-server-arg '--disable=metrics-server' \
--k3s-server-arg '--kube-apiserver-arg=service-node-port-range=1-65535'
```
If you encounter `helm` throwing errors like this one:
```
Error: Kubernetes cluster unreachable
```
... just do:
```
$ kubectl config view --raw > ~/kubeconfig-k3d.yaml
$ export KUBECONFIG=~/kubeconfig-k3d.yaml
```
If you need to change the upstream DNS-resolver:
```
kubectl -n kube-system edit configmap coredns
```
Find the line containing
```
forward . /etc/resolv.conf
```
and change the content to
```
forward . ipaddr.of.your.dns-resolver
```
Finally re-deploy the CoreDNS deployment with:
`kubectl -n kube-system rollout restart deployment coredns`
**Note:** If you restart the cluster (`k3d cluster stop your-cluster` and `k3d cluster start your-cluster`), the changes will be gone!
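The manual `edit` above can also be done non-interactively. A minimal sketch, assuming the Corefile contains the default `forward . /etc/resolv.conf` line; the function name `set_coredns_forward` and the IP `192.0.2.53` are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: rewrite the CoreDNS "forward" target in text read from stdin.
set -euo pipefail

# set_coredns_forward RESOLVER_IP < input > output
# Only the "forward . /etc/resolv.conf" directive is changed; indentation and any
# trailing options (e.g. an opening "{") are preserved, all other lines pass through.
set_coredns_forward() {
  local resolver_ip="$1"
  sed -E "s#^([[:space:]]*forward \. )/etc/resolv\.conf#\1${resolver_ip}#"
}

printf '    forward . /etc/resolv.conf\n' | set_coredns_forward 192.0.2.53
```

Applied to the live ConfigMap: `kubectl -n kube-system get configmap coredns -o yaml | set_coredns_forward 192.0.2.53 | kubectl replace -f -`, followed by the `rollout restart` above. The note about cluster restarts still applies.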
# Namespaces and resource limits
```
kubectl apply -f https://gitea.zwackl.de/dominik/k3s/raw/branch/master/namespaces_limits.yaml
```
# Persistent Volumes (StorageClass - dynamic provisioning)
Read more about [AccessModes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
## Rancher Local
https://rancher.com/docs/k3s/latest/en/storage/
Only supports *AccessMode*: ReadWriteOnce (RWO)
## Rancher Longhorn (distributed in local cluster) - MY FAVOURITE :-)
* Requirements: https://longhorn.io/docs/0.8.0/install/requirements/
* Debian: `apt install open-iscsi`
* Install: https://rancher.com/docs/k3s/latest/en/storage/
## NFS
For testing purposes and simplicity you may use the following [NFS container image](https://hub.docker.com/r/itsthenetwork/nfs-server-alpine):
```
mkdir -p /data/docker/nfs-server/data/
docker run -d --name nfs-server \
--net=host \
--privileged \
-v /data/docker/nfs-server/data/:/nfsshare \
-e SHARED_DIRECTORY=/nfsshare \
itsthenetwork/nfs-server-alpine:latest
```
**All Nodes need to have the NFS-client package (Ubuntu: `nfs-common`) installed**
```
helm repo add ckotzbauer https://ckotzbauer.github.io/helm-charts
helm install my-nfs-client-provisioner --set nfs.server= --set nfs.path= ckotzbauer/nfs-client-provisioner
```
Check if NFS *StorageClass* is available:
```
$ kubectl get sc
NAME                   PROVISIONER                               RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path                     Delete          WaitForFirstConsumer   false                  101d
nfs-client             cluster.local/my-nfs-client-provisioner   Delete          Immediate              true                   172m
```
Now you can use `nfs-client` as StorageClass like so:
```
apiVersion: apps/v1
kind: StatefulSet
[...]
  volumeClaimTemplates:
  - metadata:
      name: nfs-backend
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: "nfs-client"
      resources:
        requests:
          storage: 32Mi
```
or so:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-1
  namespace:
spec:
  storageClassName: "nfs-client"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 32Mi
```
# Ingress controller
## Disable Traefik-ingress
edit /etc/systemd/system/k3s.service:
```
[...]
ExecStart=/usr/local/bin/k3s \
server [...] --disable traefik \
[...]
```
Finally `systemctl daemon-reload` and `systemctl restart k3s`
## Enable K8s own NGINX-ingress with OCSP stapling
### Installation
This is the helm chart of the Kubernetes project's own NGINX ingress controller (`ingress-nginx`):
https://kubernetes.github.io/ingress-nginx/deploy/#using-helm
```
kubectl create ns ingress-nginx
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx -n ingress-nginx
```
`kubectl -n ingress-nginx get all`:
```
NAME                                            READY   STATUS    RESTARTS   AGE
pod/svclb-nginx-ingress-controller-m6gxl        2/2     Running   0          110s
pod/nginx-ingress-controller-695774d99c-t794f   1/1     Running   0          110s

NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                      AGE
service/nginx-ingress-controller-admission   ClusterIP      10.43.116.191   <none>            443/TCP                      110s
service/nginx-ingress-controller             LoadBalancer   10.43.55.41     192.168.178.116   80:31110/TCP,443:31476/TCP   110s

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/svclb-nginx-ingress-controller   1         1         1       1            1           <none>          110s

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-ingress-controller   1/1     1            1           110s

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-ingress-controller-695774d99c   1         1         1       110s
```
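The nginx ingress global configuration can be modified as follows (the ConfigMap and Deployment names depend on the helm release name; check them with `kubectl -n ingress-nginx get configmap,deployment`):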
```
kubectl -n ingress-nginx edit configmap ingress-nginx-controller
apiVersion: v1
[...]
data:
  enable-ocsp: "true"
  use-gzip: "true"
  worker-processes: "1"
[...]
kind: ConfigMap
[...]
```
Finally the deployment needs to be restarted:
`kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller`
**If you are facing deployment problems like the following one**
```
Error: UPGRADE FAILED: cannot patch "gitea-ingress-staging" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-ingress-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded
```
A possible fix: `kubectl -n ingress-nginx delete ValidatingWebhookConfiguration ingress-nginx-admission`
# Cert-Manager (references ingress controller)
## Installation
Docs: https://hub.helm.sh/charts/jetstack/cert-manager
**Note on split-horizon DNS**: If you plan to use DNS-01 validation with [split-horizon DNS](https://en.wikipedia.org/wiki/Split-horizon_DNS), you need to specify an external DNS resolver (Google, Cloudflare or your ISP's resolver) instead of your internal upstream DNS resolver for the DNS self-checks! Read [this](https://cert-manager.io/docs/configuration/acme/dns01/#setting-nameservers-for-dns01-self-check) for further details.
```
helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.0.2/cert-manager.crds.yaml
kubectl create namespace cert-manager
helm install cert-manager --namespace cert-manager --set 'extraArgs={--dns01-recursive-nameservers-only,--dns01-recursive-nameservers=8.8.8.8:53\,1.1.1.1:53}' jetstack/cert-manager
kubectl -n cert-manager get all
```
## Cluster-internal CA Issuer
Docs: https://cert-manager.io/docs/configuration/ca/
## Let's Encrypt issuers (HTTP-01 and DNS-01)
Docs: https://cert-manager.io/docs/tutorials/acme/ingress/#step-6-configure-let-s-encrypt-issuer
```
ClusterIssuers are a resource type similar to Issuers. They are specified in exactly the same way,
but they do not belong to a single namespace and can be referenced by Certificate resources from
multiple different namespaces.
```
lets-encrypt-cluster-issuers.yaml:
```
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-staging-issuer
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: user@example.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-staging-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-prod-issuer
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: v1
kind: Secret
metadata:
  name: tsig-dyn-update-secret
  namespace: cert-manager
type: Opaque
data:
  key: BASE64 encoding of the BASE64-encoded TSIG key (double-base64)
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-issuer
spec:
  acme:
    email: user@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-dns01-account-key
    # Add a single challenge solver, DNS01 using RFC2136 dynamic updates
    solvers:
    - dns01:
        rfc2136:
          nameserver: ip_address_of_your_authoritative_nameserver:nameserver_port
          tsigKeyName: name_of_tsig_key_in_your_authoritative_nameserver
          tsigAlgorithm: HMACSHA512
          tsigSecretSecretRef:
            name: tsig-dyn-update-secret
            key: key
      selector:
        dnsZones:
        - 'int.example.org'
```
`kubectl apply -f lets-encrypt-cluster-issuers.yaml`
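The TSIG secret in your nameserver's key file is already base64; the Secret's `data` map expects base64 again, hence "double-base64". A minimal sketch for producing the `data.key` value; the function name and the placeholder secret are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: compute the value for "data.key" of tsig-dyn-update-secret.
set -euo pipefail

# encode_tsig_for_secret_data TSIG_SECRET
# Prints base64(TSIG_SECRET) without a trailing newline and without line
# wrapping (GNU base64 wraps its output at 76 characters by default).
encode_tsig_for_secret_data() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder: replace with the secret from your TSIG key file.
tsig_secret='REPLACE-WITH-YOUR-TSIG-SECRET'
encode_tsig_for_secret_data "$tsig_secret"
echo
```

Alternatively, `kubectl -n cert-manager create secret generic tsig-dyn-update-secret --from-literal=key="$tsig_secret"` performs the outer base64 step itself, so the Secret document can then be dropped from the YAML file.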
## Deploying a LE-certificate with ingress
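All you need is an `Ingress` resource of class `nginx` which references a ClusterIssuer (`letsencrypt-http01-prod-issuer`) resource via the `cert-manager.io/cluster-issuer` annotation. Note: the `networking.k8s.io/v1beta1` Ingress API used below was removed in Kubernetes 1.22; on newer clusters use `networking.k8s.io/v1`, where each path needs a `pathType` and the backend is written as `service.name` / `service.port.number` (or `service.port.name`).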
HTTP-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace:
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-http01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```
DNS-01 solver (`cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"`):
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace:
  name: some-ingress-name
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-dns01-prod-issuer"
spec:
  tls:
  - hosts:
    - some-certificate.name.san
    secretName: target-certificate-secret-name
  rules:
  - host: some-certificate.name.san
    http:
      paths:
      - path: /
        backend:
          serviceName: some-target-service
          servicePort: some-target-service-port
```
## Deploying a LE-certificate by CRD
All you need is a `Certificate` custom resource (defined by the cert-manager CRDs) like this one:
```
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: some-certificate
  namespace: staging
spec:
  # Secret names are always required.
  secretName: some-secret
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: some.fully.qualified.domain.name
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - some.fully.qualified.domain.name
  # Issuer references are always required.
  issuerRef:
    name:
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: ClusterIssuer
```
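`duration` and `renewBefore` are given in hours. cert-manager starts renewing once the remaining validity drops to `renewBefore`, i.e. at `notAfter - renewBefore`. A small sketch of that arithmetic for the values above, assuming the issuer honours the requested duration; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: convert the Certificate's hour-based fields into days.
set -euo pipefail

# hours_to_days HOURS: whole days
hours_to_days() { echo $(( $1 / 24 )); }

# renewal_day DURATION_H RENEW_BEFORE_H: day after issuance on which renewal starts
renewal_day() { echo $(( ($1 - $2) / 24 )); }

echo "validity:       $(hours_to_days 2160) days"
echo "renewal window: $(hours_to_days 360) days"
echo "renewal starts: $(renewal_day 2160 360) days after issuance"
```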
After the certificate has been issued, you can mount its secret (`some-secret`) as a volume within a deployment:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-ssl
  name: nginx-ssl
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-ssl
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx-ssl
    spec:
      volumes:
      - name: nginx-ssl-volume
        secret:
          secretName: some-secret
      containers:
      - image: nginx
        name: nginx-ssl
        volumeMounts:
        - mountPath: "/etc/nginx/ssl"
          name: nginx-ssl-volume
          readOnly: true
        ports:
        - containerPort: 80
      restartPolicy: Always
```
## Troubleshooting
Docs: https://cert-manager.io/docs/faq/acme/
ClusterIssuers are *visible* from all namespaces:
```
kubectl get clusterissuer
kubectl describe clusterissuer