Preface
OpenShift was first launched in 2011 and relied on Linux containers to deploy and run user applications. In 2014, Kubernetes was open-sourced by Google, pitched as "a system for automating deployment, scaling, and operations of application containers."
The release of OpenShift V3 was a substantial change. OpenShift began using containers and images, and to orchestrate those images, V3 introduced Kubernetes.
Red Hat proved to be at the forefront of container technology, second only to Google in contributions to Cloud Native Computing Foundation (CNCF) projects. Moreover, Red Hat acquired CoreOS in January 2018. The CoreOS flagship product was a lightweight Linux operating system designed to run containerized applications, which Red Hat made available in OpenShift V4 as "Red Hat Enterprise Linux CoreOS".
Red Hat OpenShift is one of those technologies causing a lot of noise and demand for skills in the Information Technology industry. Still relatively young, it is under massive development and evolving faster than professionals can keep up with.
The Pragmatic OpenShift guide aims to provide a hands-on approach to deploying and configuring OpenShift 4.6.
OpenShift 4.6
Introduction
This document aims to remove the ambiguity sometimes found in the official documentation and, using clear examples, demonstrate how to deploy an OpenShift cluster and complete the most common post-installation configuration tasks.
The primary purpose of the information detailed is for learning and building a personal lab environment; therefore, the scenarios are not intended for production use. However, it is the know-how that counts.
OpenShift is a rapidly moving target, with minor releases often incrementing weekly; this document is focused on OpenShift 4.6.
A challenge for a would-be OpenShift administrator is access to the technology. Minimum requirements are enormous, and let us remember OpenShift (based on Kubernetes) is a cloud-native platform first, as was evident when OpenShift 4.1 was first released: initially, it only supported Amazon Web Services (AWS) using the Installer Provisioned Infrastructure (IPI) method.
Today, OpenShift supports AWS, Azure, GCP, IBM, OpenStack, RHV, vSphere and bare metal. All of these have their nuances, and for a home lab, most are too costly for learning and experimenting.
Bare metal allows us to provision infrastructure ourselves, and the User Provisioned Infrastructure (UPI) installation enables customisations. The process of doing a UPI bare-metal installation is far more involved than, say, an AWS IPI one. However, the knowledge gained is invaluable, and the result is a local cluster, albeit a minimal three-node cluster.
INFRASTRUCTURE
Infrastructure refers to the compute resources that software platforms are deployed onto. Traditionally, and as demonstrated in this documentation, that can be physical hardware. It might mean virtualisation using hypervisors such as VMware, Red Hat Virtualisation or Hyper-V, for example. Most commonly, it will likely mean cloud infrastructure, which builds on virtualisation to provide on-demand abstracted resources such as hardware, storage, and network resources.
In reality, cloud infrastructure resources are costly and move capital expenditure (CapEx) to operational expenditure (OpEx). For a lab environment, a monthly bill using cloud resources would equate to the outright purchase of sufficient hardware to keep forever. For production enterprise environments, the resilience, scalability and flexibility of OpEx infrastructure are highly appealing and make sense. However, for learning and experimentation, keeping costs down with the purchase of hardware for the long term is appealing.
This guide demonstrates using Intel® NUCs for master nodes, which are affordable and compact with low power consumption. Furthermore, a Raspberry Pi is used for the core utility services, including DNS, a load balancer and an Apache web server.
Architecture overview
The following diagram is a high-level overview of the lab environment deployed in this document. It depicts both the physical hosts and virtual hosts that make up a hybrid cluster. The virtual hosts include the temporary bootstrap host, only needed during the initial deployment of the three master nodes.
Three master nodes make up a minimal cluster. The nodes will play the role of "master", "worker" and "infra" nodes.
Further scaling of the cluster is optional and done using virtual machines (VM) with network bridging. Using VMs provides flexibility where resources are limited. Temporarily adding and removing worker and infrastructure nodes is excellent for trying various activities while keeping the three core physical master nodes permanently.

The following table includes the details of the environment used throughout this document:
DNS Name | IP Address | Description |
---|---|---|
 | | RHEL8/Fedora32 client laptop |
utilities.cluster.lab.com | 192.168.0.101 | Raspberry Pi 3 Model B, 4 core 1GB RAM, 8GB storage |
bootstrap.cluster.lab.com | 192.168.0.102 | KVM VM, 4 core, 16GB RAM, 120GB storage |
master1.cluster.lab.com | 192.168.0.111 | Intel NUC i5, 4 core, 16GB RAM, 120GB storage |
master2.cluster.lab.com | 192.168.0.112 | Intel NUC i5, 4 core, 16GB RAM, 120GB storage |
master3.cluster.lab.com | 192.168.0.113 | Intel NUC i5, 4 core, 16GB RAM, 120GB storage |
DNS Name | IP Address | Description |
---|---|---|
worker1.cluster.lab.com | 192.168.0.121 | KVM VM, 4 core, 16GB RAM, 120GB storage |
worker2.cluster.lab.com | 192.168.0.122 | KVM VM, 4 core, 16GB RAM, 120GB storage |
worker3.cluster.lab.com | 192.168.0.123 | KVM VM, 4 core, 16GB RAM, 120GB storage |
infra1.cluster.lab.com | 192.168.0.131 | KVM VM, 4 core, 16GB RAM, 120GB storage |
infra2.cluster.lab.com | 192.168.0.132 | KVM VM, 4 core, 16GB RAM, 120GB storage |
infra3.cluster.lab.com | 192.168.0.133 | KVM VM, 4 core, 16GB RAM, 120GB storage |
PREREQUISITES
Getting the prerequisites right is the essential part of deploying OpenShift. Mistakes with DNS, load balancing or networking in general will only lead to problems with the deployment of the cluster. Troubleshooting OpenShift deployments is notoriously challenging and often misleading as to the real root cause of issues.
With OpenShift 4, begin with a minimal working deployment, adding subsequent nodes and performing any cluster configuration post-deployment.
Once bootstrapping is completed, the minimal cluster will look like this:

In this guide, a Raspberry Pi is used, but it does not need to be; any host, either physical or virtual (providing it is either on the same subnet or has sufficient routing configured), with CentOS 7 or 8 installed will do.
Regardless of the device, Domain Name System (DNS) needs to be in place, along with the provisioning of two load balancers (LB): one LB for the Application Programming Interface (API) and another LB for ingress application traffic flowing in from outside the cluster. A web server is also needed to serve files and images used for provisioning hosts.
All steps documented assume a Linux client computer running either Fedora, CentOS or Red Hat Enterprise Linux.
Raspberry Pi
Refer to the following for information regarding CentOS for Raspberry Pi: wiki.centos.org
In this document CentOS-Userland-7-armv7hl-RaspberryPI-Minimal-2009-sda.raw.xz was used.
xz is a lossless compression program; if not already installed, install it on your client:
dnf install xz -y
Decompress the file:
unxz CentOS-Userland-7-armv7hl-RaspberryPI-Minimal-2009-sda.raw.xz
Use fdisk to identify existing storage devices on your system, then insert the MicroSD card, using fdisk again to identify the card:
fdisk -l
[ ... output omitted ... ]
Disk /dev/sda: 14.9 GiB, 15931539456 bytes, 31116288 sectors
[ ... output omitted ... ]
Using dd, write the image to the SD card:
sudo dd if=CentOS-Userland-7-armv7hl-RaspberryPI-Minimal-2009-sda.raw of=/dev/sda bs=8192 status=progress; sync
Insert the SD card and power on the Raspberry Pi, logging in as root with the default password of centos:
Username: root
Password: centos
Expand the filesystem with /usr/bin/rootfs-expand:
/usr/bin/rootfs-expand
Set the hostname:
hostnamectl set-hostname utilities.cluster.lab.com
Remove NetworkManager:
yum remove NetworkManager
Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and configure it with a static IP, in this example 192.168.0.101:
DEVICE=eth0
TYPE=Ethernet
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.101
PREFIX=24
GATEWAY=192.168.0.1
DNS1=192.168.0.1
As a rule of thumb, take the time and effort to manage SELinux and firewalld correctly. In this case, to save time and focus on the prerequisites and deployment of OpenShift, disable both:
Disable SELinux:
vi /etc/sysconfig/selinux
SELINUX=disabled
Disable firewall:
systemctl stop firewalld
systemctl disable firewalld
These can be configured later, but the fewer potential causes of issues the better, because troubleshooting can be tricky if numerous problems are in the equation. Get the most basic working deployment complete, then introduce things one at a time, making the process of troubleshooting "cause and effect" easier.
Install any updates and reboot:
yum update -y
reboot
Assuming the changes made are correct, test the static IP address using SSH from a client:
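For example, a quick check from the client, using the address configured above:
ssh root@192.168.0.101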
DNS
Install dnsmasq:
yum install dnsmasq -y
Backup the original configuration:
cp /etc/dnsmasq.conf /etc/dnsmasq.conf.bak
The following dnsmasq.conf configuration file includes a few things, but a key line to point out is the apps.cluster.lab.com line, which provides the wildcard DNS resolution such as foo.apps.cluster.lab.com or bar.apps.cluster.lab.com:
Edit dnsmasq.conf:
vi /etc/dnsmasq.conf
server=192.168.0.1
server=8.8.8.8
server=8.8.4.4
local=/cluster.lab.com/
address=/apps.cluster.lab.com/192.168.0.101
interface=eth0
listen-address=::1,127.0.0.1,192.168.0.101
expand-hosts
domain=cluster.lab.com
addn-hosts=/etc/dnsmasq.openshift.hosts
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig
srv-host=_etcd-server-ssl._tcp.cluster,master1.cluster.lab.com,2380,0,10
srv-host=_etcd-server-ssl._tcp.cluster,master2.cluster.lab.com,2380,0,10
srv-host=_etcd-server-ssl._tcp.cluster,master3.cluster.lab.com,2380,0,10
Next, I’m adding all the DNS entries I might ever need for a cluster:
vi /etc/dnsmasq.openshift.hosts
192.168.0.101 utilities.cluster.lab.com dns.cluster.lab.com lb.cluster.lab.com api.cluster.lab.com api-int.cluster.lab.com
192.168.0.102 bootstrap.cluster.lab.com
192.168.0.111 master1.cluster.lab.com etcd-0.cluster.lab.com
192.168.0.112 master2.cluster.lab.com etcd-1.cluster.lab.com
192.168.0.113 master3.cluster.lab.com etcd-2.cluster.lab.com
192.168.0.121 worker1.cluster.lab.com
192.168.0.122 worker2.cluster.lab.com
192.168.0.123 worker3.cluster.lab.com
192.168.0.131 infra1.cluster.lab.com
192.168.0.132 infra2.cluster.lab.com
192.168.0.133 infra3.cluster.lab.com
Next configure this host to use itself for DNS resolution:
vi /etc/resolv.conf
search Home cluster.lab.com
nameserver 192.168.0.101
Lock resolv.conf from being modified:
chattr +i /etc/resolv.conf
Start and enable the service:
systemctl enable dnsmasq.service --now
Install bind-utils:
yum install bind-utils -y
Test some lookups; both IP Addresses and DNS entries should be resolvable, including anything.apps.cluster.lab.com:
nslookup www.google.com
nslookup master1.cluster.lab.com
nslookup 192.168.0.111
nslookup foo.apps.cluster.lab.com
nslookup bar.apps.cluster.lab.com
HAProxy
Install HAProxy:
yum install haproxy -y
Back up the original configuration file:
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
And add the following configuration (changing IPs for your environment):
vi /etc/haproxy/haproxy.cfg
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
stats socket /var/lib/haproxy/stats
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 30s
timeout queue 1m
timeout connect 30s
timeout client 1m
timeout server 1m
timeout http-keep-alive 30s
timeout check 30s
maxconn 4000
frontend api
bind 0.0.0.0:6443
option tcplog
mode tcp
default_backend api
backend api
option httpchk GET /healthz
http-check expect status 200
mode tcp
balance roundrobin
server bootstrap bootstrap.cluster.lab.com:6443 check check-ssl verify none
server master1 master1.cluster.lab.com:6443 check check-ssl verify none
server master2 master2.cluster.lab.com:6443 check check-ssl verify none
server master3 master3.cluster.lab.com:6443 check check-ssl verify none
frontend api-int
bind 0.0.0.0:22623
option tcplog
mode tcp
default_backend api-int
backend api-int
mode tcp
balance roundrobin
server bootstrap 192.168.0.102:22623 check
server master1 192.168.0.111:22623 check
server master2 192.168.0.112:22623 check
server master3 192.168.0.113:22623 check
frontend apps-http
bind 192.168.0.101:80
option tcplog
mode tcp
default_backend apps-http
backend apps-http
mode tcp
balance roundrobin
server master1 master1.cluster.lab.com:80 check
server master2 master2.cluster.lab.com:80 check
server master3 master3.cluster.lab.com:80 check
frontend apps-https
bind 192.168.0.101:443
option tcplog
mode tcp
default_backend apps-https
backend apps-https
mode tcp
balance roundrobin
option ssl-hello-chk
server master1 192.168.0.111:443 check
server master2 192.168.0.112:443 check
server master3 192.168.0.113:443 check
listen stats
bind 0.0.0.0:9000
mode http
balance
timeout client 5000
timeout connect 4000
timeout server 30000
stats uri /stats
stats refresh 5s
stats realm HAProxy\ Statistics
stats auth admin:changeme
stats admin if TRUE
This haproxy.cfg example purposely uses inconsistent methods of configuration between the load balancers to provide good working examples. The configuration here is correct for serving OpenShift requirements. Notice the configuration includes an HAProxy statistics page that auto-refreshes, and that the apps-http backend excludes the SSL check.
Enable and start HAProxy:
systemctl enable haproxy.service --now
View the graphical statistics report at http://192.168.0.101:9000/stats. In this example the username is admin and the password is changeme. If you’ve pointed your local client to use 192.168.0.101 for its DNS, try http://lb.cluster.lab.com:9000/stats.
Apache web server
Install and configure httpd on port 8080 (because port 80 is already used by HAProxy):
yum install httpd -y
Edit httpd.conf:
vi /etc/httpd/conf/httpd.conf
Listen 8080
Enable and start the service:
systemctl enable httpd.service --now
Remember to append port 8080 when referring to this service, for example: http://192.168.0.101:8080/
For OpenShift bare metal installations, files can be copied into /var/www/html on this utilities server.
INSTALLATION
At the time of writing, the latest version is 4.6.4. The downloads necessary are publicly available; however, you must download a legitimate pull secret from the Red Hat cluster portal. Visit https://cloud.redhat.com/openshift/install/metal/user-provisioned for the latest versions and to obtain your pull secret.
Client tools
Download the OpenShift installer program on your client computer:
Download the OpenShift command-line tools:
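The download links change with each release; as an illustration, both archives can be fetched from the public OpenShift mirror (the URLs below are assumed, adjust the version to match the release you are installing):
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.4/openshift-client-linux.tar.gz
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.4/openshift-install-linux.tar.gz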
Checksums:
Check the integrity of the downloaded files:
sha256sum openshift-client-linux.tar.gz
sha256sum openshift-install-linux.tar.gz
c1f39a966fc0dbd4f8f0bfec0196149d54e0330de523bf906bbe2728b10a860b openshift-client-linux.tar.gz
b81e1d25d77a05eaae8c0f154ed563c2caee21ed63401d655a7ad3206fdfd53c openshift-install-linux.tar.gz
Make a bin directory in your home directory:
mkdir ~/bin
You may prefer to extract to /usr/local/bin/
Extract the CLI tools:
tar -xvf openshift-client-linux.tar.gz -C ~/bin
tar -xvf openshift-install-linux.tar.gz -C ~/bin
Check the oc version:
oc version
Client Version: 4.6.4
Check the openshift-install version:
openshift-install version
openshift-install 4.6.4
Install preparation
Create a working directory, for example:
mkdir -p ~/ocp4/cluster && cd ~/ocp4
Generate a new SSH key pair that will be embedded into the OpenShift deployment; this will enable you to SSH to the OpenShift nodes.
ssh-keygen -t rsa -b 4096 -N '' -f cluster_id_rsa
Create an installation configuration file. The compute replicas value is always set to zero for bare metal; this refers to worker nodes, which are manually added post-deployment. The key option here is controlPlane: replicas, which needs to be either 1 for a single node cluster or 3 for the minimal three-node cluster. The bootstrap process does not complete until this defined criterion is met, so plan ahead!
vi install-config.yaml.orig
apiVersion: v1
baseDomain: lab.com
compute:
- hyperthreading: Enabled
name: worker
replicas: 0
controlPlane:
hyperthreading: Enabled
name: master
replicas: 3
metadata:
name: cluster
networking:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
networkType: OpenShiftSDN
serviceNetwork:
- 172.30.0.0/16
platform:
none: {}
fips: false
pullSecret: '{"auths": ...}'
sshKey: 'ssh-ed25519 AAAA...'
Copy the configuration file into the cluster directory. It’s important to have the original copy because the installation process destroys it! It’s handy to keep for reference, and because in reality it usually takes a few attempts to get right.
Remember to paste in your real pull secret and public key.
cp install-config.yaml.orig cluster/install-config.yaml
The following two commands create and initiate the installation process. The first, create manifests, gives you an opportunity to make further tweaks to the deployment. The create ignition-configs step uses those manifests to create the ignition files.
openshift-install create manifests --dir=cluster
openshift-install create ignition-configs --dir=cluster
The files in the cluster directory should now look like this:
auth
bootstrap.ign
master.ign
metadata.json
worker.ign
Dependencies
Download the installer ISO image and the compressed metal RAW:
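Again as an illustration, the Red Hat CoreOS dependencies can be fetched from the public mirror; the paths and file names below are assumed, so use whichever location matches your release:
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/latest/rhcos-installer.x86_64.iso
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/latest/rhcos-metal.x86_64.raw.gz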
Checksums:
Check the integrity of the downloaded files:
sha256sum rhcos-installer.x86_64.iso
sha256sum rhcos-metal.x86_64.raw.gz
d15bd7ae942573eece34ba9c59e110e360f15608f36e9b83ab9f2372d235bef2 rhcos-installer.x86_64.iso
7e61bbe56735bc26d0808d4fffc4ccac25554df7d3c72c7b678e83e56c7ac5ba rhcos-metal.x86_64.raw.gz
Copy the three ignition files and the Red Hat CoreOS image to utilities.cluster.lab.com, to be served by Apache:
scp cluster/*.ign root@utilities.cluster.lab.com:/var/www/html/
Copy the Red Hat CoreOS image:
scp rhcos-metal.x86_64.raw.gz root@utilities.cluster.lab.com:/var/www/html/
On utilities.cluster.lab.com ensure the file permissions are correct:
chmod 644 /var/www/html/*
From the client computer test these files are available to download via HTTP:
wget http://192.168.0.101:8080/bootstrap.ign
Bootstrap node
Using either Virtual Machine Manager (KVM) or VirtualBox, create a virtual machine with 4 cores, 16GB RAM (16384) and 120GB of storage. This VM is destroyed after the cluster installation is complete.
Using the rhcos-installer.x86_64.iso, boot the VM up until you arrive at a command prompt:
[core@localhost ~]$
The VM will have an IP Address assigned via DHCP; we need to set a static IP.
View current interface IP Address:
ip a
View the connection:
nmcli con show
Set the IP Address for the bootstrap node:
nmcli con mod 'Wired Connection' ipv4.method manual ipv4.addresses 192.168.0.102/24 ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.101 connection.autoconnect yes
Restart Network Manager and bring up the connection:
sudo systemctl restart NetworkManager
nmcli con up 'Wired Connection'
Start the CoreOS installer, providing the bootstrap.ign ignition file:
sudo coreos-installer install --ignition-url=http://192.168.0.101:8080/bootstrap.ign /dev/sda --insecure-ignition --copy-network
Reboot the VM with reboot, make sure the VM boots from the hard disk storage (eject the ISO before it boots) or shutdown the VM and remove the CD-ROM from the boot order.
Make sure the VM boots up with the correct IP Address previously assigned:

Once the bootstrap node is up and at the login prompt with the correct IP Address, the VM should provision itself, and eventually come up in the load balancer http://192.168.0.101:9000/stats:

From a Linux client you should be able to SSH to it using the private key generated earlier:
ssh -i cluster_id_rsa core@bootstrap.cluster.lab.com
Check the progress on the bootstrap node with:
journalctl -b -f -u release-image.service -u bootkube.service
The following pods should eventually be up and running:
sudo crictl pods
...Ready bootstrap-kube-scheduler-bootstrap.cluster.lab.com...
...Ready bootstrap-kube-controller-manager-bootstrap.cluster.lab.com...
...Ready bootstrap-kube-apiserver-bootstrap.cluster.lab.com...
...Ready cloud-credential-operator-bootstrap.cluster.lab.com...
...Ready bootstrap-cluster-version-operator-bootstrap.cluster.lab.com...
...Ready bootstrap-machine-config-operator-bootstrap.cluster.lab.com...
...Ready etcd-bootstrap-member-bootstrap.cluster.lab.com...
List the running containers and tail the logs of any one:
sudo crictl ps
sudo crictl logs <CONTAINER_ID>
From the Linux client, the following command should return ok:
curl -X GET https://api.cluster.lab.com:6443/healthz -k
Export the kubeconfig and test getting cluster operators with oc get co:
export KUBECONFIG=cluster/auth/kubeconfig
oc get co
You’ll only see the cloud-credential operator is available at this stage:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication
cloud-credential True False False 26m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
...
All of these tests MUST work as documented, else it’s pointless continuing any further.
Any other responses or errors mean there are issues with either networking, DNS or load balancing configurations. Go back and troubleshoot any issues until you get the expected results at this stage.
On your client you can see the progress of the installation and that it’s moved on a step because api.cluster.lab.com is up and working:
openshift-install --dir=cluster wait-for bootstrap-complete
INFO Waiting up to 20m0s for the Kubernetes API at https://api.cluster.lab.com:6443...
INFO API v1.19.0+9f84db3 up
INFO Waiting up to 30m0s for bootstrapping to complete...
The bootstrapping process will not complete until all three master nodes have been provisioned.
Master nodes
For physical host installations, write the rhcos-installer.x86_64.iso image to a USB pen drive.
Use fdisk to identify existing storage devices on your system, then insert the USB pen drive, using fdisk again to identify the device:
fdisk -l
[ ... output omitted ... ]
Disk /dev/sda: 58.5 GiB, 62763565056 bytes, 122585088 sectors
[ ... output omitted ... ]
Write the image to the device:
sudo dd if=rhcos-installer.x86_64.iso of=/dev/sda status=progress; sync
The next steps repeat the process of booting the three physical nodes using the Red Hat CoreOS ISO. Make sure to use master.ign, and the right IP Address and hostname for each master node. In the case of an Intel NUC, F10 is used to interrupt the host BIOS and select a boot device.
master1.cluster.lab.com
Using the rhcos-installer.x86_64.iso USB device, boot the host up until you arrive at a command prompt:
[core@localhost ~]$
The host will have an IP Address assigned via DHCP; we need to set a static IP.
View current interface IP Address:
ip a
View the connection:
nmcli con show
Set the IP Address for this master node:
nmcli con mod 'Wired Connection' ipv4.method manual ipv4.addresses 192.168.0.111/24 ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.101 connection.autoconnect yes
Restart Network Manager and bring up the connection:
sudo systemctl restart NetworkManager
nmcli con up 'Wired Connection'
Start the CoreOS installer, providing the master.ign ignition file:
sudo coreos-installer install --ignition-url=http://192.168.0.101:8080/master.ign /dev/sda --insecure-ignition --copy-network
Reboot the host with reboot, make sure it boots from the hard disk storage (remove the USB/ISO before it boots) or shutdown the host, remove the USB device from the boot order and power it back on.
Hit tab at the RHCOS GRUB menu and add the following:
ip=192.168.0.111::192.168.0.1:255.255.255.0:master1.cluster.lab.com:ens3:none nameserver=192.168.0.101
It was not possible to capture a screenshot of a physical host’s GRUB configuration; here is an example from repeating this process for an infra node:

It’s unclear why this step is needed, but for nodes other than the bootstrap node, this intervention was required. There are better methods for provisioning nodes, but this documentation is focused on the most fundamental approach.
Prior to OCP 4.6, all the CoreOS parameters were added at the GRUB stage; for reference, here are the original parameters:
coreos.inst=yes
coreos.inst.install_dev=sda
coreos.inst.image_url=http://192.168.0.101:8080/rhcos-metal.raw.gz
coreos.inst.ignition_url=http://192.168.0.101:8080/master.ign
ip=192.168.0.111::192.168.0.1:255.255.255.0:master1.cluster.lab.com:eno1:none:192.168.0.101
nameserver=192.168.0.101
master2.cluster.lab.com
Repeat the process for the second master node:
nmcli con mod 'Wired Connection' ipv4.method manual ipv4.addresses 192.168.0.112/24 ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.101 connection.autoconnect yes
sudo systemctl restart NetworkManager
nmcli con up 'Wired Connection'
sudo coreos-installer install --ignition-url=http://192.168.0.101:8080/master.ign /dev/sda --insecure-ignition
ip=192.168.0.112::192.168.0.1:255.255.255.0:master2.cluster.lab.com:ens3:none nameserver=192.168.0.101
Original GRUB parameters for reference:
coreos.inst=yes
coreos.inst.install_dev=sda
coreos.inst.image_url=http://192.168.0.101:8080/rhcos-metal.raw.gz
coreos.inst.ignition_url=http://192.168.0.101:8080/master.ign
ip=192.168.0.112::192.168.0.1:255.255.255.0:master2.cluster.lab.com:eno1:none:192.168.0.101
nameserver=192.168.0.101
master3.cluster.lab.com
Repeat the process for the third master node:
nmcli con mod 'Wired Connection' ipv4.method manual ipv4.addresses 192.168.0.113/24 ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.101 connection.autoconnect yes
sudo systemctl restart NetworkManager
nmcli con up 'Wired Connection'
sudo coreos-installer install --ignition-url=http://192.168.0.101:8080/master.ign /dev/sda --insecure-ignition
ip=192.168.0.113::192.168.0.1:255.255.255.0:master3.cluster.lab.com:ens3:none nameserver=192.168.0.101
Original GRUB parameters for reference:
coreos.inst=yes
coreos.inst.install_dev=sda
coreos.inst.image_url=http://192.168.0.101:8080/rhcos-metal.raw.gz
coreos.inst.ignition_url=http://192.168.0.101:8080/master.ign
ip=192.168.0.113::192.168.0.1:255.255.255.0:master3.cluster.lab.com:eno1:none:192.168.0.101
nameserver=192.168.0.101
Completion
Once all three master nodes are provisioning, the process can take some time to complete, as indicated by the installer INFO "Waiting up to 40m0s for bootstrapping to complete".
The two key things to watch are the load balancers and the cluster operators. Once a master node boots up to the login prompt, it will download a bunch of images and do some initial installation; the host will perform a reboot and come back to the login prompt during this process.
Once all load balancers are showing up, and all cluster operators are "Available", the openshift-install command should complete and advise removing the bootstrap node.
openshift-install --dir=cluster wait-for bootstrap-complete
INFO Waiting up to 20m0s for the Kubernetes API at https://api.cluster.lab.com:6443...
INFO API v1.19.0+9f84db3 up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO It is now safe to remove the bootstrap resources
INFO Time elapsed: 0s
Check all nodes are "Ready":
oc get nodes
NAME STATUS ROLES AGE VERSION
master1.cluster.lab.com Ready master,worker 14h v1.19.0+9f84db3
master2.cluster.lab.com Ready master,worker 13h v1.19.0+9f84db3
master3.cluster.lab.com Ready master,worker 13h v1.19.0+9f84db3
Check all operators are available:
oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.6.4 True False False 8m34s
cloud-credential 4.6.4 True False False 15h
cluster-autoscaler 4.6.4 True False False 13h
config-operator 4.6.4 True False False 13h
console 4.6.4 True False False 7m35s
csi-snapshot-controller 4.6.4 True False False 13h
dns 4.6.4 True False False 13h
etcd 4.6.4 True False False 11h
image-registry 4.6.4 True False False 11h
ingress 4.6.4 True False False 8m40s
insights 4.6.4 True False False 13h
kube-apiserver 4.6.4 True False False 11h
kube-controller-manager 4.6.4 True False False 13h
kube-scheduler 4.6.4 True False False 13h
kube-storage-version-migrator 4.6.4 True False False 13h
machine-api 4.6.4 True False False 13h
machine-approver 4.6.4 True False False 13h
machine-config 4.6.4 True False False 13h
marketplace 4.6.4 True False False 8m23s
monitoring 4.6.4 True False False 8m18s
network 4.6.4 True False False 13h
node-tuning 4.6.4 True False False 13h
openshift-apiserver 4.6.4 True False False 8m55s
openshift-controller-manager 4.6.4 True False False 13h
openshift-samples 4.6.4 True False False 8m21s
operator-lifecycle-manager 4.6.4 True False False 13h
operator-lifecycle-manager-catalog 4.6.4 True False False 13h
operator-lifecycle-manager-packageserver 4.6.4 True False False 8m57s
service-ca 4.6.4 True False False 13h
storage 4.6.4 True False False 13h
Power off the bootstrap node (and destroy it) and comment out the node in both the api and api-int load balancers in /etc/haproxy/haproxy.cfg.
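A minimal sketch of the change, based on the haproxy.cfg shown earlier; comment out the bootstrap entries in the api and api-int backends and reload the service:
# server bootstrap bootstrap.cluster.lab.com:6443 check check-ssl verify none
# server bootstrap 192.168.0.102:22623 check
systemctl reload haproxy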
The load balancers should look like the following screenshots. Note that the ingress LB only has two replicas, therefore it will show down on one of the nodes.


Login
During installation and from your client, you can access the cluster using the system:admin account:
export KUBECONFIG=cluster/auth/kubeconfig
oc whoami
system:admin
Log in as kubeadmin:
cat cluster/auth/kubeadmin-password
oc login -u kubeadmin -p kLsUd-GkkRt-GwPI7-n2cku https://api.cluster.lab.com:6443
Login to the OpenShift web console:
oc project openshift-console
oc get routes
For example, in a browser https://console-openshift-console.apps.cluster.lab.com
Troubleshooting
Single master
It is possible to deploy a single node "cluster" if defined in the install-config.yaml; however, the installation never completes, with operators pending. Apply the following patch for the installation to complete with a single master configuration:
oc patch etcd cluster -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}' --type=merge
Unknown authority
The following error can sometimes occur when attempting to login to the API via the command line:
error: x509: certificate signed by unknown authority
Switch projects:
oc project openshift-authentication
List the pods in the openshift-authentication project:
oc get pods
Using one of the pod names export the ingress certificate:
oc rsh -n openshift-authentication oauth-openshift-568bcc5d8f-84zh2 cat /run/secrets/kubernetes.io/serviceaccount/ca.crt > ingress-ca.crt
Copy and update your certificate authority certificates on your client host:
sudo cp ingress-ca.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
Missing Console
If both the openshift-samples and console operators were absent during deployment of a cluster, powering off all three masters and powering them back on brought all the operators up.
NODES
For a home lab, it might be common to leave the cluster as a minimal three-node cluster since resources can be tight; however, in the real world, it is likely that dedicated infra nodes and many worker nodes will be provisioned. It is then possible to label nodes accordingly and mark specific applications to only run on designated infrastructure.
Whatever the labelling conventions used, all nodes from this point are "worker" nodes with labels.
Infra nodes
To avoid duplication, just the key details are included. In this case, three infra VMs are provisioned with bridged networking, each with 8GB memory, 2 cores and 50GB storage. Pay close attention to the IP Addresses and the use of the worker.ign ignition file.
infra1.cluster.lab.com
nmcli con mod 'Wired Connection' ipv4.method manual ipv4.addresses 192.168.0.131/24 ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.101 connection.autoconnect yes
sudo systemctl restart NetworkManager
nmcli con up 'Wired Connection'
sudo coreos-installer install --ignition-url=http://192.168.0.101:8080/worker.ign /dev/sda --insecure-ignition --copy-network
ip=192.168.0.131::192.168.0.1:255.255.255.0:infra1.cluster.lab.com:ens3:none nameserver=192.168.0.101
The node should deploy as seen while doing the master nodes, but worker nodes get provisioned by the masters. Once a worker node has booted up off its hard drive and arrived at the login prompt, there will be a reboot.
Check for certificate signing requests; as a cluster administrator, view any pending CSRs:
oc get csr
Or refined to show only pending requests:
oc get csr | grep -i pending
Approve them:
oc adm certificate approve csr-xyz
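If several CSRs accumulate, for example when adding multiple nodes at once, they can be approved in one go; a convenience sketch building on the grep filter shown above:
oc get csr | grep -i pending | awk '{print $1}' | xargs oc adm certificate approve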
Typically, approving the first pending CSR will cause a second one to appear shortly afterwards.
Once both CSRs are approved, you’ll see the node with a status of NotReady:
oc get nodes
NAME STATUS ROLES AGE VERSION
infra1.cluster.lab.com NotReady worker 25s v1.19.0+9f84db3
master1.cluster.lab.com Ready master,worker 16h v1.19.0+9f84db3
master2.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
master3.cluster.lab.com Ready master,worker 14h v1.19.0+9f84db3
The deployment of the node is still in progress; eventually, the node will change status to "Ready".
Example when all three infra nodes are initially added to the cluster:
NAME STATUS ROLES AGE VERSION
infra1.cluster.lab.com Ready worker 68s v1.19.0+9f84db3
infra2.cluster.lab.com Ready worker 32m v1.19.0+9f84db3
infra3.cluster.lab.com Ready worker 11m v1.19.0+9f84db3
master1.cluster.lab.com Ready master,worker 16h v1.19.0+9f84db3
master2.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
master3.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
You repeat this for any other infra nodes desired, most commonly at least three to achieve high availability, as depicted in the example above.
Notice the ROLE for the infra node is currently set to worker.
Label infra nodes
Create the infra machine config pool infra-mcp.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: infra
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
maxUnavailable: 1
nodeSelector:
matchLabels:
node-role.kubernetes.io/infra: ""
paused: false
oc create -f infra-mcp.yaml
Label the infra nodes and remove the worker label:
Adding the infra label forces a reboot of the node; wait for the reboot before removing the worker label.
oc label node infra1.cluster.lab.com node-role.kubernetes.io/infra=
oc label node infra2.cluster.lab.com node-role.kubernetes.io/infra=
oc label node infra3.cluster.lab.com node-role.kubernetes.io/infra=
Each node will be marked as non-schedulable and reboot in turn:
watch oc get nodes
infra1.cluster.lab.com Ready,SchedulingDisabled infra,worker 5m3s v1.19.0+9f84db3
Once completed, each infra node will be labelled both infra,worker; remove the worker label from each:
oc label node infra1.cluster.lab.com node-role.kubernetes.io/worker-
oc label node infra2.cluster.lab.com node-role.kubernetes.io/worker-
oc label node infra3.cluster.lab.com node-role.kubernetes.io/worker-
The nodes should now look like this:
oc get nodes
NAME STATUS ROLES AGE VERSION
infra1.cluster.lab.com Ready infra 11m v1.19.0+9f84db3
infra2.cluster.lab.com Ready infra 42m v1.19.0+9f84db3
infra3.cluster.lab.com Ready infra 21m v1.19.0+9f84db3
master1.cluster.lab.com Ready master,worker 16h v1.19.0+9f84db3
master2.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
master3.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
Worker nodes
Add worker nodes by repeating the same process as adding infra nodes, without any labelling, making sure the IP Addresses and host names are correct during the process.
A typical deployment might look like this:
NAME STATUS ROLES AGE VERSION
infra1.cluster.lab.com Ready infra 11m v1.19.0+9f84db3
infra2.cluster.lab.com Ready infra 42m v1.19.0+9f84db3
infra3.cluster.lab.com Ready infra 21m v1.19.0+9f84db3
master1.cluster.lab.com Ready master,worker 16h v1.19.0+9f84db3
master2.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
master3.cluster.lab.com Ready master,worker 15h v1.19.0+9f84db3
worker1.cluster.lab.com Ready worker 68s v1.19.0+9f84db3
worker2.cluster.lab.com Ready worker 32m v1.19.0+9f84db3
worker3.cluster.lab.com Ready worker 11m v1.19.0+9f84db3
Move ingress router
It’s common practice to move the ingress router off the master nodes and run three replicas on the infra nodes. With the correctly labelled infra nodes in place, this is done with the following two patches.
oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"nodePlacement": {"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra": "" }}}}}' --type=merge
oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge
Update the backend ingress load balancer, in this case HAProxy, to now point at the three infra nodes.
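A sketch of the updated backends, assuming the infra node addresses used earlier in this guide; replace the master server lines in the apps-http and apps-https backends, then reload HAProxy:
backend apps-http
mode tcp
balance roundrobin
server infra1 192.168.0.131:80 check
server infra2 192.168.0.132:80 check
server infra3 192.168.0.133:80 check
backend apps-https
mode tcp
balance roundrobin
option ssl-hello-chk
server infra1 192.168.0.131:443 check
server infra2 192.168.0.132:443 check
server infra3 192.168.0.133:443 check
systemctl reload haproxy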
Disable master scheduler
Disabling the master scheduler removes the "worker" label from master nodes, preventing unwanted applications from running on master nodes, reserving their resources for running the cluster.
oc edit scheduler
Set mastersSchedulable to false:
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
creationTimestamp: null
name: cluster
spec:
mastersSchedulable: false
policy:
name: ""
status: {}
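The same change can also be applied non-interactively with a patch; a sketch equivalent to the edit above:
oc patch scheduler cluster --type=merge --patch '{"spec":{"mastersSchedulable":false}}'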
Validate:
oc get nodes
NAME STATUS ROLES AGE VERSION
infra1.cluster.lab.com Ready infra 11m v1.19.0+9f84db3
infra2.cluster.lab.com Ready infra 42m v1.19.0+9f84db3
infra3.cluster.lab.com Ready infra 21m v1.19.0+9f84db3
master1.cluster.lab.com Ready master 16h v1.19.0+9f84db3
master2.cluster.lab.com Ready master 15h v1.19.0+9f84db3
master3.cluster.lab.com Ready master 15h v1.19.0+9f84db3
worker1.cluster.lab.com Ready worker 68s v1.19.0+9f84db3
worker2.cluster.lab.com Ready worker 32m v1.19.0+9f84db3
worker3.cluster.lab.com Ready worker 11m v1.19.0+9f84db3
Delete nodes
oc get nodes
oc delete node infra1.cluster.lab.com
NTP/CHRONY
In a lab environment, chronyd will already be configured on nodes with the default pool 2.rhel.pool.ntp.org iburst. Should the configuration need to be changed, the process involves adding a MachineConfig. Machine Configs are found via the Web Console under Compute → Machine Config. Think of Machine Configs as configuration management for the cluster.
SSH to a master node and switch to root, for example:
ssh -i cluster_id_rsa core@master1.cluster.lab.com
sudo su -
Get a working minimalistic chrony.conf:
grep -v -e '^#' -e '^$' /etc/chrony.conf > chrony.conf
On a client, make a copy of the chrony.conf configuration file:
vi chrony.conf
pool 2.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
Encode the file:
base64 chrony.conf > chrony.conf.encoded
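The encoded string must end up on a single line when pasted into the MachineConfig below; if your base64 output wraps, the GNU coreutils -w0 option disables wrapping:
base64 -w0 chrony.conf > chrony.conf.encoded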
Create a MachineConfig for worker nodes, pasting in the chrony.conf.encoded content:
vi worker-chrony.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: worker-chrony
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,cG9vbCAyLnJoZWwucG9vbC5udHAub3JnIGlidXJzdApkcmlmdGZpbGUgL3Zhci9saWIvY2hyb255L2RyaWZ0Cm1ha2VzdGVwIDEuMCAzCnJ0Y3N5bmMKa2V5ZmlsZSAvZXRjL2Nocm9ueS5rZXlzCmxlYXBzZWN0eiByaWdodC9VVEMKbG9nZGlyIC92YXIvbG9nL2Nocm9ueQo=
verification: {}
filesystem: root
mode: 0644
path: /etc/chrony.conf
Applying the configuration causes nodes to schedule a reboot; expect each worker node to bounce in sequence.
watch oc get nodes
oc create -f worker-chrony.yaml
And repeat for master:
vi master-chrony.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: master
name: master-chrony
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,cG9vbCAyLnJoZWwucG9vbC5udHAub3JnIGlidXJzdApkcmlmdGZpbGUgL3Zhci9saWIvY2hyb255L2RyaWZ0Cm1ha2VzdGVwIDEuMCAzCnJ0Y3N5bmMKa2V5ZmlsZSAvZXRjL2Nocm9ueS5rZXlzCmxlYXBzZWN0eiByaWdodC9VVEMKbG9nZGlyIC92YXIvbG9nL2Nocm9ueQo=
verification: {}
filesystem: root
mode: 0644
path: /etc/chrony.conf
Remember, applying the configuration causes nodes to schedule a reboot; expect each master node to bounce in sequence.
oc create -f master-chrony.yaml
IMAGE REGISTRY
In typical OpenShift deployments using a cloud provider with credentials, storage classes will be made available for the target infrastructure. In a bare-metal situation, this is not a luxury we have. It is simple to define NFS for shared storage, which is OK for specific services and tasks, but for something like logging, performance will take a hit. It is possible to add physical block devices to nodes and use the "Local Storage Operator". However, local storage volumes are fixed to nodes, so moving pods that depend on block storage will run into difficulties, not being able to mount storage to different nodes on demand.
For the image registry, NFS shared storage is the right choice.
NFS server
Using a RHEL/CentOS 8.2 host on the same network as your OpenShift cluster, install nfs-utils:
sudo dnf install nfs-utils -y
systemctl start nfs-server
systemctl enable nfs-server
systemctl status nfs-server
sudo mkdir -p /mnt/openshift/registry
vi /etc/exports
Add the following, including the options rw,no_wdelay,no_root_squash:
/mnt/openshift/registry 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
Export the new share with:
exportfs -arv
And confirm the share is visible:
exportfs -s
showmount -e 127.0.0.1
If required, open up the firewall ports needed:
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload
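As a quick sanity check from another host on the network, the export can be test-mounted and unmounted (the server address here assumes the NFS host IP used in the PV definition later):
sudo mount -t nfs 192.168.0.15:/mnt/openshift/registry /mnt
sudo umount /mnt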
NFS storage class
Add a storage class with the no-provisioner option, making it a manual process:
vi nfs-storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs
provisioner: no-provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
oc create -f nfs-storage-class.yaml
See storage classes via the web console Storage → Storage Classes
Configuration
You can now add persistent volume(s) (PV) using the nfs storage class:
vi registry-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: registry-pv
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteMany
nfs:
path: /mnt/openshift/registry
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
oc create -f registry-pv.yaml
And view the result:
oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
registry-pv 50Gi RWX Retain Available nfs 3s
To update the registry storage, you can add a persistent volume claim (PVC) using the new NFS storage class:
vi registry-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: image-registry-storage
namespace: openshift-image-registry
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
storageClassName: nfs
oc create -f registry-pvc.yaml
The default name for the PVC created is image-registry-storage, which is the name the image registry operator expects.
Switch to the openshift-image-registry project and view the pending PVC:
oc project openshift-image-registry
oc get pvc
It will be currently pending:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
image-registry-storage Pending nfs 15s
And edit the image registry configuration:
oc edit configs.imageregistry.operator.openshift.io
Three things need changing under spec: managementState, replicas and storage:
...
managementState: Managed
...
...
replicas: 3
...
...
storage:
pvc:
claim: image-registry-storage
You can check the state/progress of these changes by viewing the pods:
oc project openshift-image-registry
oc get pods
NAME READY STATUS RESTARTS AGE
cluster-image-registry-operator-6c55f65c7d-sst5g 2/2 Running 0 18h
image-pruner-1605225600-cpm8d 0/1 Completed 0 10h
image-registry-659c75894d-28mp4 1/1 Running 0 18h
image-registry-659c75894d-5mx25 1/1 Running 0 18h
image-registry-659c75894d-zqxcq 1/1 Running 0 18h
node-ca-vj6ql 1/1 Running 0 3d
node-ca-wjk57 1/1 Running 0 3d
node-ca-ww946 1/1 Running 0 3d
And see the PVC has been claimed:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
image-registry-storage Bound registry-pv 50Gi RWX nfs
Expose registry
Finally, you can expose the OpenShift image registry to enable you to work with it using Docker or Podman to tag and push images. Make sure you’re in the openshift-image-registry project, or add -n openshift-image-registry to include the namespace with the command:
oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
oc get routes
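As an illustration of using the exposed route with Podman, a login sketch; the hostname below is the default route name assumed for this cluster domain, so check the actual value in the oc get routes output:
podman login -u $(oc whoami) -p $(oc whoami -t) default-route-openshift-image-registry.apps.cluster.lab.com --tls-verify=false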
Migrate registry
To move the image registry to run on infra nodes, apply the following patch:
oc patch configs.imageregistry.operator.openshift.io/cluster -n openshift-image-registry --type=merge --patch '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
Check where pods are running by adding -o wide to the following command:
oc get pods -o wide
Troubleshooting
No route to host
If pods never get past ContainerCreating, use oc describe pod to see details:
oc project openshift-image-registry
oc get pods
oc describe pod image-registry-5cc87cc5b8-4k6l6
If you see:
mount.nfs: No route to host
Either the PV is configured incorrectly, pointing to a wrong NFS server, or the NFS server/share is unavailable or being blocked by a firewall.
Unexpected status
If you see errors with OpenShift deployments later like this:
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: error copying la... received unexpected HTTP status: 500 Internal Server Error
The permissions on the share directory need fixing:
chmod 775 /mnt/openshift/registry
Undo storage config
If you need to revert to a known working configuration, you can make it ephemeral by replacing the registry storage with:
oc edit configs.imageregistry.operator.openshift.io
storage:
emptyDir: {}
Delete PVC:
oc delete pvc image-registry-storage
Delete PV:
oc delete pv registry-pv
LOCAL STORAGE
The Local Storage Operator deals with adding real disks (block storage) to nodes.
Prior to OCP 4.6, a project needed to be created manually:
oc new-project local-storage
To enable local storage on all nodes including masters and infras:
oc annotate project local-storage openshift.io/node-selector=''
Navigate to Operators → OperatorHub, type "Local Storage" into the filter box to locate the Local Storage Operator, which now creates the openshift-local-storage project namespace.
Local storage class
Like with NFS, next create a new custom storage class for local block storage:
vi local-storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-sc
provisioner: no-provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
diskformat: thin
oc create -f local-storage-class.yaml
See storage classes via the web console Storage → Storage Classes
Adding block storage
Adding disks to servers is either a physical activity or a simple case of adding disks to VMs in whichever virtualisation is in use.
Once block storage is added, determine the device paths for the new devices: SSH to each node and use fdisk -l to see the devices.
ssh -i cluster_id_rsa core@infra1.cluster.lab.com
sudo fdisk -l
Commonly, new devices will begin with /dev/sdb (sda is used by RHCOS).
It’s good practice to manage each node individually because device paths might differ depending on the environment.
Assuming three infra nodes, each with a new 50GB disk attached, here is infra1:
vi local-disks-infra1.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "local-disks-infra1"
namespace: "openshift-local-storage"
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- infra1.cluster.lab.com
storageClassDevices:
- storageClassName: "local-sc"
volumeMode: Filesystem
fsType: xfs
devicePaths:
- /dev/sdb
oc create -f local-disks-infra1.yaml
In a short time, see the PVs appear:
oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-297ca047 50Gi RWO Retain Available local-sc 53s
local-pv-dc609890 50Gi RWO Retain Available local-sc 63s
local-pv-fe609342 50Gi RWO Retain Available local-sc 70s
Navigate to Administration → Custom Resource Definitions → LocalVolume → Instances to view Local Volumes.
Repeat this process for each node in the cluster that needs local block storage made available.
LOGGING
OpenShift comes with its native logging stack using Elasticsearch, Fluentd, and Kibana (EFK). It can be very resource-intensive; in a production environment, dedicate resource-plentiful "infra" nodes to run Elasticsearch and use decent block storage.
For home lab environments, we make do with limited resources.
Install the "Elastic Search" and "Cluster Logging" operators via the Web Console, See https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-deploying.html
Make sure you select operators provided by Red Hat, Inc and not proprietary or community versions. |
Check the current state of the cluster-logging project:
oc project openshift-logging
oc get pods
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-f58c98989-2jrxx 1/1 Running 0 28m
Ephemeral logging
To run logging with ephemeral storage means all of a pod’s data is lost upon restart because no real storage is provided; this is perfect for a lab.
Via the web console, Administration → Custom Resource Definitions → ClusterLogging → Instances → Create ClusterLogging.
Make note of the resources: limits: values. The following example has reduced memory defined and emptyDir: {} for storage; nodeSelector can be omitted if no infra nodes are defined.
Paste in the following logging instance:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
namespace: "openshift-logging"
spec:
managementState: "Managed"
logStore:
type: "elasticsearch"
retentionPolicy:
application:
maxAge: 1d
infra:
maxAge: 7d
audit:
maxAge: 7d
elasticsearch:
nodeCount: 3
nodeSelector:
node-role.kubernetes.io/infra: ''
storage:
emptyDir: {}
redundancyPolicy: "SingleRedundancy"
resources:
limits:
memory: 3Gi
requests:
memory: 3Gi
visualization:
type: "kibana"
kibana:
replicas: 1
curation:
type: "curator"
curator:
schedule: "30 3 * * *"
collection:
logs:
type: "fluentd"
fluentd: {}
A working deployment should look something like this:
oc project openshift-logging
oc get pods
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-f58c98989-2jrxx 1/1 Running 0 53m
elasticsearch-cdm-2p9fwrm5-1-8fff599cb-j7xh2 2/2 Running 0 3m4s
elasticsearch-cdm-2p9fwrm5-2-944758ff6-zrv9p 2/2 Running 0 3m1s
elasticsearch-cdm-2p9fwrm5-3-68bfc4b584-vbdqp 2/2 Running 0 2m58s
fluentd-bzmw4 1/1 Running 0 3m11s
fluentd-msv2p 1/1 Running 0 3m11s
fluentd-sglqw 1/1 Running 0 3m12s
kibana-86f69c8b84-62b7r 2/2 Running 0 3m7s
Fluentd runs an instance on every node in the cluster.
Local storage logging
Assuming the PVs are available with a storage class of local-sc, as described in the https://www.richardwalker.dev/pragmatic-openshift/#_local_storage section of this document, the following logging instance includes the storage class definition. Both the storageClassName and size are added:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
namespace: "openshift-logging"
spec:
managementState: "Managed"
logStore:
type: "elasticsearch"
retentionPolicy:
application:
maxAge: 1d
infra:
maxAge: 7d
audit:
maxAge: 7d
elasticsearch:
nodeCount: 3
nodeSelector:
node-role.kubernetes.io/infra: ''
storage:
storageClassName: local-sc
size: 50G
redundancyPolicy: "SingleRedundancy"
resources:
limits:
memory: 3Gi
requests:
memory: 3Gi
visualization:
type: "kibana"
kibana:
replicas: 1
curation:
type: "curator"
curator:
schedule: "30 3 * * *"
collection:
logs:
type: "fluentd"
fluentd: {}
oc project openshift-logging
oc get pods
If successful, the PVC should be claimed and bound:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
elasticsearch-elasticsearch-cdm-lk4f9958-1 Bound local-pv-d4d267e6 50Gi RWO local-sc 22s
elasticsearch-elasticsearch-cdm-lk4f9958-2 Bound local-pv-297ca047 50Gi RWO local-sc 22s
elasticsearch-elasticsearch-cdm-lk4f9958-3 Bound local-pv-6317e505 50Gi RWO local-sc 22s
Log Forwarding
To test log forwarding in a lab environment, external services need to be deployed and configured to receive the logs.
Elasticsearch
Deploy External EFK
Create a virtual machine; this example uses 4 CPU cores, 8GB of memory and 60GB storage with bridged networking, so the IP Address of the EFK VM is on the same network as my OpenShift 4.6 home lab.
Assuming CentOS 8.2 is installed on the VM, make sure all is up-to-date:
dnf update -y
reboot
Install Java:
dnf install java-11-openjdk-devel -y
Add EPEL:
dnf install epel-release -y
To reduce steps in this document and to remove potential issues, disable both SELinux and firewalld:
vi /etc/sysconfig/selinux
SELINUX=disabled
systemctl stop firewalld
systemctl disable firewalld
Elasticsearch
Add the Elasticsearch repository:
vi /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Import the key:
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Install Elasticsearch:
dnf install elasticsearch -y
Back up the original configuration:
cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.original
Strip out the noise:
grep -v -e '^#' -e '^$' /etc/elasticsearch/elasticsearch.yml.original > /etc/elasticsearch/elasticsearch.yml
Add the following settings to expose Elasticsearch to the network:
cluster.name: my-efk
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
network.host: 0.0.0.0
cluster.initial_master_nodes: node-1
Start and enable the service:
systemctl enable elasticsearch.service --now
Kibana
Install Kibana:
dnf install kibana -y
Back up the original configuration:
cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.original
Update the configuration for the Elasticsearch host:
vi /etc/kibana/kibana.yml
elasticsearch.hosts: ["http://localhost:9200"]
Start and enable Kibana:
systemctl enable kibana.service --now
NGINX
Install NGINX:
dnf install nginx -y
Create a user name and password for Kibana:
echo "kibana:`openssl passwd -apr1`" | tee -a /etc/nginx/htpasswd.kibana
Back up the original configuration:
cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.original
Add the following configuration:
vi /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 1024;
}
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request"'
'$status $body_bytes_sent "$http_referer"'
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type application/octet-stream;
include /etc/nginx/conf.d/*.conf;
server {
listen 80;
server_name _;
auth_basic "Restricted Access";
auth_basic_user_file /etc/nginx/htpasswd.kibana;
location / {
proxy_pass http://localhost:5601;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
}
}
Start and enable NGINX:
systemctl enable nginx.service --now
Smoke testing
With all that in place, test Elasticsearch is up and running; the following should return a JSON response:
curl http://127.0.0.1:9200/_cluster/health?pretty
You should be able to access Kibana via a browser at the IP Address of your instance, in my case http://192.168.0.70
Once in there, navigate to "Management" → "Stack Management", Under "Kibana" → "Index Patterns" and click "Create Index Pattern". This is where you will see various sources to index.
From the command line, PUT some example data:
curl -X PUT "192.168.0.70:9200/characters/_doc/1?pretty" -H 'Content-Type: application/json' -d '{"name": "Mickey Mouse"}'
curl -X PUT "192.168.0.70:9200/characters/_doc/2?pretty" -H 'Content-Type: application/json' -d '{"name": "Daffy Duck"}'
curl -X PUT "192.168.0.70:9200/characters/_doc/3?pretty" -H 'Content-Type: application/json' -d '{"name": "Donald Duck"}'
curl -X PUT "192.168.0.70:9200/characters/_doc/4?pretty" -H 'Content-Type: application/json' -d '{"name": "Bugs Bunny"}'
In Kibana, when you go to "Create Index Pattern" as described before, you should now see that characters has appeared. Type characters*, click "Next step" and create the index pattern. Navigate to "Kibana" → "Discover" and, if you have more than one index pattern, select the characters* index from the drop-down menu (near the top left); you should see the data you PUT into Elasticsearch.
This pattern is what I use to see and add indexes to Kibana when adding forwarders.
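To confirm from the command line which indices exist, which is also handy later when adding forwarders, list them with:
curl "http://192.168.0.70:9200/_cat/indices?v"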
For reference you can return individual results using:
curl -X GET "localhost:9200/characters/_doc/1?pretty"
Forwarding
Example of OCP 4.6 log forwarding of application logs to an external Elasticsearch stack:
vi log-forwarding.yaml
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
outputs:
- name: elasticsearch-insecure
type: "elasticsearch"
url: http://192.168.0.70:9200
pipelines:
- name: application-logs
inputRefs:
- application
outputRefs:
- elasticsearch-insecure
labels:
logs: application
oc create -f log-forwarding.yaml
oc project openshift-logging
oc get pods
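Once the collector pods have restarted, a quick way to confirm logs are reaching the external Elasticsearch is to list its indices again; the exact index names created by the forwarder vary between versions, but new application indices should appear:
curl "http://192.168.0.70:9200/_cat/indices?v"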
Rsyslog
Rsyslog receiver
To test rsyslog forwarding, configure rsyslog on a RHEL/CentOS 8 host. In this example, UDP is used with a DNS name of syslog.cluster.lab.com.
Rsyslog should already be enabled and running:
systemctl status rsyslog
vi /etc/rsyslog.conf
Uncomment the lines:
module(load="imudp")
input(type="imudp" port="514")
Add a rule for local0, something like:
local0.* /var/log/openshift.log
Either stop and disable firewalld or add the following rule:
firewall-cmd --add-port=514/udp --zone=public --permanent
firewall-cmd --reload
Restart rsyslog:
systemctl restart rsyslog
Test the receiving
From any other Linux host, configure rsyslog to forward UDP:
vi /etc/rsyslog.conf
*.* @syslog.cluster.lab.com:514 # Use @ for UDP protocol
systemctl restart rsyslog
Send a test message:
logger -p local0.notice "Hello, this is a test!"
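Back on the receiving host, assuming the local0 rule added above, the message should appear in the log file:
tail -f /var/log/openshift.log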
Forwarding
Here is an example of creating a syslog forwarder for just a single project:
vi rsyslog-forwarder.yaml
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
inputs:
- application:
namespaces:
- my-project
name: django-logger-logs
outputs:
- name: rsyslog-test
syslog:
appName: cluster-apps
facility: local0
msgID: cluster-id
procID: cluster-proc
rfc: RFC5424
severity: debug
type: syslog
url: 'udp://192.168.0.145:514'
pipelines:
- inputRefs:
- django-logger-logs
labels:
syslog: rsyslog-test
name: syslog-test
outputRefs:
- rsyslog-test
oc create -f rsyslog-forwarder.yaml
Testing app
The following application was written to trigger events in log files for testing.
Create a new project:
oc new-project logging-project
Import my s2i-python38-container image:
oc import-image django-s2i-base-img --from quay.io/richardwalkerdev/s2i-python38-container --confirm
Deploy the application:
oc new-app --name django-logger django-s2i-base-img~https://github.com/richardwalkerdev/django-logger.git
And expose the route:
oc expose service/django-logger
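As a quick check that the application is reachable, and to generate a few log entries, the route host can be derived from the route object rather than typed by hand:
curl "http://$(oc get route django-logger -o jsonpath='{.spec.host}')/"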
Forwarding
With the testing application deployed, the following example combines forwarding to the external Elasticsearch (v7) and Rsyslog. This example also includes "forwarding" to the EFK stack (v6) deployed on OpenShift by specifying default in the outputRefs. Moreover, the forwarding is limited to just the logging-project project/namespace.
vi log-forwarding.yaml
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
inputs:
- application:
namespaces:
- logging-project
name: project-logs
outputs:
- name: elasticsearch-insecure
type: elasticsearch
url: 'http://192.168.0.70:9200'
- name: rsyslog-insecure
syslog:
appName: cluster-apps
facility: local0
msgID: cluster-id
procID: cluster-proc
rfc: RFC5424
severity: debug
type: syslog
url: 'udp://192.168.0.145:514'
pipelines:
- inputRefs:
- project-logs
labels:
logs: application
name: application-logs
outputRefs:
- elasticsearch-insecure
- rsyslog-insecure
- default
oc create -f log-forwarding.yaml
Go to the application, for example: http://django-logger-logging-project.apps.cluster.lab.com/

Generate some logs by clicking the buttons.
Example from OCP EFK - Kibana v6:

Example from External - Kibana v7:

Example of rsyslog:

Troubleshooting
oc project openshift-logging
oc get pods
Insufficient memory
oc describe pod elasticsearch-cdm-uz12dkcd-1-6cf9ff6cb9-945gg
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 33m default-scheduler 0/6 nodes are available: 3 Insufficient memory, 3 node(s) didn't match node selector.
The resource requests set for Elasticsearch must be available on the nodes; either increase memory on the hosts or decrease the memory requested in the settings.
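For example, the memory requested by Elasticsearch can be lowered on the ClusterLogging instance. A minimal sketch, assuming the instance created earlier in this guide and the logStore resource fields documented for OCP 4.6 (values are illustrative):
spec:
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      resources:
        requests:
          memory: 8Gi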
Delete cluster logging
Administration → Custom Resource Definitions → ClusterLogging → Instances
Delete the cluster logging instance.
MONITORING
Configuring the cluster monitoring stack on OpenShift Container Platform.
The document demonstrates deploying the monitoring stack using NFS storage.
NFS is NOT recommended and decent block storage should be used; refer to the official documentation and substitute the storage class for a recommended one.
NFS requirements
Refer to https://www.richardwalker.dev/pragmatic-openshift/#_installation for setting up NFS.
Prepare the following NFS shares:
/mnt/openshift/alertmanager1 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/mnt/openshift/alertmanager2 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/mnt/openshift/alertmanager3 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/mnt/openshift/prometheus1 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/mnt/openshift/prometheus2 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
Ensure permissions on the share directories:
chmod 775 /mnt/openshift/*
Export the new shares:
exportfs -arv
And confirm the shares are visible:
exportfs -s
showmount -e 127.0.0.1
Alert manager
These steps demonstrate how to add persistent storage for Alert Manager.
Create PVs
With shares available, add the Alert Manager PVs. NOTE: the accessMode is set to ReadWriteOnce:
vi alert-manager-nfs-pv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: alertmanager-pv1
spec:
capacity:
storage: 40Gi
accessModes:
- ReadWriteOnce
nfs:
path: /mnt/openshift/alertmanager1
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: alertmanager-pv2
spec:
capacity:
storage: 40Gi
accessModes:
- ReadWriteOnce
nfs:
path: /mnt/openshift/alertmanager2
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: alertmanager-pv3
spec:
capacity:
storage: 40Gi
accessModes:
- ReadWriteOnce
nfs:
path: /mnt/openshift/alertmanager3
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
oc create -f alert-manager-nfs-pv.yaml
Use oc get pv to display the PVs.
Configure
First, check whether the cluster-monitoring-config ConfigMap object exists:
oc -n openshift-monitoring get configmap cluster-monitoring-config
Error from server (NotFound): configmaps "cluster-monitoring-config" not found
If not, create it:
vi cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |
oc apply -f cluster-monitoring-config.yaml
oc -n openshift-monitoring get configmap cluster-monitoring-config
NAME DATA AGE
cluster-monitoring-config 1 3s
This step is easier via the web console: amend Workloads → Config Maps (select project openshift-monitoring) → "cluster-monitoring-config" → YAML
Add the following:
data:
config.yaml: |+
alertmanagerMain:
volumeClaimTemplate:
metadata:
name: alertmanager-claim
spec:
storageClassName: nfs
resources:
requests:
storage: 40Gi
Take note of the storage size, storage class name and node selector (if applicable) for your environment.
Make sure you are in the right project:
oc project openshift-monitoring
You should see the three alertmanager-main pods recreating:
oc get pods
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 0/5 ContainerCreating 0 36s
alertmanager-main-1 0/5 ContainerCreating 0 36s
alertmanager-main-2 0/5 ContainerCreating 0 36s
And that the PVCs have been claimed:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alertmanager-claim-alertmanager-main-0 Bound alertmanager-pv1 40Gi RWO nfs 26h
alertmanager-claim-alertmanager-main-1 Bound alertmanager-pv3 40Gi RWO nfs 26h
alertmanager-claim-alertmanager-main-2 Bound alertmanager-pv2 40Gi RWO nfs 26h
Prometheus
Create PVs
Add the prometheus PVs:
vi prometheus-nfs-pv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus-pv1
spec:
capacity:
storage: 40Gi
accessModes:
- ReadWriteOnce
nfs:
path: /mnt/openshift/prometheus1
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus-pv2
spec:
capacity:
storage: 40Gi
accessModes:
- ReadWriteOnce
nfs:
path: /mnt/openshift/prometheus2
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
oc create -f prometheus-nfs-pv.yaml
Configure
Again via the web console, amend Workloads → Config Maps (select project openshift-monitoring) → "cluster-monitoring-config" → YAML
And add the following:
prometheusK8s:
volumeClaimTemplate:
metadata:
name: prometheus-claim
spec:
storageClassName: nfs
resources:
requests:
storage: 40Gi
Note: this is appended, so the whole configuration should look like this:
data:
config.yaml: |+
alertmanagerMain:
volumeClaimTemplate:
metadata:
name: alertmanager-claim
spec:
storageClassName: nfs
resources:
requests:
storage: 40Gi
prometheusK8s:
volumeClaimTemplate:
metadata:
name: prometheus-claim
spec:
storageClassName: nfs
resources:
requests:
storage: 40Gi
Once completed, you should see all the PVCs have been claimed:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alertmanager-claim-alertmanager-main-0 Bound alertmanager-pv1 40Gi RWO nfs 4m32s
alertmanager-claim-alertmanager-main-1 Bound alertmanager-pv3 40Gi RWO nfs 4m32s
alertmanager-claim-alertmanager-main-2 Bound alertmanager-pv2 40Gi RWO nfs 4m32s
prometheus-claim-prometheus-k8s-0 Bound prometheus-pv1 40Gi RWO nfs 14s
prometheus-claim-prometheus-k8s-1 Bound prometheus-pv2 40Gi RWO nfs 14s
And everything running correctly:
oc get pods
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/5 Running 0 26h
alertmanager-main-1 5/5 Running 0 26h
alertmanager-main-2 5/5 Running 0 26h
cluster-monitoring-operator-75f6b78475-4f4s9 2/2 Running 3 2d2h
grafana-74564f7ff4-sqw8g 2/2 Running 0 2d2h
kube-state-metrics-b6fb95865-hzsst 3/3 Running 0 2d2h
node-exporter-ccmbm 2/2 Running 0 2d2h
node-exporter-n5sdt 2/2 Running 0 2d2h
node-exporter-psbt4 2/2 Running 0 2d2h
openshift-state-metrics-5894b6c4df-fv9km 3/3 Running 0 2d2h
prometheus-adapter-58d9999987-9lltc 1/1 Running 0 27h
prometheus-adapter-58d9999987-lhtvc 1/1 Running 0 27h
prometheus-k8s-0 7/7 Running 1 26h
prometheus-k8s-1 7/7 Running 1 26h
prometheus-operator-68f6b4f6bb-4mxcn 2/2 Running 0 47h
telemeter-client-79d6fc74c-wjqgw 3/3 Running 0 2d2h
thanos-querier-66f4b4c758-2z4f6 4/4 Running 0 2d2h
thanos-querier-66f4b4c758-fsqfz 4/4 Running 0 2d2h
Node selectors
If using infra nodes, add node selectors to the configuration; here is a complete example for OCP 4.6:
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: |+
alertmanagerMain:
nodeSelector:
node-role.kubernetes.io/infra: ""
volumeClaimTemplate:
metadata:
name: alertmanager-claim
spec:
storageClassName: local-sc
resources:
requests:
storage: 40Gi
prometheusK8s:
nodeSelector:
node-role.kubernetes.io/infra: ""
volumeClaimTemplate:
metadata:
name: prometheus-claim
spec:
storageClassName: local-sc
resources:
requests:
storage: 40Gi
prometheusOperator:
nodeSelector:
node-role.kubernetes.io/infra: ""
grafana:
nodeSelector:
node-role.kubernetes.io/infra: ""
k8sPrometheusAdapter:
nodeSelector:
node-role.kubernetes.io/infra: ""
kubeStateMetrics:
nodeSelector:
node-role.kubernetes.io/infra: ""
telemeterClient:
nodeSelector:
node-role.kubernetes.io/infra: ""
openshiftStateMetrics:
nodeSelector:
node-role.kubernetes.io/infra: ""
thanosQuerier:
nodeSelector:
node-role.kubernetes.io/infra: ""
CERTIFICATES
Replicate a local Certificate Authority (CA) for generating SSL certificates and applying them to OpenShift. The first step is to generate a root certificate and a private key. Add the root certificate to any host, and certificates generated and signed by this CA will then be inherently trusted on that host.
Local CA
On a local Linux client, generate a private key; you'll be prompted for a pass phrase:
openssl genrsa -des3 -out local_ca.key 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
.............+++++
.......................................................................................................................+++++
e is 65537 (0x010001)
Enter pass phrase for local_ca.key: changeme
Verifying - Enter pass phrase for local_ca.key: changeme
This generates the private key file local_ca.key.
Generate a root certificate:
openssl req -x509 -new -nodes -key local_ca.key -sha256 -days 1825 -out local_ca.pem
Enter the pass phrase you just set; I used the following bogus details:
Country Name (2 letter code) [XX]:UK
State or Province Name (full name) []:CA County
Locality Name (eg, city) [Default City]:CA City
Organization Name (eg, company) [Default Company Ltd]:Local Certificate Authority
Organizational Unit Name (eg, section) []:CA Unit
Common Name (eg, your name or your server's hostname) []:ca.local
Email Address []:[email protected]
View root certificate:
openssl x509 -in local_ca.pem --text
Install root certificate
On a local Linux RHEL 8/CentOS 8 client:
sudo cp local_ca.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
Signed certificate
Create a private key:
openssl genrsa -out cluster.lab.com.key 2048
Create a CSR with a Common Name (CN), in this case cluster.lab.com:
openssl req -new -key cluster.lab.com.key -out cluster.lab.com.csr
This example relies on the alt_names; you might wish to create two certificates with the Common Names *.apps.cluster.lab.com and api.cluster.lab.com.
Country Name (2 letter code) [XX]:UK
State or Province Name (full name) []:OCP County
Locality Name (eg, city) [Default City]:OCP City
Organization Name (eg, company) [Default Company Ltd]:OpenShift Container Platform
Organizational Unit Name (eg, section) []:OCP Unit
Common Name (eg, your name or your server's hostname) []:cluster.lab.com
Email Address []:[email protected]
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
Create the configuration file needed to define the Subject Alternative Name (SAN) extension; this allows multiple, alternative DNS names to be validated. For OpenShift, two are required: one for the ingress traffic used to access deployed applications, and one for the API. This method means one certificate can be used for both cases.
vi cluster.lab.com.ext
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = *.apps.cluster.lab.com
DNS.2 = api.cluster.lab.com
Create the certificate; you'll be prompted for the root CA pass phrase again:
openssl x509 -req -in cluster.lab.com.csr -CA local_ca.pem -CAkey local_ca.key -CAcreateserial -out cluster.lab.com.crt -days 825 -sha256 -extfile cluster.lab.com.ext
You should have all these files:
cluster.lab.com.crt
cluster.lab.com.csr
cluster.lab.com.ext
cluster.lab.com.key
local_ca.key
local_ca.pem
local_ca.srl
An additional, recommended yet optional, step is to add the root certificate to the end of the new certificate file:
cat local_ca.pem >> cluster.lab.com.crt
View the final certificate:
openssl x509 -in cluster.lab.com.crt -text
Verify the certificate:
openssl verify -CAfile local_ca.pem cluster.lab.com.crt
cluster.lab.com.crt: OK
Ingress certificate
Create a secret in the openshift-ingress namespace containing both the certificate and private key:
oc create secret tls apps-cert --cert=cluster.lab.com.crt --key=cluster.lab.com.key -n openshift-ingress
And apply the patch, making sure the name matches the name of the secret just added, in this case apps-cert:
oc patch ingresscontroller.operator default --type=merge -p '{"spec":{"defaultCertificate": {"name": "apps-cert"}}}' -n openshift-ingress-operator
To view the changes taking place:
oc project openshift-ingress
You should see the two route pods rebuild:
oc get pods
NAME READY STATUS RESTARTS AGE
router-default-8d9fbbfb7-55xpt 1/1 Running 0 102s
router-default-8d9fbbfb7-w6zhn 1/1 Running 0 118s
API certificate
Create a secret in the openshift-config namespace containing both the certificate and private key:
oc create secret tls api-cert --cert=cluster.lab.com.crt --key=cluster.lab.com.key -n openshift-config
Again, apply the patch, making sure the name matches the name of the secret just added, in this case api-cert, and the domain matches your API URL, in this case api.cluster.lab.com:
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates":[{"names": ["api.cluster.lab.com"], "servingCertificate": {"name": "api-cert"}}]}}}'
To see the effect of the previous patch:
oc get apiserver cluster -o yaml
spec:
servingCerts:
namedCertificates:
- names:
- api.cluster.lab.com
servingCertificate:
name: api-cert
To view the changes taking place:
oc project openshift-kube-apiserver
oc get pods
You should see three kube-apiserver pods redeploy (this took a while for me):
kube-apiserver-master1.cluster.lab.com 4/4 Running 0 3m58s
kube-apiserver-master2.cluster.lab.com 4/4 Running 0 10m
kube-apiserver-master3.cluster.lab.com 4/4 Running 0 7m11s
Once all three pods have completed redeployment, check and validate that the certificate has been applied:
openssl s_client -connect api.cluster.lab.com:6443
And/or:
curl -vvI https://api.cluster.lab.com:6443
Sometimes the trusted certs on a client don't take full effect; you can provide the CA certificate explicitly:
curl --cacert local_ca.pem -vvI https://api.cluster.lab.com:6443
Test logging in:
oc login -u admin -p changeme https://api.cluster.lab.com:6443
Another trick is to specify your certificate-authority certificate:
oc login --certificate-authority=ca.crt https://api.cluster.lab.com:6443
Replace certificates
To replace certificates the following commands can be used:
Example for ingress (*.apps):
oc create secret tls apps-cert --cert=api.cluster.lab.com.crt --key=api.cluster.lab.com.key -n openshift-ingress --dry-run=client -o yaml | oc replace -f -
Example for api:
oc create secret tls api-cert --cert=api.cluster.lab.com.crt --key=api.cluster.lab.com.key -n openshift-config --dry-run=client -o yaml | oc replace -f -
Add CA Bundle
Using your CA certificate:
vi user-ca-bundle.yaml
apiVersion: v1
data:
ca-bundle.crt: |
-----BEGIN CERTIFICATE-----
MIIEKTCCAxGgAwIBAgIUTO5Cn1LKQtoaWrfcOnHSdRBmpvwwDQYJKoZIhvcNAQEL
BQAwgaMxCzAJBgNVBAYTAlVLMRMwEQYDVQQIDApPQ1AgQ291bnR5MREwDwYDVQQH
DAhPQ1AgQ2l0eTElMCMGA1UECgwcT3BlblNoaWZ0IENvbnRhaW5lciBQbGF0Zm9y
bTERMA8GA1UECwwIT0NQIFVuaXQxETAPBgNVBAMMCGNhLmxvY2FsMR8wHQYJKoZI
hvcNAQkBFhBub3JlcGx5QGNhLmxvY2FsMB4XDTIwMTExMjE1NDYxNVoXDTI1MTEx
MTE1NDYxNVowgaMxCzAJBgNVBAYTAlVLMRMwEQYDVQQIDApPQ1AgQ291bnR5MREw
DwYDVQQHDAhPQ1AgQ2l0eTElMCMGA1UECgwcT3BlblNoaWZ0IENvbnRhaW5lciBQ
bGF0Zm9ybTERMA8GA1UECwwIT0NQIFVuaXQxETAPBgNVBAMMCGNhLmxvY2FsMR8w
HQYJKoZIhvcNAQkBFhBub3JlcGx5QGNhLmxvY2FsMIIBIjANBgkqhkiG9w0BAQEF
AAOCAQ8AMIIBCgKCAQEAuSidKVFVoKFv3QBHTTgjfhPyvsL4O8H530ehb7iap71b
Bw2bzxSnrB84Vh4EeZ+pF4cAfK8jquvq2kJjPOGzuflc0aAVWzq6DYJLGRP5T6Sw
v8Zzlnf0EwSBQRxKs3MNlfM36uRkJMsTxxlKYsBsMP51bT9PNYzPqQ6WcDZyclf+
OGhnb2uUDud9oGLVapeHfibiyfSahgnnds3UyjWtYUP3sgWDPCKOpXIqFGcCqdfs
rgRndEq6Leu3/yxnxNwQmB5v3+XAUybUSU8U+cJDYrsyxu5wtYDI75Eo6ocIbVxx
T+waMwQPLhzMv8YfhNn91l4S0lHR5GL1c7RY3ms+xQIDAQABo1MwUTAdBgNVHQ4E
FgQURK6H+pSQSQqce0NyZiEbVjbCdukwHwYDVR0jBBgwFoAURK6H+pSQSQqce0Ny
ZiEbVjbCdukwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAhMRz
F+e6pV7eQXyyiExIMoTI3hqubsRTANmNbXkNBrCRswUoe7T1F3146G9B2wAFQAtH
vda4NcS+i1yW4QG0cgnfJRcPsRnTSEmezia4aHn7vUW3oA8HGL47zc+tlQPV6EKd
hjtdH8R2GIB5CeBhEp1I9DuX2owWEemAnrZxfGjQJxTvCEOprkCJBzozNumwMhZZ
gmzBUeKYbQHVH0oGATGqKph8X36NGPtUdIDY80INThMS0XvvH7fndX1HOEuB37mn
UW7CPsnoMWXf8SsPon4g6aMuKsDpKUqsuvT3RNFHofZIBnqXdCYzbdYbzrZ5ppBH
sx3KXS6+lZijVVMwoA==
-----END CERTIFICATE-----
kind: ConfigMap
metadata:
name: user-ca-bundle
namespace: openshift-config
oc create -f user-ca-bundle.yaml
Now edit the cluster proxy configuration (even though a proxy might not be in use):
oc edit proxy/cluster
This causes a scheduled reboot of all your nodes.
Replace the spec: with:
spec:
trustedCA:
name: user-ca-bundle
This change is applied via a Machine Config and adds the bundle to each node's ca-trust.
SSH to a node, for example:
ssh -i cluster_id_rsa [email protected]
sudo su -
The following file gets updated with your certificates:
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
openssl x509 -in openshift-config-user-ca-bundle.crt --text
Once all the nodes have rebooted, your CA bundle is included.
IDENTITY PROVIDERS
It is essential to break down OpenShift components and concepts into digestible chunks and avoid the risk of being overwhelmed with complexity.
- An identity provider deals with the authentication layer and is responsible for identifying a user.
- The authorisation layer determines if requests are honoured; Role-Based Access Control (RBAC) policy determines what a user is authorised to do. The combination of groups and roles deals with authorisation.
HTPasswd
On a Linux client install the tools:
dnf install httpd-tools -y
Create an HTPasswd file containing users:
htpasswd -c -B -b users.htpasswd admin changeme
htpasswd -b users.htpasswd tom changeme
htpasswd -b users.htpasswd dick changeme
htpasswd -b users.htpasswd harry changeme
Which should look something like this:
cat users.htpasswd
admin:$2y$05$GTvOfcm2An9XdAIyDtwzGOvjGrroac78.NHrDdySO0KOBKAPaYGgi
tom:$apr1$kouuYCYa$wlB2AB4.Ykxn/4QgHUtD9.
dick:$apr1$IETeTG0v$g0P0gqR6aQJTCaGS15QWa0
harry:$apr1$qhyrJZzc$HBCYSf9OFHRpM6he0LJ9k.
The following command runs locally and generates the needed yaml file for OpenShift:
oc create secret generic htpasswd-secret --from-file=htpasswd=users.htpasswd -n openshift-config -o yaml --dry-run=client > htpasswd-secret.yaml
Which can then be used to create or replace the secret:
oc create -f htpasswd-secret.yaml
oc replace -f htpasswd-secret.yaml
For reference, if you wish to extract an existing htpasswd file out of OpenShift, use the following:
oc get secret htpasswd-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 -d > test.htpasswd
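For example, a typical update cycle is to extract the file, add or change a user and push the result back; the user name sally and file name current.htpasswd below are placeholders:
oc get secret htpasswd-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 -d > current.htpasswd
htpasswd -b current.htpasswd sally changeme
oc create secret generic htpasswd-secret --from-file=htpasswd=current.htpasswd -n openshift-config -o yaml --dry-run=client | oc replace -f -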
Next, either via the web console, Administration → Cluster Settings → Global Configuration → OAuth → YAML
Or via the command line:
vi oauth.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
name: cluster
spec:
identityProviders:
- name: htpasswd_provider
mappingMethod: claim
type: HTPasswd
htpasswd:
fileData:
name: htpasswd-secret
oc replace -f oauth.yaml
Cluster admin
Using RBAC, add the cluster-admin role to the new admin account:
oc adm policy add-cluster-role-to-user cluster-admin admin
If the account has not yet been used to log into the cluster, a warning will display:
Warning: User 'admin' not found
clusterrole.rbac.authorization.k8s.io/cluster-admin added: "admin"
Log into OpenShift as each user first to "create the user" in OpenShift and avoid the warning.
oc adm policy add-cluster-role-to-user edit tom
clusterrole.rbac.authorization.k8s.io/edit added: "tom"
You can also add users limited to projects:
oc adm policy add-role-to-user edit harry -n logging-sensitive-data
Remove kubeadmin
OpenShift clusters are deployed with an install-generated kubeadmin account. Once identity providers are fully configured, it is recommended security best practice to remove this default account.
The kubeadmin password is stored in cluster/auth/kubeadmin-password.
Ensuring you have added at least one other user with the cluster-admin role, the kubeadmin account can be removed using:
oc delete secrets kubeadmin -n kube-system
LDAP
Deploy LDAP
This process covers deploying an application using container images, in this case a basic LDAP service for testing identity providers. Such a service would always be external to the cluster. This uses the image by Rafael Römhild from https://github.com/rroemhild/docker-test-openldap
On another host on the same subnet as the cluster and load balancer, pull the image:
podman pull docker.io/rroemhild/test-openldap
Create a pod:
podman pod create -p 389 -p 636 -n ldappod
If you see the error "error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]", you need to add the following kernel parameter:
vi /etc/sysctl.conf
net.ipv4.ip_unprivileged_port_start = 0
sudo sysctl -p
Launch the container:
podman run --privileged -d --pod ldappod rroemhild/test-openldap
Open the firewall ports for LDAP so it can be accessed directly from external hosts:
firewall-cmd --permanent --add-port=389/tcp
firewall-cmd --permanent --add-port=636/tcp
firewall-cmd --reload
Test the service locally:
ldapsearch -h 127.0.0.1 -p 389 -D cn=admin,dc=planetexpress,dc=com -w GoodNewsEveryone -b "dc=planetexpress,dc=com" -s sub "(objectclass=*)"
Optionally, add a DNS entry ldap.cluster.lab.com and a load balancer in HAProxy:
frontend ldap
bind 0.0.0.0:389
option tcplog
mode tcp
default_backend ldap
backend ldap
mode tcp
balance roundrobin
server ldap 192.168.0.15:389 check
List everything:
ldapsearch -h ldap.cluster.lab.com -p 389 -D cn=admin,dc=planetexpress,dc=com -w GoodNewsEveryone -b "dc=planetexpress,dc=com" -s sub "(objectclass=*)"
List only users, returning only the common names and uid:
ldapsearch -h ldap.cluster.lab.com -p 389 -D cn=admin,dc=planetexpress,dc=com -w GoodNewsEveryone -x -s sub -b "ou=people,dc=planetexpress,dc=com" "(objectclass=inetOrgPerson)" cn uid
List only groups:
ldapsearch -h ldap.cluster.lab.com -p 389 -D cn=admin,dc=planetexpress,dc=com -w GoodNewsEveryone -x -s sub -b "ou=people,dc=planetexpress,dc=com" "(objectclass=Group)"
LDAP Identity Provider
Add a secret to OpenShift that contains the LDAP bind password:
Admin account:
cn=admin,dc=planetexpress,dc=com
Bind password:
GoodNewsEveryone
Create a secret called ldap-bind-password in the openshift-config namespace:
oc create secret generic ldap-bind-password --from-literal=bindPassword=GoodNewsEveryone -n openshift-config
Either use the web console to append the LDAP identity provider by navigating to Administration → Cluster Settings → Global Configuration → OAuth.
Or via the CLI:
oc project openshift-authentication
oc get OAuth
oc edit OAuth cluster
Below spec:, add the - ldap: entry, for example:
spec:
identityProviders:
- htpasswd:
fileData:
name: htpass-secret
mappingMethod: claim
name: htpasswd_provider
type: HTPasswd
- ldap:
attributes:
email:
- mail
id:
- dn
name:
- cn
preferredUsername:
- uid
bindDN: 'cn=admin,dc=planetexpress,dc=com'
bindPassword:
name: ldap-bind-password
insecure: true
url: 'ldap://ldap.cluster.lab.com/DC=planetexpress,DC=com?uid?sub?(memberOf=cn=admin_staff,ou=people,dc=planetexpress,dc=com)'
mappingMethod: claim
name: ldap
type: LDAP
In the openshift-authentication project, there will be two oauth-openshift-xxxxxxxxxx-xxxxx pods. These will be terminated and recreated every time you make a change to the configuration. After saving changes, expect to see something like this:
oc get pods
NAME READY STATUS RESTARTS AGE
oauth-openshift-7f95bc7996-5vl2z 1/1 Terminating 0 13m
oauth-openshift-7f95bc7996-854xb 1/1 Terminating 0 13m
oauth-openshift-ccd6bc654-mrbc6 1/1 Running 0 17s
oauth-openshift-ccd6bc654-qh29m 1/1 Running 0 7s
For troubleshooting issues you can tail the logs for each of the running pods, for example:
oc logs oauth-openshift-ccd6bc654-mrbc6 -f
The web console will now have a "Log in with" option for LDAP, and in this case, the user hermes (with password hermes) should be able to log in because that user is a member of the admin_staff group. Trying the user fry (with password fry) fails because they are NOT a member of the admin_staff group.
The example in this document is fundamental. In the real world, there is often trial and error; the key is being able to search LDAP and understand the directory information tree (DIT).
To include more than one group in the LDAP identity provider, you can use the following syntax:
ldap://ldap.cluster.lab.com/DC=planetexpress,DC=com?uid?sub??(|(memberOf=cn=admin_staff,ou=people,dc=planetexpress,dc=com)(memberOf=cn=ship_crew,ou=people,dc=planetexpress,dc=com))
This will allow any user from either the admin_staff or ship_crew group to log in.
Where TLS is in use, add a configmap in the openshift-config namespace:
oc create configmap ldap-ca-bundle --from-file=ca.crt=/root/ocp4/ssl/ca.crt -n openshift-config
Include the following options and use the ldaps syntax for port 636:
ca:
name: ldap-ca-bundle
insecure: false
url: >-
ldaps://ldap.cluster.lab.com/...
LDAP Group Sync
Two groups exist in the testing directory: admin_staff and ship_crew. To add groups in OpenShift that match those two groups in LDAP, automate this within OpenShift at regular intervals using a Cron Job. The Cron Job needs a Service Account, a Cluster Role, a Cluster Role Binding and a ConfigMap.
Pre-testing
Before we create anything in OpenShift, try things out from the CLI first and make sure that the ldap-group-sync.yaml data to be stored in the ConfigMap is correct and returns the desired results.
Create a file called ldap_sync_config.yaml:
kind: LDAPSyncConfig
apiVersion: v1
url: ldap://ldap.cluster.lab.com:389
insecure: true
bindDN: "cn=admin,dc=planetexpress,dc=com"
bindPassword: "GoodNewsEveryone"
rfc2307:
groupsQuery:
baseDN: "ou=people,dc=planetexpress,dc=com"
scope: sub
filter: "(objectClass=Group)"
derefAliases: never
groupUIDAttribute: dn
groupNameAttributes: [ cn ]
groupMembershipAttributes: [ member ]
usersQuery:
baseDN: "ou=people,dc=planetexpress,dc=com"
scope: sub
derefAliases: never
userUIDAttribute: dn
userNameAttributes: [ uid ]
tolerateMemberNotFoundErrors: true
tolerateMemberOutOfScopeErrors: true
Experiment with ldap_sync_config.yaml using this safe "dry-run" command to get your desired results:
oc adm groups sync --sync-config=ldap_sync_config.yaml
Nothing is final or committed until you add --confirm to the command:
oc adm groups sync --sync-config=ldap_sync_config.yaml --confirm
The example provided should return the two groups admin_staff and ship_crew.
You could just run it once and create the groups in OpenShift as a one-off task, but in the real world, directories can be huge and often change with starters and leavers.
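The counterpart for leavers is pruning, which removes OpenShift groups whose corresponding LDAP groups no longer exist. A sketch using the same sync config; as with the sync, nothing is removed until --confirm is added:
oc adm prune groups --sync-config=ldap_sync_config.yaml
oc adm prune groups --sync-config=ldap_sync_config.yaml --confirm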
Cron Job
This requires the bind password in the openshift-authentication project:
oc create secret generic ldap-sync-bind-password --from-literal=bindPassword=GoodNewsEveryone -n openshift-authentication
The next three steps are generic: adding a Service Account, Cluster Role and Cluster Role Binding. They can be applied individually or amalgamated into one file to create all three in one go; I've split them out for clarity of each component.
Service Account:
vi ldap_sync_sa.yaml
---
kind: ServiceAccount
apiVersion: v1
metadata:
name: ldap-group-syncer
namespace: openshift-authentication
labels:
app: cronjob-ldap-group-sync
oc create -f ldap_sync_sa.yaml
Cluster Role:
vi ldap_sync_cr.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: ldap-group-syncer
labels:
app: cronjob-ldap-group-sync
rules:
- apiGroups:
- ''
- user.openshift.io
resources:
- groups
verbs:
- get
- list
- create
- update
oc create -f ldap_sync_cr.yaml
Cluster Role Binding:
vi ldap_sync_crb.yaml
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: ldap-group-syncer
labels:
app: cronjob-ldap-group-sync
subjects:
- kind: ServiceAccount
name: ldap-group-syncer
namespace: openshift-authentication
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ldap-group-syncer
oc create -f ldap_sync_crb.yaml
ConfigMap:
This ConfigMap adds the ldap-group-sync.yaml file used earlier for testing things out or synchronising groups manually from the CLI. The ConfigMap is a resource made available in OpenShift that the final Cron Job will utilise:
vi ldap_sync_cm.yaml
kind: ConfigMap
apiVersion: v1
metadata:
name: ldap-group-syncer
namespace: openshift-authentication
labels:
app: cronjob-ldap-group-sync
data:
ldap-group-sync.yaml: |
kind: LDAPSyncConfig
apiVersion: v1
url: ldap://ldap.cluster.lab.com:389
insecure: true
bindDN: "cn=admin,dc=planetexpress,dc=com"
bindPassword:
file: "/etc/secrets/bindPassword"
rfc2307:
groupsQuery:
baseDN: "ou=people,dc=planetexpress,dc=com"
scope: sub
filter: "(objectClass=Group)"
derefAliases: never
pageSize: 0
groupUIDAttribute: dn
groupNameAttributes: [ cn ]
groupMembershipAttributes: [ member ]
usersQuery:
baseDN: "ou=people,dc=planetexpress,dc=com"
scope: sub
derefAliases: never
pageSize: 0
userUIDAttribute: dn
userNameAttributes: [ uid ]
tolerateMemberNotFoundErrors: true
tolerateMemberOutOfScopeErrors: true
oc create -f ldap_sync_cm.yaml
Add the Cron Job
vi ldap_sync_cj.yaml
kind: CronJob
apiVersion: batch/v1beta1
metadata:
name: ldap-group-syncer
namespace: openshift-authentication
labels:
app: cronjob-ldap-group-sync
spec:
schedule: '*/2 * * * *'
concurrencyPolicy: Forbid
suspend: false
jobTemplate:
metadata:
creationTimestamp: null
labels:
app: cronjob-ldap-group-sync
spec:
backoffLimit: 0
template:
metadata:
creationTimestamp: null
labels:
app: cronjob-ldap-group-sync
spec:
restartPolicy: Never
activeDeadlineSeconds: 500
serviceAccountName: ldap-group-syncer
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30
securityContext: {}
containers:
- name: ldap-group-sync
image: 'openshift/origin-cli:latest'
command:
- /bin/bash
- '-c'
- >-
oc adm groups sync
--sync-config=/etc/config/ldap-group-sync.yaml --confirm
resources: {}
volumeMounts:
- name: ldap-sync-volume
mountPath: /etc/config
- name: ldap-sync-bind-password
mountPath: /etc/secrets
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: Always
serviceAccount: ldap-group-syncer
volumes:
- name: ldap-sync-volume
configMap:
name: ldap-group-syncer
defaultMode: 420
- name: ldap-sync-bind-password
secret:
secretName: ldap-sync-bind-password
defaultMode: 420
dnsPolicy: ClusterFirst
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 5
oc create -f ldap_sync_cj.yaml
You can now pick out the key lines in this file to make sense of how it ties together. It uses the service account created earlier:
serviceAccountName: ldap-group-syncer
It mounts a volume for the ldap-group-sync.yaml file:
--sync-config=/etc/config/ldap-group-sync.yaml --confirm
And mounts the bind password as a file:
bindPassword:
file: "/etc/secrets/bindPassword"
Study the volumeMounts, volumes and the command; it should be clear how all the components fit together.
The first scheduled run will kick in after the designated time, in this case two minutes, and takes a little longer to complete because it has to pull the image openshift/origin-cli:latest. Subsequent runs will be much quicker.
Testing
Test the schedule by deleting one of the groups under User Management → Groups, wait for the Cron Job to run, and the group should be successfully recreated. Monitor the events to follow the status.
oc project openshift-authentication
oc get events --watch
List the Cron Jobs:
oc get cronjobs.batch
Trigger a job run:
oc create job --from=cronjob/ldap-group-syncer test-sync-1
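To follow the ad-hoc run, check the job, its logs and the resulting groups; the job name matches the one created above:
oc get jobs -n openshift-authentication
oc logs job/test-sync-1 -n openshift-authentication
oc get groups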
RBAC
Bind, for example, the cluster-admin OpenShift role to the admin_staff group:
oc adm policy add-cluster-role-to-group cluster-admin admin_staff
And, for example, basic-user to the ship_crew group:
oc adm policy add-cluster-role-to-group basic-user ship_crew
Log into OpenShift with different accounts to test out the results.
For example, cluster administrators:
hermes/hermes
professor/professor
And basic users:
fry/fry
leela/leela
Make sure the LDAP Identity provider is configured to include both groups for basic users:
ldap://ldap.cluster.lab.com/DC=planetexpress,DC=com?uid?sub??(|(memberOf=cn=admin_staff,ou=people,dc=planetexpress,dc=com)(memberOf=cn=ship_crew,ou=people,dc=planetexpress,dc=com))
ETCD
Encryption
oc edit apiserver
spec:
encryption:
type: aescbc
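Alternatively, the same change can be applied non-interactively with a merge patch, mirroring the patch commands used for certificates earlier in this guide:
oc patch apiserver cluster --type=merge -p '{"spec":{"encryption":{"type":"aescbc"}}}'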
Check the Encrypted status condition of the OpenShift API:
oc get openshiftapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
Check the Encrypted status condition of the Kubernetes API:
oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
Backups
Change project:
oc project openshift-config
Create Service Account:
oc create sa approver
Make the service account cluster-admin:
oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:openshift-config:approver
Add service account to scc "privileged":
oc edit scc privileged
Example, under users:
users:
- system:admin
- system:serviceaccount:openshift-infra:build-controller
- system:serviceaccount:openshift-config:approver
Provision an NFS share for backups. Ref: https://www.richardwalker.dev/pragmatic-openshift/#_nfs_server
Example for /etc/exports on the NFS server:
/mnt/openshift/backups 192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
Create directory on NFS server:
mkdir /mnt/openshift/backups
chmod 775 /mnt/openshift/backups
Create a PV using nfs storage class for backups:
vi backups-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: backups-pv
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteMany
nfs:
path: /mnt/openshift/backups
server: 192.168.0.15
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
oc create -f backups-pv.yaml
oc get pv
Create a PVC for backups:
vi backup-nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: etcd-backup
namespace: openshift-config
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
storageClassName: nfs
mountOptions:
- nfsvers=4.2
oc create -f backup-nfs-pvc.yaml
oc get pvc
Create a ConfigMap containing cluster-backup.sh.
At the end of this script is a modification. On each master node the stock cluster-backup.sh is unique, containing a hardcoded reference to its own host's IP address, so this version is tweaked to obtain the IP address at run time.
Take notice of the following lines:
IP_ADDR=$(hostname -i)
ETCDCTL_ENDPOINTS="https://${IP_ADDR}:2379" etcdctl snapshot save "${SNAPSHOT_FILE}"
Create the ConfigMap containing the customised cluster-backup.sh script:
vi backup-configmap.yaml
kind: ConfigMap
apiVersion: v1
metadata:
name: etcd-backup-script
namespace: openshift-config
data:
etcd-backup.sh: |
#!/usr/bin/env bash
### Created by cluster-etcd-operator. DO NOT edit.
set -o errexit
set -o pipefail
set -o errtrace
# example
# cluster-backup.sh $path-to-snapshot
if [[ $EUID -ne 0 ]]; then
echo "This script must be run as root"
exit 1
fi
function usage {
echo 'Path to backup dir required: ./cluster-backup.sh <path-to-backup-dir>'
exit 1
}
# If the first argument is missing, or it is an existing file, then print usage and exit
if [ -z "$1" ] || [ -f "$1" ]; then
usage
fi
if [ ! -d "$1" ]; then
mkdir -p "$1"
fi
# backup latest static pod resources
function backup_latest_kube_static_resources {
RESOURCES=("$@")
LATEST_RESOURCE_DIRS=()
for RESOURCE in "${RESOURCES[@]}"; do
# shellcheck disable=SC2012
LATEST_RESOURCE=$(ls -trd "${CONFIG_FILE_DIR}"/static-pod-resources/"${RESOURCE}"-[0-9]* | tail -1) || true
if [ -z "$LATEST_RESOURCE" ]; then
echo "error finding static-pod-resource ${RESOURCE}"
exit 1
fi
echo "found latest ${RESOURCE}: ${LATEST_RESOURCE}"
LATEST_RESOURCE_DIRS+=("${LATEST_RESOURCE#${CONFIG_FILE_DIR}/}")
done
# tar latest resources with the path relative to CONFIG_FILE_DIR
tar -cpzf "$BACKUP_TAR_FILE" -C "${CONFIG_FILE_DIR}" "${LATEST_RESOURCE_DIRS[@]}"
chmod 600 "$BACKUP_TAR_FILE"
}
function source_required_dependency {
local path="$1"
if [ ! -f "${path}" ]; then
echo "required dependencies not found, please ensure this script is run on a node with a functional etcd static pod"
exit 1
fi
# shellcheck disable=SC1090
source "${path}"
}
BACKUP_DIR="$1"
DATESTRING=$(date "+%F_%H%M%S")
BACKUP_TAR_FILE=${BACKUP_DIR}/static_kuberesources_${DATESTRING}.tar.gz
SNAPSHOT_FILE="${BACKUP_DIR}/snapshot_${DATESTRING}.db"
BACKUP_RESOURCE_LIST=("kube-apiserver-pod" "kube-controller-manager-pod" "kube-scheduler-pod" "etcd-pod")
trap 'rm -f ${BACKUP_TAR_FILE} ${SNAPSHOT_FILE}' ERR
source_required_dependency /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-scripts/etcd.env
source_required_dependency /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-scripts/etcd-common-tools
# TODO handle properly
if [ ! -f "$ETCDCTL_CACERT" ] && [ ! -d "${CONFIG_FILE_DIR}/static-pod-certs" ]; then
ln -s "${CONFIG_FILE_DIR}"/static-pod-resources/etcd-certs "${CONFIG_FILE_DIR}"/static-pod-certs
fi
IP_ADDR=$(hostname -i)
#dl_etcdctl
backup_latest_kube_static_resources "${BACKUP_RESOURCE_LIST[@]}"
ETCDCTL_ENDPOINTS="https://${IP_ADDR}:2379" etcdctl snapshot save "${SNAPSHOT_FILE}"
echo "snapshot db and kube resources are successfully saved to ${BACKUP_DIR}"
oc create -f backup-configmap.yaml
Before creating the cronjob, SSH to each master node and create a /mnt/backup directory:
ssh -i cluster_id_rsa [email protected]
sudo su -
mkdir /mnt/backup
SSH to a master node and get the correct quay.io/openshift-release-dev/ocp-v4.0-art-dev image from the file /etc/kubernetes/manifests/etcd-pod.yaml:
ssh -i cluster_id_rsa [email protected]
sudo su -
cat /etc/kubernetes/manifests/etcd-pod.yaml | grep quay.io/openshift-release-dev/ocp-v4.0-art-dev
Example:
spec:
initContainers:
- name: etcd-ensure-env-vars
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:326516b79a528dc627e5a5d84c986fd35e5f8ff5cbd74ff0ef802473efccd285
Adjust the schedule as needed, use the image found in the previous step, and ensure you have created the /mnt/backup directory on each master node:
vi backup-cronjob.yaml
kind: CronJob
apiVersion: batch/v1beta1
metadata:
name: cronjob-etcd-backup
namespace: openshift-config
labels:
purpose: etcd-backup
spec:
schedule: "10 10 * * *"
startingDeadlineSeconds: 200
concurrencyPolicy: Forbid
suspend: false
jobTemplate:
spec:
backoffLimit: 0
template:
spec:
nodeSelector:
node-role.kubernetes.io/master: ''
restartPolicy: Never
activeDeadlineSeconds: 200
serviceAccountName: approver
hostNetwork: true
containers:
- resources:
requests:
cpu: 300m
memory: 250Mi
terminationMessagePath: /dev/termination-log
name: etcd-backup
command:
- /bin/sh
- '-c'
- >-
/usr/local/bin/etcd-backup.sh /mnt/backup
securityContext:
privileged: true
imagePullPolicy: IfNotPresent
volumeMounts:
- name: certs
mountPath: /etc/ssl/etcd/
- name: conf
mountPath: /etc/etcd/
- name: kubeconfig
mountPath: /etc/kubernetes/
- name: etcd-backup-script
mountPath: /usr/local/bin/etcd-backup.sh
subPath: etcd-backup.sh
- name: etcd-backup
mountPath: /mnt/backup
- name: scripts
mountPath: /usr/local/bin
terminationMessagePolicy: FallbackToLogsOnError
image: >-
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c9487f25868eafe55b72932010afa4b2728955a3a326b4823a56b185dd10ec50
serviceAccount: approver
tolerations:
- operator: Exists
effect: NoSchedule
- operator: Exists
effect: NoExecute
volumes:
- name: certs
hostPath:
path: /etc/kubernetes/static-pod-resources/etcd-member
type: ''
- name: conf
hostPath:
path: /etc/etcd
type: ''
- name: kubeconfig
hostPath:
path: /etc/kubernetes
type: ''
- name: scripts
hostPath:
path: /usr/local/bin
type: ''
- name: etcd-backup
persistentVolumeClaim:
claimName: etcd-backup
- name: etcd-backup-script
configMap:
name: etcd-backup-script
defaultMode: 493
oc create -f backup-cronjob.yaml
List the cronjob:
oc get cronjobs.batch
Run the cronjob on-demand:
oc create job --from=cronjob/cronjob-etcd-backup test-backup-001
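To follow the on-demand run, check the job and its pod logs in the openshift-config project; the job name matches the one created above:
oc get jobs -n openshift-config
oc logs job/test-backup-001 -n openshift-config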
The PVC should now be claimed:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
etcd-backup Bound backups-pv 50Gi RWX nfs 14m
The job should run and the backup is created on the file share; check on the NFS server (or mount the NFS share):
cd /mnt/openshift/backups/
ls -l
-rw-------. 1 root root 140324896 Nov 27 11:18 snapshot_2020-11-28_111840.db
-rw-------. 1 root root 70093 Nov 27 11:18 static_kuberesources_2020-11-28_111840.tar.gz
SUMMARY
This document is likely to be updated and evolve. I hope the information and examples serve you well.