Kafka 3.8 with Zookeeper SASL_SCRAM

Posted on May 13, 2025May 13, 2025 by Malinda RathnayakeLeave a Comment

Transport Encryption Methods:

SASL/SSL (Solid Teal/Green Lines):

Used for securing communication between producers/consumers and Kafka brokers.
- SASL (Simple Authentication and Security Layer): Authenticates clients (producers/consumers) to brokers, using SCRAM .
- SSL/TLS (Secure Sockets Layer/Transport Layer Security): Encrypts the data in transit, ensuring confidentiality and integrity during transmission.

Digest-MD5 (Dashed Yellow Lines):

Secures communication between Kafka brokers and the Zookeeper cluster.
- Digest-MD5: A challenge-response authentication mechanism providing basic encryption

Notes:

While functional, Digest-MD5 is an older algorithm. we opted for this to reduce complexity and the fact the zookeepers have issues with connecting with Brokers via SSL/TLS

We need to test and switch over KRAFT Protocol, this removes the use of Zookeeper altogether
Add IP ACLs for Zookeeper connections using firewalld to limit traffic between the nodes for replication

PKI and Certificate Signing

CA cert for local PKI,

We need to share this PEM file(without the private key) with the customer to authenticate

Internal applications the CA file must be used for authentication – Refer to the Configuration example documents

# Generate CA Key
openssl genrsa -out multicastbits_CA.key 4096
# Generate CA Certificate
openssl req -x509 -new -nodes -key multicastbits_CA.key -sha256 -days 3650 -out multicastbits_CA.crt -subj "/CN=multicastbits_CA"

Kafka Broker Certificates

# For Node1 - Repeat for other nodes

openssl req -new -nodes -out node1.csr -newkey rsa:2048 -keyout node1.key -subj "/CN=kafka01.multicastbits.com"

openssl x509 -req -CA multicastbits_CA.crt -CAkey multicastbits_CA.key -CAcreateserial -in node1.csr -out node1.crt -days 3650 -sha256

Create the kafka and zookeeper users

⚠️ Important: Do not skip this step. we need these users to setup Authentication in JaaS configuration

Before configuring the cluster with SSL and SASL, let’s start up the cluster without authentication and SSL to create the users. This allows us to:

Verify basic dependencies and confirm the zookeeper and Kafka clusters are coming up without any issues “make sure the car starts”
Create necessary user accounts for SCRAM
Test for any inter-node communication issues (Blocked Ports 9092, 9093 ,2181 etc)

Here’s how to set up this initial configuration:

Zookeeper Configuration (No SSL or Auth)

Create the following file: /opt/kafka/kafka_2.13-3.8.0/config/zookeeper-NOSSL_AUTH.properties

# Zookeeper Configuration without Auth
dataDir=/Data_Disk/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.166.110:2888:3888
server.2=192.168.166.111:2888:3888
server.3=192.168.166.112:2888:3888

Kafka Broker Configuration (No SSL or Auth)

Create the following file: /opt/kafka/kafka_2.13-3.8.0/config/server-NOSSL_AUTH.properties

# Kafka Broker Configuration without Auth/SSL
broker.id=1
listeners=PLAINTEXT://kafka01.multicastbits.com:9092
advertised.listeners=PLAINTEXT://kafka01.multicastbits.com:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT
zookeeper.connect=kafka01.multicastbits.com:2181,kafka02.multicastbits.com:2181,kafka03.multicastbits.com:2181

Open a new shell to the server Start Zookeeper:

/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-start.sh -daemon /opt/kafka/kafka_2.13-3.8.0/config/zookeeper-NOSSL_AUTH.properties

Open a new shell to start Kafka:

/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh -daemon /opt/kafka/kafka_2.13-3.8.0/config/server-NOSSL_AUTH.properties

Create the users:

Open a new shell and run the following commands:

kafka-configs.sh --bootstrap-server ext-kafka01.fleetcam.io:9092 --alter --add-config 'SCRAM-SHA-512=[password=zookeeper-password]' --entity-type users --entity-name ftszk

kafka-configs.sh --zookeeper ext-kafka01.fleetcam.io:2181 --alter --add-config 'SCRAM-SHA-512=[password=kafkaadmin-password]' --entity-type users --entity-name ftskafkaadminAfter the users are created without errors, press Ctrl+C to shut down the services we started earlier.

SASL_SSL configuration with SCRAM

Zookeeper configuration Notes

Zookeeper is configured with SASL/MD5 due to the SSL issues we faced during the initial setup
Zookeeper Traffic is isolated with in the Broker nodes to maintain security

dataDir=/Data_Disk/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.166.110:2888:3888
server.2=192.168.166.111:2888:3888
server.3=192.168.166.112:2888:3888
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl

/Data_Disk/zookeeper/myid file is updated corresponding to the zookeeper nodeID

cat /Data_Disk/zookeeper/myid
1

Jaas configuration

Create the Jaas configuration for zookeeper authentication, it has the follow this syntax

/opt/kafka/kafka_2.13-3.8.0/config/zookeeper-jaas.conf

Server {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   user_multicastbitszk="zkpassword";
};

KafkaOPTS

KafkaOPTS Java varible need to be passed when the zookeeper is started to point to the correct JaaS file

export KAFKA_OPTS="-Djava.security.auth.login.config="Path to the zookeeper-jaas.conf"

export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/kafka_2.13-3.8.0/config/zookeeper-jaas.conf"

There are few ways to handle this, you can add a script under profile.d or use a custom Zookeeper launch script for the systemd service

Systemd service

Create the launch shell script for Zookeeper

/opt/kafka/kafka_2.13-3.8.0/bin/zk-start.s

#!/bin/bash
#export the env variable
export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/kafka_2.13-3.8.0/config/zookeeper-jaas.conf"
#Start the zookeeper service
/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
#debug - launch config with no SSL - we need this for initial setup and debug
#/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/zookeeper-NOSSL_AUTH.properties

After you save the file

chomod +x /opt/kafka/kafka_2.13-3.8.0/bin/zk-start.s

sudo chown -R multicastbitskafka:multicastbitskafka /opt/kafka/kafka_2.13-3.8.0

Create the systemd service file

/etc/systemd/system/zookeeper.service

[Unit]
Description=Apache Zookeeper Service
After=network.target
[Service]
User=multicastbitskafka
Group=multicastbitskafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/zk-start.sh
Restart=on-failure
[Install]

WantedBy=multi-user.target

After the file is saved, start the service

sudo systemctl daemon-reload.
sudo systemctl enable zookeeper
sudo systemctl start zookeeper

Kafka Broker configuration Notes

/opt/kafka/kafka_2.13-3.8.0/config/server.properties

broker.id=1
listeners=SASL_SSL://kafka01.multicastbits.com:9093
advertised.listeners=SASL_SSL://kafka01.multicastbits.com:9093
listener.security.protocol.map=SASL_SSL:SASL_SSL
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
ssl.keystore.location=/opt/kafka/secrets/kafkanode1.keystore.jks
ssl.keystore.password=keystorePassword
ssl.truststore.location=/opt/kafka/secrets/kafkanode1.truststore.jks
ssl.truststore.password=truststorePassword
#SASL/SCRAM Authentication
sasl.enabled.mechanisms=SCRAM-SHA-256, SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
sasl.mechanism.client=SCRAM-SHA-512
security.inter.broker.protocol=SASL_SSL
#zookeeper
zookeeper.connect=kafka01.multicastbits.com:2181,kafka02.multicastbits.com:2181,kafka03.multicastbits.com:2181
zookeeper.sasl.client=true
zookeeper.sasl.clientconfig=ZookeeperClient

zookeeper connect options

Define the zookeeper servers the broker will connect to

zookeeper.connect=kafka01.multicastbits.com:2181,kafka02.multicastbits.com:2181,kafka03.multicastbits.com:2181

Enable SASL

zookeeper.sasl.client=true

Tell the broker to use the creds defined under ZookeeperClient section on the JaaS file used by the kafka service

zookeeper.sasl.clientconfig=ZookeeperClient

Broker and listener configuration

Define the broker id

broker.id=1

Define the servers listener name and port

listeners=SASL_SSL://kafka01.multicastbits.com:9093

Define the servers advertised listener name and port

advertised.listeners=SASL_SSL://kafka01.multicastbits.com:9093

Define the SASL_SSL for security protocol

listener.security.protocol.map=SASL_SSL:SASL_SSL

Enable ACLs

authorizer.class.name=kafka.security.authorizer.AclAuthorizer

Define the Java Keystores

ssl.keystore.location=/opt/kafka/secrets/kafkanode1.keystore.jks

ssl.keystore.password=keystorePassword

ssl.truststore.location=/opt/kafka/secrets/kafkanode1.truststore.jks

ssl.truststore.password=truststorePassword

Jaas configuration

/opt/kafka/kafka_2.13-3.8.0/config/kafka_server_jaas.conf

KafkaServer {
  org.apache.kafka.common.security.scram.ScramLoginModule required
  username="multicastbitskafkaadmin"
  password="kafkaadmin-password";
};
ZookeeperClient {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="multicastbitszk"
  password="Zookeeper_password";
};

SASL and SCRAM configuration Notes

Enable SASL SCRAM for authentication

org.apache.kafka.common.security.scram.ScramLoginModule required

Use MD5 for Zookeeper authentication

org.apache.zookeeper.server.auth.DigestLoginModule required

KafkaOPTS

KafkaOPTS Java variable need to be passed and must point to the correct JaaS file, when the kafka service is started

export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/kafka_2.13-3.8.0/config/kafka_server_jaas.conf"

Systemd service

Create the launch shell script for kafka

/opt/kafka/kafka_2.13-3.8.0/bin/multicastbitskafka-server-start.sh

#!/bin/bash
#export the env variable
export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/kafka_2.13-3.8.0/config/kafka_server_jaas.conf"
#Start the kafka service
/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/server.properties
#debug - launch config with no SSL - we need this for initial setup and debug
#/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/server-NOSSL_AUTH.properties

Create the systemd service

/etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka Broker Service
After=network.target zookeeper.service
[Service]
User=multicastbitskafka
Group=multicastbitskafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/multicastbitskafka-server-start.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target

Connect authenticate and use Kafka CLI tools

Requirements

multicastbitsadmin.keystore.jks
multicastbitsadmin.truststore.jks
WSL2 with java-11-openjdk-devel wget nano
Kafka 3.8 folder extracted locally

Setup your environment

Setup WSL2

You can use any Linux environment with JDK17 or 11

install dependencies

dnf install -y wget nano java-11-openjdk-devel

Download Kafka and extract it (in going to extract it to the home DIR under kafka)

# 1. Download Kafka (Choose a version compatible with your server)
wget https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz
# 2. Extract
tar xzf kafka_2.13-3.8.0.tgz

Copy the jks files (You should generate them with the CA JKS, or use one from one of the nodes) to ~/

cp multicastbitsadmin.keystore.jks ~/

cp multicastbitsadmin.truststore.jks ~/

Create your admin client properties file

change the path to fit your setup

nano ~/kafka-adminclient.properties

# Security protocol and SASL/SSL configuration
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
# SSL Configuration
ssl.keystore.location=/opt/kafka/secrets/multicastbitsadmin.keystore.jks
ssl.keystore.password=keystorepw
ssl.truststore.location=/opt/kafka/secrets/multicastbitsadmin.truststore.jks
ssl.truststore.password=truststorepw
# SASL Configuration
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required 
    username="#youradminUser#" 
		password="#your-admin-PW#";

Create the JaaS file for the admin client

nano ~/kafka_client_jaas.conf

Some kafka-cli tools still look for the jaas.conf under KAFKA_OPTS environment variable

KafkaClient {
  org.apache.kafka.common.security.scram.ScramLoginModule required
  username="#youradminUser#"
  password="#your-admin-PW#";
};

Export the Kafka environment variables

export KAFKA_HOME=/opt/kafka/kafka_2.13-3.8.0
export PATH=$PATH:$KAFKA_HOME/bin
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export KAFKA_OPTS="-Djava.security.auth.login.config=~/kafka_client_jaas.conf"
source ~/.bashrc

Kafka CLI Usage Examples

Create a user

kafka-configs.sh --bootstrap-server kafka01.multicastbits.com:9093 --alter --add-config 'SCRAM-SHA-512=[password=#password#]' --entity-type users --entity-name %username%--command-config ~/kafka-adminclient.properties

Create a topic

kafka-topics.sh --bootstrap-server kafka01.multicastbits.com:9093 --create --topic %topicname% --partitions 10 --replication-factor 3 --command-config ~/kafka-adminclient.properties

Create ACLs

External customer user with READ DESCRIBE privileges to a single topic

kafka-acls.sh --bootstrap-server kafka01.multicastbits.com:9093 
  --command-config ~/kafka-adminclient.properties 
  --add --allow-principal User:customer-user01 
  --operation READ --operation DESCRIBE --topic Customer_topic

Troubleshooting

Here are some common issues you might encounter when setting up and using Kafka with SASL_SCRAM authentication, along with their solutions:

1. Connection refused errors

Issue: Clients unable to connect to Kafka brokers.

Solution:

Verify that the Kafka brokers are running and listening on the correct ports.
Check firewall settings to ensure the Kafka ports are open and accessible.
Confirm that the bootstrap server addresses in client configurations are correct.

2. Authentication failures

Issue: Clients fail to authenticate with Kafka brokers.

Solution:

Double-check username and password in the JAAS configuration file.
Ensure the SCRAM credentials are properly set up on the Kafka brokers.
Verify that the correct SASL mechanism (SCRAM-SHA-512) is specified in client configurations.

3. SSL/TLS certificate issues

Issue: SSL handshake failures or certificate validation errors.

Solution:

Confirm that the keystore and truststore files are correctly referenced in configurations.
Verify that the certificates in the truststore are up-to-date and not expired.
Ensure that the hostname in the certificate matches the broker’s advertised listener.

4. Zookeeper connection issues

Issue: Kafka brokers unable to connect to Zookeeper ensemble.

Solution:

Verify Zookeeper connection string in Kafka broker configurations.
Ensure Zookeeper servers are running and accessible and the ports are open
Check Zookeeper client authentication settings in JAAS configuration file

NFS Provisioner Setup and Testing Guide for Rancher RKE2/Kubernetes

Posted on May 13, 2025May 13, 2025 by Malinda RathnayakeLeave a Comment

This guide covers how to add an NFS StorageClass and a dynamic provisioner to Kubernetes using the nfs-subdir-external-provisioner Helm chart. This enables us to mount NFS shares dynamically for PersistentVolumeClaims (PVCs) used by workloads.

Example use cases:

Database migrations
Apache Kafka clusters
Data processing pipelines

Requirements:

An accessible NFS share exported with: rw,sync,no_subtree_check,no_root_squash
NFSv3 or NFSv4 protocol
Kubernetes v1.31.7+ or RKE2 with rke2r1 or later

lets get to it

1. NFS Server Export Setup

Ensure your NFS server exports the shared directory correctly:

# /etc/exports
/rke-pv-storage  worker-node-ips(rw,sync,no_subtree_check,no_root_squash)

Replace worker-node-ips with actual IPs or CIDR blocks of your worker nodes.
Run sudo exportfs -r to reload the export table.

2. Install NFS Subdir External Provisioner

Add the Helm repo and install the provisioned:

helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update

helm install nfs-client-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace kube-system \
  --set nfs.server=192.168.162.100 \
  --set nfs.path=/rke-pv-storage \
  --set storageClass.name=nfs-client \
  --set storageClass.defaultClass=false

Notes:

If you want this to be the default storage class, change storageClass.defaultClass=true.
nfs.server should point to the IP of your NFS server.
nfs.path must be a valid exported directory from that NFS server.
storageClass.name can be referenced in your PersistentVolumeClaim YAMLs using storageClassName: nfs-client.

3. PVC and Pod Test

Create a test PVC and pod using the following YAML:

# test-nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-nfs-pod
spec:
  containers:
  - name: shell
    image: busybox
    command: [ "sh", "-c", "sleep 3600" ]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-nfs-pvc

Apply it:

kubectl apply -f test-nfs-pvc.yaml
kubectl get pvc test-nfs-pvc -w

Expected output:

NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-nfs-pvc   Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   1Gi        RWX            nfs-client     30s

4. Troubleshooting

If the PVC remains in Pending, follow these steps:

Check the provisioner pod status:

kubectl get pods -n kube-system | grep nfs-client-provisioner

Inspect the provisioner pod:

kubectl describe pod -n kube-system <pod-name>
kubectl logs -n kube-system <pod-name>

Common Issues:

Broken State: Bad NFS mount
```
mount.nfs: access denied by server while mounting 192.168.162.100:/pl-elt-kakfka
```
- This usually means the NFS path is misspelled or not exported properly.

Broken State: root_squash enabled

failed to provision volume with StorageClass "nfs-client": unable to create directory to provision new pv: mkdir /persistentvolumes/…: permission denied

Fix by changing the export to use no_root_squash or chown the directory to nobody:nogroup.

ImagePullBackOff
- Ensure nodes have internet access and can reach registry.k8s.io.
RBAC errors
- Make sure the ServiceAccount used by the provisioner has permissions to watch PVCs and create PVs.

5. Healthy State Example

kubectl get pods -n kube-system | grep nfs-client-provisioner-nfs-subdir-external-provisioner
nfs-client-provisioner-nfs-subdir-external-provisioner-7992kq7m   1/1     Running     0          3m39s

kubectl describe pod -n kube-system nfs-client-provisioner-nfs-subdir-external-provisioner-7992kq7m
# Output shows pod is Running with Ready=True

kubectl logs -n kube-system nfs-client-provisioner-nfs-subdir-external-provisioner-7992kq7m
...
I0512 21:46:03.752701       1 controller.go:1420] provision "default/test-nfs-pvc" class "nfs-client": volume "pvc-73481f45-3055-4b4b-80f4-e68ffe83802d" provisioned
I0512 21:46:03.752763       1 volume_store.go:212] Trying to save persistentvolume "pvc-73481f45-3055-4b4b-80f4-e68ffe83802d"
I0512 21:46:03.772301       1 volume_store.go:219] persistentvolume "pvc-73481f45-3055-4b4b-80f4-e68ffe83802d" saved
I0512 21:46:03.772353       1 event.go:278] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Name:"test-nfs-pvc"}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-73481f45-3055-4b4b-80f4-e68ffe83802d
...

Once test-nfs-pvc is bound and the pod starts successfully, your setup is working. You can now safely use storageClass: nfs-client in other workloads (e.g., Strimzi KafkaNodePool).

Find the PCI-E Slot number of PCI-E Add On card GPU, NIC, etc on Linux/Proxmox

Posted on March 5, 2024March 5, 2024 by Malinda RathnayakeLeave a Comment

i was working on a v-GPU POC using PVE Since Broadcom Screwed us with the Vsphere licensing costs (New post incoming about this adventure)

anyway i needed to find the PCI-E Slot used for the A4000 GPU on the host to disable it for troubleshooting

Guide

First we need to find the occupied slots and the Bus address for each slot

sudo dmidecode -t slot | grep -E "Designation|Usage|Bus Address"

Output will show the Slot ID, Usage and then the Bus Address

        Designation: CPU SLOT1 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT2 PCI-E 4.0 X8
        Current Usage: In Use
        Bus Address: 0000:41:00.0
        Designation: CPU SLOT3 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c1:00.0
        Designation: CPU SLOT4 PCI-E 4.0 X8
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT5 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c2:00.0
        Designation: CPU SLOT6 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT7 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:81:00.0
        Designation: PCI-E M.2-M1
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: PCI-E M.2-M2
        Current Usage: Available
        Bus Address: 0000:ff:00.0

We can use lspci -s #BusAddress# to locate whats installed on each slot

lspci -s 0000:c2:00.0
c2:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)

lspci -s 0000:81:00.0
81:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)

Im sure there is a much more elegant way to do this, but this worked as a quick ish way to find what i needed. if you know a better way please share in the comments

Until next time!!!

Reference –

https://stackoverflow.com/questions/25908782/in-linux-is-there-a-way-to-find-out-which-pci-card-is-plugged-into-which-pci-sl

Use Mailx to send emails using office 365

Posted on August 7, 2023August 7, 2023 by Malinda RathnayakeLeave a Comment

just something that came up while setting up a monitoring script using mailx, figured ill note it down here so i can get it to easily later when I need it 😀

Important prerequisites

You need to enable smtp basic Auth on Office 365 for the account used for authentication
Create an App password for the user account
nssdb folder must be available and readable by the user running the mailx command

Assuming all of the above prerequisite are $true we can proceed with the setup

Install mailx

RHEL/Alma linux

sudo dnf install mailx

NSSDB Folder

make sure the nssdb folder must be available and readable by the user running the mailx command

certutil -L -d /etc/pki/nssdb

The Output might be empty, but that’s ok; this is there if you need to add a locally signed cert or another CA cert manually, Microsoft Certs are trusted by default if you are on an up to date operating system with the local System-wide Trust Store

Reference – RHEL-sec-shared-system-certificates

Configure Mailx config file

sudo nano /etc/mail.rc

Append/prepend the following lines and Comment out or remove the same lines already defined on the existing config files

set smtp=smtp.office365.com
set smtp-auth-user=###[email protected]###
set smtp-auth-password=##Office365-App-password#
set nss-config-dir=/etc/pki/nssdb/
set ssl-verify=ignore
set smtp-use-starttls
set from="###[email protected]###"

This is the bare minimum needed other switches are located here – link

Testing

echo "Your message is sent!" | mailx -v -s "test" [email protected]

-v switch will print the verbos debug log to console

Connecting to 52.96.40.242:smtp . . . connected.
220 xxde10CA0031.outlook.office365.com Microsoft ESMTP MAIL Service ready at Sun, 6 Aug 2023 22:14:56 +0000
>>> EHLO vls-xxx.multicastbits.local
250-MN2PR10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-STARTTLS
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> STARTTLS
220 2.0.0 SMTP server ready
>>> EHLO vls-xxx.multicastbits.local
250-xxde10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-AUTH LOGIN XOAUTH2
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> AUTH LOGIN
334 VXNlcm5hbWU6
>>> Zxxxxxxxxxxxc0BmdC1zeXMuY29t
334 UGsxxxxxmQ6
>>> c2Rxxxxxxxxxxducw==
235 2.7.0 Authentication successful
>>> MAIL FROM:<###[email protected]###>
250 2.1.0 Sender OK
>>> RCPT TO:<[email protected]>
250 2.1.5 Recipient OK
>>> DATA
354 Start mail input; end with <CRLF>.<CRLF>
>>> .
250 2.0.0 OK <[email protected]> [Hostname=Bsxsss744.namprd11.prod.outlook.com]
>>> QUIT
221 2.0.0 Service closing transmission channel

Now you can use this in your automation scripts or timers using the mailx command

#!/bin/bash

log_file="/etc/app/runtime.log"
recipient="[email protected]"
subject="Log file from /etc/app/runtime.log"

# Check if the log file exists
if [ ! -f "$log_file" ]; then
  echo "Error: Log file not found: $log_file"
  exit 1
fi

# Use mailx to send the log file as an attachment
echo "Sending log file..."
mailx -s "$subject" -a "$log_file" -r "[email protected]" "$recipient" < /dev/null
echo "Log file sent successfully."

Secure it

sudo chown root:root /etc/mail.rc
sudo chmod 600 /etc/mail.rc

The above commands change the file’s owner and group to root, then set the file permissions to 600, which means only the owner (root) has read and write permissions and other users have no access to the file.

Use Environment Variables: Avoid storing sensitive information like passwords directly in the mail.rc file, consider using environment variables for sensitive data and reference those variables in the configuration.

For example, in the mail.rc file, you can set:

set smtp-auth-password=$MY_EMAIL_PASSWORD

You can set the variable using another config file or store it in the Ansible vault during runtime or use something like Hashicorp.

Sure, I would just use Python or PowerShell core, but you will run into more locked-down environments like OCI-managed DB servers with only Mailx is preinstalled and the only tool you can use 🙁

the Fact that you are here means you are already in the same boat. Hope this helped… until next time

Solution – RKE Cluster MetalLB provides Services with IP Addresses but doesn’t ARP for the address

Posted on March 16, 2023May 4, 2023 by Malinda RathnayakeLeave a Comment

I ran in to the the same issue detailed here working with a RKE cluster

https://github.com/metallb/metallb/issues/1154

After looking around for a few hours digging in to the logs i figured out the issue, hopefully this helps some one else our there in the situation save some time.

Make sure the IPVS mode is enabled on the cluster configuration

If you are using :

RKE2 – edit the cluster.yaml file

RKE1 – Edit the cluster configuration from the rancher UI > Cluster management > Select the cluster > edit configuration > edit as YAML

Locate the services field under rancher_kubernetes_engine_config and add the following options to enable IPVS

    kubeproxy:
      extra_args:
        ipvs-scheduler: lc
        proxy-mode: ipvs

https://www.suse.com/support/kb/doc/?id=000020035

Default

After changes

Make sure the Kernel modules are enabled on the nodes running control planes

Background

Example Rancher – RKE1 cluster

sudo docker ps | grep proxy # find the container ID for kubproxy

sudo docker logs ####containerID###

0313 21:44:08.315888  108645 feature_gate.go:245] feature gates: &{map[]}
I0313 21:44:08.346872  108645 proxier.go:652] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack_ipv4"
E0313 21:44:08.347024  108645 server_others.go:107] "Can't use the IPVS proxier" err="IPVS proxier will not be used because the following required kernel modules are not loaded: [ip_vs_lc]"

Kubproxy is trying to load the needed kernel modules and failing to enable IPVS

Lets enable the kernel modules

sudo nano /etc/modules-load.d/ipvs.conf

ip_vs_lc
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4

Install ipvsadm to confirm the changes

sudo dnf install ipvsadm -y

Reboot the VM or the Baremetal server

use the sudo ipvsadm to confirm ipvs is enabled

sudo ipvsadm

Testing

kubectl get svc -n #namespace | grep load

arping -I ens192 192.168.94.140
ARPING 192.168.94.140 from 192.168.94.65 ens192
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 1.117ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.737ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.845ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.668ms
Sent 4 probes (1 broadcast(s))
Received 4 response(s)

If you have the service type load balancer on a deployment now you should be able to reach it if the container is responding on the service

helpful Links

https://metallb.universe.tf/configuration/troubleshooting/

https://github.com/metallb/metallb/issues/1154

https://github.com/rancher/rke2/issues/3710

How to extend root (cs-root) Filesystem using LVM Cent OS/RHEL/Almalinux

Posted on March 11, 2023September 27, 2023 by Malinda RathnayakeLeave a Comment

This guide will walk you through on how to extend and increase space for the root filesystem on a alma linux. Cent OS, REHL Server/Desktop/VM

Method A – Expanding the current disk

Edit the VM and Add space to the Disk

install the cloud-utils-growpart package, as the growpart command in it makes it really easy to extend partitioned virtual disks.

sudo dnf install cloud-utils-growpart

Verify that the VM’s operating system recognizes the new increased size of the sda virtual disk, using lsblk or fdisk -l

sudo fdisk -l

Notes -
Note down the disk id and the partition number for Linux LVM - in this demo disk id is sda and lvm partition is sda 3

lets trigger a rescan of a block devices (Disks)

#elevate to root
sudo su 

#trigger a rescan, Make sure to match the disk ID you noted down before 
echo 1 > /sys/block/sda/device/rescan
exit

Now sudo fdisk -l shows the correct size of the disks

Use growpart to increase the partition size for the lvm

sudo growpart /dev/sda 3

Confirm the volume group name

sudo vgs

Extend the logical volume

sudo lvextend -l +100%FREE /dev/almalinux/root

Grow the file system size

sudo xfs_growfs /dev/almalinux/root

Notes -
You can use this same steps to add space to different partitions such as home, swap if needed

Method B -Adding a second Disk to the LVM and expanding space

Why add a second disk?
may be the the current Disk is locked due to a snapshot and you cant remove it, Only solution would be to add a second disk/

Check the current space available

sudo df -h

Notes -
If you have 0% ~1MB left on the cs-root command auto-complete with tab and some of the later commands wont work, You should clear up atleast 4-10mb by clearing log files, temp files, etc

Mount an additional disk to the VM (Assuming this is a VM) and make sure the disk is visible on the OS level

sudo lvmdiskscan

sudo fdisk -l

Confirm the volume group name

sudo vgs

Lets increase the space

First lets initialize the new disk we mounted

sudo mkfs.xfs /dev/sdb

Create the Physical volume

sudo pvcreate /dev/sdb

extend the volume group

sudo vgextend cs /dev/sdb

  Volume group "cs" successfully extended

Extend the logical volume

sudo lvextend -l +100%FREE /dev/cs/root

Grow the file system size

sudo xfs_growfs /dev/cs/root

Confirm the changes

sudo df -h

Just making easy for us!!

#Method A - Expanding the current disk 
#AlmaLinux
sudo dnf install cloud-utils-growpart

sudo lvmdiskscan
sudo fdisk -l                          #note down the disk ID and partition num


sudo su                                #elevate to root
echo 1 > /sys/block/sda/device/rescan  #trigger a rescan
exit                                   #exit root shell

sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

#Method B - Adding a second Disk 
#CentOS

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo mkfs.xfs /dev/sdb
sudo pvcreate /dev/sdb
sudo vgextend cs /dev/sdb
sudo lvextend -l +100%FREE /dev/cs/root
sudo xfs_growfs /dev/cs/root
sudo df -h

#AlmaLinux

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo mkfs.xfs /dev/sdb
sudo pvcreate /dev/sdb
sudo vgextend almalinux /dev/sdb
sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

Setup guide for VSFTPD FTP Server – SELinux enforced with fail2ban (RHEL, CentOS, Almalinux)

Posted on March 2, 2023March 11, 2023 by Malinda RathnayakeLeave a Comment

Few things to note

if you want to prevent directory traversal we need to setup chroot with vsftpd (not covered on this KB)

For the demo I just used Unencrypted FTP on port 21 to keep things simple, Please utilize SFTP with the letsencrypt certificate for better security. i will cover this on another article and link it here

Update and Install packages we need

sudo dnf update

sudo dnf install net-tools lsof unzip zip tree policycoreutils-python-utils-2.9-20.el8.noarch vsftpd nano setroubleshoot-server -y

Setup Groups and Users and security hardening

if you want to prevent directory traversal we need to setup chroot with vsftpd (not covered on this KB)

Create the Service admin account

sudo useradd ftpadmin
sudo passwd ftpadmin

Create the group

sudo groupadd FTP_Root_RW

Create FTP only user shell for the FTP users

echo -e '#!/bin/sh\necho "This account is limited to FTP access only."' | sudo tee -a /bin/ftponly
sudo chmod a+x /bin/ftponly

echo "/bin/ftponly" | sudo tee -a /etc/shells

Create FTP users

sudo useradd ftpuser01 -m -s /bin/ftponly
sudo useradd ftpuser02 -m -s /bin/ftponly
user passwd ftpuser01 
user passwd ftpuser02

Add the users to the group

sudo usermod -a -G FTP_Root_RW ftpuser01
sudo usermod -a -G FTP_Root_RW ftpuser02

sudo usermod -a -G FTP_Root_RW ftpadmin

Disable SSH Access for the FTP users.

Edit sshd_config

sudo nano /etc/ssh/sshd_config

Add the following line to the end of the file

DenyUsers ftpuser01 ftpuser02

Open ports on the VM Firewall

sudo firewall-cmd --permanent --add-port=20-21/tcp

#Allow the passive Port-Range we will define it later on the vsftpd.conf
sudo firewall-cmd --permanent --add-port=60000-65535/tcp

#Reload the ruleset
sudo firewall-cmd --reload

Setup the Second Disk for FTP DATA

Attach another disk to the VM and reboot if you haven’t done this already

lsblk to check the current disks and partitions detected by the system

lsblk

Create the XFS partition

sudo mkfs.xfs /dev/sdb
# use mkfs.ext4 for ext4

Why XFS? https://access.redhat.com/articles/3129891

Create the folder for the mount point

sudo mkdir /FTP_DATA_DISK

Update the etc/fstab file and add the following line

sudo nano etc/fstab

/dev/sdb /FTP_DATA_DISK xfs defaults 1 2

Mount the disk

sudo mount -a

Testing

mount | grep sdb

Setup the VSFTPD Data and Log Folders

Setup the FTP Data folder

sudo mkdir /FTP_DATA_DISK/FTP_Root -p

Create the log directory

sudo mkdir /FTP_DATA_DISK/_logs/ -p

Set permissions

sudo chgrp -R FTP_Root_RW /FTP_DATA_DISK/FTP_Root/
sudo chmod 775 -R /FTP_DATA_DISK/FTP_Root/

Setup the VSFTPD Config File

Backup the default vsftpd.conf and create a newone

sudo mv /etc/vsftpd/vsftpd.conf /etc/vsftpd/vsftpdconfback
sudo nano /etc/vsftpd/vsftpd.conf

#KB Link - ####

anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=002
dirmessage_enable=YES
ftpd_banner=Welcome to multicastbits Secure FTP service.
chroot_local_user=NO
chroot_list_enable=NO
chroot_list_file=/etc/vsftpd/chroot_list
listen=YES
listen_ipv6=NO

userlist_file=/etc/vsftpd/user_list
pam_service_name=vsftpd
userlist_enable=YES
userlist_deny=NO
listen_port=21
connect_from_port_20=YES
local_root=/FTP_DATA_DISK/FTP_Root/

xferlog_enable=YES
vsftpd_log_file=/FTP_DATA_DISK/_logs/vsftpd.log
log_ftp_protocol=YES
dirlist_enable=YES
download_enable=NO

pasv_enable=Yes
pasv_max_port=65535
pasv_min_port=60000

Add the FTP users to the userlist file

Backup the Original file

sudo mv /etc/vsftpd/user_list /etc/vsftpd/user_listBackup
echo "ftpuser01" | sudo tee -a /etc/vsftpd/user_list
echo "ftpuser02" | sudo tee -a /etc/vsftpd/user_list

sudo systemctl start vsftpd

sudo systemctl enable vsftpd

sudo systemctl status vsftpd

Setup SELinux

instead of putting our hands up and disabling SElinux, we are going to setup the policies correctly

Find the available policies using getsebool -a | grep ftp

getsebool -a | grep ftp

ftpd_anon_write --> off
ftpd_connect_all_unreserved --> off
ftpd_connect_db --> off
ftpd_full_access --> off
ftpd_use_cifs --> off
ftpd_use_fusefs --> off
ftpd_use_nfs --> off
ftpd_use_passive_mode --> off
httpd_can_connect_ftp --> off
httpd_enable_ftp_server --> off
tftp_anon_write --> off
tftp_home_dir --> off
[lxadmin@vls-BackendSFTP02 _logs]$ 
[lxadmin@vls-BackendSFTP02 _logs]$ 
[lxadmin@vls-BackendSFTP02 _logs]$ getsebool -a | grep ftp
ftpd_anon_write --> off
ftpd_connect_all_unreserved --> off
ftpd_connect_db --> off
ftpd_full_access --> off
ftpd_use_cifs --> off
ftpd_use_fusefs --> off
ftpd_use_nfs --> off
ftpd_use_passive_mode --> off
httpd_can_connect_ftp --> off
httpd_enable_ftp_server --> off
tftp_anon_write --> off
tftp_home_dir --> off

Set SELinux boolean values

sudo setsebool -P ftpd_use_passive_mode on

sudo setsebool -P ftpd_use_cifs on

sudo setsebool -P ftpd_full_access 1

    "setsebool" is a tool for setting SELinux boolean values, which control various aspects of the SELinux policy.

    "-P" specifies that the boolean value should be set permanently, so that it persists across system reboots.

    "ftpd_use_passive_mode" is the name of the boolean value that should be set. This boolean value controls whether the vsftpd FTP server should use passive mode for data connections.

    "on" specifies that the boolean value should be set to "on", which means that vsftpd should use passive mode for data connections.

    Enable ftp_home_dir --> on if you are using chroot

Add a new file context rule to the system.

sudo semanage fcontext -a -t public_content_rw_t "/FTP_DATA_DISK/FTP_Root/(/.*)?"

    "fcontext" is short for "file context", which refers to the security context that is associated with a file or directory.

    "-a" specifies that a new file context rule should be added to the system.

    "-t" specifies the new file context type that should be assigned to files or directories that match the rule.

    "public_content_rw_t" is the name of the new file context type that should be assigned to files or directories that match the rule. In this case, "public_content_rw_t" is a predefined SELinux type that allows read and write access to files and directories in public directories, such as /var/www/html.

    "/FTP_DATA_DISK/FTP_Root/(/.)?" specifies the file path pattern that the rule should match. The pattern includes the "/FTP_DATA_DISK/FTP_Root/" directory and any subdirectories or files beneath it. The regular expression "/(.)?" matches any file or directory name that may follow the "/FTP_DATA_DISK/FTP_Root/" directory path.

In summary, this command sets the file context type for all files and directories under the "/FTP_DATA_DISK/FTP_Root/" directory and its subdirectories to "public_content_rw_t", which allows read and write access to these files and directories.

Reset the SELinux security context for all files and directories under the “/FTP_DATA_DISK/FTP_Root/”

sudo restorecon -Rvv /FTP_DATA_DISK/FTP_Root/

    "restorecon" is a tool that resets the SELinux security context for files and directories to their default values.

    "-R" specifies that the operation should be recursive, meaning that the security context should be reset for all files and directories under the specified directory.

    "-vv" specifies that the command should run in verbose mode, which provides more detailed output about the operation.

"/FTP_DATA_DISK/FTP_Root/" is the path of the directory whose security context should be reset.

Setup Fail2ban

Install fail2ban

sudo dnf install fail2ban

Create the jail.local file

This file is used to overwrite the config blocks in /etc/fail2ban/fail2ban.conf

sudo nano /etc/fail2ban/jail.local

vsftpd]
enabled = true
port = ftp,ftp-data,ftps,ftps-data
logpath = /FTP_DATA_DISK/_logs/vsftpd.log
maxretry = 5
bantime = 7200

Make sure to update the logpath directive to match the vsftpd log file we defined on the vsftpd.conf file

sudo systemctl start fail2ban

sudo systemctl enable fail2ban

sudo systemctl status fail2ban

journalctl -u fail2ban  will help you narrow down any issues with the service

Testing

sudo tail -f /var/log/fail2ban.log

Fail2ban injects and manages the following rich rules

Client will fail to connect using FTP until the ban is lifted

Remove the ban IP list

#get the list of banned IPs 
sudo fail2ban-client get vsftpd banned

#Remove a specific IP from the list 
sudo fail2ban-client set vsftpd unbanip <IP>

#Remove/Reset all the the banned IP lists
sudo fail2ban-client unban --all

This should get you up and running, For the demo I just used Unencrypted FTP on port 21 to keep things simple, Please utilize SFTP with the letsencrypt certificate for better security. i will cover this on another article and link it here

Change the location of the Docker overlay2 storage directory

Posted on February 24, 2023March 11, 2023 by Malinda Rathnayake3 Comments

If you found this page you already know why you are looking for this, your server /dev/mapper/cs-root is filled due to /var/lib/docker taking up most of the space

Yes, you can change the location of the Docker overlay2 storage directory by modifying the daemon.json file. Here’s how to do it:

Open or create the daemon.json file using a text editor:

sudo nano /etc/docker/daemon.json

{
    "data-root": "/path/to/new/location/docker"
}

Replace “/path/to/new/location/docker” with the path to the new location of the overlay2 directory.

If the file already contains other configuration settings, add the "data-root" setting to the file under the "storage-driver" setting:

{
    "storage-driver": "overlay2",
    "data-root": "/path/to/new/location/docker"
}

Save the file and Restart docker

sudo systemctl restart docker

Don’t forget to remove the old data

rm -rf /var/lib/docker/overlay2

PowerShell remoting (WinRM) over HTTPS using a AD CS PKI (CA) signed client Certificate

Posted on March 31, 2021March 31, 2021 by Malinda RathnayakeLeave a Comment

This is a guide to show you how to enroll your servers/desktops to allow powershell remoting (WINRM) over HTTPS

Assumptions

You have a working Root CA on the ADDS environment – Guide
CRL and AIA is configured properly – Guide
Root CA cert is pushed out to all Servers/Desktops – This happens by default

Contents

Setup CA Certificate template
Deploy Auto-enrolled Certificates via Group Policy
Powershell logon script to set the WinRM listener
Deploy the script as a logon script via Group Policy
Testing

1 – Setup CA Certificate template to allow Client Servers/Desktops to checkout the certificate from the CA

Connect to the The Certification Authority Microsoft Management Console (MMC)

Navigate to Certificate Templates > Manage

On the “Certificate templates Console” window > Select Web Server > Duplicate Template

Under the new Template window Set the following attributes

General – Pick a Name and Validity Period – This is up to you

Compatibility – Set the compatibility attributes (You can leave this on the default values, It up to you)

Subject Name – Set ‘Subject Name’ attributes (Important)

Security – Add “Domain Computers” Security Group and Set the following permissions

Read – Allow
Enroll – Allow
Autoenroll – Allow

Click “OK” to save and close out of “Certificate template console”

Issue to the new template

Go back to the “The Certification Authority Microsoft Management Console” (MMC)

Under templates (Right click the empty space) > Select New > Certificate template to Issue

Under the Enable Certificate template window > Select the Template you just created

Allow few minutes for ADDS to replicate and pick up the changes with in the forest

2 – Deploy Auto-enrolled Certificates via Group Policy

Create a new GPO

Windows Settings > Security Settings > Public Key Policies/Certificate Services Client – Auto-Enrollment Settings

Link the GPO to the relevant OU with in your ADDS environment

Note – You can push out the root CA cert as a trusted root certificate with this same policy if you want to force computers to pick up the CA cert,

Testing

If you need to test it gpupdate/force or reboot your test machine, The Server VM/PC will pickup a certificate from ADCS PKI

3 – Powershell logon script to set the WINRM listener

Dry run

Setup the log file
Check for the Certificate matching the machines FQDN Auto-enrolled from AD CS
If exist
- Set up the HTTPS WInRM listener and bind the certificate
- Write log
else
- Write log

#Malinda Rathnayake- 2020
#
#variable
$Date = Get-Date -Format "dd_MM_yy"
$port=5986
$SessionRunTime = Get-Date -Format "dd_yyyy_HH-mm"
#
#Setup Logs folder and log File
$ScriptVersion = '1.0'
$locallogPath = "C:\_Scripts\_Logs\WINRM_HTTPS_ListenerBinding"
#
$logging_Folder = (New-Item -Path $locallogPath -ItemType Directory -Name $Date -Force)
$ScriptSessionlogFile = New-Item $logging_Folder\ScriptSessionLog_$SessionRunTime.txt -Force
$ScriptSessionlogFilePath = $ScriptSessionlogFile.VersionInfo.FileName
#
#Check for the the auto-enrolled SSL Cert
$RootCA = "Company-Root-CA" #change This
$hostname = ([System.Net.Dns]::GetHostByName(($env:computerName))).Hostname
$certinfo = (Get-ChildItem -Path Cert:\LocalMachine\My\ |? {($_.Subject -Like "CN=$hostname") -and ($_.Issuer -Like "CN=$RootCA*")})
$certThumbprint = $certinfo.Thumbprint
#
#Script-------------------------------------------------------
#
#Remove the existing WInRM Listener if there is any
Get-ChildItem WSMan:\Localhost\Listener | Where -Property Keys -eq "Transport=HTTPS" | Remove-Item -Recurse -Force
#
#If the client certificate exists Setup the WinRM HTTPS listener with the cert else Write log
if ($certThumbprint){
#
New-Item -Path WSMan:\Localhost\Listener -Transport HTTPS -Address * -CertificateThumbprint $certThumbprint -HostName $hostname -Force
#
netsh advfirewall firewall add rule name="Windows Remote Management (HTTPS-In)" dir=in action=allow protocol=TCP localport=$port
#
Add-Content -Path $ScriptSessionlogFilePath -Value "Certbinding with the HTTPS WinRM HTTPS Listener Completed"
Add-Content -Path $ScriptSessionlogFilePath -Value "$certinfo.Subject"}
else{
Add-Content -Path $ScriptSessionlogFilePath -Value "No Cert matching the Server FQDN found, Please run gpupdate/force or reboot the system"
}

Script is commented with Explaining each section (should have done functions but i was pressed for time, never got around to do it, if you do fix it up and improve this please let me know in the comments :D)

5 – Deploy the script as a logon script via Group Policy

Setup a GPO and set this script as a logon Powershell script

Im using a user policy with GPO Loop-back processing set to Merge applied to the server OU

Testing

To confirm WinRM is listening on HTTPS, type the following commands:

winrm enumerate winrm/config/listener

Winrm get http://schemas.microsoft.com/wbem/wsman/1/config

Sources that helped me

https://docs.microsoft.com/en-us/troubleshoot/windows-client/system-management-components/configure-winrm-for-https

https://gmusumeci.medium.com/get-rid-of-those-annoying-self-signed-certificates-with-microsoft-certificate-services-part-3-9d4b8e819f45

http://vcloud-lab.com/entries/powershell/powershell-remoting-over-https-using-self-signed-ssl-certificate

ArubaOS CX Virtual Switching Extension – VSX Stacking Guide

Posted on December 8, 2020September 18, 2021 by Malinda Rathnayake2 Comments

What is VSX?

VSX is a cluster technology that allows the two VSX switches to run with independent control planes (OSPF/BGP) and present themselves as different routers in the network. In the datapath, however, they function as a single router and support active-active forwarding.

VSX allows you to mitigate inherent issues with a shared control plane that comes with traditional stacking while maintaining all the benefits

Control plane: Inter-Switch-Link and Keepalive
Data plane L2: MCLAGs
Data plane L3: Active gateway

This is a very similar technology compared to Dell VLT stacking with Dell OS10

Basic feature Comparison with Dell VLT Stacking

	Dell VLT Stacking	Aruba VSX
Supports Multi chassis Lag	✅	✅
independent control planes	✅	✅
All active-gateway configuration (L3 load balancing)	✅(VLT Peer routing)	✅(VSX Active forwarding)
Layer 3 Packet load balancing	✅	✅
Can Participate in Spanning tree MST/RSTP	✅	✅
Gateway IP Redundancy	✅VRRP	✅(VSX Active Gateway or VRRP)

Setup Guide

What you need?

10/25/40/100GE Port for the interswitch link

VSX supported switch, VSX is only supported on switches above CX6300 SKU

Switch Series	VSX
CX 6200 series	X
CX 6300 series	X
CX 6400 series	✅
CX 8200 series	✅
CX 8320/8325 series	✅
CX 8360 series	✅

**Updated 2020-Dec

For this guide im using a 8325 series switch

Dry run

Setup LAG interface for the inter-switch link (ISL)
Create the VSX cluster
Setup a keepalive link and a new VRF for the keepalive traffic

Setup LAG interface for the inter-switch link (ISL)

In order to form the VSX cluster, we need a LAG interface for the inter switch communication

Naturally i pick the fastest ports on the switch to create this 10/25/40/100GE

Depending on what switch you have, The ISL bandwidth can be a limitation/Bottleneck, Account for this factor when designing a VSX based solution
Utilize VSX-Activeforwarding or Active gateways to mitigate this

Create the LAG interface

This is a regular Port channel no special configurations, you need to create this on both switches

Native VLAN needs to be the default VLAN
Trunk port and All VLANs allowed

CORE01#

interface lag 256
no shutdown
description VSX-LAG
no routing
vlan trunk native 1 tag
vlan trunk allowed all
lacp mode active
exit


-------------------------------

CORE02#

interface lag 256
no shutdown
description VSX-LAG
no routing
vlan trunk native 1 tag
vlan trunk allowed all
lacp mode active
exit

Add/Assign the physical ports to the LAG interface

I’m using two 100GE ports for the ISL LAG

CORE01#

interface 1/1/55
no shutdown
lag 256
exit
interface 1/1/56
no shutdown
lag 256
exit

-------------------------------

CORE02#

interface 1/1/55
no shutdown
lag 256
exit
interface 1/1/56
no shutdown
lag 256
exit

Do the same configuration on the VSX Peer switch (Second Switch)

Connect the cables via DAC/Optical and confirm the Port-channel health

CORE01# sh lag 256
System-ID       : b8:d4:e7:d5:36:00
System-priority : 65534

Aggregate lag256 is up
 Admin state is up
 Description : VSX-LAG
 Type                        : normal
 MAC Address                 : b8:d4:e7:d5:36:00
 Aggregated-interfaces       : 1/1/55 1/1/56
 Aggregation-key             : 256
 Aggregate mode              : active
 Hash                        : l3-src-dst
 LACP rate                   : slow
 Speed                       : 200000 Mb/s
 Mode                        : trunk


-------------------------------------------------------------------

CORE02# sh lag 256
System-ID       : b8:d4:e7:d5:f3:00
System-priority : 65534

Aggregate lag256 is up
 Admin state is up
 Description : VSX-LAG
 Type                        : normal
 MAC Address                 : b8:d4:e7:d5:f3:00
 Aggregated-interfaces       : 1/1/55 1/1/56
 Aggregation-key             : 256
 Aggregate mode              : active
 Hash                        : l3-src-dst
 LACP rate                   : slow
 Speed                       : 200000 Mb/s
 Mode                        : trunk

Form the VSX Cluster

under the configuration mode, go in to the VSX context by entering “vsx” and issue the following commands on both switches

CORE01#

vsx
    inter-switch-link lag 256
    role primary
    linkup-delay-timer 30

-------------------------------

CORE02#

vsx
    inter-switch-link lag 256
    role secondary
    linkup-delay-timer 30

Check the VSX Status

CORE01# sh vsx status
VSX Operational State
---------------------
  ISL channel             : In-Sync
  ISL mgmt channel        : operational
  Config Sync Status      : In-Sync
  NAE                     : peer_reachable
  HTTPS Server            : peer_reachable

Attribute           Local               Peer
------------        --------            --------
ISL link            lag256              lag256
ISL version         2                   2
System MAC          b8:d4:e7:d5:36:00   b8:d4:e7:d5:f3:00
Platform            8325                8325
Software Version    GL.10.06.0001       GL.10.06.0001
Device Role         primary             secondary

----------------------------------------

CORE02# sh vsx status
VSX Operational State
---------------------
  ISL channel             : In-Sync
  ISL mgmt channel        : operational
  Config Sync Status      : In-Sync
  NAE                     : peer_reachable
  HTTPS Server            : peer_reachable

Attribute           Local               Peer
------------        --------            --------
ISL link            lag256              lag256
ISL version         2                   2
System MAC          b8:d4:e7:d5:f3:00   b8:d4:e7:d5:36:00
Platform            8325                8325
Software Version    GL.10.06.0001       GL.10.06.0001
Device Role         secondary           primary

Setup the Keepalive Link

its recommended to set up a Keepalive link to avoid Split-brain scenarios if the ISL goes down, We are trying to prevent both switches from thinking they are the active devices creating STP loops and other issues on the network

This is not a must-have, it’s nice to have, As of Aruba CX OS 10.06.x you need to sacrifice one of your data ports for this

Dell OS10 VLT archives this via the OOBM network ports, Supposedly Keepalive over OOBM is something Aruba is working on for future releases

Few things to note

It’s recommended using a routed port in a separate VRF for the keepalive link
can use a 1Gbps link for this if needed

Provision the port and VRF

CORE01#

vrf KEEPALIVE

interface 1/1/48
no shutdown
vrf attach KEEPALIVE
description VSX-keepalive-Link
ip address 192.168.168.1/24
exit

-----------------------------------------

CORE02#

vrf KEEPALIVE

interface 1/1/48
no shutdown
vrf attach KEEPALIVE
description VSX-keepalive-Link
ip address 192.168.168.2/24
exit

Define the Keepalive link

Note – Remember to define the vrf id in the keepalive statement

Thanks /u/illumynite for pointing that out

CORE01#

vsx
    inter-switch-link lag 256
    role primary
    keepalive peer 192.168.168.2 source 192.168.168.1 vrf KEEPALIVE
    linkup-delay-timer 30

-----------------------------------------

CORE02#

vsx
    inter-switch-link lag 256
    role secondary
    keepalive peer 192.168.168.1 source 192.168.168.2 vrf KEEPALIVE
    linkup-delay-timer 30

Next up…….

VSX MC-LAG
VSX Active forwarding
VSX Active gateway

References

AOS-CX 10.06 Virtual SwitchingExtension (VSX) Guide

As always if you notice any mistakes please do let me know in the comments