Kafka Deployment with Strimzi Operator and Envoy

This guide walks through the deployment of a production-ready Apache Kafka cluster on Kubernetes using the Strimzi Operator, complete with user authentication, RBAC permissions, and an Envoy proxy for external access.

Deliverables

  • High availability with 3 controllers and 3 brokers
  • User authentication with SCRAM-SHA-512
  • Fine-grained access control through ACLs
  • External access through an Envoy proxy
  • SSL/TLS is not set up, to keep this exercise simple; it will be covered in a separate blog post

Step 1: Install Strimzi Operator

First, install the Strimzi Kafka Operator using Helm:

helm repo add strimzi https://strimzi.io/charts/
helm repo update

helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --create-namespace

 

This creates a dedicated kafka namespace and installs the Strimzi operator that will manage our Kafka resources.
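
To confirm the operator came up cleanly, check the pods in the kafka namespace; you should see a strimzi-cluster-operator pod in the Running state:

# The operator pod name suffix will differ
kubectl get pods -n kafka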

Step 2: Deploy Kafka Cluster

Install Custom Resource Definitions (CRDs)

Apply the CRDs that define the Kafka-related resources (if you installed the operator with Helm as above, these may already exist; applying them again is harmless):

# Install the required CRDs
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/040-Crd-kafka.yaml
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/04A-Crd-kafkanodepool.yaml
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/043-Crd-kafkatopic.yaml
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/refs/heads/main/install/cluster-operator/044-Crd-kafkauser.yaml
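
If you want to double-check that the CRDs registered correctly, filter the CRD list for the Strimzi API group:

kubectl get crd | grep kafka.strimzi.io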

 

Set Up Kafka Node Pools

Create a file named 10-nodepools.yaml with the following content:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controllers
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: persistent-claim
    class: longhorn
    size: 10Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: brokers
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: persistent-claim
    class: longhorn
    size: 20Gi
    deleteClaim: false


This creates:

  • 3 Kafka controller nodes with 10 Gi of storage each
  • 3 Kafka broker nodes with 20 Gi of storage each
  • Persistent volumes provisioned through the longhorn storage class

 

Apply the node pools configuration:

kubectl apply -f 10-nodepools.yaml
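
Once applied, the node pools show up as KafkaNodePool resources; the broker and controller pods themselves are only created after the Kafka cluster in the next step references them:

kubectl get kafkanodepools -n kafka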

 

Create the Kafka Cluster

Create a file named 20-kafka.yaml with the following content:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: mkbits-strimzi-cluster01
  namespace: kafka
  annotations:
    strimzi.io/kraft: "enabled"
    strimzi.io/node-pools: "enabled"
spec:
  kafka:
    version: 3.9.0
    config:
      inter.broker.protocol.version: "3.9"
      log.message.format.version:  "3.9"
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: scram-sha-512
      - name: plain
        port: 9092
        type: internal
        tls: false
        authentication:
          type: scram-sha-512
    authorization:
      type: simple
  entityOperator:
    topicOperator: {}
    userOperator: {}

Important Details:
  • Uses Kafka version 3.9.0 with KRaft mode enabled (no ZooKeeper)
  • Configures both TLS (9093) and plain (9092) internal listeners
  • Both listeners use SCRAM-SHA-512 authentication
  • Simple authorization is enabled for access control
  • Topic and User operators are enabled for managing topics and users

Apply the Kafka cluster configuration:

kubectl apply -f 20-kafka.yaml
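
Provisioning the controllers and brokers can take a few minutes. You can block until the cluster reports Ready and then confirm all pods are up:

kubectl wait kafka/mkbits-strimzi-cluster01 --for=condition=Ready --timeout=600s -n kafka
kubectl get pods -n kafka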

 

Step 3: Configure Users and Permissions

User Creation

Create the following YAML file with the user configurations:

30-users.yaml:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: kafka-prod-user
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: prod_Topic01
          patternType: literal
        operation: All
      - resource:
          type: topic
          name: prod_Topic02
          patternType: literal
        operation: All
---   
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: kafka-dev-user
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: dev_Topic01
          patternType: literal
        operation: All

Apply the user configuration:

kubectl apply -f 30-users.yaml
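
The User Operator reconciles each KafkaUser into SCRAM credentials; confirm that both users were created and are marked Ready:

kubectl get kafkausers -n kafka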

 

Retrieving User Credentials

Strimzi stores user credentials in Kubernetes secrets. Retrieve them with:

kubectl get secret <username> -n kafka -o jsonpath="{.data.password}" | base64 --decode

 

Example:

kubectl get secret kafka-prod-user -n kafka -o jsonpath="{.data.password}" | base64 --decode

 

Step 4: Create Topics

Create a file named 40-KafkaTopic.yaml with the following content:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: prod-topic01
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  topicName: prod_Topic01
  partitions: 6
  replicas: 3
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: prod-topic02
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  topicName: prod_Topic02
  partitions: 3
  replicas: 3
  config:
    cleanup.policy: delete             # standard log retention (default 7 days)
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: dev-topic01
  namespace: kafka
  labels:
    strimzi.io/cluster: mkbits-strimzi-cluster01
spec:
  topicName: dev_Topic01
  partitions: 3
  replicas: 3
  config:
    retention.ms: 86400000             # 1 day

Note: the Kubernetes resource names are lowercase because object names cannot contain uppercase letters or underscores; spec.topicName carries the actual Kafka topic name, which is what the ACLs in 30-users.yaml reference.
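
Apply the topic configuration and confirm the Topic Operator has reconciled the topics:

kubectl apply -f 40-KafkaTopic.yaml
kubectl get kafkatopics -n kafka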

 

Step 5: Deploy Envoy as a Kafka-Aware Proxy

Envoy serves as a protocol-aware proxy for Kafka, enabling:

  • Centralized connection handling
  • Reduced NAT complexity
  • External access to the Kafka cluster
  • Advanced routing and observability

Understanding Kafka DNS in Kubernetes

Strimzi creates headless services for Kafka brokers. In Kubernetes, pod DNS follows this format:

<pod-name>.<headless-service>.<namespace>.svc.cluster.local

 

For our Strimzi deployment, the elements are:

Component          Pattern                       Example
Pod name           <cluster>-<pool>-<ordinal>    mkbits-strimzi-cluster01-brokers-0
Headless service   <cluster>-kafka-brokers       mkbits-strimzi-cluster01-kafka-brokers

This gives us the following broker FQDNs:

mkbits-strimzi-cluster01-brokers-0.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
mkbits-strimzi-cluster01-brokers-1.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
mkbits-strimzi-cluster01-brokers-2.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
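
To sanity-check these FQDNs before wiring up Envoy, you can resolve one of them from a throwaway pod inside the cluster (busybox is used here purely as an example image):

kubectl run dns-test -n kafka --rm -it --restart=Never --image=busybox -- \
  nslookup mkbits-strimzi-cluster01-brokers-0.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local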

 

Creating Envoy Configuration

Create a file named envoy-config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-config
  namespace: kafka
data:
  envoy.yaml: |
    static_resources:
      listeners:
        - name: kafka_listener
          address:
            socket_address:
              address: 0.0.0.0
              port_value: 9094
          filter_chains:
            - filters:
                - name: envoy.filters.network.kafka_broker
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.kafka_broker.v3.KafkaBroker
                    stat_prefix: kafka
                    id_based_broker_address_rewrite_spec:
                      rules:
                        - id: 0
                          host: kafka-prod-eastus01.multicastbits.com
                          port: 9094
                        - id: 1
                          host: kafka-prod-eastus01.multicastbits.com
                          port: 9094
                        - id: 2
                          host: kafka-prod-eastus01.multicastbits.com
                          port: 9094
                - name: envoy.filters.network.tcp_proxy
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                    stat_prefix: tcp
                    cluster: kafka_cluster
      clusters:
        - name: kafka_cluster
          connect_timeout: 1s
          type: strict_dns
          lb_policy: round_robin
          load_assignment:
            cluster_name: kafka_cluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: mkbits-strimzi-cluster01-brokers-0.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
                          port_value: 9092
                  - endpoint:
                      address:
                        socket_address:
                          address: mkbits-strimzi-cluster01-brokers-1.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
                          port_value: 9092
                  - endpoint:
                      address:
                        socket_address:
                          address: mkbits-strimzi-cluster01-brokers-2.mkbits-strimzi-cluster01-kafka-brokers.kafka.svc.cluster.local
                          port_value: 9092
    admin:
      access_log_path: /dev/null
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 9901

Key Configuration Points:

  • Exposes an admin interface on port 9901
  • Listens on port 9094 for Kafka traffic
  • Uses the Kafka broker filter to rewrite broker addresses to an external hostname
  • Establishes upstream connections to all Kafka brokers on port 9092

Apply the ConfigMap:

kubectl apply -f envoy-config.yaml

 

Deploying Envoy

Create a file named envoy-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
        - name: envoy
          image: envoyproxy/envoy-contrib:v1.25-latest
          args:
            - "-c"
            - "/etc/envoy/envoy.yaml"
          ports:
            - containerPort: 9094
            - containerPort: 9901
          volumeMounts:
            - name: envoy-config
              mountPath: /etc/envoy
              readOnly: true
      volumes:
        - name: envoy-config
          configMap:
            name: envoy-config

 

Apply the Envoy deployment:

kubectl apply -f envoy-deployment.yaml
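
Check that the Envoy pod starts cleanly; any mistake in envoy.yaml (for example an unknown filter name) will show up in its logs:

kubectl get pods -n kafka -l app=envoy
kubectl logs deployment/envoy -n kafka | head -n 50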

 

Exposing Envoy Externally

Create a file named envoy-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: envoy
  namespace: kafka
spec:
  type: LoadBalancer
  selector:
    app: envoy
  ports:
    - name: kafka
      port: 9094
      targetPort: 9094
    - name: admin
      port: 9901
      targetPort: 9901

 

Apply the service:

kubectl apply -f envoy-service.yaml

 

Maintenance and Verification

If you need to update the Envoy configuration later:

kubectl -n kafka apply -f envoy-config.yaml
kubectl -n kafka rollout restart deployment/envoy

 

To verify your deployment:

  1. Check that all pods are running:
    kubectl get pods -n kafka

     

  2. Get the external IP assigned to your Envoy service:
    kubectl get service envoy -n kafka

     

  3. Test connectivity using a Kafka client with the external address and the retrieved user credentials, as sketched below.
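
Here is a minimal sketch of such a test using the stock Kafka console tools. It assumes the external DNS name from the Envoy configuration, the kafka-prod-user credentials retrieved earlier, and the prod_Topic01 topic; replace the password placeholder with the value from the secret:

# client.properties – SASL/SCRAM settings for the external (non-TLS) listener
cat > client.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="kafka-prod-user" \
  password="<password-from-secret>";
EOF

# Produce a test message through Envoy
kafka-console-producer.sh \
  --bootstrap-server kafka-prod-eastus01.multicastbits.com:9094 \
  --topic prod_Topic01 \
  --producer.config client.properties

# Consume it back
kafka-console-consumer.sh \
  --bootstrap-server kafka-prod-eastus01.multicastbits.com:9094 \
  --topic prod_Topic01 \
  --from-beginning \
  --consumer.config client.properties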

 

Checking Health via the Envoy Admin Interface

http://kafka-prod-eastus01.multicastbits.com:9901/clusters

http://kafka-prod-eastus01.multicastbits.com:9901/ready

http://kafka-prod-eastus01.multicastbits.com:9901/stats?filter=kafka
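
If the LoadBalancer address is not reachable yet, or you would rather not hit the admin port externally, the same endpoints can be checked through a port-forward:

kubectl -n kafka port-forward deployment/envoy 9901:9901 &
curl -s http://localhost:9901/ready
curl -s http://localhost:9901/clusters | grep kafka_cluster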

 

Solution – RKE Cluster: MetalLB assigns Service IP addresses but doesn’t ARP for them

I ran into the same issue detailed here while working with an RKE cluster:

https://github.com/metallb/metallb/issues/1154

After looking around for a few hours and digging into the logs, I figured out the issue; hopefully this helps someone else out there in the same situation save some time.

Make sure IPVS mode is enabled in the cluster configuration

If you are using:

RKE2 – edit the cluster.yaml file

RKE1 – edit the cluster configuration from the Rancher UI > Cluster Management > select the cluster > Edit Configuration > Edit as YAML

Locate the services field under rancher_kubernetes_engine_config and add the following options to enable IPVS:

    kubeproxy:
      extra_args:
        ipvs-scheduler: lc
        proxy-mode: ipvs

https://www.suse.com/support/kb/doc/?id=000020035

[Screenshots: the kube-proxy section of the cluster YAML, default vs. after the changes]

Make sure the kernel modules are enabled on the nodes running the control plane

Background

Example: Rancher RKE1 cluster

sudo docker ps | grep proxy # find the container ID for kube-proxy

sudo docker logs <container-id>

I0313 21:44:08.315888  108645 feature_gate.go:245] feature gates: &{map[]}
I0313 21:44:08.346872  108645 proxier.go:652] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack_ipv4"
E0313 21:44:08.347024  108645 server_others.go:107] "Can't use the IPVS proxier" err="IPVS proxier will not be used because the following required kernel modules are not loaded: [ip_vs_lc]"

kube-proxy is trying to load the required kernel modules and failing, so it cannot enable IPVS.

Let's enable the kernel modules:

sudo nano /etc/modules-load.d/ipvs.conf

ip_vs_lc
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
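
The ipvs.conf file only takes effect at boot; if you would rather load the modules immediately instead of waiting for the reboot below, something like this works:

# Load the modules now (ipvs.conf reloads them on boot);
# on newer kernels nf_conntrack_ipv4 has been merged into nf_conntrack
for m in ip_vs_lc ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do
  sudo modprobe $m || true
done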

Install ipvsadm to confirm the changes

sudo dnf install ipvsadm -y

Reboot the VM or the bare-metal server.

Use sudo ipvsadm to confirm IPVS is enabled:

sudo ipvsadm
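
You can also ask kube-proxy directly which mode it ended up in; it exposes this on its metrics port (10249 by default) on each node:

curl http://localhost:10249/proxyMode
# should print: ipvs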

Testing

kubectl get svc -n <namespace> | grep LoadBalancer
arping -I ens192 192.168.94.140
ARPING 192.168.94.140 from 192.168.94.65 ens192
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 1.117ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.737ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.845ms
Unicast reply from 192.168.94.140 [00:50:56:96:E3:1D] 0.668ms
Sent 4 probes (1 broadcast(s))
Received 4 response(s)

If you have a Service of type LoadBalancer in front of a deployment, you should now be able to reach it, as long as the container is responding behind the service.

Helpful links

https://metallb.universe.tf/configuration/troubleshooting/

https://github.com/metallb/metallb/issues/1154

https://github.com/rancher/rke2/issues/3710