Find the PCI-E Slot number of PCI-E Add On card GPU, NIC, etc on Linux/Proxmox

i was working on a v-GPU POC using PVE Since Broadcom Screwed us with the Vsphere licensing costs (New post incoming about this adventure)

anyway i needed to find the PCI-E Slot used for the A4000 GPU on the host to disable it for troubleshooting

Guide

First we need to find the occupied slots and the Bus address for each slot

sudo dmidecode -t slot | grep -E "Designation|Usage|Bus Address"

Output will show the Slot ID, Usage and then the Bus Address

        Designation: CPU SLOT1 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT2 PCI-E 4.0 X8
        Current Usage: In Use
        Bus Address: 0000:41:00.0
        Designation: CPU SLOT3 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c1:00.0
        Designation: CPU SLOT4 PCI-E 4.0 X8
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT5 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c2:00.0
        Designation: CPU SLOT6 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT7 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:81:00.0
        Designation: PCI-E M.2-M1
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: PCI-E M.2-M2
        Current Usage: Available
        Bus Address: 0000:ff:00.0

We can use lspci -s #BusAddress# to locate whats installed on each slot

lspci -s 0000:c2:00.0
c2:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)

lspci -s 0000:81:00.0
81:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)

Im sure there is a much more elegant way to do this, but this worked as a quick ish way to find what i needed. if you know a better way please share in the comments

Until next time!!!

Reference –

https://stackoverflow.com/questions/25908782/in-linux-is-there-a-way-to-find-out-which-pci-card-is-plugged-into-which-pci-sl

Use Mailx to send emails using office 365

just something that came up while setting up a monitoring script using mailx, figured ill note it down here so i can get it to easily later when I need it 😀

Important prerequisites

  • You need to enable smtp basic Auth on Office 365 for the account used for authentication
  • Create an App password for the user account
  • nssdb folder must be available and readable by the user running the mailx command

Assuming all of the above prerequisite are $true we can proceed with the setup

Install mailx

RHEL/Alma linux

sudo dnf install mailx

NSSDB Folder

make sure the nssdb folder must be available and readable by the user running the mailx command

certutil -L -d /etc/pki/nssdb

The Output might be empty, but that’s ok; this is there if you need to add a locally signed cert or another CA cert manually, Microsoft Certs are trusted by default if you are on an up to date operating system with the local System-wide Trust Store

Reference – RHEL-sec-shared-system-certificates

Configure Mailx config file

sudo nano /etc/mail.rc

Append/prepend the following lines and Comment out or remove the same lines already defined on the existing config files

set smtp=smtp.office365.com
set smtp-auth-user=###[email protected]###
set smtp-auth-password=##Office365-App-password#
set nss-config-dir=/etc/pki/nssdb/
set ssl-verify=ignore
set smtp-use-starttls
set from="###[email protected]###"

This is the bare minimum needed other switches are located here – link

Testing

echo "Your message is sent!" | mailx -v -s "test" [email protected]

-v switch will print the verbos debug log to console

Connecting to 52.96.40.242:smtp . . . connected.
220 xxde10CA0031.outlook.office365.com Microsoft ESMTP MAIL Service ready at Sun, 6 Aug 2023 22:14:56 +0000
>>> EHLO vls-xxx.multicastbits.local
250-MN2PR10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-STARTTLS
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> STARTTLS
220 2.0.0 SMTP server ready
>>> EHLO vls-xxx.multicastbits.local
250-xxde10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-AUTH LOGIN XOAUTH2
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> AUTH LOGIN
334 VXNlcm5hbWU6
>>> Zxxxxxxxxxxxc0BmdC1zeXMuY29t
334 UGsxxxxxmQ6
>>> c2Rxxxxxxxxxxducw==
235 2.7.0 Authentication successful
>>> MAIL FROM:<###[email protected]###>
250 2.1.0 Sender OK
>>> RCPT TO:<[email protected]>
250 2.1.5 Recipient OK
>>> DATA
354 Start mail input; end with <CRLF>.<CRLF>
>>> .
250 2.0.0 OK <[email protected]> [Hostname=Bsxsss744.namprd11.prod.outlook.com]
>>> QUIT
221 2.0.0 Service closing transmission channel 

Now you can use this in your automation scripts or timers using the mailx command

#!/bin/bash

log_file="/etc/app/runtime.log"
recipient="[email protected]"
subject="Log file from /etc/app/runtime.log"

# Check if the log file exists
if [ ! -f "$log_file" ]; then
  echo "Error: Log file not found: $log_file"
  exit 1
fi

# Use mailx to send the log file as an attachment
echo "Sending log file..."
mailx -s "$subject" -a "$log_file" -r "[email protected]" "$recipient" < /dev/null
echo "Log file sent successfully."

Secure it

sudo chown root:root /etc/mail.rc
sudo chmod 600 /etc/mail.rc

The above commands change the file’s owner and group to root, then set the file permissions to 600, which means only the owner (root) has read and write permissions and other users have no access to the file.

Use Environment Variables: Avoid storing sensitive information like passwords directly in the mail.rc file, consider using environment variables for sensitive data and reference those variables in the configuration.

For example, in the mail.rc file, you can set:

set smtp-auth-password=$MY_EMAIL_PASSWORD

You can set the variable using another config file or store it in the Ansible vault during runtime or use something like Hashicorp.

Sure, I would just use Python or PowerShell core, but you will run into more locked-down environments like OCI-managed DB servers with only Mailx is preinstalled and the only tool you can use 🙁

the Fact that you are here means you are already in the same boat. Hope this helped… until next time

How to extend root (cs-root) Filesystem using LVM Cent OS/RHEL/Almalinux

This guide will walk you through on how to extend and increase space for the root filesystem on a alma linux. Cent OS, REHL Server/Desktop/VM

Method A – Expanding the current disk

Edit the VM and Add space to the Disk

install the cloud-utils-growpart package, as the growpart command in it makes it really easy to extend partitioned virtual disks.

sudo dnf install cloud-utils-growpart

Verify that the VM’s operating system recognizes the new increased size of the sda virtual disk, using lsblk or fdisk -l

sudo fdisk -l
Notes -
Note down the disk id and the partition number for Linux LVM - in this demo disk id is sda and lvm partition is sda 3

lets trigger a rescan of a block devices (Disks)

#elevate to root
sudo su 

#trigger a rescan, Make sure to match the disk ID you noted down before 
echo 1 > /sys/block/sda/device/rescan
exit

Now sudo fdisk -l shows the correct size of the disks

Use growpart to increase the partition size for the lvm

sudo growpart /dev/sda 3

Confirm the volume group name

sudo vgs

Extend the logical volume

sudo lvextend -l +100%FREE /dev/almalinux/root

Grow the file system size

sudo xfs_growfs /dev/almalinux/root
Notes -
You can use this same steps to add space to different partitions such as home, swap if needed

Method B -Adding a second Disk to the LVM and expanding space

Why add a second disk?
may be the the current Disk is locked due to a snapshot and you cant remove it, Only solution would be to add a second disk/

Check the current space available

sudo df -h 
Notes -
If you have 0% ~1MB left on the cs-root command auto-complete with tab and some of the later commands wont work, You should clear up atleast 4-10mb by clearing log files, temp files, etc

Mount an additional disk to the VM (Assuming this is a VM) and make sure the disk is visible on the OS level

sudo lvmdiskscan

OR

sudo fdisk -l

Confirm the volume group name

sudo vgs

Lets increase the space

First lets initialize the new disk we mounted

sudo mkfs.xfs /dev/sdb

Create the Physical volume

sudo pvcreate /dev/sdb

extend the volume group

sudo vgextend cs /dev/sdb
  Volume group "cs" successfully extended


Extend the logical volume

sudo lvextend -l +100%FREE /dev/cs/root

Grow the file system size

sudo xfs_growfs /dev/cs/root

Confirm the changes

sudo df -h

Just making easy for us!!

#Method A - Expanding the current disk 
#AlmaLinux
sudo dnf install cloud-utils-growpart

sudo lvmdiskscan
sudo fdisk -l                          #note down the disk ID and partition num


sudo su                                #elevate to root
echo 1 > /sys/block/sda/device/rescan  #trigger a rescan
exit                                   #exit root shell

sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

#Method B - Adding a second Disk 
#CentOS

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo mkfs.xfs /dev/sdb
sudo pvcreate /dev/sdb
sudo vgextend cs /dev/sdb
sudo lvextend -l +100%FREE /dev/cs/root
sudo xfs_growfs /dev/cs/root
sudo df -h

#AlmaLinux

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo mkfs.xfs /dev/sdb
sudo pvcreate /dev/sdb
sudo vgextend almalinux /dev/sdb
sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

Change the location of the Docker overlay2 storage directory

If you found this page you already know why you are looking for this, your server /dev/mapper/cs-root is filled due to /var/lib/docker taking up most of the space

Yes, you can change the location of the Docker overlay2 storage directory by modifying the daemon.json file. Here’s how to do it:

Open or create the daemon.json file using a text editor:

sudo nano /etc/docker/daemon.json

{
    "data-root": "/path/to/new/location/docker"
}

Replace “/path/to/new/location/docker” with the path to the new location of the overlay2 directory.

If the file already contains other configuration settings, add the "data-root" setting to the file under the "storage-driver" setting:

{
    "storage-driver": "overlay2",
    "data-root": "/path/to/new/location/docker"
}

Save the file and Restart docker

sudo systemctl restart docker

Don’t forget to remove the old data

rm -rf /var/lib/docker/overlay2

“System logs on hosts are stored on non-persistent storage” message on VCenter

Ran into this pesky little error message recently, on a vcenter environment

If the logs are stored on a local scratch disk, vCenter will display an alert stating –  “System logs on host xxx are stored on non-persistent storage”

Configure ESXi Syslog location – vSphere Web Client

Vcenter > Select “Host”> Configure > Advance System Settings

Click on Edit and search for “Syslog.global.logDir”

Edit the value and in this case, I’m going to use the local data store (Localhost_DataStore01) to store the syslogs.

You can also define a remote syslog server using the “Syslog.global.LogHost” setting

Configure ESXi Syslog location – ESXCLI

Ssh on to the host

Check the current location

esxcli system syslog config get

*logs stored on the local scratch disk

Manually Set the Path

esxcli system syslog config set –logdir=/vmfs/directory/path

you can find the VMFS volume names/UUIDs under  –

/vmfs/volumes

remote syslog server can be set using

esxcli system syslog config set –loghost=’tcp://hostname:port’

Load the configuration changes with the syslog reload command

esxcli system syslog reload

The logs will immediately begin populating the specified location.

Unable to upgrade vCenter 6.5/6.7 to U2: Root password expired

As a Part of my pre-flight check for Vcenter upgrades i like to mount the ISO and go through the first 3 steps, during this I noticed the installer cannot connect to the source appliance with this error 

2019-05-01T20:05:02.052Z - info: Stream :: close
2019-05-01T20:05:02.052Z - info: Password not expired
2019-05-01T20:05:02.054Z - error: sourcePrecheck: error in getting source Info: ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.
2019-05-01T20:05:03.328Z - error: Request timed out after 30000 ms, url: https://vcenter.companyABC.local:443/
2019-05-01T20:05:09.675Z - info: Log file was saved at: C:\Users\MCbits\Desktop\installer-20190501-160025555.log

trying to reset via the admin interface or the DCUI didn’t work,  after digging around found a way to reset it by forcing the vcenter to boot in to single user mode

Procedure:

  1. Take a snapshot or backup of the vCenter Server Appliance before proceeding. Do not skip this step.
  2. Reboot the vCenter Server Appliance.
  3. After the OS starts, press e key to enter the GNU GRUB Edit Menu.
  4. Locate the line that begins with the word Linux.
  5. Append these entries to the end of the line: rw init=/bin/bash The line should look like the following screenshot:

After adding the statement, press F10 to continue booting 

Vcenter appliance will boot into single user mode

Type passwd to reset the root password

if you run into the following error message

"Authentication token lock busy"

you need to re-mount the filesystem in RW, which lets you change between read-only and read-write. this will allow you to make changes

mount -o remount,rw /

Until next time !!!

 

MS Exchange 2016 [ERROR] Cannot find path ‘..\Exchange_Server_V15\UnifiedMessaging\grammars’ because it does not exist.


So recently I ran into this annoying error message with Exchange 2016 CU11 Update.

Environment info-

  • Exchange 2016 upgrade from CU8 to CU11
  • Exchange binaries are installed under D:\Microsoft\Exchange_Server_V15\..
Microsoft.PowerShell.Commands.GetItemCommand.ProcessRecord()". [12/04/2018 16:41:43.0233] [1] [ERROR] Cannot find path 'D:\Microsoft\Exchange_Server_V15\UnifiedMessaging\grammars' because it does not exist. 
[12/04/2018 16:41:43.0233] [1] [ERROR-REFERENCE] Id=UnifiedMessagingComponent___99d8be02cb8d413eafc6ff15e437e13d Component=EXCHANGE14:\Current\Release\Shared\Datacenter\Setup
[12/04/2018 16:41:43.0234] [1] Setup is stopping now because of one or more critical errors. [12/04/2018 16:41:43.0234] [1] Finished executing component tasks.
[12/04/2018 16:41:43.0318] [1] Ending processing Install-UnifiedMessagingRole
[12/04/2018 16:44:51.0116] [0] CurrentResult setupbase.maincore:396: 0 [12/04/2018 16:44:51.0118] [0] End of Setup
[12/04/2018 16:44:51.0118] [0] **********************************************

Root Cause

Ran the Setup again and it failed with the same error
while going though the log files i notice that the setup looks for this file path while configuring the "Mailbox role: Unified Messaging service" (Stage 6 on the GUI installer)

$grammarPath = join-path $RoleInstallPath "UnifiedMessaging\grammars\*";

There was no folder present with the name grammars under the Path specified on the error

just to confirm, i checked another server on CU8 and the grammars folder is there.

Not sure why the folder got removed, it may have happened during the first run of the CU11 setup that failed,

Resolution

My first thought was to copy the folder from an existing CU8 server. but just to avoid any issues (since exchange is sensitive to file versions)
I created an empty folder with the name "grammars" under D:\Microsoft\Exchange_Server_V15\UnifiedMessaging\




Ran the setup again and it continued the upgrade process and completed without any issues...¯\_(ツ)_/¯











[12/04/2018 18:07:50.0416] [2] Ending processing Set-ServerComponentState
[12/04/2018 18:07:50.0417] [2] Beginning processing Write-ExchangeSetupLog
[12/04/2018 18:07:50.0420] [2] Install is complete. Server state has been set to Active.
[12/04/2018 18:07:50.0421] [2] Ending processing Write-ExchangeSetupLog
[12/04/2018 18:07:50.0422] [1] Finished executing component tasks.
[12/04/2018 18:07:50.0429] [1] Ending processing Start-PostSetup
[12/04/2018 18:07:50.0524] [0] CurrentResult setupbase.maincore:396: 0
[12/04/2018 18:07:50.0525] [0] End of Setup
[12/04/2018 18:07:50.0525] [0] **********************************************

Considering cost of this software M$ really have to be better about error handling IMO, i have run in to silly issues like this way too many times since Exchange 2010.