After the Weights Freeze: What Happens When You Hit Enter


In the last post I tried to explain how an LLM gets built. Billions of numbers, adjusted one fraction at a time, until structure emerges from prediction pressure. Circuits form. Clusters of meaning self-organize.

But that post ends where the interesting part begins. Now the model exists. The weights are frozen. Training is done.

Now we type something and hit enter. What actually happens?

This is the post I wish I’d had when I started using these tools.


The Forward Pass

We type a message. The text gets chopped into tokens.

Subword chunks, not full words. “Understanding” becomes something like ["under", "standing"]. Your message might be 20 words but 30+ tokens.

Those tokens flow forward through the model’s layers. Every layer transforms the representation. The attention mechanism lets each token look back at every other token in the context and decide what’s relevant.

The weights don’t change during this process. They’re frozen from training. The model is just running, applying its learned patterns to your specific input.

What comes out is a list.

Let’s say we type: "Write a short paragraph about Kafka vs RabbitMQ"

The model tokenizes that, processes it through every layer, and has to pick the very first token of its response.

To do that, it computes a score for every token in its vocabulary.


What’s a vocabulary?

The vocabulary is the fixed list of every token the model knows, built before training using byte-pair encoding on a massive text corpus that LLM companies have scraped off the internet.

For GPT-2, that’s 50,257 tokens. For newer models it’s larger, often 100k+.

The output is a probability distribution across that entire vocabulary, every time, for every single token it generates.
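To make “a probability distribution across the entire vocabulary” concrete, here is a minimal sketch of the softmax step that turns raw scores into probabilities. The five-token vocabulary and the logit values are invented for illustration; a real model does this over all 50,257 entries at every step.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "vocabulary" of 5 tokens with made-up raw scores (logits)
logits = [4.2, 3.1, 2.8, 0.5, -1.0]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # the full list always sums to 1.0
```

Note that every entry comes out nonzero, which is why even emoji fragments get a score every single step.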


We are going to use GPT-2 as an example to explain the concept.

For that first token, the scores (logits, converted to probabilities) might look something like this:

Token 8,527  ("When"):     0.1263
Token 16,401 ("Kafka"):    0.0891
Token 3,198  ("The"):      0.0734
Token 11,045 ("Both"):     0.0622
Token 23,189 ("Apache"):   0.0418
Token 1,550  ("In"):       0.0387
Token 42,007 ("Choosing"): 0.0095
Token 7,904  ("Message"):  0.0071
Token 33,421 ("While"):    0.0068
Token 50,012 ("ĠðŁ"):     0.0000003
Token 831    (" Q"):        0.0000001
...
[50,246 more entries trailing into the decimals]

 

Every generation step. Fifty thousand scores. The model doesn’t “think of” the top five and pick one.

It produces all 50,257 simultaneously and the sampling process decides which one wins. Most of that list is near-zero noise.

Tokens like emoji fragments and random punctuation that have no business starting a paragraph about message brokers. But they’re scored anyway. Every time.

This is the fundamental object we’re manipulating every time we use these tools.

A probability distribution over the entire vocabulary, shaped by everything the model has seen so far in the context window.

Let’s hang onto that mental image. It will make everything else in this post make sense.


The Dice Roll in Practice

The previous post covered temperature conceptually: low temp means predictable, high temp means creative. But knowing how it works changes how we use it.

The model produces those raw scores (logits) for all 50,257 tokens. Temperature divides those scores before they get converted to probabilities. That division matters.
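A minimal sketch of that division, using hypothetical logits for our first-token candidates (the numbers are invented for illustration, not pulled from GPT-2):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical first-token logits for a handful of candidates
logits = {"When": 3.0, "Kafka": 2.6, "The": 2.4, "Both": 2.2}

for temp in (0.2, 1.0):
    # Temperature divides the logits BEFORE softmax converts them
    scaled = [v / temp for v in logits.values()]
    probs = softmax(scaled)
    print(temp, {tok: round(p, 3) for tok, p in zip(logits, probs)})
```

At 0.2 the division stretches the gaps, so the exponential hands “When” almost all the mass; at 1.0 the runners-up keep a real share.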

Let’s use our Kafka vs RabbitMQ prompt and trace what happens.

Low temperature (0.2): Stick to the spec

The division amplifies the gaps between scores. “When” was already the top pick, and after low-temp scaling it dominates. The model opens with “When” almost every time. Run it five times:

Run 1: "When comparing Kafka and RabbitMQ, the key distinction lies in..."
Run 2: "When choosing between Kafka and RabbitMQ, it's important to..."
Run 3: "When evaluating message brokers, Kafka and RabbitMQ represent..."
Run 4: "When comparing Kafka and RabbitMQ, the key distinction lies in..."
Run 5: "When choosing between Kafka and RabbitMQ, the fundamental..."

Nearly identical openings. The first token barely varies, and that initial choice constrains everything that follows.

Runs 1 and 4 might be word-for-word identical for the first 20 tokens before the dice diverge.

High temperature (1.0): Creative

The division shrinks the gaps. “When” is still the most probable, but “Kafka,” “Both,” “Apache,” “Choosing” all get a real shot. The outputs sprawl:

Run 1: "Both systems handle messaging but their philosophies diverge..."
Run 2: "Kafka treats the log as the fundamental abstraction..."
Run 3: "Choosing between these two usually comes down to whether..."
Run 4: "Apache Kafka and RabbitMQ solve overlapping problems from..."
Run 5: "In the messaging landscape, Kafka and RabbitMQ occupy..."

The model is running the same weights and producing the same kind of distribution; temperature just changes how adventurous you are when sampling from it.

Each choice cascades. Once the model starts with “Kafka treats the log,” the next token distribution shifts entirely compared to starting with “Both systems handle.”

Temperature = 0: greedy decoding. Always pick the highest-scoring token. Completely deterministic: same input, same output, every time. No dice roll at all.

Then there’s the filtering that happens before the roll.

Top-k says “only consider the k highest-scoring tokens” (exclude the rest, then renormalize).
Top-p (nucleus sampling) says “start from the top and keep adding tokens until their cumulative probability reaches p (a threshold you choose, like 0.9), then exclude the rest.”
Most production systems use some combination of all three. (If you want the full technical breakdown of these decoding methods, Hugging Face’s walkthrough is excellent.)
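Here is a rough sketch of both filters over a toy distribution (the tokens and probabilities are invented; real implementations work on tensors over the full vocabulary):

```python
def top_k_filter(probs, k):
    # Keep only the k highest-probability tokens, then renormalize
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

def top_p_filter(probs, p):
    # Keep tokens from the top until cumulative probability reaches p
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

probs = {"When": 0.40, "Kafka": 0.25, "Both": 0.20, "The": 0.10, "ĠðŁ": 0.05}
print(top_k_filter(probs, 2))    # only "When" and "Kafka" survive
print(top_p_filter(probs, 0.9))  # the emoji-fragment tail is cut off
```

Either way, the near-zero noise at the bottom of the list never gets a chance to win the dice roll.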

This is why “regenerate” gives us a different response. Same weights, same context, same list of 50,257 scores. Different roll of the dice. The terrain is identical. The path through it changes.

Modern agentic coding tools like Cursor/Cline/Codex use separate modes for planning vs coding/debugging to take advantage of this, often along with different system prompts/constraints.

Planning needs to explore options, consider architectures, think laterally. That’s higher temperature territory.

Writing the actual code from the plan needs to be precise and deterministic. That’s lower temperature.

Same model behind both modes. Different sampling strategy for different phases of the work.

Where the knobs actually are

If you’re calling the model via an API, you can usually tune these parameters to match your needs. If you’re using a chat UI/tool, the app typically picks defaults for you.

With the Claude API, we get temperature (0.0 to 1.0, defaults to 1.0), top_k, and top_p. Anthropic’s guidance is to just use temperature and leave the others alone.

With Google’s Gemini, we get temperature controls directly in the AI Studio UI. No API needed, just a slider. Their range goes from 0.0 to 2.0.

Temperature 0.5 on Claude and temperature 0.5 on Gemini don’t produce the same behavior.

Each provider trains and tunes differently, so the same number produces different sampling characteristics. It’s the same concept across all of them, but we can’t just copy settings between providers and expect identical results.


System Prompts as Activation Space Anchoring

Before I started going down this rabbit hole, I assumed system prompts worked like config flags.

“Set the model to be a Python expert.” “Tell it to be concise.” Flips a switch in the model and changes the behavior. I think most people using these tools have that same mental model.

Turns out I was wrong. And understanding what’s actually happening made me noticeably better at using these tools.

A system prompt is text. It gets tokenized and fed into the model as the first tokens in the context window. Those tokens flow through the same layers as everything else. They produce activations, patterns of neural activity inside the model. And those activations influence every token that comes after.

Check the galaxy map from Anthropic’s feature visualization, where concepts cluster into neighborhoods: code near code, legal language near legal language, casual conversation near casual conversation.

The system prompt doesn’t tell the model which neighborhood to visit. It starts the model in that neighborhood.

When you write "You are a senior Python developer who writes production code with proper error handling, type hints, and logging"

every one of those tokens activates features in the model. “Senior” pulls toward experienced patterns.

“Production” pulls toward robustness. “Error handling,” “type hints,” “logging” each activate their own clusters.

Those activations become part of the context. Every subsequent token the model generates is influenced by them because the attention mechanism lets every new token look back at the system prompt tokens.

(For Claude API, Anthropic has documentation on how system prompts work at the implementation level.)

The system prompt seeds the context window with tokens that bias which internal features and clusters activate. It pulls the model into a specific region of its representation space (I’m using ‘space’ loosely here; it’s more ‘internal state’ than geometry).

This is activation space anchoring (an analogy for shifting internal representations, not a literal coordinate system like a map).

(This isn’t just theory. We can literally steer GPT-2’s behavior by adding activation vectors into its forward pass. Add a “wedding” vector and the model talks about weddings. Add an “anger” vector and it gets hostile. The activations are the steering mechanism, and system prompts are doing a version of the same thing through natural language.)
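The mechanics of that steering are simple enough to sketch. The vectors below are random stand-ins; in the actual technique the steering vector is derived from the model’s own activations on contrasting prompts, and the layer and scale matter a lot.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=768)        # stand-in for a token's hidden state
wedding_vec = rng.normal(size=768)   # stand-in for a "wedding" direction

# Activation steering: add a scaled concept vector into the forward pass
alpha = 4.0
steered = hidden + alpha * wedding_vec

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The steered state now points much more toward the concept direction
print(cos(hidden, wedding_vec), "->", cos(steered, wedding_vec))
```

The point of the sketch: nothing about the weights changes. Only the activations flowing through them get nudged, and everything downstream follows.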

And this connects directly to the probability list. The system prompt doesn’t add new tokens to the vocabulary. It doesn’t unlock hidden capabilities.

What it does is reshape the probability distribution over those same 50,257 tokens.

Tokens related to the system prompt’s domain get boosted, so the model assigns them higher probability.

Take our Kafka vs RabbitMQ prompt again. Without a system prompt, the first-token distribution had “When” on top, “Kafka” and “Both” trailing behind, a generic opening for a generic comparison.

Now add a system prompt: "You are a senior distributed systems architect. Prioritize throughput, partition tolerance, and operational tradeoffs. Be direct."

The same prompt. But those system prompt tokens have been flowing through the model’s layers, activating features related to distributed systems, performance, architecture. By the time the model gets to our question, the probability landscape has shifted:

Token 16,401 ("Kafka"):    0.1534   (was 0.0891)
Token 3,198  ("The"):      0.0812   (was 0.0734)
Token 6,571  ("At"):       0.0498   (new in top 10)
Token 11,045 ("Both"):     0.0411   (was 0.0622)
Token 8,527  ("When"):     0.0389   (was 0.1263, dropped hard)
Token 19,888 ("From"):     0.0285   (new in top 10)
Token 23,189 ("Apache"):   0.0271   (was 0.0418)

 “When,” the safe essay-style opener, dropped from first place to fifth. “Kafka” jumped to the top. The model is more likely to lead with the technical substance rather than a comparison framework. That "Be direct" token cluster suppressed the hedging openers. The distributed systems context boosted tokens that lead to architectural analysis.

Same vocabulary. Same 50,257 entries. Different weights across the list.

Simplified interactive visualization of Activation Space Anchoring


This is a toy 2D visualization. Real activation space is enormous and multidimensional, but it’s faithful to the idea that prompts steer the model by shifting internal activations.


Temperature + System Prompt: Two Knobs, One Process

Once you start seeing it this way, temperature and system prompts stop being separate concepts.

The system prompt shapes which probabilities are high and which are low. It sculpts the distribution. Boosts code tokens, suppresses casual ones, or whatever the prompt content biases toward.

Temperature controls how strictly the model follows that shaped distribution.

Low temperature means “stick to what the system prompt is pushing you toward.”

High temperature means “the system prompt set a direction, but feel free to wander.”

They’re two knobs on the same process. One shapes the probability field. The other controls how tightly the model walks along its ridges.


MoE Routing: When the Architecture Gets Involved

Some models take this further. Mixture of Experts (MoE) architectures, used in models like Gemini and DeepSeek, don’t activate all their parameters for every token. They route each token through a subset of specialized “expert” subnetworks. (Hugging Face has a solid explainer on how MoE works with a full architecture breakdown.)

tl;dr:

In a MoE model, the system prompt tokens flow through the network and produce hidden states, just like in a dense model. But the way they influence routing is indirect, and it matters to get this right.

The router itself is stateless. It’s a simple feed-forward layer that looks at one token’s hidden state and decides which experts to use. It has no memory of what came before. So the system prompt tokens don’t “tilt” the router or bias it over time.

What actually happens is that the attention mechanism does the work first. When a token from your actual question (say “Kafka”) is being processed, it attends back to the system prompt tokens (“You are a distributed systems architect”).

That attention pulls system-prompt context into the current token’s hidden state vector. By the time that enriched “Kafka” vector reaches the MoE layer, it looks different than it would without the system prompt. The router sees that specific vector, evaluates it, and routes it to the experts that match. A “Kafka” vector colored by distributed systems context gets routed differently than a “Kafka” vector colored by literary analysis context.

It’s not a clean “wake up the code expert” signal. It’s per-token and indirect. The system prompt infects each new token through attention, and that infected representation is what the router evaluates.

Implementation details vary by architecture, but the core idea is the same: routing decisions are made per-token from that token’s current hidden state.
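A toy sketch of that per-token routing, with random weights standing in for a trained router. The expert count, dimensions, and the “context mixing” step are all invented for illustration; a real MoE layer learns the router and mixes context via attention, not a hand-written blend.

```python
import numpy as np

def route(hidden_state, router_weights, k=2):
    # A stateless router: one linear layer + softmax over experts,
    # evaluated per token with no memory of earlier tokens
    scores = router_weights @ hidden_state
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]
    return top, probs[top]

rng = np.random.default_rng(1)
router = rng.normal(size=(8, 64))      # 8 experts, 64-dim hidden states

plain_kafka = rng.normal(size=64)      # "Kafka" with no system prompt
# Simulate attention mixing system-prompt context into the vector
distsys_context = rng.normal(size=64)
enriched_kafka = 0.6 * plain_kafka + 0.4 * distsys_context

print(route(plain_kafka, router)[0])
print(route(enriched_kafka, router)[0])  # may land on different experts
```

Same router, same weights; the only thing that changed is the vector it was shown.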

The effect is real, but the mechanism is attention doing the heavy lifting before the router ever sees the token.

This is very similar to the activation anchoring principle, but operating at an additional architectural level. Not just biasing which features activate within a single network, but biasing which sub-networks get used at all.


Why Models Drift in Long Conversations

This one drove me nuts before I understood the mechanism.

We write a careful system prompt. The model follows it perfectly for 10 messages. By message 20, it’s drifting. The tone shifts.

It starts complimenting you. It forgets constraints you set. With some models, the anti-sycophancy instructions you wrote might as well not exist after enough back-and-forth.

The architecture explains exactly why.

Attention has a cost that scales with context length. As the conversation grows, each new token has more previous tokens to attend to. The system prompt tokens are still there, they haven’t been deleted, but they’re now a small fraction of a much larger context window.

Think of it like a voice in a growing crowd. Your system prompt is a person at the front of the room speaking clearly. When there are 10 people in the room, everyone hears them fine. When there are 500 people all talking, that original voice gets harder to pick out.

As context grows, relevant instructions can lose salience among competing tokens; re-anchoring helps reinforce the intended context.

Transformers don’t inherently know word order, so they use positional encodings (like RoPE, Rotary Position Embedding) to inject position information into each token.

These encodings bias the attention mechanism to favor tokens that are physically closer. As the conversation gets longer, the physical distance between the current token and the system prompt grows.

Combine that distance penalty with the sheer volume of recent back-and-forth dialogue built up in the chat, and the system prompt’s anchoring effect fades.
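A back-of-the-envelope illustration of the crowding effect. Real attention is learned and content-dependent, not uniform, so this only shows the arithmetic of dilution: a fixed-size system prompt’s share of an evenly spread attention budget shrinks fast as the context grows.

```python
# If attention mass were spread uniformly, each token gets 1/N of it,
# so a fixed-size system prompt's share is sys_tokens / context_len.
SYS_TOKENS = 500

for context_len in (1_000, 10_000, 100_000):
    share = SYS_TOKENS / context_len
    print(f"{context_len:>7} tokens in context -> {share:.1%} on the system prompt")
# 50.0%, 5.0%, 0.5%
```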

And what fills the gap is the model’s base personality. The behaviors baked in during RLHF and preference tuning.

The agreeable, helpful, slightly sycophantic tendencies that training optimized for. The system prompt was overriding those tendencies, but as its influence weakens, the base behavior seeps back through.

This is why context window isn’t just a memory constraint. It’s a behavioral stability constraint.

A model with a 128k context window doesn’t just remember more, it maintains system prompt influence over a longer conversation.

(The “Lost in the Middle” paper shows language models perform best when relevant information is at the beginning or end of the context, and significantly worse when it’s buried in the middle. The system prompt sits at the very beginning, which helps, but the distance penalty still applies.)


Practical Implications

Dense system prompts beat fluffy ones.

Length isn’t the problem. Anthropic’s own default system prompt for Claude is thousands of tokens long, and it works. A 2,000-token prompt can be packed with dense architectural constraints, few-shot examples, strict schemas, and specific behavioral rules.

This creates a massive anchor in the context that practically forces the model into a specific behavioral subspace.

But a 2,000-token prompt full of vague run-on sentences (“Be a helpful, friendly, synergistic assistant who always puts the user first”) is actively sabotaging itself: burning tokens, a small hole in your wallet, and a little extra warmth for the planet.

Every token in the system prompt must earn its keep. The failure mode isn’t “too long,” it’s “too much noise.” Contradictory instructions, redundant phrasing, and generic filler all dilute the signal of the tokens that actually matter.

Domain context is activation anchoring.

When we paste a code file, an API schema, or a data model into the context, we are not just “giving the model information.” We are flooding the context with domain-specific tokens that bias the entire activation landscape.

This is why RAG (Retrieval-Augmented Generation) is popular. Not just because the model “reads” the retrieved documents, but because those documents’ tokens reshape the probability distribution toward domain-relevant outputs.

Temperature stacking with system prompts.

Now we can be deliberate: use a tight system prompt to sculpt the distribution, then use temperature to control variance within that sculpted space.

Tight prompt + low temp for implementation.

Tight prompt + higher temp for exploring design alternatives. Same anchor, different sampling discipline.

Mitigations

Refresh the system prompt in long conversations. When you are 30 messages deep and the model is drifting, restating the key constraints will re-anchor the model. We are injecting fresh system-prompt-like tokens closer to the model’s current attention window, boosting their influence relative to the stale tokens at the beginning.

Use spec-based development and write skills. Every modern agent supports them. A spec is a dense, structured document that front-loads context.

Skills are reusable instruction sets that get injected into the system prompt. Both are mechanisms for packing the context window with high-signal tokens that keep the model anchored to what we actually want. I wrote about this workflow in a previous post.


Same Patterns, Different Layer

At the inference layer, the mechanism is different but the shape is the same.

We write a prompt. Those tokens create activation patterns. Those patterns bias a probability distribution. Sampling selects from that distribution. The output feeds back in and the loop continues. Simple operations, iterated, producing behavior that looks like understanding.

The system prompt anchors activation space the same way training data anchors weight space: through statistical pressure on what comes next.

The patterns repeat across layers of the system. Training, architecture, inference, usage. Layers within layers across densely packed weights in the network.

This is not a deep insight, but once we see the machinery, the mystique fades. The model isn’t doing something magical when it writes good code or drifts into sycophancy.

It’s doing math on probability distributions. Understanding that makes us better at using them.

When you hit enter, you are querying a frozen snapshot. The model cannot learn from your prompt. Even if you use RAG or an agent to inject additional context, you are only modifying the input state; the model itself remains static, routing those new tokens through the exact same frozen circuitry.

This is why the biggest lever for making a model smarter is packing more high-signal data into the weights before the freeze. And that single fact is driving the entire AI economy we have today in 2025-2026. It’s why AI labs are scraping every corner of the internet, triggering massive copyright lawsuits from publishers and artists.

The more immediate issue today is the violently expensive infrastructure required to store and process it all. To build and run these frozen matrices, High Bandwidth Memory (HBM) for AI accelerators is currently eating the global supply of DRAM wafers, which is why a standard DDR5 kit costs roughly twice what it did a year ago.

Well, if you got this far, thanks for reading and I hope this helped. Until next time!



Find the PCI-E Slot number of PCI-E Add On card GPU, NIC, etc on Linux/Proxmox

I was working on a vGPU POC using PVE since Broadcom screwed us with the vSphere licensing costs (new post incoming about this adventure).

Anyway, I needed to find the PCI-E slot used for the A4000 GPU on the host so I could disable it for troubleshooting.

Guide

First we need to find the occupied slots and the Bus address for each slot

sudo dmidecode -t slot | grep -E "Designation|Usage|Bus Address"

Output will show the Slot ID, Usage and then the Bus Address

        Designation: CPU SLOT1 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT2 PCI-E 4.0 X8
        Current Usage: In Use
        Bus Address: 0000:41:00.0
        Designation: CPU SLOT3 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c1:00.0
        Designation: CPU SLOT4 PCI-E 4.0 X8
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT5 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:c2:00.0
        Designation: CPU SLOT6 PCI-E 4.0 X16
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: CPU SLOT7 PCI-E 4.0 X16
        Current Usage: In Use
        Bus Address: 0000:81:00.0
        Designation: PCI-E M.2-M1
        Current Usage: Available
        Bus Address: 0000:ff:00.0
        Designation: PCI-E M.2-M2
        Current Usage: Available
        Bus Address: 0000:ff:00.0

We can use lspci -s #BusAddress# to locate what’s installed on each slot

lspci -s 0000:c2:00.0
c2:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)

lspci -s 0000:81:00.0
81:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)

I’m sure there is a much more elegant way to do this, but it worked as a quick-ish way to find what I needed. If you know a better way, please share in the comments.

Until next time!!!

Reference –

https://stackoverflow.com/questions/25908782/in-linux-is-there-a-way-to-find-out-which-pci-card-is-plugged-into-which-pci-sl

Use Mailx to send emails using office 365

Just something that came up while setting up a monitoring script using mailx; figured I’ll note it down here so I can get to it easily later when I need it 😀

Important prerequisites

  • You need to enable SMTP basic auth on Office 365 for the account used for authentication
  • Create an App password for the user account
  • nssdb folder must be available and readable by the user running the mailx command

Assuming all of the above prerequisites are $true, we can proceed with the setup

Install mailx

RHEL/Alma linux

sudo dnf install mailx

NSSDB Folder

Make sure the nssdb folder is available and readable by the user running the mailx command

certutil -L -d /etc/pki/nssdb

The output might be empty, but that’s OK; this is only needed if you have to add a locally signed cert or another CA cert manually. Microsoft’s certs are trusted by default if you are on an up-to-date operating system with the system-wide trust store.

Reference – RHEL-sec-shared-system-certificates

Configure Mailx config file

sudo nano /etc/mail.rc

Add the following lines, and comment out or remove any of the same lines already defined in the existing config file

set smtp=smtp.office365.com
set smtp-auth-user=###[email protected]###
set smtp-auth-password=##Office365-App-password#
set nss-config-dir=/etc/pki/nssdb/
set ssl-verify=ignore
set smtp-use-starttls
set from="###[email protected]###"

This is the bare minimum needed; other switches are documented here – link

Testing

echo "Your message is sent!" | mailx -v -s "test" [email protected]

The -v switch will print the verbose debug log to the console

Connecting to 52.96.40.242:smtp . . . connected.
220 xxde10CA0031.outlook.office365.com Microsoft ESMTP MAIL Service ready at Sun, 6 Aug 2023 22:14:56 +0000
>>> EHLO vls-xxx.multicastbits.local
250-MN2PR10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-STARTTLS
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> STARTTLS
220 2.0.0 SMTP server ready
>>> EHLO vls-xxx.multicastbits.local
250-xxde10CA0031.outlook.office365.com Hello [167.206.57.122]
250-SIZE 157286400
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-AUTH LOGIN XOAUTH2
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 SMTPUTF8
>>> AUTH LOGIN
334 VXNlcm5hbWU6
>>> Zxxxxxxxxxxxc0BmdC1zeXMuY29t
334 UGsxxxxxmQ6
>>> c2Rxxxxxxxxxxducw==
235 2.7.0 Authentication successful
>>> MAIL FROM:<###[email protected]###>
250 2.1.0 Sender OK
>>> RCPT TO:<[email protected]>
250 2.1.5 Recipient OK
>>> DATA
354 Start mail input; end with <CRLF>.<CRLF>
>>> .
250 2.0.0 OK <[email protected]> [Hostname=Bsxsss744.namprd11.prod.outlook.com]
>>> QUIT
221 2.0.0 Service closing transmission channel 

Now you can use this in your automation scripts or timers using the mailx command

#!/bin/bash

log_file="/etc/app/runtime.log"
recipient="[email protected]"
subject="Log file from /etc/app/runtime.log"

# Check if the log file exists
if [ ! -f "$log_file" ]; then
  echo "Error: Log file not found: $log_file"
  exit 1
fi

# Use mailx to send the log file as an attachment
echo "Sending log file..."
mailx -s "$subject" -a "$log_file" -r "[email protected]" "$recipient" < /dev/null
echo "Log file sent successfully."

Secure it

sudo chown root:root /etc/mail.rc
sudo chmod 600 /etc/mail.rc

The above commands change the file’s owner and group to root, then set the file permissions to 600, which means only the owner (root) has read and write permissions and other users have no access to the file.

Use Environment Variables: Avoid storing sensitive information like passwords directly in the mail.rc file, consider using environment variables for sensitive data and reference those variables in the configuration.

For example, in the mail.rc file, you can set:

set smtp-auth-password=$MY_EMAIL_PASSWORD

You can set the variable from another config file, store it in Ansible Vault and inject it at runtime, or use something like HashiCorp Vault.

Sure, I would normally just use Python or PowerShell Core, but you will run into locked-down environments like OCI-managed DB servers where mailx is preinstalled and is the only tool you can use 🙁

The fact that you are here means you are probably in the same boat. Hope this helped… until next time!

How to extend root (cs-root) Filesystem using LVM Cent OS/RHEL/Almalinux

This guide will walk you through how to extend the root filesystem on an AlmaLinux, CentOS, or RHEL server/desktop/VM.

Method A – Expanding the current disk

Edit the VM and Add space to the Disk

Install the cloud-utils-growpart package; the growpart command in it makes it really easy to extend partitioned virtual disks.

sudo dnf install cloud-utils-growpart

Verify that the VM’s operating system recognizes the new increased size of the sda virtual disk, using lsblk or fdisk -l

sudo fdisk -l
Notes -
Note down the disk ID and the partition number of the Linux LVM partition - in this demo the disk ID is sda and the LVM partition is sda3

Let’s trigger a rescan of the block devices (disks)

#elevate to root
sudo su 

#trigger a rescan, Make sure to match the disk ID you noted down before 
echo 1 > /sys/block/sda/device/rescan
exit

Now sudo fdisk -l shows the correct size of the disks

Use growpart to increase the size of the LVM partition

sudo growpart /dev/sda 3

Confirm the volume group name

sudo vgs

Extend the logical volume

sudo lvextend -l +100%FREE /dev/almalinux/root

Grow the file system size

sudo xfs_growfs /dev/almalinux/root
Notes -
You can use these same steps to add space to other partitions such as home or swap if needed

Method B - Adding a second disk to the LVM and expanding space

Why add a second disk?
Maybe the current disk is locked due to a snapshot that you can’t remove; the only solution would be to add a second disk.

Check the current space available

sudo df -h 
Notes -
If you have 0% (~1 MB) left on cs-root, tab auto-completion and some of the later commands won’t work. You should free up at least 4-10 MB by clearing log files, temp files, etc.

Attach an additional disk to the VM (assuming this is a VM) and make sure the disk is visible at the OS level

sudo lvmdiskscan

OR

sudo fdisk -l

Confirm the volume group name

sudo vgs

Let’s increase the space

First, initialize the new disk we attached as an LVM physical volume. (There’s no need to run mkfs on it first; pvcreate writes LVM metadata directly to the raw disk, and the filesystem is grown later with xfs_growfs.)

sudo pvcreate /dev/sdb

extend the volume group

sudo vgextend cs /dev/sdb
  Volume group "cs" successfully extended


Extend the logical volume

sudo lvextend -l +100%FREE /dev/cs/root

Grow the file system size

sudo xfs_growfs /dev/cs/root

Confirm the changes

sudo df -h

Just making it easy for us!!

#Method A - Expanding the current disk 
#AlmaLinux
sudo dnf install cloud-utils-growpart

sudo lvmdiskscan
sudo fdisk -l                          #note down the disk ID and partition num


sudo su                                #elevate to root
echo 1 > /sys/block/sda/device/rescan  #trigger a rescan
exit                                   #exit root shell

sudo growpart /dev/sda 3               #grow the LVM partition
sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

#Method B - Adding a second Disk 
#CentOS

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo pvcreate /dev/sdb
sudo vgextend cs /dev/sdb
sudo lvextend -l +100%FREE /dev/cs/root
sudo xfs_growfs /dev/cs/root
sudo df -h

#AlmaLinux

sudo lvmdiskscan
sudo fdisk -l
sudo vgs
sudo pvcreate /dev/sdb
sudo vgextend almalinux /dev/sdb
sudo lvextend -l +100%FREE /dev/almalinux/root
sudo xfs_growfs /dev/almalinux/root
sudo df -h

Change the location of the Docker overlay2 storage directory

If you found this page, you already know why you are looking for this: your server’s /dev/mapper/cs-root is full because /var/lib/docker is taking up most of the space.

Yes, you can change the location of the Docker overlay2 storage directory by modifying the daemon.json file. Here’s how to do it:

Open or create the daemon.json file using a text editor:

sudo nano /etc/docker/daemon.json

{
    "data-root": "/path/to/new/location/docker"
}

Replace “/path/to/new/location/docker” with the path to the new location of the overlay2 directory.

If the file already contains other configuration settings, add the "data-root" setting alongside them, e.g. next to the "storage-driver" setting:

{
    "storage-driver": "overlay2",
    "data-root": "/path/to/new/location/docker"
}

Save the file and restart Docker:

sudo systemctl restart docker
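
One caveat: changing data-root does not move anything by itself, so existing images, containers and volumes will appear to vanish until the data is copied over. A minimal sketch of a safer order of operations (the destination path is an example):

```shell
sudo systemctl stop docker                                  # stop the daemon before touching its data
sudo cp -a /var/lib/docker/. /path/to/new/location/docker/  # -a preserves ownership, permissions and symlinks
sudo systemctl start docker
docker info --format '{{ .DockerRootDir }}'                 # should print the new data-root
```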

Once everything checks out, don’t forget to remove the old data to reclaim the space:

sudo rm -rf /var/lib/docker/overlay2

“System logs on hosts are stored on non-persistent storage” message on vCenter

Ran into this pesky little error message recently in a vCenter environment.

If the logs are stored on a local scratch disk, vCenter will display an alert stating: “System logs on host xxx are stored on non-persistent storage”

Configure ESXi Syslog location – vSphere Web Client

vCenter > Select “Host” > Configure > Advanced System Settings

Click on Edit and search for “Syslog.global.logDir”

Edit the value; in this case, I’m going to use the local datastore (Localhost_DataStore01) to store the syslogs.

You can also define a remote syslog server using the “Syslog.global.LogHost” setting

Configure ESXi Syslog location – ESXCLI

SSH on to the host

Check the current location

esxcli system syslog config get

*logs stored on the local scratch disk

Manually Set the Path

esxcli system syslog config set --logdir=/vmfs/directory/path

You can find the VMFS volume names/UUIDs under:

/vmfs/volumes

A remote syslog server can be set using:

esxcli system syslog config set --loghost='tcp://hostname:port'
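
One gotcha when logging to a remote host: the outbound syslog firewall ruleset on ESXi is disabled by default, so the logs may silently never arrive. Enable it with:

```shell
# allow outbound syslog traffic from the host, then reload the rules
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli network firewall refresh
```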

Load the configuration changes with the syslog reload command

esxcli system syslog reload

The logs will immediately begin populating the specified location.

Unable to upgrade vCenter 6.5/6.7 to U2: Root password expired

As part of my pre-flight check for vCenter upgrades, I like to mount the ISO and go through the first 3 steps. During this, I noticed the installer cannot connect to the source appliance, failing with this error:

2019-05-01T20:05:02.052Z - info: Stream :: close
2019-05-01T20:05:02.052Z - info: Password not expired
2019-05-01T20:05:02.054Z - error: sourcePrecheck: error in getting source Info: ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.
2019-05-01T20:05:03.328Z - error: Request timed out after 30000 ms, url: https://vcenter.companyABC.local:443/
2019-05-01T20:05:09.675Z - info: Log file was saved at: C:\Users\MCbits\Desktop\installer-20190501-160025555.log

Trying to reset via the admin interface or the DCUI didn’t work. After digging around, I found a way to reset it by forcing the vCenter appliance to boot into single-user mode.

Procedure:

  1. Take a snapshot or backup of the vCenter Server Appliance before proceeding. Do not skip this step.
  2. Reboot the vCenter Server Appliance.
  3. After the OS starts, press the e key to enter the GNU GRUB edit menu.
  4. Locate the line that begins with the word Linux.
  5. Append rw init=/bin/bash to the end of the line. The line should look like the following screenshot:

After adding the statement, press F10 to continue booting.

The vCenter appliance will boot into single-user mode.

Type passwd to reset the root password

If you run into the following error message:

"Authentication token lock busy"

You need to remount the root filesystem read-write, which will allow passwd to save the change:

mount -o remount,rw /

Until next time!!!

 

MS Exchange 2016 [ERROR] Cannot find path ‘..\Exchange_Server_V15\UnifiedMessaging\grammars’ because it does not exist.


So recently I ran into this annoying error message with the Exchange 2016 CU11 update.

Environment info-

  • Exchange 2016 upgrade from CU8 to CU11
  • Exchange binaries are installed under D:\Microsoft\Exchange_Server_V15\..

Microsoft.PowerShell.Commands.GetItemCommand.ProcessRecord()". [12/04/2018 16:41:43.0233] [1] [ERROR] Cannot find path 'D:\Microsoft\Exchange_Server_V15\UnifiedMessaging\grammars' because it does not exist. 
[12/04/2018 16:41:43.0233] [1] [ERROR-REFERENCE] Id=UnifiedMessagingComponent___99d8be02cb8d413eafc6ff15e437e13d Component=EXCHANGE14:\Current\Release\Shared\Datacenter\Setup
[12/04/2018 16:41:43.0234] [1] Setup is stopping now because of one or more critical errors. [12/04/2018 16:41:43.0234] [1] Finished executing component tasks.
[12/04/2018 16:41:43.0318] [1] Ending processing Install-UnifiedMessagingRole
[12/04/2018 16:44:51.0116] [0] CurrentResult setupbase.maincore:396: 0 [12/04/2018 16:44:51.0118] [0] End of Setup
[12/04/2018 16:44:51.0118] [0] **********************************************

Root Cause

Ran the setup again and it failed with the same error.
While going through the log files, I noticed that the setup looks for this path while configuring the "Mailbox role: Unified Messaging service" (Stage 6 on the GUI installer):

$grammarPath = join-path $RoleInstallPath "UnifiedMessaging\grammars\*";

There was no folder named grammars under the path specified in the error.

Just to confirm, I checked another server on CU8, and the grammars folder is there.

Not sure why the folder got removed; it may have happened during the first run of the CU11 setup that failed.

Resolution

My first thought was to copy the folder from an existing CU8 server, but to avoid any issues (since Exchange is sensitive to file versions), I created an empty folder named "grammars" under D:\Microsoft\Exchange_Server_V15\UnifiedMessaging\
Ran the setup again and it continued the upgrade process and completed without any issues... ¯\_(ツ)_/¯

For the record, here’s the tail end of the setup log after the successful run:
[12/04/2018 18:07:50.0416] [2] Ending processing Set-ServerComponentState
[12/04/2018 18:07:50.0417] [2] Beginning processing Write-ExchangeSetupLog
[12/04/2018 18:07:50.0420] [2] Install is complete. Server state has been set to Active.
[12/04/2018 18:07:50.0421] [2] Ending processing Write-ExchangeSetupLog
[12/04/2018 18:07:50.0422] [1] Finished executing component tasks.
[12/04/2018 18:07:50.0429] [1] Ending processing Start-PostSetup
[12/04/2018 18:07:50.0524] [0] CurrentResult setupbase.maincore:396: 0
[12/04/2018 18:07:50.0525] [0] End of Setup
[12/04/2018 18:07:50.0525] [0] **********************************************

Considering the cost of this software, M$ really has to be better about error handling, IMO. I have run into silly issues like this way too many times since Exchange 2010.