An LLM is a language model trained on large text corpora to generate, summarize, and transform content.
# Tiny toy language model with bigram counts
text = "ai helps teams make better decisions with data"
words = text.split()
bigrams = {}
for i in range(len(words) - 1):
    pair = (words[i], words[i + 1])
    bigrams[pair] = bigrams.get(pair, 0) + 1
print("Bigrams:")
for pair, count in sorted(bigrams.items()):
    print(f"{pair}: {count}")
Step by step
Split text into tokens.
Count each adjacent token pair.
Use counts to estimate next-word likelihood.
This is the conceptual bridge to autoregressive LLM decoding, where each next token is predicted from the tokens that precede it.
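The third step above (turning counts into next-word likelihoods) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the toy sentence is an assumption chosen so that one word has two different continuations, and `next_word_probs` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Assumed toy text (not the sentence used above): "helps" has two continuations.
text = "data helps teams and data helps models"
words = text.split()

# Count how often each word follows each context word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follow_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Hypothetical helper: relative frequency of each word seen after `prev`."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

print(next_word_probs("helps"))  # {'teams': 0.5, 'models': 0.5}
print(next_word_probs("data"))   # {'helps': 1.0}
```

Dividing each count by the total for its context word gives a conditional probability estimate. An autoregressive decoder would sample or pick the most likely word from this distribution and then repeat with the new word as context.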
A simple PyTorch example with linear layers that inspects the input and output tensors.
# Minimal PyTorch forward pass
try:
    import torch
except Exception as e:
    print("PyTorch not available:", e)
else:
    torch.manual_seed(0)
    x = torch.randn(4, 3)
    model = torch.nn.Sequential(
        torch.nn.Linear(3, 4),
        torch.nn.ReLU(),
        torch.nn.Linear(4, 1),
    )
    y = model(x)
    print("Input shape:", tuple(x.shape))
    print("Output shape:", tuple(y.shape))
    print("First output:", float(y[0, 0]))
Step by step
Create input tensor x with shape [batch, features].
Apply linear layer, non-linearity, then output layer.
Inspect output shape and first prediction value.
Math behind
y = W2 * ReLU(W1 * x + b1) + b2 (for a single input vector x; the model applies this to each row of the batch)
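To make the formula concrete, the sketch below evaluates it by hand for one input vector in plain Python. The weights and biases are made-up values chosen for easy arithmetic (with 2 hidden units rather than 4), not the randomly initialized parameters of the model above, and `relu` and `affine` are hypothetical helper names.

```python
def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]

def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows), vector x, and bias b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Illustrative parameters: 3 inputs -> 2 hidden units -> 1 output
W1 = [[1.0, 0.0, -1.0],
      [0.5, 0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[2.0, -1.0]]
b2 = [0.5]

x = [1.0, 2.0, 3.0]
h = relu(affine(W1, x, b1))   # hidden activations after the non-linearity
y = affine(W2, h, b2)         # final output
print("hidden:", h)           # hidden: [0.0, 2.0]
print("output:", y)           # output: [-1.5]
```

The first hidden unit computes -2.0 before the ReLU and is clipped to 0.0, which is why only the second unit contributes to the output.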
Expected output:
Input shape: (4, 3)
Output shape: (4, 1)
First output: numeric value
TensorFlow (minimal Keras)
Simple Sequential model example to understand batch inference.
# Minimal TensorFlow/Keras forward pass
try:
    import tensorflow as tf
except Exception as e:
    print("TensorFlow not available:", e)
else:
    tf.random.set_seed(0)
    x = tf.random.normal((4, 3))
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    y = model(x)
    print("Input shape:", x.shape)
    print("Output shape:", y.shape)
    print("First output:", float(y[0, 0]))
Step by step
Build a Sequential model with Dense layers.
Feed batch tensor and run forward inference.
Read tensor shape and sample output.
Math behind
Dense(x) = activation(x * W + b)
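A sketch of the same formula applied to a whole batch in plain Python, to show that each row is transformed independently and that the output keeps one row per input row. The weights are made-up values, and `dense` is a hypothetical helper written for illustration, not the Keras layer itself.

```python
def relu(v):
    return max(0.0, v)

def dense(X, W, b, activation=None):
    """Row-wise x @ W + b for a batch X of shape (batch, in) and W of shape (in, out)."""
    out = []
    for x in X:
        row = [sum(x[i] * W[i][j] for i in range(len(W))) + b[j]
               for j in range(len(b))]
        out.append([activation(v) for v in row] if activation else row)
    return out

X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]   # batch of 2 rows, 3 features each
W = [[1.0, -1.0],
     [2.0, 0.0],
     [0.0, 1.0]]        # shape (3, 2)
b = [0.5, -0.5]

Y = dense(X, W, b, activation=relu)
print(Y)                                 # [[1.5, 0.5], [2.5, 0.0]]
print("shape:", (len(Y), len(Y[0])))     # shape: (2, 2)
```

The batch dimension passes through unchanged and only the feature dimension changes, which matches the (4, 3) to (4, 1) shapes reported by the models above.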
Expected output:
Input shape: (4, 3)
Output shape: (4, 1)
First output: numeric value
Tiny teaching LLM
Very small language model (bigrams) to teach tokens, logits, and probabilities.
# Tiny LLM teaching model (char-level bigram logits)
try:
    import torch
except Exception as e:
    print("PyTorch not available:", e)
else:
    text = "hello llm"
    vocab = sorted(set(text))
    stoi = {c: i for i, c in enumerate(vocab)}
    itos = {i: c for c, i in stoi.items()}
    ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
    # Row i of the embedding table holds the logits for the token that follows token i.
    # The weights are randomly initialized and untrained, so the probabilities are arbitrary.
    model = torch.nn.Embedding(len(vocab), len(vocab))
    logits = model(ids)
    probs = torch.softmax(logits[-1], dim=-1)
    top_p, top_i = torch.topk(probs, k=min(3, len(vocab)))
    print("Vocab:", vocab)
    print("Last input token:", repr(itos[int(ids[-1])]))
    print("Top next-token probabilities:")
    for p, i in zip(top_p, top_i):
        print(itos[int(i)], round(float(p), 4))
Step by step
Build a character vocabulary and token IDs.
Use embedding table as learnable logits producer.
Apply softmax and read top-k next-token probabilities.
Math behind
p(next) = softmax(logits_last_token)
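The softmax step can be written out without any library. This is a minimal sketch with made-up logits; the function below and the max-subtraction trick for numerical stability are additions for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                      # illustrative scores for three tokens
probs = softmax(logits)
print([round(p, 4) for p in probs])           # [0.6652, 0.2447, 0.09]
```

Larger logits receive exponentially more probability mass, so a gap of 1.0 between two logits means roughly a 2.7x ratio between their probabilities.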
Expected output:
Vocab: [...]
Last input token: ...
Top next-token probabilities:
(three token-probability lines; values differ between runs because the weights are random and no seed is set)
RAG (Retrieval-Augmented Generation)
RAG combines retrieval + generation to produce more grounded and auditable answers.
# Tiny RAG demo (keyword overlap retrieval + prompt assembly)
docs = [
    "LGPD requires lawful basis, consent management, and data subject rights.",
    "Dedicated AI servers provide tenant isolation and private GPU workloads.",
    "RAG improves answer grounding by injecting retrieved context into prompts.",
]
query = "How to run AI servers with LGPD compliance?"
q_terms = set(w.strip('.,!?').lower() for w in query.split())
scored = []
for d in docs:
    d_terms = set(w.strip('.,!?').lower() for w in d.split())
    score = len(q_terms & d_terms)
    scored.append((score, d))
scored.sort(reverse=True, key=lambda x: x[0])
top_doc = scored[0][1]
prompt = f"Question: {query}\nContext: {top_doc}\nAnswer:"
print("Query:", query)
print("Top document:", top_doc)
print("Prompt preview:", prompt[:120] + "...")
Step by step
1) Index a small knowledge base (docs).
2) Retrieve the best context for the query.
3) Build the final prompt with question + retrieved context.
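The three steps can also be packaged as small functions. The sketch below is one possible arrangement, not a production retriever: `tokenize`, `retrieve`, and `build_prompt` are hypothetical helper names, the documents are shortened stand-ins, and the `k` parameter (returning more than one document) is an extension beyond the example above.

```python
def tokenize(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve(query, docs, k=2):
    """Return up to k docs ranked by keyword overlap; docs with no overlap are dropped."""
    q_terms = tokenize(query)
    scored = [(len(q_terms & tokenize(d)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def build_prompt(query, contexts):
    """Assemble the question and the retrieved contexts into one prompt string."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Question: {query}\nContext:\n{context_block}\nAnswer:"

# Step 1: a tiny knowledge base (shortened, illustrative documents)
docs = [
    "LGPD requires lawful basis and consent management.",
    "Dedicated AI servers provide tenant isolation.",
    "RAG injects retrieved context into prompts.",
]
query = "How to run AI servers with LGPD compliance?"

top = retrieve(query, docs, k=2)      # Step 2: retrieve
print(build_prompt(query, top))       # Step 3: assemble the prompt
```

Dropping zero-overlap documents keeps unrelated text out of the prompt. Keyword overlap is only a stand-in here; real systems usually rank with embeddings or a search index.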