An LLM (large language model) is a language model trained on large amounts of text to generate, summarize, and transform content.
# Tiny toy language model with bigram counts
text = "ai helps teams make better decisions with data"
words = text.split()
bigrams = {}
for i in range(len(words) - 1):
    pair = (words[i], words[i + 1])
    bigrams[pair] = bigrams.get(pair, 0) + 1
print("Bigrams:")
for pair, count in sorted(bigrams.items()):
    print(f"{pair}: {count}")
Step by step
Split text into tokens.
Count each adjacent token pair.
Use counts to estimate next-word likelihood.
This is the conceptual bridge to autoregressive LLM decoding.
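The last two steps can be made concrete by normalizing the counts and then decoding one word at a time. The following is a minimal sketch, not part of the original example: the helper names `next_word_probs` and `greedy_generate` are hypothetical, and the toy sentence is an assumption chosen so that one word ("ai") has two different successors.

```python
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration): "ai" is followed by
# "helps" twice and by "needs" once.
text = "ai helps teams and ai helps people and ai needs data"
words = text.split()

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follow_counts[prev][nxt] += 1


def next_word_probs(prev):
    """Estimate P(next | prev) from relative bigram frequencies."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def greedy_generate(start, max_new_words=4):
    """Autoregressive decoding: repeatedly append the most likely next word."""
    out = [start]
    for _ in range(max_new_words):
        probs = next_word_probs(out[-1])
        if not probs:  # no observed successor, so stop early
            break
        out.append(max(probs, key=probs.get))
    return out


probs_ai = next_word_probs("ai")
print(probs_ai)
print(" ".join(greedy_generate("ai")))
```

Each generated word becomes the context for the next prediction, which is the same loop an LLM runs, only with a far richer model in place of the count table.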
A very small language model (bigrams) for teaching tokens, logits, and probabilities.
# Tiny LLM teaching model (char-level bigram logits)
try:
    import torch
except Exception as e:
    print("PyTorch not available:", e)
else:
    text = "hello llm"
    # Character vocabulary and lookup tables in both directions
    vocab = sorted(set(text))
    stoi = {c: i for i, c in enumerate(vocab)}
    itos = {i: c for c, i in stoi.items()}
    ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
    # Each row of the embedding table holds the next-token logits for one input token.
    # The table is untrained here, so the printed probabilities are effectively random.
    model = torch.nn.Embedding(len(vocab), len(vocab))
    logits = model(ids)
    probs = torch.softmax(logits[-1], dim=-1)
    top_p, top_i = torch.topk(probs, k=min(3, len(vocab)))
    print("Vocab:", vocab)
    print("Last input token:", repr(itos[int(ids[-1])]))
    print("Top next-token probabilities:")
    for p, i in zip(top_p, top_i):
        print(itos[int(i)], round(float(p), 4))
Step by step
Build a character vocabulary and token IDs.
Use an embedding table as a learnable producer of logits.
Apply softmax and read top-k next-token probabilities.
The math behind it
p(next) = softmax(logits_last_token)
Expected output:
Vocab: [...]
Last input token: ...
Top next-token probabilities: 3 token-probability lines
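The softmax step in the formula above can also be checked by hand without PyTorch. This is a minimal sketch using only the standard library; the three logit values are hypothetical and chosen purely for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-token vocabulary.
logits_last_token = [2.0, 1.0, 0.0]
probs = softmax(logits_last_token)
print([round(p, 4) for p in probs])
```

The ordering of the logits is preserved: the largest logit gets the largest probability, and a difference of 1.0 between two logits corresponds to a probability ratio of e.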
RAG (Retrieval-Augmented Generation)
RAG combines retrieval and generation to produce more accurate and auditable answers.
# Tiny RAG demo (keyword overlap retrieval + prompt assembly)
docs = [
    "LGPD requires lawful basis, consent management, and data subject rights.",
    "Dedicated AI servers provide tenant isolation and private GPU workloads.",
    "RAG improves answer grounding by injecting retrieved context into prompts.",
]
query = "How to run AI servers with LGPD compliance?"
q_terms = set(w.strip('.,!?').lower() for w in query.split())
scored = []
for d in docs:
    d_terms = set(w.strip('.,!?').lower() for w in d.split())
    score = len(q_terms & d_terms)
    scored.append((score, d))
scored.sort(reverse=True, key=lambda x: x[0])
top_doc = scored[0][1]
prompt = f"Question: {query}\nContext: {top_doc}\nAnswer:"
print("Query:", query)
print("Top document:", top_doc)
print("Prompt preview:", prompt[:120] + "...")
Step by step
1) Index a small knowledge base (documents).
2) Retrieve the best context for the query.
3) Build the final prompt from the question plus the retrieved context.
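The three steps above can be sketched with an explicit index, so that retrieval looks up terms instead of rescanning every document. This is an illustrative variant under stated assumptions, not the original demo: the helper names `tokenize`, `retrieve`, and `build_prompt` are hypothetical, and the scoring is still plain keyword overlap rather than embedding similarity.

```python
from collections import defaultdict

docs = [
    "LGPD requires lawful basis, consent management, and data subject rights.",
    "Dedicated AI servers provide tenant isolation and private GPU workloads.",
    "RAG improves answer grounding by injecting retrieved context into prompts.",
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,!?").lower() for w in text.split()}


# Step 1: index. Map each term to the IDs of the documents containing it.
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for term in tokenize(doc):
        index[term].add(doc_id)


def retrieve(query, k=2):
    """Step 2: rank documents by number of shared terms; return the top k."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [docs[d] for d in ranked[:k]]


def build_prompt(query, contexts):
    """Step 3: assemble the prompt from the question and retrieved context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Question: {query}\nContext:\n{context_block}\nAnswer:"


query = "How to run AI servers with LGPD compliance?"
top_docs = retrieve(query)
print(build_prompt(query, top_docs))
```

Returning more than one document lets the prompt cover both the server and the LGPD parts of the question. Documents with no overlapping terms are never returned, so an unrelated query yields an empty context.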