An LLM is a language model trained on a large volume of text to generate, summarize, and transform content.
```python
# Tiny toy language model with bigram counts
text = "ai helps teams make better decisions with data"
words = text.split()
bigrams = {}
for i in range(len(words) - 1):
    pair = (words[i], words[i + 1])
    bigrams[pair] = bigrams.get(pair, 0) + 1

print("Bigrams:")
for pair, count in sorted(bigrams.items()):
    print(f"{pair}: {count}")
```
Step by step
1) Split the text into tokens.
2) Count each adjacent token pair.
3) Use the counts to estimate next-word likelihood.
This is the conceptual bridge to autoregressive LLM decoding.
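The next-word likelihoods can be read directly off the bigram table; a minimal sketch, rebuilding the same counts as above (`predict_next` is an illustrative helper name, not from the original):

```python
# Toy next-word prediction from bigram counts
text = "ai helps teams make better decisions with data"
words = text.split()
bigrams = {}
for i in range(len(words) - 1):
    pair = (words[i], words[i + 1])
    bigrams[pair] = bigrams.get(pair, 0) + 1

def predict_next(word):
    # Gather counts of words that followed `word`, normalize to probabilities.
    followers = {b: c for (a, b), c in bigrams.items() if a == word}
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

print(predict_next("helps"))  # {'teams': 1.0} -- each pair occurs once here
```

In this tiny corpus every bigram occurs exactly once, so each word has a single follower with probability 1.0; a real model smooths and generalizes these estimates.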
A simple example with linear layers to see the input and output tensors.
```python
# Minimal PyTorch forward pass
try:
    import torch
except Exception as e:
    print("PyTorch not available:", e)
else:
    torch.manual_seed(0)
    x = torch.randn(4, 3)
    model = torch.nn.Sequential(
        torch.nn.Linear(3, 4),
        torch.nn.ReLU(),
        torch.nn.Linear(4, 1),
    )
    y = model(x)
    print("Input shape:", tuple(x.shape))
    print("Output shape:", tuple(y.shape))
    print("First output:", float(y[0, 0]))
```
Step by step
1) Create an input tensor x with shape [batch, features].
2) Apply a linear layer, a non-linearity, then the output layer.
3) Inspect the output shape and the first prediction value.
Math behind it
y = W2 * ReLU(W1 * x + b1) + b2
Expected output:
Input shape: (4, 3)
Output shape: (4, 1)
First output: a numeric value
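The formula can be checked in plain NumPy, with no framework involved; a sketch assuming NumPy is installed (the weights are random stand-ins, not trained values):

```python
# NumPy sketch of y = W2 * ReLU(W1 * x + b1) + b2 for one input vector
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)                    # one sample with 3 features
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)

h = np.maximum(W1 @ x + b1, 0.0)              # linear layer followed by ReLU
y = W2 @ h + b2                               # output layer
print("Hidden shape:", h.shape)               # (4,)
print("Output shape:", y.shape)               # (1,)
```

The shapes mirror the PyTorch example: 3 input features, 4 hidden units, 1 output.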
TensorFlow (minimal Keras)
A simple example with a Sequential model to understand batched inference.
```python
# Minimal TensorFlow/Keras forward pass
try:
    import tensorflow as tf
except Exception as e:
    print("TensorFlow not available:", e)
else:
    tf.random.set_seed(0)
    x = tf.random.normal((4, 3))
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    y = model(x)
    print("Input shape:", x.shape)
    print("Output shape:", y.shape)
    print("First output:", float(y[0, 0]))
```
Step by step
1) Build a Sequential model with Dense layers.
2) Feed a batch tensor and run a forward inference pass.
3) Read the tensor shapes and a sample output value.
Math behind it
Dense(x) = activation(x * W + b)
Expected output:
Input shape: (4, 3)
Output shape: (4, 1)
First output: a numeric value
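The Dense formula also translates directly to NumPy; note the Keras row-vector convention, where the weight matrix has shape (in, out) and the batch multiplies from the left (a sketch with random stand-in weights, assuming NumPy is installed):

```python
# NumPy sketch of Keras Dense: activation(x @ W + b), with W of shape (in, out)
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))               # batch of 4 samples, 3 features each
W1, b1 = rng.standard_normal((3, 4)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)

h = np.maximum(x @ W1 + b1, 0.0)              # Dense(4, activation="relu")
y = h @ W2 + b2                               # Dense(1)
print("Output shape:", y.shape)               # (4, 1)
```

This is the same computation as the PyTorch example, just with the transposed weight-layout convention that Keras uses.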
Pedagogical mini LLM
A very small (bigram) language model to teach tokens, logits, and probabilities. The embedding weights below are random and untrained; the point is the mechanics, not the predictions.
```python
# Tiny LLM teaching model (char-level bigram logits)
try:
    import torch
except Exception as e:
    print("PyTorch not available:", e)
else:
    text = "hello llm"
    vocab = sorted(set(text))
    stoi = {c: i for i, c in enumerate(vocab)}
    itos = {i: c for c, i in stoi.items()}
    ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
    model = torch.nn.Embedding(len(vocab), len(vocab))
    logits = model(ids)
    probs = torch.softmax(logits[-1], dim=-1)
    top_p, top_i = torch.topk(probs, k=min(3, len(vocab)))
    print("Vocab:", vocab)
    print("Last input token:", repr(itos[int(ids[-1])]))
    print("Top next-token probabilities:")
    for p, i in zip(top_p, top_i):
        print(itos[int(i)], round(float(p), 4))
```
Step by step
1) Build a character vocabulary and token IDs.
2) Use an embedding table as a learnable logits producer.
3) Apply softmax and read the top-k next-token probabilities.
Math behind it
p(next) = softmax(logits_last_token)
Expected output:
Vocab: [...]
Last input token: ...
Top next-token probabilities: three token-probability lines
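The softmax step itself needs no framework; a stdlib-only sketch over a toy logits vector shows exactly what p(next) = softmax(logits) computes:

```python
# Manual softmax: p_i = exp(z_i) / sum_j exp(z_j)
import math

logits = [2.0, 1.0, 0.1]                      # toy scores for three tokens
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]
print([round(p, 4) for p in probs])           # sums to 1, order preserved
```

Softmax turns arbitrary real-valued scores into a proper probability distribution while preserving their ranking, which is why the top-k logits are also the top-k probabilities.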
RAG (Retrieval-Augmented Generation)
RAG combines knowledge retrieval with generation to produce more accurate, auditable answers.
```python
# Tiny RAG demo (keyword overlap retrieval + prompt assembly)
docs = [
    "LGPD requires lawful basis, consent management, and data subject rights.",
    "Dedicated AI servers provide tenant isolation and private GPU workloads.",
    "RAG improves answer grounding by injecting retrieved context into prompts.",
]
query = "How to run AI servers with LGPD compliance?"
q_terms = set(w.strip('.,!?').lower() for w in query.split())
scored = []
for d in docs:
    d_terms = set(w.strip('.,!?').lower() for w in d.split())
    score = len(q_terms & d_terms)
    scored.append((score, d))
scored.sort(reverse=True, key=lambda x: x[0])
top_doc = scored[0][1]
prompt = f"Question: {query}\nContext: {top_doc}\nAnswer:"
print("Query:", query)
print("Top document:", top_doc)
print("Prompt preview:", prompt[:120] + "...")
```
Step by step
1) Index a small knowledge base (docs).
2) Retrieve the best-matching context for the question.
3) Assemble the final prompt from the question plus the retrieved context.
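As a natural next step, the keyword-overlap scorer can be swapped for a slightly stronger bag-of-words cosine similarity, still with only the standard library; a sketch (the `bow` and `cosine` helper names are illustrative, not from the original):

```python
# Sketch: bag-of-words cosine similarity as a drop-in retrieval scorer
from collections import Counter
import math

def bow(text):
    # Bag-of-words term counts, with the same punctuation stripping as above.
    return Counter(w.strip('.,!?').lower() for w in text.split())

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "LGPD requires lawful basis, consent management, and data subject rights.",
    "Dedicated AI servers provide tenant isolation and private GPU workloads.",
    "RAG improves answer grounding by injecting retrieved context into prompts.",
]
query = "How to run AI servers with LGPD compliance?"
best = max(docs, key=lambda d: cosine(bow(query), bow(d)))
print("Best document:", best)
```

Unlike raw overlap counts, cosine similarity normalizes by document length, so longer documents do not win just by containing more words; production RAG systems go further and use dense embeddings for semantic matching.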