Text Generation: Deterministic Methods

Overview

When language models predict the next token, they output probability distributions over the entire vocabulary. But how do we convert these probabilities into actual text? This lesson explores deterministic generation methods: approaches that make predictable, reproducible choices about which token a transformer-based language model selects from its output distribution at each step.

Understanding these foundational methods is crucial before we explore more creative sampling techniques. Think of this as learning to walk before we run: mastering predictable text generation gives us the foundation to understand when and why we might want to introduce randomness in language model outputs.

Learning Objectives

After completing this lesson, you will be able to:

  • Understand the fundamental challenge of converting probabilities to text
  • Implement and explain greedy search generation
  • Implement and explain beam search generation
  • Compare the trade-offs between different deterministic approaches
  • Choose the right method for specific use cases
  • Debug common issues in deterministic generation

The Core Challenge: From Probabilities to Words

The Decision Point

Every time a language model generates text, it faces the same fundamental challenge at each step:

Model output: [0.4, 0.3, 0.15, 0.1, 0.05, ...]
Vocabulary:   ["the", "a", "an", "this", "that", ...]
Decision:     Which word do we actually choose?
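One deterministic answer, previewed here and developed in the next section, is simply to take the highest-probability entry (an argmax). A minimal sketch using the toy numbers above (the probabilities and vocabulary are illustrative, not from a real model):

import torch

# Toy next-token distribution from the example above
probs = torch.tensor([0.4, 0.3, 0.15, 0.1, 0.05])
vocab = ["the", "a", "an", "this", "that"]

# Deterministic choice: take the single highest-probability token
choice = torch.argmax(probs).item()
print(vocab[choice])  # -> "the"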

Mental Model: The Path Through a Forest

Imagine text generation as walking through a dense forest where every step forward reveals multiple possible paths:

  • Starting point: Your prompt (the trailhead)
  • Each step: Choosing the next word
  • Path options: All possible next words, each with a "difficulty score" (inverse of probability)
  • Goal: Reach a meaningful destination (complete text)

Different generation strategies represent different hiking philosophies:

  • Greedy: Always take the easiest visible path
  • Beam Search: Scout ahead on multiple promising paths, then choose the best overall route

Greedy Search: The Simplest Approach

Core Concept

Greedy search always chooses the most probable next token. It's called "greedy" because it makes locally optimal choices without considering future consequences.

Algorithm in Plain English (a minimal code sketch follows this list):

  1. Look at all possible next words
  2. Pick the one with the highest probability
  3. Add it to your text
  4. Repeat until you decide to stop
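Here is a minimal sketch of greedy decoding, written in the same style as the beam search implementation later in this lesson. It assumes a Hugging Face-style causal language model and tokenizer are already loaded; the helper name greedy_search is ours, not a library function.

import torch

def greedy_search(model, tokenizer, prompt, max_length=20):
    """Simplified greedy decoding for learning purposes."""
    # Start with the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    tokens = input_ids[0].tolist()

    for _ in range(max_length):
        # Get model predictions for the next token
        outputs = model(input_ids=torch.tensor([tokens]))
        probs = torch.softmax(outputs.logits[0, -1, :], dim=-1)

        # Greedy choice: always take the single most probable token
        next_token = torch.argmax(probs).item()
        tokens.append(next_token)

        # Stop early if the model emits its end-of-sequence token
        if next_token == tokenizer.eos_token_id:
            break

    return tokenizer.decode(tokens)

# Example usage (model and tokenizer assumed to be loaded already)
result = greedy_search(model, tokenizer, "The future of AI")
print(result)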

Visualization: Greedy Path Selection

[Interactive visualization: at each step, greedy search follows the single highest-probability branch.]

Beam Search: Exploring Multiple Paths

Rather than committing to a single token at each step, beam search keeps the top beam_width candidate sequences ("beams") alive in parallel, scores each by its cumulative log probability, and only commits to the best one at the end.

Algorithm Walkthrough

Let's trace through beam search step by step:

Step 1: Start with prompt

Prompt: "The future of AI" Beams: ["The future of AI"]

Step 2: Generate first token

Top candidates: "is", "will", "depends" Beams: ["The future of AI is", "The future of AI will", "The future of AI depends"]

Step 3: Generate second token (from each beam)

From "...is": "bright", "uncertain", "promising" From "...will": "be", "depend", "involve" From "...depends": "on", "heavily", "largely" Keep top 3 overall: 1. "The future of AI is bright" 2. "The future of AI will be" 3. "The future of AI depends on"

Python Implementation

import torch

def beam_search(model, tokenizer, prompt, beam_width=3, max_length=20):
    """
    Simplified beam search implementation for learning purposes.

    Args:
        beam_width: Number of sequences to track simultaneously
        max_length: Maximum tokens to generate
    """
    # Start with the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Each beam: (token_sequence, cumulative_score)
    beams = [(input_ids[0].tolist(), 0.0)]

    for step in range(max_length):
        all_candidates = []

        # For each current beam, generate next token candidates
        for tokens, score in beams:
            # Get model predictions for the next token
            outputs = model(input_ids=torch.tensor([tokens]))
            probs = torch.softmax(outputs.logits[0, -1, :], dim=-1)

            # Get top candidates for this beam
            top_probs, top_indices = torch.topk(probs, beam_width)

            # Create new candidate sequences
            for prob, token_id in zip(top_probs, top_indices):
                new_tokens = tokens + [token_id.item()]
                new_score = score + torch.log(prob).item()
                all_candidates.append((new_tokens, new_score))

        # Keep only the best beam_width candidates overall
        all_candidates.sort(key=lambda x: x[1], reverse=True)
        beams = all_candidates[:beam_width]

    # Return the best sequence
    best_tokens, _ = beams[0]
    return tokenizer.decode(best_tokens)

# Example usage (model and tokenizer assumed to be loaded already)
result = beam_search(model, tokenizer, "The future of AI", beam_width=3)
print(result)
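In practice you would normally call a library implementation rather than this teaching version. Assuming a Hugging Face transformers model and tokenizer are already loaded, the roughly equivalent call looks like this (parameter values are illustrative):

# Rough equivalent using Hugging Face transformers
input_ids = tokenizer.encode("The future of AI", return_tensors="pt")
output_ids = model.generate(
    input_ids,
    num_beams=3,         # beam width
    max_new_tokens=20,   # how many tokens to generate
    early_stopping=True, # stop beams that reach the end-of-sequence token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))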

Beam Search Advantages

✅ Better than greedy because:

  • Avoids local optima: Considers multiple paths before deciding
  • Higher quality output: Often more coherent and fluent
  • Still deterministic: Same input always gives same output
  • Configurable exploration: Adjust beam width for your needs

Comparison example:

Prompt: "The solution to climate change requires" Greedy: "a global effort to reduce carbon emissions." Beam (width=5): "coordinated international cooperation, technological innovation, and fundamental changes in how we produce and consume energy."

Beam Search Limitations

❌ Still has issues:

  • Computational cost: Roughly beam-width times more expensive than greedy (about 5x with a beam width of 5)
  • Generic output: Tends toward "safe" but boring completions
  • Length bias: Favors shorter sequences, since every additional token lowers the cumulative probability. Mitigate with a length penalty greater than 1.0 (see the sketch after this list).
  • Repetition: Can still repeat the same n-grams. Mitigate by blocking repeated n-grams (e.g., a no_repeat_ngram_size setting).
  • Still deterministic: No creativity or surprise
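The length-bias and repetition mitigations mentioned above map directly onto decoding arguments. A hedged sketch, again assuming a Hugging Face transformers model and tokenizer are already loaded (the parameter values are illustrative):

# Beam search with length and repetition mitigations
input_ids = tokenizer.encode("The solution to climate change requires", return_tensors="pt")
output_ids = model.generate(
    input_ids,
    num_beams=5,
    max_new_tokens=40,
    length_penalty=1.2,      # > 1.0 counteracts the bias toward short sequences
    no_repeat_ngram_size=3,  # never repeat the same 3-gram
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))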

Beam Width Selection Guide

Task Type          | Recommended Beam Width | Reasoning
-------------------|------------------------|---------------------------------
Translation        | 4-6                    | Balance quality and computation
Summarization      | 3-5                    | Want coherent but not too generic
Question Answering | 1-3                    | Usually one correct answer
Creative Writing   | Not recommended        | Too conservative
Code Generation    | 2-4                    | Syntax correctness important

Direct Comparison: Greedy vs. Beam Search

Side-by-Side Examples

[Interactive comparison: side-by-side greedy and beam search completions for the same prompts.]