Overview
When language models predict the next token, they output probability distributions over the entire vocabulary. But how do we convert these probabilities into actual text? This lesson explores deterministic generation methods—approaches that make predictable, reproducible choices about which tokens to select from transformer-based language models.
Understanding these foundational methods is crucial before we explore more creative sampling techniques. Think of this as learning to walk before we run: mastering predictable text generation gives us the foundation to understand when and why we might want to introduce randomness in language model outputs.
Learning Objectives
After completing this lesson, you will be able to:
- Understand the fundamental challenge of converting probabilities to text
- Implement and explain greedy search generation
- Implement and explain beam search generation
- Compare the trade-offs between different deterministic approaches
- Choose the right method for specific use cases
- Debug common issues in deterministic generation
The Core Challenge: From Probabilities to Words
The Decision Point
Every time a language model generates text, it faces the same fundamental challenge at each step:
```
Model output: [0.4, 0.3, 0.15, 0.1, 0.05, ...]
Vocabulary:   ["the", "a", "an", "this", "that", ...]
Decision:     Which word do we actually choose?
```
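At its simplest, the decision reduces to picking an index from the distribution. A minimal sketch with an illustrative (made-up) vocabulary and distribution:

```python
# Illustrative only: a toy distribution and vocabulary, not real model output.
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
vocab = ["the", "a", "an", "this", "that"]

# The greedy answer to "which word?" is simply the argmax of the distribution.
best_index = max(range(len(probs)), key=lambda i: probs[i])
print(vocab[best_index])  # -> "the"
```

Every generation strategy in this lesson is, at heart, a different policy for making this one choice repeatedly.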
Mental Model: The Path Through a Forest
Imagine text generation as walking through a dense forest where every step forward reveals multiple possible paths:
- Starting point: Your prompt (the trailhead)
- Each step: Choosing the next word
- Path options: All possible next words, each with a "difficulty score" (inverse of probability)
- Goal: Reach a meaningful destination (complete text)
Different generation strategies represent different hiking philosophies:
- Greedy: Always take the easiest visible path
- Beam Search: Scout ahead on multiple promising paths, then choose the best overall route
Greedy Search: The Simplest Approach
Core Concept
Greedy search always chooses the most probable next token. It's called "greedy" because it makes locally optimal choices without considering future consequences.
Algorithm in Plain English:
- Look at all possible next words
- Pick the one with the highest probability
- Add it to your text
- Repeat until you decide to stop
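The four steps above can be sketched as a short loop. To keep the example self-contained, the "model" here is just an illustrative dictionary mapping the current word to a next-word distribution, standing in for a real language model:

```python
# Toy stand-in for a language model: current word -> next-word distribution.
# All words and probabilities are illustrative.
toy_model = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "sat": {"<end>": 1.0},
    "dog": {"<end>": 1.0},
}

def greedy_generate(model, max_steps=10):
    tokens = ["<start>"]
    for _ in range(max_steps):
        dist = model[tokens[-1]]
        # Step 1-2: look at all candidates, pick the highest-probability one.
        next_token = max(dist, key=dist.get)
        if next_token == "<end>":  # Step 4: stop condition
            break
        tokens.append(next_token)  # Step 3: add it to the text
    return tokens[1:]

print(greedy_generate(toy_model))  # -> ['the', 'cat', 'sat']
```

Note that at the "the" step the model commits to "cat" (0.5) and never reconsiders "dog", even if "dog" would have led to a better overall sentence; this is exactly the local-optimum weakness beam search addresses.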
Beam Search: Exploring Multiple Paths
Instead of committing to the single best token at every step, beam search keeps the top `beam_width` candidate sequences ("beams") and extends each of them in parallel, pruning back to the best `beam_width` after every step.
Algorithm Walkthrough
Let's trace through beam search step by step with a beam width of 3:
Step 1: Start with prompt
```
Prompt: "The future of AI"
Beams:  ["The future of AI"]
```
Step 2: Generate first token
```
Top candidates: "is", "will", "depends"
Beams: ["The future of AI is",
        "The future of AI will",
        "The future of AI depends"]
```
Step 3: Generate second token (from each beam)
```
From "...is":      "bright", "uncertain", "promising"
From "...will":    "be", "depend", "involve"
From "...depends": "on", "heavily", "largely"

Keep top 3 overall:
1. "The future of AI is bright"
2. "The future of AI will be"
3. "The future of AI depends on"
```
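The "keep top 3 overall" step ranks beams by cumulative score. Because multiplying many probabilities quickly underflows toward zero, implementations sum log-probabilities instead, which preserves the ranking. A quick sketch with illustrative numbers:

```python
import math

# Illustrative per-step probabilities for one beam, e.g. "is" -> "bright" -> ...
token_probs = [0.4, 0.5, 0.7]

# Multiplying probabilities and summing log-probabilities rank beams identically,
# but the log form stays numerically stable on long sequences.
product = 1.0
for p in token_probs:
    product *= p
score = sum(math.log(p) for p in token_probs)

print(round(product, 3))  # joint probability of the beam
print(round(score, 3))    # cumulative log-probability (always <= 0)
```

This cumulative log-probability is exactly the `score` tracked per beam in the implementation below.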
Python Implementation
```python
import torch

def beam_search(model, tokenizer, prompt, beam_width=3, max_length=20):
    """
    Simplified beam search implementation for learning purposes.

    Args:
        beam_width: Number of sequences to track simultaneously
        max_length: Maximum tokens to generate
    """
    # Start with the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Each beam: (token_sequence, cumulative_score)
    beams = [(input_ids[0].tolist(), 0.0)]

    for step in range(max_length):
        all_candidates = []

        # For each current beam, generate next-token candidates
        for tokens, score in beams:
            # Get model predictions for the next token
            with torch.no_grad():
                outputs = model(input_ids=torch.tensor([tokens]))
            probs = torch.softmax(outputs.logits[0, -1, :], dim=-1)

            # Get top candidates for this beam
            top_probs, top_indices = torch.topk(probs, beam_width)

            # Create new candidate sequences
            for prob, token_id in zip(top_probs, top_indices):
                new_tokens = tokens + [token_id.item()]
                new_score = score + torch.log(prob).item()
                all_candidates.append((new_tokens, new_score))

        # Keep only the best beam_width candidates overall
        all_candidates.sort(key=lambda x: x[1], reverse=True)
        beams = all_candidates[:beam_width]

    # Return the best sequence
    best_tokens, _ = beams[0]
    return tokenizer.decode(best_tokens)

# Example usage
result = beam_search(model, tokenizer, "The future of AI", beam_width=3)
print(result)
```
Beam Search Advantages
✅ Better than greedy because:
- Avoids local optima: Considers multiple paths before deciding
- Higher quality output: Often more coherent and fluent
- Still deterministic: Same input always gives same output
- Configurable exploration: Adjust beam width for your needs
Comparison example:
```
Prompt: "The solution to climate change requires"

Greedy:         "a global effort to reduce carbon emissions."
Beam (width=5): "coordinated international cooperation, technological
                 innovation, and fundamental changes in how we produce
                 and consume energy."
```
Beam Search Limitations
❌ Still has issues:
- Computational cost: Scales with beam width; beam width 5 is roughly 5x more expensive than greedy
- Generic output: Tends toward "safe" but boring completions
- Length bias: Favors shorter sequences, since every additional token multiplies in another probability below 1. Mitigate with a length penalty greater than 1.0 (e.g., `length_penalty` in Hugging Face `generate`)
- Repetition: Can still repeat n-grams. Mitigate with n-gram blocking (e.g., `no_repeat_ngram_size`)
- Still deterministic: No creativity or surprise
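The length-bias issue above is easy to see in a few lines. A common mitigation is to divide each beam's summed log-probability by its length raised to a penalty exponent; the `alpha` exponent and log-probabilities below are illustrative, not from a real model:

```python
# Sketch of length normalization for beam scores. With alpha=1.0, a beam's
# score becomes its average per-token log-probability.
def normalized_score(logprobs, alpha=1.0):
    return sum(logprobs) / (len(logprobs) ** alpha)

short = [-0.2, -0.2]             # 2 tokens, raw sum -0.4
long = [-0.2, -0.2, -0.2, -0.2]  # 4 tokens, raw sum -0.8

# Raw sums favor the short beam even though both are equally "confident"
# per token; normalized scores treat them equally.
print(sum(short), sum(long))
print(normalized_score(short), normalized_score(long))
```

Raising `alpha` above 1.0 tilts the comparison further toward longer sequences, which is why length penalties greater than 1.0 counteract beam search's preference for short output.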
Beam Width Selection Guide
| Task Type | Recommended Beam Width | Reasoning |
|---|---|---|
| Translation | 4-6 | Balance quality and computation |
| Summarization | 3-5 | Want coherent but not too generic |
| Question Answering | 1-3 | Usually one correct answer |
| Creative Writing | Not recommended | Too conservative |
| Code Generation | 2-4 | Syntax correctness important |