Text Generation: Deterministic Methods

Overview

When language models predict the next token, they output probability distributions over the entire vocabulary. But how do we convert these probabilities into actual text? This lesson explores deterministic generation methods—approaches that make predictable, reproducible choices about which tokens to select from transformer-based language models.

Understanding these foundational methods is crucial before we explore more creative sampling techniques. Think of this as learning to walk before we run: mastering predictable text generation gives us the foundation to understand when and why we might want to introduce randomness in language model outputs.

Learning Objectives

After completing this lesson, you will be able to:

  • Understand the fundamental challenge of converting probabilities to text
  • Implement and explain greedy search generation
  • Implement and explain beam search generation
  • Compare the trade-offs between different deterministic approaches
  • Choose the right method for specific use cases
  • Debug common issues in deterministic generation

The Core Challenge: From Probabilities to Words

The Decision Point

Every time a language model generates text, it faces the same fundamental challenge at each step:

Model output: [0.4, 0.3, 0.15, 0.1, 0.05, ...]
Vocabulary:   ["the", "a", "an", "this", "that", ...]
Decision:     Which word do we actually choose?
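
To make the decision point concrete, here is a minimal sketch using the toy numbers above; the five-word vocabulary and the probabilities are illustrative, not real model output.

import torch

# Toy values matching the example above -- not real model output
probs = torch.tensor([0.4, 0.3, 0.15, 0.1, 0.05])
vocab = ["the", "a", "an", "this", "that"]

# One possible decision rule: always take the highest-probability token
choice = vocab[torch.argmax(probs).item()]
print(choice)  # -> "the"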

Mental Model: The Path Through a Forest

Imagine text generation as walking through a dense forest where every step forward reveals multiple possible paths:

  • Starting point: Your prompt (the trailhead)
  • Each step: Choosing the next word
  • Path options: All possible next words, each with a "difficulty score" (inverse of probability)
  • Goal: Reach a meaningful destination (complete text)

Different generation strategies represent different hiking philosophies:

  • Greedy: Always take the easiest visible path
  • Beam Search: Scout ahead on multiple promising paths, then choose the best overall route

Greedy Search: The Simplest Approach

Core Concept

Greedy search always chooses the most probable next token. It's called "greedy" because it makes locally optimal choices without considering future consequences.
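
A small worked example, with made-up numbers, shows how locally optimal choices can miss the globally best sequence:

# Hypothetical two-step probabilities (illustrative, not from a real model):
#   Step 1: token A has p = 0.5, token B has p = 0.4
#   Step 2: the best continuation after A has p = 0.1;
#           the best continuation after B has p = 0.9

greedy_path = 0.5 * 0.1  # greedy commits to A first -> joint probability 0.05
better_path = 0.4 * 0.9  # starting with B instead   -> joint probability 0.36

print(greedy_path < better_path)  # True: greedy misses the better sequence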

Algorithm in Plain English:

  1. Look at all possible next words
  2. Pick the one with the highest probability
  3. Add it to your text
  4. Repeat until you decide to stop

Python Implementation

import torch

def greedy_search(model, tokenizer, prompt, max_length=50):
    """
    Generate text using greedy search (always pick the most likely token).

    Args:
        model: The language model
        tokenizer: Tokenizer for encoding/decoding
        prompt: Starting text
        max_length: Maximum number of tokens to generate
    """
    # Convert the prompt to token IDs
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    generated = input_ids[0].tolist()

    for step in range(max_length):
        # Get model predictions for the next token
        outputs = model(input_ids=torch.tensor([generated]))
        next_token_logits = outputs.logits[0, -1, :]  # Logits at the last position

        # Choose the token with the highest probability
        next_token_id = torch.argmax(next_token_logits).item()

        # Add the chosen token to the sequence
        generated.append(next_token_id)

        # Stop if we generate the end-of-sequence token
        if next_token_id == tokenizer.eos_token_id:
            break

    return tokenizer.decode(generated)

# Example usage
prompt = "The most important breakthrough in AI is"
result = greedy_search(model, tokenizer, prompt)
print(f"Input: {prompt}")
print(f"Output: {result}")

When Greedy Search Works Well

✅ Excellent for:

  • Factual question answering: "What is the capital of France?" → "Paris"
  • Short completions: Where there's usually one best answer
  • Translation of common phrases: Well-established translations
  • Code completion: Where syntax correctness matters most

Example outputs where greedy excels:

Input: "The chemical symbol for water is" Greedy: "H2O, consisting of two hydrogen atoms and one oxygen atom." ✅ Perfect! Factual and correct. Input: "To install numpy, run" Greedy: "pip install numpy" ✅ Exactly what we want!

When Greedy Search Fails

❌ Problems arise with:

  • Creative writing: Produces boring, predictable text
  • Long text generation: Gets stuck in repetitive loops
  • Open-ended questions: Gives generic responses

Example where greedy fails:

Input: "Once upon a time, in a magical forest" Greedy: "there was a little girl who was walking through the forest. She was walking through the forest when she saw a little girl walking through the forest..." ❌ Repetitive and boring!

The Repetition Problem

Greedy search often gets trapped in loops because:

  1. Common words have high probability
  2. Once generated, they increase probability of similar patterns
  3. No mechanism to avoid repetition

Demonstration:

# This often happens with greedy search
prompt = "I think that"
result = greedy_search(model, tokenizer, prompt, max_length=30)
print(result)
# Greedy might produce: "I think that I think that I think that..."

Beam Search: Exploring Multiple Paths

Core Concept

Beam search improves on greedy search by considering multiple possible sequences simultaneously. Instead of committing to one path, it keeps track of the top-K most promising sequences (called "beams").

Key Innovation: Look ahead before making final decisions.

The Beam Width Parameter

  • Beam width = 1: Equivalent to greedy search
  • Beam width = 3: Keep track of 3 best sequences
  • Beam width = 5: Keep track of 5 best sequences
  • Higher beam width: More exploration, more computation
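
As a quick sketch of what this parameter changes in practice, the Hugging Face pipeline set up later in this lesson exposes the beam width as num_beams; the prompt and widths here are illustrative.

# Assumes the `generator` pipeline and `tokenizer` from the
# "Practical Implementation with Hugging Face" section below
for width in [1, 3, 5]:  # num_beams=1 is exactly greedy search
    output = generator(
        "The future of AI",
        max_length=30,
        num_beams=width,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(f"num_beams={width}: {output[0]['generated_text']}")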

Algorithm Walkthrough

Let's trace through beam search step by step:

Step 1: Start with prompt

Prompt: "The future of AI" Beams: ["The future of AI"]

Step 2: Generate first token

Top candidates: "is", "will", "depends"
Beams: ["The future of AI is",
        "The future of AI will",
        "The future of AI depends"]

Step 3: Generate second token (from each beam)

From "...is": "bright", "uncertain", "promising" From "...will": "be", "depend", "involve" From "...depends": "on", "heavily", "largely" Keep top 3 overall: 1. "The future of AI is bright" 2. "The future of AI will be" 3. "The future of AI depends on"

Python Implementation

import torch

def beam_search(model, tokenizer, prompt, beam_width=3, max_length=20):
    """
    Simplified beam search implementation for learning purposes.

    Args:
        beam_width: Number of sequences to track simultaneously
        max_length: Maximum number of tokens to generate
    """
    # Start with the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Each beam: (token_sequence, cumulative_score)
    beams = [(input_ids[0].tolist(), 0.0)]

    for step in range(max_length):
        all_candidates = []

        # For each current beam, generate next-token candidates
        for tokens, score in beams:
            # Get model predictions for the next token
            outputs = model(input_ids=torch.tensor([tokens]))
            probs = torch.softmax(outputs.logits[0, -1, :], dim=-1)

            # Get the top candidates for this beam
            top_probs, top_indices = torch.topk(probs, beam_width)

            # Create new candidate sequences
            for prob, token_id in zip(top_probs, top_indices):
                new_tokens = tokens + [token_id.item()]
                # Scores are summed log-probabilities (higher is better)
                new_score = score + torch.log(prob).item()
                all_candidates.append((new_tokens, new_score))

        # Keep only the best beam_width candidates overall
        all_candidates.sort(key=lambda x: x[1], reverse=True)
        beams = all_candidates[:beam_width]

    # Return the best sequence
    best_tokens, _ = beams[0]
    return tokenizer.decode(best_tokens)

# Example usage
result = beam_search(model, tokenizer, "The future of AI", beam_width=3)
print(result)

Beam Search Advantages

✅ Better than greedy because:

  • Avoids local optima: Considers multiple paths before deciding
  • Higher quality output: Often more coherent and fluent
  • Still deterministic: Same input always gives same output
  • Configurable exploration: Adjust beam width for your needs

Comparison example:

Prompt: "The solution to climate change requires" Greedy: "a global effort to reduce carbon emissions." Beam (width=5): "coordinated international cooperation, technological innovation, and fundamental changes in how we produce and consume energy."

Beam Search Limitations

❌ Still has issues:

  • Computational cost: Roughly beam-width times more expensive than greedy (5x for beam width 5)
  • Generic output: Tends toward "safe" but boring completions
  • Length bias: Favors shorter sequences, since every additional token lowers the cumulative probability. Mitigate with a length penalty > 1.0.
  • Repetition: Can still repeat n-grams. Mitigate with no_repeat_ngram_size (see the parameters below).
  • Still deterministic: No creativity or surprise

Beam Width Selection Guide

Task Type          | Recommended Beam Width | Reasoning
-------------------|------------------------|----------------------------------
Translation        | 4-6                    | Balance quality and computation
Summarization      | 3-5                    | Want coherent but not too generic
Question Answering | 1-3                    | Usually one correct answer
Creative Writing   | Not recommended        | Too conservative
Code Generation    | 2-4                    | Syntax correctness important

Direct Comparison: Greedy vs. Beam Search

Quality Metrics Comparison

Aspect       | Greedy Search | Beam Search (width=5)
-------------|---------------|----------------------
Fluency      | Good          | Very Good
Coherence    | Good          | Very Good
Creativity   | Poor          | Poor
Consistency  | High          | High
Speed        | Fast          | 5x Slower
Memory Usage | Low           | 5x Higher

When to Choose Each Method

Choose Greedy Search when:

  • Speed is critical
  • Memory is limited
  • Task has clear "correct" answers
  • Generating short completions
  • Prototyping or debugging

Choose Beam Search when:

  • Quality is more important than speed
  • Task benefits from planning ahead (translation, summarization)
  • You have computational resources
  • Output length is moderate (20-100 tokens)
  • Serving fewer requests but need higher quality

Practical Implementation with Hugging Face

Modern libraries make these methods easy to use:

from transformers import pipeline, GPT2LMHeadModel, GPT2Tokenizer

# Setup
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Greedy search (do_sample=False, num_beams=1)
greedy_output = generator(
    "The future of renewable energy",
    max_length=50,
    num_return_sequences=1,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Beam search (do_sample=False, num_beams > 1)
beam_output = generator(
    "The future of renewable energy",
    max_length=50,
    num_beams=5,
    num_return_sequences=1,
    do_sample=False,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)

print("Greedy:", greedy_output[0]['generated_text'])
print("Beam:", beam_output[0]['generated_text'])

Key Parameters Explained

  • do_sample=False: Use deterministic methods (greedy/beam)
  • num_beams=1: Greedy search
  • num_beams>1: Beam search with specified width
  • early_stopping=True: End beam search as soon as num_beams complete candidates have been found
  • num_return_sequences: How many different outputs to return
  • length_penalty: >1.0 encourages longer outputs; <1.0 prefers shorter
  • no_repeat_ngram_size: Prevents repeating any n-gram of the given size
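
The pipeline example above leaves the last two parameters out. Below is a minimal sketch that calls model.generate directly with them, reusing the model and tokenizer defined above; the specific values (length_penalty=1.2, no_repeat_ngram_size=3) are illustrative, not recommendations.

input_ids = tokenizer.encode("The future of renewable energy", return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=50,
    num_beams=5,
    do_sample=False,
    early_stopping=True,
    length_penalty=1.2,      # > 1.0 nudges beam search toward longer outputs
    no_repeat_ngram_size=3,  # forbid repeating any 3-gram
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))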

Common Issues and Debugging

Problem 1: Repetitive Output

# Symptoms: "I think that I think that I think..." # Solutions: - Try beam search instead of greedy - Add length constraints - Use repetition penalty (covered in next lesson)

Problem 2: Truncated Output

# Symptoms: Output stops too early
# Solutions:
#   - Increase the max_length parameter
#   - Check for unexpected EOS tokens
#   - Verify tokenizer settings

Problem 3: Generic Output

# Symptoms: Boring, predictable text
# Solutions:
#   - This is expected with deterministic methods
#   - Consider probabilistic sampling (next lesson)
#   - Adjust beam width

Problem 4: High Memory Usage

# Symptoms: Out-of-memory errors with beam search
# Solutions:
#   - Reduce beam width
#   - Process shorter sequences
#   - Run generation under torch.no_grad() so no gradients are stored
#   - Switch to greedy for prototyping

Summary and Next Steps

What We've Learned

  1. Greedy Search: Simple, fast, but can get stuck in loops
  2. Beam Search: Better quality through exploration, but more expensive
  3. Trade-offs: Speed vs. quality, determinism vs. creativity
  4. Use Cases: When to choose each method

The Limitation of Deterministic Methods

Both greedy and beam search share a fundamental limitation: they're too conservative. They always choose the "safe" options, leading to:

  • Predictable, sometimes boring text
  • Lack of creativity and surprise
  • Difficulty with open-ended creative tasks

What's Next

In our next lesson, we'll explore probabilistic sampling techniques that introduce controlled randomness:

  • Temperature sampling: Adding creativity while maintaining quality
  • Top-k sampling: Limiting choices to reasonable options
  • Nucleus (top-p) sampling: Dynamically adjusting choice sets
  • Combining techniques: Building production-ready systems

These methods will give us the tools to generate more interesting, creative, and diverse text while still maintaining quality and coherence.

Practice Exercises

Exercise 1: Implementation Challenge

Implement both greedy and beam search from scratch (without using Hugging Face's built-in methods). Compare your results with the library versions.

Exercise 2: Parameter Exploration

Test beam search with different beam widths (1, 3, 5, 10) on the same prompt. Analyze how beam width affects:

  • Output quality
  • Generation time
  • Memory usage

Exercise 3: Use Case Analysis

For each scenario below, decide whether to use greedy search or beam search and justify your choice:

  1. Real-time chatbot responses
  2. Academic paper summarization
  3. Code completion in an IDE
  4. News article generation
  5. Poetry writing assistant

Exercise 4: Debugging Practice

Given these problematic outputs, identify the likely issue and propose solutions:

  1. "The the the the the the..."
  2. Output stops after just 3 tokens
  3. Very generic, dictionary-like responses
  4. Out of memory errors

Additional Resources