Dictionaries and Sets

Learning Objectives: After this lesson, you'll master key-value pairs with dictionaries and unique collections with sets, understand when to use each data structure, and apply them to real-world problems.

Introduction to Dictionaries

A dictionary is Python's implementation of a hash table - a collection of key-value pairs where each key is unique. Think of it like a real dictionary where you look up a word (key) to find its definition (value).

Dictionary Operations
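
The core operations are reading, adding, updating, testing for, and removing keys. A minimal sketch follows; the `inventory` data and variable names are invented for illustration.

```python
# Hypothetical inventory data, used only for illustration.
inventory = {"apple": 5, "banana": 3}

# Read a value by key
apple_count = inventory["apple"]          # 5

# Add a new key or overwrite an existing one
inventory["cherry"] = 8                   # add
inventory["banana"] = 4                   # update

# Test for a key (the "in" operator checks keys, not values)
has_apple = "apple" in inventory          # True
has_grape = "grape" in inventory          # False

# Remove a key
del inventory["apple"]

# Direct access to a missing key raises KeyError
try:
    inventory["grape"]
    missing = False
except KeyError:
    missing = True

print(inventory)        # {'banana': 4, 'cherry': 8}
print(len(inventory))   # 2
```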

Dictionary Methods and Patterns
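
The methods used most often are `get()`, `keys()`, `values()`, `items()`, `update()`, and `pop()`. The sketch below shows one way to use each, plus a common grouping pattern with `setdefault()`; the `scores` data is hypothetical.

```python
# Hypothetical scores, used only for illustration.
scores = {"alice": 95, "bob": 87}

# get() returns a default instead of raising KeyError
carol_score = scores.get("carol", 0)            # 0

# keys(), values(), and items() return live views of the dictionary
names = list(scores.keys())                     # ['alice', 'bob']
total = sum(scores.values())                    # 182

for name, score in scores.items():
    print(f"{name}: {score}")

# update() merges another mapping, overwriting shared keys
scores.update({"bob": 90, "dana": 78})

# pop() removes a key and returns its value
removed = scores.pop("alice")                   # 95

# Pattern: group items under a key with setdefault()
by_initial = {}
for name in ["bob", "dana", "dave"]:
    by_initial.setdefault(name[0], []).append(name)

print(scores)       # {'bob': 90, 'dana': 78}
print(by_initial)   # {'b': ['bob'], 'd': ['dana', 'dave']}
```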

Working with Nested Dictionaries
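
A dictionary value can itself be a dictionary, which lets you model records that contain other records. The following is a small sketch with made-up course data; the course codes and field names are assumptions.

```python
# Hypothetical course records, used only for illustration.
courses = {
    "cs101": {"title": "Intro to Python", "students": {"alice": 95, "bob": 87}},
    "cs102": {"title": "Data Structures", "students": {}},
}

# Chain keys to reach inner values
alice_score = courses["cs101"]["students"]["alice"]     # 95

# Add to an inner dictionary
courses["cs102"]["students"]["carol"] = 91

# Chain get() with empty-dict defaults to avoid KeyError on missing levels
missing = courses.get("cs999", {}).get("students", {}).get("alice")   # None

# Iterate over both levels
for code, course in courses.items():
    for student, score in course["students"].items():
        print(code, student, score)
```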

Introduction to Sets

A set is an unordered collection of unique elements. Sets are perfect for:

Removing duplicates
Mathematical operations (union, intersection)
Membership testing
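
A short sketch of the first and last uses, with invented tag data. It also shows two easy-to-miss details: a set does not keep insertion order, and `{}` creates an empty dictionary rather than an empty set.

```python
# Hypothetical tag data, used only for illustration.
tags = ["python", "data", "python", "sets", "data"]

# Removing duplicates: a set keeps one copy of each element
unique_tags = set(tags)                     # 3 elements, in no guaranteed order

# If the original order matters, dict.fromkeys() deduplicates while keeping it
ordered_unique = list(dict.fromkeys(tags))  # ['python', 'data', 'sets']

# Membership testing
has_sets = "sets" in unique_tags            # True

# Adding an element that is already present has no effect
unique_tags.add("python")

# An empty set must be written set(); {} creates an empty dictionary
empty = set()
```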

Set Operations and Mathematics

Practical Applications

Example 1: Word Frequency Counter
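
One possible version of a word frequency counter is sketched below. It assumes that words are runs of letters and apostrophes and that case should be ignored; `word_frequencies` and `sample` are illustrative names. It uses `collections.Counter`, a dictionary subclass from the standard library.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count case-insensitive word occurrences in text."""
    # Lowercase, then keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "The quick brown fox. The lazy dog! The end."
freq = word_frequencies(sample)

print(freq["the"])            # 3
print(freq.most_common(1))    # [('the', 3)]
print(freq["cat"])            # 0 (Counter returns 0 for missing keys)
```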

Example 2: Student Grade Management
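
A grade book can be modeled as a dictionary that maps each student to a list of scores. The sketch below is one way to do it; the helper names `add_grade` and `average`, the data, and the honor-roll threshold of 85 are all assumptions made for illustration.

```python
# Hypothetical grade book: student name -> list of scores
grades = {}

def add_grade(book, student, score):
    """Record a score, creating the student's list on first use."""
    book.setdefault(student, []).append(score)

def average(book, student):
    """Return the student's mean score, or None if there are no scores."""
    scores = book.get(student, [])
    return sum(scores) / len(scores) if scores else None

add_grade(grades, "alice", 90)
add_grade(grades, "alice", 100)
add_grade(grades, "bob", 80)

print(average(grades, "alice"))   # 95.0
print(average(grades, "carol"))   # None

# Set comprehension: students whose average is at least 85
honor_roll = {name for name in grades if average(grades, name) >= 85}
print(honor_roll)                 # {'alice'}
```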

Practice Exercises

Exercise 1: Contact Book

Exercise 2: Inventory Management

Exercise 3: Text Analysis

Key Takeaways

✅ Dictionaries store key-value pairs with O(1) average-case lookup by key
✅ Dictionary methods - get(), keys(), values(), items(), update(), pop()
✅ Sets store unique elements and support mathematical operations
✅ Set operations - union (|), intersection (&), difference (-), symmetric difference (^)
✅ Choose dictionaries for mappings and fast lookups by key
✅ Choose sets for unique collections and mathematical operations
✅ Nested structures enable complex data organization
✅ Dictionary/set comprehensions provide concise creation syntax
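
To make the last takeaway concrete, here is a brief sketch of comprehension syntax. The variable names and data are illustrative only.

```python
# Dictionary comprehension: {key_expr: value_expr for item in iterable}
squares = {n: n * n for n in range(1, 5)}
# {1: 1, 2: 4, 3: 9, 4: 16}

# Set comprehension: {expr for item in iterable}
words = ["apple", "Avocado", "banana", "blueberry"]
initials = {word[0].lower() for word in words}
# {'a', 'b'}

# Filter with an if clause
even_squares = {n: sq for n, sq in squares.items() if sq % 2 == 0}
# {2: 4, 4: 16}

# Invert a mapping (assumes the values are unique and hashable)
by_square = {sq: n for n, sq in squares.items()}
# {1: 1, 4: 2, 9: 3, 16: 4}
```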

Connections: Dictionaries and Sets Across Computer Science

🔗 Connection to Hash Tables (How Dictionaries Work)

Dictionaries are Python's implementation of hash tables, one of computer science's most important data structures:

How Hash Tables Work:

Dictionary: {"apple": 5, "banana": 3, "cherry": 8}

Step 1: Hash Function
"apple"  → hash("apple")  → 2764821906 → % 8 → Index 2
"banana" → hash("banana") → 8731806542 → % 8 → Index 6
"cherry" → hash("cherry") → 9842763102 → % 8 → Index 6 (collision!)

Step 2: Storage (collisions resolved by chaining in this illustration; CPython's dict uses open addressing instead, but the idea is the same)
┌───┬─────────┐
│ 0 │  empty  │
│ 1 │  empty  │
│ 2 │ apple:5 │  ← Direct placement
│ 3 │  empty  │
│ 4 │  empty  │
│ 5 │  empty  │
│ 6 │banana:3 │  ← First in chain
│   │cherry:8 │  ← Collision resolved
│ 7 │  empty  │
└───┴─────────┘
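
The two steps above can be sketched as a toy class. This mirrors the diagram for teaching purposes and is not how CPython implements `dict`; the class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining (teaching sketch)."""

    def __init__(self, size=8):
        # One list ("chain") per slot
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Step 1: hash the key, then reduce it to a slot number
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # Step 2: store in the chain

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

table = ChainedHashTable()
for fruit, count in {"apple": 5, "banana": 3, "cherry": 8}.items():
    table.put(fruit, count)
print(table.get("cherry"))   # 8

# Small integers hash to themselves, so 6 and 14 both land in slot 6
collide = ChainedHashTable()
collide.put(6, "six")
collide.put(14, "fourteen")
print(len(collide.buckets[6]))   # 2
```

Which slots the string keys land in varies between runs, because Python randomizes string hashes per process. The integer example collides every time.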

Why Dictionaries Are Fast (O(1) on Average):

Direct access via hash computation
No need to search through all items
Compare to list search: O(n) - must check each item

Hash Function Requirements:

Deterministic: The same input always gives the same hash within a single program run
Uniform distribution: Spreads keys evenly
Fast to compute: Quick hash calculation

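# Python's built-in hash function
print(hash("apple"))   # Some large integer
print(hash("apple"))   # Same integer again within this run
# Note: str hashes are randomized per process, so the value differs between runs

# Hashable built-in types (immutable)
hash(5)          # ✅ OK
hash("hello")    # ✅ OK
hash((1, 2))     # ✅ OK (tuple of hashable items)
# hash([1, 2])   # ❌ TypeError: unhashable type: 'list' (lists are mutable)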
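
Because dictionary keys and set elements must be hashable, mutable containers need an immutable stand-in. The sketch below shows tuples and `frozenset` in that role; the `grid` and `pair_scores` data are invented for illustration.

```python
# Tuples of hashable items can be dictionary keys
grid = {(0, 0): "start", (2, 3): "goal"}

# frozenset is the immutable, hashable counterpart of set
pair_scores = {frozenset({"alice", "bob"}): 3}
same_pair = pair_scores[frozenset({"bob", "alice"})]   # 3 (order is irrelevant)

# A tuple that contains a list is immutable itself but still not hashable
try:
    hash((1, [2, 3]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False

# Lists cannot be keys; convert to a tuple first
point = [2, 3]
label = grid[tuple(point)]                              # 'goal'
```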

🔗 Connection to Databases

Dictionaries model database concepts:

Dictionary as Database Record:

# Single database row as dictionary
user = {
    "id": 12345,
    "username": "alice_smith",
    "email": "alice@example.com",
    "age": 28,
    "is_active": True
}

# Collection of records (like database table)
users_table = [
    {"id": 1, "name": "Alice", "age": 28},
    {"id": 2, "name": "Bob", "age": 35},
    {"id": 3, "name": "Charlie", "age": 42}
]

# Query by key (like database index)
user_index = {user["id"]: user for user in users_table}
alice = user_index[1]  # O(1) lookup, just like database index!

Database Concepts in Python:

Database Table    → List of dictionaries
Table Row         → Dictionary
Column Names      → Dictionary keys
Field Values      → Dictionary values
Primary Key       → Dictionary key for lookup
Index             → Separate dictionary for fast access
Join              → Combining dictionaries from multiple sources
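
The last two rows (index and join) can be illustrated together. In this sketch the `orders_table` data and its field names are hypothetical. The index dictionary lets each order find its user without scanning the whole user list.

```python
# Hypothetical tables, used only for illustration.
users_table = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
orders_table = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 2, "total": 40.0},
    {"order_id": 102, "user_id": 1, "total": 15.5},
]

# Index: primary key -> row, built once for O(1) average lookups
users_by_id = {user["id"]: user for user in users_table}

# Inner join: pair each order with its user row via the index
joined = [
    {
        "order_id": order["order_id"],
        "name": users_by_id[order["user_id"]]["name"],
        "total": order["total"],
    }
    for order in orders_table
    if order["user_id"] in users_by_id
]

for row in joined:
    print(row)
```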

Real Database Query vs Python:

-- SQL Query
SELECT name, age 
FROM users 
WHERE age > 30;

# Python equivalent
results = [
    {"name": user["name"], "age": user["age"]}
    for user in users_table
    if user["age"] > 30
]

🔗 Connection to Set Theory (Mathematics)

Python sets implement mathematical set theory:

Set Theory Fundamentals:

Universal Set (U): All possible elements
Set A: {1, 2, 3, 4, 5}
Set B: {4, 5, 6, 7, 8}

Union (A ∪ B):        {1, 2, 3, 4, 5, 6, 7, 8}
Intersection (A ∩ B): {4, 5}
Difference (A - B):   {1, 2, 3}
Symmetric Diff (A Δ B): {1, 2, 3, 6, 7, 8}
Complement (A'):      Everything in U that is not in A

Python Implementation:

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

A | B  # Union:  {1, 2, 3, 4, 5, 6, 7, 8}
A & B  # Intersection: {4, 5}
A - B  # Difference: {1, 2, 3}
A ^ B  # Symmetric difference: {1, 2, 3, 6, 7, 8}

# Subset/superset
{1, 2}.issubset(A)     # True
A.issuperset({1, 2})   # True
A.isdisjoint(B)        # False (they share elements)
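
Python has no complement operator, since a set does not know its universe. A minimal sketch, assuming a universal set `U` of the integers 1 through 10, shows how to express complement as a difference, along with operator forms of the subset tests.

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# Define the complement relative to an explicit universe
U = set(range(1, 11))            # assumed universal set: 1 through 10
A_complement = U - A             # {6, 7, 8, 9, 10}

# De Morgan's law: (A ∪ B)' == A' ∩ B'
lhs = U - (A | B)
rhs = (U - A) & (U - B)
print(lhs == rhs)                # True

# Operator forms of the subset tests
print({1, 2} <= A)               # True  (subset)
print(A < A)                     # False (a set is not a proper subset of itself)

# In-place variants modify the left-hand set
C = set(A)
C |= {9}                         # C is now {1, 2, 3, 4, 5, 9}
```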

Venn Diagrams:

      A          B
    ┌───┐      ┌───┐
    │   │      │   │
    │ 1 │      │ 6 │
    │ 2 ├──────┤ 7 │
    │ 3 │ 4  5 │ 8 │
    └───┘      └───┘
    
A - B = {1,2,3}     (left only)
A & B = {4,5}       (overlap)
B - A = {6,7,8}     (right only)
A | B = {1,2,3,4,5,6,7,8} (everything)
A ^ B = {1,2,3,6,7,8} (everything except the overlap)

🔗 Connection to Other Data Structures

Performance Comparison:

Operation	List	Dictionary	Set
Access by index/key	O(1)	O(1)	N/A
Search for value	O(n)	O(n) by value, O(1) by key	O(1)
Insert	O(1)*	O(1)	O(1)
Delete	O(n)	O(1)	O(1)
Check membership	O(n)	O(1) (keys)	O(1)
Order preserved	✅ Yes	✅ Yes (3.7+)	❌ No

*Append is O(1), insert at position is O(n). Dictionary and set figures are average-case.
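
You can observe the membership-test difference with the standard `timeit` module. This is a rough sketch, not a rigorous benchmark. The container size and repeat count are arbitrary choices, and the timings depend on your machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1   # worst case for the list: the last element

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.6f} s")
print(f"set:  {set_time:.6f} s")
```

On typical hardware the set lookup finishes far sooner, because the list has to scan up to 100,000 elements per test.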

When to Use Each:

# Use LIST when:
# - Order matters
# - Duplicates allowed
# - Access by position needed
shopping_list = ["milk", "eggs", "milk"]  # OK to repeat

# Use DICTIONARY when:
# - Need key-value mapping
# - Fast lookup by key
# - Modeling objects/records
user_scores = {"alice": 95, "bob": 87, "charlie": 92}

# Use SET when:
# - Need unique values only
# - Mathematical set operations
# - Fast membership testing
seen_items = {"apple", "banana", "apple"}  # The duplicate "apple" is dropped automatically

🔗 Connection to JSON and APIs

Dictionaries with string keys map directly to JSON objects, and JSON is one of the most widely used data exchange formats:

Python Dictionary ↔ JSON:

import json

# Python dictionary
user = {
    "name": "Alice",
    "age": 28,
    "emails": ["alice@work.com", "alice@home.com"],
    "address": {
        "street": "123 Main St",
        "city": "Boston"
    }
}

# Convert to JSON string
json_string = json.dumps(user, indent=2)
print(json_string)
# {
#   "name": "Alice",
#   "age": 28,
#   "emails": [
#     "alice@work.com",
#     "alice@home.com"
#   ],
#   "address": {
#     "street": "123 Main St",
#     "city": "Boston"
#   }
# }

# Convert back to dictionary
user_dict = json.loads(json_string)
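
The round trip is not always lossless. The sketch below, using made-up data, shows three cases to watch for: non-string keys, tuples, and sets.

```python
import json

# Non-string keys are converted to strings
by_id = {1: "Alice", 2: "Bob"}
restored = json.loads(json.dumps(by_id))
print(restored)                  # {'1': 'Alice', '2': 'Bob'}

# Tuples are written as JSON arrays and come back as lists
point = json.loads(json.dumps({"point": (1, 2)}))
print(point)                     # {'point': [1, 2]}

# Sets have no JSON equivalent; convert them to a list first
tags = {"python", "json"}
try:
    json.dumps({"tags": tags})
    set_serializable = True
except TypeError:
    set_serializable = False

tags_json = json.dumps({"tags": sorted(tags)})
print(tags_json)                 # {"tags": ["json", "python"]}
```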

Real API Response:

# API returns JSON → Python converts to dictionary
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice"},
            {"id": 2, "name": "Bob"}
        ]
    },
    "meta": {
        "page": 1,
        "total": 50
    }
}

# Easy access with dictionary syntax
for user in api_response["data"]["users"]:
    print(user["name"])
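
Real responses do not always contain every key. One defensive pattern, sketched here with a hypothetical error response, is to chain `get()` calls with empty defaults so that a missing section yields an empty result instead of a `KeyError`.

```python
# Hypothetical response that lacks the "data" and "meta" sections
error_response = {"status": "error", "message": "not found"}

# Direct indexing would raise KeyError here; chained get() calls fall back
users = error_response.get("data", {}).get("users", [])
names = [user["name"] for user in users]
page = error_response.get("meta", {}).get("page", 1)

print(names)   # []
print(page)    # 1
```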

🔗 Connection to Other Languages

Dictionaries in Different Languages:

Python	JavaScript	Java	Ruby
`dict`	Object	HashMap	Hash
`{"a": 1}`	`{a: 1}`	`new HashMap<>()`	`{a: 1}`
`d["a"]`	`obj.a` or `obj["a"]`	`map.get("a")`	`h[:a]`

Sets in Different Languages:

Python	JavaScript	Java	C++
`set()`	`Set`	`HashSet`	`unordered_set`
`{1, 2, 3}`	`new Set([1,2,3])`	`new HashSet<>()`	`{1,2,3}`
`a & b`	No operator	`a.retainAll(b)` (modifies `a`)	`std::set_intersection` (sorted ranges only)

🔗 Connection to Caching and Memoization

Dictionaries are perfect for caching (storing computed results):

# Without caching (slow for large n)
def fibonacci_slow(n):
    if n <= 1:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

# With caching (dictionary stores results)
fib_cache = {}

def fibonacci_fast(n):
    if n in fib_cache:
        return fib_cache[n]  # O(1) lookup!
    
    if n <= 1:
        result = n
    else:
        result = fibonacci_fast(n-1) + fibonacci_fast(n-2)
    
    fib_cache[n] = result  # Store for future use
    return result

# Python's built-in caching
from functools import lru_cache

@lru_cache(maxsize=None)  # Uses dictionary internally!
def fibonacci_cached(n):
    if n <= 1:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)
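
To see how much work the cache saves, you can count function calls. The sketch below re-implements both approaches under the hypothetical names `fib_slow` and `fib_fast` and uses a dictionary of counters, which is an addition for illustration.

```python
calls = {"slow": 0, "fast": 0}   # a dictionary used as a pair of counters
cache = {}

def fib_slow(n):
    calls["slow"] += 1
    if n <= 1:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

def fib_fast(n):
    calls["fast"] += 1
    if n in cache:
        return cache[n]
    result = n if n <= 1 else fib_fast(n - 1) + fib_fast(n - 2)
    cache[n] = result
    return result

print(fib_slow(20), calls["slow"])   # 6765 21891
print(fib_fast(20), calls["fast"])   # 6765 39
```

The uncached version's call count grows exponentially with n, while the cached version's grows linearly.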

🔗 Connection to Real-World Applications

Configuration Files:

# config.json → dictionary
config = {
    "app_name": "MyApp",
    "version": "1.0.0",
    "settings": {
        "theme": "dark",
        "notifications": True,
        "max_connections": 100
    }
}
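
A common pattern with configuration dictionaries is to merge user overrides onto defaults. This is a minimal sketch; the `defaults` and `overrides` values and the `timeout_seconds` key are hypothetical.

```python
# Hypothetical defaults and user overrides
defaults = {"theme": "light", "notifications": True, "max_connections": 50}
overrides = {"theme": "dark", "max_connections": 100}

# Later mappings win; this merge is shallow (nested dictionaries are replaced whole)
settings = {**defaults, **overrides}
print(settings)
# {'theme': 'dark', 'notifications': True, 'max_connections': 100}

# Python 3.9+ offers the | operator for the same merge:
# settings = defaults | overrides

# Read an optional setting with a fallback
timeout = settings.get("timeout_seconds", 30)   # 30, since the key is absent
```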

Frequency Analysis:

# Count word occurrences (natural language processing)
text = "the quick brown fox jumps over the lazy dog"
word_freq = {}
for word in text.split():
    word_freq[word] = word_freq.get(word, 0) + 1
# {"the": 2, "quick": 1, "brown": 1, ...}

Graph Representations:

# Social network (adjacency list)
friends = {
    "Alice": {"Bob", "Charlie"},
    "Bob": {"Alice", "Diana"},
    "Charlie": {"Alice", "Eve"},
    "Diana": {"Bob"},
    "Eve": {"Charlie"}
}
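
With the graph stored as a dictionary of sets, common queries become short. The sketch below repeats the `friends` data so that it runs on its own. The function `degrees_of_separation` is a hypothetical helper that performs a breadth-first search.

```python
from collections import deque

friends = {
    "Alice": {"Bob", "Charlie"},
    "Bob": {"Alice", "Diana"},
    "Charlie": {"Alice", "Eve"},
    "Diana": {"Bob"},
    "Eve": {"Charlie"},
}

# Mutual friends are a set intersection
mutual = friends["Bob"] & friends["Charlie"]        # {'Alice'}

def degrees_of_separation(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {start}                  # set gives O(1) average "visited" checks
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == goal:
            return hops
        for friend in graph[person] - seen:
            seen.add(friend)
            queue.append((friend, hops + 1))
    return None

print(degrees_of_separation(friends, "Diana", "Eve"))   # 4
```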

Remember: Dictionaries and sets aren't just Python features - they're fundamental computer science concepts used everywhere from databases to web APIs to machine learning!

Next Steps

In the next lesson, we'll explore advanced data structure operations - working with nested structures, combining different data types, and implementing common algorithms with our data structures.

Ready to master advanced data manipulation? The next lesson will show you powerful techniques for working with complex data!