Dictionaries and Sets

Learning Objectives: After this lesson, you'll master key-value pairs with dictionaries and unique collections with sets, understand when to use each data structure, and apply them to real-world problems.

Introduction to Dictionaries

A dictionary is Python's implementation of a hash table - a collection of key-value pairs where each key is unique. Think of it like a real dictionary where you look up a word (key) to find its definition (value).
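
As a minimal sketch of the idea, the snippet below builds a small dictionary and looks up, adds, and overwrites entries. The `definitions` name and its contents are illustrative assumptions, not part of the lesson's own example.

```python
# Map each word (key) to its definition (value).
definitions = {
    "python": "a high-level programming language",
    "hash": "a fixed-size number computed from a value",
}

# Look up a value by its key.
print(definitions["python"])

# Assigning to a new key adds a pair; assigning to an existing key overwrites it.
definitions["set"] = "an unordered collection of unique elements"
definitions["hash"] = "a number derived from a key"

# Keys are unique, so the dictionary holds exactly three pairs.
print(len(definitions))

# Membership tests check keys, not values.
print("set" in definitions)
print("tuple" in definitions)
```

Looking up a key that is absent with square brackets raises `KeyError`, which is why the membership test with `in` is often done first.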


Dictionary Operations


Dictionary Methods and Patterns
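
The following sketch exercises the methods named in the Key Takeaways (`get()`, `keys()`, `values()`, `items()`, `update()`, `pop()`). The `scores` data is a made-up example.

```python
scores = {"alice": 95, "bob": 87}

# get() returns a default instead of raising KeyError for a missing key.
missing = scores.get("dana", 0)

# update() merges another mapping, overwriting keys that already exist.
scores.update({"bob": 90, "charlie": 92})

# pop() removes a key and returns its value.
removed = scores.pop("alice")

# keys(), values(), and items() return views that can be iterated or converted.
names = list(scores.keys())
points = list(scores.values())
pairs = list(scores.items())

print(missing, removed, names, points, pairs)
```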


Working with Nested Dictionaries
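
A short sketch of reading and updating a nested dictionary. The `students` structure and its field names are assumptions chosen for illustration.

```python
students = {
    "alice": {"grades": {"math": 95, "physics": 88}, "year": 2},
    "bob": {"grades": {"math": 72}, "year": 1},
}

# Chain keys to reach an inner value.
alice_math = students["alice"]["grades"]["math"]

# Chained get() calls with a {} default avoid KeyError when a level is missing.
bob_physics = students.get("bob", {}).get("grades", {}).get("physics")

# Inner dictionaries are mutable and can be updated in place.
students["bob"]["grades"]["physics"] = 80

print(alice_math, bob_physics, students["bob"]["grades"])
```

The chained `get()` pattern returns `None` here because Bob has no physics grade at the time of the lookup.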


Introduction to Sets

A set is an unordered collection of unique elements. Sets are perfect for:

  • Removing duplicates
  • Mathematical operations (union, intersection)
  • Membership testing
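
A minimal sketch of the first and third uses, with a hypothetical `visits` list:

```python
visits = ["alice", "bob", "alice", "carol", "bob"]

# Converting to a set drops duplicates (the result has no guaranteed order).
unique_visitors = set(visits)

# Membership testing.
has_carol = "carol" in unique_visitors
has_dave = "dave" in unique_visitors

# If first-seen order matters, dict.fromkeys() deduplicates while keeping it.
ordered_unique = list(dict.fromkeys(visits))

# Note: {} creates an empty dictionary; use set() for an empty set.
empty = set()

print(len(unique_visitors), has_carol, has_dave, ordered_unique)
```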

Set Operations and Mathematics


Practical Applications

Example 1: Word Frequency Counter


Example 2: Student Grade Management


Practice Exercises

Exercise 1: Contact Book


Exercise 2: Inventory Management


Exercise 3: Text Analysis


Key Takeaways

Dictionaries store key-value pairs with fast lookup by key (O(1) on average)
Dictionary methods - get(), keys(), values(), items(), update(), pop()
Sets store unique elements and support mathematical operations
Set operations - union (|), intersection (&), difference (-), symmetric difference (^)
Choose dictionaries for mappings and fast lookups by key
Choose sets for unique collections and mathematical operations
Nested structures enable complex data organization
Dictionary/set comprehensions provide concise creation syntax
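
As a brief illustration of the last takeaway, here is a sketch of dictionary and set comprehensions over a made-up `words` list:

```python
words = ["apple", "banana", "cherry", "apple"]

# Dictionary comprehension: map each distinct word to its length.
lengths = {word: len(word) for word in words}

# Set comprehension: collect the distinct first letters.
initials = {word[0] for word in words}

# A condition filters items, just as in a list comprehension.
long_words = {word: n for word, n in lengths.items() if n > 5}

print(lengths, initials, long_words)
```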

Connections: Dictionaries and Sets Across Computer Science

🔗 Connection to Hash Tables (How Dictionaries Work)

Dictionaries are Python's implementation of hash tables, one of computer science's most important data structures:

How Hash Tables Work:

Dictionary: {"apple": 5, "banana": 3, "cherry": 8}

Step 1: Hash Function
  "apple"  → hash("apple")  → 2764821906 → % 8 → Index 2
  "banana" → hash("banana") → 8731806542 → % 8 → Index 6
  "cherry" → hash("cherry") → 9842763102 → % 8 → Index 6 (collision!)

Step 2: Storage (with collision resolution)
┌───┬─────────┐
│ 0 │ empty   │
│ 1 │ empty   │
│ 2 │ apple:5 │ ← Direct placement
│ 3 │ empty   │
│ 4 │ empty   │
│ 5 │ empty   │
│ 6 │banana:3 │ ← First in chain
│   │cherry:8 │ ← Collision resolved
│ 7 │ empty   │
└───┴─────────┘
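
The two steps in the diagram can be sketched as a toy hash table that resolves collisions by chaining. This is an illustrative model only: the `ChainedHashTable` class is hypothetical, CPython's real `dict` uses open addressing rather than chains, and the actual bucket indices will differ from the diagram because string hashes vary between runs.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative, not CPython's design)."""

    def __init__(self, size=8):
        # One bucket (a list of [key, value] pairs) per index.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Step 1: hash the key, then reduce it to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        # Step 2: store the pair in its bucket.
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:       # key already stored: overwrite its value
                pair[1] = value
                return
        bucket.append([key, value])  # new key (possibly a collision): extend the chain

    def get(self, key):
        # Only the one bucket selected by the hash is searched.
        for stored_key, value in self.buckets[self._index(key)]:
            if stored_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
for fruit, count in {"apple": 5, "banana": 3, "cherry": 8}.items():
    table.put(fruit, count)

print(table.get("cherry"))
```

Lookups stay fast as long as chains stay short, which is why real implementations grow the table when it fills up.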

Why Dictionaries Are Fast (O(1)):

  • Direct access via hash computation
  • No need to search through all items
  • Compare to list search: O(n) - must check each item
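
One rough way to observe the difference is to time a membership test on a list against the same test on a dictionary. The sizes and repeat counts below are arbitrary choices, and the measured times depend on the machine, so no particular numbers are claimed.

```python
import timeit

n = 50_000
items_list = list(range(n))
items_dict = dict.fromkeys(items_list)
target = n - 1  # worst case for the list: the last element

# Each call repeats the membership test 50 times and returns total seconds.
list_time = timeit.timeit(lambda: target in items_list, number=50)
dict_time = timeit.timeit(lambda: target in items_dict, number=50)

print(f"list: {list_time:.5f}s  dict: {dict_time:.5f}s")
```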

Hash Function Requirements:

  1. Deterministic: Same input always gives same hash
  2. Uniform distribution: Spreads keys evenly
  3. Fast to compute: Quick hash calculation

# Python's built-in hash function
print(hash("apple"))  # Some large integer
print(hash("apple"))  # Same value again within this run (str hashes can differ between runs)

# Hashable values (typically immutable ones) can be hashed
hash(5)         # ✅ OK
hash("hello")   # ✅ OK
hash((1, 2))    # ✅ OK (tuple of hashable items)
# hash([1, 2])  # ❌ TypeError - lists are mutable and unhashable

🔗 Connection to Databases

Dictionaries model database concepts:

Dictionary as Database Record:

# Single database row as dictionary
user = {
    "id": 12345,
    "username": "alice_smith",
    "email": "alice@example.com",
    "age": 28,
    "is_active": True
}

# Collection of records (like database table)
users_table = [
    {"id": 1, "name": "Alice", "age": 28},
    {"id": 2, "name": "Bob", "age": 35},
    {"id": 3, "name": "Charlie", "age": 42}
]

# Query by key (like database index)
user_index = {user["id"]: user for user in users_table}
alice = user_index[1]  # O(1) average lookup, similar in spirit to a database index

Database Concepts in Python:

Database Table → List of dictionaries
Table Row → Dictionary
Column Names → Dictionary keys
Field Values → Dictionary values
Primary Key → Dictionary key for lookup
Index → Separate dictionary for fast access
Join → Combining dictionaries from multiple sources

Real Database Query vs Python:

-- SQL Query
SELECT name, age
FROM users
WHERE age > 30;

# Python equivalent
results = [
    {"name": user["name"], "age": user["age"]}
    for user in users_table
    if user["age"] > 30
]
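
The "Join" row of the mapping above can be sketched the same way. The `orders_table` data and its `user_id` field are hypothetical, introduced only to have a second table to combine with `users_table`.

```python
users_table = [
    {"id": 1, "name": "Alice", "age": 28},
    {"id": 2, "name": "Bob", "age": 35},
    {"id": 3, "name": "Charlie", "age": 42},
]

# Hypothetical second table; user_id plays the role of a foreign key.
orders_table = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 3, "total": 12.5},
    {"order_id": 102, "user_id": 1, "total": 40.0},
]

# Build the index once so each lookup is O(1) on average.
user_index = {user["id"]: user for user in users_table}

# Inner join: keep only orders whose user_id exists in the index.
joined = [
    {"name": user_index[order["user_id"]]["name"], "total": order["total"]}
    for order in orders_table
    if order["user_id"] in user_index
]

print(joined)
```

Building `user_index` first avoids rescanning `users_table` for every order, which is the same reason databases index join columns.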

🔗 Connection to Set Theory (Mathematics)

Python sets implement mathematical set theory:

Set Theory Fundamentals:

Universal Set (U): All possible elements
Set A: {1, 2, 3, 4, 5}
Set B: {4, 5, 6, 7, 8}

Union (A ∪ B): {1, 2, 3, 4, 5, 6, 7, 8}
Intersection (A ∩ B): {4, 5}
Difference (A - B): {1, 2, 3}
Symmetric Diff (A Δ B): {1, 2, 3, 6, 7, 8}
Complement (A'): Everything not in A

Python Implementation:

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

A | B  # Union: {1, 2, 3, 4, 5, 6, 7, 8}
A & B  # Intersection: {4, 5}
A - B  # Difference: {1, 2, 3}
A ^ B  # Symmetric difference: {1, 2, 3, 6, 7, 8}

# Subset/superset
{1, 2}.issubset(A)    # True
A.issuperset({1, 2})  # True
A.isdisjoint(B)       # False (they share elements)
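
Python has no complement operator, because a set does not know its universal set. Given an explicit `U`, the complement can be written as a difference. The choice of `U` as the integers 1 through 10 is an assumption made for this sketch.

```python
U = set(range(1, 11))  # assumed universal set: the integers 1 through 10
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# Complement of A relative to U: everything in U that is not in A.
complement_A = U - A

# De Morgan's law: the complement of a union equals the intersection of complements.
lhs = U - (A | B)
rhs = (U - A) & (U - B)

print(complement_A, lhs == rhs)
```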

Venn Diagrams:

  A          B
┌───┐      ┌───┐
│   │      │   │
│ 1 │      │ 6 │
│ 2 ├──────┤ 7 │
│ 3 │ 4 5  │ 8 │
└───┘      └───┘

A - B = {1,2,3} (left only)
A & B = {4,5} (overlap)
B - A = {6,7,8} (right only)
A | B = {1-8} (everything)
A ^ B = {1,2,3,6,7,8} (everything except the overlap)

🔗 Connection to Other Data Structures

Performance Comparison:

Operation           | List   | Dictionary          | Set
Access by index/key | O(1)   | O(1)                | N/A
Search for value    | O(n)   | O(n) (O(1) by key)  | O(1)
Insert              | O(1)*  | O(1)                | O(1)
Delete              | O(n)   | O(1)                | O(1)
Check membership    | O(n)   | O(1) (keys)         | O(1)
Order preserved     | ✅ Yes | ✅ Yes (3.7+)       | ❌ No

*Append is O(1), insert at position is O(n)

When to Use Each:

# Use LIST when:
# - Order matters
# - Duplicates allowed
# - Access by position needed
shopping_list = ["milk", "eggs", "milk"]  # OK to repeat

# Use DICTIONARY when:
# - Need key-value mapping
# - Fast lookup by key
# - Modeling objects/records
user_scores = {"alice": 95, "bob": 87, "charlie": 92}

# Use SET when:
# - Need unique values only
# - Mathematical set operations
# - Fast membership testing
seen_items = {"apple", "banana"}  # Automatically removes duplicates

🔗 Connection to JSON and APIs

Dictionaries map directly to JSON, the universal data exchange format:

Python Dictionary ↔ JSON:

import json

# Python dictionary
user = {
    "name": "Alice",
    "age": 28,
    "emails": ["alice@work.com", "alice@home.com"],
    "address": {
        "street": "123 Main St",
        "city": "Boston"
    }
}

# Convert to JSON string
json_string = json.dumps(user, indent=2)
print(json_string)
# {
#   "name": "Alice",
#   "age": 28,
#   "emails": [
#     "alice@work.com",
#     "alice@home.com"
#   ],
#   "address": {
#     "street": "123 Main St",
#     "city": "Boston"
#   }
# }

# Convert back to dictionary
user_dict = json.loads(json_string)

Real API Response:

# API returns JSON → Python converts to dictionary
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice"},
            {"id": 2, "name": "Bob"}
        ]
    },
    "meta": {
        "page": 1,
        "total": 50
    }
}

# Easy access with dictionary syntax
for user in api_response["data"]["users"]:
    print(user["name"])

🔗 Connection to Other Languages

Dictionaries in Different Languages:

Python   | JavaScript        | Java            | Ruby
dict     | Object            | HashMap         | Hash
{"a": 1} | {a: 1}            | new HashMap<>() | {a: 1}
d["a"]   | obj.a or obj["a"] | map.get("a")    | h[:a]

Sets in Different Languages:

Python    | JavaScript       | Java            | C++
set()     | Set              | HashSet         | unordered_set
{1, 2, 3} | new Set([1,2,3]) | new HashSet<>() | {1,2,3}
a & b     | No operator      | a.retainAll(b)  | set_intersection

🔗 Connection to Caching and Memoization

Dictionaries are perfect for caching (storing computed results):

# Without caching (slow for large n)
def fibonacci_slow(n):
    if n <= 1:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

# With caching (dictionary stores results)
fib_cache = {}

def fibonacci_fast(n):
    if n in fib_cache:
        return fib_cache[n]  # O(1) lookup!

    if n <= 1:
        result = n
    else:
        result = fibonacci_fast(n-1) + fibonacci_fast(n-2)

    fib_cache[n] = result  # Store for future use
    return result

# Python's built-in caching
from functools import lru_cache

@lru_cache(maxsize=None)  # Uses dictionary internally!
def fibonacci_cached(n):
    if n <= 1:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)

🔗 Connection to Real-World Applications

Configuration Files:

# config.json → dictionary
config = {
    "app_name": "MyApp",
    "version": "1.0.0",
    "settings": {
        "theme": "dark",
        "notifications": True,
        "max_connections": 100
    }
}

Frequency Analysis:

# Count word occurrences (natural language processing)
text = "the quick brown fox jumps over the lazy dog"
word_freq = {}
for word in text.split():
    word_freq[word] = word_freq.get(word, 0) + 1
# {"the": 2, "quick": 1, "brown": 1, ...}
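
The same counting pattern is available in the standard library as `collections.Counter`, a dictionary subclass. This sketch swaps it in for the manual loop on the same sentence:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"

# Counter is a dict subclass that counts hashable items.
word_freq = Counter(text.split())

print(word_freq["the"])          # count for one word
print(word_freq["cat"])          # missing keys count as 0 instead of raising KeyError
print(word_freq.most_common(1))  # most frequent word with its count
```

The manual `get(word, 0) + 1` loop is still worth knowing, since it shows what `Counter` does under the hood with a plain dictionary.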

Graph Representations:

# Social network (adjacency list)
friends = {
    "Alice": {"Bob", "Charlie"},
    "Bob": {"Alice", "Diana"},
    "Charlie": {"Alice", "Eve"},
    "Diana": {"Bob"},
    "Eve": {"Charlie"}
}
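
One thing such an adjacency structure makes easy is graph traversal. As a sketch, the hypothetical helper `degrees_of_separation` below runs a breadth-first search over the same `friends` graph, using a dictionary both to record distances and to track who has been visited.

```python
from collections import deque

friends = {
    "Alice": {"Bob", "Charlie"},
    "Bob": {"Alice", "Diana"},
    "Charlie": {"Alice", "Eve"},
    "Diana": {"Bob"},
    "Eve": {"Charlie"},
}

def degrees_of_separation(graph, start):
    """Breadth-first search: fewest friendship hops from start to each person."""
    distance = {start: 0}  # doubles as the record of visited people
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in distance:  # O(1) average membership check
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance

hops = degrees_of_separation(friends, "Alice")
print(hops)
```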

Remember: Dictionaries and sets aren't just Python features - they're fundamental computer science concepts used everywhere from databases to web APIs to machine learning!

Next Steps

In the next lesson, we'll explore advanced data structure operations - working with nested structures, combining different data types, and implementing common algorithms with our data structures.


Ready to master advanced data manipulation? The next lesson will show you powerful techniques for working with complex data!