Redis Data Structures
A deep dive into Redis's core data structures. Redis is not just a key-value store—it's a data structure server supporting strings, hashes, lists, sets, sorted sets, streams, and more. Understanding these structures is essential for building high-performance applications.
Strings
Strings are the most basic Redis data type. A Redis string can hold any kind of data: text, serialized JSON, binary data like images, or even integers and floats that Redis can manipulate atomically.
Key characteristics:
• Maximum size: 512 MB per string
• Binary-safe: can store any binary data
• Supports atomic increment/decrement for numeric values
• Perfect for caching, counters, rate limiting, and session storage
# Basic string operations
SET user:1:name "John Doe" # Store a string
GET user:1:name # Retrieve: "John Doe"
SETNX user:1:name "Jane" # Set only if key doesn't exist (returns 0)
SETEX session:abc123 3600 "user_data" # Set with 1-hour expiration
# Atomic counters
SET page:views 0
INCR page:views # Increment by 1 → 1
INCRBY page:views 100 # Increment by 100 → 101
DECR page:views # Decrement by 1 → 100
INCRBYFLOAT price:item1 0.99 # Floating-point increment
# Bit operations (useful for feature flags, bloom filters)
SETBIT user:1:features 0 1 # Set bit 0 to 1 (feature enabled)
GETBIT user:1:features 0 # Get bit 0 → 1
BITCOUNT user:1:features # Count set bits → 1
# String manipulation
APPEND user:1:name " Jr." # Append → "John Doe Jr."
STRLEN user:1:name # Length → 12
GETRANGE user:1:name 0 3 # Substring → "John"
SETRANGE user:1:name 5 "Smith" # Replace at offset 5
# Multiple keys
MSET user:1:city "NYC" user:1:age "30" # Set multiple keys atomically
MGET user:1:name user:1:city user:1:age # Get multiple values
Use Cases: Strings
1. Caching: Store serialized objects, API responses, or rendered HTML
2. Session Storage: Use SETEX for auto-expiring session tokens
3. Rate Limiting: Implement fixed-window counters with INCR + EXPIRE
4. Distributed Locks: Use SET NX PX to acquire locks with a timeout
5. Feature Flags: Use bit operations to track multiple boolean flags efficiently
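As a concrete sketch of the session-storage use case, this is roughly how it might look with the redis-py client (the client setup, key scheme, and TTL below are illustrative assumptions, not part of Redis itself):
import json
import redis

r = redis.Redis(decode_responses=True)  # assumes a local Redis instance

def save_session(token, data, ttl_seconds=3600):
    # SETEX stores the value and its expiration in one atomic command
    r.setex(f"session:{token}", ttl_seconds, json.dumps(data))

def load_session(token):
    raw = r.get(f"session:{token}")
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 1, "role": "admin"})
print(load_session("abc123"))   # {'user_id': 1, 'role': 'admin'}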
# Rate Limiter Pattern (100 requests per minute)
key = "rate:user:123"
current = INCR key
if current == 1:
EXPIRE key 60 # Set 60-second window on first request
if current > 100:
return "Rate limit exceeded"
# Distributed Lock Pattern
lock_key = "lock:resource:abc"
# Acquire lock (NX = only if not exists, PX = milliseconds TTL)
SET lock_key "owner_id" NX PX 30000
# Release lock (only if we own it - use Lua script for atomicity)
if GET lock_key == "owner_id":
DEL lock_key
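Releasing with a plain GET followed by DEL is racy; the usual fix hinted at above is a small Lua script that checks ownership and deletes in one atomic step. A minimal sketch, assuming the redis-py client and an illustrative lock key, might be:
import uuid
import redis

r = redis.Redis(decode_responses=True)

RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""
release_lock = r.register_script(RELEASE_SCRIPT)

def acquire(lock_key, ttl_ms=30000):
    owner_id = str(uuid.uuid4())
    # SET ... NX PX: succeeds only if the key does not exist yet
    if r.set(lock_key, owner_id, nx=True, px=ttl_ms):
        return owner_id
    return None

def release(lock_key, owner_id):
    # Deletes the key only if it still holds our owner_id (atomic inside Lua)
    return release_lock(keys=[lock_key], args=[owner_id]) == 1

owner = acquire("lock:resource:abc")
if owner:
    try:
        ...  # critical section
    finally:
        release("lock:resource:abc", owner)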
Hashes
Hashes are maps between string fields and string values—perfect for representing objects. They're memory-efficient when storing many small key-value pairs and provide O(1) access to individual fields.
Key characteristics:
• Can store up to 4 billion field-value pairs
• Memory-optimized for small hashes (ziplist encoding)
• Supports partial updates without reading/writing the entire object
• Ideal for user profiles, product details, configuration objects
# Basic hash operations
HSET user:1000 name "Alice" email "alice@example.com" age 28
HGET user:1000 name # Get single field → "Alice"
HMGET user:1000 name email # Get multiple fields
HGETALL user:1000 # Get all field-value pairs
# Incrementing numeric fields
HINCRBY user:1000 age 1 # Increment age → 29
HINCRBYFLOAT user:1000 balance 99.50 # Add to balance
# Field existence and deletion
HEXISTS user:1000 phone # Check if field exists → 0
HSETNX user:1000 phone "555-1234" # Set only if field doesn't exist
HDEL user:1000 phone # Delete field
# Metadata operations
HKEYS user:1000 # Get all field names
HVALS user:1000 # Get all values
HLEN user:1000 # Count of fields → 4 (name, email, age, balance)
# Scanning large hashes
HSCAN user:1000 0 COUNT 10 # Iterate with cursor (non-blocking)
Use Cases: Hashes
1. User Profiles: Store user attributes with granular field updates
2. Object Caching: Cache complex objects with field-level access
3. Counters by Category: Multiple counters under one key (page views by date)
4. Configuration Storage: Application config with easy updates
5. Shopping Carts: Product IDs as fields, quantities as values
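For the user-profile use case, a small redis-py sketch might look like the following (the user:{id} key scheme and field names are just examples, and the mapping form of HSET assumes a recent redis-py version):
import redis

r = redis.Redis(decode_responses=True)

def save_profile(user_id, **fields):
    # HSET with a mapping writes several fields in one command
    r.hset(f"user:{user_id}", mapping=fields)

def update_field(user_id, field, value):
    # Partial update: no need to read or rewrite the whole object
    r.hset(f"user:{user_id}", field, value)

save_profile(1000, name="Alice", email="alice@example.com", age=28)
update_field(1000, "email", "alice@newdomain.com")
r.hincrby("user:1000", "age", 1)    # increment a numeric field
print(r.hgetall("user:1000"))       # whole object back as a Python dict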
# Shopping Cart Pattern
cart_key = "cart:user:123"
HSET cart_key "product:abc" 2 # 2x Product ABC
HSET cart_key "product:xyz" 1 # 1x Product XYZ
HINCRBY cart_key "product:abc" 1 # Add one more → 3
HDEL cart_key "product:xyz" # Remove item
HGETALL cart_key # Get cart contents
# Tracking Daily Stats
stats_key = "stats:page:home"
HINCRBY stats_key "2024-01-15" 1 # Increment today's count
HINCRBY stats_key "2024-01-16" 1
HGETALL stats_key # Get all daily counts
Lists
Lists are linked lists of string values, supporting push/pop operations from both ends in O(1) time. They're ideal for queues, stacks, activity feeds, and any scenario requiring ordered data with fast insertion.
Key characteristics:
• Maximum length: 2^32 - 1 elements (over 4 billion)
• O(1) push/pop at head or tail
• O(N) access by index (linked list behavior)
• Supports blocking pops for producer/consumer patterns
# Basic list operations
RPUSH queue:jobs "job1" "job2" "job3" # Add to right (tail)
LPUSH queue:jobs "job0" # Add to left (head)
LRANGE queue:jobs 0 -1 # Get all → ["job0", "job1", "job2", "job3"]
# Queue operations (FIFO)
RPUSH queue:tasks "task1" # Enqueue
LPOP queue:tasks # Dequeue → "task1"
# Stack operations (LIFO)
LPUSH stack:undo "action1" # Push
LPOP stack:undo # Pop → "action1"
# Blocking operations (for workers)
BRPOP queue:jobs 30 # Block up to 30 seconds for item
BLPOP queue:priority queue:normal 0 # Multi-queue priority with infinite wait
# Trimming (keep only recent items)
LPUSH notifications:user:1 "new notification"
LTRIM notifications:user:1 0 99 # Keep only last 100 notifications
# Index operations
LINDEX queue:jobs 0 # Get first element
LSET queue:jobs 0 "updated_job0" # Update element at index
LINSERT queue:jobs BEFORE "job2" "job1.5" # Insert before element
LREM queue:jobs 1 "job1" # Remove 1 occurrence of "job1"
# Moving between lists
LMOVE queue:jobs queue:processing LEFT RIGHT # Atomically move element
Use Cases: Lists
1. Message Queues: Producer/consumer patterns with RPUSH/BLPOP
2. Activity Feeds: Recent user actions with automatic trimming
3. Background Job Queues: Task distribution across workers
4. Undo/Redo Stacks: Operation history
5. Real-time Leaderboard Updates: Push new scores to be processed
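A minimal producer/worker sketch of the message-queue use case, assuming redis-py, a queue:jobs key, and JSON-encoded jobs (all illustrative choices):
import json
import redis

r = redis.Redis(decode_responses=True)

def enqueue(job):
    # Producer: append to the tail of the list
    r.rpush("queue:jobs", json.dumps(job))

def worker():
    while True:
        # BLPOP blocks up to 30s; returns (key, value) or None on timeout
        item = r.blpop("queue:jobs", timeout=30)
        if item is None:
            continue
        _, raw = item
        job = json.loads(raw)
        print("processing", job)   # a real handler would go here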
# Reliable Queue Pattern (with acknowledgment)
# Producer
RPUSH queue:pending "job_data"
# Worker
while True:
# Move job to processing list atomically
job = LMOVE queue:pending queue:processing LEFT RIGHT # Take the oldest job (FIFO)
if job:
try:
process(job)
LREM queue:processing 1 job # Remove on success
except:
RPUSH queue:pending job # Re-queue on failure
# Activity Feed with Capping
def add_activity(user_id, activity):
key = f"feed:user:{user_id}"
LPUSH key activity
LTRIM key 0 999 # Keep only 1000 most recent
Sets
Sets are unordered collections of unique strings. They provide O(1) membership testing and support powerful set operations like union, intersection, and difference.
Key characteristics:
• Unique members only (automatic deduplication)
• O(1) add, remove, and membership test
• Set operations: union, intersection, difference
• Random member selection supported
# Basic set operations
SADD tags:post:1 "redis" "database" "nosql"
SADD tags:post:2 "redis" "caching" "performance"
SISMEMBER tags:post:1 "redis" # Check membership → 1 (true)
SMEMBERS tags:post:1 # Get all members
SCARD tags:post:1 # Count members → 3
# Add/remove operations
SADD tags:post:1 "tutorial" # Add member → 1 (added)
SADD tags:post:1 "redis" # Add duplicate → 0 (already exists)
SREM tags:post:1 "nosql" # Remove member
# Set operations
SINTER tags:post:1 tags:post:2 # Intersection → ["redis"]
SUNION tags:post:1 tags:post:2 # Union → all unique tags
SDIFF tags:post:1 tags:post:2 # Difference → tags only in post:1
# Store results
SINTERSTORE common:tags tags:post:1 tags:post:2 # Store intersection
# Random selection
SRANDMEMBER tags:post:1 # Random member (non-destructive)
SRANDMEMBER tags:post:1 3 # 3 random members
SPOP tags:post:1 # Remove and return random member
# Move between sets
SMOVE tags:post:1 tags:post:2 "tutorial" # Atomically move member
Use Cases: Sets
1. Tagging Systems: Associate tags with content, find common tags
2. Social Graphs: Track followers/following relationships
3. Unique Visitors: Track daily unique IPs or user IDs
4. Real-time Recommendations: Find common interests between users
5. Lottery/Random Selection: Pick random winners from participants
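A small redis-py sketch of the tagging use case, reusing the tags:post:{id} key layout from the commands above (the client setup is an assumption):
import redis

r = redis.Redis(decode_responses=True)

def tag_post(post_id, *tags):
    r.sadd(f"tags:post:{post_id}", *tags)

tag_post(1, "redis", "database", "nosql")
tag_post(2, "redis", "caching", "performance")

# Tags shared by both posts: intersection of the two sets
print(r.sinter("tags:post:1", "tags:post:2"))    # {'redis'}
print(r.sismember("tags:post:1", "redis"))       # True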
# Social Graph Pattern
SADD followers:alice "bob" "charlie" "dave"
SADD followers:bob "alice" "eve"
SADD following:bob "alice" "charlie"
# Find common followers (people who follow both Alice and Bob)
SINTER followers:alice followers:bob # Common followers
# Everyone who follows either Alice or Bob
SUNION followers:alice followers:bob
# Daily Unique Visitors
today = "2024-01-15"
SADD visitors:daily:{today} user_id
SCARD visitors:daily:{today} # Count unique visitors
# Random Giveaway Winner
SADD giveaway:participants "user1" "user2" "user3" "user4"
SPOP giveaway:participants # Pick and remove winner
Sorted Sets (ZSets)
Sorted Sets are sets where each member has an associated score, keeping elements ordered by score. They combine the uniqueness of sets with the ordering capabilities needed for leaderboards, time-series, and priority queues.
Key characteristics:
• Members are unique, scores can repeat
• Ordered by score (ascending by default)
• O(log N) add, remove, and rank operations
• Range queries by score or rank
• Lexicographical ordering available when scores are equal
# Basic sorted set operations
ZADD leaderboard 100 "alice" 85 "bob" 92 "charlie"
ZSCORE leaderboard "alice" # Get score → 100
ZRANK leaderboard "bob" # Get rank (0-based) → 0 (lowest)
ZREVRANK leaderboard "alice" # Reverse rank → 0 (highest)
# Range queries
ZRANGE leaderboard 0 -1 # All by rank (low to high)
ZREVRANGE leaderboard 0 2 # Top 3 (high to low)
ZREVRANGE leaderboard 0 2 WITHSCORES # Top 3 with scores
# Score-based queries
ZRANGEBYSCORE leaderboard 80 100 # Members with score 80-100
ZRANGEBYSCORE leaderboard (80 100 # Exclusive lower bound
ZRANGEBYSCORE leaderboard -inf +inf # All (useful with LIMIT)
ZRANGEBYSCORE leaderboard 80 100 LIMIT 0 10 # Paginate
# Score updates
ZINCRBY leaderboard 10 "bob" # Increment score → 95
ZADD leaderboard XX 110 "alice" # Update only if exists
ZADD leaderboard NX 75 "dave" # Add only if not exists
ZADD leaderboard GT 120 "alice" # Update only if new score > current
# Removal
ZREM leaderboard "charlie" # Remove member
ZREMRANGEBYSCORE leaderboard 0 50 # Remove scores 0-50
ZREMRANGEBYRANK leaderboard 0 4 # Remove bottom 5
# Set operations
ZUNIONSTORE combined 2 leaderboard:week1 leaderboard:week2 WEIGHTS 1 2
ZINTERSTORE overlap 2 leaderboard:a leaderboard:b AGGREGATE MAX
Use Cases: Sorted Sets
1. Leaderboards: Real-time rankings with instant updates
2. Rate Limiting: Sliding window with timestamps as scores
3. Priority Queues: Tasks ordered by priority/deadline
4. Time-Series Data: Events sorted by timestamp
5. Range Queries: Find items within score ranges (prices, distances)
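A runnable redis-py sketch of the leaderboard use case (key and player names are illustrative); note that in redis-py, ZADD takes a {member: score} mapping:
import redis

r = redis.Redis(decode_responses=True)

KEY = "leaderboard:game42"   # assumed key name

r.zadd(KEY, {"alice": 100, "bob": 85, "charlie": 92})
r.zincrby(KEY, 10, "bob")                        # bob: 85 -> 95

# Top 3, highest score first, as (member, score) pairs
print(r.zrevrange(KEY, 0, 2, withscores=True))

rank = r.zrevrank(KEY, "alice")                  # 0-based, highest first
print(rank + 1 if rank is not None else None)    # 1-based rank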
# Real-time Leaderboard
def update_score(game_id, user_id, score):
key = f"leaderboard:{game_id}"
ZADD key score user_id
def get_top_players(game_id, count=10):
key = f"leaderboard:{game_id}"
return ZREVRANGE key 0 count-1 WITHSCORES
def get_user_rank(game_id, user_id):
key = f"leaderboard:{game_id}"
rank = ZREVRANK key user_id
return rank + 1 # 1-based rank (ZREVRANK itself is 0-based)
# Sliding Window Rate Limiter
def is_rate_limited(user_id, limit=100, window_seconds=60):
key = f"ratelimit:{user_id}"
now = current_timestamp_ms()
window_start = now - (window_seconds * 1000)
pipeline:
ZREMRANGEBYSCORE key 0 window_start # Remove old entries
ZADD key now now # Add current request
count = ZCARD key # Count requests in window
EXPIRE key window_seconds # Auto-cleanup
return count > limit
# Priority Queue
ZADD tasks:priority 1 "urgent_task" # Priority 1 (highest)
ZADD tasks:priority 5 "normal_task" # Priority 5
ZADD tasks:priority 10 "low_task" # Priority 10
ZPOPMIN tasks:priority # Get highest priority task
Streams
Streams are an append-only log data structure introduced in Redis 5.0. They're designed for event sourcing, message queues with consumer groups, and real-time data ingestion—similar to Apache Kafka but simpler.
Key characteristics:
• Append-only (immutable entries)
• Automatic ID generation with timestamps
• Consumer groups for distributed processing
• Message acknowledgment and pending entry list
• Range queries and stream trimming
# Adding entries
XADD events * type "click" user_id "123" page "/home"
# Returns auto-generated ID: "1673847321145-0" (timestamp-sequence)
XADD events * type "purchase" user_id "123" amount "99.99"
# Reading entries
XRANGE events - + # All entries (oldest to newest)
XRANGE events - + COUNT 10 # First 10 entries
XREVRANGE events + - COUNT 10 # Last 10 entries
XLEN events # Stream length
# Reading from specific ID
XRANGE events 1673847321145-0 + # From this ID onwards
# Blocking read (for real-time consumers)
XREAD BLOCK 5000 STREAMS events $ # Wait up to 5s for new entries
XREAD BLOCK 0 STREAMS events 0 # Read from beginning, block forever
# Consumer Groups (distributed processing)
XGROUP CREATE events mygroup $ MKSTREAM # Create group
XREADGROUP GROUP mygroup consumer1 COUNT 10 STREAMS events > # Read undelivered
# Acknowledgment
XACK events mygroup 1673847321145-0 # Mark as processed
# Pending entries (unacknowledged)
XPENDING events mygroup # Summary
XPENDING events mygroup - + 10 consumer1 # Details
# Claim stale messages
XAUTOCLAIM events mygroup consumer2 60000 0-0 # Claim messages idle > 60s
# Trimming
XTRIM events MAXLEN 1000 # Keep only last 1000
XTRIM events MAXLEN ~ 1000 # Approximate (faster)
Use Cases: Streams
1. Event Sourcing: Immutable log of all state changes
2. Message Queues: Reliable delivery with consumer groups
3. Activity Feeds: Chronological event streams
4. IoT Data Ingestion: High-throughput sensor data
5. Audit Logs: Tamper-evident activity records
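A sketch of a consumer-group worker with redis-py; the stream, group, and consumer names are assumptions, and the nested lists mirror how redis-py returns XREADGROUP results:
import redis

r = redis.Redis(decode_responses=True)

STREAM, GROUP, CONSUMER = "events", "mygroup", "worker1"   # assumed names

# Create the group once; ignore the error if it already exists
try:
    r.xgroup_create(STREAM, GROUP, id="$", mkstream=True)
except redis.ResponseError:
    pass

r.xadd(STREAM, {"type": "click", "user_id": "123"})

while True:
    # '>' asks for entries never delivered to this group before
    batches = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=10, block=5000)
    if not batches:
        continue
    for _stream, entries in batches:
        for entry_id, fields in entries:
            print(entry_id, fields)             # process the event here
            r.xack(STREAM, GROUP, entry_id)     # acknowledge when done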
# Event Sourcing Pattern
def record_event(stream, event_type, data):
entry = {"type": event_type, **data}
return XADD stream "*" entry
# Producer
record_event("orders:events", "created",
{"order_id": "123", "user": "alice", "total": "99.99"})
record_event("orders:events", "paid",
{"order_id": "123", "method": "card"})
record_event("orders:events", "shipped",
{"order_id": "123", "tracking": "XYZ123"})
# Consumer Group Example
XGROUP CREATE orders:events processors $ MKSTREAM
# Worker 1
while True:
entries = XREADGROUP GROUP processors worker1 COUNT 10 BLOCK 5000 STREAMS orders:events >
for entry_id, fields in entries:
process(fields)
XACK orders:events processors entry_id
# Rebuild state from events
def get_order_state(order_id):
events = XRANGE orders:events - +
state = {}
for event in events:
if event["order_id"] == order_id:
apply_event(state, event)
return state
HyperLogLog
HyperLogLog is a probabilistic data structure for cardinality estimation—counting unique elements with minimal memory. It trades perfect accuracy for extreme space efficiency: ~12KB for billions of unique elements.
Key characteristics:
• Standard error: 0.81%
• Fixed memory: ~12KB regardless of cardinality
• Mergeable: combine multiple HLLs
• Perfect for unique visitor counts, search queries
# Adding elements
PFADD visitors:today "user:1" "user:2" "user:3"
PFADD visitors:today "user:1" # Duplicate (still counted once)
PFADD visitors:today "user:4" "user:5"
# Counting unique elements
PFCOUNT visitors:today # Estimated unique count → ~5
# Merging multiple HLLs
PFADD visitors:day1 "user:1" "user:2"
PFADD visitors:day2 "user:2" "user:3"
PFMERGE visitors:week visitors:day1 visitors:day2
PFCOUNT visitors:week # Unique across both days → ~3
Use Cases: HyperLogLog
1. Unique Visitor Counting: Daily/weekly/monthly unique users
2. Search Query Analytics: Unique search terms
3. Network Analysis: Unique IP addresses
4. A/B Testing: Unique users per experiment variant
5. Real-time Dashboards: Approximate unique counts at scale
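A short redis-py sketch of unique-visitor counting with HyperLogLog (the date-based key names are illustrative):
import redis

r = redis.Redis(decode_responses=True)

def track_visit(page, visitor_id, day):
    r.pfadd(f"visitors:{page}:{day}", visitor_id)

track_visit("home", "user:1", "2024-01-15")
track_visit("home", "user:2", "2024-01-15")
track_visit("home", "user:1", "2024-01-16")   # same user, next day
track_visit("home", "user:3", "2024-01-16")

print(r.pfcount("visitors:home:2024-01-15"))  # ~2 for that day
# Union across days without double-counting user:1
r.pfmerge("visitors:home:week", "visitors:home:2024-01-15", "visitors:home:2024-01-16")
print(r.pfcount("visitors:home:week"))        # ~3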
# Unique Visitor Tracking
def track_visitor(page, visitor_id):
today = datetime.now().strftime("%Y-%m-%d")
PFADD f"visitors:{page}:{today}" visitor_id
def get_unique_visitors(page, date):
return PFCOUNT f"visitors:{page}:{date}"
def get_weekly_uniques(page):
keys = [f"visitors:{page}:{day}" for day in last_7_days()]
temp_key = f"visitors:{page}:week"
PFMERGE temp_key *keys
count = PFCOUNT temp_key
DEL temp_key
return count
Bitmaps
Bitmaps aren't a separate data type—they're string values treated as bit arrays. They enable extremely memory-efficient boolean operations across millions of elements.
Key characteristics:
• 1 bit per boolean value
• 512MB = 4 billion bits
• O(1) bit operations
• Bitwise AND, OR, XOR, NOT across keys
# User daily login tracking
# Bit position = user ID, value = logged in today
SETBIT logins:2024-01-15 1000 1 # User 1000 logged in
SETBIT logins:2024-01-15 1001 1 # User 1001 logged in
SETBIT logins:2024-01-15 1002 1
GETBIT logins:2024-01-15 1000 # Check if user logged in → 1
# Count users who logged in
BITCOUNT logins:2024-01-15 # Total login count
# Find users active on multiple days
BITOP AND active:both logins:2024-01-15 logins:2024-01-16
BITCOUNT active:both # Users active both days
# Find users active on any day
BITOP OR active:any logins:2024-01-15 logins:2024-01-16
BITCOUNT active:any # Users active either day
# Find first set bit (first user who logged in)
BITPOS logins:2024-01-15 1 # Position of first '1' bit
Use Cases: Bitmaps
1. User Activity Tracking: Daily active users with 1 bit per user
2. Feature Flags: Enable/disable features per user
3. Bloom Filters: Probabilistic membership testing
4. Real-time Analytics: Retention, cohort analysis
5. Permissions: Role-based access with bit flags
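A redis-py sketch of daily-active-user tracking with bitmaps, using numeric user IDs as bit offsets (key names are illustrative):
import redis

r = redis.Redis(decode_responses=True)

def record_login(user_id, day):
    # One bit per user: offset = numeric user ID
    r.setbit(f"logins:{day}", user_id, 1)

record_login(1000, "2024-01-15")
record_login(1001, "2024-01-15")
record_login(1000, "2024-01-16")

print(r.bitcount("logins:2024-01-15"))          # 2 users active that day
# Users active on both days: bitwise AND of the two bitmaps
r.bitop("AND", "active:both", "logins:2024-01-15", "logins:2024-01-16")
print(r.bitcount("active:both"))                # 1 (user 1000)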
# 7-Day Retention Analysis
def record_login(user_id):
today = datetime.now().strftime("%Y-%m-%d")
SETBIT f"logins:{today}" user_id 1
def calculate_retention(cohort_date, days=7):
cohort_key = f"logins:{cohort_date}"
retained = []
for i in range(1, days + 1):
check_date = cohort_date + timedelta(days=i)
check_key = f"logins:{check_date}"
result_key = f"temp:retention:{i}"
BITOP AND result_key cohort_key check_key
retained.append(BITCOUNT result_key)
DEL result_key
cohort_size = BITCOUNT cohort_key
return [r / cohort_size for r in retained] # Retention percentages
# Feature Flags (8 features per byte)
def set_feature(user_id, feature_index, enabled):
SETBIT f"features:{user_id}" feature_index (1 if enabled else 0)
def has_feature(user_id, feature_index):
return GETBIT f"features:{user_id}" feature_index == 1
Geospatial Indexes
Redis Geo is built on Sorted Sets, storing longitude/latitude pairs and enabling location-based queries like radius searches and distance calculations.
Key characteristics:
• Based on Sorted Sets (geohash as score)
• Radius queries in km or miles
• Distance calculations
• Geohash encoding/decoding
# Adding locations
GEOADD restaurants -73.935242 40.730610 "Joe's Pizza"
GEOADD restaurants -73.980000 40.750000 "Times Square Diner"
GEOADD restaurants -73.990000 40.720000 "Chinatown Noodles"
# Get coordinates
GEOPOS restaurants "Joe's Pizza" # → [[-73.935242, 40.730610]]
# Calculate distance
GEODIST restaurants "Joe's Pizza" "Times Square Diner" km # Distance in km
# Radius search
GEORADIUS restaurants -73.950000 40.740000 5 km WITHDIST WITHCOORD COUNT 10
# Returns nearby restaurants with distance and coordinates
# Search by member
GEORADIUSBYMEMBER restaurants "Joe's Pizza" 3 km WITHDIST
# Geohash
GEOHASH restaurants "Joe's Pizza" # → ["dr5regy..."]
# Remove (uses ZREM since it's a Sorted Set)
ZREM restaurants "Chinatown Noodles"
Use Cases: Geospatial
1. Store Locators: Find nearest stores/restaurants
2. Ride-sharing: Match drivers and riders by proximity
3. Delivery Services: Restaurants within delivery radius
4. Social Apps: Find nearby users
5. Real Estate: Properties within distance of a location
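A redis-py sketch of a nearby search (data and key names are illustrative; GEOADD taking a flat (lon, lat, member) sequence assumes a recent redis-py version):
import redis

r = redis.Redis(decode_responses=True)

# Longitude first, then latitude, then the member name
r.geoadd("restaurants", (-73.935242, 40.730610, "Joe's Pizza"))
r.geoadd("restaurants", (-73.980000, 40.750000, "Times Square Diner"))

# Everything within 5 km of a point, nearest first, with distances
nearby = r.georadius("restaurants", -73.95, 40.74, 5, unit="km",
                     withdist=True, count=10, sort="ASC")
for name, dist_km in nearby:
    print(name, round(dist_km, 2), "km")

print(r.geodist("restaurants", "Joe's Pizza", "Times Square Diner", unit="km"))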
# Nearby Restaurants API
def find_nearby_restaurants(lat, lon, radius_km=5, limit=20):
return GEORADIUS restaurants lon lat radius_km km WITHDIST WITHCOORD COUNT limit ASC
# Driver Matching for Ride-sharing
def update_driver_location(driver_id, lat, lon):
GEOADD active_drivers lon lat driver_id
EXPIRE active_drivers 300 # Whole key expires after 5 min without updates (members can't expire individually)
def find_nearest_drivers(pickup_lat, pickup_lon, count=5):
return GEORADIUS active_drivers pickup_lon pickup_lat 10 km WITHDIST COUNT count ASC
def remove_driver(driver_id):
ZREM active_drivers driver_id
# Delivery Zone Check
def is_in_delivery_zone(restaurant_id, customer_lat, customer_lon, max_km=8):
if not GEOPOS restaurants restaurant_id:
return False
# Temporarily add the customer's position, measure, then remove it
GEOADD restaurants customer_lon customer_lat "customer:temp"
distance = GEODIST restaurants restaurant_id "customer:temp" km
ZREM restaurants "customer:temp"
return distance <= max_km
Memory Optimization Tips
Redis stores everything in memory, so understanding data structure overhead is crucial.
Encoding Optimizations:
• Small hashes (up to ~128 fields with values under 64 bytes) use ziplist encoding
• Small lists use quicklist with compressed nodes
• Small integer-only sets (up to 512 elements) use intset
• Sorted sets with few, small elements use ziplist
Configuration Thresholds (redis.conf; Redis 7 renames ziplist to listpack, e.g. hash-max-listpack-entries):
• hash-max-ziplist-entries 128
• hash-max-ziplist-value 64
• list-max-ziplist-size -2
• set-max-intset-entries 512
• zset-max-ziplist-entries 128
• zset-max-ziplist-value 64
# Check memory usage
DEBUG OBJECT mykey # Shows encoding type
MEMORY USAGE mykey # Bytes used by key
# Check encoding
OBJECT ENCODING user:1 # → "ziplist" or "hashtable"
OBJECT ENCODING numbers_set # → "intset" or "hashtable"
# Key design for memory efficiency
# Bad: Many small keys
SET user:1:name "Alice"
SET user:1:email "alice@example.com"
SET user:1:age "30"
# Good: Single hash
HSET user:1 name "Alice" email "alice@example.com" age "30"
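To see the difference in practice, a quick check with redis-py might look like this (exact byte counts vary by Redis version and encoding; user:1h is just an illustrative key):
import redis

r = redis.Redis(decode_responses=True)

# Many small string keys
r.set("user:1:name", "Alice")
r.set("user:1:email", "alice@example.com")
r.set("user:1:age", "30")
separate = sum(r.memory_usage(k) for k in
               ("user:1:name", "user:1:email", "user:1:age"))

# The same data in a single hash
r.hset("user:1h", mapping={"name": "Alice", "email": "alice@example.com", "age": "30"})
combined = r.memory_usage("user:1h")

print(separate, "bytes as separate strings vs", combined, "bytes as one hash")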
# Use short key names in production
# Instead of: user:profile:settings:notifications
# Use: u:p:s:n
# Expire unused keys
EXPIRE cache:query:abc 3600
EXPIREAT session:xyz 1707000000