Database Design

December 26, 2024

Database Design Patterns for Scalable Applications

Essential database design patterns for building scalable applications, covering normalization, denormalization, indexing strategies, and performance optimization.

Database Design
Scalability
SQL
NoSQL
Performance
Data Modeling
← Back to Blog

Database Design Patterns for Scalable Applications

Effective database design is crucial for building applications that can handle growth, maintain performance, and provide reliable data access. Understanding key patterns and when to apply them can make the difference between a successful application and one that struggles under load.

Database Normalization

First Normal Form (1NF)

Eliminate repeating groups and ensure atomic values:

-- Before (not normalized)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(100),
  items TEXT -- Multiple items in one field
);

-- After (1NF)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(100)
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER REFERENCES orders(id),
  product_name VARCHAR(100),
  quantity INTEGER,
  price DECIMAL(10,2)
);

Second Normal Form (2NF)

Remove partial dependencies:

-- Before (partial dependency)
CREATE TABLE order_details (
  order_id INTEGER,
  product_id INTEGER,
  customer_name VARCHAR(100), -- Depends only on order_id
  product_name VARCHAR(100),
  PRIMARY KEY (order_id, product_id)
);

-- After (2NF)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(100)
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER REFERENCES orders(id),
  product_id INTEGER,
  product_name VARCHAR(100)
);

Third Normal Form (3NF)

Remove transitive dependencies:

-- Before (transitive dependency)
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  department_name VARCHAR(100), -- Depends on department_id
  department_id INTEGER
);

-- After (3NF)
CREATE TABLE departments (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  department_id INTEGER REFERENCES departments(id)
);

Denormalization Strategies

While normalization reduces redundancy, denormalization can improve read performance:

Materialized Views

-- Create materialized view for complex queries
CREATE MATERIALIZED VIEW user_order_summary AS
SELECT
  u.id as user_id,
  u.name,
  COUNT(o.id) as total_orders,
  SUM(o.total_amount) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- Refresh when data changes
REFRESH MATERIALIZED VIEW user_order_summary;

Pre-computed Aggregates

-- Add computed columns for frequently accessed data
ALTER TABLE users ADD COLUMN order_count INTEGER DEFAULT 0;
ALTER TABLE users ADD COLUMN total_spent DECIMAL(10,2) DEFAULT 0;

-- Update triggers to maintain computed values
CREATE OR REPLACE FUNCTION update_user_stats()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE users SET
      order_count = order_count + 1,
      total_spent = total_spent + NEW.total_amount
    WHERE id = NEW.user_id;
    RETURN NEW;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_insert_trigger
  AFTER INSERT ON orders
  FOR EACH ROW EXECUTE FUNCTION update_user_stats();

Indexing Strategies

Single Column Indexes

-- Index for frequently filtered columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

Composite Indexes

-- For queries with multiple WHERE conditions
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
CREATE INDEX idx_orders_date_status ON orders(created_at DESC, status);

-- Query using composite index
SELECT * FROM orders
WHERE user_id = 123 AND status = 'completed'
ORDER BY created_at DESC;

Partial Indexes

-- Index only active records
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;

-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(created_at DESC)
WHERE created_at > NOW() - INTERVAL '30 days';

Partitioning for Large Tables

Range Partitioning

-- Partition orders by month
CREATE TABLE orders (
  id SERIAL,
  user_id INTEGER,
  total_amount DECIMAL(10,2),
  created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024_01 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

List Partitioning

-- Partition by status
CREATE TABLE orders (
  id SERIAL,
  status VARCHAR(20),
  user_id INTEGER,
  created_at TIMESTAMP
) PARTITION BY LIST (status);

CREATE TABLE orders_pending PARTITION OF orders
  FOR VALUES IN ('pending', 'processing');

CREATE TABLE orders_completed PARTITION OF orders
  FOR VALUES IN ('completed', 'cancelled');

Query Optimization Patterns

Covering Indexes

-- Create index that covers the entire query
CREATE INDEX idx_users_covering ON users(name, email, created_at)
WHERE active = true;

-- This query uses only the index
SELECT name, email, created_at FROM users
WHERE active = true AND name LIKE 'John%';

Query Result Caching

// Application-level caching
const cache = new Map();

async function getPopularProducts(limit = 10) {
  const cacheKey = `popular_products_${limit}`;

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  const products = await db.query(`
    SELECT p.*, COUNT(oi.id) as order_count
    FROM products p
    LEFT JOIN order_items oi ON p.id = oi.product_id
    GROUP BY p.id
    ORDER BY order_count DESC
    LIMIT $1
  `, [limit]);

  cache.set(cacheKey, products);
  return products;
}

NoSQL Design Patterns

Document Embedding

// MongoDB - Embed related data
{
  _id: ObjectId("user123"),
  name: "John Doe",
  email: "john@example.com",
  orders: [
    {
      orderId: "order456",
      items: [...],
      total: 99.99,
      date: ISODate("2024-01-15")
    }
  ]
}

Reference Pattern

// MongoDB - Use references for large related data
{
  _id: ObjectId("user123"),
  name: "John Doe",
  email: "john@example.com",
  orderIds: [ObjectId("order456"), ObjectId("order789")]
}

{
  _id: ObjectId("order456"),
  userId: ObjectId("user123"),
  items: [...],
  total: 99.99
}

Performance Monitoring

Query Performance Analysis

-- PostgreSQL query analysis
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name;

-- Result shows execution plan and timing

Database Metrics

// Monitor key database metrics
const metrics = {
  queryLatency: await measureQueryLatency(),
  connectionPoolUsage: await getConnectionPoolStats(),
  slowQueries: await getSlowQueries(),
  tableSizes: await getTableSizes(),
  indexUsage: await getIndexUsageStats()
};

// Alert on performance issues
if (metrics.queryLatency.p95 > 1000) { // 1 second
  await alertTeam('High query latency detected');
}

Schema Evolution Patterns

Adding Columns Safely

-- Add column with default value
ALTER TABLE users ADD COLUMN phone VARCHAR(20) DEFAULT '';

-- Add column that cannot have NULL
ALTER TABLE users ADD COLUMN verified BOOLEAN NOT NULL DEFAULT false;

-- Add check constraint
ALTER TABLE users ADD CONSTRAINT check_phone_format
  CHECK (phone ~ '^\+?[1-9]\d{1,14}$');

Database Migrations

// Migration framework (example with Sequelize)
module.exports = {
  up: async (queryInterface, Sequelize) => {
    await queryInterface.createTable('users', {
      id: {
        allowNull: false,
        autoIncrement: true,
        primaryKey: true,
        type: Sequelize.INTEGER
      },
      name: {
        type: Sequelize.STRING,
        allowNull: false
      },
      email: {
        type: Sequelize.STRING,
        unique: true,
        allowNull: false
      }
    });
  },

  down: async (queryInterface, Sequelize) => {
    await queryInterface.dropTable('users');
  }
};

Best Practices Summary

1. Design for Your Use Case

  • OLTP: Normalized schemas for transactional consistency
  • OLAP: Denormalized schemas for analytical queries
  • Hybrid: Balance between normalization and performance

2. Plan for Scale

  • Index strategically: Cover common query patterns
  • Partition large tables: Improve query performance and maintenance
  • Cache frequently accessed data: Reduce database load

3. Monitor and Optimize

  • Track query performance: Identify bottlenecks early
  • Review indexes regularly: Remove unused indexes
  • Update statistics: Keep query planner informed

4. Ensure Data Quality

  • Use constraints: Enforce data integrity at database level
  • Validate at application level: Double-check important data
  • Audit changes: Track who changed what and when

Conclusion

Effective database design requires balancing multiple concerns: data integrity, query performance, scalability, and maintainability. The patterns discussed here provide a foundation for building robust, scalable database systems.

Remember that database design is an iterative process. Start with normalized schemas, measure performance, and denormalize strategically based on your application's actual usage patterns.

The most successful applications have databases that evolve with their requirements, continuously optimized based on real-world usage data and performance metrics.

Choose the right patterns for your specific use case, implement them thoughtfully, and be prepared to adapt as your application grows and requirements change.

Tags:

Database Design
Scalability
SQL
NoSQL
Performance
Data Modeling