minami

Intro

DBMS system different purposes:

DBMS three major categories:

DBMS Architecture

DBMS use a client/server model, where DBS instances (nodes) take the role of servers and application instances take the role of clients

System Architecture

![[Pasted image 20241103183620.png]]

Memory Versus Disk-Based DBMS

Durability in Memory-Based Stores

Core Concepts

Basic Principles

Durability Mechanism

  1. Write-ahead Logging

    • Operations must be written to sequential log file
    • Essential for operation completion
    • Enables recovery capabilities
  2. Backup Copy Management

    • Maintained as sorted disk-based structure
    • Modifications often asynchronous
    • Applied in batches to reduce I/O operations
    • Used for recovery alongside logs

Checkpointing Process

Key Differences from Disk-Based Systems

Performance Considerations

Storage Structure Optimization

  1. Memory-Based Advantages

    • Fast pointer following
    • Quick random memory access
    • More data structure options
    • Greater optimization possibilities
  2. Disk-Based Characteristics

    • Uses specialized storage structures
    • Optimized for disk access
    • Often uses wide, short trees
    • Requires special handling for variable-size data

Use Cases

Suitable Scenarios

Practical Implementation

Notes

Important distinction: In-memory database performance advantages come from fundamental architectural differences, not just from avoiding disk access.

Column Versus Row-Oriented Database Management Systems

Basic Structure

Database Components

Field Characteristics

Storage Classification

Primary Classification Methods

Partitioning Methods

  1. Horizontal Partitioning

    • Stores values belonging to same row together
    • Row-oriented approach
  2. Vertical Partitioning

    • Stores values belonging to same column together
    • Column-oriented approach

![[Pasted image 20241103223418.png]] Figure 1-2: Visual representation of storage methods

Common DBMS Examples

Row-Oriented Systems

Traditional relational databases including:

Column-Oriented Systems

Pioneer open source implementations:

Visual Reference Notes

The provided image illustrates:

Row-Oriented Data Layout in DBMS

Basic Concept

Example Data Structure

| ID | Name  | Birth Date  | Phone Number    |
|----|-------|-------------|-----------------|
| 10 | John  | 01 Aug 1981 | +1 111 222 333 |
| 20 | Sam   | 14 Sep 1988 | +1 555 888 999 |
| 30 | Keith | 07 Jan 1984 | +1 333 444 555 |

Key Characteristics

Record Structure

Data Access Patterns

  1. Record Operations

    • Multiple fields read together
    • Written together (e.g., during registration)
    • Individual field modifications possible
  2. Spatial Locality

    • Improved by storing entire rows together
    • Beneficial for row-based access patterns

Storage Implications

Block-wise Access

Performance Considerations

Use Cases

Notes

Column-Oriented Data Layout in DBMS

Basic Concept

Data Representation

Logical View (Table Format)

| ID | Symbol | Date        | Price      |
|----|--------|-------------|------------|
| 1  | DOW    | 08 Aug 2018 | 24,314.65 |
| 2  | DOW    | 09 Aug 2018 | 24,136.16 |
| 3  | S&P    | 08 Aug 2018 | 2,414.45  |
| 4  | S&P    | 09 Aug 2018 | 2,232.32  |

Physical Layout (Column-Based Storage)

Symbol: 1:DOW; 2:DOW; 3:S&P; 4:S&P
Date:   1:08 Aug 2018; 2:09 Aug 2018; 3:08 Aug 2018; 4:09 Aug 2018
Price:  1:24,314.65; 2:24,136.16; 3:2,414.45; 4:2,232.32

Key Features

Storage Organization

Data Reconstruction

  1. Metadata Requirements

    • Need metadata at column level
    • Used to identify associated data points across columns
    • Important for:
      • Joins
      • Filtering
      • Multirow aggregates
  2. Identification Methods

    • Virtual IDs (implicit identifiers)
    • Uses value position/offset for mapping
    • More efficient than explicit keys

Advantages

  1. Query Efficiency

    • Single-pass column reading
    • No need to read entire rows
    • Avoids loading unnecessary columns
  2. Analytical Capabilities

    • Excellent for computing aggregates
    • Efficient for:
      • Finding trends
      • Computing averages
      • Complex analytical queries

Modern Implementations

Column-Oriented File Formats

Column-Oriented Stores

Use Cases

Historical Context

Distinctions and Optimizations: Row vs Column Stores

Key Concept

Performance Optimizations

Cache Utilization

  1. Column Store Benefits

    • Multiple values from same column read in one run
    • Improved cache utilization
    • Enhanced computational efficiency
  2. CPU Optimization

    • Modern CPUs can use vectorized instructions
    • Single CPU instruction processes multiple data points
    • Parallel processing capabilities

Compression Advantages

  1. Data Type Grouping

    • Similar data types stored together
    • Numbers with numbers
    • Strings with strings
  2. Compression Benefits

    • Better compression ratios
    • Type-specific compression algorithms
    • Optimized compression methods per data type

Selection Criteria

Row-Oriented Preference

Best for:

Column-Oriented Preference

Optimal for:

Decision Framework

Key Considerations

  1. Access Patterns

    • How is data typically accessed?
    • What percentage of columns are usually needed?
    • Frequency of full record retrieval
  2. Workload Type

    • Nature of typical queries
    • Balance between read and write operations
    • Analytical vs transactional processing
  3. Performance Requirements

    • Cache efficiency needs
    • Compression importance
    • CPU optimization requirements

Best Practices

Data Files and Index Files

Core Concepts

Database systems use specialized file organization rather than simple filesystem hierarchies to store and manage data efficiently. This approach offers several key advantages over flat files.

Key Advantages of Specialized File Organization

1. Storage Efficiency

2. Access Efficiency

3. Update Efficiency

File Organization Structure

Data Storage

Index System

File Separation

  1. Data Files

    • Store actual data records
    • Contain the primary information
  2. Index Files

    • Store record metadata
    • Used to locate records in data files
    • Typically smaller than data files

Page Organization

Record Management

Record Operations

  1. New Records (Insertions)

    • Represented as key/value pairs
    • Added to appropriate pages
  2. Updates

    • Also handled as key/value pairs
    • Modify existing record data
  3. Deletions

    • Modern systems use deletion markers (tombstones)
    • Contain deletion metadata:
      • Key
      • Timestamp

Garbage Collection