CS585 Database Systems
Data
- raw facts that have not yet been processed
- building blocks of information
- managed through data management
Information
- produced by processing data
- creates knowledge
- accurate, relevant, and timely to facilitate sound decision-making
DB
- shared and integrated computer structure that stores a collection of end-user data
- includes metadata, which describes the data characteristics and relationships
- integrated and managed with the end-user data
DBMS
- DB management system
- collection of programs that manage the DB structure and control access to its stored data
- an intermediary between users and the DB
- enables data sharing
- presents an integrated view of the data to users
- receives app requests and translates them into operations
- goods. hides the DB's internal complexity from apps and users. better data integration and less data inconsistency. increased end-user productivity. improved data sharing, security, and access. improved decision-making. enhanced data quality by promoting accuracy, validity, and timeliness of data
- bads. increased costs. management complexity. maintaining currency. vendor dependence. upgrade/replacement cycles
Types of DBs
- single-user DB supports one user at a time
- desktop DB runs on a PC
- multiuser DB supports multiple users simultaneously
- workgroup DB supports a small group of users or a specific department
- enterprise DB supports numerous users across various departments
- centralized DB has data at a single site
- distributed DB is distributed across different sites
- cloud DB is maintained using cloud data services that provide defined performance measures for the DB
- general-purpose DBs contain a wide variety of data used in multiple disciplines
- discipline-specific DBs contain data focused on specific subject areas
- operational DB is designed to support a company's day-to-day operations
- analytical DB stores historical data and business metrics used exclusively for tactical or strategic decision-making
- data warehouse stores data in a format optimized for decision support
- online analytical processing (OLAP) enables retrieving, processing, and modeling data from the data warehouse
- business intelligence captures and processes business data to generate information that supports decision-making
Structural and Data Dependence
- data dependence means data access changes when data storage characteristics change
- data independence means data storage characteristics are changed without affecting the program's ability to access the data
- the practical significance of data dependence is the difference between the logical data format and the physical data format
Data Redundancy
- storing the same data unnecessarily in multiple places
- islands of information means the same data is scattered across different, unconnected locations
- poor data security
- data inconsistency
- greater likelihood of data-entry errors for complex entries
- data anomalies
Data Anomaly
- occurs when changes are applied to only some copies of redundant data, leaving the copies out of sync
- Update Anomalies
- Insertion Anomalies
- Deletion Anomalies
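- for example, a minimal SQL sketch (hypothetical CUSTOMER_ORDER table) of how redundancy produces an update anomaly:
  -- customer data is repeated in every order row (redundant)
  CREATE TABLE CUSTOMER_ORDER (
    ORDER_ID   INTEGER PRIMARY KEY,
    CUST_NAME  VARCHAR(50),
    CUST_PHONE VARCHAR(20),
    ORDER_DATE DATE
  );
  -- update anomaly: changing one customer's phone number must touch every one of
  -- that customer's order rows; missing any row leaves the data inconsistent
  UPDATE CUSTOMER_ORDER
     SET CUST_PHONE = '555-0199'
   WHERE CUST_NAME = 'Jane Doe';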
DBMS Functions
- manages data dictionary (stores definitions of the data elements and their relationship)
- manages data storage
- performance tuning ensures efficient performance of the DB with storage and access speed
- transforms and presents data to required data structures
- manages security
- multiuser access control
- backup and recovery management
- data integrity management (redundancy↓ and consistency↑)
- DB access languages and APIs
- DB communication interfaces (accepts end-user requests via multiple, different networks)
- query language lets the user specify what must be done without having to specify how
- structured query language (SQL) is the de facto query language and data access standard supported by the majority of DBMS vendors
Data Modeling
- interactive and progressive process of creating a specific data model for a problem domain
- data models are simple representations of complex real-world data structures
- models are abstractions of real-world objects
- data models
- are communication tools
- give an overall view of the DB
- organize data
- provide abstractions for the creation of a good DB
Data Model Basic Building Blocks
- entity is a unique and distinct object used to collect and store data. the entity has attributes (characteristics)
- constraints are a set of rules to ensure data integrity
- relationships describe associations among entities
- one to many (1:M)
- many to many (M:N or M:M)
- one to one (1:1)
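- a hedged SQL sketch of how each relationship type is commonly implemented (table and column names invented); an M:N needs a third, bridge table (see Associative Entities later):
  -- 1:M - one DEPARTMENT has many EMPLOYEEs; the "many" side holds the foreign key
  CREATE TABLE DEPARTMENT (
    DEPT_ID   INTEGER PRIMARY KEY,
    DEPT_NAME VARCHAR(40)
  );
  CREATE TABLE EMPLOYEE (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME VARCHAR(40),
    DEPT_ID  INTEGER REFERENCES DEPARTMENT(DEPT_ID)
  );
  -- 1:1 - each DEPARTMENT has at most one manager; a UNIQUE foreign key enforces it
  ALTER TABLE DEPARTMENT ADD MGR_ID INTEGER UNIQUE REFERENCES EMPLOYEE(EMP_ID);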
Naming Conventions
- entity names are required to be descriptive of the objects in the business environment (use terminology that is familiar to the users)
- attribute name is required to be descriptive of the data represented by the attribute
- proper naming facilitates communication between parties and promotes self-documentation
Hierarchical Modeling
- hierarchies are good for 1:M (tree) but not M:N (graph or multiple inheritance)
- manage large amounts of data for complex manufacturing projects
- represented by an upside-down tree that contains segments (the equivalent of a file system's record type)
- goods. promotes data sharing. parent/child relationship promotes conceptual simplicity and data integrity. DB security is provided and enforced by DBMS. efficient with 1:M relationships
- bads. requires knowledge of physical data storage characteristics. the navigational system requires knowledge of the hierarchical path. changes in structure require changes in all apps. implementation limitations. no data definition or data manipulation language. lack of standards
Network Modeling
- captures M:N, looks like bipartite graphs
- depicts both 1:M and M:N relationships
- represent complex data relationships
- improve DB performance and impose a DB standard
- goods. conceptual simplicity. handles more relationship types. flexible data access. data owner/member relationship promotes data integrity. conforms to standards. includes data definition language and data manipulation language
- bads. system complexity limits the efficiency. navigational system yields complex implementation, app dev, and management. structural changes require changes in all apps
Standard DB Concepts
- schema is a conceptual organization of the entire DB as viewed by the DB administrator
- subschema is a portion of the DB seen by the apps that produce the desired information from the data within the DB
Data Definition Language
enables the DB administrator to define the schema components
Data Manipulation Language
provides the environment in which data can be managed and is used to work with the data in the DB
Relational Model
- based on a relation or table (matrix composed of intersecting tuples and attributes)
- tuple = row
- attribute = column
- describes a precise set of data manipulation constructs
- goods. structural independence is promoted using independent tables. the tabular view improves conceptual simplicity. ad-hoc query capability is based on SQL. isolates the end user from physical-level details. improves implementation and management simplicity
RDBMS
- performs basic functions provided by the hierarchical and network DBMS systems
- makes the relational data model easier to understand and implement
- hides the complexities of the relational model from the user
SQL-Based Relational DB Application
- end-user interface allows the end user to interact with the data
- collection of tables stored in the DB
- each table is independent of the others
- rows in different tables are related based on common values in common attributes
- SQL engine executes all queries
Entity Relationship Model
- graphical representation of entities and their relationships in a DB structure
- entity relationship diagram uses graphic representations to model DB components
- an entity instance (or entity occurrence) is a row in the relational table
- connectivity is a term used to label the relationship types
- goods. conceptual simplicity. effective communication tool. integrated with the dominant relational model
- bads. limited constraint representation. limited relationship representation. no data manipulation language. loss of information content occurs when attributes are removed from entities to avoid crowded displays
External Model
- end users' view of the data environment
- ER diagrams are used to represent the external views
- external schema specifies the representation of an external view
Conceptual Model
- Represents a global view of the entire DB by the whole organization
- conceptual schema is the basis for the identification and high-level description of the primary data objects
- has a macro-level view of the data environment
- is software and hardware independent
- logical design has the task of creating a conceptual data model
Internal Model
- represents the DB as seen by the DBMS; maps the conceptual model to the DBMS
- uses internal schema, a specific representation of an internal model
- is software-dependent, but hardware-independent
- has logical independence (changing the internal model without affecting the conceptual model)
Physical Model
- operates at the lowest level of abstraction
- describes the way data are saved on storage media such as disks or tapes
- requires the definition of physical storage and data access methods
- the relational model is aimed at the logical level
- it does not require physical-level details
- has physical independence (changes in the physical model do not affect the internal model)
Attributes
- characteristics of entities
- required attribute must have a value
- optional attribute does not require a value
- domains mean sets of possible values for a given attribute
- identifiers (keys) are one or more attributes that uniquely identify each entity instance
- composite identifiers are primary keys composed of more than one attribute
- compound attributes are attributes that can be subdivided to yield additional attributes
- simple attributes are attributes that cannot be subdivided
- single-valued attributes are attributes that have only a single value
- multivalued attributes are attributes that can have many values; resolving them requires creating either
- several new attributes, one for each component of the original multivalued attribute, or
- a new entity composed of the original multivalued attribute's components
- derived attributes are those whose value is calculated from other attributes
Relationships
- are associations between entities that always operate in both directions
- participants mean entities that participate in a relationship
- connectivity describes the relationship classification
- cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity
Connectivity and Cardinality
- connectivity describes the types of relationships between entities in an ER model (1:1) (1:N) (M:N)
- cardinality defines the numerical aspects of the relationship between entities by specifying the minimum and maximum number of entity instances that can participate in a relationship (min, max) for (1:1), (1:M), or (M:N)
- sometimes min is called modality
- Confusingly, the # rows in a table is ALSO called the table's cardinality (and the # of columns is called the table's degree).
- Also confusingly, (1:1), (1:M), (M:N) are called cardinality ratios
Existence Dependence
- weak (non-identifying) entity exists in the DB only when it is associated with another related entity occurrence
- the primary key of the related entity does not contain a primary key component of the parent entity
- strong (regular, identifying) entity exists apart from all of its associated entities
- the primary key of the related entity contains a primary key component of the parent entity
- How do you tell whether an entity is strong or weak?
- if any part of its primary key is also a foreign key (inherited from the parent entity), it is existence-dependent
- and thus weak
Weak Entity Conditions
- existence-dependent
- has a primary key that is partially or totally derived from the parent entity in the relationship
Relationship Participation
- Optional participation
- one entity occurrence does not require a corresponding entity occurrence in a particular relationship
- mandatory participation
- one entity occurrence requires a corresponding entity occurrence in a specific relationship
Relationship Degree
- indicates the number of entities or participants associated with a relationship
- unary relationship means the association is maintained within a single entity
- recursive relationship means a relationship exists between occurrences of the same entity set
- binary relationship means two entities are associated
- ternary relationship means three entities are associated
Associative Entities
- Also known as composite or bridge entities
- used to represent an M:N relationship between two or more entities
- is in a 1:M relationship with the parent entities
- composed of the primary key attributes of each parent entity
- may also contain additional attributes that play no role in the connective process
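- a minimal sketch (assuming STUDENT and CLASS tables already exist) of a bridge table implementing an M:N between them:
  CREATE TABLE ENROLL (
    STU_ID       INTEGER REFERENCES STUDENT(STU_ID),
    CLASS_ID     INTEGER REFERENCES CLASS(CLASS_ID),
    ENROLL_GRADE CHAR(2),                  -- extra attribute outside the connective role
    PRIMARY KEY (STU_ID, CLASS_ID)         -- composite PK built from both parents' PKs
  );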
Developing an ER Diagram
- create a detailed narrative of the organization's description of operations
- identify business rules based on the descriptions
- identify main entities and relationships from the business rules
- develop the initial ERD
- identify the attributes and primary keys that adequately describe entities
- revise and review ERD
Extended Entity Relationship Model (EERM)
- entity supertypes are generic entity types related to one or more entity subtypes
- contains common characteristics
- an entity subtype contains characteristics that are unique to that subtype
Specialization Hierarchy
- depicts an arrangement of higher-level entity supertypes and lower-level entity subtypes
- relationships are described in terms of "is-a" relationships
- subtype exists within the context of a supertype
- every subtype has one supertype to which it is directly related
- supertype can have many subtypes
Inheritance
- enables an entity subtype to inherit attributes and relationships of the supertype
- all entity subtypes inherit their primary key attribute from their supertype
- at the implementation level, the supertype and its subtype(s) maintain a 1:1 relationship
- entity subtypes inherit all relationships in which the supertype entity participates
- lower-level subtypes inherit all attributes and relationships from their upper-level supertypes
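- a hedged SQL sketch (invented names) of the implementation-level 1:1 between a supertype and one subtype: the subtype reuses the supertype's primary key, and EMP_TYPE anticipates the subtype discriminator described next:
  CREATE TABLE EMPLOYEE (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME VARCHAR(40),
    EMP_TYPE CHAR(1)      -- subtype discriminator, e.g. 'P' for pilot
  );
  CREATE TABLE PILOT (
    EMP_ID      INTEGER PRIMARY KEY REFERENCES EMPLOYEE(EMP_ID),  -- inherited PK, 1:1 with the supertype
    PIL_LICENSE VARCHAR(20)
  );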
Subtype Discriminator
- attribute in the supertype entity that determines to which entity subtype the supertype occurrence is related
- default comparison condition is the equality comparison
Disjoint and Overlapping Subtypes
- disjoint subtypes contain a unique subset of the supertype entity set (nonoverlapping subtypes)
- implementation is based on the value of the subtype discriminator attribute in the supertype
- overlapping subtypes contain nonunique subsets of the supertype entity set
- implementation requires the use of one discriminator attribute for each subtype
Completeness Constraint
- specifies whether each supertype occurrence must also be a member of at least one subtype
- partial completeness means not every supertype occurrence is a member of a subtype
- total completeness means every supertype occurrence must be a member of at least one subtype
Determination
- state in which knowing the value of one attribute makes it possible to determine the value of another
- is the basis for establishing the role of a key
- based on the relationships among the attributes
Dependencies
- functional dependence: the value of one or more attributes determines the value of one or more other attributes
- determinant is an attribute whose value determines another
- dependent is an attribute whose value is determined by the other attribute
- Full functional dependence means an entire collection of attributes in the determinant is necessary for the relationship
Type of Keys
- composite keys are composed of more than one attribute
- key attributes are a part of a key
- entity integrity is the condition in which each row in the table has a unique identity
- all of the values in the primary key must be unique
- no key attribute in the primary key can contain a null
- a table must have entity integrity
- null means an absence of any data value
- unknown, or missing, or inapplicable
- referential integrity means every reference to an entity instance by another entity instance is valid
- superkey is an attribute or combination of attributes that uniquely identifies each row in the table
- candidate key is a minimal (irreducible) superkey; a superkey that does not contain a subset of attributes that is itself a superkey
- primary key is a candidate key selected to identify all other attribute values in any given row uniquely; it cannot contain null entries
- foreign key is an attribute or combination of attributes in one table whose values must either match the primary key in another table or be null
- secondary key is an attribute or combination of attributes used strictly for data retrieval purposes
- natural keys are keys that are created from real-world entities (e.g., for a US resident, their SSN could be a natural key)
- surrogate keys are artificial, system-generated unique keys used when no suitable natural key exists
- secondary, or 'alternate' keys
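- a hedged sketch of several key types in one table definition (invented schema; DEPARTMENT is assumed to exist):
  CREATE TABLE STUDENT (
    STU_ID   INTEGER PRIMARY KEY,                       -- surrogate primary key
    STU_SSN  CHAR(9) UNIQUE NOT NULL,                   -- natural candidate key kept as an alternate/secondary key
    STU_NAME VARCHAR(40) NOT NULL,
    DEPT_ID  INTEGER REFERENCES DEPARTMENT(DEPT_ID)     -- foreign key
  );
  -- (STU_ID, STU_SSN) is a superkey but not a candidate key: it is not minimal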
NULL
- NOT NULL means placed on a column to ensure that every row in the table has a value for that column
- UNIQUE means restriction placed on a column to ensure that no duplicate values exist for that column
Relational Algebra
- theoretical way of manipulating table contents using relational operators
- relvar is a variable that holds a relation
- the heading contains the names of the attributes, and the body contains the tuples of the relation
- relational operators have the property of closure
- closure means that applying relational algebra operators to existing relations produces new relations
- select is a unary operator that yields a horizontal subset of a table
- project is a unary operator that yields a vertical subset of a table
- union combines all rows from two tables, excluding duplicate rows
- union-compatible means tables share the same number of columns, and their corresponding columns share compatible domains
- intersect yields only the rows that appear in both tables
- tables must be union-compatible to yield valid results
- difference yields all rows in one table that are not found in the other table
- tables must be union-compatible to yield valid results
- product yields all possible pairs of rows from two tables
- divide uses one 2-column table as the dividend and one single-column table as the divisor; it outputs a single column containing the values from the dividend's second column that are associated with every row in the divisor
- join allows information to be intelligently combined from two or more tables
- natural join links tables by selecting only the rows with shared values in their common attributes
- equijoin: Links tables based on an equality condition that compares specified columns of each table
- theta join is an extension of natural join, denoted by adding a theta subscript after the JOIN symbol
- inner join only returns matched records from the tables that are being joined
- outer join has matched pairs retained, and unmatched values in the other table are left null
- left outer join yields all of the rows in the first table, including those that do not have a matching value in the second table
- right outer join yields all of the rows in the second table, including those that do not have matching values in the first table
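- rough SQL equivalents of the operators above (T1 and T2 are placeholder, union-compatible tables; INTERSECT/EXCEPT keywords vary by vendor):
  SELECT * FROM T1 WHERE qty = 10;              -- SELECT (restrict): horizontal subset
  SELECT col1, col2 FROM T1;                    -- PROJECT: vertical subset
  SELECT * FROM T1 UNION     SELECT * FROM T2;  -- UNION (duplicates removed)
  SELECT * FROM T1 INTERSECT SELECT * FROM T2;  -- INTERSECT
  SELECT * FROM T1 EXCEPT    SELECT * FROM T2;  -- DIFFERENCE (MINUS in Oracle)
  SELECT * FROM T1 CROSS JOIN T2;               -- PRODUCT
  SELECT * FROM T1 NATURAL JOIN T2;             -- NATURAL JOIN
  SELECT * FROM T1 JOIN T2 ON T1.k = T2.k;      -- EQUIJOIN (an inner join on equality)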
Data Dictionary and the System Catalog
- data dictionary describes all tables in the DB created by the user and designer
- system catalog: system data dictionary that describes all objects within the DB
- avoid homonyms and synonyms
- homonym: the same name is used to label different attributes
- synonym: different names are used to describe the same attribute
Normalization Process
- ensures that all tables are in at least 3NF
- higher forms are not likely to be encountered in a business environment
- works one relation at a time
- starts by:
- identifying the dependencies of a relation (table)
- progressively breaking the relation into a new set of relations
Types of Functional Dependencies
- Partial dependency: Functional dependence in which the determinant is only part of the primary key
- Assumption: there is only one candidate key
- Straightforward and easy to identify
- Transitive dependency: A non-key attribute functionally depends on another non-key attribute rather than directly on the primary key
0NF → 1NF: eliminate repeating groups
- Repeating group: A group of multiple entries of the same type can exist for any single key attribute occurrence
- Its existence proves the presence of data redundancies
- Eliminating it enables reducing data redundancies
- Steps
- Eliminate the repeating groups
- Identify the primary key
- Identify all dependencies
- Create a dependency diagram showing relationships (dependencies) between the attributes - this will help us systematically normalize the table.
1NF Result
- All key attributes are defined
- There are no repeating groups in the table
- All attributes are dependent on the primary key
- All relational tables satisfy 1NF requirements
- Some tables contain partial dependencies
- Subject to data redundancies and various anomalies
1NF → 2NF: remove partial dependencies
- remove not-so-relevant things = remove attributes that depend on only part of the PK
- make new tables to eliminate partial dependencies = reassign corresponding dependent attributes
- the table is in 2NF when it:
- is in 1NF
- includes no partial dependencies
2NF → 3NF: remove transitive dependencies
- remove really, really not-so-relevant things = remove attributes that depend on other non-key (regular) columns instead of the PK
- We promote the non-key determinants masquerading as PKs into actual PKs (give them their own tables).
- Whether we eliminate partial dependencies (to create 2NF) or transitive ones (to create 3NF), we follow the same process: create a new relation for each problem dependency!
- The table is in 3NF when it:
- is in 2NF
- contains no transitive dependencies
Normalization: summary
- We do this because if we don't,
- data updates are less efficient because tables are larger
- indexing is more cumbersome
- no simple strategies for creating virtual tables known as views
- 1NF: eliminate repeating groups (partial dependencies: may remain, transitive: may remain)
- 2NF: eliminate redundant data (partial: none, transitive: may remain)
- 3NF: eliminate fields not dependent on key fields (partial: none, transitive: none)
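- a compact worked sketch (invented schema): in the 1NF table ASSIGNMENT(PROJ_NUM, EMP_NUM, PROJ_NAME, JOB_CLASS, JOB_CHG_HOUR, HOURS), PROJ_NAME depends only on PROJ_NUM (partial dependency, removed for 2NF) and JOB_CHG_HOUR depends on the non-key JOB_CLASS (transitive dependency, removed for 3NF); one new relation per problem dependency gives:
  CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME VARCHAR(40)
  );
  CREATE TABLE JOB (
    JOB_CLASS    VARCHAR(20) PRIMARY KEY,
    JOB_CHG_HOUR NUMERIC(7,2)
  );
  CREATE TABLE ASSIGNMENT (
    PROJ_NUM  INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM   INTEGER,
    JOB_CLASS VARCHAR(20) REFERENCES JOB(JOB_CLASS),
    HOURS     NUMERIC(5,1),
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
  );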
Common SQL Data Types
- Numeric (NUMBER(L,D) or NUMERIC(L,D))
- Character (CHAR(L), VARCHAR(L) or VARCHAR2(L))
- Date (DATE)
Primary Key and Foreign Key
- Primary key attributes contain both a NOT NULL and a UNIQUE specification
- RDBMS will automatically enforce referential integrity for foreign keys
- Command sequences end with semicolons
- ANSI SQL supports the ON DELETE and ON UPDATE clauses on foreign keys
- each can specify a CASCADE, SET NULL, or SET DEFAULT action
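- a short sketch of these clauses (invented tables; INVOICE is assumed to exist, and ON UPDATE support varies by DBMS):
  CREATE TABLE INVOICE_LINE (
    INV_NUM   INTEGER,
    LINE_NUM  INTEGER,
    PROD_CODE VARCHAR(10) NOT NULL,
    PRIMARY KEY (INV_NUM, LINE_NUM),
    FOREIGN KEY (INV_NUM) REFERENCES INVOICE(INV_NUM)
      ON DELETE CASCADE      -- deleting an invoice also deletes its lines
      ON UPDATE CASCADE      -- renumbering an invoice propagates to its lines
  );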
SQL Constraints
- NOT NULL: column does not accept nulls
- UNIQUE: all values in columns are unique
- DEFAULT: Assigns value to attribute when a new row is added to the table
- CHECK: Validates data when the attribute value is entered
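- all four constraints in one hedged example (invented PRODUCT table):
  CREATE TABLE PRODUCT (
    PROD_CODE  VARCHAR(10) PRIMARY KEY,           -- PK implies NOT NULL and UNIQUE
    PROD_DESC  VARCHAR(60) NOT NULL,
    PROD_SKU   CHAR(12)    UNIQUE,
    PROD_PRICE NUMERIC(8,2) DEFAULT 0.00,
    PROD_QOH   INTEGER     CHECK (PROD_QOH >= 0)  -- quantity on hand cannot be negative
  );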
Data Manipulation Commands
- INSERT: Command to insert data into the table
- Syntax - INSERT INTO tablename VALUES (value1, value2, ...);
- Used to add table rows with NULL and NOT NULL attributes
- COMMIT: Command to save changes
- Syntax - COMMIT [WORK];
- Ensures DB update integrity
- SELECT: Command to list the contents
- Syntax - SELECT columnlist FROM tablename;
- Wildcard character(*)
- UPDATE: Command to modify data
- Syntax - UPDATE tablename SET columnname = expression [, columnname = expression] [WHERE conditionlist];
- WHERE condition
- Specifies the rows to be selected
- ROLLBACK: Command to restore the DB
- Syntax - ROLLBACK;
- Undoes the changes since the last COMMIT
- DELETE: Command to delete
- Syntax - DELETE FROM tablename
- [WHERE conditionlist];
- BETWEEN
- Checks whether the attribute value is within a range
- IS NULL
- Checks whether the attribute value is null
- LIKE
- Checks whether attribute value matches given string pattern
- IN
- Checks whether the attribute value matches any value within a value list
- EXISTS
- Checks if the subquery returns any rows
- ALTER TABLE (Used to add/remove table constraints)
- ADD - Adds a column
- MODIFY - Changes column characteristics
- DROP - Deletes a column
- DROP TABLE: Deletes table from DB
- ORDER BY clause is useful when listing order is important
- SELECT columnlist FROM tablelist
- [WHERE conditionlist]
- [ORDER BY columnlist [ASC | DESC]];
- DISTINCT clause: Produces a list of unique values
- MAX, MIN, SUM, AVG
- Arithmetic operators perform in the order of:
- Operations within parentheses
- Power operations
- Multiplications and divisions
- Additions and subtractions
- VIEWs are virtual tables
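- the commands above combined into a small hedged example (hypothetical EMPLOYEE(EMP_NUM, EMP_NAME, EMP_DEPT, EMP_SALARY) table):
  INSERT INTO EMPLOYEE VALUES (101, 'Ada', 'IT', 72000);
  UPDATE EMPLOYEE SET EMP_SALARY = EMP_SALARY * 1.05 WHERE EMP_DEPT = 'IT';
  SELECT DISTINCT EMP_DEPT FROM EMPLOYEE;
  SELECT EMP_NAME, EMP_SALARY
    FROM EMPLOYEE
   WHERE EMP_SALARY BETWEEN 40000 AND 120000
   ORDER BY EMP_SALARY DESC;
  SELECT MIN(EMP_SALARY), MAX(EMP_SALARY), AVG(EMP_SALARY) FROM EMPLOYEE;
  CREATE VIEW HIGH_PAID AS
    SELECT EMP_NUM, EMP_NAME FROM EMPLOYEE WHERE EMP_SALARY > 100000;
  COMMIT;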
INNER JOIN and OUTER JOIN
- INNER JOIN returns only the rows that have matching values in both joined tables
- OUTER JOIN returns the matched rows plus the unmatched rows from one or both tables, with NULLs filling the missing side
- RIGHT OUTER JOIN keeps all rows from the right (second) table, including those with no match in the left table
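- hedged examples with hypothetical CUSTOMER and INVOICE tables sharing CUS_CODE:
  -- INNER JOIN: only customers that have at least one invoice
  SELECT C.CUS_CODE, C.CUS_NAME, I.INV_NUM
    FROM CUSTOMER C
    JOIN INVOICE  I ON C.CUS_CODE = I.CUS_CODE;
  -- LEFT OUTER JOIN: every customer; invoice columns are NULL when there is no match
  SELECT C.CUS_CODE, C.CUS_NAME, I.INV_NUM
    FROM CUSTOMER C
    LEFT OUTER JOIN INVOICE I ON C.CUS_CODE = I.CUS_CODE;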
Relational Set Operators
- SQL data manipulation commands are set-oriented
- set-oriented: Operate over entire sets of rows and columns at once
- UNION, INTERSECT, and EXCEPT (MINUS) work properly when relations are union-compatible
- Union-compatible: The number of attributes is the same, and their corresponding data types are alike
- UNION
- Combines rows from two or more queries without including duplicate rows
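- for example (hypothetical CUSTOMER and CUSTOMER_2 tables with the same structure):
  SELECT CUS_NAME FROM CUSTOMER
  UNION
  SELECT CUS_NAME FROM CUSTOMER_2;   -- INTERSECT and EXCEPT/MINUS follow the same pattern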
Persistent Stored Module
- block of code containing standard SQL statements and procedural extensions that are stored and executed at the DBMS server
Triggers
- Procedural SQL code is automatically invoked by RDBMS when a given data manipulation event occurs.
Stored Procedures
- Named collection of procedural and SQL statements
- Advantages:
- Reduce network traffic and increase performance
- Reduce code duplication using code isolation and code sharing
Stored Function
- Named group of procedural and SQL statements that returns a value
- As indicated by a RETURN statement in its program code
- Can be invoked only from within stored procedures or triggers
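- a hedged Oracle-PL/SQL-style sketch of a trigger, a procedure, and a function (all names and the PRODUCT columns are invented; exact syntax differs across DBMSs):
  CREATE OR REPLACE TRIGGER TRG_PROD_REORDER
    BEFORE UPDATE OF PROD_QOH ON PRODUCT
    FOR EACH ROW
  BEGIN
    IF :NEW.PROD_QOH < :NEW.PROD_MIN THEN
      :NEW.PROD_REORDER := 1;        -- flag the row for reordering
    END IF;
  END;
  /
  CREATE OR REPLACE PROCEDURE PRC_RAISE_PRICE (P_PCT IN NUMBER) AS
  BEGIN
    UPDATE PRODUCT SET PROD_PRICE = PROD_PRICE * (1 + P_PCT / 100);
  END;
  /
  CREATE OR REPLACE FUNCTION FN_LINE_TOTAL (P_PRICE NUMBER, P_QTY NUMBER)
    RETURN NUMBER AS
  BEGIN
    RETURN P_PRICE * P_QTY;          -- value handed back via RETURN, as noted above
  END;
  /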
Transaction
- A logical unit of work that must be entirely completed or aborted
ACID(S)
- Atomicity: All operations of a transaction must be completed. If not, the transaction is aborted.
- Consistency: The permanence of the DB's consistent state
- Isolation: Data used during a transaction cannot be used by the second transaction until the first is completed
- Durability: Ensures that once transactions are committed, they cannot be undone or lost
- Serializability: Ensures that the schedule for the concurrent execution of several transactions should yield consistent results
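- a small sketch of atomicity and durability (hypothetical ACCOUNT table): both updates take effect together, or neither does:
  -- transfer 100 from account 10 to account 20 as one logical unit of work
  UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACCT_NUM = 10;
  UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACCT_NUM = 20;
  COMMIT;      -- makes both changes permanent (durable)
  -- if anything fails before COMMIT, ROLLBACK restores the prior consistent state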
Concurrency Control
- Coordination of the execution of the simultaneous transaction in a multiuser DB system
- Ensures serializability of transactions in a multiuser DB environment
- Lost updates occur in two concurrent transactions when the same data element is updated, but one of the updates is lost
- Uncommitted data occurs when two transactions are executed concurrently. The first transaction is rolled back after the second transaction has already accessed uncommitted data
- Inconsistent retrievals happen when a transaction accesses data before and after one or more other transactions finish working with such data
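- a sketch of a lost update with two interleaved sessions (hypothetical PRODUCT row whose PROD_QOH starts at 35):
  -- T1: SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = 'X';              -- reads 35
  -- T2: SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = 'X';              -- also reads 35
  -- T1: UPDATE PRODUCT SET PROD_QOH = 135 WHERE PROD_CODE = 'X'; COMMIT; -- 35 + 100
  -- T2: UPDATE PRODUCT SET PROD_QOH = 5   WHERE PROD_CODE = 'X'; COMMIT; -- 35 - 30, T1's update is lost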
Scheduler
- establishes the order in which the operations are executed within concurrent transactions
- interleaves the execution of DB operations to ensure serializability and isolation of transactions
- relies on concurrency control algorithms to determine the appropriate order
- creates a serializable schedule
- serializable schedule: interleaved execution of transactions yields the same results as the serial execution of the transactions
- Lock guarantees exclusive use of a data item to a current transaction
- pessimistic locking: the use of locks based on the assumption that conflict between transactions is likely
- lock manager is responsible for assigning and policing the locks used by the transactions
Lock Granularity
- DB-level lock
- Table-level lock
- Page-level lock
- Page or diskpage: Directly addressable section of a disk
- Row-level lock
- Field-level lock
Lock Types
- Binary Lock has two states: locked (1) and unlocked(0)
- Exclusive Lock exists when access is reserved for the transaction that locked the object
- A Shared Lock exists when concurrent transactions are granted read access based on a common lock
Three Lock States
- Using the shared/exclusive concept, there are THREE lock states
- unlocked
- shared (read) issued when a transaction wants to READ data, and no exclusive lock is held (on a data item)
- exclusive (write) issued when a transaction seeks to WRITE data, and no lock is held (on a data item)
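- how these locks commonly surface in SQL (a sketch; explicit lock syntax is DBMS-specific):
  SELECT BALANCE FROM ACCOUNT WHERE ACCT_NUM = 10 FOR UPDATE;  -- row-level exclusive (write) lock
  LOCK TABLE ACCOUNT IN SHARE MODE;                            -- table-level shared (read) lock (Oracle-style)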
Two-Phase Locking (2PL)
- Defines how transactions acquire and relinquish locks
- Guarantees serializability but does not prevent deadlocks
- Phases
- Growing phase - Transaction acquires all required locks without unlocking any data
- Shrinking phase - Transaction releases all locks and cannot obtain any new locks.
- Two transactions cannot have conflicting locks
- No unlock operation can precede a lock operation in the same transaction
- No data are affected until all locks are obtained
Deadlocks (Deadly Embrace)
- Occurs when two transactions wait indefinitely for each other to unlock data
- Deadlock prevention. A transaction requesting a new lock is aborted when there is the possibility that a deadlock can occur. If the transaction is aborted, all changes made by this transaction are rolled back, and all locks obtained by the transaction are released. The transaction is then rescheduled for execution. Deadlock prevention works because it avoids the conditions that lead to deadlocking.
- Deadlock detection. The DBMS periodically tests the DB for deadlocks. If a deadlock is found, the "victim" transaction is aborted (rolled back and restarted), and the other transaction continues.
- Deadlock avoidance. The transaction must obtain all of the locks it needs before it can be executed. This technique avoids the rolling back of conflicting transactions by requiring that locks be obtained in succession. However, the serial lock assignment required in deadlock avoidance increases action response times.
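- a sketch of two sessions deadlocking by locking the same two rows in opposite order (hypothetical ACCOUNT table):
  -- T1: UPDATE ACCOUNT SET BALANCE = BALANCE - 10 WHERE ACCT_NUM = 1;  -- locks row 1
  -- T2: UPDATE ACCOUNT SET BALANCE = BALANCE - 20 WHERE ACCT_NUM = 2;  -- locks row 2
  -- T1: UPDATE ACCOUNT SET BALANCE = BALANCE + 10 WHERE ACCT_NUM = 2;  -- waits for T2
  -- T2: UPDATE ACCOUNT SET BALANCE = BALANCE + 20 WHERE ACCT_NUM = 1;  -- waits for T1 -> deadlock; the DBMS aborts a victim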
Time Stamping
- Assigns global, unique time stamp to each transaction
- Produces explicit order in which transactions are submitted to DBMS
- Uniqueness: Ensures no equal time stamp values exist
- Monotonicity: Ensures time stamp values always increase
Phases of Optimistic Approach
- Read
- Transaction:
- Reads the DB
- Executes the needed computations
- Makes the updates to a private copy of the DB values
- Validation
- The transaction is validated to ensure that the changes made will not affect the integrity and consistency of the DB.
- Write
- Changes are permanently applied to the DB.
DDBMS
- Distributed DB management system (DDBMS): Governs storage and processing of logically related data over interconnected computer systems
- goods. Data are located near the greatest demand site. Faster data access and processing. Growth facilitation. Improved communications. Reduced operating costs. User-friendly interface. Less danger of a single-point failure. Processor independence
- bads. Complexity of management and control. Technological difficulty. Security. Lack of standards. Increased storage and infrastructure requirements. Increased training cost. Costs incurred due to the requirement of duplicated infra. Remote access is provided on a read-only basis. Restrictions on the number of remote tables accessed in a single transaction. Restrictions on the number of distinct DBs that may be accessed. Restrictions on the DB model that may be accessed. Concurrency control is important in a distributed DB environment
DDBMS Components
- Computer workstations or remote devices
- Network hardware and software components
- Communications media
- Transaction processor (TP): Software component of a system that requests data
- Also known as the transaction manager (TM) or application processor (AP)
- Data processor (DP) or data manager (DM)
- Software component on a system that stores and retrieves data from its location
Data & Processing Distribution: 3 variations
- Single-Site Processing, Single-Site Data
- Processing is done on a single host computer
- Data stored on the host computer's local disk
- Processing is restricted on the end user's side
- Dumb terminals access DBMS
- Multiple-Site Processing, Single-Site Data
- Multiple processes run on different computers, sharing a single data repository.
- Require network file server running conventional applications
- Accessed through LAN
- Client/server architecture (reduces network traffic, distributes processing, supports data at multiple sites)
- Multiple-Site Processing, Multiple-Site Data
- Fully distributed DB management system
- Support multiple data processors and transaction processors at various sites.
- Classification of DDBMS depending on the level of support for various types of DBs
- Homogeneous: Integrate multiple instances of the same DBMS over a network
- Heterogeneous: Integrate different types of DBMSs
- Fully heterogeneous: Support different DBMSs, each supporting a different data model
Two-Phase Commit Protocol (2PC)
- Phase 1 - Prepare Phase (Voting Phase):
- The coordinator (a designated process or system managing the transaction) sends a prepare message to all participants asking them if they can commit the transaction.
- Each participant executes the transaction up to the point where it would commit and locks the needed resources to ensure data integrity, but it does not make the changes permanent yet.
- Participants respond with a vote: "Yes" if they can commit (indicating they have successfully prepared and locked the resources without any issues) or "No" if they cannot (due to a failure or conflict).
- Phase 2 - Commit Phase (Decision Phase):
- If all participants vote "Yes": The coordinator sends a commit message to all participants. Each participant makes the transaction permanent (commits the changes) and releases any locked resources. Participants send an acknowledgment to the coordinator after committing.
- If any participant votes "No": The coordinator sends an abort message to all participants. Each participant undoes any changes if necessary (rolls back the transaction) and releases any locked resources. Participants send an acknowledgment to the coordinator after aborting.
Distribution Transparency
- Fragmentation transparency: The end user does not know the data is fragmented.
- Location transparency: The end user does not know where fragments are located.
- Local mapping transparency: The end user must specify both the fragment names and their locations (the lowest level of distribution transparency).
Transaction Transparency
- Ensures DB transactions will maintain the distributed DB's integrity and consistency
- Ensures transaction is completed only when all DB sites involved complete their part
- Distributed DB systems require complex mechanisms to manage transactions
Performance and Failure Transparency
- Performance transparency: Allows a DDBMS to perform as if it were a centralized DB
- Failure transparency: Ensures the system will operate in case of network failure
- Replica transparency: DDBMS's ability to hide multiple copies of data from the user
Distributed DB Design
- Data fragmentation: How to partition the DB into fragments (see the sketch after this list)
- Breaks a single object into many segments
- Information is stored in a distributed data catalog (DDC)
- Horizontal fragmentation: Division of a relation into subsets (fragments) of tuples (rows)
- Vertical fragmentation: Division of a relation into attribute (column) subsets
- Mixed fragmentation: Combination of horizontal and vertical strategies
- Data replication: Which fragments to replicate
- Data copies stored at multiple sites served by a computer network
- Mutual consistency rule: Replicated data fragments should be identical
- Helps restore lost data
- Data allocation: Where to locate those fragments and replicas
- Centralized data allocation (Entire DB stored at one site)
- Partitioned data allocation (DB is divided into two or more disjoined fragments and stored at two or more sites)
- Replicated data allocation (Copies of one or more DB fragments are stored at several sites)
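- a hedged sketch of horizontal vs. vertical fragmentation for a hypothetical CUSTOMER table, built here as plain tables (a real DDBMS has its own fragmentation facilities):
  -- horizontal fragment: the rows for one region, stored at that region's site
  CREATE TABLE CUSTOMER_WEST AS
    SELECT * FROM CUSTOMER WHERE CUS_REGION = 'WEST';
  -- vertical fragment: a column subset (plus the key), stored where it is used most
  CREATE TABLE CUSTOMER_CREDIT AS
    SELECT CUS_CODE, CUS_LIMIT, CUS_BALANCE FROM CUSTOMER;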
CAP Theorem
- Consistency: always correct data.
- Availability: requests are always filled.
- Partition ("outage") tolerance: continue to operate even if (some/most) nodes fail.
- In today's "BASE" (Basically Available, Soft_state, Eventually Consistent) model of non-relational (e.g., NoSQL) DBs, we prefer to sacrifice consistency in favor of availability. Data changes are not immediate but propagate slowly through the system until all replicas are consistent.