목차
Database Systems
Data != information
Database (DB) and Database Management System (DBMS)
Advantages of DBMS
Types of DBs based on user count, location, content, data currency, and structure
Evolution of Database Systems
Early DBs: file systems
Problems with file systems
Structural dependence
Redundancy of data
Data anomalies
Database Systems
DB vs file system
DBMS functions
Data dictionary management
Data storage management
Data transformation and presentation
Security management
Multiuser access control
Backup and recovery management
Data integrity management
Interfaces and communication
Key Concepts
Data vs. Information
Data: raw facts, building blocks of information
Information: produced by processing data, reveals meaning, enables decision making
Database: shared, integrated computer structure that stores end-user data and metadata
DBMS: collection of programs that manages the database structure and controls access to data
Advantages of DBMS: better data integration, increased productivity, improved data sharing, security, access, decision making, and quality
Types of Databases
Based on user count: single-user, multiuser (workgroup, enterprise)
Based on location: centralized, distributed, cloud
Based on content: general-purpose, discipline-specific
Based on data currency: operational, analytical (data warehouse, OLAP)
Based on structure: unstructured, structured, semi-structured (XML)
File system problems
Structural dependence, data dependence
Data redundancy, data anomalies (update, insertion, deletion)
DBMS functions: data dictionary, storage, transformation, security, multiuser access, backup and recovery, integrity management, interfaces and communication
Disadvantages of DB systems: increased costs, management complexity, maintaining currency, vendor dependence, frequent upgrade/replacement cycles
Data Modeling
Data modeling: iterative process of creating a specific data model for a problem domain
Data models: simple representations of complex real-world data structures
Importance of data models: communication tool, overall view of the database
Data Model Building Blocks
Entity: a unique and distinct object used to collect and store data
Attribute: characteristic of an entity
Relationship: describes an association among entities (1:M, M:N, 1:1)
Constraint: set of rules to ensure data integrity
Business Rules
Brief, precise, and unambiguous description of a policy, procedure, or principle
Enable defining basic building blocks and describing characteristics of the data
Sources: company policy, department managers, written documentation, interviews
Help standardize the company's view of data and serve as a communication tool
Translate into data model components (nouns → entities, verbs → relationships)
Naming Conventions
Entity names: descriptive of objects in the business environment, familiar to users
Attribute names: descriptive of the data represented
Proper naming facilitates communication and promotes self-documentation
Hierarchical and Network Models
Hierarchical: manages large amounts of data, represents 1:M relationships
Network: represents complex data relationships, depicts 1:M and M:N relationships
Relational Model
Based on a relation (table) composed of tuples (rows) and attributes (columns)
Advantages: structural independence, conceptual simplicity, ad hoc query capability
Relational Database Management System (RDBMS): performs basic functions, hides complexities
Entity-Relationship (ER) Model
Graphical representation of entities and their relationships in a database structure
Entity Relationship Diagram (ERD): uses graphic representations to model components
Advantages: visual modeling yields conceptual simplicity, effective communication tool
Disadvantages: limited constraint and relationship representation, no data manipulation language
Data Abstraction Levels
External Model: end users' view of the data environment, represented by ER diagrams
Conceptual Model: global view of the entire database, software and hardware independent
Internal Model: database as seen by the DBMS, software-dependent and hardware-independent
Physical Model: describes how data are saved on storage media, requires definition of physical storage and access methods
Entity Relationship Model (ERM)
Basis of an entity relationship diagram (ERD)
ERD depicts conceptual database as viewed by end users, including entities, attributes, and relationships
Attributes
Characteristics of entities
Required attribute: must have a value
Optional attribute: can be left empty
Domain: set of possible values for an attribute
Identifiers: one or more attributes that uniquely identify each entity instance
Composite identifier: primary key composed of more than one attribute
Attribute types
Composite attribute: can be subdivided into additional attributes
Simple attribute: cannot be subdivided
Single-valued attribute: has only a single value
Multivalued attribute: has many values, may require creating new attributes or entities
Derived attribute: value calculated from other attributes
Relationships
Association between entities that operates in both directions
Participants: entities that participate in a relationship
Connectivity: describes the relationship classification
Cardinality: expresses the minimum and maximum number of entity occurrences associated with one occurrence of a related entity
Existence dependence
Existence dependent: entity exists only when associated with another related entity occurrence
Existence independent: entity exists apart from related entities (strong or regular entity)
Relationship strength
Weak (non-identifying) relationship: primary key of related entity does not contain primary key component of parent entity
Strong (identifying) relationship: primary key of related entity contains primary key component of parent entity
Weak entity: existence-dependent and has a primary key partially or totally derived from parent entity
Relationship participation
Optional participation: entity occurrence does not require corresponding entity occurrence in a relationship
Mandatory participation: entity occurrence requires corresponding entity occurrence in a relationship
Relationship degree: number of entities or participants associated with a relationship
Unary relationship: association maintained within a single entity
Recursive relationship: relationship exists between occurrences of the same entity set
Binary relationship: two entities are associated
Ternary relationship: three entities are associated
Associative Entities
Also known as composite or bridge entities
Represent M:N relationships between two or more entities
In a 1:M relationship with parent entities
Composed of primary key attributes of each parent entity
May contain additional attributes that play no role in the connective process
Developing an ER Diagram
Create a detailed narrative of the organization's description of operations
Identify business rules based on the descriptions
Identify main entities and relationships from the business rules
Develop the initial ERD
Identify attributes and primary keys that adequately describe entities
Revise and review ERD
Extended Entity Relationship Model (EERM)
Result of adding more semantic constructs to the original entity relationship (ER) model
EER diagram (EERD): uses the EER model
Entity Supertypes and Subtypes
Entity supertype: generic entity type related to one or more entity subtypes, contains common characteristics
Entity subtype: contains unique characteristics of each entity subtype
Criteria for usage:
Different, identifiable kinds of the entity in the user's environment
Different kinds of instances should have one or more unique attributes
Specialization Hierarchy
Depicts arrangement of higher-level entity supertypes and lower-level entity subtypes
Relationships described in terms of "is-a" relationships
Subtype exists within the context of a supertype
Every subtype has one directly related supertype, while a supertype can have many subtypes
Provides means to support attribute inheritance, define subtype discriminator, and define disjoint/overlapping and complete/partial constraints
Inheritance
Enables an entity subtype to inherit attributes and relationships of the supertype
All entity subtypes inherit primary key attribute from their supertype
At implementation level, supertype and subtype(s) maintain a 1:1 relationship
Entity subtypes inherit all relationships in which supertype entity participates
Lower-level subtypes inherit all attributes and relationships from upper-level supertypes
Subtype Discriminator
Attribute in the supertype entity that determines to which entity subtype the supertype occurrence is related
Default comparison condition is the equality comparison
Disjoint and Overlapping Constraints
Disjoint subtypes: contain a unique subset of the supertype entity set (nonoverlapping subtypes)
Implementation based on the value of the subtype discriminator attribute in the supertype
Overlapping subtypes: contain nonunique subsets of the supertype entity set
Implementation requires the use of one discriminator attribute for each subtype
Completeness Constraint
Specifies whether each supertype occurrence must also be a member of at least one subtype
Types:
Partial completeness: not every supertype occurrence is a member of a subtype
Total completeness: every supertype occurrence must be a member of any subtype
Relational Modeling
Relational Tables
Logical view: 'relation'
Key concepts: keys, determinants, dependents, dependencies
Keys
Primary key (PK): attribute(s) that uniquely identify a row
Composite key: key composed of multiple attributes
Entity integrity: each row has a unique identity, no null values in PK
Types of keys: superkey, candidate key, primary key, foreign key, secondary key
Superkey: attribute(s) that uniquely identify a row
Candidate key: irreducible superkey
Foreign key: attribute(s) in one table that match the PK in another table
Secondary key: attribute(s) used for data retrieval purposes
Functional Dependency
Value of one or more attributes determines the value of other attributes
Determinant: attribute whose value determines another
Dependent: attribute whose value is determined by another attribute
Full functional dependence: entire collection of attributes in the determinant is necessary for the relationship
Nulls and Referential Integrity
Null: absence of a data value
Referential integrity: every reference to an entity instance by another entity instance is valid
Handling nulls: flags, NOT NULL constraint, UNIQUE constraint
Relational Algebra
Theoretical way of manipulating table contents using relational operators
Relvar: variable that holds a relation
Closure: use of relational algebra operators on existing relations produces new relations
Relational set operators:
SELECT: outputs a subset of rows
PROJECT: outputs a subset of columns
UNION: combines all rows from two tables, excluding duplicates
INTERSECT: yields only rows that appear in both tables
DIFFERENCE: yields all rows in one table that are not found in the other
PRODUCT: yields all possible pairs of rows from two tables
Joins
Combine information from two or more tables
Types of joins:
Natural join: links tables by selecting rows with common values in common attributes
Equijoin: links tables based on an equality condition comparing specified columns
Theta join: extension of natural join with a theta subscript
Inner join: returns only matched records from joined tables
Outer join: retains matched pairs and leaves unmatched values as null
Left outer join: yields all rows in the first table, including unmatched ones
Right outer join: yields all rows in the second table, including unmatched ones
Data Dictionary and System Catalog
Data dictionary: description of all user-created tables in the database
System catalog: system data dictionary describing all objects within the database
Homonyms (same name for different attributes) and synonyms (different names for the same attribute) should be avoided
Database Normalization
Normalization: evaluating and correcting table structures to minimize data redundancies
Reduces data anomalies and assigns attributes to tables based on determination
Normal forms: 1NF, 2NF, 3NF, BCNF, 4NF
Higher normal forms are better than lower normal forms
Denormalization: produces a lower normal form, increases performance, and introduces greater data redundancy
The Need for Normalization
Used while designing a new database structure or improving an existing one
Analyzes relationships among attributes within each entity
Improves the existing data structure and creates an appropriate database design
The Normalization Process
Objective: ensure each table conforms to well-formed relations
Each table represents a single subject
No data item is unnecessarily stored in more than one table
All nonprime attributes are dependent on the primary key
Each table is void of insertion, update, and deletion anomalies
Ensures all tables are in at least 3NF
Works one relation at a time by identifying dependencies and progressively breaking the relation into new relations
Functional Dependence Concepts
Functional dependence: attribute B is fully functionally dependent on attribute A if each value of A determines one and only one value of B
Fully functional dependence (composite key): attribute B is fully functionally dependent on a composite key A if it is functionally dependent on A but not on any subset of A
Types of Functional Dependencies
Partial dependency: functional dependence in which the determinant is only part of the primary key
Transitive dependency: an attribute functionally depends on another nonkey attribute
First Normal Form (1NF):
Eliminate repeating groups, identify the primary key, and identify all dependencies
All relational tables satisfy 1NF requirements
Second Normal Form (2NF):
Make new tables to eliminate partial dependencies and reassign corresponding dependent attributes
Table is in 2NF when it is in 1NF and includes no partial dependencies
Third Normal Form (3NF):
Make new tables to eliminate transitive dependencies and reassign corresponding dependent attributes
Table is in 3NF when it is in 2NF and contains no transitive dependencies
Requirements for a Good Normalized Set of Tables
Evaluate PK assignments and naming conventions
Refine attribute atomicity (atomic attribute: cannot be further subdivided)
Identify new attributes and relationships
Refine primary keys as required for data granularity (level of detail represented by the values stored in a table's row)
Maintain historical accuracy and evaluate using derived attributes
Denormalization
Design goals: creation of normalized relations and processing requirements and speed
Joining a larger number of tables reduces system speed
Defects in unnormalized tables:
Data updates are less efficient because tables are larger
Indexing is more cumbersome
No simple strategies for creating virtual tables known as views
Structured Query Language (SQL)
SQL is a nonprocedural language with a basic command vocabulary set of less than 100 words
Differences in SQL dialects are minor
SQL is an implementation of Ed Codd's relational set operators (SELECT, PROJECT, JOIN, UNION, INTERSECT, DIFFERENCE, PRODUCT, DIVIDE)
SQL Data Types
Numeric: NUMBER(L,D) or NUMERIC(L,D)
Character: CHAR(L), VARCHAR(L) or VARCHAR2(L)
Date: DATE
Data Definition Language (DDL) Commands
CREATE SCHEMA, CREATE TABLE, NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, DEFAULT, CHECK, CREATE INDEX, CREATE VIEW, ALTER TABLE, CREATE TABLE AS, DROP TABLE, DROP INDEX, DROP VIEW
Data Manipulation Language (DML) Commands
INSERT, SELECT, COMMIT, UPDATE, ROLLBACK, DELETE
Comparison operators, logical operators, and special operators (BETWEEN, IS NULL, LIKE, IN, EXISTS)
SELECT with ORDER BY, DISTINCT, aggregate functions (MIN, MAX, SUM, AVG), and GROUP BY
Creating Table Structures
Use one line per column definition and spaces to line up attribute characteristics and constraints
Table and attribute names are capitalized
Syntax: CREATE TABLE tablename(...);
Primary Key and Foreign Key
Primary key attributes contain both NOT NULL and UNIQUE specifications
RDBMS enforces referential integrity for foreign keys
Use ON DELETE and ON UPDATE clauses to specify actions on foreign key changes
SQL Constraints
NOT NULL, UNIQUE, DEFAULT, CHECK
Data Manipulation Commands
INSERT: add rows to a table
SELECT: list table contents
UPDATE: modify data
DELETE: delete rows from a table
COMMIT: save changes
ROLLBACK: restore the database
Additional SELECT Query Keywords
ORDER BY: specify listing order
DISTINCT: produce list of unique values
Aggregate functions: MIN, MAX, SUM, AVG
GROUP BY: create frequency distributions
HAVING: extension of GROUP BY, applied to GROUP BY output
Advanced Data Definition Commands
ALTER TABLE: make changes to table structure (ADD, MODIFY, DROP)
DROP TABLE: delete a table from the database
Joining Database Tables
Retrieve data from multiple tables using equality comparisons between foreign and primary keys
Use table aliases to identify source tables
Recursive joins: join a table to itself using aliases
Advanced SQL
SQL Join Operators
Join types: natural join, join using, join on, outer join (left, right, full)
Relational join operation merges rows from two tables based on common values, equality/inequality, or outer join conditions
Subquery: query inside another query, can return single value, list of values, virtual table, or no value
WHERE subqueries: uses inner SELECT subquery on the right side of a WHERE comparison expression
IN and HAVING subqueries: compare a single attribute to a list of values or restrict the output of a GROUP BY query
FROM subqueries: uses SELECT subquery in the FROM clause to create a virtual table
Attribute list subqueries: uses subquery expressions in the SELECT attribute list
Correlated subquery: executes once for each row in the outer query, inner query references a column of the outer subquery
Multirow Subquery Operators