Need a database?
Think about designing first to achieve the desired model.
Avoid the black box syndrome. Black box syndrome is just the idea of thinking that databases are software used to save and retrieve data. It is “kind of” true in the sense that it does that. However, there is a phase in the movie Ender’s Game, that touches me every time I see it. “The way we win matters”.
Using the database to do what it is supposed to do, using all its advantages and features, will make development much easier, systems much faster and clients much happier.
It all starts here! This is a process that should take some time since it will require understanding of the request as well as the business and its rules and of the database server as well.
What is data modeling?
Data modeling is the process of creating a specific data model for a determined problem domain! It is the first step when designing the database and the end goal is to make objects inside the database that resemble real-world environments.
Let’s think of an organization and how entities correlate to each other in that organization. For example, an employee. In basic terms, an employee is an entity and has a set of attributes like name, age, and so forth.
This same employee works inside of another entity, called department. The department, also, has a name, purpose, and so forth. When modeling, we, as designers, need to find the attributes that belong to those entities and that are of interest to the organization. After that, we must find the relationship between those entities.
Data modeling should be used, to abstract out the complexities within real-world environments and simple terms make the organization easier to understand in a virtual world.
The data model process should be interactive and progressive. If done correctly, the future objects inside the database will resemble almost perfectly the organization and the rules, also, determined by the organization.
What is a business rule?
When starting the process of data modeling, the designer should have in mind the procedures, principles, and policies pertaining to that organization. These are what we call business rules. Every organization has a way to pass that information on to development.
It may be through the project manager, analyst, developers. It will really depend on the size of the company and how well structured it is.
These rules must be well understood and implemented to avoid data anomalies. Nowadays, database systems are the main virtual infrastructure of any organization. If these rules and structures are not applied correctly, anything to be built on top of it will suffer eventually.
Here are some rule’s examples.
- For an entity Employee, what is the minimum age for an employee?
- What are the qualifications the employee needs to be part of a department?
Through these rules we can identify constraints and relationships.
What is a constraint?
A constraint can be any type of restriction imposed on the attribute of an entity. In the example above, we asked for employee age.
What is a relationship?
Relationship determines the connection between two or more entities. An example would be an employee who belongs to a department?
What is Data Definition Language (DDL)?
This is the language that allows database professionals to define and create objects that will represent the data model inside the database. In SQL Server, these are the commands used in DDL
- CREATE
- ALTER
- DROP
- TRUNCATE
- COLLATIONS
- DISABLE TRIGGER
- ENABLE TRIGGER
- UPDATE STATISTICS
We will dive deep into the first three, since they are the ones mostly used day to day. We will also talk about TRUNCATE.
In SQL SERVER, the majority, if not all, structural objects can be created in in two different ways. It can be done manually using SQL Server Management Studio (SSMS) or it can be done by using SQL scripts, using the command line, SSMS or other tools out there.
SSMS is an extremely amazing tool to be used and in the real world you will see that being used to develop in SQL Server. SSMS is a free tool and should be used above any other unless there are a very good reason to not use it.
The power in any development is in the scripting. Scripting save us, developers, the time to have to redo it. It is important to know how to do something manually for those one-off tasks, but scripting is mandatory in the database world.
We will demonstrate both, as much as we can, manual and scripting.
Before we can create a database model, we need a database object to work on.
When we first Login to the database through SSMS, this is what we see.
To manually create our first database, we just need to go to right click Databases -> New Database
After that, we need to give the name of the database and click OK.
There are some configurations as you can see in the Database files area. As of now, we will not worry about that. This type of information should also be thought through based on location where the database files will live, how much storage amount, among other things that a database administrator must figure it out.
Well, that was simple, right?
How do we do that scripting?
We can create a SQL file, using a text editor, and run it using the command line to create it for us, or we can use SSMS to create the file and run it strait in the database through SSMS.
We will use SSMS for all demonstrations.
USE master
GO
CREATE DATABASE Basics
ON
(
NAME = N'Basic',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\Basics.mdf',
SIZE = 64MB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 64MB)
LOG ON
(
NAME = N'Basic_LOG',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\Basics_Log.ldf',
SIZE = 512MB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 64MB)
There are many different arguments that can be set up when creating a database. I only used few of them, but if you want a full list, you visit this site. Full list of arguments to create a database.
You will see two file extensions here .mdf and .ldf.
The extension .mdf is the database file where the main data information will be stored.
The extension .ldf as the name says it will have log information.
CREATE, CREATE, CREATE.
This is probably the most used DDL command when modeling a database structure. We will see that statement thought the course for any new object. We will start with tables, so that, we can build something useful and talk about it.
What are tables in databases?
Tables are objects that hold all the database data. They are two-dimensional in a row-column format.
In SQL Server, each table can have up to 1,924 columns and “unlimited” rows. Unlimited is based on how much storage is available in the server.
They can also be called relations. Relations can be used interchangeably since the connection between these tables are done by relations. These relations are present by attributes in different columns that relate to each other. Foreign keys make that happens.
They can also be called entities. They should represent entities, or “objects’ inside an organization.
There can be regular, temporary, and variable tables.
Regular tables are tables that will live in the database up until it is purposely dropped.
Temporary tables are tables that lives only during a session.
Variable tables are tables that lives during an application execution.
We will focus on regular!
We can also classify them as Heap or clustered tables.
Heap tables are tables without a clustered index.
Clustered tables have one and only one clustered index.
When a table has a clustered index, the order of the data inside the table will be based on the attribute of the clustered index. Since the order of that table can only be established once based on the column, we can only have one clustered index.
When designing a database structure, there are some conceptual table types, that help us better model the data.
Conceptual types
- Core Tables
- Lookup Tables
- List Tables
- Detail Tables
- Cross-Reference Tables
Core Tables
These tables are the core of the business, in other words, the business will not function without them. To identify them, you must think about what the business needs in a day-to-day operation. For example, hospitals will have patients coming every day and all day long. So, Patient would be a core table. Same with doctors, address, prescriptions, and so forth.
One good way to identify them is to see how often rows will need to be inserted, updated, or deleted. If the answer is all day and every day. There is a good chance that this object will be a core table.
Some characteristics that should be implemented in a core table are an auto-numbered surrogate key, many different columns, and few non-clustered indexes. Business rules should be applied as much as possible in these tables.
Lookup Tables
This table will have very few columns and very few rows. The surrogate key should not be auto-numbered but should be a numeric type. Since it will hold very few records, the primary key should be of type TINYINT.
This table is used for classification purposes.
Examples are for categories, types, tags, etc.
This table comes with a set of records that will not be added, updated, or deleted any time soon. Examples are Gender. There will be only two or three records. They could be Male, Female, or Unknown.
List Table
This table can be a mix of a lookup table and a core table. In fact, some designers ignore this table and just call them core tables. They are similar to lookup tables because they will have a few columns. They are similar to the core table because records can always be inserted, updated, or deleted.
For that reason, it should have an auto-numbered surrogate key. It can have a few non-clustered indexes, but it should have only 1 for the main attribute.
One good example of this type of table is a Phone table. The phone number is likely to be the main column and that column should have a non-clustered index.
Detail Table
This table will hold specifics of the Core table. It should have only one reference per core table. This table should also have attributes that are not always needed to be retrieved in a core table. It can have an auto-numbered surrogate key; however, I prefer to use the same key from the core table.
Cross-Reference Table
This table holds columns from 2 or more tables. It can have an auto-numbered surrogate key. This table will eventually be huge vertically.
An example would be a table called Order. This table would have PersonId from a core table and ProductID from another core table.
Creating a table
Creating a table manually
Once connected to SSMS click on the + sign of the Server
Click on the + sign of the database Basics
Click on the + sign of the Tables
Notice that we will see few folders and they are all empty. The table we will create will be showing inside the Tables folder but outside of the other folders.
Right click on Tables->New-Table
We will create only one column. however, I urge you to play a little here with datatypes and column properties.
Here is an example of what was created.
Navigate around and click on places where you may be curios to see what happens. When scripting this, we will talk more about what is happening.
Click on the save button and give a name to the table.
Refresh the server and we will see the new table created.
To DROP this table, we just need to right-click on the table and click the delete button.
Again, we may never use this because as you can see it can be very time consuming when creating hundreds of tables. The best approach is to do it using a SQL script with DDL commands.
Creating a table with scripts
USE Basics
GO
CREATE TABLE Person
( PersonId INT IDENTITY (1, 1)
, PersonFirstName NVARCHAR(40) NOT NULL
, PersonMiddleName NVARCHAR (100) NOT NULL
, PersonLastName NVARCHAR(50) NOT NULL
, PersonCreatedBy INT NOT NULL
, PersonUpdatedBy INT NULL
, PersonCreatedDate DATETIME2 NOT NULL
)
GO
Execute that in SSMS and refresh the server again. you will see the new table in place.
This time, since we want to create a table inside the Basics database, we need to specify the USE Basics command.
Best practices when create tables:
- Table name should be singular
- Column names should carry the table and column names. Person is the table name and FirstName is the column name.
- Use indentation for each column
- Understand datatypes and which one to use it
- Declare constraints explicitly
- Constraints should be named
Create Table Anatomy
DDL Command – CREATE
Object – Table
DDL Command - CREATE
Object - Table
Table Name - Person
Open Delimiter - (
Column name - PersonID
Column Data type - INT
Property - Identity (This is used usually on the primary key column only.
Seed - 1
Increment - 1
Close delimiter - )
We will use the following example to go beyond a little bit more.
--DDL --Object --tablename
CREATE TABLE aTable
--delimiter
(
--columname --ColumnType --Constraint --Property --seed,increment --comma
aTableId INT PRIMARY KEY IDENTITY (1, 1),
--columname --ColumnType -- Constraint
--comma
aTableName NVARCHAR(20) NOT NULL,
--columname --ColumnType -- Constraint
--comma
ColumnToBeDropped NVARCHAR(20) NULL,
--columname --ColumnType -- Constraint --Value
aDefaultTableColumn NVARCHAR(20) DEFAULT 'Table Column',
--columname --ColumnType -- Constraint
--comma
aColumnCheck INT CHECK (aColumnCheck > 10),
--columname --ColumnType -- Constraint
--comma
aColumnUnique INT UNIQUE,
--columname --ColumnType --comma
aSecondColumnUnique INT,
CONSTRAINT UC_aTableSecondColumnUnique UNIQUE (aSecondColumnUnique)
)
GO
What is that green stuff? That is how we make comments without messing up our script.
ALTER, ALTER
When creating a model, most times, at the beginning of the process, we make mistakes, or a different requirement comes in or something is added and the list can go on and on.
In the initial designing, it may be easier to DROP the whole table and recreate it again, if using SQL scripts.
However, if there are data in place already, and the tables are being actively used in any application, we can not just drop the table. We must ALTER it.
In a table, any of those elements in the structure can be somehow modified. New elements can also be added at any time. We want to avoid those changes as much as we can, but they happen frequently.
here is how to use it:
ALTER TABLE aTable
DROP COLUMN aTableName
Simple, right? It acctually is, but there are few considerations to be done before we can say we fully understand this.
Constraints
In the alter example given above, we were able to drop a column of a table easily. However, it may not be the case if this column has a constraint.
What is a constraint?
Constraints specify the rules for the data in the table. Business rules are generally translated to constraints and their use limit the type of data to go in the table, ensures data integrity and reliability.
There can be two different types.
- Column Level Constraints
- Table Level Constraints
What determines if it is a column level or table level is the number of columns that belong to the same constraint. So, if only 1 then it is a column-level if more than 1 then it will be a table level.
Primary Key Constraint
This is the most known, most spoken about, most used. When we create a primary key, we also create by default a clustered index, and a statistics object.
The primary key is the type of constraint where we do not want to mess up. The reason is that it is quite hard to change it later on. It is possible to change by using the ALTER command but it can cause different issues.
Unique Constraint
This constraint is very similar to the Primary Key. The differences here is that his key will by default create a non-clustered index. The primary key, as mentioned before, will create a clustered index by default.
This constraint can also be clustered as long as the primary key does not exist, or the primary key is made as non-clustered. Remeber, a table can only have one clustered index.
When we create an unique cosntraint, the following objects are also created.
- One Key
- One Non-clustered index (Default behavior)
- One Statistic
Foreign Key constraint
Foreign key constraints ensures referential integrity between two or more tables. It creates the relation among tables.
The Foreign key in a table will either be the Primary key, or unique key of a different table. In this case the table having the primary key or unique key is considered the parent table and the table having the foreign key is considered the child table.
The Foreign key, can also have the primary key or unique key of itself. In this case this table is considered a self referencing table. We will see more about this when talking about relations.
Check constraint
Check constraints are used to restrict values of an attribute or column in a table. If the value to be inserted does not comply with the specified constraint, the record will not be inserted.
Default Constraint
The default constraint will provide a value for the column in case no value is assigned in an INSERT command.
NULL or NOT NULL Constraint
This will enforce that a value is inserted in a column. If no values are in the INSERT command, if the column is specified as NOT NULL, the record will fail to insert.
If this constraint is not specified, the NULL constraint will be the default.
Constraints Considerations
Best practices say that NULL and NOT NULL constraints should be explicitly defined.
All constraints, except for NULL and NOT NULL constraints, should have an explicit name created by the designer. If a name is not assigned, SQL SERVER will create a random name that will be harder to debug, if needed in the future.
When we create a name for it, these names will flow to all objects created and related to that constraint. In other words, if we create a Primary key and name it, the statistics, indexes and keys will carry the same name.
This is how we should name these constraints based on the type of the constraint.
- PRIMARY KEY CONSTRAINT PK_<TableName>_<ColumnName>(s)
- UNIQUE KEY CONSTRAINT UQ_<TableName>_<ColumnName(s)>
- DEFAULT CONSTRAINT DF_<TableName>_<ColumnName>
- CHECK CONSTRAINT CHK_<TableName>_<ColumnName>
DROP
This statement will remove the table structure definition of the database. Once the definition is gone, so will the data and any objects attached to it like indexes, constraints, etc.
Before using this statement, we must be aware of the complications since there is no going back if dropping a table.
We can also us this to only drop other objects while leaving the table structure intact.
Since a foreign key when set in another table, will ensure integrity, if we try to DROP a table with rows being referenced somewhere, the command will fail.
The solution to make sure it will work is by using ON DELETE CASCADE when creating the constraint. however, if not done correctly, it can be an even worse idea since it can wipe out the whole database.
How to DROP a table?
DROP TABLE aTable
ALTER & DROP
Dropping tables is not something that happens so much, unless in the beginning of the design process. However, it can be used a lot to modify other objects attached to the table in association with the ALTER command.
We can DROP a column, just like this
ALTER TABLE aTable
DROP COLUMN aTableId
GO
If a CONSTRAINT is associated with the column, we will get an error similar to this.
Msg 5074, Level 16, State 1, Line 38
The object 'UC_aTableSecondColumnUnique' is dependent on column 'aSecondColumnUnique'.
Msg 4922, Level 16, State 9, Line 38
ALTER TABLE DROP COLUMN aSecondColumnUnique failed because one or more objects access this column.
In that case, we need to DROP the constraint first.
ALTER & ADD
ALTER TABLE aTable
DROP CONSTRAINT UC_aTableSecondColumnUnique
GO
ALTER TABLE aTable
DROP COLUMN aSecondColumnUnique
GO
We can also ADD back the column and the constraint.
ALTER TABLE aTable
ADD aSecondColumnUnique INT UNIQUE
GO
ALTER TABLE aTable
ADD CONSTRAINT UQ_aTable_aSecondColumnUnique UNIQUE (aSecondColumnUnique)
GO
TRUNCATE
Keep data structure definition and objects, get rid of the data! That is what truncate does. In general, the data is lost when truncating it, but depending on the database, we still be able to recover it.
Other Commands
Depending on the database, there are other commands that can be used. They are RENAME and COMMENT. SQL Server does not support does command. Rename will simply rename an object. SQL SERVER has a built in stored procedure to that! It is called sp_rename.
This command must also be used very carefully. A complete application can break when changing names of a table or columns.
COMMENT is another DDL used in ORACLE. it is mainly used for explanatory remarks. SQL SERVER will also not support this command but the work around is the use of the built in stored procedure sp_addextendedproperty.
Summary
As a database professional, we must know and understand wht these DDL commands are doing, especially CREATE, ALTER, DROP, and TRUNCATE. We will encounter them very often.
These structures should resemble as much as possible the organization. If done correctly, applications on top of the database will be much easier to be developed.
If done incorrectly, all kinds of weird situations will come up in the application. Even worse, the workaround may be horrible.