Understanding Relational Databases
Simply put, you can think of databases as software that simply stores and organizes data in an efficient way. When we think of databases, we tend to think of what are called Relational Databases because they store data in a relational way. This means the data is related to each other in some way. For example, a relational database may have several tables that help a product ordering application. These tables may be named something like orders, customers, addresses, payments, etc... You would then be able to connect these tables through what are called primary and foreign keys.
Primary Keys
Primary keys are unique for each record in a database. Lets use this website for example and assume that we have users with the following names:
NAME |
John |
Lauren |
Carl |
We might store this data in a table called users. Well, just having one table in our websites database is not very useful. Say we also want to see what these users are posting we could also have a posts table that may look like the following:
POST |
Understanding Redshift |
Seattle |
AWS |
Our database now has two ENTIRE tables! They are not very useful, though, because we cannot see who is posting what. This is where primary keys become useful. Now, we could setup just one large table that contains all of the websites data, but this quickly grows unwieldy whenever the data models become more complex than a simple blog. So instead we can assign a number that increases by 1 each time we add a new user or post. So we can refactor the tables to look like the following.
user_id (This will be the name of our primary key for users) |
name |
1 |
John |
2 |
Lauren |
3 |
Carl |
post_id (This will be the name of the primary key for posts) |
post_title |
1 |
Understanding Redshift |
2 |
Seattle |
3 |
AWS |
We now have our tables organized with primary keys, but we are unable to join the two tables to see who posted what article. For that, we will need to make use of a foreign key.
Foreign/Secondary Keys
A foreign key is a column or a set of columns in a table whose values correspond to the values of the primary key in another table. In order to add a row with a given foreign key value, there must exist a row in the related table with the same primary key value.
For our example a good foreign key for the posts table would be user_id so that way we can easily see who posted what article. So we can refactor our table to look something like this:
post_id (This will be the name of the primary key for posts) |
post_title |
user_id |
1 |
Understanding Redshift |
1 |
2 |
Seattle |
1 |
3 |
AWS | 2 |
We are now able to easily see who posted what article by matching the user_id to the users table. Our SQL query would look something like this:
SELECT
p.post
, u.name
FROM posts p
JOIN users u
-- Join the foreign key of user_id from post to the primary key of user_id in users
ON p.user_id = u.user_id
Our results would then look like the following:
post | name |
Understanding Redshift |
John |
Seattle |
John |
AWS |
Lauren |
No Comments