Databases
You will need to know..
- Database Structure: Flat file versus Relational.
- Database keys: primary, foreign and secondary.
- ERD: Entity Relationship modelling.
- Normalisation to 3rd Normal Form,
- Methods of capturing, selecting, managing and exchanging data.
- SQL - see appendix 5d of the specification.
- Referential Integrity.
- Transaction Processing - ACID (Atomicity, Consistency, Isolation, Durability.
- Record Locking and Redundancy.
Database Structure: Flat File versus Relational
Flat File
If you were going to create a database at home with a small amount of data where security might not be an issue and you're not a technical expert, you may want flat file database. This is a database where all records are stored in one place - a flat file. Thus it is easier to understand. There is no organisation, no relationship between the records. Flat file databases can have redundancy as data is repeated. They are also harder to update. For example in the example here if the boss changes their phone number several records must be updated.
Relational
A relational database is more complex because records are organised into multiple tables. Records are cross-referenced between tables. This reduces duplication of data and redundant data. It also saves memory.
A relational database is easier to add or update data as just need to focus on the affected table. This is particularly necessary when handling large amounts of data.
It also reduces duplication significantly and improves data consistency and data integrity (referential integrity). Relational databases are more secure because you can create user-level permissions to access only certain parts of the database. For example, your teacher can see records containing your email address, timetable etc. but not the part of the database which stores staff salaries.
Database software to create relational databases exists such as Microsoft Access. However, you still need to know how to use it.
Database Keys
A primary key is a field used to as a unique identifier for a record. It must have a unique value for each record in the table e.g. UserID.
A primary key used in another table is called a foreign key. A foreign key is used to establish a link between tables.
Indexing fields supports a faster search on that field. You would index fields which are commonly used in a search such as Author in a database of books. But indexing takes up more space, so this is why they are not used on every field.
A secondary key shows the secondary value that is unique for each record. It can be used to identify the record and it is indexed to allow for faster searching.
Methods of Capturing Data
Watch the following video to find out about how data is captured and entered into a database.
Hashing
In terms of databases hashing has two purposes:
- It is an algorithm applied to a value. It is a one way function and can not be reversed. This is used to secure sensitive data such as passwords. Thus if a hacker was to gain access to a database they would still not be able to obtain the value. The main steps are shown below with password as an example:
- Run a hashing algorithm on the correct passwords and store just the output.
- To check an input again the hash.
- Run the same algorithm on the inputted password.
- Compare this to the retrieved hashed original password.
- If both are the same the password is correct.
- It is an algorithm applied to a value to decide where to insert the value into a database. This allows for quick data retrieval as data can be retrieved directly from that position.
Watch the video below for a full explanation.
ERDs: Entity Relationship Diagrams
An entity relationship diagram (ERD) is a graphical representation that depicts relationships between tables(entities) in a database. The relationship can be:
- One to One
- One to Many
- Many to Many (although this is considered bad practice)
Normalisation
Normalisation is the process of taking a flat file database and separating it out into a relational database. To do this you must follow specific rules to show each stages of the database transformation:
- 1st Normal Form
- No field with repeating data.
- Data is atomic - it can not be further broken down.
- Each record is unique. That is, it has a primary Key (separate or composite key).
- Each field has a unique name.
- 2nd Normal Form
- Is in First Normal Form.
- Every field is dependent on the primary key.
- 3rd Normal Form
- Is in 2nd Normal Form.
- Non-key field must not depend on another non-key field. Many to Many relationships are not allowed in 3rd NF. The solution is often to add a table between the entities.
Hint:
- To get to 1st Normal form your focusing on records and their data. You tend to separate records into more records if required.
- To get to 2nd or to 3rd Normal form your focusing on how fields relate to each other. You tend to be splitting tables into further tables.
Note: There are more forms but they are not required for your course. Watch the following videos in order for a step by step explanation.
SQL - The Programming Language for Databases
From the specification here is the list of SQL keywords you will need to know.
SQL Examples
Searching
Find out the name of all monkeys in a zoo and present in alphabetical order by name in ascending order:
SELECT Name from Animals
WHERE Type = "Monkeys"
ORDER BY Name
Insert a new record
A new panda has arrived! To insert a record you either must insert all fields or specify the fields you want to include. Below the fields are specified:
INSERT INTO Animals (Name, Age, Type)
VALUES ('Pete', 3, 'Panda');
Remove a table from the database
DROP TABLE Animals;
Deleting
A zoo has lost all its pandas, so the records in the Animals table must be deleted.
DELETE FROM Animals
WHERE type = "Panda"
Update a table
Change the age of Oscar the monkey to 5:
UPDATE Animals
SET age = 5
WHERE Name = "Oscar"
Note this would change the age of all monkeys called Oscar!
Referential Integrity
Referential integrity ensures data consistency. Databases are frequently updated by multiple users which leave them at risk of data becoming inconsistent or corrupt. It refers to the relationship between tables. Each table in a database must have a primary key, which could also be in other tables if the data is linked. When a primary key from one table appears in another table, it is called a foreign key.
If data in one table refers to data in another table, then the data referred to must exist. In other words referential integrity ensures that each foreign key links to an existing primary key.
If changes happen in the database that effect other fields, the changes must be cascaded through, For example, if a record is removed from a table then all references of that record must be removed throughout the database. Database software such as Microsoft Access support data consistency by including features to define the relationship between tables and enforce referential integrity.
Transaction Processing
Transactionsmust be processed according to certain rules ensure that they make sense:
- Atomic: they must succeed or fail, but never partially succeed.
- Consistent: the transaction must only change the database according to the rules of the database.
- Isolated: each transaction must not affect or overwrite other concurrent transactions. Records should be locked until the transaction completes.
- Durable: once a transaction has been started it must remain no matter what happens.
- As data is entered it should be validated and verified as it must be accurate (Data Protection Act) and security measures must be taken to prevent tampering with the data (The Computer Misuse Act).
- Transactions should maintain referential integrity - changes made in one table should consider data linked in other tables.
- Data should have redundancy - if some data is lost it should be recoverable elsewhere (e.g. a RAID setup or mirroring).