Elements of Database Design

Although this is a book about Delphi programming, not databases, I feel it's important to discuss a few elements of good (and modern) database design. The reason is simple: If your database design is incorrect or convoluted, you'll either have to write terribly complex SQL statements and server-side code, or write a lot of Delphi code to access your data, possibly even fighting against the design of the TDataSet class.

Entities and Relations

The classic relational database design approach, based on the entity-relationship (E-R) model, involves having one table for every entity you need to represent in your database, with one field for each data element you need plus one field for every one-to-one or one-to-many relation to another entity (or table). For many-to-many relations, you need a separate table.

As an example of a one-to-one relation, consider a table representing a university course. It will have a field for each relevant data element (name and description, room where the course is held, and so on) plus a single field indicating the teacher. (Seen from the course, each record refers to exactly one teacher, although the same teacher may be referenced by several courses.) The teacher data really should not be stored within the course data, but in a separate table, because it may be referenced from elsewhere.

The schedule for each course can include an undefined number of hours on different days, so these hours cannot be stored in the table describing the course. Instead, this information must be placed in a separate table that includes all the schedules, with a field referring to the course each schedule is for. In a one-to-many relation like this, many records of the schedule table point to the same record in the course table.

A more complex structure is required to store information about which student is taking which course. Students cannot be listed directly in the course table, because their number is not fixed, and the courses cannot be stored in the student's data for the same reason. In a many-to-many relation like this, the only approach is to create an extra table representing the relation: it lists references to students and courses.
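The three kinds of relation described above can be sketched as a small schema. This is only an illustration: it runs the SQL through Python's sqlite3 module with an in-memory SQLite database standing in for a SQL server, and every table and field name (teacher, course, schedule, student, enrollment) is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One field per data element, plus one field pointing to the teacher.
    CREATE TABLE course (
        course_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        room       TEXT,
        teacher_id INTEGER REFERENCES teacher(teacher_id)
    );
    -- One-to-many: many schedule records point to the same course.
    CREATE TABLE schedule (
        schedule_id INTEGER PRIMARY KEY,
        course_id   INTEGER NOT NULL REFERENCES course(course_id),
        weekday     TEXT NOT NULL,
        start_hour  INTEGER NOT NULL
    );
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Many-to-many: the extra table lists (student, course) pairs.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO teacher VALUES (1, 'Prof. Rossi');
    INSERT INTO course VALUES (10, 'Databases', 'A1', 1), (11, 'Compilers', 'B2', 1);
    INSERT INTO schedule VALUES (1, 10, 'Mon', 9), (2, 10, 'Wed', 14);
    INSERT INTO student VALUES (100, 'Ann'), (101, 'Bob');
    INSERT INTO enrollment VALUES (100, 10), (100, 11), (101, 10);
""")

def courses_of(student_name):
    """Follow the relation table from a student to the courses taken."""
    rows = conn.execute(
        """SELECT c.name FROM course c
           JOIN enrollment e ON e.course_id = c.course_id
           JOIN student s ON s.student_id = e.student_id
           WHERE s.name = ? ORDER BY c.name""", (student_name,))
    return [name for (name,) in rows]
```

Calling courses_of('Ann') walks the enrollment table in one direction; the same table answers the opposite question (which students take a course) with the joins reversed.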

Normalization Rules

The classic design principles include a series of so-called normalization rules. The goal of these rules is to avoid duplicating data in your database (not only to save space, but mainly to avoid ending up with inconsistent data). For example, you don't repeat all the customer details in each order, but refer to a separate customer entity. This way you save space, and when a customer's details change (for example, because of a change of address), all of the customer's orders reflect the new data. Other tables that refer to the same customer see the new data as well, without being updated themselves.

Normalization rules imply using codes for commonly repeated values. For example, suppose you have a few different shipment options. Rather than include a string-based description for these options within the orders table, you can use a short numeric code that's mapped to a description in a separate lookup table.
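A lookup table of this kind might look like the following sketch. It again uses Python's sqlite3 module with an in-memory database as a stand-in for a real server; the shipment_option and orders tables and the sample descriptions are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: one record per shipment option.
    CREATE TABLE shipment_option (
        code        INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- Orders store only the short numeric code.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        shipment_code INTEGER NOT NULL REFERENCES shipment_option(code)
    );
    INSERT INTO shipment_option VALUES (1, 'Standard mail'), (2, 'Overnight courier');
    INSERT INTO orders VALUES (500, 2), (501, 1), (502, 2);
""")

def shipment_description(order_id):
    """Map the code stored in an order back to its description with a join."""
    row = conn.execute(
        """SELECT s.description FROM orders o
           JOIN shipment_option s ON s.code = o.shipment_code
           WHERE o.order_id = ?""", (order_id,)).fetchone()
    return row[0] if row else None
```

Because the description lives in one place, rewording it in shipment_option changes what every order reports, at the price of one extra join per query.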

The previous rule should not be taken to the extreme, or you'll have to join a large number of tables for every query. You can either accept some denormalization (keeping a short shipment description within the orders table) or use the client program to provide the description, again ending up with a formally incorrect database design. This last option is practical only when you use a single development environment (let's say, Delphi) to access the database.

From Primary Keys to OIDs

In a relational database, records are identified not by a physical position (as in Paradox and other local databases) but by the data within the record. Typically, you don't need the data from every field to identify a record, but only a subset of the data, forming the primary key. Because the fields that are part of the primary key must identify an individual record, their combined value must be different for each record of the table.

Note

Many database servers add internal record identifiers to tables, but they do so only for internal optimization; this process has little to do with the logical design of a relational database. These internal identifiers work differently in different SQL servers and may change among versions, so you shouldn't rely on them.

Early incarnations of relational theory dictated the use of logical keys, which means selecting one or more fields that indicate an entity without risk of confusion. This is often easier to say than to accomplish. For example, company names are not generally unique, and even the company name and its location don't provide a complete guarantee of uniqueness. Moreover, if a company changes its name (not an unlikely event, as Borland can teach us) or its location, and you have references to the company in other tables, you must change all those references as well and risk ending up with dangling references.

For this reason, and also for efficiency (using strings for references implies using a lot of space in secondary tables, where references often occur), logical keys have been phased out in favor of physical or surrogate keys:

Physical Key: A single field that identifies an element in a unique way. For example, each person in the U.S. has a Social Security Number (SSN), and almost every country has a tax ID or other government-assigned number that identifies each person. The same is typically true for companies. Although these ID numbers are guaranteed to be unique, their format differs from country to country (creating trouble for the database of a company that sells goods abroad) and can change within a single country (to account for new tax laws). They are also often inefficient, because they can be quite large (Italy, for example, uses a 16-character code of letters and numbers to identify people).

Surrogate Key: A number identifying a record, in the form of a client code, order number, and so on. Surrogate keys are commonly used in database design. However, in many cases, they end up being logical identifiers, with client codes showing up all over the place (not a great idea).

Warning

The situation becomes particularly troublesome when surrogate keys also have a meaning and must follow specific rules. For example, companies must number invoices with unique and consecutive numbers, without leaving holes in the numbering sequence. This situation is extremely complex to handle programmatically, if you consider that only the database can determine these unique consecutive numbers when you send it new data. At the same time, you need to identify the record before you send it to the database—otherwise you won't be able to fetch it again. Practical examples of how to solve this situation are discussed in Chapter 15, "Working with ADO."

An extension to the use of surrogate keys is the use of a unique Object Identifier (OID). An OID is either a number or a string made of a sequence of letters and digits; it's added to each record of each table representing an entity (and sometimes to records of tables representing relations). Unlike client codes, invoice numbers, SSNs, or purchase order numbers, OIDs are random: They have no sequencing rule and are never visible to the end user. This means you can use surrogate keys (if your company is used to them) along with OIDs, but all the external references to the table will be based on OIDs.

Another common rule suggested by the promoters of this approach (which is part of the theories supporting object-relational mapping) is the use of system-wide unique identifiers. If you have a table of client companies and a table of employees, you may wonder why you should use a unique ID for such diverse data. The reason is that you'll be able to sell goods to an employee without having to duplicate the employee information in the customer table—you can refer to the employee in your order and invoice. An order is placed by someone identified by an OID, and this OID can refer to many different tables.
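As a rough sketch of system-wide OIDs, the example below generates the identifiers as random version 4 UUID strings. That choice is an assumption made for illustration (the text only requires an identifier with no sequencing rule), and the company, employee, and orders tables are hypothetical, as are the helpers new_oid and buyer_name.

```python
import sqlite3
import uuid

def new_oid():
    """Return a random identifier with no sequencing rule (assumed: UUID version 4)."""
    return str(uuid.uuid4())

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- client_code is the visible surrogate key; oid is used for all references.
    CREATE TABLE company  (oid TEXT PRIMARY KEY, name TEXT NOT NULL, client_code TEXT UNIQUE);
    CREATE TABLE employee (oid TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- buyer_oid may point to a company or an employee; a plain foreign key
    -- cannot express that, so the application has to enforce it.
    CREATE TABLE orders   (oid TEXT PRIMARY KEY, buyer_oid TEXT NOT NULL);
""")

company_oid, employee_oid = new_oid(), new_oid()
conn.execute("INSERT INTO company VALUES (?, ?, ?)", (company_oid, "Acme", "C-001"))
conn.execute("INSERT INTO employee VALUES (?, ?)", (employee_oid, "Ann"))
for buyer in (company_oid, employee_oid):
    conn.execute("INSERT INTO orders VALUES (?, ?)", (new_oid(), buyer))

def buyer_name(buyer_oid):
    """Resolve a buyer OID by looking in every table that can hold buyers."""
    for table in ("company", "employee"):
        row = conn.execute(
            f"SELECT name FROM {table} WHERE oid = ?", (buyer_oid,)).fetchone()
        if row:
            return row[0]
    return None
```

The trade-off is visible in the orders table: one reference field serves both kinds of buyer, but the server can no longer check it with an ordinary referential constraint.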

Using OIDs and object-relational mapping is an advanced element of the design of Delphi database applications. I suggest that you investigate this topic before undertaking medium or large Delphi projects, because the benefits can be significant (after some investment in studying this approach and building some basic support code).

External Keys and Referential Integrity

The keys identifying a record (whatever their type) can be used as external keys (usually called foreign keys) in other tables, for example to represent the various types of relations discussed earlier. Most SQL servers can verify these external references, so you cannot refer to a nonexistent record in another table. These referential integrity constraints are typically expressed when you create a table.

Besides not being allowed to add references to nonexistent records, you're generally prevented from deleting a record if external references to it exist. Some SQL servers go one step further: As you delete a record, instead of denying the operation, they can automatically delete all records that refer to it from other tables.
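The behaviors just described can be tried out with a sketch like the one below. It assumes SQLite (through Python's sqlite3 module) as the server, which checks references only after the foreign_keys pragma is switched on; other servers differ in syntax and defaults. The customer, orders, and order_line tables and the is_rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only when asked
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Default behavior: a customer referenced by an order cannot be deleted.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- Cascading variant: deleting an order also deletes its lines.
    CREATE TABLE order_line (
        line_id  INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1);
    INSERT INTO order_line VALUES (100, 10), (101, 10);
""")

def is_rejected(sql):
    """Return True if the server refuses the statement on integrity grounds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

With this setup, inserting an order for a nonexistent customer and deleting customer 1 are both refused, while deleting order 10 succeeds and silently removes its two order lines.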

More Constraints

In addition to the uniqueness of primary keys and the referential constraints, you can generally use the database to impose more validity rules on the data. You can ask for specific columns (such as those referring to a tax ID or a purchase order number) to include only unique values. You can impose uniqueness on the values of multiple columns—for example, to indicate that you cannot hold two classes in the same room at the same time.
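Both kinds of uniqueness rule can be sketched as follows, again with SQLite standing in for the server. The company and class_session tables are invented, and the room example assumes classes occupy fixed one-hour slots identified by weekday and starting hour; overlapping sessions of different lengths would need a more elaborate rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-column rule: no two companies may share a tax ID.
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        tax_id     TEXT NOT NULL UNIQUE
    );
    -- Multi-column rule: one class per room per time slot.
    CREATE TABLE class_session (
        session_id INTEGER PRIMARY KEY,
        course     TEXT NOT NULL,
        room       TEXT NOT NULL,
        weekday    TEXT NOT NULL,
        start_hour INTEGER NOT NULL,
        UNIQUE (room, weekday, start_hour)
    );
    INSERT INTO company VALUES (1, 'IT-123');
    INSERT INTO class_session VALUES (1, 'Databases', 'A1', 'Mon', 9);
""")

def try_insert(sql, params):
    """Return True if the record was stored, False if a constraint refused it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second company with tax ID 'IT-123' is refused, and so is a second class in room A1 on Monday at 9; the same class in room B2 is accepted because only the combination of the three columns must be unique.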

In general, simple rules can be expressed to impose constraints on a table, whereas more complex rules generally imply the execution of stored procedures activated by triggers (every time the data changes, for instance, or there is new data).
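The difference between a simple rule and a trigger-based rule might be sketched like this. The example assumes SQLite, which has triggers but no stored procedures, so the trigger body contains the check itself; on a server with stored procedures the trigger would typically call one. The capacity rule, the tables, and the accepted helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        capacity  INTEGER NOT NULL CHECK (capacity > 0)  -- simple rule on one table
    );
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    -- Complex rule: it needs data from two tables, so a trigger checks it
    -- every time a new enrollment record arrives.
    CREATE TRIGGER enrollment_capacity BEFORE INSERT ON enrollment
    WHEN (SELECT count(*) FROM enrollment WHERE course_id = NEW.course_id)
         >= (SELECT capacity FROM course WHERE course_id = NEW.course_id)
    BEGIN
        SELECT RAISE(ABORT, 'course is full');
    END;
    INSERT INTO course VALUES (10, 2);
""")

def accepted(sql, params=()):
    """Return True if the statement ran, False if a rule in the database refused it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False
```

Here course 10 has room for two students, so the first two enrollments are accepted and the third is refused by the trigger, whereas a course with capacity 0 is refused by the CHECK constraint alone.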

Again, there is much more to proper database design, but the elements discussed in this section can provide you with a starting point or a good refresher.

Note

For more information about SQL's Data Definition Language and Data Manipulation Language, see the chapter "Essential SQL" in the electronic book described in Appendix C, "Free Companion Books on Delphi."

Unidirectional Cursors

In local databases, tables are sequential files whose order either is the physical order or is defined by an index. By contrast, SQL servers work on logical sets of data that aren't related to a physical order. A relational database server handles data according to the relational model: a mathematical model based on set theory.

For this discussion, it's important for you to know that in a relational database, the records (sometimes called tuples) of a table are identified not by position but exclusively through a primary key, based on one or more fields. Once you've obtained a set of records, the server adds to each of them a reference to the following record; thus you can move quickly from a record to the following one, but moving back to the previous record is extremely slow. For this reason, it is common to say that an RDBMS uses a unidirectional cursor. Connecting such a table or query to a DBGrid control is practically impossible, because doing so would make browsing the grid backward terribly slow.

Some database engines keep the data already retrieved in a cache, to support full bidirectional navigation on it. In the Delphi architecture, this role can be played by the ClientDataSet component or another caching dataset. You'll see this process in more detail later, when we focus on dbExpress and the SQLDataset component.
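The idea of a caching layer over a unidirectional cursor can be illustrated with a small sketch. It is not how ClientDataSet is implemented; it only shows the principle, using a Python sqlite3 cursor as the forward-only source. The CachedCursor class and its next, prior, and fetched members are names invented for the example.

```python
import sqlite3

class CachedCursor:
    """Wrap a forward-only cursor and remember the rows already fetched."""

    def __init__(self, cursor):
        self._cursor = cursor
        self._rows = []   # cache of rows fetched so far
        self._pos = -1    # index of the current row, -1 before the first one

    def next(self):
        """Move forward, fetching from the server only when the cache runs out."""
        if self._pos + 1 >= len(self._rows):
            row = self._cursor.fetchone()
            if row is None:
                return None  # end of data; the position does not change
            self._rows.append(row)
        self._pos += 1
        return self._rows[self._pos]

    def prior(self):
        """Move backward using the cache only, with no request to the server."""
        if self._pos <= 0:
            return None
        self._pos -= 1
        return self._rows[self._pos]

    @property
    def fetched(self):
        """Number of rows retrieved from the server so far."""
        return len(self._rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
rows = CachedCursor(conn.execute("SELECT first_name FROM employee ORDER BY emp_no"))
```

Moving forward twice and then back costs two fetches in total: the backward step and any later forward step over the same records are served from the cache.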

Note

The case of a DBGrid used to browse an entire table is common in local programs but should generally be avoided in a client/server environment. It's better to retrieve only some of the records and only the fields you are interested in. If you need to see a list of names, return all those starting with the letter A, then those with B, and so on, or ask the user for the initial letter of the name.
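A query restricted in this way might look like the sketch below, which asks the server for two fields of the records whose last name starts with a given letter. The employee table layout and the names_starting_with helper are assumptions, and SQLite stands in for the remote server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, notes TEXT)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Adams", "..."), (2, "Bob", "Baker", "..."),
    (3, "Al", "Avery", "..."), (4, "Cy", "Clark", "...")])

def names_starting_with(initial):
    """Fetch only two fields, and only the records matching the initial letter."""
    rows = conn.execute(
        """SELECT first_name, last_name FROM employee
           WHERE last_name LIKE ?
           ORDER BY last_name, first_name""", (initial + "%",))
    return rows.fetchall()
```

The initial is passed as a parameter rather than pasted into the SQL text, so the same statement can be reused for every letter the user picks.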

If proceeding backward might result in problems, keep in mind that jumping to the last record of a table is even worse; usually this operation implies fetching all the records! A similar situation applies to the RecordCount property of datasets. Computing the number of records often implies moving them all to the client computer. For this reason, the thumb of the DBGrid's vertical scrollbar works for a local table but not for a remote table. If you need to know the number of records, run a separate query to let the server (and not the client) compute it. For example, you can see how many records will be selected from the EMPLOYEE table if you are interested in those records having a salary field higher than 50,000:

select count(*)
from Employee
where Salary > 50000

Tip

Using the SQL instruction count(*) is a handy way to compute the number of records returned by a query. Instead of the * wildcard, you could use the name of a specific field, as in count(First_Name), possibly combined with either distinct or all, to count only records with different values for the field or all the records having a non-null value.
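The variants of count can be compared on a tiny sample table. The sketch below makes up four employee records (one with a null first name) and runs the queries through SQLite; the scalar helper and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, first_name TEXT, salary INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "Ann", 60000), (2, "Ann", 45000), (3, None, 70000), (4, "Bob", 52000)])

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Every record, whatever its content.
all_rows = scalar("SELECT count(*) FROM employee")
# Only records whose first_name is not null (the default, equivalent to all).
non_null = scalar("SELECT count(first_name) FROM employee")
# Only the different non-null values of first_name.
different = scalar("SELECT count(DISTINCT first_name) FROM employee")
# The server counts the matching records; none of them travel to the client.
high_paid = scalar("SELECT count(*) FROM employee WHERE salary > 50000")
```

On this data the four results are 4, 3, 2, and 3, in that order.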