Although this is a book about Delphi programming, not databases, I feel it's important to discuss a few elements of good (and modern) database design. The reason is simple: If your database design is incorrect or convoluted, you'll either have to write terribly complex SQL statements and server-side code, or write a lot of Delphi code to access your data, possibly even fighting against the design of the TDataSet class.
The classic relational database design approach, based on the entity-relation (E-R) model, involves having one table for every entity you need to represent in your database, with one field for each data element you need plus one field for every one-to-one or one-to-many relation to another entity (or table). For many-to-many relations, you need a separate table.
As an example of a one-to-one relation, consider a table representing a university course. It will have a field for each relevant data element (name and description, room where the course is held, and so on) plus a single field indicating the teacher. The teacher data really should not be stored within the course data, but in a separate table, because it may be referenced from elsewhere.
The schedule for each course can include an undefined number of hours on different days, so they cannot be added in the same table describing the course. Instead, this information must be placed in a separate table that includes all the schedules, with a field referring to the class each schedule is for. In a one-to-many relation like this, many records of the schedule table point to the same one record in the course table.
A more complex situation is required to store information about which student is taking which class. Students cannot be listed directly in the course table, because their number is not fixed, and the classes cannot be stored in the student's data for the same reason. In a similar many-to-many relation, the only approach is to create an extra table representing the relation—it lists references to students and courses.
The classic design principles include a series of so-called normalization rules. The goal of these rules is to avoid duplicating data in your database (not only to save space, but mainly to avoid ending up with incongruous data). For example, you don't repeat all the customer details in each order, but refer to a separate customer entity. This way you save memory, and when a customer's details change (for example, because of a change of address), all of the customer's orders reflect the new data. Other tables that relate to the same customer will be automatically updated as well.
Normalization rules imply using codes for commonly repeated values. For example, suppose you have a few different shipment options. Rather than include a string-based description for these options within the orders table, you can use a short numeric code that's mapped to a description in a separate lookup table.
The previous rule, which should not be taken to the extreme, helps you avoid having to join a large number of tables for every query. You can either account for some de-normalization (leaving a short shipment description within the orders table) or use the client program to provide the description, again ending up with a formally incorrect database design. This last option is practical only when you use a single development environment (let's say, Delphi) to access this database.
In a relational database, records are identified not by a physical position (as in Paradox and other local databases) but by the data within the record. Typically, you don't need the data from every field to identify a record, but only a subset of the data, forming the primary key. If the fields that are part of the primary key must identify an individual record, their value must be different for each possible record of the table.
Early incarnations of relational theory dictated the use of logical keys, which means selecting one or more fields that indicate an entity without risk of confusion. This is often easier to say than to accomplish. For example, company names are not generally unique, and even the company name and its location don't provide a complete guarantee of uniqueness. Moreover, if a company changes its name (not an unlikely event, as Borland can teach us) or its location, and you have references to the company in other tables, you must change all those references as well and risk ending up with dangling references.
For this reason, and also for efficiency (using strings for references implies using a lot of space in secondary tables, where references often occur), logical keys have been phased out in favor of physical or surrogate keys:
External Keys and Referential Integrity
The keys identifying a record (whatever their type) can be used as external keys in other tables—for example, to represent the various types of relations discussed earlier. All SQL servers can verify these external references, so you cannot refer to a nonexistent record in another table. These referential integrity constraints are expressed when you create a table.
Besides not being allowed to add references to nonexistent records, you're generally prevented from deleting a record if external references to it exist. Some SQL servers go one step further: As you delete a record, instead of denying the operation, they can automatically delete all records that refer to it from other tables.
In addition to the uniqueness of primary keys and the referential constraints, you can generally use the database to impose more validity rules on the data. You can ask for specific columns (such as those referring to a tax ID or a purchase order number) to include only unique values. You can impose uniqueness on the values of multiple columns—for example, to indicate that you cannot hold two classes in the same room at the same time.
In general, simple rules can be expressed to impose constraints on a table, whereas more complex rules generally imply the execution of stored procedures activated by triggers (every time the data changes, for instance, or there is new data).
Again, there is much more to proper database design, but the elements discussed in this section can provide you with a starting point or a good refresher.
In local databases, tables are sequential files whose order either is the physical order or is defined by an index. By contrast, SQL servers work on logical sets of data that aren't related to a physical order. A relational database server handles data according to the relational model: a mathematical model based on set theory.
For this discussion, it's important for you to know that in a relational database, the records (sometimes called tuples) of a table are identified not by position but exclusively through a primary key, based on one or more fields. Once you've obtained a set of records, the server adds to each of them a reference to the following record; thus you can move quickly from a record to the following one, but moving back to the previous record is extremely slow. For this reason, it is common to say that an RDBMS uses a unidirectional cursor. Connecting such a table or query to a DBGrid control is practically impossible, because doing so would make browsing the grid backward terribly slow.
Some database engines keep the data already retrieved in a cache, to support full bidirectional navigation on it. In the Delphi architecture, this role can be played by the ClientDataSet component or another caching dataset. You'll see this process in more detail later, when we focus on dbExpress and the SQLDataset component.
If proceeding backward might result in problems, keep in mind that jumping to the last record of a table is even worse; usually this operation implies fetching all the records! A similar situation applies to the RecordCount property of datasets. Computing the number of records often implies moving them all to the client computer. For this reason, the thumb of the DBGrid's vertical scrollbar works for a local table but not for a remote table. If you need to know the number of records, run a separate query to let the server (and not the client) compute it. For example, you can see how many records will be selected from the EMPLOYEE table if you are interested in those records having a salary field higher than 50,000:
select count(*) from Employee where Salary > 50000
|Copyright © 2004-2021 "Delphi Sources" by BrokenByte Software. Delphi Programming Guide||