Understanding Data Modeling

The process of modeling, or coming up with the data model, is central to the developing of a WorkflowFirst application. Therefore it's worth spending some time getting familiar with data models as they are used and defined in WorkflowFirst.

For those unfamiliar with data modeling, a data model defines the structures in which the information in the database will be held. In a spreadsheet, for example, the data model consists of the column, sheet and range names. Those are quite distinct from the data itself, which is usually held in rows underneath the column headers. Generally, the data model consists of the categories into which each record of the data is separated. The structure or class of a record is known as a type. An example of a type is a customer, or an order. All the types related to a specific domain are collectively called the data model.

Creating an application in WorkflowFirst involves designing a type of data model called a hierarchical data model. A hierarchical data model is unique in that the types within the model follow a parent-child relationship.

Lets look at a concrete example. A spreadsheet designed to hold company information might have one sheet with column headers labeled as Company Name and Address. In another table or another sheet, there may be another list of contacts at that company with columns labeled as Contact Name and Position. Each contact may have a column called Company Name also, so that we know which contact belongs to which company. That is a classic example of a relational model.

An example of this relational structure can be seen below:


In a hierarchical data model, there are no longer two separate tables that can then be related. Instead there is one top-level table, the Companies, and that has a child table (or sublist) called Contacts. Each Company record can then contain one or more Contact records. The relationship is explicit and not implicit. It is that container/ownership structure that allows the model to be hierarchical.

You may be familiar with hierarchical structures in your experience of using folders and files, using Microsoft Windows' explorer for example.

The benefit of a hierarchical data model is that the relationship between different tables (or, as we call them, types) is very clear. It is not a matter of seeing multiple tables of data and then wondering how to relate that information. Instead, all of the information and its relationships are immediately obvious and accessible.

An example of a hierarchical data mode is shown below:


In this example, you can see that underneath the company ACME Corp we have the list of Contacts related to that company. We no longer need to include the Company Name in that list because the company is implied from its parent. In a hierarchical database, retrieving the ACME Corp record would automatically let us access the contacts belonging to ACME Corp.

Designing a hierarchical data model involves making decisions upfront about the type of relationships between types. The key to deciding whether a type should be a child of another type is a matter of ownership: If a record in the parent type is deleted, would you expect the records of the child type to be deleted also? If the answer is yes then you are probably looking at a parent-child relationship.

There may be a time when the relationship between two types isn't that of containment or ownership. If the relationship is rather a matter of reference, and you would not expect the parent record to delete the record it is referring to, then you should consider it a link.

The difference between ownership or reference (link) can sometimes be difficult to decide upon. Lets take a look at our Company Information File example from above. What if we wanted to record a list of countries. In each country we want to record information on their GDP, population, import/export restrictions and things of that nature. Now each company belongs to a specific country (for simplicity we'll assume that a company only has one country). If we put the industry definition underneath the company type as a child type, then when we delete the company we will also delete the country. That might be fine is we were not sharing the information, but each country may refer to several companies. So we wouldn't want to delete the country because it would also delete information that is used by other companies. Because this relationship is not of ownership, but rather a shared reference, we would define this as a link . Links are described in more detail in a later section.

The process is not always clear cut. There may be times when the choice is more ambiguous. In the above example, should the Country be the top level type, and the Company type be underneath that? Another example, let's say we add 'letters' to our customer information database. In there, we record any correspondence from or to the company. On the one hand we might say that each letter is owned by the company. But then what if we send a letter to multiple customers? Should we have a copy of it under each customer? Or would it have been better to have the Letters type as a separate, independent table that itself can link to one or more customers?

Fairly often, the need will arise to choose a primary relationship that finds a compromise based on the most common scenario. This may make exceptions to the norm harder to deal with, but the overall focus should be on the most common scenarios and making them the easiest to handle. This is why a hierarchical data model demands an understanding of typical business operations, it demands that developers talk to the users to understand what their most common scenarios actually are, so they can be prepared when making these design decisions.

The final point to make about data modeling is naming. Coming up with sensible names for types and field names is critical when creating data models. These names should describe exactly what is intended to be expressed, without being too generic or too specific. It may often be a good idea to come up with a glossary of terms that all the users can generally agree upon and use this glossary as your reference point. Names that are too ambiguous lead to confusion.

Tip : WorkflowFirst provides a section to maintain a glossary of terms used throughout an application. In your application record, drilldown (the Hamburger icon) to the Glossary section and there you'll easily be able to add terms and definitions that everyone can see and share. 

So it should be clear now that understanding how to structure a data model is the key to understanding business requirements and being able to translate those into a format that a computer system can utilize to create a software application. 


Next Topic:
v4.2.0.956 (beta)
Up Since 2/29/2024 12:02:23 AM