Wednesday, September 23, 2009

Entity Type Identification

One of my most frustrating moments as an educator came the day a student asked for a deeper explanation of how to identify concepts of importance to the organization, also known as entity types. Unfortunately, I did not have time to cover the topic further in lecture. If I ever return to teaching undergraduates, I will hold a clinic outside of lecture to cover this essential topic.

Actually, a more precise definition of an entity type is a concept of importance to the organization about which it wishes to capture or report information. Entity types are also called data classes, especially by the object-oriented (OO) types out there. Many practitioners refer to them as entities for short, although this is technically incorrect, as an entity is more properly thought of as an instance. A whole group of entities that are categorically similar and share the same properties is then an entity type.

Think of the most common entity type of them all: Person. Every Person must have exactly one PreferredName, SurName, BirthDate, and BirthPlaceName. Each of us may have at most one SSNumber, MaidenName, or HighSchoolGraduationDate. And we may occupy one or more Roles of interest to the Organization, e.g., Employee, Customer, Vendor Contact, and so on. Conceptually, all of these Roles are assumed by Persons, and any of these Persons may assume many Roles.
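To make the cardinalities concrete, here is a minimal sketch of Person in Python. It is only a notation for the idea, not the entity type itself, and it rests on assumptions: the field types, the `Role` enumeration, and the sample values are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional, Set


class Role(Enum):
    # Roles named in the text; a real organization would have its own list.
    EMPLOYEE = "Employee"
    CUSTOMER = "Customer"
    VENDOR_CONTACT = "Vendor Contact"


@dataclass
class Person:
    # Attributes a Person must have: no default, so a value is required.
    preferred_name: str
    sur_name: str
    birth_date: date
    birth_place_name: str
    # Attributes a Person may have: zero or one value each.
    ss_number: Optional[str] = None
    maiden_name: Optional[str] = None
    high_school_graduation_date: Optional[date] = None
    # A Person may assume many Roles (assumed here to start empty).
    roles: Set[Role] = field(default_factory=set)


# Made-up sample instance (an entity) of the Person entity type.
pat = Person("Pat", "Example", date(1980, 1, 1), "Springfield")
pat.roles.add(Role.CUSTOMER)
pat.roles.add(Role.EMPLOYEE)
```

Here `Person` plays the part of the entity type, while `pat` is a single entity, that is, one instance of it.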

So entity types are also class-type nouns: Persons, places (GeopoliticalAreas), or things. Things can be tangible, like a Book or Product, or they can be intangible, like a Course or PartSpecification. Indeed, the most relevant and most difficult entity types to identify are related to events. ItemSale and PurchaseOrderEntry are common itemization events. These event entity types are sometimes discovered while resolving associative entity types, although this is not necessarily so. In my experience, in fact, maintaining inherited dependency between parent entities and a single child while data modeling will uncover these event-dependent associations. An organizational subject matter expert (SME) may not even recognize these entity types as important because they are implicit, but they are absolutely fundamental to the organization and thereby to the data model. This can make them difficult to name. A process model may indeed be a helpful cross-reference, to be sure that every elementary business event maps to a "root" entity type.
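As a rough illustration of an event entity type that depends on its parents, consider the sketch below. It assumes that an ItemSale is identified by the Product sold, the Customer who bought it, and the moment of sale; those attributes, the `identifier` method, and the sample values are hypothetical and stand in for whatever a real model would uncover.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Product:
    product_id: str  # hypothetical identifier


@dataclass(frozen=True)
class Customer:
    customer_id: str  # hypothetical identifier


@dataclass(frozen=True)
class ItemSale:
    # Parent entities the event depends on.
    product: Product
    customer: Customer
    # Attributes that belong to the event itself.
    sold_at: datetime
    quantity: int

    def identifier(self) -> Tuple[str, str, datetime]:
        # The child's identity is inherited from its parents,
        # plus the moment the event occurred.
        return (self.product.product_id, self.customer.customer_id, self.sold_at)


# Made-up sample event.
sale = ItemSale(Product("P-1"), Customer("C-1"), datetime(2009, 9, 23, 10, 0), 2)
```

The point of the sketch is that ItemSale has no identity of its own apart from its parents and the event's timing, which is one reason such entity types stay implicit until the dependencies are modeled.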

In closing, remember that an organizational entity type is not a table, report, control object, or any other programming construct. It is something that organizational SMEs will (eventually) recognize as important to the conduct of business within that organization.