This is a new feature for me, focusing on facets of data modeling, RDF, SPARQL, SHACL, and related semantic content, as well as their applicability in machine learning. Giving these topics their own feature makes it easier to separate them from my more industry-oriented (or broader) pieces. I would describe the subject more generally as hypergraph modeling: how to use advanced semantic graph techniques to encode information.
The Centrality of Events
Events can be particularly challenging to model, in part because they have temporal aspects. An event cuts the timeline into three parts:
- the time before the event, in which that event exists in potential,
- the time during the event, in which the event has begun but has not yet terminated,
- the time after the event, in which the event exists as a distinct historical object.
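The three phases above can be sketched as a small classification function. This is only a minimal illustration: the names `Phase` and `event_phase` and the sample date are assumptions made for the example, not part of any ontology discussed here.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Phase(Enum):
    POTENTIAL = "before"   # the event exists only in potential
    ONGOING = "during"     # the event has begun but not yet terminated
    HISTORICAL = "after"   # the event is a distinct historical object


def event_phase(now: datetime, start: datetime,
                end: Optional[datetime] = None) -> Phase:
    """Classify `now` relative to an event's start and optional end time."""
    if now < start:
        return Phase.POTENTIAL
    # An event with no recorded end time is treated as still under way.
    if end is None or now < end:
        return Phase.ONGOING
    return Phase.HISTORICAL


# Hypothetical meeting from 19:00 to 20:00 on an arbitrary date.
start = datetime(2024, 1, 1, 19, 0)
end = datetime(2024, 1, 1, 20, 0)
phase_at_five = event_phase(datetime(2024, 1, 1, 17, 0), start, end)
```

Treating a missing end time as "ongoing" is a design choice for this sketch; a model could just as reasonably treat it as unknown.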
Events are important both because they describe a number of different types of interactions and because their temporal nature means that they change over time. Almost every ontology worth its salt has an event class of some sort, though my personal belief is that most such Event classes are overengineered (and in many cases are just poorly designed). I would go so far as to say that an Event object by itself is an abstract object that should be inherited from, and should be seen as a way of binding one or more Things together at a specific period of time and a location. This can be modeled as follows:
graph LR
classDef thing fill:#888,stroke:#333,stroke-width:1px,color:white;
classDef location fill:#008,stroke:#333,stroke-width:1px,color:white;
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef artifact fill:#0f0,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
classDef blanknode fill:#000,stroke:#333,color:white,stroke-width:1px;
Thing:Thing1(["<b>Thing</b>\nThing1"]):::thing
Thing:Thing2(["<b>Thing</b>\nThing2"]):::thing
Thing:Thing3(["<b>Thing</b>\nThing3"]):::thing
Event:Event1(["<b>Event</b>\nEvent1"]):::event
Location:Loc1(["<b>Location</b>\nLoc1"]):::location
Thing:Thing1 -- has event --> Event:Event1
Thing:Thing2 -. has event .-> Event:Event1
Event:Event1 -. produces .-> Thing:Thing3
Event:Event1 -- has location --> Location:Loc1
Event:Event1 -- has start time --> start1[[Start Time]]:::literal
Event:Event1 -. has end time .-> end1[[End Time]]:::literal
In the diagram, a dotted link indicates an optional link and the open square boxes indicate literal values.
This is a case where you want to subclass a pattern, not just a class. For instance, one common form of event is a meeting, in which one or more people (things) come to a venue (location) at a specific time. Meetings have an identical structure, assuming inheritance:
graph LR
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef venue fill:#008,stroke:#333,color:white,stroke-width:1px;
classDef work fill:#844,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
classDef blanknode fill:#000,stroke:#333,color:white,stroke-width:1px;
Person:JaneDoe(["<b>Thing::Person</b>\nJane Doe"]):::person;
Person:MaryScot(["<b>Thing::Person</b>\nMary Scot"]):::person;
Meeting:CoffeeshopMeeting(["<b>Event::Meeting</b>\nCoffeeshop Meeting"]):::event;
Work:WorkPlan1(["<b>Thing::Work</b>\nWork Plan 1"]):::work;
Meeting:CoffeeshopMeeting -- hasStartTime --> startTime1[[19:00]]:::literal;
Meeting:CoffeeshopMeeting -- "has venue\n(has location)" --> Venue:Coffeeshop[[Coffeeshop]]:::venue;
Meeting:CoffeeshopMeeting -- "produces" --> Work:WorkPlan1;
Venue:Coffeeshop(["<b>Location::Venue</b>\nCoffeeshop"]):::venue;
Person:JaneDoe --"has meeting\n(has event)" --> Meeting:CoffeeshopMeeting;
Person:MaryScot -- "has meeting\n(has event)" --> Meeting:CoffeeshopMeeting;
%% Person:hasMeeting(["<b>Person</b>\nhas Meeting"]):::property
%% Event:hasStartTime(["<b>Event</b>\nhas Start Time"]):::property
The notations Thing::Person and has meeting/(has event) show the base class and the inherited class (and inherited/base properties respectively). The same graph without the indicated base classes looks as follows:
graph LR
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef venue fill:#008,stroke:#333,color:white,stroke-width:1px;
classDef work fill:#844,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
classDef blanknode fill:#000,stroke:#333,color:white,stroke-width:1px;
Person:JaneDoe(["<b>Person</b>\nJane Doe"]):::person;
Person:MaryScot(["<b>Person</b>\nMary Scot"]):::person;
Meeting:CoffeeshopMeeting(["<b>Meeting</b>\nCoffeeshop Meeting"]):::event;
Work:WorkPlan1(["<b>Work</b>\nWork Plan 1"]):::work;
Meeting:CoffeeshopMeeting -- hasStartTime --> startTime1[[19:00]]:::literal;
Meeting:CoffeeshopMeeting -- "has venue" --> Venue:Coffeeshop[[Coffeeshop]]:::venue;
Meeting:CoffeeshopMeeting -- "produces" --> Work:WorkPlan1;
Venue:Coffeeshop(["<b>Venue</b>\nCoffeeshop"]):::venue;
Person:JaneDoe --"has meeting" --> Meeting:CoffeeshopMeeting;
Person:MaryScot -- "has meeting" --> Meeting:CoffeeshopMeeting;
%% Person:hasMeeting(["<b>Person</b>\nhas Meeting"]):::property
%% Event:hasStartTime(["<b>Event</b>\nhas Start Time"]):::property
This is worth discussing in depth, as it reflects differing design philosophies. One such philosophy, seen a lot with OWL-centric semantic models, is to reduce the overall number of classes and properties as much as possible, typically by creating an “upper ontology” that asserts only a couple of dozen types of classes overall. Exemplified by ontology models such as GIST, this approach works well when you are looking at capturing very broad patterns, but suffers when you need precision in how you differentiate classes. At its most extreme, you would have one Thing class and one “has related” predicate, and would rely heavily upon shape patterns to distinguish behavior.
At the other extreme is the case where you have tens of thousands of classes in your ontology, with comparatively few distinctions between classes. These usually have very shallow inheritance structures, and are characteristic of situations where you have multiple ontologies active at any given time. Sometimes this is unavoidable (usually when dealing with ingestion from many sources), but because so many competing patterns are involved, there are no effective patterns to latch onto, making queries difficult to write and extraordinarily complex to perform.
Understand that a class exists simply as a convenience to the modeler. It’s a bundle of attributes and constraints that have been given a name. You give things names when you want to refer to them in some manner, and you give sets (collections of things) names when you want to refer to things with the same overlapping set of characteristics. That is data modeling in a nutshell.
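As a rough illustration of the Event pattern and its Meeting specialization described in this section, here is a minimal Python sketch. The class and field names mirror the diagram labels, but the Python shapes themselves are my assumption for the example, not a prescribed implementation.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional


@dataclass
class Thing:
    name: str
    events: List["Event"] = field(default_factory=list)  # "has event"


@dataclass
class Location:
    name: str


@dataclass
class Event:
    name: str
    location: Location                    # "has location"
    start_time: time                      # "has start time"
    end_time: Optional[time] = None       # "has end time" (optional link)
    produces: List[Thing] = field(default_factory=list)  # optional link


# Subclasses add a name to the same structure; they inherit the
# generated constructor from the base dataclass.
class Person(Thing):
    pass


class Work(Thing):
    pass


class Venue(Location):
    pass


class Meeting(Event):
    pass


coffeeshop = Venue("Coffeeshop")
work_plan = Work("Work Plan 1")
meeting = Meeting("Coffeeshop Meeting", coffeeshop, time(19, 0),
                  produces=[work_plan])
jane = Person("Jane Doe")
mary = Person("Mary Scot")
for person in (jane, mary):
    person.events.append(meeting)  # "has meeting" specializes "has event"
```

Nothing in the subclasses changes the structure; they only give a narrower name to the same bundle of attributes, which is the point being made about classes as a modeling convenience.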
Annotations and RDF-star
Time introduces its own complexity, as previously mentioned. Until a specific moment in time, an event exists only in potential. If I say that a meeting is scheduled for 7pm and it is only 5pm, the statement about the start time of that meeting indicates an expected time. What if twelve people were invited to the meeting, but three showed up early, three showed up on time, two showed up within 15 minutes, two were detained for more than an hour and ended up missing the meeting, and two had to leave early? Did the meeting actually happen? In an era where Zoom meetings are standard, this is a scenario that happens quite often.
If the meeting was a voting or participatory meeting, then standard practice is that a quorum (here, two-thirds of the members) should be present to hold the meeting within a certain time window. In the above scenario, with twelve members, a quorum would occur when eight members are active at the meeting. This may seem like splitting hairs, but if the meeting was a stockholders' meeting, such considerations have financial ramifications.
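The quorum arithmetic can be sketched as a sweep over arrival and departure times. The specific clock times below are hypothetical values chosen to fit the scenario (the text gives only rough timings), and `quorum_size` and `quorum_windows` are names invented for this example.

```python
from datetime import time
from typing import List, Tuple

Attendance = Tuple[time, time]  # (arrived, left)


def quorum_size(members: int, numerator: int = 2, denominator: int = 3) -> int:
    """Smallest whole number of members that is at least the given fraction."""
    return (members * numerator + denominator - 1) // denominator


def quorum_windows(attendance: List[Attendance],
                   needed: int) -> List[Tuple[time, time]]:
    """Return the [start, end) intervals with at least `needed` people present."""
    # Departures (-1) sort before arrivals (+1) at the same instant,
    # so each presence interval is treated as half-open.
    changes = sorted(
        [(arrived, 1) for arrived, _ in attendance]
        + [(left, -1) for _, left in attendance]
    )
    windows: List[Tuple[time, time]] = []
    present, opened = 0, None
    for moment, delta in changes:
        present += delta
        if present >= needed and opened is None:
            opened = moment
        elif present < needed and opened is not None:
            windows.append((opened, moment))
            opened = None
    return windows


end = time(20, 0)                            # assumed end of the meeting
attendance = (
    [(time(18, 45), end)] * 3                # showed up early
    + [(time(19, 0), end)] * 3               # showed up on time
    + [(time(19, 12), end)] * 2              # showed up within 15 minutes
    + [(time(19, 0), time(19, 30))] * 2      # left early (arrival assumed)
)                                            # the two detained members never arrive
needed = quorum_size(12)
windows = quorum_windows(attendance, needed)
```

Under these assumed times the quorum of eight is first reached at 19:00 and holds until 20:00; shifting any of the assumed arrivals or departures changes the window, which is exactly why the individual attendance times matter.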
Modeling frequently comes down to determining whether a particular attribute is significant or not. Knowing a timeline becomes important in courts, and because of the tripartite nature of events (has not yet occurred, is occurring, is no longer occurring), certain properties are indeterminate until all phases have taken place.
These properties, however, are not tied exclusively to the meeting, but are also tied to attendance. In this particular situation, I've pulled out two attendees: Jane Doe and Mary Scot.
graph LR
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef production fill:#fC0,stroke:#333,stroke-width:1px;
classDef productionType fill:#f00,stroke:#333,color:white,stroke-width:1px;
classDef sponsor fill:#00f,stroke:#333,color:white,stroke-width:1px;
classDef agent fill:#804020,stroke:#333,color:white,stroke-width:1px;
classDef organizer fill:#040,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
Person:JaneDoe(["<b>Person</b>\nJane Doe"]):::person;
Event:CoffeeshopMeeting(["<b>Event:Meeting</b>\nCoffeeshop Meeting"]):::event;
Event:CoffeeshopMeeting -- hasStartTime --> startTime3[[19:00]]:::literal;
Person:JaneDoe --has meeting --> Event:CoffeeshopMeeting;
Person:MaryScot -- has meeting --> Event:CoffeeshopMeeting;
Person:hasMeeting(["<b>Person</b>\nhas Meeting"]):::property
Event:hasStartTime(["<b>Event</b>\nhas Start Time"]):::property
assert1{{"assert1"}}
assert1 -- subject --> Person:JaneDoe;
assert1 -- predicate --> Person:hasMeeting
assert1 -- object --> Event:CoffeeshopMeeting
assert1 -- hasStartTime --> startTimeJD[[19:12]]:::literal
assert2{{"assert2"}}
assert2 -- subject --> Person:MaryScot;
assert2 -- predicate --> Person:hasMeeting
assert2 -- object --> Event:CoffeeshopMeeting
assert2 -- hasStartTime --> startTimeMS[[18:45]]:::literal
Person:MaryScot(["<b>Person</b>\nMary Scot"]):::person;
assert3{{"assert3"}}
assert3 -- subject --> Event:CoffeeshopMeeting;
assert3 -- predicate --> Event:hasStartTime;
assert3 -- object --> startTime3
assert3 -- hasCorrectedTime -->correctedTime1[[19:12]]:::literal
In effect, I have created an annotation on the phrase “Jane Doe has meeting at Coffeeshop Meeting.” This may seem odd, but keep in mind that what is being described here is neither the person object nor the meeting object, but rather the interaction of the two. It is, in other words, a reification of the statement.
In Turtle, the RDF-Star notation can be used to build such assertions. For instance, the above can be rendered as:
<<Person:JaneDoe Person:hasMeeting Meeting:CoffeeshopMeeting>> Assertion:hasStartTime "19:12:00"^^xsd:time .
where the expression in double-angle brackets corresponds to assert1. This puts the onus of modeling onto the assertion class, which is arguably good for visualization purposes, but it may actually make more sense to say that an assertion may have multiple annotations, and that it is the annotation that holds temporal information:
<<Person:JaneDoe Person:hasMeeting Meeting:CoffeeshopMeeting>>
    Assertion:hasAnnotation [
        Annotation:hasStartTime "19:12:00"^^xsd:time ;
        Annotation:hasEndTime "20:00:00"^^xsd:time ;
        Annotation:hasObserver Person:JohnSchue ;
    ] .
This can be visualized as follows:
graph LR
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef annotation fill:#000,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
Person:JaneDoe(["<b>Person</b>\nJane Doe"]):::person;
Event:CoffeeshopMeeting(["<b>Event:Meeting</b>\nCoffeeshop Meeting"]):::event;
subgraph assertion1;
direction LR;
Person:JaneDoe --has meeting --> Event:CoffeeshopMeeting;
end
assertion1 -- has annotation --> Annotation:Annot1["_:Annotation1"]:::annotation
Annotation:Annot1 -- hasStartTime --> startTimeJD[[19:12]]:::literal
Annotation:Annot1 -- hasEndTime --> endTimeJD[[20:00]]:::literal
Annotation:Annot1 -- hasObserver --> Person:JohnShue(["<b>Person</b>\nJohn Schue"]):::person
The visualization here is a shorthand: rather than specifying subject, predicate, and object separately, we use a subgraph to pull this information into a single unit. The annotation, in turn, is used to create one or more substructures. For instance, one annotation might contain event times as observed by one person, while a second annotation might contain the same event times as observed by a second person, as well as potentially giving an indication of the reliability of the data.
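To make the idea of several annotations hanging off one assertion concrete, here is a minimal Python sketch that keys annotations by the (subject, predicate, object) triple. The `Triple` and `Annotation` shapes, the second observer `Person:SecondObserver`, its observed start time, and the `reliability` score are all hypothetical, added purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import time
from typing import Dict, List, NamedTuple, Optional


class Triple(NamedTuple):
    """The quoted statement that annotations are attached to."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Annotation:
    observer: str
    start_time: time
    end_time: Optional[time] = None
    reliability: Optional[float] = None  # hypothetical 0..1 confidence score


# One assertion may carry any number of annotations.
annotations: Dict[Triple, List[Annotation]] = defaultdict(list)

assertion = Triple("Person:JaneDoe", "Person:hasMeeting",
                   "Meeting:CoffeeshopMeeting")

# The annotation from the Turtle example above.
annotations[assertion].append(
    Annotation("Person:JohnSchue", time(19, 12), time(20, 0))
)
# A second, invented observer who recorded slightly different times.
annotations[assertion].append(
    Annotation("Person:SecondObserver", time(19, 10), time(20, 0),
               reliability=0.8)
)


def earliest_observed_start(triple: Triple) -> time:
    """Earliest start time any observer recorded for this assertion."""
    return min(a.start_time for a in annotations[triple])
```

Because the triple itself is the dictionary key, the person and the meeting stay untouched; everything said about their interaction lives in the annotation list.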
By the way, in case you’re curious, the mermaidjs code for rendering this information is as follows:
graph LR
classDef person fill:#ff0,stroke:#333,stroke-width:1px;
classDef property fill:#808080,stroke:#000,color:white,stroke-width:1px;
classDef event fill:#808,stroke:#333,color:white,stroke-width:1px;
classDef annotation fill:#84F,stroke:#333,color:white,stroke-width:1px;
classDef literal fill:#f0f0c0
Person:JaneDoe(["<b>Person</b>\nJane Doe"]):::person;
Event:CoffeeshopMeeting(["<b>Event:Meeting</b>\nCoffeeshop Meeting"]):::event;
subgraph assertion1;
direction LR;
Person:JaneDoe --has meeting --> Event:CoffeeshopMeeting;
end
assertion1 -- has annotation --> Annotation:Annot1["_:Annotation1"]:::annotation
Annotation:Annot1 -- hasStartTime --> startTimeJD[[19:12]]:::literal
Annotation:Annot1 -- hasEndTime --> endTimeJD[[20:00]]:::literal
Annotation:Annot1 -- hasObserver --> Person:JohnShue(["<b>Person</b>\nJohn Schue"]):::person
One of the reasons I like working with a visualization-first approach to modeling is that it can frequently simplify our perceptions of the model. Reifications and annotations can often appear far more complex than they are when everything gets reduced to a plain node-and-chain view, but by creating visual conventions (such as different colors or patterns of subgraphs, or the use of rectangular vs. oval nodes for IRIs compared to blank nodes), the graphs can become both more meaningful and usually more compact.
I suspect that visualization-first modeling will become important, in part because it is easier to project exemplars, or specific examples (use cases), onto a more generalized model than it is to build top-down models that allow little room for experimentation. Additionally, with the move towards increasingly prompt-based modes of design and thinking, it is likely that we can build small prototype exemplar designs into the model, then scale these designs up and see where conflicts arise, long before we pull live data into the database.
Summary
In the article “Modeling Corner: Events, Meetings and RDF-Star”, the author delves into the complexities of data modeling, focusing on the role of events. Events inherently possess temporal aspects, which makes them challenging to model, and they are treated as abstract objects that bind one or more ‘Things’ together at a specific time and location. Using a meeting as an example, the author also explores the concept of inheritance in modeling and discusses differing design philosophies, ranging from reducing the overall number of classes and properties to having tens of thousands of classes in an ontology.
The article further addresses the complexities introduced by time in modeling, using the example of a meeting that exists only in potential until it occurs. The author introduces RDF-Star, a notation for building assertions in Turtle, a language for RDF, which allows for the creation of annotations on phrases, effectively reifying the statement. The author concludes by emphasizing the importance of a visualization-first approach to modeling, arguing that it simplifies our perceptions of the model, making the graphs more meaningful and compact.
Kurt Cagle is the editor of The Cagle Report. He lives in Bellevue, Washington, with his wife, kids, and weird furry wingless sociopathic dragons (meow).