This is the second of three articles about ChatGPT and Knowledge Graphs. In the first article, I looked at how ChatGPT has a lot of utility in the knowledge graph space. This article focuses more on how ChatGPT works at a conceptual level and how it compares to knowledge graphs.
ChatGPT has dominated the tech scene in the last few months, with OpenAI releasing successive versions of the underlying models (GPT-3.5 and GPT-4) within months of one another. The technology has been hailed as the end of the programmer and reviled as just the latest car on the hype train (or perhaps vice versa), but there is no gainsaying that it has impacted data technology. At the same time, there are significant questions about whether LLMs will replace traditional and semantic data systems alike.
I don’t have a solid answer to that question, though I have my opinions. This article aims to look at the difference between large language models and knowledge graphs to understand better whether one is a replacement for, an augmentation of, or unrelated to the other. To do that, it’s worth understanding the underlying data models involved in each of these technologies.
Node Graph to Concept Context
At their core, both Large Language Models (which I’ll call LLMs for conciseness) and Knowledge Graphs (KGs) share concepts as a foundation. In the KG world, a concept is a thing that has a unique identifier of some sort, typically called a URI or IRI. This means you can always get the same concept back if you have this identifier.
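As a quick illustration, SPARQL lets you retrieve everything known about a concept given only that identifier. A minimal sketch (the IRI below is hypothetical):

# Retrieve everything known about a concept, given only its IRI.
# The IRI here is a made-up example, not a real identifier.
DESCRIBE <http://example.org/id/person/JaneDoe>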
With an LLM, on the other hand, you do not have such a formal identifier. Instead, you have the notion of a session and, within that session, a context. The session is initiated when you ask ChatGPT a question (or open a stream to the ChatGPT API). As the parser works through the prompt, it builds up a context that determines what information is known, then revises that context as more information comes in. The output the model generates is also stored in the context. Additionally, if the person is a previous querent, earlier conversations can be referenced to shape that context further, in essence using phrases to “solve” the equations that determine where in the information space future questions should be addressed.
This might seem like a significant limitation at first, and in some respects, it is. With a knowledge graph, most of the critical identifiers revolve around relationships, though an argument can be made that having appropriate URIs also matters from a master data management standpoint. In practice, however, those URIs are not that important, because most entities also carry local identifiers – string literals representing ID numbers associated with people, things, media, property, and so forth.
If I want to know where my FedEx package is, I will not write a query that passes the URI of that package, because I will likely never actually know it. Instead, I query for the thing of type FedExPackage that has a tracking number of “1922591252” (not an actual number, mind you). I get back a handle – the URI for that package – that can then be used to retrieve what’s known about that package.
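As a rough sketch, that lookup might be written in SPARQL as follows (the class and property names here are hypothetical, invented purely for illustration):

# Look up a package by its local identifier rather than its URI.
# :FedExPackage, :trackingNumber, and :deliveryStatus are hypothetical terms.
SELECT ?package ?status
WHERE {
  ?package a :FedExPackage ;
           :trackingNumber "1922591252" ;
           :deliveryStatus ?status .
}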
In the LLM case, what gets passed back is similarly a handle to a concept, with the context set up. If I ask ChatGPT where my FedEx package is just by saying, “Find where 1922591252 is?”, the application will first reason about what the number must be – an address or an ID – then look at what comes back. If only one contextual item returns, it will be treated as the target, and relationships can be filled in from that ancillary response. If not, the model will attempt to disambiguate the term by asking where the number came from, possibly making multiple suggestions.
In other words, while the mechanism by which an LLM stores its data differs from how a knowledge graph stores it, the mechanisms for resolution are surprisingly similar in practice. The principal difference is that the indexes are deeper and more complex, and can be thought of as vectors rather than single values.
Fuzzy Pointers and Bayesian Trees
LLMs can be thought of as the revenge of the Bayesians. In the Disney animated film Frozen, Princess Anna of Arendelle, a lovelorn and very lonely young woman who is more than a bit naive about others, meets Prince Hans of the Southern Isles (Denmark). In the inevitable duet, Hans sings, “We finish each other’s ______,” to which Anna jumps in, “Sandwiches!”. This is how Bayesian token selection works. There’s a high probability (say, 95%) that most people will say “sentences,” because this is what two people supposedly in love with one another commonly say.
However, in that rare 5% case, another token may end up being used (here, one where they do indeed finish each other’s lunches, say). That one-word change has a profound effect, because it derails the conversation from the expected (a love duet about synchronicity) to the unexpected (lunch).
By itself, this might not make that much of a difference – conversations get derailed all the time, and usually people pause and then attempt to reconnect to the original train of thought. However, in the broader context of the movie, which became a major franchise for Disney, this minor change shifts the probabilities, as more and more people think of the latter term when they hear the phrase “We finish each other’s ….”. Within the context of the session, the probabilities have become different, and those probabilities become the acting graph of the session.
The model as it exists right now does not immediately fold these changes back into itself … but they ARE nonetheless recorded (and think about the implications of THAT for a bit). It means that when the model IS updated, these Bayesian shifts represent a new expectation for language.
These Bayesian weightings are roughly the equivalent of predicates in knowledge graphs or, more appropriately, of reified statements in knowledge graphs. This could be shown in Turtle-star as:
# Illustrative prefixes (any real vocabulary would substitute its own IRIs)
@prefix : <http://example.org/terms#> .
@prefix reif: <http://example.org/reification#> .
@prefix bayesian: <http://example.org/bayesian#> .

<< :we :finishEachOthers :sentences >> :hasReification reif:r1 .
<< :we :finishEachOthers :sandwiches >> :hasReification reif:r2 .
reif:r1 reif:hasWeight 0.95 .
reif:r2 reif:hasWeight 0.05 .
bayesian:b1 bayesian:hasDistribution (reif:r1 reif:r2) .
For the phrase :finishEachOthers, the sum of the weights of the reifications would add up to one.
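Indeed, one could verify that constraint with a simple SPARQL aggregate over the illustrative vocabulary sketched above:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX reif: <http://example.org/reification#>
PREFIX bayesian: <http://example.org/bayesian#>

# Check that the weights in each distribution sum to one.
# Walks the RDF list behind bayesian:hasDistribution via a property path.
SELECT ?b (SUM(?w) AS ?total)
WHERE {
  ?b bayesian:hasDistribution ?list .
  ?list rdf:rest*/rdf:first ?r .
  ?r reif:hasWeight ?w .
}
GROUP BY ?b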
This model isn’t quite how an LLM works, but it’s close enough to see the analogy. These Bayesian weightings, in turn, create branches that determine paths through the language space. The model is used for parsing prompts, but it is also used in the transformers that generate responses. In essence, transformers try to determine, based on context, what the most likely response is, then use the user’s reaction to that response to adjust the paths and establish a new context.
Critics of LLMs often say that the problem with such models is that they are stochastic – that is to say, they rely upon probability rather than the direct application of rules. That’s a bit misleading, however. A more accurate way of thinking about it is that the model determines the most likely set of rules based on contextual data and then applies these rules to generate a response. When the context changes, the set of rules changes, and the transformations accordingly also change.
Though considerably more complex, this isn’t all that different from a constraint pattern in OWL or a validation rule in SHACL. A better analogy may be XSLT acting on “documents” or “messages.” It is possible to create a master XSLT stylesheet that calls out to subcomponents for identifying patterns (templates). The templates act as generators (or transformers), constructing new output based on message and context information. The output gets sent as a message back to the caller, which then performs a new action based on this message. Because XSLT (from version 2 on) can write to its environment, it can effectively change that environment – its context – based on feedback, albeit at a more fundamental level than transformers do.
flowchart LR
    Message1 --> Parser
    Context --> Parser
    Parser --> Tokenizer
    Tokenizer --> TM[Template Match]
    TM --> Transformer
    Transformer --> OS[Output Stream]
    Transformer --> Tokenizer
    Transformer --> Context
    OS --> NM[Next Message]
    NM --> Parser
This feedback loop is essentially the same regardless of the transformative technology, with the primary difference being the extent to which the context informs the operation of the parser. In RDF, session state would be maintained in a specialized graph through the use of SPARQL Update; in LLMs, it is maintained primarily by in-memory concept structures; in XSLT, it would be maintained as XML or JSON files.
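To make the RDF case concrete, here is a minimal sketch of maintaining session state with SPARQL Update. All graph and property names here are hypothetical:

PREFIX : <http://example.org/terms#>

# Record the current session context in a dedicated named graph.
# :sessionGraph, :session1, and the properties are invented for illustration.
INSERT DATA {
  GRAPH :sessionGraph {
    :session1 :activeConcept :FedExPackage ;
              :lastPrompt "Find where 1922591252 is?" .
  }
}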
Performance and Interoperability
At this point, retrieving a particular set of information is considerably faster with knowledge graphs than with LLMs. However, once you factor in the time needed to write the queries that retrieve the data you’re interested in, the equation tilts much more toward LLMs for all but the most common queries.
LLMs employ natural language queries: the system attempts to interpret the query so as to extract the desired information from the model as closely as possible. Curiously enough, if you write SPARQL, the ChatGPT model will attempt to interpret that query and generate a response:
select ?country ?capital where {
    ?country a Class:Country .
    ?country :hasCapital ?capital } limit 20
=================================
As an AI language model, I am unable to perform live queries on databases or generate real-time results. However, I can provide you with a list of some countries and their capitals based on my knowledge up to September 2021. Please note that this information might not be up-to-date, and I cannot guarantee its accuracy. Here are 20 countries and their capitals:
Afghanistan - Kabul
Albania - Tirana
Algeria - Algiers
Andorra - Andorra la Vella
Angola - Luanda
Antigua and Barbuda - Saint John's
Argentina - Buenos Aires
Armenia - Yerevan
Australia - Canberra
Austria - Vienna
Azerbaijan - Baku
Bahamas - Nassau
Bahrain - Manama
Bangladesh - Dhaka
Barbados - Bridgetown
Belarus - Minsk
Belgium - Brussels
Belize - Belmopan
Benin - Porto-Novo
Bhutan - Thimphu
If you need more specific or up-to-date information, I recommend searching the web or using a reliable source like the CIA World Factbook.
This is, admittedly, a straightforward query, easily interpreted. Still, it illustrates that a natural language model is a powerful tool for situations where you don’t know the underlying structure of the data. Just as significant: if you know the ontology you’re working with in a knowledge graph, you can do considerably more with a query that generates a query.
Given data in the schema.org ontology, write a SPARQL query that will retrieve the most recent country name, capital city and population as a SELECT statement. Return just the code.
===========================================
PREFIX schema: <http://schema.org/>
SELECT ?countryName ?capitalCity ?population
WHERE {
?country a schema:Country ;
schema:name ?countryName ;
schema:hasCapital ?capital ;
schema:population ?population ;
schema:dateModified ?dateModified .
?capital a schema:City ;
schema:name ?capitalCity .
FILTER NOT EXISTS {
?country schema:dateModified ?otherDateModified .
FILTER (?otherDateModified > ?dateModified)
}
}
ORDER BY ?countryName
By asking that only the code be returned, you have a mechanism for turning a natural language request into a (generally) accurate SPARQL script that can be sent to a triple store. This, frankly, is huge. SPARQL is a powerful language, but one that relatively few people know well. Because ontologies vary from system to system, this capability makes it possible to query such knowledge graphs without going through a painful discovery period.
By switching from schema.org to GeoNames (a well-known geospatial ontology), the system warns that the ontology doesn’t have a most-recent property but can still provide the rest of the information:
The GeoNames ontology does not provide a date modified property to determine the most recent data. However, I can provide a SPARQL query that retrieves the country name, capital city, and population using the GeoNames ontology. Here's the query:
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?countryName ?capitalCity ?population
WHERE {
?country a gn:A.PCLI ;
rdfs:label ?countryName ;
gn:population ?population ;
gn:capital ?capital .
?capital rdfs:label ?capitalCity .
}
ORDER BY ?countryName
This query uses the GeoNames ontology to retrieve country names, capital cities, and populations. It does not include the most recent data filtering, as GeoNames does not have a "date modified" property. The results are ordered by the country name.
So what ontologies does ChatGPT recognize? Well, I asked it that very question:
As an AI language model, I can recognize and work with a wide range of ontologies, including those commonly used for knowledge representation and semantic web applications. Some well-known ontologies that I can recognize include:
- RDF (Resource Description Framework) – a general framework for describing resources and their relationships.
- RDFS (RDF Schema) – an extension of RDF that provides a vocabulary for describing classes and properties.
- OWL (Web Ontology Language) – a more expressive language than RDF and RDFS, used for creating complex ontologies and reasoning about them.
- SKOS (Simple Knowledge Organization System) – an RDF-based language for representing knowledge organization systems like thesauri, classification schemes, and taxonomies.
- FOAF (Friend of a Friend) – an ontology for describing people, their relationships, and their activities.
- Dublin Core – a set of standardized vocabulary terms for describing resources such as digital materials, physical objects, and web content.
- Schema.org – an ontology developed by major search engines to provide a shared vocabulary for describing web content to improve search engine indexing.
- GeoNames – an ontology for representing geographical information, including countries, cities, and other geographical features.
- DBpedia – an ontology derived from Wikipedia data that covers a wide range of domains and knowledge areas.
This list is not exhaustive, and I can work with many other general and domain-specific ontologies. If you have a specific ontology in mind, feel free to ask, and I can help you with questions or queries related to that ontology.
A little more work revealed that several other heavily used ontologies (such as FIBO, NIEM, QUDT, MeSH, UMLS, and SNOMED) have also been indexed by ChatGPT, as well as a number of common XML schemas.
What does this mean in practice? One of the big problems in the Linked Data space is a Tower of Babel problem: a lack of ontology interoperability. LLMs provide a way to translate between two different ontologies, provided both have been indexed. Such translations are not likely to be perfect (both because not all ontologies have equivalent concepts and because the LLM has to work within the specificity of the relevant ontology or schema), but they significantly reduce the work of mapping. Mapping can also be expensive when the data files involved are large, as RDF files tend to be, especially in the sciences. One can instead use LLMs to generate the mappings themselves as SPARQL Update or XSLT, as sketched below.
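As an illustrative sketch of what such a generated mapping might look like, a SPARQL Update rule could copy schema.org country data into GeoNames-style vocabulary. Treat the exact property alignments here as assumptions, not authoritative mappings:

PREFIX schema: <http://schema.org/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Map schema.org country descriptions onto GeoNames-style terms.
# The alignments below are illustrative assumptions.
INSERT {
  ?country rdfs:label ?name ;
           gn:population ?population .
}
WHERE {
  ?country a schema:Country ;
           schema:name ?name ;
           schema:population ?population .
}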
I recently worked on a project for a government agency that took me six months: converting documents from a primitive text format to a NIEM format. With an LLM-trained model, I suspect the same work could be done in less than a week, with most of that time spent on testing.
Where Do Knowledge Graphs Fit?
Given all this, the prognosis for knowledge graphs might seem grim, but I don’t believe that’s the case. Knowledge graphs are still a superior tool for vocabulary management – capturing how an organization uses language to define its processes and products. They are a natural fit for supply chain management, partly because properties can be secured and partly because they are readily updateable. Training an LLM is no small undertaking, and keeping one up to date only grows harder as its scope expands.
Knowledge graphs are similarly superior when it comes to dealing with historical information. A knowledge graph can easily maintain two graphs: an eternal graph, which records events, and a “now” graph, which represents the state of reality at present.
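A sketch of that two-graph pattern in SPARQL Update, with hypothetical graph names and terms: events are appended to the eternal graph, while the “now” graph is overwritten in place.

PREFIX : <http://example.org/terms#>

# Append an event to the eternal graph...
INSERT DATA {
  GRAPH :eternal {
    :event42 a :StatusChange ;
             :subject :package1 ;
             :newStatus :Delivered .
  }
} ;
# ...then replace the current state in the "now" graph.
DELETE WHERE {
  GRAPH :now { :package1 :deliveryStatus ?oldStatus }
} ;
INSERT DATA {
  GRAPH :now { :package1 :deliveryStatus :Delivered }
}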
Knowledge graphs are better at annotations, which are typically tied to reifications, and are considerably faster at graph analytics. Most LLMs have a graph at their core, but that graph is optimized for language tokenization, not data access. This annotation capability also makes data governance much easier within knowledge graphs.
It’s especially worth emphasizing that RDF knowledge graph content is supercharged input for LLMs. This has to do with the underlying Bayesian models that LLMs are built on. For most content, the chains of tokens used to form the “memory patterns” in the LLM tend to be bushy – lots of potential alternative patterns, which means lots of computation both to train the model and to look up information in it. With semantic data, on the other hand, the connections between information are much cleaner (the hard work of classification has essentially already been done), so it can be consumed more efficiently than text-based data. From an anthropomorphic standpoint, RDF (whether as Turtle or JSON-LD) is more easily digestible than text.
Graph Neural Networks also factor into the newest generations of these models. I promise another article on this particular topic shortly.
I expect knowledge graphs to fit into the LLM framework primarily as first-level data providers feeding the LLM, acting as aggregators and clearinghouses that maintain data quality and provenance. On the flip side, I can see a hybrid model emerging in which LLMs become the public interface both for querying knowledge graphs and for maintaining them. Knowledge graphs can serve as local (cached) sources of information that are then incorporated into models based upon the value of the data. Keep in mind that LLMs face the same issue as knowledge graphs – they become more powerful the larger they get, but performance suffers once the data grows beyond a certain domain size.
I also believe that certain problem domains (such as Solid Pods or Publishing Nodes) are much better served by a knowledge graph than by LLMs, at least for several more years. These provide a nice balance between file and data systems and play an integral part in working with private data. This is a topic for another discussion, however.
Conclusion
I’ve been talking with associates at TopQuadrant about what’s happening in the ChatGPT space and the impacts that ChatGPT is having both there and throughout what I’d colloquially call the knowledge industry. Many of the points I’m making here have arisen from those discussions, and I applaud TQ for exploring these questions. Make no mistake about it – generative AI affects anyone working in the knowledge graph arena because there is a lot of overlap between the two approaches.
I will say that I think LLM-based systems are the future of knowledge systems within businesses over the next decade. Still, I believe RDF solutions are also the future of knowledge systems … and have believed this for a while. The two are not mutually exclusive. I’ve found that knowledge graphs tend to represent clusters of knowledge, because domain knowledge is inevitably clustered. Where things fall apart, I’d argue, is when we try to federate that knowledge. The Solid approach can work for federation, but it still necessitates a mechanism for discovery that will likely take years to articulate fully.
To that extent, I’d say the closest analogy for how these two kinds of systems interoperate is to see knowledge graphs as short-term memory – dynamic, malleable, precise, addressable – while machine learning models are information processed by sleep – more contextual, somewhat more amorphous, less addressable, but easier to query. Anyone who has spent a night cramming for an exam only to have it all vanish the following day as they look at the test understands this. Use clusters of knowledge graphs to accumulate information and keep that information current; use LLMs to help build a deeper conceptual framework and encourage discovery.
It’s part of why I believe that concentrating on transparency in LLMs is so futile. Their memories are synthetic and arguably more dreamlike – they may be informed by many different sources, often with different fractal structures. Those memories, however, are built from knowledge that can maintain provenance, authority, and local context and, most importantly, can benefit from stewardship and governance. This is where knowledge graphs are perfectly positioned.