In my last article (Why Prompts Are The Future of Knowledge Graphs), I explored why a prompt-based approach to knowledge graph queries makes sense, just as it does for ChatGPT. Still, that article reached a length where digging into details would have pushed it into white paper territory (always a problem). Part II is my opportunity to look at different "prompts" that might apply to such a hypothetical engine, along with use cases that better illustrate the principles involved. You are welcome to explore these; I even encourage building technologies based on the ideas covered here.
Each section will cover a particular prompt or conversation. I recognize that the first rule of prompts is that you NEVER know what kind of questions or requests will be asked. Still, as it turns out, most people do tend to follow natural language patterns that can be taken advantage of when attempting to uncover meaning. Some of these can be expanded with the use of neural nets (and I'll discuss this as well), but you can get a lot of mileage without resorting to "traditional" machine learning, especially when you have a perfectly good knowledge graph in the background.
The Google Keyword Prompt
The Google paradigm is still one of the most pervasive of all prompt patterns, and usually involves listing the relevant keywords for the things that people are looking for:
Tesla Model S, Silver, Best Price, Seattle Area
Several useful things can immediately be inferred from this prompt.
- When people work with prompts, they tend to go from most specific to least specific – what kinds of entities are they looking for, immediate qualifiers for these entities, then secondary qualifiers, and so forth. This creates a rough graph traversal pattern: find the candidate thing, then work outward from that candidate to determine whether constraints are satisfied.
- Punctuation and order both help identify tokens. Commas in prompts, for instance, often delimit multi-word tokens such as Best Price, Seattle Area.
- The distinction between class and instance is fuzzy at best for most people. Technically, Tesla Model S is a brand name, which can be seen as a form of category.
- Partial matches (saying Model S rather than Tesla Model S) will still match the search query, but the confidence is lower. Never expect that you will have full matches, but give bonus points if you do happen to have one.
- Best Price implies an ordering using price as a key, just as Newest implies an ordering based on how new the vehicle itself is, not how recent the posting is (Most Recent better captures that). These are also terms that can be stored within the taxonomy. (A short sketch of this tokenization follows the list.)
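To make the tokenization rules above concrete, here is a minimal Python sketch. It splits a keyword prompt on commas, matches each token against a toy taxonomy using the longest-match rule, and treats recognized ordering terms such as Best Price as sort directives rather than filters. All of the vocabulary here is hypothetical:

```python
# A minimal sketch of keyword-prompt tokenization against a taxonomy.
# The taxonomy and ordering terms are toy stand-ins, not a real vocabulary.

TAXONOMY = {
    "tesla model s": "brand:TeslaModelS",
    "model s": "brand:TeslaModelS",     # partial match, lower confidence
    "silver": "color:Silver",
    "seattle area": "region:Seattle",
}

ORDERINGS = {
    "best price": ("price", "ascending"),
    "newest": ("modelYear", "descending"),
    "most recent": ("posted", "descending"),
}

def parse_keyword_prompt(prompt: str):
    """Split on commas, then classify each token as a filter or an ordering."""
    filters, orderings = [], []
    for token in (t.strip().lower() for t in prompt.split(",")):
        if token in ORDERINGS:
            orderings.append(ORDERINGS[token])
        elif token in TAXONOMY:
            # Full match: highest confidence.
            filters.append((TAXONOMY[token], 1.0))
        else:
            # Fall back to the longest taxonomy term contained in the token.
            matches = [t for t in TAXONOMY if t in token]
            if matches:
                best = max(matches, key=len)
                filters.append((TAXONOMY[best], len(best) / len(token)))
    return filters, orderings

print(parse_keyword_prompt("Tesla Model S, Silver, Best Price, Seattle Area"))
# ([('brand:TeslaModelS', 1.0), ('color:Silver', 1.0),
#   ('region:Seattle', 1.0)], [('price', 'ascending')])
```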
What gets returned from this query should be a summary block that contains enough information to disambiguate entries and provide handles for doing something with the data (such as contact information for the dealer). Summaries are a type of presentation that will vary from one entity type to the next, which suggests that summary builders should be functionally connected to type or shape definitions. Defining a standard for CSS within your knowledge graph (such as wrapping each entry in a block of the form <section class="summary typename">...</section>) can go a long way towards making output consistent and manageable.
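As a sketch of that CSS convention, the following Python fragment (the field names and sample entry are invented purely for illustration) wraps each result entry in the proposed section block:

```python
import html

def summary_block(entry: dict, typename: str) -> str:
    """Wrap one result entry in the proposed <section class="summary typename"> form."""
    label = html.escape(entry.get("label", "Unknown"))
    dealer = html.escape(entry.get("dealer", ""))
    price = entry.get("price", "")
    return (
        f'<section class="summary {typename}">\n'
        f"  <h2>{label}</h2>\n"
        f"  <p>Price: ${price}. Dealer: {dealer}</p>\n"
        f"</section>"
    )

print(summary_block(
    {"label": "Tesla Model S, Silver", "dealer": "Seattle EV Motors", "price": 68990},
    "Vehicle",
))
```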
The Find Prompt
The expressions Get, Find, Retrieve, Search for, and so forth are all rough synonyms, but they should be thought of as directives. In some cases, they may suggest different interpretations depending upon the directive in question. They almost always appear at the beginning of a query, and their intent is similar to the Google Prompt – locate the summaries (or other related presentation forms) of those resources that share the subcategorization being given, with some parameterizations:
Find all silver Tesla Model S cars in the Seattle Area for sale, best price
This is essentially the same query as the one above. Sometimes there may be anthropomorphism:
Would you please find all Tesla Model S cars in silver in the Seattle area for sale at the best price?
Again, this query can tell you several things:
- Prepositions, in general, should be treated as punctuation unless containment is involved (e.g., in has comparatively little semantic value, but outside of is significant).
- Anthropomorphic terms such as Would you, Could you, etc., could be interpreted as asking permission from the prompt engine to act, or as querying a capability. The prompt engine should respond with something like: "This system is capable of performing the action requested. Would you like me to do so?" and initiate the action if answered in the affirmative. It is, in essence, teaching the user not to ask the question in the first place.
- Sometimes you get English standard form qualifiers for terms (e.g., silver Tesla Model S cars). There are several potential sets of tokens here, including silver, silver Tesla, silver Tesla Model, silver Tesla Model S, silver Tesla Model S cars, Tesla, Tesla Model, Tesla Model S, Tesla Model S cars, Model, Model S, Model S cars, S cars, and cars. As a general rule of thumb, the longest token that matches a given term in the taxonomy will also be the most significant term, with the next longest being the second most relevant, and so forth.
- Similarly, in taxonomies, the most specific matches (those closest to the leaves of the taxonomy tree) have a higher priority than more generalized matches.
- This implies, of course, that the prompt engine does a pre-query of the knowledge graph in order to determine these weights and optimize the final query (a sketch follows this list). This two-part process – using a query to build a query – is a powerful way to separate concerns, and it again stresses the fact that prompt queries are generally pipelines of heterogeneous types of queries, each optimized for different tasks.
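Here is a minimal sketch of that two-phase pipeline. The run_sparql() helper, the skos:prefLabel lookup, and the assumption that results come back as dicts are all illustrative choices on my part, not a fixed design:

```python
# Sketch of the query-to-build-a-query pipeline. run_sparql() stands in for
# whatever SPARQL client the engine uses and is assumed to return dicts;
# everything here is hypothetical.

def weigh_tokens(tokens, run_sparql):
    """Phase 1: pre-query the taxonomy to see which tokens resolve to concepts."""
    values = " ".join(f'"{t}"' for t in tokens)
    pre_query = f"""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?token ?concept WHERE {{
            VALUES ?token {{ {values} }}
            ?concept skos:prefLabel ?label .
            FILTER (lcase(str(?label)) = lcase(?token))
        }}"""
    rows = run_sparql(pre_query)
    # Longest resolved token wins; a real engine would also break ties
    # by taxonomy depth (more specific matches outrank general ones).
    return sorted(rows, key=lambda r: len(r["token"]), reverse=True)

def build_final_query(weighted):
    """Phase 2: assemble the final query from the resolved concepts only."""
    patterns = "\n            ".join(
        f"?item ?p{i} <{row['concept']}> ." for i, row in enumerate(weighted)
    )
    return f"""
        CONSTRUCT {{ ?item ?p ?o }} WHERE {{
            {patterns}
            ?item ?p ?o .
        }}"""
```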
The This Prompt, Aliases, and Transactions
The result of any query internally should be a named graph. Let me emphasize this: The result of any query internally should be a named graph.
That result may be used to construct a response or drive some other action, but it is persisted throughout a session, and its URI is pushed onto a stack. This makes it possible to have multiple prompt query/response sessions and refer to a given query graph result based on how deep in the stack it is. The most recent query result is always given the name this.
It is also possible to assign an alias to any named graph with expressions like call this alias or set alias = this. Expressions such as with alias or using alias can then identify which graph is being referred to (if not specified, it will ALWAYS be the most recent graph queried, aka this).
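Put together, the stack-plus-alias behavior might look something like this minimal sketch (the class and its methods are my invention, not a standard):

```python
class GraphSession:
    """Sketch of per-session result management: a stack of named-graph URIs,
    where 'this' is always the top, plus user-assigned aliases."""

    def __init__(self):
        self.stack = []     # URIs of persisted result graphs, most recent last
        self.aliases = {}   # alias -> graph URI

    def push_result(self, graph_uri: str):
        self.stack.append(graph_uri)

    def resolve(self, name: str = "this") -> str:
        if name == "this":
            return self.stack[-1]          # the most recent result
        return self.aliases[name]

    def set_alias(self, alias: str):
        # Implements 'call this alias' / 'set alias = this'.
        self.aliases[alias] = self.stack[-1]

session = GraphSession()
session.push_result("urn:results:query-001")
session.set_alias("ModelS")
assert session.resolve("ModelS") == session.resolve("this")
```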
As an example:
Find all silver Tesla Model S cars in the Seattle Area for sale and call this ModelS.
> Response
Find all silver Tesla Model Y cars in the Seattle Area for sale and call this ModelY.
> Response
Combine ModelS and ModelY and call this TeslaCars. Clear ModelS and ModelY.
The alias TeslaCars now contains a mix of all silver Model S and Model Y cars in the Seattle area, while both of the original named graphs have been emptied and deallocated. Note in the last line that both "and" and the period (.) break apart transactions.
There’s one critical distinction between LLMs and knowledge graphs – only the latter are transactional systems. It’s surprisingly easy to break the session state of an LLM, but doing so has no consequence for the underlying model. Breaking the session state of a knowledge graph could be very destructive, so more safeguards are needed for prompts.
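One way to provide such a safeguard is sketched below: destructive directives such as Clear or Combine mutate a staged copy of the session's graphs and commit only if the whole transaction succeeds. The class and the toy triple representation are mine, purely for illustration:

```python
import copy

class SafeGraphStore:
    """Sketch of a transactional safeguard: destructive prompt directives
    operate on a staged copy and commit only on success."""

    def __init__(self):
        self.graphs = {}  # graph name -> set of triples (toy representation)

    def transact(self, mutate):
        staged = copy.deepcopy(self.graphs)
        mutate(staged)            # may raise; the original store is untouched
        self.graphs = staged      # commit only if no exception occurred

store = SafeGraphStore()
store.graphs = {"ModelS": {("car1", "a", "Vehicle")},
                "ModelY": {("car2", "a", "Vehicle")}}

def combine_and_clear(g):
    # 'Combine ModelS and ModelY and call this TeslaCars. Clear ModelS and ModelY.'
    g["TeslaCars"] = g.pop("ModelS") | g.pop("ModelY")

store.transact(combine_and_clear)
print(sorted(store.graphs))   # ['TeslaCars']
```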
The Show Prompt, as, and Presentations
The Show prompt takes a named graph and renders it in some form of presentation. Typically when a prompt is executed, there is an implicit Show statement:
Find all silver Tesla Model S cars in the Seattle Area for sale. Show this as default.
This statement takes the graph and generates a page containing multiple sections, each with a summary entry. The generator of the output checks to see if there is a SHACL shape for that class or taxonomy category that contains a definition for what the default looks like. If nothing exists for that class, it walks up the relevant inheritance tree until it finds a class that does include a default and uses that.
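That walk up the inheritance tree might look something like the following sketch, where the class hierarchy and the shape registry are toy stand-ins for rdfs:subClassOf relationships and SHACL-derived presentation defaults:

```python
# Sketch of default-presentation resolution: walk up the class hierarchy
# until a class with a registered summary shape is found.

SUPERCLASS = {            # toy rdfs:subClassOf relationships
    "ElectricCar": "Car",
    "Car": "Vehicle",
    "Vehicle": None,
}

DEFAULT_SHAPES = {        # classes that define a 'default' presentation
    "Vehicle": "shape:VehicleSummary",
}

def resolve_default_shape(cls: str):
    while cls is not None:
        if cls in DEFAULT_SHAPES:
            return DEFAULT_SHAPES[cls]
        cls = SUPERCLASS.get(cls)
    return None   # fall back to a generic renderer

print(resolve_default_shape("ElectricCar"))   # shape:VehicleSummary
```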
There are a few prenamed (and generally protected) graphs. The prompts show classes and show properties list the classes and the properties associated with given classes, while show presentations lists the currently available presentations in the system. show namespaces gives the prefix/namespace map, and show graphs lists the current named graphs and indicates whether the user can modify them.
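One plausible way to back these prenamed prompts is a simple dispatch table from prompt text to introspective SPARQL, as in this sketch. The queries and the run_sparql and prefix_map parameters are assumptions about the engine, not a defined API:

```python
# Sketch: the prenamed 'show ...' prompts as a dispatch table of
# introspective SPARQL queries (illustrative and store-dependent).

SHOW_QUERIES = {
    "show classes": "SELECT DISTINCT ?class WHERE { ?s a ?class }",
    "show properties":
        "SELECT DISTINCT ?class ?property WHERE { ?s a ?class ; ?property ?o }",
}

def handle_show(prompt: str, run_sparql, prefix_map):
    key = prompt.strip().lower()
    if key == "show namespaces":
        return prefix_map          # answered from the store's prefix map
    if key in SHOW_QUERIES:
        return run_sparql(SHOW_QUERIES[key])
    raise ValueError(f"unrecognized show prompt: {prompt}")
```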
The as keyword indicates the presentation to use on the specified named graph. This will, of course, be system-dependent but will likely include things such as full, newform, editform, table, list, page, post, digraph, text, markdown, rdf-xml, json-rdf, json-ld, turtle, trig, csv, excel, and so forth.
The including keyword is a way of parameterizing the presentations and is used especially with tables, posts, pages, and diagrams. For example, for the TeslaCars named graph from the previous section, a table might be requested as:
show TeslaCars as table including price, model, year, drive
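Rendering that amounts to projecting the named properties out of each entry in the graph. A minimal sketch (with invented entries) follows:

```python
# Sketch: 'show TeslaCars as table including price, model, year, drive'
# as a projection over the entries of a named graph (toy data).

def render_table(entries, columns):
    header = " | ".join(columns)
    rows = [" | ".join(str(e.get(c, "")) for c in columns) for e in entries]
    return "\n".join([header, "-" * len(header), *rows])

tesla_cars = [
    {"price": 68990, "model": "Model S", "year": 2021, "drive": "AWD"},
    {"price": 54990, "model": "Model Y", "year": 2022, "drive": "AWD"},
]
print(render_table(tesla_cars, ["price", "model", "year", "drive"]))
```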
Links are usually bound to the default labels and pass the graph URI of the relevant object to create a hypertext link in the interface. A graph URI is (more or less) what you get when you do a DESCRIBE in SPARQL, and it may or may not be the same URI as the object itself (there are arguments both pro and con on this).
Some of the presentations may have fairly sophisticated interfaces. For instance, I could see a GraphiQL editor being invoked via the show interface, while newform and editform make it possible to pop up a visual editor to create new entries or modify existing ones.
There are a few additional directives that round out the capabilities. CLEAR clears named graphs (if the user has write permission), and CLEAR SESSION clears the whole session. LOAD loads graphs and can be amended with LOAD LOCAL, LOAD URL, or LOAD TEXT. SELECT, CONSTRUCT, DESCRIBE, and ASK shadow the normal SPARQL language calls, though they are adapted to work with environmental variables, while VALIDATE provides SHACL support.
Knowledge Graph Notebooks
By now, some of you may be thinking “Hmm, interactive chatbots, contextual awareness, fairly loose semantics – this sounds a whole lot like Jupyter Notebooks for RDF.” To me, that’s where most of this is leading. The approach discussed here might not be as freeform as LLMs, but again it’s worth emphasizing that LLMs are only locally transactional – you cannot change the state of an LLM without rebuilding it (that’s not quite true, but it is far from easy).
In many respects, notebooks are a perfect tool for the 21st century. They capture the discovery process that is intrinsic to programming, they can be shared, and increasingly they allow for the interchange of data in a contextual manner. Knowledge graphs need to be easier to work with, and prompts (and notebooks) are a major step forward toward achieving that goal.
Kurt Cagle is the Editor in Chief of The Cagle Report and the Principal of Semantical, LLC, a knowledge management consultancy. He can be reached at kurt.cagle@gmail.com or on LinkedIn. He is looking for business opportunities.