This is the first analysis for The Cagle Report, a look at the Dgraph JSON Graph database with GraphQL support.
The following article is a paid analysis of a software product undertaken with the understanding that it is intended to be as unbiased and objective as possible. As such, while I will identify features that I feel make Dgraph worthwhile (and I would not have undertaken this analysis in the first place if I did not feel it is worthy of analysis), I also will point out issues that other platforms may better handle.
Defining Knowledge Graphs
The term “knowledge graph” entered mainstream use over the last decade (popularized by Google’s 2012 Knowledge Graph announcement) to describe a highly connected network graph designed to hold semantic information. Graphs underlie almost all data structures used in computing, but a few features differentiate knowledge graphs from other structures:
- Addressable. Globally unique identifiers mark concepts rather than simply terms, supporting an open-world assumption.
- Schematically Accessible. Schema and structure should be accessible from instance data.
- Hypergraph. Predicate objects can take on multiple values.
- Reifiable. Supports reification, or annotations about assertions.
These are fairly advanced restrictions and, to some extent, overlap. Elastic has a graph structure, but it is based upon specific term matching, you generally cannot query the schemas of Elastic’s graph, and resources cannot have forward-facing arrays (an essential requirement for hypergraphs). Neo4j can support most of these same properties, albeit with some limitations. LLM-based AIs are knowledge-graph-like but are not (currently) addressable.
Most knowledge graphs are built around RDF triples, with each triple consisting of two labeled vertices (or nodes) and a connecting edge (predicate or property). This approach satisfies the above restrictions but also means that most knowledge graph implementations are generally built around Turtle – a fairly robust data encoding language that traditionally does not work well with JSON, which has become the de facto standard for data interchange.
GraphQL and Dgraph
In 2015, Facebook (now Meta) released its GraphQL library, making it possible to work with JSON data stores as if they were graphs. GraphQL works on the idea that knowledge-graph-like properties (most notably representing concepts as nodes with globally unique identifiers) can be applied to JSON, and that types can be added to a schema to identify collections of resources.
This form of graph was not, by itself, an RDF graph, but instead provided a way to access collections based upon types, retrieve individual entries given a resource key, and get clean access to labels consistently, all of which RDF also supported. It also used partial normalization – breaking down JSON hierarchies into subcomponents with individual identifiers. This normalization is especially important because it means that if two different hierarchies reference the same object, the system need only maintain one such object, not multiple copies. By avoiding “reference by copy”, the system avoids duplicates that can, in turn, be a magnet for contradictory or incompatible data. This problem plagues traditional document (JSON or XML) databases but isn’t that much of an issue with relational databases.
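As a sketch of what partial normalization buys you — the `normalize` helper and the `id` convention below are invented for illustration (Dgraph assigns and manages its own uids internally):

```javascript
// Sketch: partial normalization of a JSON hierarchy, assuming each
// sub-object carries a unique "id" field (a hypothetical convention).
function normalize(doc, store = {}) {
  const node = {};
  for (const [key, value] of Object.entries(doc)) {
    if (value && typeof value === "object" && "id" in value) {
      normalize(value, store);       // recurse into the sub-object
      node[key] = { ref: value.id }; // keep only a reference to it
    } else {
      node[key] = value;
    }
  }
  store[doc.id] = node;              // one stored copy per identifier
  return store;
}

// Two documents that both embed the same author object...
const author = { id: "a1", name: "Ada" };
const store = normalize({ id: "d1", title: "Post 1", author }, {});
normalize({ id: "d2", title: "Post 2", author }, store);
// ...yield a single shared "a1" node, not two divergent copies.
console.log(Object.keys(store).length); // 3 nodes: d1, d2, a1
```

Because both documents now point at the same `a1` node, an update to the author happens exactly once, which is the duplication problem described above.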
GraphQL has proven to be quite popular with web developers, as it solves two critical problems. First, GraphQL provides a mechanism for automatically creating, updating, and deleting JSON content using web service mutations. The often tedious process of creating content can be significantly simplified by specifying a schema that can be queried.
Similarly, when querying that same data, the requester can shape the response to match their needs rather than accepting whatever shape the data provider chooses. Shaping the response at query time means less time spent rebuilding JSON on the client side while also cutting down on the number of steps needed to traverse deep data structures. The standard interface for managing this is the GraphiQL client, which can be used as-is or as an editor for creating such queries as templates.
Yet for all this, GraphQL does face limitations. Maintaining links by reference can get very complex when used with traditional JSON stores such as Mongo or CouchDB, both of which predate GraphQL by several years, and the native code used to manage the GraphQL server doesn’t handle certain features – such as transitive closures or annotations – all that well. This often means that a significant amount of the code used for GraphQL operations (query and mutation) needs to be custom-written based on the back-end platform. Finally, GraphQL by itself is not an RDF data store, so it isn’t all that compatible with the RDF stack that is still used extensively in the knowledge graph community.
What this suggests is that if you were to build an application that supported necessary knowledge graph functionality while at the same time being built on the foundations of GraphQL, this hybrid product would have sufficient capabilities to be a contender in the knowledge graph market. This is precisely the case for Dgraph.
Dgraph was launched in 2016 out of Palo Alto by the company of the same name. Written in Go, Dgraph is a purpose-built GraphQL server with an underlying hybrid RDF/property-graph model that works well with JSON while retaining many of the benefits of RDF.
Dgraph Strengths and Weaknesses
Dgraph’s strengths are extensive, and even the following list is not fully comprehensive:
- Ease of Use. Geared to make using GraphQL as easy and seamless as possible.
- Cloud Native. Unlike many older graph stores, Dgraph is easily deployable in the cloud and can create shared content across multiple graphs.
- Performant. Using Go and modern internal indexing makes Dgraph competitive in the broader database space, not just among knowledge graphs.
- RDF Oriented. While Dgraph’s native language is JSON, the system can also read and write RDF N-Triples (nt) data.
- Facets. Beyond regular RDF, Dgraph can add facets, which correspond to RDF-star expressions. These are invaluable for both annotations and version management.
- DQL. The Dgraph Query Language (DQL) is a declarative language for graph traversal and generative output, complementing what GraphQL provides.
- LLM Aware. There is no question that the future of knowledge graphs will almost certainly align with large language model systems such as ChatGPT. Dgraph is developing a distinct API for working with AI-based systems and driving the production of such systems.
- Extensions and Lambdas. The ability to use DQL for both query and mutation processes also means that extensions can be written (in JavaScript) for performing data analytics, generating reports, making comparisons, running transitive closures, and searching documents. Lambdas are extensions in host languages for running certain functions, making transformations and workflow gating possible.
- Named Graph Support. RDF triples can be stored and accessed in separate named graphs, making graph workflows possible. This capability is true multitenancy – each graph exists within its own security context. Two graphs that share a security context, however, are both externally available to the same clients.
The result is a powerful and surprisingly sophisticated graph database that rethinks many of the norms and expectations of what a graph database should be.
For all that, there are a few weaknesses in the product or at least capabilities that don’t come out of the box:
- No Full W3C RDF Stack (SPARQL, OWL, SHACL, etc.). RDF N-Triples (nt) is supported as an input/output format, and converting from nt to other formats (up to and including JSON-LD and Turtle) is relatively straightforward in JavaScript, so this isn’t that major a limitation. Schema is established using GraphQL and DQL templates.
- No JSON-LD. GraphQL is generally more useful, and schema can be written to convert JSON-LD formats to GraphQL templates.
- No Network Visualization. MermaidJS, GraphViz/DOT, VisJS, d3.js, and a host of other tools can be used with Dgraph RDF and/or DQL to visualize network diagrams and data models.
- Marginal Inferencing. It is possible to do inferencing with Dgraph, but it is more cumbersome in any GraphQL system than in RDF systems – one of the few areas where RDF systems beat out labeled property graph systems. That said, basic inferencing can be done via lambdas.
It’s also worth noting that Dgraph exists under an open-source (Apache 2.0) license, though enterprise features are available as a separate commercial offering. Dgraph runs on Ubuntu Linux and can be run under Kubernetes, Docker, or Amazon virtual machines of varying configurations (and cost).
Dgraph Use Cases
Knowledge graph (KG) use cases are always difficult to pin down because you can use KGs in so many different ways. However, the ability to use (and transform) JSON so effectively with Dgraph makes the platform especially well suited for web-centric applications, including the following:
Semantic Publishing Systems
Publishing should be a natural use case for semantic systems, but in practice, there’s a severe impedance layer between working with JSON and JavaScript objects on the client and Turtle on the server that makes creating such a system quite problematic. With Dgraph, this is no longer the case. With a very thin Node.js layer acting primarily to manage caching and access, Dgraph can manage taxonomies, publishing objects and widgets, user roles, workflows, entity enrichment within content, and semantic interconnectivity. Such systems are comparable in functionality to WordPress, Webflow, Drupal, and others but with far less complexity.
Product Documentation Systems
Documentation systems are a form of semantic publishing system that typically ties together text content, images and vector graphics, video transcripts, conceptual taxonomies, PDFs, parts catalogs, and related materials. This can also be used for help systems and chat interfaces.
Large Language Model Editors
ChatGPT and similar systems utilize several different mechanisms for construction, including ingestion of documents, entity extraction, natural language processing, and similar methods, along with reinforcement learning and graph embeddings. Dgraph, in particular, is well-suited to ingesting and curating the content that goes into almost all aspects of large language model creation. Indeed, in many respects, you can think of Dgraph as being the complement of an LLM, providing the means to curate content, retain provenance and governance, identify core concepts that can then be referenced by key in the LLM, perform master data management, and update the model with new information. This makes a Dgraph-based knowledge graph superior to a traditional Turtle-based one for the tasks involved.
Scene Graph / Supply Chain / Digital Twin Manager
Scene graphs play an important role in gaming, simulations, and increasingly digital twins, in effect identifying the location and position of resources as they move within a 3D space or globe and maintaining metadata about those resources in a broader context. Again, this is an area where encapsulating information as JSON while maintaining RDF structures makes Dgraph a superior choice to Turtle-based systems, especially given the comparative ease of mutating content in that environment.
Graph Analytics / Route Analysis
One key distinction of Dgraph compared to most RDF graph systems is that edges in Dgraph can hold facets. Such facets can hold annotational information about the edge (which, as in most property graphs, is identified uniquely rather than merely by its label). DQL can assign multiple facets to any given edge and can then traverse an edge path to calculate weighted paths used for optimization. Doing something similar in RDF through RDF-star is possible, but it is far less efficient than using property-graph calculations.
Data Catalogs
The hybrid nature of Dgraph also makes it a good candidate for data catalogs, especially when combined with the ability to run CLIs or lambdas through Node.js, Python, or Go. Again, there is a fair amount of overlap between data catalogs, service API managers, and product documentation systems.
AI and Others
This list can get extensive, as Dgraph combines the ability to contain and query JSON structures with the robustness of semantic systems. As AI systems become more commonplace, this ability to query and transform JSON makes Dgraph a natural tool for integrating different AI models in a way few other platforms can equal.
Dgraph’s power extends beyond these use cases, allowing data meshes of interconnected specialized knowledge graphs for extensive enterprise applications. As businesses embrace AI and deal with large-scale data, Dgraph emerges as a robust solution for seamless integration and advanced data management.
In summary, Dgraph emerges as a formidable player in the knowledge graph space. By marrying GraphQL’s adaptability to JSON data stores with RDF’s semantic integrity, it delivers a knowledge graph system that is user-friendly, cloud-native, performant, and RDF-oriented.
While it does have a few limitations, such as incomplete compatibility with the W3C RDF stack and the lack of out-of-the-box network visualization, these can be mitigated with the right tools and knowledge. Furthermore, Dgraph’s open-source nature and enterprise offering allow for a broad range of applications, from semantic publishing systems to large language model editors and data catalogs.
The potential of Dgraph lies in its novel integration of different technologies, offering a flexible, efficient, and versatile tool for handling complex data structures. As more developers, businesses, and data scientists look for ways to manage and use knowledge graphs effectively, Dgraph appears to be a promising solution. Its ability to accommodate the varied demands of contemporary web-centric applications and AI models gives it a unique edge in the knowledge graph market.
Overall, Dgraph has proven to be an exceptional product that redefines the norms and expectations of what a graph database can achieve. Dgraph sets a new standard for the future of knowledge graph technologies by addressing the shortcomings of traditional JSON and RDF-based systems.
Kurt Cagle is the editor of The Cagle Report. He lives in Bellevue, Washington, with his wife, kids, and weird furry wingless sociopathic dragons (meow).