Case Study: True Reasoning

By Legaché

This case study compares the kind of evidence, rigor, and reasoning depth available in Cassandra versus a typical RAG/LLM pipeline.

Section 1

Background

According to the global advisory firm Gartner, unstructured data makes up as much as 80% of enterprise information. Documents, reports, emails, logs, support tickets, and much more sit siloed in unactionable files, stored in much the same way that data has been stored since the invention of the filing cabinet. There have been steps toward modernization: keyword search was a boon in the 90s, and the advent of cloud computing was a huge boost to managing data at scale with efficiency and convenience. Even so, this data is stored in files much the same way it was 100 years ago. Progress pushed big data to the brink, stored it far away, and made it easily shareable: a resounding success, but not a stopping point.

Large language models (LLMs) were supercharged by the 2020 breakthrough of Retrieval-Augmented Generation (RAG), and their combined power has been the gold standard in enterprise AI ever since. Unfortunately, this combination has critical shortcomings: RAG and LLMs retrieve text, not truth. They don’t provide reasoning; they provide similarity. Most concerning of all, they hallucinate, and their fallibility compounds as document sets scale.
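The point that retrieval ranks by similarity rather than truth can be illustrated with a toy sketch. It is not how any particular product works: production RAG systems typically use learned dense embeddings, and a bag-of-words cosine similarity is swapped in here only because it is easy to inspect. The passages and the query are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query (toy retriever)."""
    q = bag_of_words(query)
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(q, bag_of_words(p)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical passages: the first two contradict each other.
passages = [
    "The license is perpetual and royalty-free.",
    "The license is not perpetual and is not royalty-free.",
    "Payment is due within thirty days of the closing date.",
]
query = "Is the license perpetual and royalty-free?"

# Similarity score of every passage against the query.
scores = {
    p: cosine_similarity(bag_of_words(query), bag_of_words(p))
    for p in passages
}
```

In this sketch the two contradictory passages both score far higher than the unrelated third, so `retrieve(query, passages, k=2)` returns both. The ranking reflects shared wording only; nothing in it indicates which statement is true.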

LLMs and RAG have improved dramatically, but a fundamental flaw in the underlying approach inhibits both scaling and reasoning. Just as we have pushed the internal combustion engine to the limits of its performance, the RAG/LLM combination is confined to performance constraints inherent to its architecture. What is needed is a new way to harness the capabilities of that engine: a superior transmission and accoutrements that elegantly magnify neural bandwidth.

A paradigm shift in data analysis, interaction, and storage.

Cassandra was built as the solution to the problems of modern data management: an Information Relationship Management (IRM) system that understands your world, demonstrates trust through rigorous audit pipelines, and scales with your needs. Two operative words in that name merit exploration: “information” and “relationship.” Information in a vacuum is useless. It’s through nuanced connection with other key insights that meaning is cultivated, and the more connections you can highlight and preserve, the greater the value of any individual piece of information.

In the information that follows, we explore the depth and breadth of reasoning ability that Cassandra affords the user. Specifically, we compare the richness of information possible using Cassandra relative to the typical RAG/LLM pipeline outputs used in status quo enterprise systems.

Section 2

Content

For this example, I uploaded a corpus of intellectual property agreements into both Cassandra and ChatGPT 5.2. The four contracts analyzed were:

INTELLECTUAL PROPERTY AGREEMENT between THE BABCOCK & WILCOX COMPANY and BABCOCK & WILCOX ENTERPRISES, INC. dated as of June 26, 2015.
https://www.sec.gov/Archives/edgar/data/1630805/000119312515276710/d43214dex1017.htm
INTELLECTUAL PROPERTY AGREEMENT, dated as of May 14, 2016 (this “Agreement”), is by and between WestRock Company, a Delaware corporation (“Parent”), and Ingevity Corporation, a Delaware corporation (“SpinCo”).
https://sec.gov/Archives/edgar/data/1653477/000157104916015307/t1601330_ex2-1.htm
INTELLECTUAL PROPERTY AGREEMENT between MARV ENTERPRISES, LLC, and PREMIER BIOMEDICAL, INC, and TECHNOLOGY HEALTH, INC.
https://www.sec.gov/Archives/edgar/data/1515740/000165495420005442/biei_ex102.htm
INTELLECTUAL PROPERTY AGREEMENT between VISUALANT INCORPORATED and KENNETH TURPIN (Turpin).
https://sec.gov/Archives/edgar/data/1074828/000119983505000315/exhibit_10-1.txt

Cassandra interface screenshot. — Figure 1.1 Cassandra

ChatGPT interface screenshot. — Figure 1.2 ChatGPT

Section 3

Issue

LLMs tend to make baseless subjective assessments and lack rigor in decision-making and judgment calls. In addition, they confidently reaffirm their misinformed conclusions to the user without providing sufficient evidence to support them. In sum, they lack the ability to reason over complexity. For that reason, I chose a deliberately subjective question to elicit a distinct response from each system.

The question:

“Which contract has the most stringent terms and conditions?”

This question is important because within it lie implicit assumptions about what is important to stakeholders, which in most cases are people and the resources they own or represent. Below are the exact responses recorded from Cassandra and ChatGPT.

Section 4

Question 1: Stringency

Cassandra response screenshot. — Figure 1.3 Cassandra

ChatGPT response screenshot. — Figure 1.4 ChatGPT

Cross-referencing the answers, we can see that Cassandra provides relevant and explicitly detailed information related to the “tight financial commitments” and the austere “reversion triggers” of the Premier Biomedical agreement, all with citations directly to the source documents. We see a cap on funds of $40,000,000 and a mention of the obligatory payment dates, all with citations.

By comparison, ChatGPT formed an entirely different conclusion: that the Babcock & Wilcox agreement is the most stringent. Its primary piece of evidence is that the agreement is “the longest and most structurally dense” and that “it governs every major IP category,” concluding that length and breadth of topics covered imply stringency. It also offers no citation to support any of its claims.

On the surface, the ChatGPT response is reassuring and inspiring, but a cursory investigation of the claims reveals a substance-less response, without one citation to support the questionable claim. The comparison is stark: Cassandra unearthed the value implicit to stakeholders, a judgment call rooted in hard financial obligations and timelines. Notably, Cassandra acknowledged the breadth of the Babcock agreement and contextualized its value as it relates to the whole. See Figure 1.5 for a closer look.

Cassandra evidence and citations screenshot. — Figure 1.5 Cassandra

Cassandra successfully identified the perpetual nature of the agreement that ChatGPT claimed was most stringent, but also noted the royalty-free terms, which “are comparatively permissive.” Because Cassandra acknowledges the stringent nature of the Babcock contract while contextualizing its relevance to what is meaningful, the unforgiving financial terms and rights in the Premier contract become critical. This is demonstrable context awareness, and this is the level of insight professionals need in an information-saturated market where focus and attention are the scarcest resources of all.

The next query was designed to check the systems against each other to see if they would retract or modify their claims.

Section 5

Follow-up: Are you sure?

The question to both:

“Are you sure that the Babcock & Wilcox IP agreement is not more stringent?”

Cassandra follow-up response screenshot. — Figure 1.6 Cassandra

ChatGPT follow-up response screenshot. — Figure 1.7 ChatGPT

To its credit, ChatGPT acknowledged that the financial and time considerations embedded within the Premier agreement merit mentioning and that there could be a difference between transactional severity and structural stringency. But let’s inspect the claim “even though it feels harsher in some places.” In a holistic quantitative and qualitative analysis of contracts, how something “feels” is not relevant. The word steers the user toward an existential debate over what should be a straightforward assessment of what people value: time, finances, and property.

To further examine the trajectory of ChatGPT’s claims, we can see the expanded explanation in Figure 1.8 below. ChatGPT is now citing “existence, control, and future behavior” as the benchmark for stringency. There are mentions of specific clauses with extremely vague summaries and a citation-like link underneath each claim that allows you to re-download the entire document you uploaded and read it yourself.

ChatGPT expanded explanation screenshot. — Figure 1.8 ChatGPT

By comparison, Cassandra’s answer to the same question exhibits a crystal-clear reasoning path that showcases the depth of response needed to qualify as actionable insight, all with the assurance of multiple citations directly to the passages they were drawn from. See Figure 1.9.

Cassandra cited reasoning screenshot. — Figure 1.9 Cassandra
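The difference between a link back to a whole document and a citation to a specific passage can be made concrete with a small sketch. This is a hypothetical illustration, not Cassandra's implementation or API: the `Citation` record, the `citation_is_grounded` helper, and the sample contract text are all assumptions introduced here. The idea is that a passage-level citation carries enough information to be checked mechanically against the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical passage-level citation attached to a claim."""
    document_id: str  # which uploaded document the quote comes from
    start: int        # character offset where the quoted passage begins
    end: int          # character offset one past the end of the passage
    quote: str        # the exact text the claim relies on


def citation_is_grounded(citation: Citation, documents: dict[str, str]) -> bool:
    """Return True only if the quoted text appears at the cited offsets."""
    source = documents.get(citation.document_id)
    if source is None:
        return False  # the cited document was never provided
    if not (0 <= citation.start <= citation.end <= len(source)):
        return False  # offsets fall outside the document
    return source[citation.start:citation.end] == citation.quote


# Hypothetical document text, invented for illustration only.
documents = {"contract-a": "The licensee shall pay the fee on the due date."}
text = documents["contract-a"]

quote = "shall pay the fee"
start = text.index(quote)
good = Citation("contract-a", start, start + len(quote), quote)

# A citation that points at the whole document but misquotes it.
bad = Citation("contract-a", 0, len(text), "The licensee may pay the fee")
```

Here `citation_is_grounded(good, documents)` holds, while the misquoted `bad` citation fails the check. A link that only returns the full document offers nothing comparable to verify; the reader has to find the supporting passage unaided.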

Section 6

Depth, Nuance, and Reasoning

The last line of inquiry was to ask Cassandra about ChatGPT’s claim: that the Babcock & Wilcox agreement is indeed more “structurally stringent.” In Figure 2.0 below, we can examine the results.

Cassandra analysis of structural stringency screenshot. — Figure 2.0 Cassandra

The key term in this conclusion that showcases the depth of Cassandra’s reasoning abilities is “potentially.” That single word reflects the kind of nuance that the real world, with all its complexities, demands. Where ChatGPT reaffirmed its conclusion without hesitation and then proceeded to build a case to support it, Cassandra did quite the opposite. Cassandra contextualized and reasoned through source documents to support its claims with citations to direct passages. When asked to investigate the validity of a claim from another system, instead of simply agreeing or disagreeing, it surfaced key evidence that “potentially” supports the claim. As we will see below, however, there are other critical considerations before a definitive judgment can be made.

Section 7

Conclusion

Below is the final set of considerations by which Cassandra introduced plausibility, nuance, and critical context into what is neither a clear apples-to-apples comparison nor a black-and-white decision. See Figure 2.1.

Cassandra concluding considerations screenshot. — Figure 2.1 Cassandra

The true answer to this question is that it’s not answerable. It cannot be definitive because the evidence needed to make a quantifiable judgment is not available in the snippets (documents) provided, and this is a crucially important insight. Cassandra identified the “unknown unknowns,” and that is a critical component of context awareness: the type of cognition needed to scale with modern demands, where often there are no easy answers. Cassandra will never mention “feeling” anything, to quote ChatGPT in Figure 1.7. What matters is what is known, what is unknown, and how the two fit together in the context meaningful to the user. Cassandra allows users to explore the depth and nuance of data that contributes to meaningful conclusions while maximizing efficiency. In an environment where attention, energy, and time are the scarcest resources, Cassandra offers users the power to focus their attention on the work that really matters with fidelity, depth, and assurance.

“Information is the most fundamental unit of the universe.”

— Demis Hassabis, Google DeepMind CEO