Mind Graph: Persistent Organizational Memory for the Age of Small Language Models
Gaurav Gupta
gauravg@deepnative.ai
Native AI Labs Inc.
Published Tuesday April 7, 2026 · Native AI Labs Inc.
Abstract
Modern AI agents suffer from a fundamental architectural limitation: they are amnesiac by design. Each interaction begins from a blank slate, forcing users to re-establish context, re-explain relationships, and re-teach preferences. The prevailing industry response — scaling model parameters into the hundreds of billions — addresses breadth of knowledge but not depth of organizational understanding. We argue that the critical bottleneck in enterprise AI is not model capability but contextual continuity: the ability to observe, remember, and evolve an understanding of how a specific organization operates.
This paper introduces Mind Graph, a persistent organizational memory system built on temporal knowledge graphs that provides AI agents with continuously evolving context about an organization's people, decisions, workflows, and relationships. We present architectural and empirical arguments that Small Language Models (SLMs), when augmented with Mind Graph's structured memory, can approach and in many cases match frontier Large Language Models (LLMs) on enterprise tasks that depend on organizational context — at a fraction of the cost, latency, and privacy exposure. Our architecture combines dual-layer entity extraction, multi-resolution deduplication, temporal fact management, and hybrid retrieval to create an AI system that doesn't just process language, but learns how your business actually works.
1 Introduction
1.1 The Amnesiac Agent Problem
Consider the following scenario: a sales manager asks their AI assistant, "What's the latest on the Acme deal?" A frontier LLM — regardless of whether it has 70 billion or 700 billion parameters — cannot answer this question. It has no memory of the Acme deal. It doesn't know who is leading the deal, what was discussed in last week's meeting, or that the procurement contact changed roles two days ago. The model may possess encyclopedic knowledge of sales methodologies, negotiation frameworks, and CRM best practices, yet it cannot recall a single fact about the user's actual business.
This is not a failure of intelligence. It is a failure of memory. The dominant paradigm in AI development has treated language models as stateless functions: input goes in, output comes out, and the slate is wiped clean. Context windows have expanded from 4,096 tokens to over 1 million, yet this expansion addresses only the breadth of a single interaction, not the continuity across interactions. An agent with a 1-million-token context window that resets every session is still, fundamentally, amnesiac.
The consequences of this architectural limitation are measurable. According to Gartner, at least 30% of generative AI projects are abandoned after proof of concept, with escalating costs and unclear business value cited as primary factors [1]. McKinsey's 2025 State of AI report reveals a stark adoption-to-value gap: while 88% of organizations now use AI in at least one function, only 39% report any impact on earnings [2]. The technology works in demos but fails in deployment because it lacks the organizational context required to deliver business-specific value.
1.2 The Roommate Model
We propose an alternative mental model for enterprise AI. Rather than a brilliant stranger who must be briefed from scratch at every encounter, consider a roommate — someone who shares your space, silently observes how you work, remembers what matters, and gradually builds an ever-deepening understanding of your patterns, preferences, and priorities.
This roommate doesn't need to be the smartest person in the room. They need to be the most attentive. They remember that you prefer morning meetings, that the Q3 forecast was revised downward, that Sarah from engineering disagrees with the new deployment timeline, and that "ASAP" from the CEO means by end of day but "ASAP" from the VP means within the week. This knowledge isn't found in any training corpus. It emerges from sustained observation of a specific organizational context.
The roommate model inverts the prevailing assumption in AI development. Instead of asking "How can we make the model smarter?", it asks "How can we make the model remember?" This reframing leads to a fundamentally different architecture — one where a smaller, more efficient model augmented with persistent, evolving memory outperforms a larger, stateless model that must reconstruct context from scratch at every interaction.
1.3 Our Contribution
This paper makes three primary contributions:
- Mind Graph — a temporal knowledge graph architecture that serves as persistent organizational memory for AI agents, featuring dual-layer extraction, multi-resolution entity deduplication, temporal fact management with invalidation, and hybrid retrieval combining keyword matching, graph traversal, and vector similarity.
- The SLM + Memory thesis — we present evidence and architectural arguments that Small Language Models (under 14B parameters), when augmented with structured organizational memory, can approach frontier LLM performance on context-dependent enterprise tasks while offering 10-100x cost advantages, sub-second latency, and complete data sovereignty.
- The Continuous Learning Loop — a system design in which AI agents continuously observe organizational communication streams, extract entities and relationships, resolve them against known organizational knowledge, manage temporal validity, and inject retrieved context into agent reasoning — creating a compounding intelligence that improves with every interaction.
2 The Context Bottleneck
2.1 Why Enterprise AI Deployments Fail
The enterprise AI landscape presents a paradox. Models are more capable than ever, yet organizational value remains elusive. The root cause is not capability but context. Hallucination — the generation of plausible but factually incorrect content — remains the primary barrier to enterprise AI adoption, with 62% of enterprise users citing it as their biggest concern [3]. More alarmingly, 47% of enterprise AI users report making at least one major business decision based on hallucinated content [3]. These hallucinations are not random; they are systematic consequences of models operating without access to the information they need.
The cost of this context gap is quantifiable. Fortune 500 companies lose an estimated $31.5 billion annually by failing to share knowledge effectively [4]. When employees depart, 42% of institutional knowledge unique to their role leaves with them [5]. Tacit knowledge — the informal, experiential understanding of how an organization actually operates — constitutes the majority of an organization's total knowledge base [5], yet it is precisely this knowledge that current AI systems cannot access.
2.2 The Inadequacy of Scale
The industry's primary response to the context problem has been to scale model parameters. The assumption — rooted in the scaling laws articulated by Kaplan et al. [6] — is that larger models will naturally acquire the ability to handle more complex contextual reasoning. This assumption has proven partially correct for general knowledge but fundamentally misguided for organizational knowledge. A model trained on the entirety of the public internet knows what a sales pipeline is, but it does not know your sales pipeline. It can explain organizational theory, but it cannot explain your organization. This distinction between general knowledge (learnable from public data) and organizational knowledge (inherent to a specific entity) represents the boundary where scale ceases to provide returns.
Furthermore, the economic argument for perpetual scaling is weakening. Goel et al. demonstrate that CO2 emissions of training LLMs scale linearly with both parameter count and dataset size, making continuous scaling environmentally unsustainable [7]. The cost of frontier model API access, while declining rapidly, remains non-trivial at enterprise scale [8].
2.3 Tacit Knowledge and the SECI Model
The challenge of organizational memory is well-established in management science. Nonaka and Takeuchi's seminal SECI model (1995) describes four modes of knowledge conversion within organizations: Socialization (tacit to tacit), Externalization (tacit to explicit), Combination (explicit to explicit), and Internalization (explicit to tacit) [9]. What makes this framework relevant to AI systems is the concept of Ba — the shared context space in which knowledge creation occurs. Mind Graph operationalizes this insight: by treating every conversation, email, calendar event, and document as a potential carrier of organizational knowledge, it performs continuous externalization.
2.4 Data Sovereignty and Privacy
Beyond capability, the context bottleneck has a privacy dimension. When organizations send business communications to frontier model APIs for processing, they expose sensitive operational data to third-party infrastructure. Seventy-one percent of organizations cite cross-border data transfer compliance as their top regulatory challenge [10]. The EU AI Act, becoming fully applicable in August 2026, introduces additional requirements for AI systems handling enterprise data [10]. Small Language Models deployed on-premise or at the edge eliminate this concern entirely.
3 Small Language Models: When Depth Beats Breadth
3.1 The Empirical Case Against Scale Dependence
A growing body of evidence demonstrates that smaller, domain-specialized models routinely outperform frontier LLMs on targeted tasks. This is not a marginal effect — the performance advantages are substantial and consistent across domains.
- Reasoning and STEM. Microsoft's Phi-4 (14B) outperforms GPT-4o on MATH (80.4% vs. 74.6%) and GPQA (56.1% vs. 50.6%) [11]. Phi-4-reasoning-plus (14B) achieves 78-81% on AIME 2025, comparable to DeepSeek-R1 (671B) [12].
- Medicine. Meerkat-7B became the first 7B model to surpass the USMLE passing threshold, outperforming GPT-3.5 (175B) by 9.7% with 25x fewer parameters [13]. Me-LLaMA outperforms GPT-4 on 5 of 8 medical benchmark datasets [14].
- Finance, Law, Code. FinGPT competes with frontier models on sentiment analysis [15]. SaulLM outperforms general-purpose models including GPT-4 and Llama-3-70B on legal benchmarks [16]. Phind-CodeLlama-34B achieved 73.8% pass@1 on HumanEval, surpassing GPT-4's 67% [17].
- Breadth of evidence. The LoRA Land study fine-tuned 310 models across 10 base architectures, finding that 4-bit LoRA fine-tuned models outperformed GPT-4 by 10 points on average — each fine-tuned for less than $8 per model [18].
3.2 The Densing Law
Xiao et al. formalize this trend as the Densing Law: capability density (capability per parameter) doubles approximately every 3.5 months [19]. Published in Nature Machine Intelligence, this finding suggests that equivalent model performance can be achieved with exponentially fewer parameters over time. The implication is that the advantage of frontier-scale models is perpetually eroding — what required 175 billion parameters circa 2020 requires 14 billion in 2025 and may require 1 billion by 2027.
3.3 The Economics of Small
The cost differential between SLMs and frontier LLMs is not incremental — it is structural.
- Inference. SLMs can match or surpass LLMs on tool use, function calling, and RAG tasks at 10-100x lower token cost [20].
- Fine-tuning. QLoRA enables fine-tuning a 65B-parameter model on a single 48GB GPU [22]. For smaller models, the LoRA Land results demonstrate viable task-specific fine-tuning for under $8 per model [18].
- Deployment. Apple's on-device 3B-parameter model achieves sub-10ms latency on A17 Pro silicon [23]. Qualcomm's latest NPUs achieve 220 tokens/second with INT2/INT4 quantization [25].
3.4 SLMs for Agentic Workflows
The NVIDIA position paper "Small Language Models are the Future of Agentic AI" [20] articulates why SLMs are not just cost-efficient alternatives to LLMs for agent systems but are inherently more suitable:
- Latency sensitivity. Agentic workflows compound latency across 5-10 tool calls — sub-100ms per-step inference is a functional requirement, not a luxury.
- Structured output reliability. SLMs with guided decoding produce valid JSON and tool calls with higher consistency than frontier LLMs using prompt-based formatting.
- Deployment density. A single GPU that serves one frontier LLM can serve dozens of SLM instances, enabling per-tenant model isolation.
4 The Limitations of Current Approaches
4.1 Retrieval-Augmented Generation: Necessary but Insufficient
RAG has emerged as the primary pattern for grounding LLM outputs in factual data [28]. However, standard RAG — built on flat vector similarity search over document chunks — suffers from several fundamental limitations: the lost-in-the-middle problem, where retrieval performance degrades for content in the middle of long contexts [29]; multi-hop reasoning failure, when answers require traversing relational chains [30]; temporal blindness, where vector similarity cannot distinguish current from superseded facts; and structural amnesia, where each query is independent. In Neo4j's benchmark, GraphRAG achieves 90%+ accuracy on complex entity queries while standard vector RAG degrades to near-0% as the number of entities per query exceeds five [31].
4.2 Fine-Tuning: Static Knowledge in a Dynamic World
Fine-tuning encodes knowledge directly into model parameters. While effective for stable domains, it fails for organizational knowledge that changes continuously. An organization's active deals, team assignments, project statuses, and interpersonal dynamics shift daily. Fine-tuning produces a snapshot that begins decaying immediately. Re-fine-tuning is expensive (early enterprise GenAI deployments range from $5M-$20M [1]), slow (days to weeks), and provides no mechanism for surgical updates.
4.3 Agent Frameworks: Stateless by Design
Contemporary agent frameworks — LangChain, CrewAI, AutoGen — provide sophisticated tool-use and multi-agent orchestration but treat memory as an afterthought. CrewAI explicitly optimizes by "focusing on task-specific data rather than retaining extensive conversation histories" [32]. These frameworks are plumbing, not memory. They can orchestrate a sequence of API calls with impressive reliability, but they cannot answer "What happened with the Acme deal last week?" because they don't remember last week.
4.4 Memory Startups: Layers, Not Alternatives
A new class of startups — Mem0, Zep, Letta (formerly MemGPT) — has emerged to address the memory gap. Each offers a valuable contribution, but all share a critical limitation: they are memory layers for frontier LLMs, not alternatives to them. Mem0 provides a cloud-first API-based memory layer [33], but memory is stored on Mem0's servers and extraction is passive. Letta proposes an OS-inspired three-tier memory architecture [34] but has no integration path for domain-specialized small models. Zep introduces a temporal knowledge graph [35] — most architecturally aligned with our own — but remains a memory layer rather than an integrated system that combines memory with domain-specialized inference.
4.5 Frontier LLM Memory Features: Unstructured Is Not Enough
Beyond dedicated memory startups, the frontier LLM providers themselves are acquiring memory capabilities. These features represent meaningful progress — but they store memory as unstructured natural language summaries managed by the LLM itself, which introduces fundamental limitations:
- No auditability. No mechanism to trace a remembered fact to its source document or conversation.
- No temporal validity. Unstructured memories have no notion of fact supersession.
- No compositional reasoning. The memory cannot support multi-hop traversal.
- No organizational resolution. LLM-managed memory cannot deduplicate "Jas", "Jasminder", and "J.S. Gulati" into a single canonical person entity.
- No data sovereignty. Memory resides on the provider's infrastructure.
5 Mind Graph: Architecture
5.1 Design Philosophy
Mind Graph is not a database, a search engine, or a chatbot enhancement. It is an organizational nervous system — a continuous process that observes, extracts, resolves, stores, invalidates, and retrieves knowledge about how a specific organization operates. The name "Mind Graph" reflects two design intentions. First, the system serves as the organization's mind — a persistent memory that accumulates and evolves understanding over time. Second, the underlying data structure is a graph — entities connected by typed, weighted, temporally-scoped relationships that, when visualized, exhibit scale-free properties with densely connected hub entities and sparse peripheral nodes.
5.2 Knowledge Representation
Mind Graph stores organizational knowledge across four primary data structures:
Entities represent the nouns of organizational knowledge — people, organizations, projects, topics, events, locations, and products. Each entity carries a normalized name for deduplication, a semantic description that accumulates knowledge across extraction events, a 768-dimensional embedding vector (Matryoshka prefix of a 3072-dimensional Gemini embedding), source provenance tracking, and a connection count serving as a hub score for relevance ranking.
Relations represent the verbs — typed connections between entities drawn from a controlled vocabulary of 37 predicates organized into six semantic groups: Identity/Structure (IS_A, HAS_A, LOCATED_IN, HOLDS_ROLE), Commerce (PRODUCES, SELLS, SUPPLIES, OFFERS), Innovation (LAUNCHED, DEVELOPED, ADOPTED, RESEARCHES), Performance (HAS_REVENUE, INCREASED, RESULTED_IN), People (WORKS_AT, MANAGES, REPORTS_TO, MENTORS), and Temporal (FOUNDED, JOINED, LEFT, COMPLETED, ACQUIRED).
Each relation carries temporal metadata: t_valid (when the fact became true), t_invalid (when superseded, null if still active), statement_type (fact, opinion, or prediction), and temporal_type (static, dynamic, or atemporal). Chunks store raw text segments (up to 1,200 tokens with 100-token overlap) as a naive RAG fallback. Episodes track provenance — the source type, timestamp, and metadata for every extraction event.
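As an illustrative sketch of this schema (field names mirror the text above; the actual storage backend and column types are not specified here), a relation record can be modeled as:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Relation:
    subject_id: str
    predicate: str                   # one of the 37 controlled predicates, e.g. "WORKS_AT"
    object_id: str
    statement_type: str              # "fact" | "opinion" | "prediction"
    temporal_type: str               # "static" | "dynamic" | "atemporal"
    t_valid: Optional[str] = None    # when the fact became true (ISO date)
    t_invalid: Optional[str] = None  # None while the fact is still active

# A still-active dynamic fact: Sarah has worked at Acme since 2025-03-01.
rel = Relation("sarah", "WORKS_AT", "acme", "fact", "dynamic", "2025-03-01")
```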
MIND GRAPH: Temporal Knowledge Graph
┌──────────────────────────────────┐
│                                  │
│  Sarah ──WORKS_AT──▶ Acme        │
│ (person)  [dynamic]  (org)       │
│           t_valid: 2025-03-01    │
│    │                             │
│ MANAGES [dynamic]                │
│    ▼                             │
│  John ──WORKS_AT──▶ Beta         │
│ (person) t_invalid:  (org)       │
│            2025-09-15            │
│                                  │
└──────────────────────────────────┘
5.3 Dual-Layer Extraction Pipeline
Mind Graph employs a two-phase extraction strategy designed to balance immediacy with depth.
Phase 1: Inline Extraction (Synchronous). Every incoming message triggers real-time extraction before the agent responds. The system pre-filters trivial messages, loads recent organizational memory (last 50 tasks, 30 insights) to prevent duplicate extraction, invokes an efficient utility model (currently Gemini-2.5-Flash; targeted for migration to an on-device SLM) with an analyst prompt to extract tasks, insights, contexts, contacts, and relations, validates extracted entities (e.g., rejects tasks assigned to non-members), and performs semantic upsert with embedding-based deduplication (configurable thresholds; 0.7 for tasks, 0.75 for insights).
Phase 2: Background Indexing (Asynchronous). A periodic batch process performs deeper extraction on accumulated documents: text is split into ~1,200-token chunks with 100-token overlap; entity and relation extraction runs per chunk with the full predicate vocabulary; a gleaning pass re-runs extraction on each chunk (a technique from GraphRAG [37]); results are aggregated across chunks; entities are resolved against organizational profiles and contacts; and relations undergo temporal invalidation checks.
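The chunking step in Phase 2 can be sketched as follows (chunk size and overlap values come from the text above; the tokenizer itself is out of scope, so this sketch operates on an already-tokenized sequence):

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token sequence into overlapping chunks.

    Each chunk holds up to `size` tokens, and consecutive chunks
    share `overlap` tokens so that facts spanning a boundary are
    seen by at least one extraction pass.
    """
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last chunk already covers the tail
    return chunks
```

For a 2,500-token document this yields three chunks, with each pair of neighbors sharing a 100-token overlap region.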
5.4 Multi-Resolution Entity Deduplication
Entity deduplication is perhaps the most critical challenge in organizational knowledge graph construction. The same person may be referenced as "Gaurav", "Gaurav Gupta", "GG", or "the new PM on the Acme account". Mind Graph employs a four-layer deduplication strategy:
- Layer 1: Exact Normalized Match. Entity names are lowercased, whitespace-collapsed, and matched against the entity_name_normalized index.
- Layer 2: Organizational Person Resolution. For person entities, the system queries the organization's profiles and contacts tables with exact match, substring containment, and token overlap rules.
- Layer 3: Fuzzy Semantic Match. The system computes embedding similarity. Candidates above 0.72 similarity with name-level similarity above 0.5 are flagged for LLM confirmation: "Are 'Gaurav Gupta' and 'Gaurav' the same real-world person?"
- Layer 4: Description Accumulation with Summarization. When entities merge, their descriptions accumulate. If the merged description exceeds 800 characters, an LLM summarization pass condenses it to prevent embedding degradation.
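Layers 1 and 3 can be sketched as follows. The thresholds (0.72 embedding similarity, 0.5 name-level similarity) come from the text; using difflib.SequenceMatcher as the name-similarity measure is an assumption for illustration, not the system's actual implementation:

```python
import re
from difflib import SequenceMatcher

EMBED_THRESHOLD = 0.72  # Layer 3 embedding-similarity gate
NAME_THRESHOLD = 0.5    # Layer 3 name-level-similarity gate

def normalize_name(name: str) -> str:
    """Layer 1: lowercase and collapse whitespace for exact matching."""
    return re.sub(r"\s+", " ", name.strip().lower())

def needs_llm_confirmation(embed_sim: float, name_a: str, name_b: str) -> bool:
    """Layer 3: flag a candidate pair for LLM confirmation only when
    both the embedding and the name-level similarity pass their gates."""
    name_sim = SequenceMatcher(
        None, normalize_name(name_a), normalize_name(name_b)
    ).ratio()
    return embed_sim >= EMBED_THRESHOLD and name_sim >= NAME_THRESHOLD
```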
5.5 Hybrid Retrieval Pipeline
When an agent needs organizational context to answer a user's query, Mind Graph's retrieval pipeline orchestrates three parallel retrieval strategies followed by enrichment and reranking stages.
┌──────────────┐
│ User Query │
└──────┬───────┘
┌──────────┴──────────┐
▼ ▼
┌───────────────┐ ┌─────────────────┐
│ Keyword Extract│ │ Embedding Gen │
│ (proper nouns,│ │ (768-dim vector)│
│ acronyms) │ │ │
└───────┬───────┘ └────────┬────────┘
▼ ▼
┌───────────────┐ ┌─────────────────┐
│ Name-Based │ │ Graph Retrieval │
│ Lookup │ │ Local + Global │
│ Sim: 0.95 │ │ + Naive chunks │
└───────────────┘ └─────────────────┘
┌─────────────┐
│ Insight │
│ Vector │
│ Search │
└─────────────┘
│
▼
┌────────────────┐
│ Merge & Dedupe │
│ by Entity ID │
└────────┬───────┘
▼
┌────────────────┐
│ Temporal Hop │
│ Enrichment │
│ (7d window,3h) │
└────────┬───────┘
▼
┌────────────────┐
│ KG-Aware Rerank│
│ Confirmed +0.1 │
│ Contradicted -0.2│
│ Recency decay │
└────────┬───────┘
▼
┌────────────────┐
│ Token Budget │
│ Entities 4000 │
│ Relations 4000 │
│ Chunks 6000 │
└────────────────┘

The final ranking score for each entity e given query q is computed as: score(e) = 0.7 · sim(q, e) + 0.3 · log₂(1 + connections(e)). This weighting ensures that well-connected hub entities — people, projects, and organizations at the center of organizational activity — are surfaced preferentially, while semantic relevance remains the dominant signal.
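The ranking formula translates directly into code (a minimal sketch; computing sim(q, e) from the 768-dimensional embeddings is assumed to happen upstream):

```python
import math

def rank_score(sim: float, connections: int,
               w_sim: float = 0.7, w_hub: float = 0.3) -> float:
    """score(e) = 0.7 * sim(q, e) + 0.3 * log2(1 + connections(e)).

    `sim` is the query-entity embedding similarity; `connections`
    is the entity's connection count (its hub score).
    """
    return w_sim * sim + w_hub * math.log2(1 + connections)
```

With this weighting, a hub entity needs roughly 10 connections (log₂ 11 ≈ 3.46, contributing ≈ 1.04) before its hub bonus can outweigh the full semantic term.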
5.6 Temporal Fact Management
Organizational knowledge is not static. People change roles, projects complete or are cancelled, partnerships form and dissolve. Mind Graph implements bi-directional temporal invalidation. When new relations are extracted, the system identifies candidate contradictions (existing relations with the same predicate group and embedding similarity above 0.7), invokes an LLM judge to assess whether the new fact invalidates the existing one, applies constraints (only DYNAMIC facts can be invalidated; static and atemporal facts are immune), and sets t_invalid on the old relation, preserving historical context while ensuring retrieval surfaces current facts. Additionally, Mind Graph supports web-grounded verification for externally verifiable facts.
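A minimal sketch of the invalidation check described above, with the similarity function and the LLM judge passed in as callables (their signatures here are assumptions for illustration, not the system's actual interfaces):

```python
from typing import Callable

def invalidate_superseded(new_rel: dict, existing: list,
                          similarity: Callable[[dict, dict], float],
                          llm_judge: Callable[[dict, dict], bool]) -> None:
    """Mark existing relations as superseded by a newly extracted fact.

    `similarity` returns embedding similarity in [0, 1];
    `llm_judge` returns True when the new fact invalidates the old one.
    """
    for old in existing:
        if old["temporal_type"] != "dynamic":
            continue  # static and atemporal facts are immune
        if old["t_invalid"] is not None:
            continue  # already superseded
        if similarity(new_rel, old) <= 0.7:
            continue  # not a candidate contradiction
        if llm_judge(new_rel, old):
            old["t_invalid"] = new_rel["t_valid"]  # preserve history, close validity
```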
5.7 Context Injection into Agent Reasoning
The final stage of the pipeline bridges Mind Graph's retrieved knowledge into the agent's reasoning process. Retrieved context is formatted as a structured block and injected into the agent's system prompt:
## ORGANIZATION CONTEXT
Acme Corp is a B2B SaaS company focused on supply chain optimization...

## USER CONTEXT
Sarah is VP of Engineering, timezone UTC+5:30, prefers concise responses...

## RETRIEVED KNOWLEDGE
**Entities**
- John Smith [person]: Former VP of Engineering, now Advisory Board
- Acme Corp [organization]: Series B, 45 employees, $4.2M ARR
- Project Atlas [project]: Infrastructure migration, deadline Q2 2026

**Relationships**
- Sarah Chen → MANAGES → Engineering Team [fact, valid since 2025-09-15]
- Project Atlas → TARGETS → Q2 2026 completion [fact, valid since 2026-01-10]
- John Smith → LEFT → VP Engineering role [fact, valid since 2025-09-15]

**Temporal Context (co-occurring events within 7-day window)**
- Sarah Chen → PROMOTED_TO → VP Engineering [3 hops, 2025-09-12]
- John Smith → JOINED → Advisory Board [3 hops, 2025-09-20]
This pre-retrieved, pre-ranked, token-budgeted context enables the agent to reason about organizational matters without needing to access external systems during generation. The agent model — which may be a compact SLM — receives everything it needs to produce an informed response within its context window.
6 The Continuous Learning Loop
6.1 From Static Knowledge to Living Intelligence
The critical differentiator between Mind Graph and traditional knowledge management is continuity. The system does not wait to be queried; it continuously observes, extracts, and integrates knowledge from every organizational communication channel.
THE CONTINUOUS LEARNING LOOP
┌─────────┐ ┌──────────┐ ┌────────────┐
│ OBSERVE │─▶│ EXTRACT │─▶│ RESOLVE & │
│ │ │ │ │ DEDUPE │
│ Chat │ │ Entities │ │ Profile │
│ Email │ │ Relations│ │ matching │
│ Calendar│ │ Facts │ │ Fuzzy merge│
│ Docs │ │ Opinions │ │ │
└─────────┘ └──────────┘ └─────┬──────┘
▲ │
│ ▼
┌─────────┐ ┌──────────┐ ┌────────────┐
│ INJECT │◀─│ RETRIEVE │◀─│ STORE │
│ Context │ │ Hybrid │ │ Temporal │
│ into │ │ search │ │ graph w/ │
│ agent │ │ + rerank │ │ invalidation│
└─────────┘ └──────────┘ └────────────┘
Each cycle deepens organizational understanding — compounding over time.

This creates a compounding effect. An agent that has observed 10 conversations about the Acme deal knows more than an agent that has observed 1. An agent that has tracked Sarah's promotion, John's departure, and the subsequent reorganization can contextualize a new question about Project Atlas in ways that a stateless model — regardless of its parameter count — simply cannot.
6.2 Post-Execution Indexing
The learning loop extends beyond conversation. When the agent executes tools — reading emails, checking calendars, searching documents — the results are indexed back into Mind Graph:
- Email bodies are chunked and stored for future retrieval
- Calendar attendees are registered as Person entities with organizational affiliations
- Email recipients are synced as Contact entities
- Document contents are processed through the full extraction pipeline
6.3 Temporal Progression of Organizational Understanding
Returning to our roommate metaphor, consider how Mind Graph transforms the AI experience over time:
- Week 1: The agent knows the user's name and role. Responses are generic.
- Month 1: The agent has observed hundreds of conversations. It knows the team structure, active projects, key clients, and communication patterns.
- Month 6: The agent has built a rich temporal graph. It knows not just current relationships but their evolution.
- Year 1: The agent has become an institutional memory. It has externalized the tacit knowledge that would otherwise have been lost to employee turnover.
7 SLMs + Mind Graph: The Compound Advantage
7.1 Why Structured Memory Makes SLMs Viable for Enterprise
The conventional argument against SLMs for enterprise use centers on their limited reasoning and knowledge capacity relative to frontier models. This argument assumes that the model must contain organizational knowledge within its parameters — an assumption that Mind Graph invalidates. When organizational knowledge is externalized into a structured graph and injected as pre-retrieved, pre-ranked context, the model's task simplifies dramatically.
The cognitive architecture framework CoALA [39] describes four types of agent memory: working, episodic, semantic, and procedural. Mind Graph provides episodic and semantic memory, while the SLM provides working memory and procedural knowledge. This division of labor plays to each component's strengths.
| Component | Responsibility | Strengths |
|---|---|---|
| Mind Graph | Episodic + semantic memory | Persistent, temporally aware, auditable, scalable |
| SLM | Working memory + procedural | Fast inference, structured output, tool calling, low cost |
| Frontier LLM (no memory) | All four, from scratch each time | Broad knowledge, strong reasoning — but no organizational context |
7.2 Pre-Retrieved Context Reduces Reasoning Burden
Mind Graph performs the expensive work before the model is invoked: entity resolution is complete, temporal filtering is complete, relevance ranking is complete, and token budgeting is complete. The SLM receives a clean, pre-processed context block that requires comprehension and synthesis, not retrieval and reasoning. This is analogous to handing a junior analyst a dossier and asking "given this briefing, draft a response" versus "find out everything and write a summary". SLMs excel at the former.
7.3 Token Budget Management
Mind Graph enforces strict token budgets for context injection:
- Entities: 4,000 tokens (6,000 for standalone queries)
- Relations: 4,000 tokens (8,000 for standalone)
- Chunks: 6,000 tokens (8,000 for standalone)
- Community summaries: 1,000 tokens
- Insights: 2,000 tokens
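Budget enforcement reduces to greedy truncation of each ranked result list (a sketch; the default 4-characters-per-token estimate is a rough assumption for illustration, not the system's actual tokenizer):

```python
# Per-category budgets from the list above (standard queries).
BUDGETS = {"entities": 4000, "relations": 4000, "chunks": 6000,
           "community_summaries": 1000, "insights": 2000}

def apply_budget(ranked_items, budget, count_tokens=lambda s: len(s) // 4):
    """Keep top-ranked items until the token budget is exhausted.

    `ranked_items` is assumed to be pre-sorted by relevance, so
    truncation always drops the least relevant results first.
    """
    kept, used = [], 0
    for item in ranked_items:
        cost = count_tokens(item)
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept
```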
8 Related Work
LightRAG [36] introduces dual-level retrieval with incremental graph updates. GraphRAG [37] applies LLM-derived entity knowledge graphs with community detection for million-token datasets. HippoRAG [38] draws inspiration from hippocampal indexing theory, reporting retrieval that is 10-30x cheaper and 6-13x faster than iterative retrieval. GNN-RAG [40] combines GNN reasoning with LLM language understanding, outperforming GPT-4 with a tuned 7B LLM. Generative Agents [42] demonstrates memory streams with recency and importance scoring. MemOS [44] reports a 159% improvement in temporal reasoning over OpenAI's global memory and a 60.95% reduction in token overhead.
8.4 Comparative Summary
| Feature | LightRAG | GraphRAG | HippoRAG | Zep | Mem0 | Mind Graph |
|---|---|---|---|---|---|---|
| Temporal fact validity | No | No | No | Yes | No | Yes |
| Org entity resolution | No | No | No | No | No | Yes |
| Multi-layer dedup | No | No | No | Partial | No | Yes (4-layer) |
| Continuous extraction | No | No | No | Partial | Passive | Yes (dual-layer) |
| SLM-optimized retrieval | No | No | No | No | No | Yes |
| Web-grounded verification | No | No | No | No | No | Yes |
| Incremental graph updates | Yes | No (batch) | No | Yes | N/A | Yes |
| Community detection | No | Yes | No | No | No | Yes |
No single prior system combines temporal validity, organizational entity resolution, multi-layer deduplication, and SLM-optimized context injection. Mind Graph's contribution is the integration of these capabilities into a unified continuous learning system.
9 Future Directions
9.1 On-Device Business Learning Models
The convergence of SLM capability and edge hardware performance creates a near-term opportunity for fully on-device organizational intelligence. With Qualcomm's Hexagon NPU achieving 220 tokens/second [25] and Apple's on-device models demonstrating sub-10ms latency [23], a 270M-parameter Business Learning Model [47] could run entirely on a smartphone or tablet, with Mind Graph stored in local encrypted storage.
FUTURE: ON-DEVICE BLM
┌──────────────────────────────────┐
│ Edge Device (Phone / Tablet) │
│ │
│ ┌──────────┐ ┌─────────────┐ │
│ │ BLM │◀─▶│ Mind Graph │ │
│ │ 270M │ │ (Local DB) │ │
│ │ params │ │ │ │
│ └────┬─────┘ └──────┬──────┘ │
│ │ ▲ │
│ ▼ │ │
│ ┌──────────┐ ┌─────────────┐ │
│ │ Tool │ │ Encrypted │ │
│ │ Router │ │ Sync Layer │ │
│ └──────────┘ └──────┬──────┘ │
│ │ │
└────────────────────────┼─────────┘
▼
┌─────────────┐
│ Org Cloud │
│ (Optional) │
└─────────────┘

9.2 Federated Organizational Learning
A longer-term research direction involves federated learning across organizations within the same vertical. A network of dental practices could collectively improve their BLMs' understanding of patient communication patterns without sharing actual patient data — using techniques from differential privacy and federated averaging.
9.3 Multi-Modal Knowledge Graphs
Current Mind Graph extraction is text-based. Future iterations will incorporate:
- Visual extraction: Entities and relations from shared images, whiteboard photos, and screen recordings
- Audio extraction: Real-time entity extraction from meeting transcripts
- Behavioral signals: Communication patterns (response time, message frequency, meeting attendance) as implicit relation signals
9.4 Reinforcement from Business Outcomes
The BLM training framework proposed in our companion paper [47] uses verifiable reward signals — actual business outcomes (email responses received, tasks completed, deals closed). As Mind Graph accumulates temporal records of which predictions proved accurate and which actions led to positive outcomes, these records become a training signal for continuous model improvement. The knowledge graph becomes not just a memory system but a reward signal generator for reinforcement learning.
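One way the graph's temporal records could be converted into scalar rewards is sketched below. The record schema, outcome labels, and decay shaping are hypothetical illustrations, not the companion paper's actual reward function.

```python
from dataclasses import dataclass

# Hypothetical schema: Mind Graph links an agent action to the business
# outcome observed later in the temporal graph.
@dataclass
class OutcomeRecord:
    action: str           # e.g. "sent follow-up email"
    outcome: str          # "reply_received", "deal_closed", "no_response", ...
    days_to_outcome: int  # latency between action and observed outcome

# Illustrative reward shaping: positive outcomes decay with latency,
# silence earns a small negative signal.
REWARDS = {"reply_received": 1.0, "deal_closed": 5.0, "no_response": -0.2}

def reward(rec: OutcomeRecord, half_life_days: float = 7.0) -> float:
    base = REWARDS.get(rec.outcome, 0.0)
    decay = 0.5 ** (rec.days_to_outcome / half_life_days) if base > 0 else 1.0
    return base * decay

r = reward(OutcomeRecord("sent follow-up email", "reply_received", 7))
# → 0.5 (full reward halved after one half-life)
```

Because every record carries valid-from/valid-to timestamps, rewards can be recomputed retroactively as later outcomes (a reply, a closed deal) attach to earlier actions.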
10 Limitations
We acknowledge several limitations of the current work:
- No empirical evaluation. This paper presents architectural arguments and cites external benchmarks but does not include first-party experimental results comparing Mind Graph + SLM against baseline systems.
- Extraction accuracy. The quality of Mind Graph's knowledge depends on the accuracy of entity and relation extraction. Precision and recall have not been formally measured here.
- Cold start. A newly deployed Mind Graph has no organizational knowledge and provides no advantage over a stateless system until sufficient communication has been observed.
- Language coverage. Current extraction prompts and entity resolution heuristics are optimized for English.
- Graph growth and maintenance. Long-term graph maintenance strategies remain an area of active development.
- Single-organization scope. Cross-organizational knowledge sharing — while architecturally feasible through federated approaches — is not yet implemented.
11 Conclusion
The prevailing narrative in AI development equates progress with scale: more parameters, more training data, more compute. This narrative has produced remarkable general-purpose models, but it has failed to solve the fundamental challenge of enterprise AI — making AI systems that understand your organization, not just organizations in general.
We have presented Mind Graph, a persistent organizational memory system built on temporal knowledge graphs, and argued that its combination with Small Language Models represents a more effective, economical, and privacy-preserving approach to enterprise AI than frontier model dependence. The evidence supports three core claims:
- Small Language Models, fine-tuned for specific domains, routinely outperform frontier LLMs. Phi-4 (14B) beats GPT-4o on STEM reasoning. Meerkat-7B surpasses GPT-3.5 (175B) on medical benchmarks. The Densing Law formalizes the trend: capability density doubles every 3.5 months.
- The critical bottleneck in enterprise AI is context, not capability. Fortune 500 companies lose $31.5 billion annually to knowledge-sharing failures. Current approaches — RAG, fine-tuning, agent frameworks, memory startups — each address fragments of this problem, but none provides persistent, temporally aware, organizationally resolved memory.
- Structured memory makes small models viable for enterprise. Mind Graph's pre-retrieved, pre-ranked, token-budgeted context injection reduces the agent's task from unbounded organizational reasoning to bounded contextual synthesis — precisely the condition under which SLMs excel.
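The "token-budgeted" part of the final claim can be made concrete with a greedy packing sketch. The facts and the whitespace token counter below are illustrative assumptions; a production system would use the model's own tokenizer.

```python
def pack_context(ranked_facts, budget_tokens,
                 count_tokens=lambda s: len(s.split())):
    """Greedily pack pre-ranked facts into a fixed token budget.

    `ranked_facts` is assumed to arrive already relevance-ordered from
    retrieval, so packing stops adding a fact once it would overflow the
    budget — the SLM always sees a bounded, highest-value slice of the
    graph rather than an unbounded context.
    """
    packed, used = [], 0
    for fact in ranked_facts:
        cost = count_tokens(fact)
        if used + cost > budget_tokens:
            continue  # skip facts that overflow; smaller ones may still fit
        packed.append(fact)
        used += cost
    return packed

facts = [
    "Dana Lee leads the Acme deal as of 2026-03-30",
    "Acme procurement contact changed roles on 2026-04-05",
    "Acme pilot contract signed 2025-11-12 covering 200 seats",
]
ctx = pack_context(facts, budget_tokens=18)
# first two facts fit (9 + 8 = 17 tokens); the third would exceed the budget
```

Bounding the context this way is what shifts the agent's job from open-ended organizational reasoning to synthesis over a small, pre-vetted set of facts.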
The roommate doesn't need to be the smartest person in the room. They need to be the most attentive. Mind Graph is the attention mechanism that transforms Small Language Models from general-purpose tools into organizational intelligence systems that learn, remember, and evolve with the businesses they serve.
References
- [1] Gartner. "Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025." July 2024.
- [2] McKinsey & Company. "The State of AI in 2025." March 2025.
- [3] Contextual AI. "Why Does Enterprise AI Hallucinate?" 2024.
- [4] Nuclino. "Not Sharing Knowledge Costs Fortune 500 Companies $31.5 Billion a Year." (citing IDC research).
- [5] Stravito. "Organizational Memory Loss: Why It Matters and How to Prevent It."
- [6] Kaplan, J., McCandlish, S., Henighan, T., et al. "Scaling Laws for Neural Language Models." arXiv:2001.08361, 2020.
- [7] Goel, Y., et al. "Position: Enough of Scaling LLMs! Let's Focus on Downscaling."
- [8] Devsu. "LLM API Pricing 2025: What Your Business Needs to Know."
- [9] Nonaka, I. and Takeuchi, H. The Knowledge-Creating Company. Oxford University Press, 1995.
- [10] Lasso Security; Secure Privacy. "LLM Data Privacy" / "Data Privacy Trends 2026."
- [11] Abdin, M., Aneja, J., Behl, H., Bubeck, S., et al. "Phi-4 Technical Report." arXiv:2412.08905, 2024.
- [12] Abdin, M., Agarwal, S., Awadallah, A., et al. "Phi-4-Reasoning Technical Report." Microsoft Research, 2025.
- [13] Wang, H., et al. "Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks." npj Digital Medicine, 2025. arXiv:2404.00376.
- [14] PMC/NIH Research. "Me-LLaMA: Foundation Large Language Models for Medical Applications." PMC, 2024.
- [15] AI4Finance Foundation. "FinGPT: Open-Source Financial Large Language Models." FinLLM@IJCAI 2023. arXiv:2306.06031, 2023.
- [16] Colombo, P., et al. "SaulLM-7B: A Pioneering Large Language Model for Law." arXiv:2403.03883, 2024. See also: SaulLM-54B/141B, NeurIPS 2024.
- [17] Phind. "Fine-tuned CodeLlama Outperforms GPT-4 on HumanEval." The Decoder, 2023.
- [18] Zhao, J., Wang, T., et al. "LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4." arXiv:2405.00732, 2024.
- [19] Xiao, et al. "Densing Law of LLMs." Nature Machine Intelligence, 2025.
- [20] Belcak, P. and Heinrich, G. "Small Language Models are the Future of Agentic AI." NVIDIA Research, arXiv:2506.02153, 2025.
- [21] Epoch AI. "Inference Economics of Language Models." arXiv:2506.04645, 2025.
- [22] Dettmers, T., Pagnoni, A., et al. "QLoRA: Efficient Finetuning of Quantized LLMs." NeurIPS 2023. arXiv:2305.14314.
- [23] Klover.ai. "Apple AI Strategy: Analysis of Dominance in Device Intelligence."
- [24] Meta AI. "Llama 3.2: Revolutionizing Edge AI and Vision with Open, Customizable Models." 2024.
- [25] FinancialContent. "The Edge AI Revolution: How Samsung's Galaxy S26 and Qualcomm's Snapdragon 8 Gen 5 Are Bringing Massive Reasoning Models to Your Pocket." 2026.
- [26] MarketsandMarkets / Polaris Market Research. "Small Language Model Market."
- [27] BusinessWire. "Small Language Models (SLMs) Company Evaluation Report 2025."
- [28] Gao, Y., Xiong, Y., Gao, X., et al. "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997, 2024.
- [29] Liu, N.F., Lin, K., Hewitt, J., et al. "Lost in the Middle: How Language Models Use Long Contexts." TACL, 2024. arXiv:2307.03172.
- [30] "HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation." arXiv:2502.12442, 2025.
- [31] Neo4j. "Knowledge Graph vs. Vector RAG Benchmarks" / FalkorDB. "GraphRAG Accuracy."
- [32] DEV.to. "AI Agent Memory: A Comparative Analysis of LangGraph, CrewAI, and AutoGen."
- [33] Chhikara, P., Khant, D., Aryan, S., et al. "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." arXiv:2504.19413, 2025.
- [34] Packer, C., Wooders, S., Lin, K., et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, 2023.
- [35] Zep. "Temporal Knowledge Graph Architecture for Agent Memory." arXiv:2501.13956, 2025.
- [36] Guo, Z., Xia, L., Yu, Y., et al. "LightRAG: Simple and Fast Retrieval-Augmented Generation." EMNLP 2025. arXiv:2410.05779, 2024.
- [37] Edge, D., et al. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv:2404.16130, 2024.
- [38] Jimenez Gutierrez, B., Shu, et al. "HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs." NeurIPS 2024. arXiv:2405.14831.
- [39] Sumers, T.R., Yao, S., Narasimhan, K., Griffiths, T.L. "Cognitive Architectures for Language Agents." TMLR, 2024. arXiv:2309.02427.
- [40] Mavromatis, C. and Karypis, G. "GNN-RAG: Graph Neural Retrieval for LLM Reasoning." ACL 2025. arXiv:2405.20139.
- [41] Wu, D., et al. "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs." arXiv:2506.09542, 2025.
- [42] Park, J.S., O'Brien, J.C., Cai, C.J., et al. "Generative Agents: Interactive Simulacra of Human Behavior." UIST 2023. arXiv:2304.03442.
- [43] Shinn, N., Cassano, F., Berman, E., et al. "Reflexion: Language Agents with Verbal Reinforcement Learning." NeurIPS 2023. arXiv:2303.11366.
- [44] "MemOS: Memory Operating System for LLMs." arXiv:2505.22101, 2025.
- [45] Cai, B., Xiang, Y., Gao, L., et al. "Temporal Knowledge Graph Completion: A Survey." IJCAI 2023. arXiv:2201.08236.
- [46] Li, C., Xin, M., Yuhao, Z., et al. "A Survey on Temporal Knowledge Graph: Representation Learning and Applications." arXiv:2403.04782, 2024.
- [47] Gulati, J.S., Jaganathan, P., Taneja, M., and Native AI Labs. "Business Learning Models." Native AI Research, 2025. /research/blm