Editor’s note: David Hughes and Amy Hodler are speakers for ODSC East 2025 this May 13th-15th. Be sure to check out their talk, “Advancing GraphRAG: Text, Images, and Audio for Multimodal Intelligence,” to learn more about GraphRAG!
In the rapidly evolving field of artificial intelligence, Retrieval Augmented Generation (RAG) has emerged as a powerful approach for enhancing AI systems with external knowledge. Building on this foundation, Graph-based RAG (GraphRAG) has demonstrated significant benefits by enriching semantic vector searches with the contextual relationships that graphs provide. But there’s still untapped potential in this domain—particularly when it comes to non-textual data like images and even audio.
In-person conference | May 13th-15th, 2025 | Boston, MA
Join us May 13th-15th, 2025, for 3 days of immersive learning and networking with AI experts.
🔹 World-class AI experts | 🔹 Cutting-edge workshops | 🔹 Hands-on Training
🔹 Strategic Insights | 🔹 Thought Leadership | 🔹 And much more!
The Missing Piece in Our Data Puzzle
Our digital ecosystem is increasingly visual. From medical scans to surveillance footage, from product catalogs to satellite imagery, visual data represents an enormous and growing portion of our information landscape. Yet traditional RAG approaches often leave this rich semantic content untouched, essentially “throwing away a thousand words worth of context” with each neglected image.
This observation sparked our journey into exploring what we call “multimodal GraphRAG” (mmGraphRAG)—a framework designed to seamlessly integrate visual and textual data for more comprehensive insights and more accurate responses.
What Makes mmGraphRAG Different?
At its core, mmGraphRAG combines several sophisticated technologies:
- Embeddings that capture visual and audio semantics – Using models like CLIP to transform images into semantic vectors that capture their meaning
- Graph-based reasoning – Decomposing images and representing relationships between visual elements, objects, colors, and spatial arrangements
- Explainable outcomes – Providing transparent evidence and rationales for why certain images match particular queries
The result is a system that can handle natural language queries like “Find images of bananas on a wooden table” and return not only relevant images but also explanations of why they match, identifying features like “muted yellow” colors or spatial relationships between objects.
mmGraphRAG breaks down components that can be explored (texture, spatial placement, sound elements) individually or together. This blending of semantic context and data (text, visual, and audio) enables reasoning across multiple levels of abstraction and associations.
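To make the associative-search idea concrete, here is a minimal sketch. The vectors, image IDs, and feature lists below are toy stand-ins, not real CLIP embeddings: in the actual system, the query would be embedded by CLIP and the features would come from the graph.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Toy embedding index standing in for CLIP vectors (hypothetical data).
image_index = {
    "img_001": {"vec": [0.9, 0.1, 0.3],
                "features": ["muted yellow", "wooden table", "banana"]},
    "img_002": {"vec": [0.1, 0.8, 0.2],
                "features": ["blue mug", "metal counter"]},
}

def associative_search(query_vec, index, top_k=1):
    """Rank images by similarity and surface their graph-derived features
    so the match can be explained, not just returned."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]["vec"]),
                    reverse=True)
    return [(name, meta["features"]) for name, meta in ranked[:top_k]]

results = associative_search([1.0, 0.0, 0.2], image_index)
```

The returned feature list is what lets the system say *why* an image matched (e.g. “muted yellow” on a “wooden table”) rather than only that it matched.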
[Image: Associative Search Enabled by mmGraphRAG]
The Technical Journey
For the visual data, we started with semantic embedding using OpenAI’s CLIP model to project images into an embedding space suitable for associative search. This foundation was then enhanced through:
- Image decomposition – Breaking images down into constituent objects, spatial relationships, dominant colors, and other features
- Hyperdimensional Computing (HDC) – Moving beyond CLIP’s 512 dimensions to 10,000+ dimensions for richer semantic representation
- Vector storage – Using LanceDB to store hypervectors and manage similarity search
- Graph representation – Using Lamb as an embedded graph database to represent images and their components as interconnected nodes
- Agentic workflow – Implementing the system with BAML to create a production-ready solution that processes user queries
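The decomposition step can be sketched as turning per-image analysis output into property-graph nodes and edges. The record below is a hypothetical example of what a decomposition might produce; the node and edge labels are illustrative, not the system’s actual schema.

```python
# Hypothetical decomposition output for one image (illustrative data).
decomposition = {
    "image": "img_001",
    "objects": ["banana", "table"],
    "colors": {"banana": "muted yellow", "table": "brown"},
    "spatial": [("banana", "on", "table")],
}

def build_graph(d):
    """Turn a decomposition record into (node, edge) sets for a property graph."""
    nodes = {d["image"]} | set(d["objects"]) | set(d["colors"].values())
    edges = [(d["image"], "CONTAINS", obj) for obj in d["objects"]]
    edges += [(obj, "HAS_COLOR", c) for obj, c in d["colors"].items()]
    edges += [(a, rel.upper(), b) for a, rel, b in d["spatial"]]
    return nodes, edges

nodes, edges = build_graph(decomposition)
```

In the real pipeline these nodes and edges would be persisted to the embedded graph database, with the image’s hypervector stored alongside in LanceDB.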
[Image: AI Agents Orchestrated with BAML]
The architecture leverages LanceDB for vector storage and retrieval, with query results feeding into graph database queries that provide the contextual information needed for comprehensive responses.
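The two-stage flow — vector hit first, graph context second — can be sketched as a one-hop neighborhood lookup around the entity the vector search returned. The hit and edge list below are illustrative stand-ins for a LanceDB result and the graph built at ingest time.

```python
def neighbors(edges, node):
    """Collect one-hop context for a node from an edge list,
    following edges in both directions."""
    out = [(rel, dst) for src, rel, dst in edges if src == node]
    inc = [(rel, src) for src, rel, dst in edges if dst == node]
    return out + inc

# Stand-ins for a vector-search hit and the ingest-time graph (illustrative).
vector_hit = "banana"
edges = [
    ("img_001", "CONTAINS", "banana"),
    ("banana", "HAS_COLOR", "muted yellow"),
    ("banana", "ON", "table"),
]

context = neighbors(edges, vector_hit)
```

The retrieved context (color, spatial relation, containing image) is what gets handed to the language model to ground and explain its response.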
Real-World Applications
The power of mmGraphRAG becomes apparent when considering its diverse applications:
- Intellectual Property Search: Comparing new designs against existing patents using both visual and textual similarity
- Medical Imaging: Finding diagnostic images based on specific features or anomalies
- Surveillance: Detecting objects or scenarios in security footage by analyzing spatial relationships
- E-commerce: Enabling precise product searches like “yellow mugs with wooden handles”
- Geospatial Analysis: Searching satellite imagery for specific features like “red-roofed buildings near water”
Looking Forward
But we’re not stopping here. Future directions for mmGraphRAG include:
- Novel feature extraction techniques
- Integration of audio
- Temporal analytics for static images by injecting time-series data derived from audio
- Depth estimation and Z-ordering of segmented objects
- Refined graph schemas for better insight generation
- Exploration of Hyperdimensional Computing in graph applications
A particularly innovative aspect of this work is its potential use of Hyperdimensional Computing, which draws inspiration from how the brain processes information. By using high-dimensional vectors (hypervectors), HDC provides:
- Efficient representation of complex multimodal data
- Robustness when dealing with noise or incomplete information
- Enhanced capacity for capturing relationships between elements
This approach significantly elevates the system’s performance and interpretability, allowing it to bridge diverse data types in meaningful ways.
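The core HDC operations — binding to associate concepts and bundling to superpose them — can be sketched with bipolar hypervectors. This is a generic illustration of the technique, not the system’s implementation; the 10,000-dimensional width matches the figure mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hyperdimensional width, as in the article

def rand_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (element-wise multiply): associates two concepts;
    the result is dissimilar to both inputs."""
    return a * b

def bundle(*hvs):
    """Bundling (majority sum): superposes concepts;
    the result stays similar to each input."""
    return np.sign(np.sum(hvs, axis=0))

def sim(a, b):
    """Normalized similarity in [-1, 1]; near 0 for unrelated vectors."""
    return float(a @ b) / D

# Encode a record like {color: yellow, object: banana} as one hypervector.
color, yellow, obj, banana = rand_hv(), rand_hv(), rand_hv(), rand_hv()
record = bundle(bind(color, yellow), bind(obj, banana))

# Unbinding with the role vector recovers an approximate filler.
recovered = bind(record, color)
```

The recovered vector is noticeably similar to `yellow` and essentially orthogonal to everything else, which is where HDC’s noise robustness comes from: the signal survives even when parts of the record are corrupted or missing.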
Perhaps most intriguingly, mmGraphRAG hints at deeper levels of analysis—for example, decomposing medical images like CT or MRI scans into voxels (3D pixels) that can be modeled as graph nodes with properties and relationships, then projected into 3D space for analysis. Communities of voxels can represent anatomical structures or abnormalities such as tumors, and the graph’s evolution over time can represent disease progression or treatment response.
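A minimal sketch of the voxel-community idea: treat supra-threshold voxels as graph nodes with 6-connectivity edges, and find connected components as candidate structures. The tiny synthetic volume and threshold below are invented for illustration; a real scan would have millions of voxels and a proper community-detection algorithm.

```python
from collections import deque

# Tiny synthetic intensity volume (illustrative stand-in for a CT/MRI scan).
volume = {(0, 0, 0): 0.9, (0, 0, 1): 0.8, (2, 2, 2): 0.95, (1, 1, 1): 0.1}
THRESH = 0.5

def voxel_communities(vol, thresh):
    """Connected components of supra-threshold voxels under 6-connectivity,
    as a crude proxy for graph communities of anatomical structures."""
    nodes = {v for v, x in vol.items() if x >= thresh}
    seen, comms = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, q = set(), deque([start])
        while q:
            v = q.popleft()
            if v in comp:
                continue
            comp.add(v)
            # Visit the 6 axis-aligned neighbors.
            for axis in range(3):
                for step in (-1, 1):
                    nb = tuple(v[i] + (step if i == axis else 0)
                               for i in range(3))
                    if nb in nodes and nb not in comp:
                        q.append(nb)
        seen |= comp
        comms.append(comp)
    return comms

comms = voxel_communities(volume, THRESH)
```

Here the two bright voxels at the origin form one component (a candidate structure) while the isolated bright voxel forms another; tracking how such components grow or shrink across scans is the temporal-evolution idea described above.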
[Image: Brain Graph Using Latent Data and Graph Analytics to Find Patterns]
Why This Matters
In today’s data-rich environment, the ability to seamlessly integrate multiple modalities is increasingly crucial. Traditional search systems that isolate text and visual data into separate silos miss the rich contextual relationships between them.
By unifying semantic and visual reasoning, mmGraphRAG accelerates the discovery of actionable insights, enhances explainability through transparent AI techniques, and provides a more nuanced interpretation of user queries. The framework supports customizable schemas for domain-specific needs and can operate in secure, self-contained environments, making it suitable for privacy-sensitive applications.
As our data landscape continues to diversify, frameworks like mmGraphRAG represent an essential step forward in our ability to extract meaningful insights from complex, multimodal information. By bridging the gap between text, vision, and graphs, we can unlock the full potential of our increasingly visual digital world.
The transformation is clear: from leaving visual data “on the table” to building rich knowledge graphs that capture the full semantic context of our information—mmGraphRAG points the way to a more integrated, nuanced approach to artificial intelligence.
We’re excited to offer this material as a workshop at ODSC East in Boston, complete with architectural details and a notebook for associative search. We look forward to hearing your questions and seeing what you do with mmGraphRAG.
About the Authors/ODSC East Speakers on GraphRAG:
David Hughes is the Principal Data & AI Solution Architect at Enterprise Knowledge. He has 10 years of experience designing and building graph solutions that surface meaningful insights. His background includes clinical practice, medical research, software development, and cloud architecture. David has worked in healthcare and biotech within the intensive care, interventional radiology, oncology, cardiology, and proteomics domains.
Amy Hodler is an evangelist for graph analytics and responsible AI. She’s the co-author of O’Reilly books on Graph Algorithms and Knowledge Graphs, as well as a contributor to the Routledge book Massive Graph Analytics and the Bloomsbury book AI on Trial. Amy has decades of experience in emerging tech at companies such as Microsoft, Hewlett-Packard (HP), Hitachi IoT, Neo4j, Cray, and RelationalAI. Amy is the founder of GraphGeeks.org, which promotes connections everywhere.