“People know when you’re lying!” Those words of wisdom sailed across the airwaves in November of 2023, when Boomi CEO Steve Lucas took the stage at Reuters NEXT, a top-tier event in New York featuring leaders from the worlds of business and government.
His point was that honesty is always the best policy. If bad news must be shared, then take your medicine and get it off your chest. Misleading anyone in the business world is no bueno, for it leads to missed opportunities, false narratives and underperforming teams.
But what of AI and these hallucinations? How can we design systems that foster trust in this powerful tech? The simple answer is data: governed, cleansed, certified data! And that should come as no surprise to the many professionals who’ve plied these waters for years.
In the world of structured data, countless tools have come along for finding, accessing, extracting, transforming, cleansing and loading data. Along the way, data warehouses largely supplanted traditional databases as the centers of gravity for serious information workers.
AI tends to grab its magic from the vast troves of unstructured data: articles, blogs, PDFs, even videos, podcasts and other data types. High-quality AI models draw their energy from this plentiful resource; that’s how ChatGPT, Claude, Mistral and others were trained.
For enterprise use, however, these generic models will only get you so far. Out of the box, the accuracy tends to hover around 80%. So, what part of your business can be wrong 20% of the time? Fulfillment? Accounting? Operations? The obvious answer is none of the above!
So, next question: How can organizations leverage their enterprise data to bring clarity and accuracy to the big-box GenAI models? As always in the world of IT, there are options! The most common approaches involve one or more of the following:
1. Graph Technology: Knowledge Graphs are extremely valuable for providing the kind of rich context necessary for generating valid responses from GenAI engines, especially the relationships between entities such as customers, products, services, costs and the like. By consulting a knowledge graph, the AI can produce responses that align closely with factual knowledge, significantly improving precision and reducing misinformation.
2. Retrieval-Augmented Generation: RAG enhances GenAI by combining the strengths of retrieval systems with generative language models. When prompted, RAG retrieves relevant documents, passages, or information chunks from a trusted source (like databases, documents, or vector embeddings). These retrieved segments are then provided as context to the generative model at inference time, dramatically improving accuracy, relevance, and factual correctness.
3. Fine-Tuning: This approach involves taking a pretrained language model and further training it on specialized, task-specific datasets to better adapt it to specific domains, industries, or use cases. This process tailors the general knowledge encoded in large foundational models specifically toward the user’s domain or desired outcome. By refining the model with targeted examples, fine-tuning helps GenAI produce higher-quality outputs, better understand nuanced terminology, and reduce inaccuracies, improving overall specificity, relevance, and accuracy.
4. Semantics and Ontologies: By no means a new discipline, the use of semantics and ontologies improves GenAI outcomes by defining clear conceptual structures and relationships among concepts in a formalized manner. Ontologies provide a structured vocabulary and hierarchical categorization of domain-specific knowledge, allowing GenAI to understand the precise meaning and relationships of terms, concepts, and entities. By embedding semantic clarity into the model’s knowledge structure, GenAI can more effectively disambiguate meanings, produce nuanced and accurate responses, and ensure alignment with the user’s intended context.
By leveraging one or more of these approaches, organizations can ground the results of GenAI models. Which route you take will depend upon your unique business use cases, your organizational skillsets, budget and time constraints.
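To make the retrieval step of RAG a bit more concrete, here is a minimal sketch in Python. It assumes a tiny in-memory set of trusted documents and uses plain word-count cosine similarity in place of real vector embeddings; the document IDs, policy text, and function names are all hypothetical, and the final call to a generative model is left out on purpose.

```python
import math
import re
from collections import Counter

# Hypothetical "trusted source"; in practice this would be governed enterprise data.
DOCUMENTS = {
    "returns-policy": "Customers may return products within 30 days of delivery for a full refund.",
    "shipping-policy": "Standard shipping takes five business days; express shipping takes two.",
    "warranty-policy": "All hardware products carry a one year limited warranty.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k (doc_id, text) pairs most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question as context."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How many days do customers have to return products?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

The prompt, not the model, is what changes here: the generative model sees the retrieved passage at inference time, so its answer can be checked against a known source. A production setup would likely swap the word-count scoring for embeddings and a vector store.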
Boomi’s Role
Boomi’s data platform plays a foundational role in enabling trusted Generative AI (GenAI) by providing high-quality, integrated, and governed data. Through its integration capabilities, unified data governance, and master data management, Boomi ensures that AI systems operate on accurate, reliable, and meaningful data.
Knowledge Graphs: Boomi’s Data Hub serves as an orchestrated, governed repository for data from multiple systems, which forms an ideal foundation for constructing comprehensive knowledge graphs. By ensuring data is accurate, unified, and consistent across all connected systems, Boomi enables organizations to confidently create knowledge graphs that effectively represent complex relationships and entities. This accuracy and coherence in underlying data are crucial for GenAI systems to navigate and leverage these graphs effectively, significantly improving accuracy and context in generated results.
Fine-Tuning: Fine-tuning large language models (LLMs) or other AI models relies heavily on well-managed, structured data. Boomi provides an ideal environment for collecting, normalizing, and feeding clean, authoritative data into fine-tuning processes. By controlling data quality, traceability, and lineage, Boomi ensures that fine-tuning uses trusted data sets, thus significantly enhancing the relevance, accuracy, and business applicability of GenAI outputs.
Retrieval-Augmented Generation (RAG): RAG combines language models with real-time retrieval of external or organizational data to enhance generative accuracy and reliability. Boomi’s data platform, particularly when combined with its API and integration capabilities, can feed current, trustworthy, and contextual data into RAG workflows. By quickly retrieving accurate and relevant enterprise data at the point of generation, Boomi supports the critical retrieval component of RAG, dramatically improving contextual precision and trustworthiness of GenAI responses.
Semantics and Ontologies: Effective GenAI requires consistent semantic meaning across large data sets. Boomi excels at applying standardized semantics and ontologies through its robust data governance and master data management features. By maintaining rigorous data definitions, standards, and ontological consistency across all integrated sources, Boomi enables GenAI models to operate on semantically aligned data, minimizing misunderstanding, ambiguity, and inaccuracies in AI-driven interactions.
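As a rough illustration of how a knowledge graph and a shared vocabulary work together, the Python sketch below stores a few facts as subject-predicate-object triples and maps synonyms to canonical terms before looking anything up. This is not Boomi's API; the synonym table, the triples, and every function name are hypothetical stand-ins for whatever a governed data hub and ontology would actually supply.

```python
# Hypothetical synonym table standing in for an ontology's controlled vocabulary.
CANONICAL_TERMS = {
    "client": "customer",
    "account": "customer",
    "sku": "product",
    "item": "product",
}

# Hypothetical knowledge graph stored as (subject, predicate, object) triples.
TRIPLES = [
    ("customer:acme", "purchased", "product:widget"),
    ("product:widget", "has_cost", "12.50 USD"),
    ("customer:acme", "located_in", "Dallas"),
]


def normalize(term):
    """Map a term to its canonical form, or return it lowercased if unknown."""
    term = term.strip().lower()
    return CANONICAL_TERMS.get(term, term)


def facts_about(entity_type, name, triples):
    """Return every triple whose subject is the normalized entity."""
    node = f"{normalize(entity_type)}:{name.strip().lower()}"
    return [t for t in triples if t[0] == node]


def as_context(facts):
    """Render triples as plain lines that can be placed in a model prompt."""
    return "\n".join(f"{s} {p} {o}" for s, p, o in facts)


# "Client" and "customer" resolve to the same node, so the lookup still succeeds.
facts = facts_about("Client", "Acme", TRIPLES)
print(as_context(facts))
```

Because "Client", "Account" and "Customer" all resolve to the same node, a question phrased in any of those terms pulls back the same facts, which is the kind of semantic consistency the paragraphs above describe.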
The recent acquisition of Rivery by Boomi brings additional strategic strength to enabling trusted GenAI:
Advanced Data Pipelines for GenAI Readiness: Rivery’s data integration pipelines complement Boomi’s existing capabilities by enabling continuous, automated data flows from diverse sources into AI-ready environments. This means fresher, more accurate, and continuously updated data for fine-tuning models, knowledge graph updates, and real-time retrieval in RAG scenarios.
Real-Time Data Synchronization: Rivery excels at near-real-time synchronization and integration across cloud-based platforms, SaaS applications, and databases. This ensures that data utilized by GenAI models is always current, thus improving the effectiveness of real-time generation methods such as RAG, where immediate and fresh context is vital.
Cloud-Native Scalability: Rivery’s cloud-native architecture ensures scalability, performance, and responsiveness for data integration tasks at enterprise scale. This directly supports AI workloads, which require large-scale, timely data ingestion to maintain accuracy and model performance.
Seamless Semantic Alignment: Rivery supports semantic transformations and data normalization within its pipelines, further complementing Boomi’s governance and master data management capabilities. This ensures that data integrated into Boomi’s data fabric aligns with organizational ontologies, further enhancing semantic coherence and consistency necessary for trusted GenAI.
If this sounds good to you, then get ready for Boomi World in Dallas next month! I’ll be there doing DM Radio onsite, and we’ll have a host of data visionaries in the mix. Hope to see you there! Ping me if you’re coming! More info here: https://bit.ly/3G3rVNc