MongoDB Inc.

09/12/2024 | News release | Distributed by Public on 09/12/2024 09:04

AI Agents, Hybrid Search, and Indexing with LangChain and MongoDB

Since we announced integration with LangChain last year, MongoDB has been building out tooling to help developers create advanced AI applications with LangChain. With recent releases, MongoDB has made it easier to develop agentic AI applications (with a LangGraph integration), perform hybrid search by combining Atlas Search and Atlas Vector Search, and ingest large-scale documents more effectively.

For more on each development, plus new support for the LangChain Indexing API, please read on!

The rise of AI agents

Agentic applications have emerged as a compelling next step in the development of AI. Imagine an application able to act on its own, working towards complicated goals and drawing on context to create a strategy. These applications leverage large language models (LLMs) to dynamically determine their execution path, breaking free from the constraints of traditional, deterministic logic.

Consider an application tasked with answering a question like "In our most profitable market, what is the current weather?" While a traditional retrieval-augmented generation (RAG) app may falter, unable to obtain information about "current weather," an agentic application shines. The application can intelligently deduce the need for an external API call to obtain current weather information, seamlessly integrating this with data retrieved from a vector search to identify the most profitable market.
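To make the idea concrete, here is a minimal sketch of that routing decision in Python. Everything in it is an assumption for illustration: the tool names, the hard-coded "Lisbon" result, and the keyword-based `plan` function, which stands in for the decision an LLM would make. It is not LangGraph code.

```python
from typing import Callable, Dict, List


def find_most_profitable_market(_question: str) -> str:
    # Hypothetical stand-in for a vector search over internal sales data.
    return "Lisbon"


def get_current_weather(city: str) -> str:
    # Hypothetical stand-in for a call to an external weather API.
    return f"Sunny in {city}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "market_lookup": find_most_profitable_market,
    "weather_api": get_current_weather,
}


def plan(question: str) -> List[str]:
    """Stand-in for the LLM: decide which tools to run, and in what order."""
    steps: List[str] = []
    q = question.lower()
    if "profitable market" in q:
        steps.append("market_lookup")
    if "weather" in q:
        steps.append("weather_api")
    return steps


def run_agent(question: str) -> str:
    result = question
    for tool_name in plan(question):
        # The output of each step becomes the input of the next one.
        result = TOOLS[tool_name](result)
    return result


answer = run_agent("In our most profitable market, what is the current weather?")
print(answer)
```

A plain RAG pipeline would stop after the first lookup. The agent chains the retrieved market into a second, external call because the plan says the question needs both.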

These systems take action and gather additional information with limited human intervention, supplementing what they already know. Building such a system is easier than ever thanks to MongoDB's continued work with LangGraph.

Unleashing the power of AI agents with LangGraph and MongoDB

Because it now offers LangGraph, a framework for multi-agent orchestration, LangChain is more effective than ever at simplifying the creation of applications using LLMs, including AI agents. These agents require memory to maintain context across multiple interactions, allowing users to engage with them repeatedly while the agent retains information from previous exchanges.

Basic agentic applications can keep this memory in in-memory structures, but those structures are not sufficient for more complicated use cases. MongoDB allows developers to build stateful, multi-actor applications with LLMs by storing and retrieving the "checkpoints" needed by LangGraph.js. The new MongoDBSaver class makes integration simpler than ever before, as LangGraph.js is able to utilize historical user interactions to enhance agentic AI. By segmenting this history into checkpoints, the library allows for persistent session memory, easier error recovery, and even the ability to "time travel": users can jump back in the graph to a previous state and explore an alternative execution path. The MongoDBSaver class brings all of this functionality to LangGraph.js, with sensible defaults and MongoDB-specific optimization.
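The checkpoint idea can be sketched in a few lines. The example below is written in Python for illustration only and is not the MongoDBSaver API: the class name, method names, and the in-process dictionary are all assumptions, and a real saver would write each snapshot to a MongoDB collection instead.

```python
import copy
from typing import Any, Dict, List


class ToyCheckpointStore:
    """Hypothetical checkpoint store keyed by thread ID (not MongoDBSaver)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Dict[str, Any]]] = {}

    def put(self, thread_id: str, state: Dict[str, Any]) -> int:
        """Save a snapshot of the state and return its checkpoint index."""
        history = self._threads.setdefault(thread_id, [])
        # Deep-copy so later mutations of `state` do not alter the snapshot.
        history.append(copy.deepcopy(state))
        return len(history) - 1

    def get(self, thread_id: str, index: int = -1) -> Dict[str, Any]:
        """Load the latest checkpoint by default, or an earlier one by index."""
        return copy.deepcopy(self._threads[thread_id][index])


store = ToyCheckpointStore()
state = {"messages": ["hi"]}
first = store.put("session-1", state)

state["messages"].append("what's the weather?")
store.put("session-1", state)

latest = store.get("session-1")          # persistent session memory
earlier = store.get("session-1", first)  # "time travel" to a previous state
```

Resuming from `earlier` and continuing with a different input is what lets a user explore an alternative execution without losing the original history.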

To learn more, please visit the source code, the documentation, and our new tutorial (which includes both a written and video version).

Improve retrieval accuracy with Hybrid Search Retriever

Hybrid search is particularly well-suited for queries that have both semantic and keyword-based components. Consider a query such as "find recent scientific papers about climate change impacts on coral reefs that specifically mention ocean acidification." A hybrid search approach would handle it by combining semantic search to identify papers discussing climate change effects on coral ecosystems, keyword matching to ensure "ocean acidification" is mentioned, and potentially date-based filtering or boosting to prioritize recent publications.

This combination allows for more comprehensive and relevant results than either semantic or keyword search alone could provide. With our recent release of Retrievers in LangChain-MongoDB, building such advanced retrieval patterns is more accessible than ever.

Retrievers are how LangChain integrates external data sources into LLM applications. MongoDB has added two new custom, purpose-built Retrievers to the langchain-mongodb Python package, giving developers a unified way to perform hybrid search and full-text search with sensible defaults and extensive code annotation. These new classes make it easier than ever to use the full capabilities of MongoDB Vector Search with LangChain.

The new MongoDBAtlasFullTextSearchRetriever class performs full-text searches, scoring results with the Best Match 25 (BM25) ranking function. The MongoDBAtlasHybridSearchRetriever class builds on this work, combining the above implementation with vector search and fusing the two result lists with the Reciprocal Rank Fusion (RRF) algorithm. The combination of these two techniques is a potent tool for improving the retrieval step of a Retrieval-Augmented Generation (RAG) application, enhancing the quality of the results.
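For readers unfamiliar with RRF, the sketch below shows the general technique in plain Python. It is not the retriever's internal code: the function name and document IDs are made up for illustration, and the constant k = 60 is a commonly used value rather than a setting confirmed here for langchain-mongodb. Each document earns 1 / (k + rank) from every list it appears in, and the sums determine the fused order.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher positions (smaller rank) contribute larger scores.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a full-text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
text_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top, which is why RRF works well without having to normalize the very different score scales of BM25 and vector similarity.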

To find out more, please dive into the MongoDBAtlasHybridSearchRetriever and MongoDBAtlasFullTextSearchRetriever classes.

Seamless synchronization using LangChain Indexing API

In addition to these releases, we're excited to announce that MongoDB now supports the LangChain Indexing API, allowing for seamless loading and synchronization of documents from any source into MongoDB, leveraging LangChain's intelligent indexing features.

This new support will help users avoid duplicate content, minimize unnecessary rewrites, and optimize embedding computations. The LangChain Indexing API's record management system tracks document writes efficiently by computing a hash for each document and storing essential information such as write time and source ID. This feature is particularly valuable for large-scale document processing and retrieval applications, offering flexible cleanup modes to manage documents effectively in MongoDB vector search.
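The bookkeeping described above can be illustrated with a small, self-contained sketch. This is not the LangChain Indexing API: the class, its method, and the cleanup rule are simplified assumptions meant only to show how hashing avoids rewrites and how stale versions from the same source can be removed.

```python
import hashlib
import time
from typing import Dict, List, Tuple


def content_hash(source_id: str, text: str) -> str:
    """Hash a document together with its source so identical writes are detectable."""
    return hashlib.sha256(f"{source_id}\n{text}".encode("utf-8")).hexdigest()


class ToyRecordManager:
    """Hypothetical record manager (not the LangChain class of a similar name)."""

    def __init__(self) -> None:
        # hash -> {"source_id": ..., "written_at": ...}
        self.records: Dict[str, Dict[str, object]] = {}

    def index(self, docs: List[Tuple[str, str]]) -> Dict[str, int]:
        """Index (source_id, text) pairs and report what was written."""
        added = skipped = 0
        seen = set()
        for source_id, text in docs:
            key = content_hash(source_id, text)
            seen.add(key)
            if key in self.records:
                skipped += 1  # unchanged: no rewrite, no new embedding needed
                continue
            self.records[key] = {"source_id": source_id, "written_at": time.time()}
            added += 1
        # Simplified cleanup: drop older versions of the sources seen in this run.
        sources = {source_id for source_id, _ in docs}
        stale = [
            key
            for key, record in self.records.items()
            if record["source_id"] in sources and key not in seen
        ]
        for key in stale:
            del self.records[key]
        return {"added": added, "skipped": skipped, "deleted": len(stale)}


manager = ToyRecordManager()
first_run = manager.index([("a.txt", "hello"), ("b.txt", "world")])
second_run = manager.index([("a.txt", "hello"), ("b.txt", "world!")])
```

On the second run, the unchanged document is skipped, the edited one is written, and its previous version is deleted, so the store never holds duplicate or outdated content for a source.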

To read more about how to use the Indexing API, please visit the LangChain Indexing API Documentation.

We're excited about these LangChain integrations and we hope you are too. Here are some resources to further your learning:

  • Check out our tutorial, available in written and video form, which walks you through building your own JavaScript AI agent with LangGraph.js and MongoDB.

  • Experiment with the hybrid search retrievers to see the power of hybrid search for yourself.

  • Read the previous announcement with LangChain about Semantic Caching.