MongoDB Atlas
Only available on Node.js.
You can still create API routes that use MongoDB with Next.js by setting the runtime variable to nodejs like so:
export const runtime = "nodejs";
You can read more about Edge runtimes in the Next.js documentation here.
LangChain.js supports MongoDB Atlas as a vector store. It supports both standard similarity search and maximal marginal relevance (MMR) search, which first fetches the documents most similar to the input, then reranks them to optimize for diversity.
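To make the reranking step concrete, here is a minimal, dependency-free sketch of greedy maximal marginal relevance over in-memory vectors. The function names (cosineSimilarity, mmrSelect) and the scoring formula are illustrative assumptions, not the library's internal implementation; the sketch assumes that a lambda near 1 favors relevance and a lambda near 0 favors diversity.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy MMR (illustrative): returns indices into `candidates`, in selection order.
// Each round picks the candidate maximizing
//   lambda * similarity(query, candidate) - (1 - lambda) * max similarity to already-selected items.
function mmrSelect(
  query: number[],
  candidates: number[][],
  k: number,
  lambda: number
): number[] {
  const selected: number[] = [];
  const remaining: number[] = candidates.map((_, i) => i);

  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;

    remaining.forEach((candidateIdx, pos) => {
      const relevance = cosineSimilarity(query, candidates[candidateIdx]);
      // Redundancy is 0 until something has been selected.
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(
              ...selected.map((s) =>
                cosineSimilarity(candidates[candidateIdx], candidates[s])
              )
            );
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    });

    selected.push(remaining[bestPos]);
    remaining.splice(bestPos, 1);
  }

  return selected;
}
```

With a query of [1, 0] and candidates [1, 0], [0.99, 0.1], and [0, 1], a lambda of 1 selects the two near-duplicates first, while a lambda of 0.3 skips the near-duplicate in favor of the dissimilar third vector.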
Setup
Installation
First, add the Node MongoDB SDK to your project:
- npm: npm install -S mongodb
- Yarn: yarn add mongodb
- pnpm: pnpm add mongodb
Initial Cluster Configuration
Next, you'll need to create a MongoDB Atlas cluster. Navigate to the MongoDB Atlas website and create an account if you don't already have one.
Create and name a cluster when prompted, then find it under Database. Select Collections and create either a blank collection or one from the provided sample data.
Note: The cluster must run MongoDB 7.0 or higher. If you are using a pre-7.0 version of MongoDB, you must use langchainjs version 0.0.163 or earlier.
Creating an Index
After configuring your cluster, you'll need to create an index on the collection field you want to search over.
Switch to the Atlas Search tab and click Create Search Index. From there, make sure you select Atlas Vector Search - JSON Editor, then select the appropriate database and collection and paste the following into the textbox:
{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    }
  ]
}
Note that the numDimensions property should match the dimensionality of the embeddings you are using.
For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536.
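A dimension mismatch is easy to introduce when switching embedding models, so it can help to generate the index definition from a single constant and check embeddings against it before ingesting. The sketch below is one way to do that; buildVectorIndexDefinition and assertDimensionsMatch are hypothetical helper names for illustration, not part of LangChain.js or the MongoDB driver.

```typescript
// Similarity functions accepted by Atlas Vector Search index definitions.
type VectorSimilarity = "euclidean" | "cosine" | "dotProduct";

interface VectorIndexField {
  type: "vector";
  path: string;
  numDimensions: number;
  similarity: VectorSimilarity;
}

interface VectorIndexDefinition {
  fields: VectorIndexField[];
}

// Hypothetical helper: builds the JSON you would paste into the Atlas JSON editor.
function buildVectorIndexDefinition(
  path: string,
  numDimensions: number,
  similarity: VectorSimilarity = "euclidean"
): VectorIndexDefinition {
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Hypothetical helper: throws if an embedding's length differs from the indexed dimensionality.
function assertDimensionsMatch(
  embedding: number[],
  definition: VectorIndexDefinition,
  path: string = "embedding"
): void {
  const field = definition.fields.find((f) => f.path === path);
  if (field === undefined) {
    throw new Error(`No vector field indexed at path "${path}"`);
  }
  if (embedding.length !== field.numDimensions) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, index expects ${field.numDimensions}`
    );
  }
}

// 1024 matches the Cohere example above; use 1536 for default OpenAI embeddings.
const exampleDefinition = buildVectorIndexDefinition("embedding", 1024);
console.log(JSON.stringify(exampleDefinition, null, 2));
```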
Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text.
You should initialize the vector store with field names matching your index name and collection schema as shown below.
Finally, proceed to build the index.
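To see why the field names must line up with the index, it helps to picture the shape of a stored document. The sketch below assumes the vector store writes the raw text and the embedding under the configured keys and spreads metadata into the top level of the document; that layout is an assumption for illustration, and toStoredDocument is a hypothetical helper rather than a library function.

```typescript
// Hypothetical key configuration mirroring the textKey / embeddingKey options.
interface StoreKeys {
  textKey: string;
  embeddingKey: string;
}

// Hypothetical helper: the assumed shape of one document in the collection.
function toStoredDocument(
  pageContent: string,
  embedding: number[],
  metadata: Record<string, unknown>,
  keys: StoreKeys = { textKey: "text", embeddingKey: "embedding" }
): Record<string, unknown> {
  return {
    ...metadata,
    // The index "path" must point at this field for vector search to find it.
    [keys.embeddingKey]: embedding,
    [keys.textKey]: pageContent,
  };
}

const exampleStored = toStoredDocument("Hello world", [0.12, -0.03, 0.88], { id: 2 });
console.log(Object.keys(exampleStored)); // keys: id, embedding, text
```

If the index path were embedding but the store were configured with a different embeddingKey, the vectors would be written to a field the index never sees.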
Usage
- npm: npm install @langchain/mongodb @langchain/cohere
- Yarn: yarn add @langchain/mongodb @langchain/cohere
- pnpm: pnpm add @langchain/mongodb @langchain/cohere
Ingestion
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorstore = await MongoDBAtlasVectorSearch.fromTexts(
  ["Hello world", "Bye bye", "What's this?"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  new CohereEmbeddings({ model: "embed-english-v3.0" }),
  {
    collection,
    indexName: "default", // The name of the Atlas search index. Defaults to "default"
    textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
    embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
  }
);

// Keep the ids assigned on insert so the same documents can be targeted later
const assignedIds = await vectorstore.addDocuments([
  { pageContent: "upsertable", metadata: {} },
]);

// Passing the same ids overwrites the existing documents instead of adding new ones
const upsertedDocs = [{ pageContent: "overwritten", metadata: {} }];
await vectorstore.addDocuments(upsertedDocs, { ids: assignedIds });

await client.close();
API Reference:
- MongoDBAtlasVectorSearch from @langchain/mongodb
- CohereEmbeddings from @langchain/cohere
Search
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(
  new CohereEmbeddings({ model: "embed-english-v3.0" }),
  {
    collection,
    indexName: "default", // The name of the Atlas search index. Defaults to "default"
    textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
    embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
  }
);

const resultOne = await vectorStore.similaritySearch("Hello world", 1);
console.log(resultOne);

await client.close();
API Reference:
- MongoDBAtlasVectorSearch from @langchain/mongodb
- CohereEmbeddings from @langchain/cohere
Maximal marginal relevance
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(
  new CohereEmbeddings({ model: "embed-english-v3.0" }),
  {
    collection,
    indexName: "default", // The name of the Atlas search index. Defaults to "default"
    textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
    embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
  }
);

const resultOne = await vectorStore.maxMarginalRelevanceSearch("Hello world", {
  k: 4, // The number of documents to return after reranking
  fetchK: 20, // The number of documents to return on initial fetch
});
console.log(resultOne);

// Using MMR in a vector store retriever
// (asRetriever is synchronous, so no await is needed here)
const retriever = vectorStore.asRetriever({
  searchType: "mmr",
  searchKwargs: {
    fetchK: 20,
    lambda: 0.1,
  },
});

const retrieverOutput = await retriever.invoke("Hello world");
console.log(retrieverOutput);

await client.close();
API Reference:
- MongoDBAtlasVectorSearch from @langchain/mongodb
- CohereEmbeddings from @langchain/cohere
Metadata filtering
MongoDB Atlas supports pre-filtering of results on other fields. Pre-filtering requires you to declare the metadata fields you plan to filter on by updating the index definition. Here's an example:
{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    },
    {
      "path": "docstore_document_id",
      "type": "filter"
    }
  ]
}
Above, the first item in fields is the vector index, and the second item is the metadata property you want to filter on.
The property's name is given by path, so the above index would allow us to filter on a metadata field named docstore_document_id.
Then, in your code you can use MQL Query Operators for filtering. Here's an example:
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";
import { sleep } from "langchain/util/time";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(
  new CohereEmbeddings({ model: "embed-english-v3.0" }),
  {
    collection,
    indexName: "default", // The name of the Atlas search index. Defaults to "default"
    textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
    embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
  }
);

await vectorStore.addDocuments([
  {
    pageContent: "Hey hey hey",
    metadata: { docstore_document_id: "somevalue" },
  },
]);

const retriever = vectorStore.asRetriever({
  filter: {
    preFilter: {
      docstore_document_id: {
        $eq: "somevalue",
      },
    },
  },
});

// Mongo has a slight processing delay between ingest and availability
await sleep(2000);

const results = await retriever.invoke("goodbye");
console.log(results);

await client.close();
API Reference:
- MongoDBAtlasVectorSearch from @langchain/mongodb
- CohereEmbeddings from @langchain/cohere
- sleep from langchain/util/time
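For orientation, a pre-filter like the one above is expressed in Atlas as the filter field of a $vectorSearch aggregation stage. The sketch below builds such a pipeline as plain data so its shape is easy to inspect. buildVectorSearchPipeline and its argument type are hypothetical names for illustration, and the numCandidates default of ten times k is an assumption rather than a documented library value.

```typescript
// Hypothetical argument bag for this sketch; not a LangChain.js type.
interface VectorSearchArgs {
  indexName: string;
  embeddingKey: string;
  queryVector: number[];
  k: number;
  numCandidates?: number;
  preFilter?: Record<string, unknown>;
}

// Builds an aggregation pipeline with a $vectorSearch stage and a score projection.
function buildVectorSearchPipeline(
  args: VectorSearchArgs
): Record<string, unknown>[] {
  const stage: Record<string, unknown> = {
    index: args.indexName,
    path: args.embeddingKey,
    queryVector: args.queryVector,
    // Assumption: consider 10x as many candidates as results requested.
    numCandidates: args.numCandidates ?? args.k * 10,
    limit: args.k,
  };

  // Only fields declared with type "filter" in the index can be used here.
  if (args.preFilter !== undefined) {
    stage["filter"] = args.preFilter;
  }

  return [
    { $vectorSearch: stage },
    { $set: { score: { $meta: "vectorSearchScore" } } },
  ];
}

const examplePipeline = buildVectorSearchPipeline({
  indexName: "default",
  embeddingKey: "embedding",
  queryVector: [0.1, 0.2, 0.3], // placeholder; a real query vector has numDimensions entries
  k: 4,
  preFilter: { docstore_document_id: { $eq: "somevalue" } },
});
console.log(JSON.stringify(examplePipeline, null, 2));
```

A pipeline of this shape could be passed to the driver's collection.aggregate(...) call. Filtering on a field that is not declared as a filter field in the index is expected to fail, which is why the index update comes first.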