{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quickstart\n", "\n", "\n", "We'll cover the following here:\n", "\n", "## Installation\n", "\n", "1. Installation with fastembed\n", "2. Installation without fastembed\n", "\n", "## Embedding, Inserting and Querying\n", "\n", "1. `add` and `query` with fastembed\n", "2. Qdrant without fastembed: Points, upsert and query " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "### Installation with fastembed\n", "\n", "Qdrant's Python client can be installed with FastEmbed, an optional dependency for embedding text without handling the embedding model yourself." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!pip install 'qdrant-client[fastembed]' --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize the Client\n", "\n", "We support 3 production options:\n", "\n", "1. Qdrant [Cloud](https://qdrant.to/cloud?utm_source=docs&utm_medium=website&utm_campaign=python-docs&utm_content=article&utm_term=intro) - Recommended for Getting Started\n", "2. Qdrant Managed Deployment with your Cloud Provider - Recommended for Enterprises\n", "3. Qdrant [Self-Hosted with Docker](https://qdrant.tech/documentation/quick-start/) - Recommended for those with specific requirements\n", "\n", "In addition, the Python client provides a local `:memory:` mode built on `numpy`, which is useful for getting a feel for the client syntax." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from qdrant_client import QdrantClient\n", "\n", "# client = QdrantClient(path=\"path/to/db\")  # Persists changes to disk\n", "# or\n", "client = QdrantClient(\":memory:\")  # Keeps everything in memory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Embedding, Inserting and Querying\n", "\n", "We'll use the FastEmbed library to embed text without handling the embedding model.\n", "\n", "### Embedding and Inserting\n", "\n", "For convenience, the `add` API wraps `Point` creation and insertion into a single call. It returns the IDs of the inserted points, which are generated automatically when none are provided." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a3e23385a815464385a7589443f850db', 'd5bef7146f1541518cd767313f6569d5']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Prepare your documents\n", "docs = [\"Qdrant has Langchain integrations\", \"Qdrant also has Llama Index integrations\"]\n", "\n", "# Embed the documents and insert them; IDs are generated automatically\n", "client.add(\n", "    collection_name=\"demo_collection\",\n", "    documents=docs,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you'd like control over the metadata and IDs of the points, you can use the `metadata` and `ids` parameters. 
Here is a quick example:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[42, 2]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Prepare your documents, metadata, and IDs\n", "docs = [\"Qdrant has Langchain integrations\", \"Qdrant also has Llama Index integrations\"]\n", "metadata = [\n", "    {\"source\": \"Langchain-docs\"},\n", "    {\"source\": \"LlamaIndex-docs\"},\n", "]\n", "ids = [42, 2]\n", "\n", "# Insert the documents together with their metadata and IDs\n", "client.add(\n", "    collection_name=\"demo_collection\",\n", "    documents=docs,\n", "    metadata=metadata,\n", "    ids=ids\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying with text directly\n", "\n", "At query time, the incoming query text has to be embedded before we can search for its nearest neighbors. The `query` API does both in one call:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[QueryResponse(id=42, embedding=None, metadata={'document': 'Qdrant has Langchain integrations', 'source': 'Langchain-docs'}, document='Qdrant has Langchain integrations', score=0.8276550115796268)]\n" ] } ], "source": [ "search_result = client.query(\n", "    collection_name=\"demo_collection\",\n", "    query_text=\"This is a query document\",\n", "    limit=1\n", ")\n", "print(search_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Qdrant without `fastembed`\n", "\n", "### Collection\n", "\n", "A collection is a set of points with the same dimensionality and a similarity metric (e.g. Dot, Cosine) defined on it. 
We can create a collection with the `create_collection` method:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from qdrant_client.http.models import Distance, VectorParams\n", "\n", "# Create the collection only if it does not exist yet\n", "if not client.collection_exists(\"test_collection\"):\n", "    client.create_collection(\n", "        collection_name=\"test_collection\",\n", "        vectors_config=VectorParams(size=4, distance=Distance.DOT),\n", "    )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we didn't have to create a collection explicitly when using `fastembed`: `add` creates the collection if it doesn't exist yet and otherwise inserts into the existing one.\n", "\n", "### Points\n", "\n", "A point is a vector of floats with a unique identifier `id` and an optional payload. We can create points with `PointStruct` and insert them with `upsert`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "operation_id=0 status=<UpdateStatus.COMPLETED: 'completed'>\n" ] } ], "source": [ "from qdrant_client.http.models import PointStruct\n", "\n", "# Each point carries an id, a 4-dimensional vector, and a payload\n", "operation_info = client.upsert(\n", "    collection_name=\"test_collection\",\n", "    wait=True,\n", "    points=[\n", "        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={\"city\": \"Berlin\"}),\n", "        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={\"city\": \"London\"}),\n", "        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={\"city\": \"Moscow\"}),\n", "        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={\"city\": \"New York\"}),\n", "        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={\"city\": \"Beijing\"}),\n", "        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={\"city\": \"Mumbai\"}),\n", "    ]\n", ")\n", "print(operation_info)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `add` API also abstracts away this creation of points. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying with Vector" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ScoredPoint(id=2, version=0, score=1.2660000014305115, payload={'city': 'London'}, vector=None)]\n" ] } ], "source": [ "search_result = client.search(\n", " collection_name=\"test_collection\",\n", " query_vector=[0.18, 0.81, 0.75, 0.12], \n", " limit=1\n", ")\n", "print(search_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying with a Filter and Vector" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ScoredPoint(id=2, version=0, score=0.8709999993443489, payload={'city': 'London'}, vector=None)]\n" ] } ], "source": [ "from qdrant_client.http.models import Filter, FieldCondition, MatchValue\n", "\n", "search_result = client.search(\n", " collection_name=\"test_collection\",\n", " query_vector=[0.2, 0.1, 0.9, 0.7], \n", " query_filter=Filter(\n", " must=[\n", " FieldCondition(\n", " key=\"city\",\n", " match=MatchValue(value=\"London\")\n", " )\n", " ]\n", " ),\n", " limit=1\n", ")\n", "print(search_result)" ] } ], "metadata": { "kernelspec": { "display_name": "client", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 2 }