{ "cells": [ { "cell_type": "markdown", "id": "28a8b793", "metadata": { "id": "28a8b793" }, "source": [] }, { "cell_type": "markdown", "id": "551753b7-6cd2-4f81-aec0-da119e4705ad", "metadata": { "id": "551753b7-6cd2-4f81-aec0-da119e4705ad" }, "source": [ "# Finetune Embeddings\n", "\n", "In this notebook, we show how to finetune your own embedding model.\n", "\n", "We go through three main sections:\n", "1. Preparing the data (our `generate_qa_embedding_pairs` function makes this easy)\n", "2. Finetuning the model (using our `SentenceTransformersFinetuneEngine`)\n", "3. Evaluating the model on a validation knowledge corpus" ] }, { "cell_type": "markdown", "id": "99afd542-fc47-44ac-aed0-b3684108dba5", "metadata": { "id": "99afd542-fc47-44ac-aed0-b3684108dba5" }, "source": [ "## Generate Corpus\n", "\n", "First, we create the corpus of text chunks: we use LlamaIndex to load some financial PDFs, then parse and chunk them into plain-text chunks." ] }, { "cell_type": "code", "execution_count": 10, "id": "e973679e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e973679e", "outputId": "efb3658f-ccc0-40f9-8b2f-42cd4150a15e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: llama-index-llms-openai in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (0.2.11)\n", "Requirement already satisfied: llama-index-core<0.12.0,>=0.11.7 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-llms-openai) (0.11.16)\n", "Requirement already satisfied: openai<2.0.0,>=1.40.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-llms-openai) (1.47.0)\n", "Requirement already satisfied: PyYAML>=6.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.7->llama-index-llms-openai) (6.0.2)\n", "Requirement already satisfied: SQLAlchemy>=1.4.49 in 
/Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.12.0,>=0.11.7->llama-index-llms-openai) (2.0.35)\n", "Requirement already satisfied: greenlet!=0.4.17 in 
/Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from SQLAlchemy>=1.4.49->SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.12.0,>=0.11.0->llama-index-finetuning) (3.1.1)\n", "Requirement already satisfied: sympy in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from torch>=2.0.0->llama-index-embeddings-adapter<0.3.0,>=0.2.0->llama-index-finetuning) (1.13.3)\n", "Requirement already satisfied: jinja2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from torch>=2.0.0->llama-index-embeddings-adapter<0.3.0,>=0.2.0->llama-index-finetuning) (3.1.4)\n", "Requirement already satisfied: safetensors>=0.4.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from transformers<5.0.0,>=4.38.0->sentence-transformers>=2.3.0->llama-index-finetuning) (0.4.5)\n", "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from typing-inspect>=0.8.0->llama-index-core<0.12.0,>=0.11.0->llama-index-finetuning) (1.0.0)\n", "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from dataclasses-json->llama-index-core<0.12.0,>=0.11.0->llama-index-finetuning) (3.22.0)\n", "Requirement already satisfied: threadpoolctl>=3.1.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from scikit-learn->sentence-transformers>=2.3.0->llama-index-finetuning) (3.5.0)\n", "Requirement already satisfied: botocore<1.36.0,>=1.35.34 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from boto3<2.0.0,>=1.34.0->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.35.34)\n", "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
boto3<2.0.0,>=1.34.0->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.0.1)\n", "Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from boto3<2.0.0,>=1.34.0->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.10.2)\n", "Requirement already satisfied: cffi>=1.12 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from cryptography>=2.5->azure-identity<2.0.0,>=1.15.0->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (1.17.1)\n", "Requirement already satisfied: PyJWT<3,>=1.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from PyJWT[crypto]<3,>=1.0.0->msal>=1.30.0->azure-identity<2.0.0,>=1.15.0->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (2.9.0)\n", "Requirement already satisfied: portalocker<3,>=1.4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from msal-extensions>=1.2.0->azure-identity<2.0.0,>=1.15.0->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (2.10.1)\n", "Requirement already satisfied: distro<2,>=1.7.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from openai<2.0.0,>=1.40.0->llama-index-llms-openai<0.3.0,>=0.2.1->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (1.9.0)\n", "Requirement already satisfied: jiter<1,>=0.4.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from openai<2.0.0,>=1.40.0->llama-index-llms-openai<0.3.0,>=0.2.1->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (0.5.0)\n", "Requirement already satisfied: cloudpickle==2.2.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.2.1)\n", "Requirement already satisfied: docker in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (7.1.0)\n", "Requirement already satisfied: google-pasta in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.2.0)\n", "Requirement already satisfied: importlib-metadata<7.0,>=1.4.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (6.11.0)\n", "Requirement already satisfied: jsonschema in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.23.0)\n", "Requirement already satisfied: pandas in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.2.3)\n", "Requirement already satisfied: pathos in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.3.3)\n", "Requirement already satisfied: platformdirs in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.3.6)\n", "Requirement 
already satisfied: protobuf<5.0,>=3.12 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.25.5)\n", "Requirement already satisfied: psutil in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (6.0.0)\n", "Requirement already satisfied: sagemaker-core<2.0.0,>=1.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.0.10)\n", "Requirement already satisfied: sagemaker-mlflow in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.1.0)\n", "Requirement already satisfied: schema in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.7.7)\n", "Requirement already satisfied: smdebug-rulesconfig==1.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.0.1)\n", "Requirement already satisfied: tblib<4,>=1.7.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.0.0)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
jinja2->torch>=2.0.0->llama-index-embeddings-adapter<0.3.0,>=0.2.0->llama-index-finetuning) (2.1.5)\n", "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sympy->torch>=2.0.0->llama-index-embeddings-adapter<0.3.0,>=0.2.0->llama-index-finetuning) (1.3.0)\n", "Requirement already satisfied: pycparser in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from cffi>=1.12->cryptography>=2.5->azure-identity<2.0.0,>=1.15.0->llama-index-llms-azure-openai<0.3.0,>=0.2.0->llama-index-finetuning) (2.22)\n", "Requirement already satisfied: zipp>=0.5 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from importlib-metadata<7.0,>=1.4.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.20.2)\n", "Requirement already satisfied: rich<14.0.0,>=13.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker-core<2.0.0,>=1.0.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (13.9.2)\n", "Requirement already satisfied: mock<5.0,>4.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker-core<2.0.0,>=1.0.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.0.3)\n", "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from jsonschema->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2023.12.1)\n", "Requirement already satisfied: referencing>=0.28.4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
jsonschema->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.35.1)\n", "Requirement already satisfied: rpds-py>=0.7.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from jsonschema->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.20.0)\n", "Requirement already satisfied: pytz>=2020.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pandas->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pandas->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2024.1)\n", "Requirement already satisfied: ppft>=1.7.6.9 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pathos->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.7.6.9)\n", "Requirement already satisfied: dill>=0.3.9 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pathos->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.3.9)\n", "Requirement already satisfied: pox>=0.3.5 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pathos->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.3.5)\n", "Requirement already satisfied: multiprocess>=0.70.17 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
pathos->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.70.17)\n", "Requirement already satisfied: mlflow>=2.8 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.16.2)\n", "Requirement already satisfied: mlflow-skinny==2.16.2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.16.2)\n", "Requirement already satisfied: Flask<4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.0.3)\n", "Requirement already satisfied: alembic!=1.10.0,<2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.13.3)\n", "Requirement already satisfied: graphene<4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.3)\n", "Requirement already satisfied: markdown<4,>=3.3 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.7)\n", "Requirement already satisfied: matplotlib<4 in 
/Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.9.2)\n", "Requirement already satisfied: pyarrow<18,>=4.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (17.0.0)\n", "Requirement already satisfied: gunicorn<24 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (23.0.0)\n", "Requirement already satisfied: cachetools<6,>=5.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (5.5.0)\n", "Requirement already satisfied: databricks-sdk<1,>=0.20.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.33.0)\n", "Requirement already satisfied: gitpython<4,>=3.1.9 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.1.43)\n", "Requirement already satisfied: opentelemetry-api<3,>=1.9.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.27.0)\n", "Requirement already satisfied: opentelemetry-sdk<3,>=1.9.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.27.0)\n", "Requirement already satisfied: sqlparse<1,>=0.4.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.5.1)\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.0.0)\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.18.0)\n", "Requirement already satisfied: Mako in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from alembic!=1.10.0,<2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.3.5)\n", "Requirement already satisfied: Werkzeug>=3.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
Flask<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.0.4)\n", "Requirement already satisfied: itsdangerous>=2.1.2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from Flask<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.2.0)\n", "Requirement already satisfied: blinker>=1.6.2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from Flask<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.8.2)\n", "Requirement already satisfied: graphql-core<3.3,>=3.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from graphene<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.2.4)\n", "Requirement already satisfied: graphql-relay<3.3,>=3.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from graphene<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.2.0)\n", "Requirement already satisfied: aniso8601<10,>=8 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from graphene<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (9.0.1)\n", "Requirement already satisfied: mdurl~=0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
markdown-it-py>=2.2.0->rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.0->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.1.2)\n", "Requirement already satisfied: contourpy>=1.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from matplotlib<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.3.0)\n", "Requirement already satisfied: cycler>=0.10 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from matplotlib<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.12.1)\n", "Requirement already satisfied: fonttools>=4.22.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from matplotlib<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.54.1)\n", "Requirement already satisfied: kiwisolver>=1.3.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from matplotlib<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (1.4.7)\n", "Requirement already satisfied: pyparsing>=2.3.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from matplotlib<4->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (3.1.4)\n", "Requirement already satisfied: google-auth~=2.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
databricks-sdk<1,>=0.20.0->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (2.35.0)\n", "Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from gitpython<4,>=3.1.9->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.0.11)\n", "Requirement already satisfied: opentelemetry-semantic-conventions==0.48b0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from opentelemetry-sdk<3,>=1.9.0->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.48b0)\n", "Requirement already satisfied: smmap<6,>=3.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from gitdb<5,>=4.0.1->gitpython<4,>=3.1.9->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (5.0.1)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.4.1)\n", "Requirement already satisfied: rsa<5,>=3.1.4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (4.9)\n", "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pyasn1-modules>=0.2.1->google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.16.2->mlflow>=2.8->sagemaker-mlflow->sagemaker<3.0.0,>=2.232.1->cohere<6.0.0,>=5.1.1->llama-index-postprocessor-cohere-rerank<0.3.0,>=0.2.0->llama-index-finetuning) (0.6.1)\n", "Note: you may need to restart the kernel to use updated packages.\n", "Collecting llama-index-readers-file\n", " Downloading llama_index_readers_file-0.2.2-py3-none-any.whl.metadata (5.4 kB)\n", "Collecting beautifulsoup4<5.0.0,>=4.12.3 (from llama-index-readers-file)\n", " Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)\n", "Requirement already satisfied: llama-index-core<0.12.0,>=0.11.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-readers-file) (0.11.16)\n", "Requirement already satisfied: pandas in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-readers-file) (2.2.3)\n", "Collecting pypdf<5.0.0,>=4.0.1 (from llama-index-readers-file)\n", " Using cached pypdf-4.3.1-py3-none-any.whl.metadata (7.4 kB)\n", "Collecting striprtf<0.0.27,>=0.0.26 (from llama-index-readers-file)\n", " Downloading striprtf-0.0.26-py3-none-any.whl.metadata (2.1 kB)\n", "Collecting soupsieve>1.2 (from beautifulsoup4<5.0.0,>=4.12.3->llama-index-readers-file)\n", " Downloading soupsieve-2.6-py3-none-any.whl.metadata (4.6 kB)\n", "Requirement already satisfied: PyYAML>=6.0.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (6.0.2)\n", "Requirement already satisfied: 
SQLAlchemy>=1.4.49 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.0.35)\n", "Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.10.5)\n", "Requirement already satisfied: dataclasses-json in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.6.7)\n", "Requirement already satisfied: deprecated>=1.2.9.3 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.2.14)\n", "Requirement already satisfied: dirtyjson<2.0.0,>=1.0.8 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.0.8)\n", "Requirement already satisfied: fsspec>=2023.5.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2024.6.1)\n", "Requirement already satisfied: httpx in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.27.2)\n", "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.6.0)\n", "Requirement already satisfied: networkx>=3.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.3)\n", "Requirement already satisfied: nltk>3.8.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.9.1)\n", 
"Requirement already satisfied: numpy<2.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.26.4)\n", "Requirement already satisfied: pillow>=9.0.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (10.4.0)\n", "Requirement already satisfied: pydantic<3.0.0,>=2.7.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.9.2)\n", "Requirement already satisfied: requests>=2.31.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.32.3)\n", "Requirement already satisfied: tenacity!=8.4.0,<9.0.0,>=8.2.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (8.5.0)\n", "Requirement already satisfied: tiktoken>=0.3.3 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.7.0)\n", "Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (4.66.5)\n", "Requirement already satisfied: typing-extensions>=4.5.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (4.12.2)\n", "Requirement already satisfied: typing-inspect>=0.8.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.9.0)\n", "Requirement already satisfied: wrapt in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) 
(1.16.0)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pandas->llama-index-readers-file) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pandas->llama-index-readers-file) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pandas->llama-index-readers-file) (2024.1)\n", "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.4.0)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.3.1)\n", "Requirement already satisfied: attrs>=17.3.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (23.2.0)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.4.1)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (6.1.0)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.11.1)\n", "Requirement already satisfied: click in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
nltk>3.8.1->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (8.1.7)\n", "Requirement already satisfied: joblib in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from nltk>3.8.1->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.4.2)\n", "Requirement already satisfied: regex>=2021.8.3 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from nltk>3.8.1->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2024.9.11)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.7.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.23.4 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.7.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.23.4)\n", "Requirement already satisfied: six>=1.5 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->llama-index-readers-file) (1.16.0)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from requests>=2.31.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.3.2)\n", "Requirement already satisfied: idna<4,>=2.5 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from requests>=2.31.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.10)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from requests>=2.31.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2.2.3)\n", "Requirement already satisfied: certifi>=2017.4.17 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from 
requests>=2.31.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (2024.8.30)\n", "Requirement already satisfied: greenlet!=0.4.17 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from SQLAlchemy>=1.4.49->SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.1.1)\n", "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from typing-inspect>=0.8.0->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.0.0)\n", "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from dataclasses-json->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (3.22.0)\n", "Requirement already satisfied: anyio in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from httpx->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (4.6.0)\n", "Requirement already satisfied: httpcore==1.* in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from httpx->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.0.5)\n", "Requirement already satisfied: sniffio in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from httpx->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (1.3.1)\n", "Requirement already satisfied: h11<0.15,>=0.13 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from httpcore==1.*->httpx->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (0.14.0)\n", "Requirement already satisfied: packaging>=17.0 in /Users/ledger/anaconda3/envs/policy-rag/lib/python3.11/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json->llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file) (24.1)\n", "Downloading llama_index_readers_file-0.2.2-py3-none-any.whl (38 kB)\n", "Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)\n", 
"Using cached pypdf-4.3.1-py3-none-any.whl (295 kB)\n", "Downloading striprtf-0.0.26-py3-none-any.whl (6.9 kB)\n", "Downloading soupsieve-2.6-py3-none-any.whl (36 kB)\n", "Installing collected packages: striprtf, soupsieve, pypdf, beautifulsoup4, llama-index-readers-file\n", "Successfully installed beautifulsoup4-4.12.3 llama-index-readers-file-0.2.2 pypdf-4.3.1 soupsieve-2.6 striprtf-0.0.26\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install llama-index-llms-openai\n", "%pip install llama-index-embeddings-openai\n", "%pip install llama-index-finetuning\n", "%pip install llama-index-readers-file\n", "%pip install optimum[exporters]\n", "%pip install huggingface_hub" ] }, { "cell_type": "code", "execution_count": 1, "id": "9280d438-b6bd-4ccf-a730-7c8bb3ebdbeb", "metadata": { "id": "9280d438-b6bd-4ccf-a730-7c8bb3ebdbeb" }, "outputs": [], "source": [ "import json\n", "\n", "from llama_index.core import SimpleDirectoryReader\n", "from llama_index.core.node_parser import SentenceSplitter\n", "from llama_index.core.schema import MetadataMode" ] }, { "cell_type": "markdown", "id": "73c42620", "metadata": { "id": "73c42620" }, "source": [ "Download Data" ] }, { "cell_type": "code", "execution_count": null, "id": "d8e11b0c", "metadata": { "id": "d8e11b0c" }, "outputs": [], "source": [ "!mkdir -p 'data/10k/'\n", "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'\n", "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'" ] }, { "cell_type": "code", "execution_count": 4, "id": "c5e890bc-557b-4d3c-bede-3e80dfeeee18", "metadata": { "id": "c5e890bc-557b-4d3c-bede-3e80dfeeee18" }, "outputs": [], "source": [ "TRAIN_FILES = 'data'\n", "\n", "TRAIN_CORPUS_FPATH = \"train_corpus.json\"\n", "VAL_CORPUS_FPATH = \"val_corpus.json\"" ] }, { "cell_type": "code", 
"execution_count": 31, "id": "1da871c1-9d58-467a-92fd-06ed3d94534b", "metadata": { "id": "1da871c1-9d58-467a-92fd-06ed3d94534b" }, "outputs": [], "source": [ "def load_corpus(dir, verbose=False):\n", " if verbose:\n", " print(f\"Loading files {dir}\")\n", "\n", " reader = SimpleDirectoryReader(dir)\n", " docs = reader.load_data()\n", " if verbose:\n", " print(f\"Loaded {len(docs)} docs\")\n", "\n", " parser = SentenceSplitter()\n", " nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)\n", "\n", " if verbose:\n", " print(f\"Parsed {len(nodes)} nodes\")\n", "\n", " return nodes" ] }, { "cell_type": "markdown", "id": "53056d8b-3b4c-4364-9b07-a375aa84330b", "metadata": { "id": "53056d8b-3b4c-4364-9b07-a375aa84330b" }, "source": [ "We do a very naive train/val split by having the Lyft corpus as the train dataset, and the Uber corpus as the val dataset." ] }, { "cell_type": "code", "execution_count": 32, "id": "d3651c77-d085-4fbc-bb34-61f143ad6674", "metadata": { "colab": { "referenced_widgets": [ "554a6636780246c8a19d1efe7a6e4786", "6748733283a34725ba6365f3c1fb1c1d" ] }, "id": "d3651c77-d085-4fbc-bb34-61f143ad6674", "outputId": "68e4fa24-3280-46e7-9694-762d6de05628" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading files data\n", "Loaded 307 docs\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Parsing nodes: 100%|██████████| 307/307 [00:00<00:00, 990.90it/s] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Parsed 325 nodes\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "corpus_nodes = load_corpus(TRAIN_FILES, verbose=True)" ] }, { "cell_type": "code", "execution_count": 33, "id": "3456ebde", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Val Nodes: 32\n", "Train Size: 293\n", "Val Size: 32\n" ] } ], "source": [ "n_val_nodes = int(len(corpus_nodes) * 0.1)\n", "print(f'Val Nodes: {n_val_nodes}')\n", "\n", "train_nodes = 
corpus_nodes[:-n_val_nodes]\n", "print(f'Train Size: {len(train_nodes)}')\n", "\n", "val_nodes = corpus_nodes[len(train_nodes):]\n", "print(f'Val Size: {len(val_nodes)}')" ] }, { "cell_type": "code", "execution_count": 34, "id": "87619dee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "32" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(val_nodes)" ] }, { "cell_type": "markdown", "id": "b4482c48-844b-448b-9552-3f38b455645c", "metadata": { "id": "b4482c48-844b-448b-9552-3f38b455645c" }, "source": [ "### Generate synthetic queries\n", "\n", "Now, we use an LLM (gpt-3.5-turbo) to generate questions using each text chunk in the corpus as context.\n", "\n", "Each pair of (generated question, text chunk used as context) becomes a datapoint in the finetuning dataset (either for training or evaluation)." ] }, { "cell_type": "code", "execution_count": 8, "id": "580334ce-ddaa-4cc0-8c3e-7294d11e4d2f", "metadata": { "id": "580334ce-ddaa-4cc0-8c3e-7294d11e4d2f" }, "outputs": [], "source": [ "from llama_index.finetuning import generate_qa_embedding_pairs\n", "from llama_index.core.evaluation import EmbeddingQAFinetuneDataset" ] }, { "cell_type": "code", "execution_count": 9, "id": "666001e2", "metadata": { "id": "666001e2" }, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter Your OpenAI API Key: \")" ] }, { "cell_type": "code", "execution_count": 35, "id": "ef43fe59-a29c-481b-b086-e98e55016d3e", "metadata": { "id": "ef43fe59-a29c-481b-b086-e98e55016d3e", "outputId": "ebcf6b8b-c827-4d2c-ae26-3a1519107d02" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/293 [00:00, tokenizer_name='test_model', max_length=512, pooling=, normalize=True, query_instruction=None, text_instruction=None, cache_folder=None)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from 
huggingface_hub import notebook_login\n", "notebook_login()" ] }, { "cell_type": "code", "execution_count": null, "id": "25b8c07a", "metadata": {}, "outputs": [], "source": [ "!optimum-cli export onnx --model policy_gte_large_5/ policy_gte_large_5/onnx/ --task feature-extraction --trust-remote-code --framework pt" ] }, { "cell_type": "code", "execution_count": null, "id": "c933d0cf", "metadata": {}, "outputs": [], "source": [ "finetune_engine.model.push_to_hub(finetuned_model, local_model_path=finetuned_model, exist_ok=True)" ] }, { "cell_type": "markdown", "id": "828dd6fe-9a8a-419b-8663-56d81ce73774", "metadata": { "id": "828dd6fe-9a8a-419b-8663-56d81ce73774" }, "source": [ "## Evaluate Finetuned Model" ] }, { "cell_type": "markdown", "id": "f4a66b83-4cbb-4374-a632-0f1bb2b785ab", "metadata": { "id": "f4a66b83-4cbb-4374-a632-0f1bb2b785ab" }, "source": [ "In this section, we evaluate 3 different embedding models:\n", "1. the proprietary OpenAI embedding,\n", "2. the open-source `BAAI/bge-small-en`, and\n", "3. our finetuned embedding model.\n", "\n", "We consider 2 evaluation approaches:\n", "1. a simple custom **hit rate** metric\n", "2. using `InformationRetrievalEvaluator` from sentence_transformers\n", "\n", "We show that finetuning on a synthetic (LLM-generated) dataset significantly improves upon an open-source embedding model." 
] }, { "cell_type": "code", "execution_count": null, "id": "57d5176f-1f21-4bcb-adf5-da1c4cccb8d3", "metadata": { "id": "57d5176f-1f21-4bcb-adf5-da1c4cccb8d3" }, "outputs": [], "source": [ "from llama_index.embeddings.openai import OpenAIEmbedding\n", "from llama_index.core import VectorStoreIndex\n", "from llama_index.core.schema import TextNode\n", "from tqdm.notebook import tqdm\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "dda4c2b8-1ad8-420c-83d2-b88e0519895d", "metadata": { "id": "dda4c2b8-1ad8-420c-83d2-b88e0519895d" }, "source": [ "### Define eval function" ] }, { "cell_type": "markdown", "id": "398c24d3-3d72-4ce8-94a4-2da9c1b2605c", "metadata": { "id": "398c24d3-3d72-4ce8-94a4-2da9c1b2605c" }, "source": [ "**Option 1**: We use a simple **hit rate** metric for evaluation:\n", "* for each (query, relevant_doc) pair,\n", "* we retrieve the top-k documents with the query, and\n", "* it's a **hit** if the results contain the relevant_doc.\n", "\n", "This approach is simple and intuitive, and we can apply it to the proprietary OpenAI embedding as well as to our open-source and fine-tuned embedding models." 
] }, { "cell_type": "code", "execution_count": null, "id": "b89401d3-a157-4f96-86d4-212e631a54bc", "metadata": { "id": "b89401d3-a157-4f96-86d4-212e631a54bc" }, "outputs": [], "source": [ "def evaluate(\n", "    dataset,\n", "    embed_model,\n", "    top_k=5,\n", "    verbose=False,\n", "):\n", "    corpus = dataset.corpus\n", "    queries = dataset.queries\n", "    relevant_docs = dataset.relevant_docs\n", "\n", "    nodes = [TextNode(id_=id_, text=text) for id_, text in corpus.items()]\n", "    index = VectorStoreIndex(\n", "        nodes, embed_model=embed_model, show_progress=True\n", "    )\n", "    retriever = index.as_retriever(similarity_top_k=top_k)\n", "\n", "    eval_results = []\n", "    for query_id, query in tqdm(queries.items()):\n", "        retrieved_nodes = retriever.retrieve(query)\n", "        retrieved_ids = [node.node.node_id for node in retrieved_nodes]\n", "        expected_id = relevant_docs[query_id][0]\n", "        is_hit = expected_id in retrieved_ids  # assume 1 relevant doc\n", "\n", "        eval_result = {\n", "            \"is_hit\": is_hit,\n", "            \"retrieved\": retrieved_ids,\n", "            \"expected\": expected_id,\n", "            \"query\": query_id,\n", "        }\n", "        eval_results.append(eval_result)\n", "    return eval_results" ] }, { "cell_type": "markdown", "id": "7eb16251-bb45-4de0-b65a-e15aa76e0f1e", "metadata": { "id": "7eb16251-bb45-4de0-b65a-e15aa76e0f1e" }, "source": [ "**Option 2**: We use the `InformationRetrievalEvaluator` from sentence_transformers.\n", "\n", "This provides a more comprehensive suite of metrics, but we can only run it against sentence-transformers-compatible models (the open-source model and our finetuned model, *not* the OpenAI embedding model)." 
] }, { "cell_type": "code", "execution_count": null, "id": "88e89702-ea35-4c22-99c7-f89a5428ef95", "metadata": { "id": "88e89702-ea35-4c22-99c7-f89a5428ef95" }, "outputs": [], "source": [ "from sentence_transformers.evaluation import InformationRetrievalEvaluator\n", "from sentence_transformers import SentenceTransformer\n", "from pathlib import Path\n", "\n", "\n", "def evaluate_st(\n", "    dataset,\n", "    model_id,\n", "    name,\n", "):\n", "    corpus = dataset.corpus\n", "    queries = dataset.queries\n", "    relevant_docs = dataset.relevant_docs\n", "\n", "    evaluator = InformationRetrievalEvaluator(\n", "        queries, corpus, relevant_docs, name=name\n", "    )\n", "    model = SentenceTransformer(model_id)\n", "    output_path = \"results/\"\n", "    Path(output_path).mkdir(exist_ok=True, parents=True)\n", "    return evaluator(model, output_path=output_path)" ] }, { "cell_type": "markdown", "id": "af2d33dd-c39f-4c05-8adc-65db12163c88", "metadata": { "id": "af2d33dd-c39f-4c05-8adc-65db12163c88" }, "source": [ "### Run Evals" ] }, { "cell_type": "markdown", "id": "c630aa25-2395-4a8b-83cf-2885fbc862f4", "metadata": { "id": "c630aa25-2395-4a8b-83cf-2885fbc862f4" }, "source": [ "#### OpenAI\n", "\n", "Note: this might take a few minutes to run since we have to embed the corpus and queries" ] }, { "cell_type": "code", "execution_count": null, "id": "61a0784f-415e-4d3a-8c88-757b28b9e5df", "metadata": { "id": "61a0784f-415e-4d3a-8c88-757b28b9e5df" }, "outputs": [], "source": [ "ada = OpenAIEmbedding()\n", "ada_val_results = evaluate(val_dataset, ada)" ] }, { "cell_type": "code", "execution_count": null, "id": "ccc73212-fc53-48c1-b347-f5ee3a29ae82", "metadata": { "id": "ccc73212-fc53-48c1-b347-f5ee3a29ae82" }, "outputs": [], "source": [ "df_ada = pd.DataFrame(ada_val_results)" ] }, { "cell_type": "code", "execution_count": null, "id": "25eb61bb-c287-40fe-b3c7-bbfc2d2b1b94", "metadata": { "id": "25eb61bb-c287-40fe-b3c7-bbfc2d2b1b94", "outputId": "44bbd12b-f339-4102-f53e-23fb6bd25392" }, "outputs": [ 
{ "data": { "text/plain": [ "0.8779904306220095" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hit_rate_ada = df_ada[\"is_hit\"].mean()\n", "hit_rate_ada" ] }, { "cell_type": "markdown", "id": "a1bd6c62-65a8-4f72-a67c-d0d62c92d7d1", "metadata": { "id": "a1bd6c62-65a8-4f72-a67c-d0d62c92d7d1" }, "source": [ "### BAAI/bge-small-en" ] }, { "cell_type": "code", "execution_count": null, "id": "24454aeb-9e3e-4954-ab70-647102ed7f82", "metadata": { "colab": { "referenced_widgets": [ "6e9c5f0555f641caa3a5a5d11cb87583", "1fe9a221f8984c818727771d12dfef71", "619c9cae8bf24987a4d3453aa69d24b9", "082cfe7c9f3646948886c90f0e1f4258", "3ff8d7a739fc425abf24076c47c0ab29", "0a5344851cb14ed8a5f788cbd74a90d8", "eaa8bdab99244058b1df3eae12a79b20", "e21b1a35d6c54644be124c357852fedf", "927efec699ea4c929da7214eb51fc64c", "1c8a00d15090422181a9749e0638e883", "3845bc276c88482ba0e2f2fbe317dd78", "7ceca7b6507e42b1b3da10711b37b7ab", "21170e7cf0f9485a9095807a6225aa12", "3712232b7e064486879945c4d4ac5535", "ba1f47ec020447c59d008493b31e0a57" ] }, "id": "24454aeb-9e3e-4954-ab70-647102ed7f82", "outputId": "bb1f9e84-0be2-4cd8-ec8b-407729613185" }, "outputs": [ { "data": { "application/json": { "ascii": false, "bar_format": null, "colour": null, "elapsed": 0.011851787567138672, "initial": 0, "n": 0, "ncols": null, "nrows": 28, "postfix": null, "prefix": "Downloading (…)ab102/.gitattributes", "rate": null, "total": 1519, "unit": "B", "unit_divisor": 1000, "unit_scale": true }, "application/vnd.jupyter.widget-view+json": { "model_id": "6e9c5f0555f641caa3a5a5d11cb87583", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading (…)ab102/.gitattributes: 0%| | 0.00/1.52k [00:00 1\u001b[0m \u001b[43mevaluate_st\u001b[49m\u001b[43m(\u001b[49m\u001b[43mval_dataset\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mBAAI/bge-small-en\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mname\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mbge\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "Cell \u001b[0;32mIn[49], line 15\u001b[0m, in \u001b[0;36mevaluate_st\u001b[0;34m(dataset, model_id, name)\u001b[0m\n\u001b[1;32m 13\u001b[0m evaluator \u001b[38;5;241m=\u001b[39m InformationRetrievalEvaluator(queries, corpus, relevant_docs, name\u001b[38;5;241m=\u001b[39mname)\n\u001b[1;32m 14\u001b[0m model \u001b[38;5;241m=\u001b[39m SentenceTransformer(model_id)\n\u001b[0;32m---> 15\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mevaluator\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moutput_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mresults/\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Programming/gpt_index/.venv/lib/python3.10/site-packages/sentence_transformers/evaluation/InformationRetrievalEvaluator.py:104\u001b[0m, in \u001b[0;36mInformationRetrievalEvaluator.__call__\u001b[0;34m(self, model, output_path, epoch, steps, *args, **kwargs)\u001b[0m\n\u001b[1;32m 102\u001b[0m csv_path \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(output_path, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcsv_file)\n\u001b[1;32m 103\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39misfile(csv_path):\n\u001b[0;32m--> 104\u001b[0m fOut \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mopen\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mcsv_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mmode\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mw\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mencoding\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mutf-8\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 105\u001b[0m fOut\u001b[38;5;241m.\u001b[39mwrite(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m,\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcsv_headers))\n\u001b[1;32m 106\u001b[0m fOut\u001b[38;5;241m.\u001b[39mwrite(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'results/Information-Retrieval_evaluation_bge_results.csv'" ] } ], "source": [ "evaluate_st(val_dataset, \"BAAI/bge-small-en\", name=\"bge\")" ] }, { "cell_type": "markdown", "id": "1fd87550-f547-4b8b-b21a-f72b355e2cd7", "metadata": { "id": "1fd87550-f547-4b8b-b21a-f72b355e2cd7" }, "source": [ "### Finetuned" ] }, { "cell_type": "code", "execution_count": null, "id": "402dd440-1934-4778-8ff5-28e15cf1f2d3", "metadata": { "id": "402dd440-1934-4778-8ff5-28e15cf1f2d3" }, "outputs": [], "source": [ "finetuned = \"local:test_model\"\n", "val_results_finetuned = evaluate(val_dataset, finetuned)" ] }, { "cell_type": "code", "execution_count": null, "id": "ffd24643-17cb-4773-a535-77f3f8fa2d48", "metadata": { "id": "ffd24643-17cb-4773-a535-77f3f8fa2d48" }, "outputs": [], "source": [ "df_finetuned = pd.DataFrame(val_results_finetuned)" ] }, { "cell_type": "code", "execution_count": null, "id": "ec1dccd1-bbd4-427f-a520-b1011643d83b", "metadata": { "id": "ec1dccd1-bbd4-427f-a520-b1011643d83b" }, "outputs": [], "source": [ "hit_rate_finetuned = df_finetuned[\"is_hit\"].mean()\n", "hit_rate_finetuned" ] }, { 
"cell_type": "code", "execution_count": null, "id": "9d8dd38e-f13d-43e1-9802-cc94b854526b", "metadata": { "id": "9d8dd38e-f13d-43e1-9802-cc94b854526b" }, "outputs": [], "source": [ "evaluate_st(val_dataset, \"test_model\", name=\"finetuned\")" ] }, { "cell_type": "markdown", "id": "fbc290bc-5cc3-4ee4-b8ab-e68371441643", "metadata": { "id": "fbc290bc-5cc3-4ee4-b8ab-e68371441643" }, "source": [ "### Summary of Results" ] }, { "cell_type": "markdown", "id": "6f906a11-6a95-4f10-9069-140bf5a56246", "metadata": { "id": "6f906a11-6a95-4f10-9069-140bf5a56246" }, "source": [ "#### Hit rate" ] }, { "cell_type": "code", "execution_count": null, "id": "705fbe3c-2843-4bab-bb5c-16027fc5564b", "metadata": { "id": "705fbe3c-2843-4bab-bb5c-16027fc5564b" }, "outputs": [], "source": [ "df_ada[\"model\"] = \"ada\"\n", "df_bge[\"model\"] = \"bge\"\n", "df_finetuned[\"model\"] = \"fine_tuned\"" ] }, { "cell_type": "markdown", "id": "bebc363c-cd07-4dab-916e-1618d16d1254", "metadata": { "id": "bebc363c-cd07-4dab-916e-1618d16d1254" }, "source": [ "We can see that fine-tuning our small open-source embedding model drastically improve its retrieval quality (even approaching the quality of the proprietary OpenAI embedding)!" 
] }, { "cell_type": "code", "execution_count": null, "id": "57f38b4b-1b40-42da-a054-ea9593d3e602", "metadata": { "id": "57f38b4b-1b40-42da-a054-ea9593d3e602" }, "outputs": [], "source": [ "df_all = pd.concat([df_ada, df_bge, df_finetuned])\n", "df_all.groupby(\"model\")[\"is_hit\"].mean()" ] }, { "cell_type": "markdown", "id": "08094d07-2c0a-44ca-ad2f-8d8bf1387ed9", "metadata": { "id": "08094d07-2c0a-44ca-ad2f-8d8bf1387ed9" }, "source": [ "#### InformationRetrievalEvaluator" ] }, { "cell_type": "code", "execution_count": null, "id": "27d0444e-a824-42d6-9ddb-4da7179902bc", "metadata": { "id": "27d0444e-a824-42d6-9ddb-4da7179902bc" }, "outputs": [], "source": [ "df_st_bge = pd.read_csv(\n", "    \"results/Information-Retrieval_evaluation_bge_results.csv\"\n", ")\n", "df_st_finetuned = pd.read_csv(\n", "    \"results/Information-Retrieval_evaluation_finetuned_results.csv\"\n", ")" ] }, { "cell_type": "markdown", "id": "c0903ed3-df05-4d98-8b0a-6f352c681735", "metadata": { "id": "c0903ed3-df05-4d98-8b0a-6f352c681735" }, "source": [ "We can see that embedding finetuning improves results consistently across the suite of eval metrics." ] }, { "cell_type": "code", "execution_count": null, "id": "81ec1c46-5aa0-4f8a-a0c5-2553e08cceb1", "metadata": { "id": "81ec1c46-5aa0-4f8a-a0c5-2553e08cceb1" }, "outputs": [], "source": [ "df_st_bge[\"model\"] = \"bge\"\n", "df_st_finetuned[\"model\"] = \"fine_tuned\"\n", "df_st_all = pd.concat([df_st_bge, df_st_finetuned])\n", "df_st_all = df_st_all.set_index(\"model\")\n", "df_st_all" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "A100", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }