<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/main/notebooks/12-Improve_Query.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables

In [None]:
!pip install -q llama-index==0.9.21 openai==1.6.0 tiktoken==0.5.2 chromadb==0.4.21 kaleido==0.2.1 python-multipart==0.0.6 cohere==4.39


In [None]:
import os

# Set the "OPENAI_API_KEY" in the Python environment. Will be used by OpenAI client later.
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_KEY>"

In [None]:
import nest_asyncio

nest_asyncio.apply()

# Load a Model

In [None]:
from llama_index.llms import OpenAI

llm = OpenAI(temperature=0.9, model="gpt-3.5-turbo", max_tokens=512)

# Create a Vector Store

In [None]:
import chromadb

# Create a client and a new collection.
# chromadb.PersistentClient stores data on disk at the given path
# (chromadb.EphemeralClient would keep it in memory only).
chroma_client = chromadb.PersistentClient(path="./mini-llama-articles")
chroma_collection = chroma_client.create_collection("mini-llama-articles")

In [None]:
from llama_index.vector_stores import ChromaVectorStore

# Define a storage context object using the created vector database.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Load the Dataset (CSV)

## Download

The dataset contains several articles from the TowardsAI blog that explain the LLaMA2 model in depth. It is distributed as a CSV file and is read row by row, one article per row.

In [None]:
!wget https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv

--2024-02-12 17:09:58--  https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 173646 (170K) [text/plain]
Saving to: ‘mini-llama-articles.csv’


2024-02-12 17:09:58 (5.50 MB/s) - ‘mini-llama-articles.csv’ saved [173646/173646]



## Read File

In [None]:
import csv

rows = []

# Load the file as a CSV
with open("./mini-llama-articles.csv", mode="r", encoding="utf-8") as file:
  csv_reader = csv.reader(file)

  for idx, row in enumerate(csv_reader):
    if idx == 0:
      continue  # Skip header row
    rows.append(row)

# The number of rows (articles) in the dataset.
len(rows)

14
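As a small aside, the header row can also be consumed with `next()` instead of checking the loop index. The sketch below runs on an in-memory sample so it does not touch the downloaded file; the header names and sample values are hypothetical, chosen only to mirror the four columns that the notebook later reads as `row[0]` to `row[3]`.

```python
import csv
import io

# Hypothetical sample data; the real file's header names are not shown in this notebook.
sample_csv = io.StringIO(
    "title,content,url,source_name\n"
    '"Article A","First body, with a comma.",https://example.com/a,example\n'
    '"Article B","Second body.",https://example.com/b,example\n'
)

reader = csv.reader(sample_csv)
header = next(reader)        # consume the header row once
sample_rows = list(reader)   # every remaining row is data

print(header)
print(len(sample_rows))
```

Quoted fields keep embedded commas intact, which is why `csv.reader` is preferable to splitting lines on `,` by hand.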

# Convert to Document Objects

In [None]:
from llama_index import Document

# Convert the rows to Document objects so the LlamaIndex framework can process them.
documents = [Document(text=row[1], metadata={"title": row[0], "url": row[2], "source_name": row[3]}) for row in rows]

# Transforming

In [None]:
from llama_index.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
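To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch. It is not how `TokenTextSplitter` is implemented; it treats whitespace-separated words as tokens (an assumption for illustration, since the real splitter counts model tokens), and `split_with_overlap` is a hypothetical helper name.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, advancing by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Words stand in for tokens here; small numbers keep the output readable.
words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last `chunk_overlap` tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. With the notebook's settings (512 and 128), the window advances 384 tokens at a time under this simplified model.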

In [None]:
from llama_index.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    KeywordExtractor,
)
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        QuestionsAnsweredExtractor(questions=3, llm=llm),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store
)

nodes = pipeline.run(documents=documents, show_progress=True);

Parsing nodes:   0%|          | 0/14 [00:00<?, ?it/s]

464
452
457
465
448
468
434
447
455
445
449
455
431
453


Generating embeddings:   0%|          | 0/108 [00:00<?, ?it/s]

In [None]:
len( nodes )

108

In [None]:
!zip -r vectorstore.zip mini-llama-articles

# Load Indexes

In [None]:
!unzip vectorstore.zip

Archive:  vectorstore.zip
   creating: mini-llama-articles/
   creating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/data_level0.bin  
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/header.bin  
 extracting: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/link_lists.bin  
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/length.bin  
  inflating: mini-llama-articles/chroma.sqlite3  


In [None]:
import chromadb
from llama_index.vector_stores import ChromaVectorStore

# Load the persisted Chroma collection and wrap it in a vector store
db = chromadb.PersistentClient(path="./mini-llama-articles")
chroma_collection = db.get_or_create_collection("mini-llama-articles")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

In [None]:
# Create your index
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_vector_store(vector_store)

# Multi-Step Query Engine

## GPT-4

In [None]:
from llama_index import ServiceContext

gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)

In [None]:
from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform

step_decompose_transform_gpt4 = StepDecomposeQueryTransform(llm=gpt4, verbose=True)

In [None]:
from llama_index.query_engine.multistep_query_engine import MultiStepQueryEngine

query_engine_gpt4 = vector_index.as_query_engine(service_context=service_context_gpt4)
query_engine_gpt4 = MultiStepQueryEngine(
    query_engine=query_engine_gpt4,
    query_transform=step_decompose_transform_gpt4,
    index_summary="Used to answer questions about the LLaMA2 Model",
)

# Query Dataset

## Default

In [None]:
# Define a query engine that retrieves the relevant pieces of text
# and uses an LLM to formulate the final answer.
query_engine = vector_index.as_query_engine()

res = query_engine.query("How many parameters LLaMA2 model has?")

In [None]:
res.response

'The Llama 2 model is available in four different sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.'

In [None]:
for src in res.source_nodes:
  print("Node ID\t", src.node_id)
  print("Title\t", src.metadata['title'])
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 d6f533e5-fef8-469c-a313-def19fd38efe
Title	 Meta's Llama 2: Revolutionizing Open Source Language Models for Commercial Use
Text	 I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.  II. Llama 2 Model Flavors Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been released, the 34B model is still awaited. The pretrained variant, trained on a whopping 2 trillion tokens, boasts a context window of 4096 toke

## GPT-4 Multi-Step

In [None]:
response_gpt4 = query_engine_gpt4.query("How many parameters LLaMA2 model has?")

> Current query: How many parameters LLaMA2 model has?
> New query: What is the LLaMA2 Model?
> Current query: How many parameters LLaMA2 model has?
> New query: None

In [None]:
response_gpt4.response

'LLaMA 2 model has four different sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.'

In [None]:
for src in response_gpt4.source_nodes:
  print("Node ID\t", src.node_id)
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 121c62a4-e30e-481b-9972-b37f4a64f4b5
Text	 
Question: What is the LLaMA2 Model?
Answer: LLaMA 2 is an open-source commercial model that represents a major advancement from its predecessor, LLaMA 1. Unlike LLaMA 1, which was limited to research use, LLaMA 2 can be integrated into products by businesses to create AI-powered applications. It is available on Azure and AWS, which facilitates its fine-tuning and adoption. LLaMA 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. The model has been trained on a large number of tokens and has a context window of 4096 tokens, twice the size of its predecessor. There is also a fine-tuned version of LLaMA 2 for chat applications. However, there are restrictions on its use to prevent exploitation, such as companies with over 700 million active daily users not being allowed to use it, and its output cannot be used to improve other language models.
Score	 None
-_-_-_-_-_-_-_-_-_-_-_-_-

# Test GPT-3.5 Multi-Step

In [None]:
from llama_index import ServiceContext
from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.query_engine.multistep_query_engine import MultiStepQueryEngine

gpt3 = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt3 = ServiceContext.from_defaults(llm=gpt3)

step_decompose_transform_gpt3 = StepDecomposeQueryTransform(llm=gpt3, verbose=True)

query_engine_gpt3 = vector_index.as_query_engine(service_context=service_context_gpt3)
query_engine_gpt3 = MultiStepQueryEngine(
    query_engine=query_engine_gpt3,
    query_transform=step_decompose_transform_gpt3,
    index_summary="Used to answer questions about the LLaMA2 Model",
)

In [None]:
response_gpt3 = query_engine_gpt3.query("How many parameters LLaMA2 model has?")

> Current query: How many parameters LLaMA2 model has?
> New query: None

In [None]:
response_gpt3.response

'Empty Response'
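The two logs above suggest how a multi-step loop behaves: it keeps asking for a follow-up sub-question, answers each one, and stops once the decomposition step returns `None`. The sketch below is a simplified illustration of that control flow using stub functions, not the library's implementation; every function name in it is hypothetical. It shows one plausible reading of the `'Empty Response'`: if the very first sub-question is `None`, there are no question/answer pairs left to synthesize from.

```python
def run_multistep(question, next_subquestion, answer_subquestion, max_steps=3):
    """Collect (sub-question, answer) pairs until the decomposer signals a stop."""
    qa_pairs = []
    for _ in range(max_steps):
        sub_q = next_subquestion(question, qa_pairs)
        if sub_q is None or "none" in sub_q.lower():
            break  # the decomposer has nothing further to ask
        qa_pairs.append((sub_q, answer_subquestion(sub_q)))
    return qa_pairs


def synthesize(qa_pairs):
    """Stand-in for the final LLM synthesis over the collected pairs."""
    if not qa_pairs:
        return "Empty Response"
    return " ".join(answer for _, answer in qa_pairs)


# Stub decomposers that mimic the two logs shown in this notebook.
def decomposer_one_step(question, qa_pairs):
    return "What is the LLaMA2 Model?" if not qa_pairs else "None"


def decomposer_stops_immediately(question, qa_pairs):
    return "None"


def toy_answer(sub_q):
    return f"(answer to: {sub_q})"


question = "How many parameters LLaMA2 model has?"
result_one_step = synthesize(run_multistep(question, decomposer_one_step, toy_answer))
result_no_step = synthesize(run_multistep(question, decomposer_stops_immediately, toy_answer))
print(result_one_step)
print(result_no_step)
```

Under this reading, the quality of the decomposition model matters as much as the retriever: a model that declines to propose a first sub-question leaves the engine with nothing to answer.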

# Test Retriever on Multistep

In [None]:
import llama_index

In [None]:
from llama_index.indices.query.schema import QueryBundle

In [None]:
t = QueryBundle("How many parameters LLaMA2 model has?")

In [None]:
query_engine_gpt3.retrieve(t)

NotImplementedError: This query engine does not support retrieve, use query directly

# HyDE Transform

In [None]:
query_engine = vector_index.as_query_engine()

In [None]:
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

In [None]:
response = hyde_query_engine.query("How many parameters LLaMA2 model has?")

In [None]:
response.response

'The Llama 2 model is available in four different sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.'

In [None]:
for src in response.source_nodes:
  print("Node ID\t", src.node_id)
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 d6f533e5-fef8-469c-a313-def19fd38efe
Text	 I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.  II. Llama 2 Model Flavors Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been released, the 34B model is still awaited. The pretrained variant, trained on a whopping 2 trillion tokens, boasts a context window of 4096 tokens, twice the size of its predecessor Llama 1. Meta also released a Llama 2 fine-tuned

In [None]:
query_bundle = hyde("How many parameters LLaMA2 model has?")

In [None]:
hyde_doc = query_bundle.embedding_strs[0]

In [None]:
hyde_doc

"The LLaMA2 model is a complex machine learning model that is widely used in various fields such as natural language processing and computer vision. It is known for its ability to accurately analyze and understand large amounts of data. When it comes to the number of parameters in the LLaMA2 model, it is important to note that this can vary depending on the specific implementation and configuration. However, in general, the LLaMA2 model typically has a large number of parameters, often in the millions or even billions. These parameters are essential for the model to learn and make predictions based on the input data. They represent the weights and biases that are adjusted during the training process to optimize the model's performance. The high number of parameters in the LLaMA2 model allows it to capture intricate patterns and relationships in the data, leading to more accurate predictions and analysis. However, it also means that training and fine-tuning the model can be computationa
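The hypothetical document printed above is what HyDE embeds in place of (or, with `include_original=True`, alongside) the raw question. The following is a minimal sketch of that idea under loud assumptions: a bag-of-words counter stands in for a real embedding model, a fixed string stands in for the LLM call, and the two texts are combined by summing their vectors. All function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_hypothetical_answer(question):
    # Stand-in for the LLM call that writes a plausible (possibly wrong) answer.
    return "The model is released in several sizes measured in billions of parameters."


def hyde_query_vector(question, include_original=True):
    """Combine the hypothetical answer (and optionally the question) into one vector."""
    texts = [fake_hypothetical_answer(question)]
    if include_original:
        texts.append(question)
    combined = Counter()
    for text in texts:
        combined.update(embed(text))  # summing is enough: cosine ignores scale
    return combined


docs = [
    "Llama 2 is available in sizes of 7 billion, 13 billion and 70 billion parameters.",
    "Chroma stores embeddings on disk when a persistent client is used.",
]
question = "How many parameters LLaMA2 model has?"
query_vec = hyde_query_vector(question)
best = max(docs, key=lambda d: cosine(query_vec, embed(d)))
print(best)
```

The point of the technique is that an answer-shaped text tends to sit closer to answer-bearing passages than a short question does, even when the hypothetical answer itself is vague or inaccurate, as the generated passage above is.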