<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/main/notebooks/14-Adding_Chat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables

In [None]:
!pip install -q llama-index==0.9.21 openai==1.6.0 tiktoken==0.5.2 chromadb==0.4.21 kaleido==0.2.1 python-multipart==0.0.6 cohere==4.39

In [None]:
import os

# Set the "OPENAI_API_KEY" in the Python environment. Will be used by OpenAI client later.
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_KEY>"

In [None]:
import nest_asyncio

nest_asyncio.apply()

# Load a Model

In [None]:
from llama_index.llms import OpenAI

llm = OpenAI(temperature=0.9, model="gpt-3.5-turbo", max_tokens=512)

# Create a Vector Store

In [None]:
import chromadb

# Create a client and a new collection.
# chromadb.PersistentClient stores the data on disk at the given path;
# use chromadb.EphemeralClient() instead to keep the data in memory only.
chroma_client = chromadb.PersistentClient(path="./mini-llama-articles")
chroma_collection = chroma_client.create_collection("mini-llama-articles")

In [None]:
from llama_index.vector_stores import ChromaVectorStore

# Define a storage context object using the created vector database.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Load the Dataset (CSV)

## Download

The dataset includes several articles from the TowardsAI blog, which provide an in-depth explanation of the LLaMA2 model. The CSV file holds one article per row.

In [None]:
!wget https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv

--2024-02-13 18:53:28--  https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 173646 (170K) [text/plain]
Saving to: ‘mini-llama-articles.csv’


2024-02-13 18:53:29 (1.89 MB/s) - ‘mini-llama-articles.csv’ saved [173646/173646]



## Read File

In [None]:
import csv

rows = []

# Read the CSV file row by row
with open("./mini-llama-articles.csv", mode="r", encoding="utf-8") as file:
  csv_reader = csv.reader(file)

  for idx, row in enumerate(csv_reader):
    if idx == 0:
      continue  # Skip header row
    rows.append(row)

# The number of rows (articles) in the dataset.
len(rows)

14
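
The cell above indexes each row by position, which depends on the column order. As a minimal sketch of an alternative, `csv.DictReader` reads the header itself and lets you access fields by name. The header names and the two sample rows below are hypothetical stand-ins for `mini-llama-articles.csv`; check the real header before adapting this.

```python
import csv
import io

# Hypothetical two-row sample standing in for mini-llama-articles.csv;
# the real header names may differ.
sample_csv = io.StringIO(
    "title,content,url,source\n"
    'First article,"Some text, with a comma",https://example.com/1,example\n'
    "Second article,More text,https://example.com/2,example\n"
)

# DictReader consumes the header row itself and yields one dict per record,
# so there is no need to skip the first row by hand.
records = list(csv.DictReader(sample_csv))

print(len(records))         # number of articles, not characters
print(records[0]["title"])  # access a field by column name
```

Accessing fields by name keeps the later `Document(...)` construction readable if the column order ever changes.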

# Convert to Document Objects

In [None]:
from llama_index import Document

# Convert each row (one article) to a Document object so the LlamaIndex framework can process it.
documents = [Document(text=row[1], metadata={"title": row[0], "url": row[2], "source_name": row[3]}) for row in rows]

# Transforming

In [None]:
from llama_index.text_splitter import TokenTextSplitter

# Define the splitter object that splits the text into segments of 512 tokens,
# with a 128-token overlap between consecutive segments.
text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
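
To make the `chunk_size` and `chunk_overlap` arithmetic concrete, here is a rough sketch of a sliding window with overlap. It is only an illustration of the idea, not the algorithm `TokenTextSplitter` implements: the function name `split_with_overlap` is made up for this example, and whitespace-separated words stand in for real tokens.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Whitespace-separated words stand in for real tokens here (an assumption).
words = [f"w{i}" for i in range(1000)]
chunks = split_with_overlap(words, chunk_size=512, chunk_overlap=128)

print([len(c) for c in chunks])
# The last 128 tokens of one chunk are the first 128 tokens of the next.
print(chunks[0][-128:] == chunks[1][:128])
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, at the cost of embedding some text twice.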

In [None]:
from llama_index.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    KeywordExtractor,
)
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline

# Create the pipeline to apply the transformation on each chunk,
# and store the transformed text in the chroma vector store.
pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        QuestionsAnsweredExtractor(questions=3, llm=llm),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store
)

nodes = pipeline.run(documents=documents, show_progress=True)

Parsing nodes:   0%|          | 0/14 [00:00<?, ?it/s]

464
452
457
465
448
468
434
447
455
445
449
455
431
453


Generating embeddings:   0%|          | 0/108 [00:00<?, ?it/s]

In [None]:
len( nodes )

108

In [None]:
# Compress the vector store directory to a zip file to be able to download and use later.
!zip -r vectorstore.zip mini-llama-articles

# Load Indexes

If you have already uploaded the zip file for the vector store checkpoint, please uncomment the code in the following cell block to extract its contents. After doing so, you will be able to load the dataset from local storage.

In [None]:
# !unzip vectorstore.zip

Archive:  vectorstore.zip
   creating: mini-llama-articles/
   creating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/data_level0.bin  
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/header.bin  
 extracting: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/link_lists.bin  
  inflating: mini-llama-articles/a361e92f-9895-41b6-ba72-4ad38e9875bd/length.bin  
  inflating: mini-llama-articles/chroma.sqlite3  


In [None]:
import chromadb
from llama_index.vector_stores import ChromaVectorStore

# Load the vector store from the local storage.
db = chromadb.PersistentClient(path="./mini-llama-articles")
chroma_collection = db.get_or_create_collection("mini-llama-articles")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

In [None]:
from llama_index import VectorStoreIndex

# Create the index based on the vector store.
vector_index = VectorStoreIndex.from_vector_store(vector_store)

# Display Results

In [None]:
# A simple function to show the response and the sources.
def display_res(response):
  print("Response:\n\t", response.response.replace("\n", "") )

  print("Sources:")
  if response.source_nodes:
    for src in response.source_nodes:
      print("\tNode ID\t", src.node_id)
      print("\tText\t", src.text)
      print("\tScore\t", src.score)
      print("\t" + "-_"*20)
  else:
    print("\tNo sources used!")

# Chat Engine

In [None]:
# Define the chat engine from the index.
# No chat_mode is passed, so the default mode ("best") is used.
chat_engine = vector_index.as_chat_engine()

In [None]:
# First Question:
response = chat_engine.chat("Use the tool to answer, How many parameters LLaMA2 model has?")
display_res(response)

Response:
	 The LLaMA2 model has four different sizes, with 7 billion, 13 billion, 34 billion, and 70 billion parameters.
Sources:
	Node ID	 d6f533e5-fef8-469c-a313-def19fd38efe
	Text	 I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.  II. Llama 2 Model Flavors Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been released, the 34B model is still awaited. The pretrained variant, trained on a whopping 2 trilli

In [None]:
# Second Question:
response = chat_engine.chat("Tell me a joke?")
display_res(response)

Response:
	 I'm sorry, but I don't have the capability to generate jokes. However, I'm here to help answer any questions you may have!
Sources:
	Node ID	 021c859e-809b-49b8-8d0d-38cc326c1203
	Text	 with their larger size, outperform Llama 2, this is expected due to their capacity for handling complex language tasks. Llama 2's impressive ability to compete with larger models highlights its efficiency and potential in the market. However, Llama 2 does face challenges in coding and math problems, where models like Chat GPT 4 excel, given their significantly larger size. Chat GPT 4 performed significantly better than Llama 2 for coding (HumanEval benchmark)and math problem tasks (GSM8k benchmark). Open-source AI technologies, like Llama 2, continue to advance, offering strong competition to closed-source models.  V. Ghost Attention: Enhancing Conversational Continuity One unique feature in Llama 2 is Ghost Attention, which ensures continuity in conversations. This means that even after mul

In [None]:
# Third Question: (check if it can recall previous interactions)
response = chat_engine.chat("What was the first question I asked?")
display_res(response)

Response:
	 The first question you asked was, "How many parameters LLaMA2 model has?"
Sources:
	No sources used!


In [None]:
# Reset the session to clear the memory
chat_engine.reset()

In [None]:
# Fourth Question: (after the reset, the engine no longer recalls the previous interactions)
response = chat_engine.chat("What was the first question I asked?")
display_res(response)

Response:
	 The first question you asked was "What was the first question I asked?"
Sources:
	No sources used!
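
The third and fourth questions behave differently because the engine keeps a message history until `reset()` clears it. The toy class below sketches that idea only. `ToyChatSession` is a hypothetical stand-in, not a LlamaIndex class, and its reply is hard-coded instead of coming from an LLM.

```python
class ToyChatSession:
    """Hypothetical stand-in for a chat engine's memory; not a LlamaIndex class."""

    def __init__(self):
        self.history = []  # list of (role, text) tuples

    def chat(self, message):
        self.history.append(("user", message))
        # A real engine would send the whole history to the LLM here.
        first_user_message = next(
            text for role, text in self.history if role == "user"
        )
        reply = f"Your first question was: {first_user_message!r}"
        self.history.append(("assistant", reply))
        return reply

    def reset(self):
        # Forget every previous turn.
        self.history.clear()


session = ToyChatSession()
session.chat("How many parameters does the model have?")
before_reset = session.chat("What was the first question I asked?")

session.reset()
after_reset = session.chat("What was the first question I asked?")

print(before_reset)
print(after_reset)
```

After the reset, the only message left in the history is the current one, which matches the answer the real engine gave to the fourth question.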


# Streaming

In [None]:
# Stream the words as soon as they are available instead of waiting for the model to finish generation.
streaming_response = chat_engine.stream_chat("Write a paragraph about the LLaMA2 model's capabilities.")
for token in streaming_response.response_gen:
    print(token, end="")

Querying with: What are the capabilities of the LLaMA2 model?
The capabilities of the Llama 2 model include its ability to be integrated into AI-powered applications for commercial use, its availability on Azure and AWS for fine-tuning and adoption, and its impressive performance in terms of scale and efficiency. The model is available in different sizes, ranging from 7 billion to 70 billion parameters, with a context window of 4096 tokens. Llama 2 also prioritizes safety and alignment, demonstrating low AI safety violation percentages and surpassing ChatGPT in safety benchmarks. Additionally, Llama 2 has features such as Ghost Attention, which enhances conversational continuity, and a temporal capability that organizes information based on time relevance, resulting in more contextually accurate responses.
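
If the full text is also needed once streaming finishes, the pieces can be collected while they are printed. This is a small sketch of that pattern; `fake_token_stream` is a hypothetical generator standing in for `streaming_response.response_gen`, so it runs without an API key.

```python
def fake_token_stream():
    """Hypothetical stand-in for streaming_response.response_gen."""
    for piece in ["Llama 2 ", "comes in ", "several ", "sizes."]:
        yield piece


pieces = []
for token in fake_token_stream():
    print(token, end="", flush=True)  # show each piece as soon as it arrives
    pieces.append(token)              # keep it for later use
print()

# Join once at the end instead of concatenating strings inside the loop.
full_text = "".join(pieces)
```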

## Condense Question

In this mode, the engine rewrites the user's message into a standalone question by looking at the previous chat history along with the present question. The rewritten question is then used to fetch the nodes.
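
The rewriting step can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the template wording, the helper names `build_condense_prompt` and `condense`, and the `stub_llm` function that replaces a real model call. LlamaIndex uses its own prompt and classes internally.

```python
# Hypothetical prompt template; LlamaIndex ships its own wording.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up message, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history, question):
    """Format (role, text) pairs and the new message into one prompt string."""
    lines = [f"{role}: {text}" for role, text in history]
    return CONDENSE_TEMPLATE.format(history="\n".join(lines), question=question)


def condense(history, question, llm):
    """Ask the LLM (any callable taking a prompt string) for a standalone question."""
    return llm(build_condense_prompt(history, question)).strip()


def stub_llm(prompt):
    """Stub so the sketch runs without an API key; a real LLM call goes here."""
    return " What is the Llama 2 model useful for? "


history = [
    ("user", "Which company released Llama 2?"),
    ("assistant", "Meta released it."),
]
standalone = condense(history, "What is it useful for?", stub_llm)
print(standalone)
```

The standalone question no longer depends on the pronoun "it", so it can be embedded and matched against the vector store on its own.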

In [None]:
# Define GPT-4 model that will be used by the chat_engine to improve the query.
gpt4 = OpenAI(temperature=0.9, model="gpt-4")

In [None]:
chat_engine = vector_index.as_chat_engine(chat_mode="condense_question", llm=gpt4, verbose=True)

In [None]:
response = chat_engine.chat("Use the tool to answer, which company released LLaMA2 model? What is the model useful for?")
display_res(response)

Querying with: Which company released the LLaMA2 model and what is the model useful for?
Response:
	 Meta AI released the Llama 2 model. The model is useful for creating AI-powered applications for commercial use.
Sources:
	Node ID	 d6f533e5-fef8-469c-a313-def19fd38efe
	Text	 I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.  II. Llama 2 Model Flavors Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been rele

## ReAct

ReAct is an agent-based chat mode. On each turn, the agent runs a loop in which it decides whether to query the data engine or to answer directly. This makes the mode flexible, but the quality of the responses depends on the quality of the Large Language Model, so the agent needs careful management to avoid inaccurate answers.
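
A stripped-down version of such a loop might look like the sketch below. It is not LlamaIndex's implementation: `lookup_tool`, `scripted_llm`, and `react_loop` are hypothetical names, the "LLM" is a scripted stub, and the text format is only modelled on the Thought/Action/Observation pattern.

```python
import re


def lookup_tool(query):
    """Hypothetical tool standing in for the query engine."""
    facts = {"who released llama 2": "Meta released Llama 2."}
    return facts.get(query.lower().rstrip("?"), "No information found.")


def scripted_llm(transcript):
    """Stub LLM: requests the tool once, then answers from the observation."""
    if "Observation:" not in transcript:
        return (
            "Thought: I need to use a tool.\n"
            "Action: lookup\n"
            "Action Input: Who released Llama 2?"
        )
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I can answer now.\nResponse: {observation}"


def react_loop(question, llm, tools, max_steps=5):
    """Alternate between LLM output and tool calls until a Response appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)
        transcript += output + "\n"
        final = re.search(r"Response: (.*)", output, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action: (\w+)\nAction Input: (.*)", output)
        if action is None:
            break  # the model produced neither an action nor a response
        tool_name, tool_input = action.group(1), action.group(2).strip()
        # Feed the tool result back so the next LLM call can use it.
        transcript += f"Observation: {tools[tool_name](tool_input)}\n"
    return "No answer within the step limit."


answer = react_loop("Who released Llama 2?", scripted_llm, {"lookup": lookup_tool})
print(answer)
```

The `max_steps` cap is one simple form of the careful management mentioned above: it stops the loop if the model never produces a final response.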

In [None]:
chat_engine = vector_index.as_chat_engine(chat_mode="react", verbose=True)

In [None]:
response = chat_engine.chat("Which company released LLaMA2 model? What is the model useful for?")

Thought: I need to use a tool to help me answer the question.
Action: query_engine_tool
Action Input: {'input': 'Which company released LLaMA2 model?'}
Observation: Meta released the LLaMA2 model.
Thought: I need to use a tool to help me answer the second question.
Action: query_engine_tool
Action Input: {'input': 'What is the LLaMA2 model useful for?'}
Observation: The LLaMA2 model is useful for creating AI-powered applications in commercial settings. It can be integrated into products to enable businesses to develop AI-powered applications.
Thought: I can answer without using any more tools.
Response: The LLaMA2 model was released by Meta. It is useful for creating AI-powered applications in commercial settings and can be integrated into products to enable businesses to develop AI-powered applications.

In [None]:
display_res(response)

Response:
	 The LLaMA2 model was released by Meta. It is useful for creating AI-powered applications in commercial settings and can be integrated into products to enable businesses to develop AI-powered applications.
Sources:
	Node ID	 8aa510a2-b741-4d55-b661-366c3c5cb681
	Text	 the question, "How long ago did Barack Obama become president?", its only relevant after 2008. This temporal awareness allows Llama 2 to deliver more contextually accurate responses, enriching the user experience further.  VII. Open Questions and Future Outlook Meta's open-sourcing of Llama 2 represents a seismic shift, now offering developers and researchers commercial access to a leading language model. With Llama 2 outperforming MosaicML's current MPT models, all eyes are on how Databricks will respond. Can MosaicML's next MPT iteration beat Llama 2? Is it worthwhile to compete with Llama 2 or join hands with the open-source community to make the open-source models better? Meanwhile, Microsoft's move to host