<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/main/notebooks/03-RAG_with_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables

In [None]:
!pip install -q llama-index==0.10.5 openai==1.12.0 cohere==4.47 tiktoken==0.6.0

In [None]:
import os

# Set OPENAI_API_KEY as an environment variable; the OpenAI client reads it later.
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_KEY>"
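
Hard-coding the key works for a quick run, but it is easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming an interactive session: reuse the key if it is already set in the environment, and otherwise prompt for it with `getpass`. The helper name `ensure_api_key` is hypothetical and not part of any library.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, prompting for it only if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value, so the key does not show up in the cell output.
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once, before the index is built, would leave the variable set for the rest of the session.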

# Load Dataset

## Download

The dataset contains several articles from the Towards AI blog that explain the LLaMA2 model in depth.

In [11]:
!wget https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-dataset.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25361  100 25361    0     0   285k      0 --:--:-- --:--:-- --:--:--  284k


## Read File

In [2]:
import json

# Parse the JSON file into a Python dictionary.
with open('./mini-dataset.json', 'r') as file:
    data = json.load(file)

# The number of chunks in the dataset.
len(data['chunks'])

22

In [3]:
# Flatten the JSON variable to a list of texts.
texts = [item['text'] for item in data['chunks']]

# Generate Embeddings

In [4]:
from llama_index.core import Document

# Convert the texts to Document objects so the LlamaIndex framework can process them.
documents = [Document(text=t) for t in texts]

In [8]:
from llama_index.core import VectorStoreIndex

# Build index / generate embeddings using OpenAI.
index = VectorStoreIndex.from_documents(documents, show_progress=True)

Parsing nodes: 100%|██████████| 22/22 [00:00<00:00, 1539.89it/s]
Generating embeddings: 100%|██████████| 22/22 [00:00<00:00, 26.14it/s]


In [None]:
# Optionally save the index (including the generated embeddings) to disk for reuse.
# index.storage_context.persist(persist_dir="indexes")
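
Persisting the index, as in the commented-out line above, lets a later session reload it instead of paying for the embeddings again. The sketch below assumes the `indexes` directory name from that line and uses `StorageContext.from_defaults` and `load_index_from_storage` from `llama_index.core`. The helper name `load_or_build_index` is hypothetical, and the imports sit inside the function only so the cell can be defined before the packages are installed.

```python
def load_or_build_index(documents, persist_dir="indexes"):
    """Reload a persisted index if one exists; otherwise build it and save it."""
    import os

    from llama_index.core import (
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if os.path.isdir(persist_dir):
        # Reuse the stored embeddings rather than calling the embedding API again.
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # First run: embed the documents, then write the index to disk.
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

With this helper, `index = load_or_build_index(documents)` could stand in for the index-building cell on repeat runs. It does not check whether an existing directory holds a complete index.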

# Query Dataset

In [9]:
# Define a query engine that retrieves the most relevant pieces of text
# and uses an LLM to formulate the final answer.
query_engine = index.as_query_engine()

In [10]:
response = query_engine.query(
    "How many parameters does the LLaMA2 model have?"
)
print(response)

The Llama 2 model has four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.
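
The retrieve-then-answer flow that the query engine is described as performing can be sketched with the standard library alone. This is an illustration of the idea, not LlamaIndex's implementation: word counts stand in for a real embedding model, the three sample chunks are made up for the example, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. The final step, sending the prompt to an LLM, is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the question into a single LLM prompt."""
    joined = "\n---\n".join(context)
    return (
        f"Context:\n{joined}\n\n"
        "Answer the question using only the context.\n"
        f"Question: {query}"
    )


# Made-up sample chunks, for illustration only (not taken from the dataset).
chunks = [
    "Llama 2 is released in several sizes measured in billions of parameters.",
    "Tokenizers split text into smaller units before training.",
    "Fine-tuning adapts a pretrained model to a narrower task.",
]
question = "How many parameters does Llama 2 have?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Only the first sample chunk shares words with the question, so it is the one placed in the prompt. A real embedding model would also match chunks that use different wording for the same idea.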
