Commit af79894 by qsaheeb (1 parent: dffcab4)
Final changes
README.md ADDED
@@ -0,0 +1,60 @@
=========================================
BOOK RECOMMENDATION SYSTEM
=========================================
## PROJECT OVERVIEW

---

This project is a content-based book recommendation system that suggests books based on their summaries. Given a book title entered by the user, it retrieves similar books using Sentence-BERT (SBERT) embeddings and then re-ranks them with a cross-encoder model.

If the book is not found in the dataset, the system attempts to fetch its summary from the internet using DuckDuckGo Search. The project also includes typo correction to handle minor misspellings in book titles.

A Gradio web application serves as the interface, allowing users to enter book titles and receive recommendations interactively.

---

## FEATURES

1. Typo Correction - Uses fuzzy matching to correct misspelled book titles.
2. Content-Based Recommendations - Finds similar books using SBERT embeddings of their summaries.
3. Re-Ranking with Cross-Encoder - Re-scores the retrieved candidates with a cross-encoder model to improve ranking accuracy.
4. Web Scraping for Missing Books - Fetches a book's summary from the internet when the book is not in the dataset.

---

## PROJECT STRUCTURE

book-recommendation/
|-- data/ -> Contains book summaries and metadata
|   |-- books_summary_cleaned.csv (Preprocessed dataset)
|-- model/ -> Stores precomputed embeddings
|   |-- sbert_embeddings2.pkl (MPNet (BERT) embeddings for books)
|-- preprocess.py -> Preprocesses the book dataset: handles duplicates and missing values, and cleans text
|-- embeddings.py -> Extracts BERT embeddings from book summaries and saves them
|-- app.py -> Main Gradio application to recommend books
|-- requirements.txt -> Dependencies
|-- README.txt -> Project documentation
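
The preprocessing described for preprocess.py (duplicates, missing values, text cleaning) could look roughly like the sketch below. This is only an illustration: the `title` and `summary` field names, the case-insensitive duplicate rule, and the whitespace-only cleaning are assumptions, not the project's actual code.

```python
import re


def clean_text(text):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(rows):
    """Clean a list of book records.

    `rows` is a list of dicts with hypothetical keys "title" and "summary".
    Rows with a missing title or summary are dropped, text is cleaned, and
    only the first row for each (case-insensitive) title is kept.
    """
    seen_titles = set()
    cleaned = []
    for row in rows:
        title = clean_text(row.get("title") or "")
        summary = clean_text(row.get("summary") or "")
        if not title or not summary:
            continue  # missing value
        key = title.lower()
        if key in seen_titles:
            continue  # duplicate title
        seen_titles.add(key)
        cleaned.append({"title": title, "summary": summary})
    return cleaned
```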

---

## HOW IT WORKS

1. User Inputs a Book Title:

- If the book is not found in the dataset, the system searches online for its summary.
- If there's a typo, the system corrects the title before searching.
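
The title-resolution step above can be sketched as follows. This is a minimal illustration, not the project's code: it uses `difflib.get_close_matches` from the standard library as a stand-in for whatever fuzzy matcher the project uses, and `resolve_title`, `fetch_summary_online`, and the `cutoff` value are hypothetical names and settings chosen for the example.

```python
import difflib


def resolve_title(user_title, summaries, fetch_summary_online, cutoff=0.8):
    """Resolve a user-typed title to a (title, summary, source) triple.

    `summaries` maps known titles to their summaries.
    `fetch_summary_online` is a hypothetical callable used only when the
    title is neither in the dataset nor close to a known title.
    """
    lookup = {title.lower(): title for title in summaries}
    query = user_title.strip().lower()

    # 1. Exact (case-insensitive) match in the dataset.
    if query in lookup:
        title = lookup[query]
        return title, summaries[title], "dataset"

    # 2. Fuzzy match to correct small typos.
    close = difflib.get_close_matches(query, list(lookup), n=1, cutoff=cutoff)
    if close:
        title = lookup[close[0]]
        return title, summaries[title], "typo-corrected"

    # 3. Fall back to an online lookup.
    cleaned = user_title.strip()
    return cleaned, fetch_summary_online(cleaned), "web"
```

Passing the online lookup in as a callable keeps the sketch independent of any particular search library.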

2. Retrieve Similar Books using BERT:

- The system encodes the book's summary into a BERT (SBERT) embedding.
- It computes cosine similarity against the precomputed book embeddings and keeps the 10 most similar books.
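
The retrieval step above can be sketched in plain Python. The example assumes the embeddings are already available as lists of floats keyed by title (in the project they come from SBERT). The function names and the toy 3-dimensional vectors in the usage note are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for an all-zero vector
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, book_vecs, k=10, exclude=None):
    """Return the k (title, score) pairs most similar to query_vec.

    `book_vecs` maps titles to embedding vectors; `exclude` is a title to
    skip, typically the query book itself.
    """
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in book_vecs.items()
        if title != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with toy vectors `{"A": [1.0, 0.0, 0.0], "B": [0.9, 0.1, 0.0], "C": [0.0, 1.0, 0.0]}`, querying with A's vector and `exclude="A"` ranks B ahead of C.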

3. Re-Rank Books using a Cross-Encoder:

- A Cross-Encoder model re-scores the retrieved candidates to rank them more accurately.
- The top 5 recommendations are returned.
- This step is optional and significantly increases inference time, but I chose to include it because inference still took less than 3 seconds.
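
The re-ranking step above has roughly the shape sketched below. The scorer is passed in as a function so the example runs without any model: `token_overlap` is a deliberately simple stand-in and is not a cross-encoder. In a real setup the scorer would presumably wrap a cross-encoder that scores each (query summary, candidate summary) pair. `rerank`, `score_pair`, and `token_overlap` are hypothetical names.

```python
def rerank(query_summary, candidates, score_pair, top_n=5):
    """Re-rank candidates by a pairwise relevance score.

    `candidates` is a list of (title, summary) tuples from the retrieval
    step. `score_pair(query_summary, candidate_summary)` returns a float;
    higher means more relevant. Returns the top_n titles, best first.
    """
    scored = [
        (title, score_pair(query_summary, summary))
        for title, summary in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in scored[:top_n]]


def token_overlap(text_a, text_b):
    """Stand-in scorer (Jaccard word overlap), NOT a cross-encoder."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because re-ranking only touches the small candidate list from the retrieval step, its cost grows with the number of candidates rather than the size of the dataset.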

4. Display Logs in Gradio:

- The system logs each step (e.g., typo correction, dataset search, web scraping) and displays the log in the Gradio interface.
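
One simple way to collect per-step logs for display is sketched below. It is an assumption about the approach, not the project's code: `StepLog` and `lookup_with_logs` are hypothetical names, and the log is returned as a plain string that a UI callback could show in a text output.

```python
class StepLog:
    """Collects numbered, human-readable log lines for one request."""

    def __init__(self):
        self.lines = []

    def add(self, message):
        self.lines.append(f"[{len(self.lines) + 1}] {message}")

    def render(self):
        """Return all log lines as one newline-separated string."""
        return "\n".join(self.lines)


def lookup_with_logs(title, known_titles):
    """Check whether a title is in the dataset, logging each step."""
    log = StepLog()
    log.add(f"Received title: {title}")
    found = title in known_titles
    if found:
        log.add("Dataset search: found")
    else:
        log.add("Dataset search: not found, would fall back to web search")
    return found, log.render()
```

A function that returns both a result and the rendered log can have each return value shown in its own output component.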