qsaheeb committed on
Commit
af79894
·
1 Parent(s): dffcab4

Final changes

Files changed (1)
  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ =========================================
+ BOOK RECOMMENDATION SYSTEM
+ =========================================
+ ## PROJECT OVERVIEW
+
+ ---
+
+ This project is a content-based book recommendation system that suggests books based on their summaries. Given a book title from the user, it retrieves similar books using Sentence-BERT (SBERT) embeddings and then re-ranks the candidates with a cross-encoder model.
+
+ If the book is not found in the dataset, the system attempts to fetch its summary from the web using DuckDuckGo Search. It also applies typo correction to handle minor misspellings in book titles.
+
+ A Gradio web application serves as the interface, allowing users to enter book titles and receive recommendations interactively.
+
+ ---
+
+ ## FEATURES
+
+ 1. Typo Correction - Uses fuzzy matching to correct misspelled book titles in user input.
+ 2. Content-Based Recommendations - Finds similar books by comparing SBERT embeddings of their summaries.
+ 3. Re-Ranking with Cross-Encoder - Improves ranking accuracy by re-scoring the retrieved candidates with a cross-encoder model.
+ 4. Web Scraping for Missing Books - Fetches a book's summary from the internet when the book is not in the dataset.
+
+ ---
+
+ ## PROJECT STRUCTURE
+
+ book-recommendation/
+ |-- data/ -> Contains book summaries and metadata
+ |   |-- books_summary_cleaned.csv (Preprocessed dataset)
+ |-- model/ -> Stores precomputed embeddings
+ |   |-- sbert_embeddings2.pkl (MPNet (BERT) embeddings for books)
+ |-- preprocess.py -> Preprocesses the book dataset: removes duplicates, handles missing values, and cleans text
+ |-- embeddings.py -> Extracts BERT embeddings from book summaries and saves them
+ |-- app.py -> Main Gradio application to recommend books
+ |-- requirements.txt -> Dependencies
+ |-- README.md -> Project documentation
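The preprocessing step described for `preprocess.py` (duplicates, missing values, text cleaning) could look roughly like the sketch below. This is an illustration only, not the project's actual script: the column names `title` and `summary`, the helper names, and the specific cleaning rule (whitespace collapsing) are all assumptions.

```python
import csv
import io
import re


def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(rows):
    """Drop rows with a missing title or summary, clean text, de-duplicate by title.

    Assumes each row is a dict with hypothetical 'title' and 'summary' keys.
    """
    seen = set()
    cleaned = []
    for row in rows:
        title = clean_text(row.get("title") or "")
        summary = clean_text(row.get("summary") or "")
        if not title or not summary:
            continue  # missing value: skip the row
        key = title.lower()
        if key in seen:
            continue  # duplicate title (case-insensitive): keep the first one
        seen.add(key)
        cleaned.append({"title": title, "summary": summary})
    return cleaned


# Tiny in-memory CSV standing in for the real dataset file.
raw_csv = io.StringIO(
    "title,summary\n"
    "Dune,A desert   planet saga.\n"
    "dune,Duplicate entry.\n"
    "Emma,\n"
)
books = preprocess(csv.DictReader(raw_csv))
print(books)
```

Only the first "Dune" row survives here: the second is a case-insensitive duplicate and the "Emma" row has no summary.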
+
+ ---
+
+ ## HOW IT WORKS
+
+ 1. User Inputs a Book Title:
+
+ - If there's a typo, the system corrects the title before searching the dataset.
+ - If the book is still not found, the system searches online for its summary.
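The lookup order in this step can be sketched as follows. This is a minimal illustration under stated assumptions: the README does not say which fuzzy-matching library is used, so the standard library's `difflib.get_close_matches` stands in for it, and `resolve_title`, `fetch_summary`, and the `cutoff` value are hypothetical names and settings. The web search is passed in as a plain callable so the sketch runs offline.

```python
import difflib


def resolve_title(query, summaries, fetch_summary, cutoff=0.8):
    """Return (title, summary, source) for a user query.

    summaries: dict mapping dataset titles to their summaries.
    fetch_summary: callable used only when the dataset has no close match
                   (a stand-in for the online search).
    """
    by_lower = {title.lower(): title for title in summaries}
    key = query.strip().lower()

    # 1. Exact (case-insensitive) match in the dataset.
    if key in by_lower:
        title = by_lower[key]
        return title, summaries[title], "dataset"

    # 2. Fuzzy match to correct minor misspellings.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    if close:
        title = by_lower[close[0]]
        return title, summaries[title], "typo-corrected"

    # 3. Fall back to fetching the summary from outside the dataset.
    return query.strip(), fetch_summary(query.strip()), "web"


toy_summaries = {
    "Pride and Prejudice": "A novel of manners set in rural England.",
    "Dune": "A desert planet saga.",
}


def fake_fetch(title):
    # Offline stub; a real fetcher would query a search engine.
    return f"stub summary for {title}"


corrected = resolve_title("Pride and Prejudise", toy_summaries, fake_fetch)
missing = resolve_title("Neuromancer", toy_summaries, fake_fetch)
print(corrected[0], corrected[2])
print(missing[0], missing[2])
```

Injecting the fetcher keeps the dataset lookup testable without network access. The cutoff trades off how aggressively misspellings are corrected against the risk of matching an unrelated title.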
+
+ 2. Retrieve Similar Books using SBERT:
+
+ - The system encodes the book's summary into an SBERT embedding.
+ - It calculates cosine similarity against the precomputed book embeddings to find the top 10 similar books.
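The retrieval step boils down to ranking stored vectors by cosine similarity to the query vector. The sketch below shows that computation in plain Python with toy 3-dimensional vectors; in the real system the vectors would come from an SBERT model. The function names and the toy data are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, embeddings, k=10):
    """Return the k (title, score) pairs with the highest cosine similarity."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for precomputed summary embeddings.
toy_embeddings = {
    "Book A": [1.0, 0.0, 0.0],
    "Book B": [0.9, 0.1, 0.0],
    "Book C": [0.0, 1.0, 0.0],
}
top = top_k_similar([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(top)
```

With thousands of books, the same ranking is usually done as one matrix operation over all embeddings rather than a Python loop, but the ordering it produces is the same.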
+
+ 3. Re-Rank Books using a Cross-Encoder:
+
+ - A cross-encoder model re-scores the retrieved candidates to rank them more accurately.
+ - The top 5 recommendations are returned.
+ - This step is optional and increases inference time significantly, but I chose to include it because inference still took less than 3 seconds.
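The re-ranking step can be sketched as a function that scores each (query summary, candidate summary) pair and keeps the best few. The scorer is injected as a callable so the example runs without a model. The `word_overlap` function below is a toy stand-in and not a cross-encoder, and `rerank`, `score_pair`, and the sample summaries are hypothetical.

```python
def rerank(query_summary, candidates, score_pair, top_n=5):
    """Re-order candidates by a pairwise relevance score and keep the top_n.

    candidates: list of (title, summary) tuples from the retrieval step.
    score_pair: callable(query_summary, candidate_summary) -> float.
    """
    scored = [
        (title, score_pair(query_summary, summary))
        for title, summary in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in scored[:top_n]]


def word_overlap(a, b):
    """Toy stand-in scorer: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


query = "a young wizard attends a school of magic"
candidates = [
    ("Book X", "a detective solves a murder in london"),
    ("Book Y", "a young wizard learns magic at a hidden school"),
    ("Book Z", "a farm boy trains to be a wizard"),
]
best = rerank(query, candidates, word_overlap, top_n=2)
print(best)
```

In a real setup, `score_pair` would wrap a cross-encoder. For example, the sentence-transformers library provides a `CrossEncoder` class whose `predict` method scores a list of text pairs; whether the project uses that class is not stated in this README.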
+
+ 4. Display Logs in Gradio:
+
+ - The system logs each step (e.g., typo correction, dataset search, web scraping) and displays the log in the Gradio interface.
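One simple way to collect per-step logs for display is to accumulate lines in a small object and render them as a single string. The sketch below is an assumption about how this could be done, not the application's actual code; `StepLog` and the log messages are made up for illustration.

```python
class StepLog:
    """Collects one human-readable line per pipeline step."""

    def __init__(self):
        self.lines = []

    def add(self, step, detail):
        # Prefix each line with the step name so the log is easy to scan.
        self.lines.append(f"[{step}] {detail}")

    def render(self):
        # Join into one string, suitable for a multi-line text output.
        return "\n".join(self.lines)


log = StepLog()
log.add("typo correction", "'Pride and Prejudise' -> 'Pride and Prejudice'")
log.add("dataset search", "found in dataset")
print(log.render())
```

The rendered string could then be returned alongside the recommendations from the function that the Gradio interface wraps, so that it appears in a separate text output.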