File size: 4,485 Bytes
4e7f38e 4c7eae9 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 3be2678 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 55d01fd 4e7f38e 3be2678 4e7f38e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 |
---
title: Chatbot-with-MaritacaAI-for-PDFs
emoji: 📚
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
---
# Chatbot with MaritacaAI for PDFs
[Leia este README em Português](README.pt.md)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co./spaces/reichaves/Chatbot-with-MaritacaAI-for-PDFs)
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://chatbotmaritacaai-jkiyzmcjcuvduw5dzymwu5.streamlit.app/)
This project implements a conversational Retrieval-Augmented Generation (RAG) system using Streamlit, LangChain, and large language models from [MaritacaAI](https://www.maritaca.ai/) - a Brazilian startup focused on specializing language models for specific domains and languages - specialized in Brazilian Portuguese. The application allows users to upload PDF documents, ask questions about their content, and maintain a chat history for context in ongoing conversations.
## Author
Reinaldo Chaves ([email protected])
## Features
- Streamlit user interface with dark theme and responsive layout
- Upload and processing of multiple PDF files
- Document processing using LangChain and FAISS
- Answer generation using MaritacaAI's sabia-3 model specialized in Brazilian Portuguese
- Text embeddings using Hugging Face's all-MiniLM-L6-v2 model
- Persistent chat history to maintain conversation context
- Sidebar with important user guidelines
- Token count per response
- Special formatting for legal documents and FOI (Freedom of Information) requests
## Requirements
- Python 3.7+
- Streamlit
- LangChain
- FAISS
- PyPDF2
- MaritalkAI
- HuggingFace Embeddings
- Other dependencies listed in `requirements.txt`
## Installation
1. Clone this repository:
```
git clone https://github.com/reichaves/chatbotmaritacaai.git
cd chatbotmaritacaai
```
2. Install dependencies:
```
pip install streamlit langchain langchain_huggingface maritalk faiss-cpu tenacity cachetools
```
3. Configure the necessary API keys:
- Maritaca AI API key (https://plataforma.maritaca.ai/)
- Hugging Face API token (https://huggingface.co./docs/hub/security-tokens)
## Usage
1. Run the Streamlit application:
```
streamlit run app.py
```
2. Open your browser and access the local address shown in the terminal.
3. Enter your API keys when prompted.
4. Upload one or more PDF files.
5. Ask questions about the documents' content in the text input box.
## How it Works
1. **Document Upload**: Users upload PDF files, which are processed and split into smaller chunks.
2. **Embedding Creation**: The text is converted into embeddings using Hugging Face's all-MiniLM-L6-v2 model.
3. **Vector Storage**: Embeddings are stored in a FAISS database for efficient retrieval.
4. **Question Processing**: User questions are contextualized based on chat history.
5. **Information Retrieval**: The system retrieves the most relevant text chunks based on the question.
6. **Answer Generation**: MaritacaAI's sabia-3 model generates an answer in Brazilian Portuguese based on the retrieved chunks and question.
7. **History Maintenance**: Chat history is maintained to provide context in ongoing conversations.
## Special Features
- Special formatting for legal document analysis
- Detailed processing of Freedom of Information (FOI) documents
- Cache system for better performance
- Robust error handling
- Adaptive interface that maintains conversation context
## Important Notices
- Do not share documents containing sensitive or confidential information
- AI-generated responses may contain errors or inaccuracies
- Always verify information with original sources
- This project is for educational and demonstration purposes
- Use responsibly and in compliance with API usage policies
## Contributions
Contributions are welcome! Please:
1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
[MIT License](LICENSE)
## Citation
If you use this project in your research or application, please cite:
```
@software{chatbot-maritacaai-pdfs,
author = {Reinaldo Chaves},
title = {Chatbot with MaritacaAI for PDFs},
year = {2024},
url = {https://github.com/reichaves/chatbotmaritacaai/}
}
```
|