---
title: Qwen2 Colpali Ocr
emoji: 🔥
colorFrom: yellow
colorTo: gray
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
---
# Qwen2-Colpali-OCR
This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.
It is deployed on Hugging Face Spaces: [https://huggingface.co./spaces/clayton07/qwen2-colpali-ocr](https://huggingface.co./spaces/clayton07/qwen2-colpali-ocr)
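ColPali-style retrieval ranks pages with late-interaction (MaxSim) scoring: each query-token embedding is matched against every image-patch embedding of a page, and the per-token maxima are summed. A toy sketch of that scoring with made-up 2-D embeddings (illustrative only, not the app's actual code):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score as used by ColPali-style retrievers:
    for each query-token vector, take its best match among the document's
    patch vectors, then sum those maxima."""
    sims = query_vecs @ doc_vecs.T      # (n_query_tokens, n_patches) similarities
    return float(sims.max(axis=1).sum())  # best patch per query token, summed

# Toy example: 2 query tokens, two candidate "pages" with 3 patches each.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 0.2]])
page_b = np.array([[0.2, 0.1], [0.1, 0.3], [0.0, 0.1]])

# Page A matches both query tokens better, so it ranks first.
print(maxsim_score(query, page_a))  # 1.5
print(maxsim_score(query, page_b))  # 0.5
```

In the real app, the query and patch embeddings come from the ColPali model rather than hand-written vectors, but the ranking rule is the same.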
## Prerequisites
- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
## Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/multimodal-rag-app.git
cd multimodal-rag-app
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```
## Running the Application Locally
1. Ensure you're in the project directory and your virtual environment is activated.
2. Run the Streamlit app:
```bash
streamlit run app.py
```
3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
## Features
- Image upload or selection of an example image
- Text-based querying of uploaded images
- Multimodal RAG processing using a custom ColPali-based retriever and Qwen2-VL
- Adjustable response length
## Usage
1. Choose to upload an image or use the example image.
2. If uploading, select an image file (PNG, JPG, or JPEG).
3. Enter a text query about the image in the provided input field.
4. Adjust the maximum number of tokens for the response using the slider.
5. View the generated response based on the image and your query.
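Steps 2 and 4 above imply some light input validation before the query reaches the model. A minimal sketch of what that could look like (hypothetical helper names and slider bounds; the app's actual limits may differ):

```python
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}

def validate_upload(filename: str) -> bool:
    """Accept only the image types the app supports (PNG, JPG, JPEG),
    case-insensitively."""
    return filename.rsplit(".", 1)[-1].lower() in ALLOWED_EXTENSIONS

def clamp_max_tokens(requested: int, lo: int = 16, hi: int = 512) -> int:
    """Keep the slider value inside a sane range before passing it to the
    generator as max_new_tokens (hypothetical bounds)."""
    return max(lo, min(hi, requested))

print(validate_upload("scan.JPEG"))  # True
print(validate_upload("notes.gif"))  # False
print(clamp_max_tokens(10_000))      # 512
```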
## Deployment
This application can be deployed on various platforms that support Streamlit apps. Here are general steps for deployment:
1. Ensure all dependencies are listed in `requirements.txt`.
2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
3. Follow the platform-specific deployment instructions, which typically involve:
- Connecting your GitHub repository to the deployment platform
- Configuring environment variables if necessary
- Setting up any required build processes
Note: For optimal performance, deploy on a platform that provides GPU support.
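Because the app may land on either CPU-only or GPU-backed hosts, apps like this typically pick the compute device at startup. A minimal sketch (hypothetical helper name; in practice the flag would come from `torch.cuda.is_available()`):

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device string to load models onto. In the real app this
    would be called as pick_device(torch.cuda.is_available())."""
    return "cuda" if cuda_available else "cpu"

# On a CPU-only host (e.g. the free Spaces tier) this falls back to "cpu".
print(pick_device(False))  # cpu
print(pick_device(True))   # cuda
```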
## Disclaimer
The app runs on the free tier of Hugging Face Spaces, which supports only CPU, resulting in slower processing times. For optimal performance, it is recommended to run the app locally on a machine with GPU support.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
GNU General Public License v2 (GPLv2)
## Acknowledgments
- This project uses the [Qwen2-VL model](https://huggingface.co./Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
- The custom RAG implementation is based on the [ColPali model](https://huggingface.co./vidore/colpali).