---
license: mit
language:
- he
tags:
- languages
- manuscripts
- hebrew
- ocr
- letters
- manuscript
- digital-humanities
datasets:
- bsesic/HebrewManuscripts
---

# Hebrew Letter Recognition Model

## Model Description

This is a **Convolutional Neural Network (CNN)** model trained to recognize **Hebrew letters** and a **stop symbol** in images. Given an image of a single isolated letter, the model outputs the predicted class along with the class probabilities.

## Model Details:
* **Model Type**: Convolutional Neural Network (CNN)
* **Framework**: TensorFlow 2.x / Keras
* **Input Size**: 64x64 grayscale images of isolated letters.
* **Output Classes**: 28 Hebrew letters + 1 stop symbol (.)
* **Use Case**: Recognizing handwritten or printed Hebrew letters and punctuation in scanned images or photos of documents.

## Intended Use

This model is designed for the automatic recognition of *Hebrew letters* from images. The model can be used in applications such as:

* Optical character recognition (OCR) systems for Hebrew text.
* Educational tools to help learners read Hebrew text.
* Historical document digitization of Hebrew manuscripts.

## How to Use:
```python
from tensorflow.keras.models import load_model
import numpy as np
import cv2

# Load the model
model = load_model('path_to_model.hebrew_letter_model.keras')

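# Preprocess an input image (example for one letter)
img = cv2.imread('path_to_image.jpg', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError('Could not read path_to_image.jpg')
# Resize to the model's 64x64 input size and scale pixel values to [0, 1]
img_resized = cv2.resize(img, (64, 64)).astype('float32') / 255.0
# Add batch and channel axes: (64, 64) -> (1, 64, 64, 1).
# Assumption: the model expects a single-channel input; drop the last axis if it does not.
img_array = img_resized[np.newaxis, ..., np.newaxis]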

# Predict
predictions = model.predict(img_array)
predicted_class = np.argmax(predictions, axis=1)[0]

# Class names for Hebrew letters
class_names = ['stop', 'א', 'ב', 'ג', 'ד', 'ה', 'ו', 'ז', 'ח', 'ט', 'י', 'ך', 'כ', 'ל', 'ם', 'מ', 'ן', 'נ', 'ס', 'ע', 'ף', 'פ', 'ץ', 'צ', 'ק', 'ר', 'ש', 'ת']

print("Predicted letter:", class_names[predicted_class])

```
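Since the model returns a probability for every class, it can be useful to look at the runner-up candidates as well as the top prediction. The following is a minimal sketch of that idea. The helper `top_k` and the example probability row are hypothetical and only stand in for one row of `model.predict(...)`; the class list is copied from the snippet above.

```python
import numpy as np

# Class names copied from the usage snippet (index 0 is the stop symbol).
CLASS_NAMES = ['stop', 'א', 'ב', 'ג', 'ד', 'ה', 'ו', 'ז', 'ח', 'ט', 'י', 'ך', 'כ', 'ל', 'ם', 'מ', 'ן', 'נ', 'ס', 'ע', 'ף', 'פ', 'ץ', 'צ', 'ק', 'ר', 'ש', 'ת']


def top_k(probabilities, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probabilities = np.asarray(probabilities, dtype=float)
    order = np.argsort(probabilities)[::-1][:k]
    return [(CLASS_NAMES[i], float(probabilities[i])) for i in order]


# Hypothetical probability row standing in for model.predict(img_array)[0].
example = np.full(len(CLASS_NAMES), 0.01)
example[2] = 0.60   # 'ב'
example[12] = 0.14  # 'כ'
example = example / example.sum()  # normalize so the row sums to 1

for label, prob in top_k(example, k=2):
    print(f"{label}: {prob:.2f}")
```

Inspecting the second candidate is a cheap way to spot confusable pairs, such as a letter and its visually similar neighbor.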

## Example:
Given an image of the Hebrew word "אברם" (Abram) that has been split into isolated letters, the model classifies each letter or stop symbol and reports the associated probabilities.

## Limitations:

* **Font Variations**: The model performs best on specific fonts (e.g., square Hebrew letters). Performance may degrade with highly stylized or cursive fonts.
* **Noise Sensitivity**: Images with a lot of noise, artifacts, or low resolution may lead to incorrect predictions.
* **Stop Symbol**: The stop symbol is recognized as three vertical dots. False positives can occur when letters with similar shapes are present.

## Training Data:

The model was trained on a dataset containing *Hebrew letters and stop symbols*. The training dataset includes:

* **28 Hebrew letters**.
* **1 stop symbol** representing three vertical dots (.).

## Training Procedure:
* **Optimizer**: Adam
* **Loss function**: Categorical Crossentropy
* **Batch size**: 32
* **Epochs**: 10

Data augmentation was applied to reduce overfitting and increase the model's generalizability to unseen data. This includes random rotations, zooms, and horizontal flips.
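To make the loss function concrete, here is a minimal NumPy sketch of one-hot encoding and categorical cross-entropy. It follows the standard textbook definition and is not taken from the training code; the helper names and the clipping constant `eps` are assumptions made for illustration.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((len(labels), num_classes), dtype=float)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


# Toy batch with 4 classes: one confident correct prediction, one uniform guess.
y_true = one_hot([2, 0], num_classes=4)
y_pred = np.array([
    [0.05, 0.05, 0.85, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
print(categorical_crossentropy(y_true, y_pred))
```

The loss is small when the probability assigned to the true class is close to 1 and grows as that probability shrinks.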

## Model Performance

### Metrics:
* **Accuracy**: 95% on the validation dataset.
* **Precision**: 94%
* **Recall**: 93%

Performance may vary depending on the quality of the input images, noise levels, and whether the letters are handwritten or printed.
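For readers who want to check these metrics on their own images, the sketch below computes accuracy, precision, and recall from true and predicted class indices. It assumes macro averaging over classes, which this card does not specify, and the helper names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def summarize(matrix):
    """Return accuracy plus macro-averaged precision and recall."""
    tp = np.diag(matrix).astype(float)
    predicted = matrix.sum(axis=0)  # how often each class was predicted
    actual = matrix.sum(axis=1)     # how often each class truly occurred
    # Classes that were never predicted (or never occurred) count as 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return {
        "accuracy": float(tp.sum() / matrix.sum()),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
    }


# Toy labels for three classes, purely illustrative.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(summarize(confusion_matrix(y_true, y_pred, num_classes=3)))
```

Reporting per-class values alongside the averages would also show which letters are most often confused.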

## Known Issues:
* **False Positives for Stop Symbols**: The model sometimes incorrectly identifies letters that resemble three vertical dots as stop symbols.
* **Overfitting to Specific Fonts**: Performance can degrade on handwritten texts or cursive fonts not represented well in the training set.
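One possible mitigation for stop-symbol false positives is to require a higher confidence before accepting a stop prediction. The sketch below illustrates the idea only; the function name and both threshold values are hypothetical and have not been tuned for this model. It assumes the stop symbol sits at index 0, as in the class list from the usage snippet.

```python
import numpy as np

STOP_INDEX = 0  # position of 'stop' in the class list from the usage snippet


def classify_with_threshold(probabilities, stop_threshold=0.9, min_confidence=0.5):
    """Return the predicted class index, or None if the prediction is too uncertain.

    A stop prediction must reach stop_threshold; any other class must reach
    min_confidence. Both values are illustrative assumptions.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    best = int(np.argmax(probabilities))
    threshold = stop_threshold if best == STOP_INDEX else min_confidence
    return best if probabilities[best] >= threshold else None


# Hypothetical probability rows over four classes.
weak_stop = [0.70, 0.20, 0.05, 0.05]
strong_stop = [0.95, 0.03, 0.01, 0.01]
clear_letter = [0.10, 0.10, 0.20, 0.60]

print(classify_with_threshold(weak_stop))
print(classify_with_threshold(strong_stop))
print(classify_with_threshold(clear_letter))
```

Rejected predictions (returned as `None`) could be flagged for manual review instead of being silently accepted.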

## Ethical Considerations

* **Bias**: The model was trained on a specific set of Hebrew fonts and may not perform equally well across all types of Hebrew texts, particularly historical or handwritten documents.
* **Fairness**: The model may produce varying results depending on font style, quality of input images, and preprocessing applied.

## Future Work:

* **Improving Generalization**: Future work will focus on improving the model's robustness to different fonts, handwriting styles, and noisy inputs.
* **Multilingual Expansion**: Adding support for other Semitic scripts or expanding the model for multilingual OCR tasks.

## Citation:

If you use this model in your work, please cite it as follows:

```bibtex
@misc{hebrew-letter-recognition,
  title={Hebrew Manuscripts Letter Recognition Model},
  author={Benjamin Schnabel},
  year={2024},
  howpublished={\url{https://huggingface.co/bsesic/HebrewManuscriptsMNIST}},
}
```

## License:

This model is licensed under the [MIT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).