 - manuscripts
 datasets:
 - bsesic/HebrewManuscripts
+---
+# Hebrew Letter Recognition Model
+## Model Description
+This is a *Convolutional Neural Network (CNN)* model trained to recognize *Hebrew letters* and a *stop symbol* in images. Given an image of an isolated letter, the model outputs the predicted class along with the class probabilities.
+## Model Details:
+* *Model Type*: Convolutional Neural Network (CNN)
+* *Framework*: TensorFlow 2.x / Keras
+* *Input Size*: 64x64 grayscale images of isolated letters.
+* *Output Classes*: 28 Hebrew letters + 1 stop symbol (.)
+* *Use Case*: Recognizing handwritten or printed Hebrew letters and punctuation in scanned images or photos of documents.
+## Intended Use
+This model is designed for the automatic recognition of *Hebrew letters* from images. The model can be used in applications such as:
+* Optical character recognition (OCR) systems for Hebrew text.
+* Educational tools to help learners read Hebrew text.
+* Historical document digitization of Hebrew manuscripts.
+## How to Use:
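+```python
+from tensorflow.keras.models import load_model
+import numpy as np
+import cv2
+# Load the model
+model = load_model('path_to_model.hebrew_letter_model.keras')
+# Preprocess an input image (example for one letter)
+img = cv2.imread('path_to_image.jpg', cv2.IMREAD_GRAYSCALE)
+if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
+    raise FileNotFoundError('Could not read path_to_image.jpg')
+# Resize to the 64x64 input size and scale pixel values to [0, 1]
+img_resized = cv2.resize(img, (64, 64)).astype('float32') / 255.0
+# Add batch and channel axes: (64, 64) -> (1, 64, 64, 1).
+# Assumption: the model expects a channels-last, single-channel input.
+img_array = img_resized[np.newaxis, ..., np.newaxis]
+# Predict
+predictions = model.predict(img_array)
+predicted_class = int(np.argmax(predictions, axis=1)[0])
+# Class names in the order of the model's output indices.
+# Note: this list has 28 entries ('stop' plus 27 letter forms, final forms included).
+class_names = ['stop', 'א', 'ב', 'ג', 'ד', 'ה', 'ו', 'ז', 'ח', 'ט', 'י', 'ך', 'כ', 'ל', 'ם', 'מ', 'ן', 'נ', 'ס', 'ע', 'ף', 'פ', 'ץ', 'צ', 'ק', 'ר', 'ש', 'ת']
+print("Predicted letter:", class_names[predicted_class])
+# Assumes the final layer is a softmax, so each output row sums to 1
+print("Probability:", float(predictions[0][predicted_class]))
+```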
+## Example:
+Given an image of the Hebrew word "אברם" (Abram), split into isolated letter images, the model classifies each letter (or stop symbol) and returns a probability for each prediction.
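To make this example concrete, the sketch below decodes one probability vector per isolated letter image into text. It assumes the letters have already been segmented and passed through the model. The `decode_letters` and `fake_row` helpers and the made-up probability rows are illustrative and not part of the released code.

```python
# Class order copied from the usage snippet; assumed to match the model's output indices.
CLASS_NAMES = ['stop', 'א', 'ב', 'ג', 'ד', 'ה', 'ו', 'ז', 'ח', 'ט', 'י', 'ך', 'כ', 'ל',
               'ם', 'מ', 'ן', 'נ', 'ס', 'ע', 'ף', 'פ', 'ץ', 'צ', 'ק', 'ר', 'ש', 'ת']


def decode_letters(prob_rows, class_names=CLASS_NAMES):
    """Return (letter, probability) for the most likely class of each row."""
    decoded = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)
        decoded.append((class_names[best], row[best]))
    return decoded


def fake_row(index, peak=0.9, n=len(CLASS_NAMES)):
    """Build a demo probability row with most of the mass on one class."""
    rest = (1.0 - peak) / (n - 1)
    return [peak if i == index else rest for i in range(n)]


# Stand-in model outputs for the four letters of the word, in reading order.
rows = [fake_row(CLASS_NAMES.index(ch)) for ch in 'אברם']
result = decode_letters(rows)
word = ''.join(letter for letter, _ in result)
print(word, [round(p, 2) for _, p in result])
```

In a real pipeline the `rows` list would come from `model.predict` on a batch of segmented letter images rather than from `fake_row`.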
+## Limitations:
+* *Font Variations*: The model performs best on specific fonts (e.g., square Hebrew letters). Performance may degrade with highly stylized or cursive fonts.
+* *Noise Sensitivity*: Images with a lot of noise, artifacts, or low resolution may lead to incorrect predictions.
+* *Stop Symbol*: The stop symbol is recognized by its three vertical dots. False positives can occur when letters with similar shapes are present.
+## Training Data:
+The model was trained on a dataset containing *Hebrew letters and stop symbols*. The training dataset includes:
+* *28 Hebrew letters*.
+* *1 stop symbol* representing three vertical dots (.).
+## Training Procedure:
+* *Optimizer*: Adam
+* *Loss function*: Categorical Crossentropy
+* *Batch size*: 32
+* *Epochs*: 10
+Data augmentation was applied to reduce overfitting and increase the model's generalizability to unseen data. This includes random rotations, zooms, and horizontal flips.
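As a small worked example of the loss listed above: for a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below uses plain Python and made-up probabilities. It illustrates the formula only and is not taken from the training script.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    # Clip predictions so that log(0) can never occur.
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0]                 # the true class is index 1
confident = [0.05, 0.90, 0.05]     # most of the mass on the true class
unsure = [0.40, 0.30, 0.30]        # true class is not even the top guess

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

The confident prediction yields the smaller loss, which is the behavior the optimizer exploits during training.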
+## Model Performance
+### Metrics:
+* *Accuracy*: 95% on the validation dataset.
+* *Precision*: 94%
+* *Recall*: 93%
+Performance may vary depending on the quality of the input images, noise levels, and whether the letters are handwritten or printed.
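The card does not say how precision and recall were averaged across classes. As one possible reading, the sketch below computes accuracy together with macro-averaged precision and recall from true and predicted labels. The helper name and the toy labels are illustrative assumptions, not the author's evaluation code.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        # A class that is never predicted (or never present) contributes 0
        # instead of causing a division by zero.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels, including one stop symbol misread as a letter.
y_true = ['א', 'א', 'ב', 'ב', 'stop', 'stop']
y_pred = ['א', 'ב', 'ב', 'ב', 'stop', 'א']
accuracy, precision, recall = macro_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Macro averaging weights every class equally, so a rare class such as the stop symbol affects the score as much as a frequent letter. A weighted average would give different numbers.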
+## Known Issues:
+* *False Positives for Stop Symbols*: The model sometimes incorrectly identifies letters that resemble three vertical dots as stop symbols.
+* *Overfitting to Specific Fonts*: Performance can degrade on handwritten texts or cursive fonts not represented well in the training set.
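One way an application might reduce these stop-symbol false positives is to accept a `stop` prediction only above a confidence threshold and otherwise fall back to the best-scoring letter. This is a suggestion here, not something the card describes. The threshold value, the helper name, and the short class list below are hypothetical.

```python
def classify_with_stop_threshold(probs, class_names, stop_threshold=0.8):
    """Pick the top class, but accept 'stop' only if its probability is high enough."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = ranked[0]
    if class_names[best] == 'stop' and probs[best] < stop_threshold and len(ranked) > 1:
        # Low-confidence stop: fall back to the most likely non-stop class.
        best = next(i for i in ranked if class_names[i] != 'stop')
    return class_names[best], probs[best]


# Shortened class list for the demo; a real call would use the full list.
names = ['stop', 'י', 'ו']
weak_stop = classify_with_stop_threshold([0.5, 0.4, 0.1], names)
strong_stop = classify_with_stop_threshold([0.9, 0.07, 0.03], names)
print(weak_stop, strong_stop)
```

A suitable threshold would have to be tuned on validation data, since a higher value trades missed stop symbols for fewer false positives.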
+## Ethical Considerations
+* *Bias*: The model was trained on a specific set of Hebrew fonts and may not perform equally well across all types of Hebrew texts, particularly historical or handwritten documents.
+* *Fairness*: The model may produce varying results depending on font style, quality of input images, and preprocessing applied.
+## Future Work:
+* *Improving Generalization*: Future work will focus on improving the model's robustness to different fonts, handwriting styles, and noisy inputs.
+* *Multilingual Expansion*: Adding support for other Semitic scripts or expanding the model for multilingual OCR tasks.
+## Citation:
+If you use this model in your work, please cite it as follows:
+```
+@misc{hebrew-letter-recognition,
+  title={Hebrew Manuscripts Letter Recognition Model},
+  author={Benjamin Schnabel},
+  year={2024},
+  howpublished={\url{https://huggingface.co/bsesic/HebrewManuscriptsMNIST}},
+}
+```
+## License:
+This model is licensed under the [MIT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).