bsesic committed on
Commit c1c4b4b
1 Parent(s): 33d3862

Update README.md

  1. README.md +24 -23
README.md CHANGED
@@ -13,14 +13,14 @@ datasets:
13
 
14
  ## Model Description
15
 
16
- This is a *Convolutional Neural Network (CNN)* model trained to recognize *Hebrew letters* and a *stop symbols* in images. The model can identify individual letters from a provided image, outputting their respective class along with probabilities.
17
 
18
  ## Model Details:
19
- * *Model Type*: Convolutional Neural Network (CNN)
20
- * *Framework*: TensorFlow 2.x / Keras
21
- * *Input Size*: 64x64 grayscale images of isolated letters.
22
- * *Output Classes*: 28 Hebrew letters + 1 stop symbol (.)
23
- * *Use Case*: Recognizing handwritten or printed Hebrew letters and punctuation in scanned images or photos of documents.
24
 
25
  ## Intended Use
26
 
@@ -60,45 +60,46 @@ If given an image with the Hebrew word "אברם" (Abram), the model can detect
60
 
61
  ## Limitations:
62
 
63
- * *Font Variations*: The model performs best on specific fonts (e.g., square Hebrew letters). Performance may degrade with highly stylized or cursive fonts.
64
- * *Noise Sensitivity*: Images with a lot of noise, artifacts, or low resolution may lead to incorrect predictions.
65
- * *Stop Symbol*: The stop symbol is particularly recognized by detecting three vertical dots. However, false positives can occur if letters with similar shapes are present.
66
 
67
  ## Training Data:
68
 
69
  The model was trained on a dataset containing *Hebrew letters and stop symbols*. The training dataset includes:
70
 
71
- * *28 Hebrew letters*.
72
- * *1 stop symbol* representing three vertical dots (.).
73
 
74
  ## Training Procedure:
75
- * *Optimizer*: Adam
76
- * *Loss function*: Categorical Crossentropy
77
- * *Batch size*: 32
78
- * *Epochs*: 10
79
 
80
  Data augmentation was applied to reduce overfitting and increase the model's generalizability to unseen data. This includes random rotations, zooms, and horizontal flips.
81
 
82
  ## Model Performance
83
 
84
  # Metrics:
85
- * *Accuracy*: 95% on the validation dataset.
86
- * *Precision*: 94%
87
- * *Recall*: 93%
88
  *
89
  Performance may vary depending on the quality of the input images, noise levels, and whether the letters are handwritten or printed.
90
 
91
  ## Known Issues:
92
- * *False Positives for Stop Symbols*: The model sometimes incorrectly identifies letters that resemble three vertical dots as stop symbols.
93
- * *Overfitting to Specific Fonts*: Performance can degrade on handwritten texts or cursive fonts not represented well in the training set.
94
 
95
  ## Ethical Considerations
96
 
97
- * *Bias*: The model was trained on a specific set of Hebrew fonts and may not perform equally well across all types of Hebrew texts, particularly historical or handwritten documents.
98
  Fairness: The model may produce varying results depending on font style, quality of input images, and preprocessing applied.
99
- Future Work:
100
 
101
- * *Improving Generalization*: Future work will focus on improving the model's robustness to different fonts, handwriting styles, and noisy inputs.
 
 
102
  Multilingual Expansion: Adding support for other Semitic scripts or expanding the model for multilingual OCR tasks.
103
  Citation:
104
 
 
13
 
14
  ## Model Description
15
 
16
+ This is a **Convolutional Neural Network (CNN)** model trained to recognize **Hebrew letters** and a **stop symbol** in images. The model can identify individual letters from a provided image, outputting their respective class along with probabilities.
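
As a rough illustration of the "class along with probabilities" output described above, the sketch below ranks the entries of a probability vector and keeps the top few. The `top_k` helper, the label names, and the four-class example are hypothetical and not taken from the model's repository; a real setup would use the 29-class ordering the model was trained with.

```python
from typing import List, Sequence, Tuple


def top_k(probs: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical 4-class example; the real model emits 29 probabilities.
labels = ["alef", "bet", "gimel", "stop"]
probs = [0.05, 0.80, 0.10, 0.05]
best = top_k(probs, labels, k=2)
```

Returning several candidates rather than only the arg-max can help downstream code flag low-confidence letters for review.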
17
 
18
  ## Model Details:
19
+ * **Model Type**: Convolutional Neural Network (CNN)
20
+ * **Framework**: TensorFlow 2.x / Keras
21
+ * **Input Size**: 64x64 grayscale images of isolated letters.
22
+ * **Output Classes**: 28 Hebrew letters + 1 stop symbol (.)
23
+ * **Use Case**: Recognizing handwritten or printed Hebrew letters and punctuation in scanned images or photos of documents.
24
 
25
  ## Intended Use
26
 
 
60
 
61
  ## Limitations:
62
 
63
+ * **Font Variations**: The model performs best on specific fonts (e.g., square Hebrew letters). Performance may degrade with highly stylized or cursive fonts.
64
+ * **Noise Sensitivity**: Images with a lot of noise, artifacts, or low resolution may lead to incorrect predictions.
65
+ * **Stop Symbol**: The stop symbol is particularly recognized by detecting three vertical dots. However, false positives can occur if letters with similar shapes are present.
66
 
67
  ## Training Data:
68
 
69
  The model was trained on a dataset containing *Hebrew letters and stop symbols*. The training dataset includes:
70
 
71
+ * **28 Hebrew letters**.
72
+ * **1 stop symbol** representing three vertical dots (.).
73
 
74
  ## Training Procedure:
75
+ * **Optimizer**: Adam
76
+ * **Loss function**: Categorical Crossentropy
77
+ * **Batch size**: 32
78
+ * **Epochs**: 10
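
For reference, categorical cross-entropy for a single example with a one-hot label reduces to the negative log of the probability assigned to the true class. The sketch below computes it in plain Python for the 29 output classes (28 letters + 1 stop symbol) listed in this card. The function name, the epsilon clipping, and the example values are illustrative assumptions and are not taken from the training code.

```python
import math
from typing import Sequence

NUM_CLASSES = 29  # 28 letters + 1 stop symbol, as stated in the card


def categorical_crossentropy(one_hot: Sequence[float], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Cross-entropy between a one-hot target and a predicted distribution."""
    if len(one_hot) != len(probs):
        raise ValueError("target and prediction must have the same length")
    # Clip probabilities into [eps, 1] so log(0) never occurs.
    return -sum(t * math.log(min(max(p, eps), 1.0)) for t, p in zip(one_hot, probs))


# A one-hot target for class 0 and an untrained, uniform prediction.
target = [0.0] * NUM_CLASSES
target[0] = 1.0
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
loss_uniform = categorical_crossentropy(target, uniform)  # equals ln(29)
```

A loss near ln(29) at the start of training would indicate predictions no better than chance over the 29 classes.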
79
 
80
  Data augmentation was applied to reduce overfitting and increase the model's generalizability to unseen data. This includes random rotations, zooms, and horizontal flips.
81
 
82
  ## Model Performance
83
 
84
  # Metrics:
85
+ * **Accuracy**: 95% on the validation dataset.
86
+ * **Precision**: 94%
87
+ * **Recall**: 93%
88
  *
89
  Performance may vary depending on the quality of the input images, noise levels, and whether the letters are handwritten or printed.
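
The card does not say how precision and recall were averaged across classes. The sketch below assumes macro averaging (an unweighted mean over classes) and shows how the three metrics could be computed from true and predicted labels. The `macro_metrics` helper and the toy labels are hypothetical and do not reproduce the reported numbers.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    precisions = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes]
    recalls = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes]
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / len(classes),
        "recall": sum(recalls) / len(classes),
    }


# Toy example with three classes.
y_true = ["alef", "alef", "bet", "stop"]
y_pred = ["alef", "bet", "bet", "stop"]
scores = macro_metrics(y_true, y_pred)
```

Per-class values are worth inspecting alongside the averages, since the stop symbol's false positives mentioned elsewhere in the card would show up as low precision for that class.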
90
 
91
  ## Known Issues:
92
+ * **False Positives for Stop Symbols**: The model sometimes incorrectly identifies letters that resemble three vertical dots as stop symbols.
93
+ * **Overfitting to Specific Fonts**: Performance can degrade on handwritten texts or cursive fonts not represented well in the training set.
94
 
95
  ## Ethical Considerations
96
 
97
+ * **Bias**: The model was trained on a specific set of Hebrew fonts and may not perform equally well across all types of Hebrew texts, particularly historical or handwritten documents.
98
  Fairness: The model may produce varying results depending on font style, quality of input images, and preprocessing applied.
 
99
 
100
+ ## Future Work:
101
+
102
+ * **Improving Generalization**: Future work will focus on improving the model's robustness to different fonts, handwriting styles, and noisy inputs.
103
  Multilingual Expansion: Adding support for other Semitic scripts or expanding the model for multilingual OCR tasks.
104
  Citation:
105