Update README.md
README.md
```diff
@@ -24,48 +24,44 @@ The Karibu project is a collaboration between pleIAs, Bibliothèque sans fronti
 
 ## Karibu Language Level Classifier
 Karibu is a DeBERTa-based classifier that automatically assigns CEFR language proficiency levels (A1-C2) to French educational content.
-Model Characteristics
 
 ## Architecture: DeBERTa with multi-head classification
-Base Model: PleIAs/celadon
-Model Size: Fine-tuned from DeBERTa-v3-small
-Output: 6 classification levels (A1, A2, B1, B2, C1, C2)
+- Base Model: PleIAs/celadon
+- Model Size: Fine-tuned from DeBERTa-v3-small
+- Output : 6 classification levels (A1, A2, B1, B2, C1, C2)
 
 🤖 [Explore the Celadon model](https://huggingface.co/PleIAs/celadon)
 
 
 ## Training Details
 
-Training Data: 9,000 synthetic samples
+- Training Data: 9,000 synthetic samples
 
-Source: French press articles + Wikimedia content
-Processing: Sequential text simplification using an open source model (to come)
-Validation: 1,000 samples per level manually verified by BSF experts
+- Source: French press articles + Wikimedia content
+- Processing: Sequential text simplification using an open source model (to come)
+- Validation: 1,000 samples per level manually verified by BSF experts
 
 ## Topics Coverage:
 - solidarity, geography, African literature, agriculture, tourism, cultural events, African history, geopolitics, communication
-Topic Filtering: Meta-Llama-3-8B-Instruct for content categorization
-Annotation Method:
+- Topic Filtering: Meta-Llama-3-8B-Instruct for content categorization
 
 🔍 [Explore the full dataset](https://huggingface.co/datasets/PleIAs/KaribuAI/viewer/default)
 
 
 ## levels
-Manual verification using CEFR framework criteria
-Statistical validation using Louvain word-level classification
+- Manual verification using CEFR framework criteria
+- Statistical validation using Louvain word-level classification
 
 ## Technical Integration
 
-Deployment: Offline-capable via microSD cards
-Format: H5P-compatible for interactive exercises
-Input Processing: Handles various text types (academic writing, press articles, emails, letters, stories)
+- Deployment: Offline-capable via microSD cards
+- Format: H5P-compatible for interactive exercises
+- Input Processing: Handles various text types (academic writing, press articles, emails, letters, stories)
 
 
 ## Collaborators
 
-PleIAs: Technical development
-Bibliothèque Sans Frontières (BSF): Educational expertise
-Kajou: Distribution platform
+PleIAs: Technical development, Bibliothèque Sans Frontières (BSF): Educational expertise, Kajou: Distribution platform
 
 
 
```
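The README says Karibu outputs one of six CEFR levels (A1, A2, B1, B2, C1, C2). It does not document the model's output tensor or how its "multi-head" layout is wired. The sketch below shows one plausible way to turn six raw classifier scores into a level label. It assumes a single 6-way output with one score per level, in the order the README lists them. The helper names `softmax` and `logits_to_cefr` are hypothetical and not part of any published Karibu API.

```python
import math
from typing import Sequence

# Assumption: one raw score per level, in the order the README lists them.
# The actual index-to-label mapping of the Karibu model is not documented here.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_cefr(logits: Sequence[float]) -> tuple[str, float]:
    """Hypothetical helper: map six raw scores to (level, probability).

    Picks the most probable level (argmax); ties resolve to the lower level.
    """
    if len(logits) != len(CEFR_LEVELS):
        raise ValueError(f"expected {len(CEFR_LEVELS)} scores, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CEFR_LEVELS[best], probs[best]


if __name__ == "__main__":
    # Made-up scores for illustration only, not real model output.
    level, p = logits_to_cefr([0.1, 0.3, 2.5, 1.0, -0.5, -1.2])
    print(level, round(p, 3))  # the highest score is at index 2, so the level is B1
```

Taking the argmax treats the six levels as unordered classes. The README does not say whether Karibu's heads exploit the ordinal order of CEFR levels, so the decoding step may differ in the real model.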