---
datasets:
- PleIAs/KaribuAI
language:
- en
- fr
pipeline_tag: text-classification
base_model:
- PleIAs/celadon
---
<div style="display: flex; justify-content: center; align-items: center; gap: 30px; margin: 20px auto; flex-wrap: nowrap;">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e1de43c0ea022d54e0c4e0/s1vzWPjwTZDjt1OJhCM-9.png" 
         style="width: 120px; height: 60px; object-fit: contain;" 
         alt="BSF Logo"/>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e1de43c0ea022d54e0c4e0/0N9N1XkqoXF4G9IpMYvM3.png" 
         style="width: 120px; height: 60px; object-fit: contain;" 
         alt="PleIAs Logo"/>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/65e1de43c0ea022d54e0c4e0/vBpchFVmXCHJn4EuNaBIK.png" 
         style="width: 120px; height: 60px; object-fit: contain;" 
         alt="Kajou Logo"/>
</div>

The Karibu project is a collaboration between PleIAs, Bibliothèques Sans Frontières (BSF), and Kajou. Our platform delivers educational activities across six CEFR proficiency levels (A1 to C2), making quality language learning accessible to all, even in offline environments, through microSD card deployment. By combining reading comprehension, interactive exercises, and personalized learning paths, Karibu adapts to each learner's needs.

## Karibu Language Level Classifier
KaribuAI is a DeBERTa-based classifier that automatically assigns CEFR language proficiency levels (A1-C2) to French educational content.

## Architecture: DeBERTa with multi-head classification
- Model size: fine-tuned from DeBERTa-v3-small
- Output: 6 classification levels (A1, A2, B1, B2, C1, C2)
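
As a minimal sketch of how six output scores could be turned into a CEFR label, the snippet below applies a softmax and picks the most probable level. It treats the output as a single six-way head and assumes the labels are ordered A1 to C2; both are assumptions, so check the model's configuration before relying on them. The names `CEFR_LEVELS`, `softmax`, and `logits_to_cefr` are illustrative and not part of the released model.

```python
import math

# Assumed label order; verify against the model's own label mapping.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def softmax(logits):
    """Convert raw scores into probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_cefr(logits):
    """Return (predicted level, probability) for one vector of six logits."""
    if len(logits) != len(CEFR_LEVELS):
        raise ValueError(f"expected {len(CEFR_LEVELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    # Index of the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return CEFR_LEVELS[best], probs[best]
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions for manual review.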


## Training Details

- Training Data: 9,000 synthetic samples
🔍 [Explore the full dataset](https://huggingface.co./datasets/PleIAs/KaribuAI/viewer/default)

- Source: French press articles + Wikimedia content
- Processing: Sequential text simplification using an open-source model (to be published)
- Validation: 1,000 samples per level manually verified by BSF experts
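
One possible reading of "sequential text simplification" is that each level's text is produced by simplifying the text of the level above it. The sketch below illustrates that reading only. It assumes the source article is treated as the hardest level (C2), and `simplify_step` is a hypothetical callable standing in for the simplification model, which has not been published.

```python
from typing import Callable, Dict, List

# Ordered from hardest to easiest; the direction is an assumption.
LEVELS_HARD_TO_EASY: List[str] = ["C2", "C1", "B2", "B1", "A2", "A1"]


def simplify_sequentially(
    source_text: str,
    simplify_step: Callable[[str, str], str],
) -> Dict[str, str]:
    """Produce one text per CEFR level by simplifying the previous level's text.

    simplify_step(text, target_level) is a hypothetical wrapper around a
    simplification model; it is not part of any published API.
    """
    samples = {LEVELS_HARD_TO_EASY[0]: source_text}
    current = source_text
    for level in LEVELS_HARD_TO_EASY[1:]:
        # Each step starts from the output of the previous, harder level.
        current = simplify_step(current, level)
        samples[level] = current
    return samples
```

Chaining the steps, rather than simplifying the source directly to every level, keeps the six versions of a text aligned in content while their difficulty decreases.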

## Cultural Relevance and Ethical Content Curation

Understanding the importance of cultural context in language learning, we've implemented a robust content filtering system that ensures all materials are not only educationally sound but also culturally sensitive.

- Topic coverage: Solidarity, geography, African literature, agriculture, tourism, cultural events, African history, geopolitics, communication
- Topic filtering: Meta-Llama-3-8B-Instruct for content categorization
- Toxicity filtering: Celadon

🤖 [Explore the Celadon model](https://huggingface.co./PleIAs/celadon)
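
The two filtering stages can be pictured as a simple pipeline: keep a text only if its topic is on the allowed list and it is not flagged as toxic. The following is a rough sketch under stated assumptions. `classify_topic` and `is_toxic` are hypothetical callables standing in for the Meta-Llama-3-8B-Instruct categorization and the Celadon classifier; their real interfaces, prompts, and thresholds are not described here.

```python
from typing import Callable, Iterable, List, Set

# Topics listed in this model card, lowercased for comparison.
ALLOWED_TOPICS: Set[str] = {
    "solidarity", "geography", "african literature", "agriculture", "tourism",
    "cultural events", "african history", "geopolitics", "communication",
}


def filter_corpus(
    texts: Iterable[str],
    classify_topic: Callable[[str], str],
    is_toxic: Callable[[str], bool],
) -> List[str]:
    """Keep texts whose topic is allowed and that are not flagged as toxic.

    Both callables are hypothetical stand-ins for the models named above.
    """
    kept = []
    for text in texts:
        topic = classify_topic(text).strip().lower()
        if topic not in ALLOWED_TOPICS:
            continue  # off-topic content is dropped first
        if is_toxic(text):
            continue  # on-topic but toxic content is dropped as well
        kept.append(text)
    return kept
```

Passing the classifiers in as functions keeps the sketch independent of any particular model-loading code.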


## Level Validation
- Manual verification using CEFR framework criteria
- Statistical validation using Louvain word-level classification
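
A word-level check of this kind could, for example, measure what share of a text's known words fall at or below the level assigned to the text. The sketch below shows that idea only. The `lexicon` mapping from words to CEFR levels is a hypothetical input, and the actual statistic used for validation is not specified here.

```python
import re
from typing import Dict

LEVEL_RANK = {"A1": 0, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}


def coverage_at_level(text: str, lexicon: Dict[str, str], claimed_level: str) -> float:
    """Share of lexicon-known words whose level is at or below claimed_level.

    Words missing from the lexicon are ignored; returns 0.0 if none are known.
    """
    limit = LEVEL_RANK[claimed_level]
    # Runs of Unicode letters, so accented French words stay intact.
    words = re.findall(r"[^\W\d_]+", text.lower())
    known = [lexicon[w] for w in words if w in lexicon]
    if not known:
        return 0.0
    within = sum(1 for level in known if LEVEL_RANK[level] <= limit)
    return within / len(known)
```

A text labeled A1 with low coverage at A1 would be a candidate for manual re-verification.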

## Technical Integration

- Deployment: Offline-capable via microSD cards
- Input Processing: Handles various text types (academic writing, press articles, emails, letters, stories)


## Collaborators

- PleIAs: Technical development
- Bibliothèques Sans Frontières (BSF): Educational expertise
- Kajou: Distribution platform