---
license: mit
language:
- my
pipeline_tag: text-generation
metrics:
- code_eval
library_name: transformers
tags:
- burmese
- gpt2
- pre-trained
---

Simbolo's Myanmarsar-GPT is a Burmese language model pre-trained on a dataset of 1 million Burmese sentences using the GPT-2 architecture. It is intended to serve as a foundational pre-trained model for the Burmese language, facilitating fine-tuning for downstream applications such as creative writing, chatbots, and machine translation.



### How to use

```python
# Install the library (run once, e.g. in a notebook)
!pip install transformers

from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face Hub
pipe = pipeline(
    'text-generation',
    model='Simbolo-Servicio/myanmar-burmese-gpt',
    tokenizer='Simbolo-Servicio/myanmar-burmese-gpt',
)

# Generate a continuation of a Burmese prompt (up to 500 tokens)
pipe('မြန်မာဘာသာစကား', max_length=500)
```
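Since the model is intended as a base for fine-tuning on downstream Burmese tasks, a minimal fine-tuning sketch with the `transformers` Trainer is shown below. The corpus file name, output directory, and hyperparameters are illustrative assumptions, not part of the released model, and should be adapted to your own task.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = 'Simbolo-Servicio/myanmar-burmese-gpt'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# GPT-2 style tokenizers usually have no pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical corpus: one Burmese sentence per line
dataset = load_dataset('text', data_files={'train': 'burmese_corpus.txt'})

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True, max_length=128)

tokenized = dataset['train'].map(tokenize, batched=True, remove_columns=['text'])

# Causal language modeling objective, so mlm=False
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir='myanmarsar-gpt-finetuned',  # illustrative output path
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```

The fine-tuned checkpoint written to the output directory can then be loaded with the same `pipeline` call shown above.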
### Data
The data utilized comprises 1 million sentences sourced from Wikipedia.

### Contributors
- Main Contributor: Sa Phyo Thu Htet (https://github.com/SaPhyoThuHtet)
- Wikipedia Data Crawling: Kaung Kaung Ko Ko, Phuu Pwint Thinzar Kyaing
- Releasing the Model: Eithandaraung, Ye Yint Htut, Thet Chit Su, Naing Phyo Aung


### Limitations and bias
We have not yet thoroughly investigated the potential biases inherent in this model. For transparency, note that the model is trained primarily on Unicode-encoded Burmese (Myanmar) text.

### Previous Work Before Releasing Simbolo Myanmarsar-GPT
We would like to acknowledge the following works, which were the main motivation for this project.
1. MinSithu, MyanmarGPT, https://huggingface.co./jojo-ai-mst/MyanmarGP
2. Dr. Wai Yan Nyein Naing, WYNN747/Burmese-GPT, https://huggingface.co./WYNN747/Burmese-GPT

### References and Citations
1. Jiang, Shengyi, Huang, Xiuwen, Cai, Xiaonan, & Lin, Nankai. (2021). Pre-trained Models and Evaluation Data for the Myanmar Language. doi:10.1007/978-3-030-92310-5_52.
2. Lin, N., Fu, Y., Chen, C., Yang, Z., & Jiang, S. (2021). LaoPLM: Pre-trained Language Models for Lao. arXiv:2110.05896.