Upload ./README_zh.md with huggingface_hub
README_zh.md CHANGED (+18 -4)
@@ -1,7 +1,21 @@
-
+---
+license: mit
+datasets:
+- OleehyO/latex-formulas
+metrics:
+- bleu
+pipeline_tag: image-to-text
+---

-
+[中文版本 (Chinese version)](./README_zh.md)

-TexTeller
+# About TexTeller

-
+* 📮[2024-03-25] TexTeller 2.0 released! The training data for TexTeller 2.0 has been increased to 7.5M image-formula pairs (about **15 times more** than TexTeller 1.0, with improved data quality). TexTeller 2.0 shows **superior performance** on the test set, especially on rare symbols, complex multi-line formulas, and matrices.
+> More test images and a side-by-side comparison with recognition models from other vendors are available [here](https://github.com/OleehyO/TexTeller/blob/main/assets/test.pdf).
+
+TexTeller is a ViT-based model for end-to-end formula recognition: it recognizes formulas in natural images and converts them into LaTeX.
+
+Trained on a larger dataset of image-formula pairs (a 550K dataset available [here](https://huggingface.co/datasets/OleehyO/latex-formulas)), TexTeller **exhibits stronger generalization and higher accuracy than [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)**, which uses approximately 100K data points. The larger dataset enables TexTeller to cover most usage scenarios more effectively.
+
+> For more details, please refer to the 𝐓𝐞𝐱𝐓𝐞𝐥𝐥𝐞𝐫 [GitHub repository](https://github.com/OleehyO/TexTeller?tab=readme-ov-file).
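Since the model card added in this commit describes an image-to-text model, a minimal inference sketch is shown below. It assumes the checkpoint is published under the repo id `OleehyO/TexTeller` and that it loads through 🤗 Transformers' `VisionEncoderDecoderModel` with `AutoTokenizer`/`AutoImageProcessor`; neither assumption is stated in this README, so verify the exact loading code against the TexTeller GitHub repository.

```python
# Hypothetical usage sketch -- the repo id and the Transformers classes below are
# assumptions, not confirmed by this README; see the TexTeller GitHub repository
# for the officially supported inference path.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

REPO_ID = "OleehyO/TexTeller"  # assumed model repo id

model = VisionEncoderDecoderModel.from_pretrained(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
image_processor = AutoImageProcessor.from_pretrained(REPO_ID)

# Load a cropped image that contains a single formula.
image = Image.open("formula.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Autoregressively decode the LaTeX source for the formula.
generated_ids = model.generate(pixel_values, max_new_tokens=512)
latex = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(latex)
```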