Chinese support is very poor
Hi, the released Florence-2 models are English-only.
Would you consider adding Chinese support, since it's a "Foundation Vision Model"?
I don't think Chinese is the baseline. You are wrong to think that every model must support Chinese; by default they are in English, and that is the foundation. Please don't expect Chinese; consider English instead.
You are wrong to think that all models must be in English.
You are wrong, this is clear. All foundation vision models must be in Slovene. That way it is fair to both English and Chinese.
Yes, I am training a multilingual Florence-2 now. So far so good, but it does not include Slovene, sorry.
Although it is currently only roughly tuned, it could get better when trained on more data.
I don't see any CJK characters in the original vocab.json of Florence-2-large, so I guess you had to extend the vocabulary before the Chinese OCR fine-tuning task?
And I still don't understand why it can output Chinese characters in your first post. Had you already extended the vocab before inference?
Oh, yes, you were right.
I rechecked the vocab, and it doesn't have CJK.
Very strange...
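For anyone who wants to repeat the check, here is a minimal sketch of scanning a vocabulary for CJK characters. The helper names (`has_cjk`, `cjk_tokens`) and the toy vocabulary are made up for illustration, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is tested, so treat it as a rough check rather than a complete one.

```python
import json


def has_cjk(token: str) -> bool:
    """True if the token contains a code point in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def cjk_tokens(vocab: dict) -> list:
    """Return every vocabulary entry that contains a literal CJK character."""
    return [token for token in vocab if has_cjk(token)]


# Toy vocabulary (hypothetical entries, not copied from Florence-2).
toy_vocab = {"hello": 0, "world": 1, "OCR": 2}
print(cjk_tokens(toy_vocab))

# To scan a real file, load it first (the path is an assumption):
# with open("vocab.json", encoding="utf-8") as f:
#     print(len(cjk_tokens(json.load(f))))
```

An empty result only says that no entry is stored as a literal CJK character; it does not by itself prove the tokenizer cannot represent such text.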
Does that mean you fine-tuned Florence-2 on Chinese OCR training data without extending the vocab, and still got a pretty decent result?
Yes, yes, but as you can see, the first one, with raw Florence-2, can also print Chinese.
I haven't tried it, but I think it might be able to encode a Chinese character to ids and decode it back.
Maybe we should explore the logic behind this. I am also curious, and I am trying to understand why it can output CJK.
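One possible explanation, offered as an assumption rather than something confirmed in this thread: if the tokenizer is a byte-level BPE in the GPT-2/BART style, text is first converted to UTF-8 bytes and each byte is mapped to a printable stand-in symbol. Chinese characters would then be stored in the vocabulary as sequences of those symbols, not as literal CJK, yet still round-trip correctly. The sketch below reproduces only that byte-to-symbol mapping, with no merges and no real model; `to_symbols` and `from_symbols` are hypothetical helper names.

```python
def bytes_to_unicode() -> dict:
    """GPT-2 style table mapping each of the 256 byte values to one printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are moved to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_SYMBOL = bytes_to_unicode()
SYMBOL_TO_BYTE = {symbol: b for b, symbol in BYTE_TO_SYMBOL.items()}


def to_symbols(text: str) -> str:
    """Encode text as UTF-8 and replace each byte with its stand-in symbol."""
    return "".join(BYTE_TO_SYMBOL[b] for b in text.encode("utf-8"))


def from_symbols(symbols: str) -> str:
    """Invert to_symbols: map symbols back to bytes and decode as UTF-8."""
    return bytes(SYMBOL_TO_BYTE[s] for s in symbols).decode("utf-8")


symbols = to_symbols("中")
print(symbols)                 # three stand-in symbols, none of them CJK
print(from_symbols(symbols))   # the original character again
```

If this assumption holds for Florence-2, a search of vocab.json for literal Chinese characters would find nothing even though the model can still emit them, one byte-level piece at a time. Checking the actual tokenizer class would confirm or rule this out.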