
A tool for translating scientific PDF documents and comparing them bilingually

Please send feedback through GitHub Issues or the Telegram group.

Recent updates

  • [Dec. 24, 2024] Added support for local LLMs running on Xinference (by @imClumsyPanda)
  • [Nov. 26, 2024] The CLI now supports online files (by @reycn)
  • [Nov. 24, 2024] Added ONNX support to reduce dependency size (by @Wybxc)
  • [Nov. 23, 2024] 🌟 Free public service is online! (by @Byaidu)
  • [Nov. 23, 2024] Added a firewall to keep out web bots (by @Byaidu)
  • [Nov. 22, 2024] The GUI now supports Italian and has been improved (by @Byaidu, @reycn)
  • [Nov. 22, 2024] A deployed service can now be shared with others (by @Zxis233)
  • [Nov. 22, 2024] Tencent translation support (by @hellofinch)
  • [Nov. 21, 2024] The GUI now supports downloading the bilingual document (by @reycn)
  • [Nov. 20, 2024] 🌟 The demo is online! (by @reycn)

Preview

Public service 🌟

Free service (https://pdf2zh.com/)

You can try the free public service online without installing anything.

Demo

You can try the demo on HuggingFace and the demo on ModelScope without installing anything. The demos have limited computing resources, so please do not abuse them.

Installation and usage

There are four ways to use this project: the command-line tool, the portable version, the GUI, and Docker.

Running pdf2zh requires an additional model (wybxc/DocLayout-YOLO-DocStructBench-onnx). The model is also available on ModelScope. If downloading this model fails at startup, set the following environment variable:

set HF_ENDPOINT=https://hf-mirror.com

For PowerShell users (the value must be quoted, otherwise PowerShell tries to run it as a command):

$env:HF_ENDPOINT = "https://hf-mirror.com"
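
When pdf2zh is launched from a script instead of an interactive shell, the same variable can be passed through the child process environment. The following is a minimal sketch; the helper name `env_with_hf_mirror` is hypothetical and is not part of pdf2zh.

```python
import os


def env_with_hf_mirror(base=None, endpoint="https://hf-mirror.com"):
    """Return a copy of the environment with HF_ENDPOINT set if it is missing."""
    env = dict(os.environ if base is None else base)
    # Keep a value the user already exported; only fill in the mirror otherwise.
    env.setdefault("HF_ENDPOINT", endpoint)
    return env


# Possible use (assumes pdf2zh is installed and on PATH):
#   import subprocess
#   subprocess.run(["pdf2zh", "document.pdf"], env=env_with_hf_mirror(), check=True)
```

Copying the environment rather than mutating `os.environ` keeps the change local to the one child process.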

Method 1. Command-line tool

  1. Python must be installed (3.10 <= version <= 3.12)

  2. Install the package:

    pip install pdf2zh
    
  3. Run the translation; the output files are written to the current working directory:

    pdf2zh document.pdf
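
Before installing, it can help to confirm that the interpreter falls in the supported range from step 1. A small sketch, with the function name `python_supported` chosen here for illustration only:

```python
import sys


def python_supported(version=None):
    """Check the (major, minor) pair against the 3.10 to 3.12 range stated above."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 10) <= (major, minor) <= (3, 12)


if __name__ == "__main__":
    label = "supported" if python_supported() else "unsupported"
    print(label, sys.version.split()[0])
```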
    

Method 2. Portable

No pre-installed Python environment is needed.

Download setup.bat and double-click it to run.

Method 3. GUI

  1. Python must be installed (3.10 <= version <= 3.12)

  2. Install the package:

    pip install pdf2zh
    
  3. Start using it in the browser:

    pdf2zh -i
    
  4. If the browser does not open automatically, open the following URL:

    http://localhost:7860/
    

See the GUI documentation for details.

Method 4. Docker

  1. Pull and run:

    docker pull byaidu/pdf2zh
    docker run -d -p 7860:7860 byaidu/pdf2zh
    
  2. Open in the browser:

    http://localhost:7860/
    

For Docker deployment on cloud services:

Advanced options

Run the translation command on the command line to generate the translated document example-mono.pdf and the bilingual document example-dual.pdf in the current working directory. Google Translate is used by default. More supported translation services can be found here.


The following table lists all advanced options for reference:

| Option | Function | Example |
| --- | --- | --- |
| files | Local files | pdf2zh ~/local.pdf |
| links | Online files | pdf2zh http://arxiv.org/paper.pdf |
| -i | Enter the GUI | pdf2zh -i |
| -p | Partial document translation | pdf2zh example.pdf -p 1 |
| -li | Source language | pdf2zh example.pdf -li en |
| -lo | Target language | pdf2zh example.pdf -lo zh |
| -s | Translation service | pdf2zh example.pdf -s deepl |
| -t | Multi-threading | pdf2zh example.pdf -t 1 |
| -o | Output directory | pdf2zh example.pdf -o output |
| -f, -c | Exceptions | pdf2zh example.pdf -f "(MS.*)" |
| --share | Get a gradio public link | pdf2zh -i --share |
| --authorized | Add web authentication and a custom authentication page | pdf2zh -i --authorized users.txt [auth.html] |
| --prompt | Use a custom LLM prompt | pdf2zh --prompt [prompt.txt] |
| --onnx | Use a custom DocLayout-YOLO ONNX model | pdf2zh --onnx [onnx/model/path] |
| --serverport | Use a custom WebUI port | pdf2zh --serverport 7860 |
| --dir | Batch translation | pdf2zh --dir /path/to/translate/ |
| --config | Configuration file | pdf2zh --config /path/to/config/config.json |
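
When the options are driven from a script, the flags in the table can be assembled into an argument list. This is only a sketch: `build_pdf2zh_command` is a hypothetical helper, and it covers just a subset of the flags listed above.

```python
def build_pdf2zh_command(pdf, lang_in=None, lang_out=None, service=None,
                         threads=None, pages=None, output=None):
    """Assemble an argv list using only flags that appear in the options table."""
    cmd = ["pdf2zh", pdf]
    options = (("-li", lang_in), ("-lo", lang_out), ("-s", service),
               ("-t", threads), ("-p", pages), ("-o", output))
    for flag, value in options:
        if value is not None:
            # Every value is passed as its own argv entry, so no shell quoting is needed.
            cmd += [flag, str(value)]
    return cmd


# Possible use (assumes pdf2zh is installed and on PATH):
#   import subprocess
#   subprocess.run(build_pdf2zh_command("example.pdf", lang_out="ko"), check=True)
```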

Full or partial document translation

  • Full translation
pdf2zh example.pdf
  • Partial translation
pdf2zh example.pdf -p 1-3,5
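
To make the page specification concrete, the sketch below expands a string such as `1-3,5` into individual page numbers. It illustrates how the notation reads; it is not pdf2zh's own parser, and `parse_page_spec` is a name invented for this example.

```python
def parse_page_spec(spec):
    """Expand a spec such as "1-3,5" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("1-3,5"))  # pages 1, 2, 3 and 5
```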

Specify source and target languages

See Google Languages Codes and DeepL Languages Codes.

pdf2zh example.pdf -li en -lo ko

Translate with different services

The following table shows the environment variables required by each translation service. Set these variables before using the service.

| Translator | Service | Environment variables | Default values | Notes |
| --- | --- | --- | --- | --- |
| Google (default) | google | None | N/A | None |
| Bing | bing | None | N/A | None |
| DeepL | deepl | DEEPL_AUTH_KEY | [Your Key] | See DeepL |
| DeepLX | deeplx | DEEPLX_ENDPOINT | https://api.deepl.com/translate | See DeepLX |
| Ollama | ollama | OLLAMA_HOST, OLLAMA_MODEL | http://127.0.0.1:11434, gemma2 | See Ollama |
| OpenAI | openai | OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | https://api.openai.com/v1, [Your Key], gpt-4o-mini | See OpenAI |
| AzureOpenAI | azure-openai | AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_MODEL | [Your Endpoint], [Your Key], gpt-4o-mini | See Azure OpenAI |
| Zhipu | zhipu | ZHIPU_API_KEY, ZHIPU_MODEL | [Your Key], glm-4-flash | See Zhipu |
| ModelScope | modelscope | MODELSCOPE_API_KEY, MODELSCOPE_MODEL | [Your Key], Qwen/Qwen2.5-Coder-32B-Instruct | See ModelScope |
| Silicon | silicon | SILICON_API_KEY, SILICON_MODEL | [Your Key], Qwen/Qwen2.5-7B-Instruct | See SiliconCloud |
| Gemini | gemini | GEMINI_API_KEY, GEMINI_MODEL | [Your Key], gemini-1.5-flash | See Gemini |
| Azure | azure | AZURE_ENDPOINT, AZURE_API_KEY | https://api.translator.azure.cn, [Your Key] | See Azure |
| Tencent | tencent | TENCENTCLOUD_SECRET_ID, TENCENTCLOUD_SECRET_KEY | [Your ID], [Your Key] | See Tencent |
| Dify | dify | DIFY_API_URL, DIFY_API_KEY | [Your DIFY URL], [Your Key] | See Dify. Three variables, lang_out, lang_in, and text, must be defined in the Dify workflow input. |
| AnythingLLM | anythingllm | AnythingLLM_URL, AnythingLLM_APIKEY | [Your AnythingLLM URL], [Your Key] | See anything-llm |
| Argos Translate | argos | | | See argos-translate |
| Grok | grok | GORK_API_KEY, GORK_MODEL | [Your GORK_API_KEY], grok-2-1212 | See Grok |
| DeepSeek | deepseek | DEEPSEEK_API_KEY, DEEPSEEK_MODEL | [Your DEEPSEEK_API_KEY], deepseek-chat | See DeepSeek |
| OpenAI-Liked | openailiked | OPENAILIKED_BASE_URL, OPENAILIKED_API_KEY, OPENAILIKED_MODEL | url, [Your Key], model name | None |

For large language models that are compatible with the OpenAI API but are not listed in the table above, set the environment variables in the same way as for OpenAI.

Use -s service or -s service:model to specify the translation service:

pdf2zh example.pdf -s openai:gpt-4o-mini

Or specify the model with an environment variable:

set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai

For PowerShell users (the value must be quoted):

$env:OPENAI_MODEL = "gpt-4o-mini"
pdf2zh example.pdf -s openai
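
The two ways of naming a model can be pictured as a single lookup: take the part after the colon if present, otherwise consult the environment. The sketch below is an illustration only, not pdf2zh's implementation. `resolve_service` is a hypothetical name, and the `<SERVICE>_MODEL` naming pattern is an assumption that does not hold for every row of the table (the Grok row, for instance, uses GORK_MODEL).

```python
import os


def resolve_service(spec, env=None):
    """Split "service:model"; fall back to <SERVICE>_MODEL from the environment."""
    env = os.environ if env is None else env
    # partition splits on the first colon only, so "ollama:gemma2:9b" keeps "gemma2:9b".
    service, _, model = spec.partition(":")
    if not model:
        # Assumption: the variable is named after the service, e.g. OPENAI_MODEL.
        model = env.get(f"{service.upper().replace('-', '_')}_MODEL")
    return service, model or None
```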

Specify exceptions

Use regular expressions to specify the formula fonts and characters that should be preserved:

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"

By default, Latex, Mono, Code, Italic, Symbol, and Math fonts are preserved:

pdf2zh example.pdf -f "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
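
To see what such patterns select, they can be tried against sample font names and characters with Python's `re` module. The sketch reuses the patterns from the first command above. The font names are illustrative, and matching from the start of the name (as `re.match` does) is an assumption about how pdf2zh applies the pattern.

```python
import re

# Patterns copied from the -f and -c example above.
FONT_PATTERN = r"(CM[^RT].*|MS.*|.*Ital)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_preserved_font(name, pattern=FONT_PATTERN):
    """True if the font name matches the pattern from its first character."""
    return re.match(pattern, name) is not None


def is_preserved_char(ch, pattern=CHAR_PATTERN):
    """True if the character is one the pattern marks as formula content."""
    return re.match(pattern, ch) is not None


for font in ("CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"):
    print(font, is_preserved_font(font))
```

Here `CM[^RT].*` keeps Computer Modern fonts except those whose name continues with R or T, which is why a name like CMR10 is not treated as a formula font.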

Specify the number of threads

Use -t to specify the number of threads used for translation:

pdf2zh example.pdf -t 1

Custom prompt

Use --prompt to specify the prompt used by the LLM:

pdf2zh example.pdf --prompt prompt.txt

Example prompt.txt:

[
    {
        "role": "system",
        "content": "You are a professional,authentic machine translation engine.",
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:",
    },
]

The following three variables can be used in the custom prompt file:

| Variable | Content |
| --- | --- |
| lang_in | Source language |
| lang_out | Target language |
| text | Text to translate |
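
The `${name}` placeholders use the same syntax as Python's `string.Template`, so the effect of the three variables can be previewed locally. This is a sketch under that assumption; how pdf2zh substitutes the values internally is not stated here, and `render_prompt` is a hypothetical helper.

```python
from string import Template

# The user message from the prompt.txt example above.
PROMPT = Template(
    "Translate the following markdown source text to ${lang_out}. "
    "Keep the formula notation {{v*}} unchanged. "
    "Output translation directly without any additional text.\n"
    "Source Text: ${text}\nTranslated Text:"
)


def render_prompt(text, lang_in="en", lang_out="ko"):
    """Fill in the placeholders; braces without a leading $ are left untouched."""
    # safe_substitute ignores unused keys and never raises on unknown placeholders.
    return PROMPT.safe_substitute(lang_in=lang_in, lang_out=lang_out, text=text)


print(render_prompt("The energy is {{v1}}."))
```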

API

Python

from pdf2zh import translate, translate_stream

params = {"lang_in": "en", "lang_out": "ko", "service": "google", "thread": 4}
file_mono, file_dual = translate(files=["example.pdf"], **params)[0]
with open("example.pdf", "rb") as f:
    stream_mono, stream_dual = translate_stream(stream=f.read(), **params)

HTTP

pip install pdf2zh[backend]
pdf2zh --flask
pdf2zh --celery worker
curl http://localhost:11008/v1/translate -F "file=@example.pdf" -F "data={\"lang_in\":\"en\",\"lang_out\":\"ko\",\"service\":\"google\",\"thread\":4}"
{"id":"d9894125-2f4e-45ea-9d93-1a9068d2045a"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"info":{"n":13,"total":506},"state":"PROGRESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"state":"SUCCESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/mono --output example-mono.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/dual --output example-dual.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a -X DELETE
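
The status calls above lend themselves to a polling loop. The sketch below uses only the standard library and the endpoint and state names shown in the curl transcript. The helper names, the polling interval, and the handling of a FAILURE state (Celery's standard name for a failed task) are assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:11008/v1/translate"  # host and port from the curl calls above


def fetch_status(task_id, base_url=BASE_URL):
    """GET the status document for a task and decode it from JSON."""
    with urllib.request.urlopen(f"{base_url}/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=fetch_status, interval=2.0, max_polls=300):
    """Poll until the task reports SUCCESS, then return the last status document."""
    for _ in range(max_polls):
        status = fetch(task_id)
        state = status.get("state")
        if state == "SUCCESS":
            return status
        if state == "FAILURE":
            # Assumption: failed tasks are reported under Celery's FAILURE state.
            raise RuntimeError(f"task {task_id} failed: {status}")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

Passing the fetch function as a parameter keeps the loop testable without a running server; once it returns, the /mono and /dual URLs can be downloaded as in the curl examples.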

Acknowledgements

Contributors


Star history

Star History Chart