Introduction
Welcome to the documentation for manga-image-translator-rust.
This book describes how to build, run, and deploy the project, and covers the available modules, features, config options, examples, and developer docs.
Please read at least up to section 6, Python Renderer Usage, before creating any issues.
Quick Start
- Install CUDA 12.9 and cuDNN 9
- Download the release
- Delete the cudnn* file
./mit-runtime -i in -o out
For more details, see Installation and Usage.
Installation
Binaries are available here for Windows, Linux, and macOS, for both arm64 and x86_64.
For faster execution, it is recommended to install CUDA and cuDNN.
Install CUDA 12.9
Install cuDNN
If you use CUDA, delete the cudnn* file in the downloaded folder.
Otherwise, delete the onnxruntime CUDA execution provider.
Linux only
cd path/to/folder
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$(pwd)" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$(pwd)" >> ~/.zshrc
macOS only
cd path/to/folder
echo "export DYLD_LIBRARY_PATH=\$DYLD_LIBRARY_PATH:$(pwd)" >> ~/.bashrc
echo "export DYLD_LIBRARY_PATH=\$DYLD_LIBRARY_PATH:$(pwd)" >> ~/.zshrc
Usage
❯ cargo r -p simple-runtime -- cli -i path/to/input -o path/to/output
❯ ./runtime cli -i path/to/input -o path/to/output
Usage: simple-runtime cli [OPTIONS] --input <INPUT> --output <OUTPUT>
Options:
-i, --input <INPUT>
Input file or directory
-v, --verbose...
Verbose mode (-v, -vv, -vvv)
--max-batch-size-ocr <MAX_BATCH_SIZE_OCR>
maximum batch size for ocr [default: 16]
-o, --output <OUTPUT>
Output directory
-c, --config <CONFIG>
Optional config file
--max-batch-size-upscaler <MAX_BATCH_SIZE_UPSCALER>
maximum batch size for upscaler [default: 2]
--overwrite
Overwrite already translated images
-h, --help
Print help
Usage: simple-runtime [OPTIONS] <COMMAND>
Commands:
cli Run the image translation CLI
api Run in API server mode
ui Run the UI
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose...
Verbose mode (-v, -vv, -vvv)
--max-batch-size-ocr <MAX_BATCH_SIZE_OCR>
maximum batch size for ocr [default: 16]
--max-batch-size-upscaler <MAX_BATCH_SIZE_UPSCALER>
maximum batch size for upscaler [default: 2]
-h, --help
Print help
-V, --version
Print version
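When the cli subcommand is driven from another script, it can help to assemble the argument list in one place. The following is a minimal Python sketch that uses only the flags listed in the help output above. The function names build_cli_command and translate are hypothetical, and it is an assumption that the runtime exits with a non-zero status on failure.

```python
import subprocess
from typing import List, Optional


def build_cli_command(
    binary: str,
    input_path: str,
    output_dir: str,
    config: Optional[str] = None,
    overwrite: bool = False,
    verbosity: int = 0,
) -> List[str]:
    """Assemble an argument list for the `cli` subcommand (hypothetical helper)."""
    command = [binary, "cli", "-i", input_path, "-o", output_dir]
    if config is not None:
        command += ["-c", config]
    if overwrite:
        command.append("--overwrite")
    if verbosity > 0:
        # The help text lists -v, -vv and -vvv, so the level is capped at 3.
        command.append("-" + "v" * min(verbosity, 3))
    return command


def translate(binary: str, input_path: str, output_dir: str, **options) -> None:
    """Run the runtime; assumes a failure is reported as a non-zero exit code."""
    command = build_cli_command(binary, input_path, output_dir, **options)
    subprocess.run(command, check=True)


# Print the command instead of running it, so the sketch has no side effects.
example = build_cli_command(
    "./runtime", "path/to/input", "path/to/output", overwrite=True, verbosity=2
)
print(" ".join(example))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.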
Only the following execution providers are supported right now:
- coreml
- cuda
- cpu
- tensorrt
- (rocm) requires compiling from source with --features "rocm"
For AMD support, look at how to enable ROCm for onnxruntime, or possibly ZLUDA.
Python Renderer Usage
The runtime can export the processed image before the text is rendered. This output can be used with the original renderer from the Python project.
After running the runtime, you can run the Python renderer script.
Install
# Setup virtual environment
python3 -m venv venv && source venv/bin/activate
# Install dependencies
pip install numpy Pillow git+https://github.com/frederik-uni/manga-image-translator.git@renderer-module#subdirectory=pip-modules/mit-renderer
# Install Python renderer script
curl -O https://raw.githubusercontent.com/frederik-uni/manga-image-translator-rust/master/scripts/python-render.py
# Download fonts
REPO="zyddnys/manga-image-translator"; FOLDER="fonts"; BRANCH="main"; mkdir -p "$FOLDER"; curl -s "https://api.github.com/repos/$REPO/contents/$FOLDER?ref=$BRANCH" | jq -r '.[] | select(.type=="file") | .download_url' | while read -r url; do fname=$(basename "$url"); fname=$(python3 -c "import urllib.parse; print(urllib.parse.unquote('$fname'))"); curl -L "$url" -o "$FOLDER/$fname"; done
Usage
❯ ./python-render.py -i path/to/input.mit.bin -o path/to/output.png
usage: python-render.py [-h] -i INPUT -o OUTPUT
[--renderer {Renderer.default,Renderer.manga2Eng,Renderer.manga2EngPillow}]
[--font-path FONT_PATH] [--line_spacing LINE_SPACING] [--no_hyphenation]
[--font_size FONT_SIZE] [--font_size_offset FONT_SIZE_OFFSET]
[--font_size_minimum FONT_SIZE_MINIMUM]
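To render many exported files in one go, the script can be called once per file. The sketch below is one possible way to do that and uses only the -i and -o flags from the usage text above. It assumes that all exports sit in one directory and end in .mit.bin, and that python-render.py is in the current directory; the function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

EXPORT_SUFFIX = ".mit.bin"  # assumption: suffix taken from the usage example


def build_render_commands(
    export_dir: str, output_dir: str, script: str = "python-render.py"
) -> List[List[str]]:
    """Return one renderer command per exported file, sorted by file name."""
    commands = []
    for export in sorted(Path(export_dir).glob("*" + EXPORT_SUFFIX)):
        # "page.mit.bin" becomes "page.png" in the output directory.
        png_name = export.name[: -len(EXPORT_SUFFIX)] + ".png"
        target = Path(output_dir) / png_name
        commands.append([sys.executable, script, "-i", str(export), "-o", str(target)])
    return commands


def render_all(export_dir: str, output_dir: str) -> None:
    """Run the renderer for every export and stop at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_render_commands(export_dir, output_dir):
        subprocess.run(command, check=True)


# Dry run: list the commands for a hypothetical "out" directory without executing them.
for command in build_render_commands("out", "rendered"):
    print(" ".join(command))
```

Calling render_all("out", "rendered") inside the activated virtual environment would then run the renderer for each file.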
Modules
This project is composed of modular components.
This section only gives additional information about which models are used and which papers they are based on. For the possible config values, use VS Code or any other editor that supports JSON Schema; the possible completions are listed on hover. Also see Example Config.
Short descriptions and links to subpages:
Detectors
See Detectors.
OCRs
See OCRs.
Upscalers
See Upscalers.
Inpainters
See Inpainters.
Translators
See Translators.
Detectors
OCRs
Upscaler
| Model | Paper | Train | Source |
|---|---|---|---|
| ESRGAN | arxiv | Docs | GitHub |
| Waifu2x | arxiv | Docs | GitHub Maintained GitHub Original |
| Anime4k | | | GitHub |
Inpainter
Translators
| Model | Paper | Train | Source |
|---|---|---|---|
| m2m100 | arxiv | Fairseq | Hugging Face GitHub |
| mbart | arxiv | Fairseq | Hugging Face GitHub |
| nllb | arxiv | Fairseq | Hugging Face GitHub |
| sugoi | / | Fairseq | Blog Patreon |
| jparacrawl | arxiv aclanthology aclanthology | Fairseq | HomePage |
| qwen2 | / | / | Blog Hugging Face GitHub |
CPP Dependencies
Roadmap
- detectors
  - dbnet
  - ctd
  - paddle
  - dbnet_convnext
  - yolo5
  - ysg
  - craft
- ocr
- inpainter
  - color
  - lama_aot
  - lama_large
  - lama_mpe
  - sd
  - patchmatch
- colorizer
  - none
  - mc2
- renderer
- upscaler
  - anime4k
  - waifu2x
  - esrgan
- translator
  - baidu
  - caiyun
  - m2m100
  - mbart
  - nllb
  - none
  - original
  - papago
  - sugoi
  - jparacrawl
  - youdao
  - deepl
  - qwen2
  - chatgpt
  - groq
  - deepseek
  - gemini
  - sakura
- cleanup code
- more tests (100% test coverage)
- more benchmarks
- optimize code
- [~] error handling
- replace clipper2
- replace opencv
- ci
  - cargo build
  - gh publish
  - cargo test
  - cargo fmt
  - cargo clippy
  - cargo doc
  - cargo tarpaulin
- pyo3 publish
  - macos arm64
  - macos x86_64
  - linux x86_64
  - linux arm64
  - windows x86_64
  - windows arm64 (no prebuilt clang)
  - windows x86
Build
Preparation All
Install Rust with rustup
Install CUDA 12.9
Install cuDNN
Preparation Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-12-9 cudnn9
sudo apt-get install -y pkg-config libssl-dev libopencv-dev clang libclang-dev libfontconfig-dev
Preparation macOS
brew install llvm opencv
# Old macs only
brew install openssl@3
# Run this in every terminal session (only required for release builds, not for debug builds)
export OPENCV_LINK_LIBS=+static=opencv_core,static=opencv_imgproc,static=opencv_calib3d,static=libtegra_hal,tbb,static=ittnotify,framework=OpenCL,z
Preparation Windows
choco install opencv llvm
$env:OPENCV_LINK_LIBS = $libName # opencv_world*.lib; it's the only .lib file in C:\tools\opencv if you use the prebuilt binaries
$env:OPENCV_LINK_PATHS = $libPath # the parent folder of the opencv_world*.lib file, e.g. "C:\tools\opencv\build\x64\vc16\lib"
$env:OPENCV_INCLUDE_PATHS = $includePath # most likely "C:\tools\opencv\build\include"
$env:Path = "C:\tools\opencv\build\x64\vc16\bin;" + $env:Path
$env:Path = "C:\Program Files\NVIDIA\CUDNN\v9.13\bin\12.9;" + $env:Path
Or, to set them permanently:
[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_LIBS", "opencv_world4110", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_PATHS", "C:\tools\opencv\build\x64\vc16\lib", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_INCLUDE_PATHS", "C:\tools\opencv\build\include", "User")
[System.Environment]::SetEnvironmentVariable("Path", "C:\tools\opencv\build\x64\vc16\bin;" + $env:Path, "User")
Path too long error 1, Path too long error 2
Quick Start
git clone https://github.com/frederik-uni/manga-image-translator-rust --recursive
cargo r -p simple-runtime -- cli -i in -o out
Deploy
When releasing the application these files need to be included:
- (cuda/cudnn)
- opencv
- onnxruntime execution providers
- main binary
Binary Data Structure Version 2
Note:
- All numbers are little-endian.
- n indicates a previously read length.
- ? means variable size (compute from other definitions).
Export
| Size | Type | Description |
|---|---|---|
| 9 | _ | unknown/reserved |
| 4 | uint | version |
| ? | Image | embedded Image |
| ? | Image | overlay Image |
| 8 | uint | number of patches |
| ?×n | TextBlock | n blocks |
Image
| Size | Type | Description |
|---|---|---|
| 2 | uint | width |
| 2 | uint | height |
| 1 | bool | raw |
| 8 | uint | buffer length |
| n | bytes | buffer |
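The Image layout above maps directly onto a fixed 13-byte header followed by the buffer. Here is a minimal Python sketch of a reader for it. The names Image, read_exact and read_image are hypothetical, and the integers are assumed to be unsigned. The table does not say what raw means or how the pixels are laid out, so the buffer is returned as undecoded bytes.

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class Image:
    width: int
    height: int
    raw: bool
    buffer: bytes  # left undecoded; the pixel layout is not specified here


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or raise if the stream ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    return data


def read_image(stream: BinaryIO) -> Image:
    # "<HH?Q": little-endian u16 width, u16 height, 1-byte bool, u64 buffer length.
    header = read_exact(stream, struct.calcsize("<HH?Q"))
    width, height, raw, length = struct.unpack("<HH?Q", header)
    return Image(width, height, raw, read_exact(stream, length))


# Round-trip a synthetic 2x1 image with a 6-byte buffer (made-up test data).
sample = struct.pack("<HH?Q", 2, 1, True, 6) + bytes(range(6))
image = read_image(io.BytesIO(sample))
print(image.width, image.height, image.raw, len(image.buffer))
```

Reading the length first and then exactly that many bytes keeps the stream positioned at the start of the next field.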
4PTS (4 Points)
| Size | Type | Description |
|---|---|---|
| 64 | 4×[int,int] | 4 (x, y) coordinates |
TextBlock
| Size | Type | Description |
|---|---|---|
| 8 | uint | font size |
| 8 | float | angle |
| 8 | float | probability |
| 1 | _ | unknown/reserved |
| 1 | bool | fg_color available |
| 0 or 1 | uint | fg_r (if available) |
| 0 or 1 | uint | fg_g (if available) |
| 0 or 1 | uint | fg_b (if available) |
| 1 | bool | bg_color available |
| 0 or 1 | uint | bg_r (if available) |
| 0 or 1 | uint | bg_g (if available) |
| 0 or 1 | uint | bg_b (if available) |
| 8 | uint | original text length |
| n | bytes | original text |
| 8 | uint | 4PTS count |
| n×64 | 4PTS | 4PTS data |
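A TextBlock can be read field by field in the order of the table, with the optional colors depending on their availability flags. The following Python sketch shows one way to do this. It assumes that the 8-byte floats are IEEE 754 doubles and that each 4PTS entry holds eight signed 64-bit integers (64 bytes in total). The text is kept as bytes because its encoding is not stated here, and all names are hypothetical.

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO, List, Optional, Tuple

Color = Tuple[int, int, int]
Quad = List[Tuple[int, int]]


@dataclass
class TextBlock:
    font_size: int
    angle: float
    probability: float
    fg_color: Optional[Color]
    bg_color: Optional[Color]
    text: bytes
    quads: List[Quad]


def read_exact(stream: BinaryIO, size: int) -> bytes:
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    return data


def read_optional_color(stream: BinaryIO) -> Optional[Color]:
    # One flag byte, followed by r, g, b only when the flag is set.
    (available,) = struct.unpack("<?", read_exact(stream, 1))
    return struct.unpack("<3B", read_exact(stream, 3)) if available else None


def read_u64(stream: BinaryIO) -> int:
    return struct.unpack("<Q", read_exact(stream, 8))[0]


def read_text_block(stream: BinaryIO) -> TextBlock:
    # u64 font size, f64 angle, f64 probability, then one reserved byte ("x").
    font_size, angle, probability = struct.unpack("<Qddx", read_exact(stream, 25))
    fg_color = read_optional_color(stream)
    bg_color = read_optional_color(stream)
    text = read_exact(stream, read_u64(stream))
    quads = []
    for _ in range(read_u64(stream)):
        values = struct.unpack("<8q", read_exact(stream, 64))
        quads.append([(values[i], values[i + 1]) for i in range(0, 8, 2)])
    return TextBlock(font_size, angle, probability, fg_color, bg_color, text, quads)


# Synthetic block: white foreground, no background, text "hi", one quad.
sample = (
    struct.pack("<Qddx", 24, 0.0, 0.5)
    + struct.pack("<?3B", True, 255, 255, 255)
    + struct.pack("<?", False)
    + struct.pack("<Q", 2) + b"hi"
    + struct.pack("<Q", 1) + struct.pack("<8q", 0, 0, 10, 0, 10, 5, 0, 5)
)
block = read_text_block(io.BytesIO(sample))
print(block)
```

A full Export reader would skip the 9 reserved bytes, read the version and the two images, and then call read_text_block once per block.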