Introduction
Welcome to the documentation for manga-image-translator-rust.
This book describes how to build, run and deploy the project, explains available modules, features, config options, examples, and developer docs.
Please read at least up to 6. Python Renderer Usage before creating any issues.
Quick Start
- Install CUDA 12.9 and cuDNN 9
- Download a release
- Delete the bundled cudnn* files
./mit-runtime -i in -o out
For more details see installation and Usage.
Installation
Binaries are available here for Windows, Linux, and macOS, for both arm64 and x86_64.
For faster execution, it is recommended to install CUDA and cuDNN.
Install CUDA 12.9
Install cuDNN
If you use CUDA, delete the cudnn* files in the downloaded folder.
Otherwise, delete the ONNX Runtime CUDA execution provider.
Usage
❯ cargo r -p simple-runtime -- -i path/to/input -o path/to/output
❯ ./runtime -i path/to/input -o path/to/output
Options:
-i, --input <INPUT> Input file or directory
-o, --output <OUTPUT> Output directory
-c, --config <CONFIG> Optional config file
-v, --verbose... Verbose mode (-v, -vv, -vvv)
--overwrite Overwrite already translated images
-h, --help Print help
-V, --version Print version
Only
- coreml
- cuda
- cpu
- tensorrt
are supported right now. For AMD support, look into enabling ROCm for ONNX Runtime, or possibly ZLUDA.
Python Renderer Usage
The runtime can export the processed image before the text is rendered. This output can be used with the original renderer from the Python project.
After running the runtime, you can run the Python renderer script.
Install
# Setup virtual environment
python3 -m venv venv && source venv/bin/activate
# Install dependencies
pip install numpy Pillow git+https://github.com/frederik-uni/manga-image-translator.git@renderer-module#subdirectory=pip-modules/mit-renderer
# Install Python renderer script
curl -O https://raw.githubusercontent.com/frederik-uni/manga-image-translator-rust/master/scripts/python-render.py
# Download fonts
REPO="zyddnys/manga-image-translator"
FOLDER="fonts"
BRANCH="main"
mkdir -p "$FOLDER"
curl -s "https://api.github.com/repos/$REPO/contents/$FOLDER?ref=$BRANCH" \
  | jq -r '.[] | select(.type=="file") | .download_url' \
  | while read -r url; do
      fname=$(basename "$url")
      fname=$(python3 -c "import urllib.parse; print(urllib.parse.unquote('$fname'))")
      curl -L "$url" -o "$FOLDER/$fname"
    done
Usage
❯ ./python-render.py -i path/to/input.mit.bin -o path/to/output.png
usage: python-render.py [-h] -i INPUT -o OUTPUT
[--renderer {Renderer.default,Renderer.manga2Eng,Renderer.manga2EngPillow}]
[--font-path FONT_PATH] [--line_spacing LINE_SPACING] [--no_hyphenation]
[--font_size FONT_SIZE] [--font_size_offset FONT_SIZE_OFFSET]
[--font_size_minimum FONT_SIZE_MINIMUM]
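The script renders one exported file at a time, so a directory of exports can be driven from a small Python loop. This is a sketch under the assumption that the runtime's exports end in .mit.bin and that python-render.py sits in the working directory; the helper names here are illustrative, not part of the project:

```python
import subprocess
from pathlib import Path


def output_path(bin_file: Path) -> Path:
    """Map an exported file to its rendered name: foo.mit.bin -> foo.png."""
    return bin_file.with_suffix("").with_suffix(".png")


def render_all(input_dir: Path, script: Path = Path("python-render.py")) -> None:
    """Run python-render.py once per exported .mit.bin file in input_dir."""
    for bin_file in sorted(input_dir.glob("*.mit.bin")):
        subprocess.run(
            ["python3", str(script), "-i", str(bin_file), "-o", str(output_path(bin_file))],
            check=True,  # stop on the first failing render
        )
```

Any of the renderer flags shown above (e.g. --font-path) can be appended to the command list as needed.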
Modules
This project is composed of modular components. Short descriptions and links to subpages:
Detectors
See Detectors.
OCRs
See OCRs.
Upscalers
See Upscalers.
Inpainters
See Inpainters.
Translators
See Translators.
Detectors
OCRs
Upscaler
Model | Paper | Train | Source |
---|---|---|---|
ESRGAN | arxiv | Docs | GitHub |
Waifu2x | arxiv | Docs | GitHub (maintained), GitHub (original) |
Anime4k | / | / | GitHub |
Inpainter
Translators
Model | Paper | Train | Source |
---|---|---|---|
m2m100 | arxiv | Fairseq | Hugging Face, GitHub |
mbart | arxiv | Fairseq | Hugging Face, GitHub |
nllb | arxiv | Fairseq | Hugging Face, GitHub |
sugoi | / | Fairseq | Blog, Patreon |
jparacrawl | arxiv, aclanthology, aclanthology | Fairseq | HomePage |
qwen2 | / | / | Blog, Hugging Face, GitHub |
CPP Dependencies
Roadmap
- detectors
  - dbnet
  - ctd
  - paddle
  - dbnet_convnext
  - yolo5
  - ysg
  - craft
- ocr
- inpainter
  - color
  - lama_aot
  - lama_large
  - lama_mpe
  - sd
  - patchmatch
- colorizer
  - none
  - mc2
- renderer
- upscaler
  - anime4k
  - waifu2x
  - esrgan
- translator
  - baidu
  - caiyun
  - m2m100
  - mbart
  - nllb
  - none
  - original
  - papago
  - sugoi
  - jparacrawl
  - youdao
  - deepl
  - qwen2
  - chatgpt
  - groq
  - deepseek
  - gemini
  - sakura
- cleanup code
- more tests (100% test coverage)
- more benchmarks
- optimize code
- [~] error handling
- replace clipper2
- replace opencv
- ci
  - cargo build
  - gh publish
  - cargo test
  - cargo fmt
  - cargo clippy
  - cargo doc
  - cargo tarpaulin
- pyo3 publish
  - macos arm64
  - macos x86_64
  - linux x86_64
  - linux arm64
  - windows x86_64
  - windows arm64 (no prebuilt clang)
  - windows x86
Build
Preparation All
Install Rust via rustup
Install CUDA 12.9
Install cuDNN
Preparation Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-12-9 cudnn9
sudo apt-get install -y pkg-config libssl-dev libopencv-dev clang libclang-dev libfontconfig-dev
Preparation macOS
brew install llvm opencv
# Old macs only
brew install openssl@3
# Run this in every terminal session (only needed for release builds, not debug builds)
export OPENCV_LINK_LIBS=+static=opencv_core,static=opencv_imgproc,static=opencv_calib3d,static=libtegra_hal,tbb,static=ittnotify,framework=OpenCL,z
Preparation Windows
choco install opencv llvm
$env:OPENCV_LINK_LIBS = $libName # opencv_world*.lib; it's the only .lib file under C:\tools\opencv if you use the prebuilt package
$env:OPENCV_LINK_PATHS = $libPath # the parent folder of the opencv_world*.lib file, e.g. "C:\tools\opencv\build\x64\vc16\lib"
$env:OPENCV_INCLUDE_PATHS = $includePath # most likely "C:\tools\opencv\build\include"
$env:Path = "C:\tools\opencv\build\x64\vc16\bin;" + $env:Path
$env:Path = "C:\Program Files\NVIDIA\CUDNN\v9.13\bin\12.9;" + $env:Path
Or set them permanently:
[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_LIBS", "opencv_world4110", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_PATHS", "C:\tools\opencv\build\x64\vc16\lib", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_INCLUDE_PATHS", "C:\tools\opencv\build\include", "User")
[System.Environment]::SetEnvironmentVariable("Path", "C:\tools\opencv\build\x64\vc16\bin;" + $env:Path, "User")
Path too long error 1, Path too long error 2
Quick Start
git clone https://github.com/frederik-uni/manga-image-translator-rust --recursive
cargo r -p simple-runtime -- -i in -o out
Deploy
When releasing the application, the following files need to be included:
- (cuda/cudnn)
- opencv
- onnxruntime execution providers
- main binary
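A release folder can be sanity-checked by globbing for each required component before publishing. The patterns below are illustrative only, since the actual file names depend on the platform (.dll/.so/.dylib) and on the bundled versions:

```python
from pathlib import Path

# Illustrative patterns for the components listed above; real file names
# vary by platform and version, so adjust these to your release layout.
REQUIRED_PATTERNS = [
    "*mit*",           # main binary
    "*opencv*",        # OpenCV libraries
    "*onnxruntime*",   # ONNX Runtime execution providers
]


def missing_components(release_dir: Path, patterns=REQUIRED_PATTERNS) -> list:
    """Return every pattern that matches no file in release_dir."""
    return [p for p in patterns if not any(release_dir.glob(p))]
```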
Binary Data Structure Version 1
Note:
- All numbers are little-endian.
- n indicates a previously read length.
- ? means variable size (compute from other definitions).
Export
Size | Type | Description |
---|---|---|
9 | _ | unknown/reserved |
4 | uint | version |
? | Image | embedded Image |
8 | uint | number of patches |
?×n | Patch | n patches |
Image
Size | Type | Description |
---|---|---|
2 | uint | width |
2 | uint | height |
1 | bool | raw |
8 | uint | buffer length |
n | bytes | buffer |
4PTS (4 Points)
Size | Type | Description |
---|---|---|
64 | 4×[int,int] | 4 (x, y) coordinates |
TextBlock
Size | Type | Description |
---|---|---|
8 | uint | font size |
8 | float | angle |
8 | float | probability |
1 | _ | unknown/reserved |
1 | bool | fg_color available |
0|1 | uint | fg_r (if available) |
0|1 | uint | fg_g (if available) |
0|1 | uint | fg_b (if available) |
1 | bool | bg_color available |
0|1 | uint | bg_r (if available) |
0|1 | uint | bg_g (if available) |
0|1 | uint | bg_b (if available) |
8 | uint | original text length |
n | bytes | original text |
8 | uint | 4PTS count |
n×64 | 4PTS | 4PTS data |
Patch
Size | Type | Description |
---|---|---|
8 | float | x |
8 | float | y |
? | Image | embedded Image |
? | TextBlock | embedded TextBlock |
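Read together, the tables above are enough to walk the format from code. The sketch below follows the field order given in the tables, assuming the little-endian layout and 8-byte floats stated in the note; all names are illustrative and the Patch/TextBlock readers are left partial:

```python
import struct
from dataclasses import dataclass


@dataclass
class Image:
    width: int
    height: int
    raw: bool
    buffer: bytes


class Reader:
    """Cursor over a little-endian byte buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint(self, size: int) -> int:
        return int.from_bytes(self.take(size), "little")

    def f64(self) -> float:
        return struct.unpack("<d", self.take(8))[0]

    def flag(self) -> bool:
        return self.take(1)[0] != 0


def read_image(r: Reader) -> Image:
    width = r.uint(2)
    height = r.uint(2)
    raw = r.flag()
    length = r.uint(8)           # buffer length precedes the buffer itself
    return Image(width, height, raw, r.take(length))


def read_color(r: Reader):
    """A TextBlock color triple is only present when its availability flag is set."""
    if r.flag():
        return (r.uint(1), r.uint(1), r.uint(1))
    return None


def read_export_header(r: Reader):
    r.take(9)                    # unknown/reserved
    version = r.uint(4)
    image = read_image(r)
    patches = r.uint(8)          # number of patches follows the embedded image
    return version, image, patches
```

Each Patch would then be read in table order (two f64 coordinates, an Image, a TextBlock), reusing the same helpers.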