Introduction

Welcome to the documentation for manga-image-translator-rust.

This book describes how to build, run, and deploy the project, and covers the available modules, features, config options, examples, and developer documentation.

Please read at least up to and including section 6, Python Renderer Usage, before opening an issue.

Quick Start

Install CUDA 12.9 and cuDNN 9
Download the latest release
Delete the cudnn* file from the downloaded folder

./mit-runtime -i in -o out

For more details, see Installation and Usage.

Installation

Prebuilt binaries are available for Windows, Linux, and macOS, on both arm64 and x86_64.

For faster execution on NVIDIA GPUs, installing CUDA and cuDNN is recommended.

Install CUDA 12.9

Install cuDNN

If you use CUDA, delete the cudnn* file in the downloaded folder. Otherwise, delete the ONNX Runtime CUDA execution provider file.

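Linux only

The following appends the downloaded folder to LD_LIBRARY_PATH in both ~/.bashrc and ~/.zshrc. Open a new terminal afterwards for it to take effect: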

cd path/to/folder
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$(pwd)" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$(pwd)" >> ~/.zshrc

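macOS only

The following appends the downloaded folder to DYLD_LIBRARY_PATH in both ~/.bashrc and ~/.zshrc. Open a new terminal afterwards for it to take effect: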

cd path/to/folder
echo "export DYLD_LIBRARY_PATH=\$DYLD_LIBRARY_PATH:$(pwd)" >> ~/.bashrc
echo "export DYLD_LIBRARY_PATH=\$DYLD_LIBRARY_PATH:$(pwd)" >> ~/.zshrc

Usage

❯ cargo r -p simple-runtime -- cli -i path/to/input -o path/to/output
❯ ./runtime cli -i path/to/input -o path/to/output

Usage: simple-runtime cli [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
  -i, --input <INPUT>
          Input file or directory
  -v, --verbose...
          Verbose mode (-v, -vv, -vvv)
      --max-batch-size-ocr <MAX_BATCH_SIZE_OCR>
          maximum batch size for ocr [default: 16]
  -o, --output <OUTPUT>
          Output directory
  -c, --config <CONFIG>
          Optional config file
      --max-batch-size-upscaler <MAX_BATCH_SIZE_UPSCALER>
          maximum batch size for upscaler [default: 2]
      --overwrite
          Overwrite already translated images
  -h, --help
          Print help


Usage: simple-runtime [OPTIONS] <COMMAND>

Commands:
  cli   Run the image translation CLI
  api   Run in API server mode
  ui    Run the UI
  help  Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose...
          Verbose mode (-v, -vv, -vvv)
      --max-batch-size-ocr <MAX_BATCH_SIZE_OCR>
          maximum batch size for ocr [default: 16]
      --max-batch-size-upscaler <MAX_BATCH_SIZE_UPSCALER>
          maximum batch size for upscaler [default: 2]
  -h, --help
          Print help
  -V, --version
          Print version

Only the following ONNX Runtime execution providers are currently supported:

- coreml
- cuda
- cpu
- tensorrt
- rocm (requires compiling from source with --features "rocm")

For AMD GPUs, look into enabling ROCm for ONNX Runtime, or try ZLUDA.

Python Renderer Usage

The runtime can export the processed image before the text is rendered. This output can then be rendered with the original renderer from the Python project.

After running the runtime, run the Python renderer script on the exported file.

Install

# Setup virtual environment
python3 -m venv venv && source venv/bin/activate

# Install dependencies
pip install numpy Pillow git+https://github.com/frederik-uni/manga-image-translator.git@renderer-module#subdirectory=pip-modules/mit-renderer

# Install Python renderer script
curl -O https://raw.githubusercontent.com/frederik-uni/manga-image-translator-rust/master/scripts/python-render.py
chmod +x python-render.py

# Download fonts (requires jq)
REPO="zyddnys/manga-image-translator"; FOLDER="fonts"; BRANCH="main"
mkdir -p "$FOLDER"
curl -s "https://api.github.com/repos/$REPO/contents/$FOLDER?ref=$BRANCH" \
  | jq -r '.[] | select(.type=="file") | .download_url' \
  | while read -r url; do
      # Decode percent-encoded file names (e.g. %20) before saving
      fname=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$(basename "$url")")
      curl -L "$url" -o "$FOLDER/$fname"
    done

Usage

❯ ./python-render.py -i path/to/input.mit.bin -o path/to/output.png
usage: python-render.py [-h] -i INPUT -o OUTPUT
                        [--renderer {Renderer.default,Renderer.manga2Eng,Renderer.manga2EngPillow}]
                        [--font-path FONT_PATH] [--line_spacing LINE_SPACING] [--no_hyphenation]
                        [--font_size FONT_SIZE] [--font_size_offset FONT_SIZE_OFFSET]
                        [--font_size_minimum FONT_SIZE_MINIMUM]
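The renderer handles one file per call, while the runtime accepts a whole input directory. A small wrapper can render every exported file in one go. This is a sketch under assumptions: exported files end in `.mit.bin` (as in the usage example), only the `-i` and `-o` options shown above are passed, and `render_commands` / `render_all` are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: exported files use this suffix, as in the usage example
SUFFIX = ".mit.bin"


def render_commands(input_dir, output_dir, script="python-render.py"):
    """Build one python-render.py call per exported file, mirroring sub-folders."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    for src in sorted(input_dir.rglob("*" + SUFFIX)):
        rel = src.relative_to(input_dir)
        # e.g. ch1/001.mit.bin -> <output_dir>/ch1/001.png
        dst = output_dir / rel.with_name(rel.name[: -len(SUFFIX)] + ".png")
        commands.append([sys.executable, script, "-i", str(src), "-o", str(dst)])
    return commands


def render_all(input_dir, output_dir, script="python-render.py"):
    """Run the renderer for every exported file; stop at the first failure."""
    for cmd in render_commands(input_dir, output_dir, script):
        # Create the target folder in case the renderer does not
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

The script is invoked through `sys.executable`, so it runs with the interpreter of the active virtual environment and does not depend on the script's executable bit. For example, call `render_all("path/to/output", "path/to/rendered")` from within the venv.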

Modules

This project is composed of modular components.

This section only gives additional information about which models are used and which papers they are based on. For the possible config values, use VS Code or any other editor that supports JSON Schema; the available completions are shown on hover. See also Example Config.

Short descriptions and links to subpages:

Detectors

Model	Paper	Train	Source
dbnet	arXiv, arXiv	/	GitHub
ctd	/	/	GitHub
dbnet_convnext	/	/	GitHub
Paddle	/	Docs	GitHub

OCRs

Model	Paper	Train	Source
manga-ocr	/	Docs	GitHub

Upscalers

Model	Paper	Train	Source
ESRGAN	arXiv	Docs	GitHub
Waifu2x	arXiv	Docs	GitHub (maintained), GitHub (original)
Anime4k	/	/	GitHub

Inpainters

Model	Paper	Source
Lama AOT
Lama Large	arXiv, arXiv	GitHub, HomePage, GitHub
Lama MPE	arxiv	GitHub

Translators

Model	Paper	Train	Source
m2m100	arXiv	Fairseq	Hugging Face, GitHub
mbart	arXiv	Fairseq	Hugging Face, GitHub
nllb	arXiv	Fairseq	Hugging Face, GitHub
sugoi	/	Fairseq	Blog, Patreon
jparacrawl	arXiv, ACL Anthology, ACL Anthology	Fairseq	HomePage
qwen2	/	/	Blog, Hugging Face, GitHub

CPP Dependencies

Roadmap

detectors
- dbnet
- ctd
- paddle
- dbnet_convnext
- yolo5
- ysg
- ~~craft~~
ocr
- windows
- macos
- tesseract
- ~~oneocr~~
- paddle
- 32px
- 48px
- 48px_ctc
- lens_proto
- mocr
  - greedy
  - beam
inpainter
- color
- lama_aot
- lama_large
- lama_mpe
- sd
- patchmatch
colorizer
- none
- mc2
renderer
- struct
- gimp
- pdf
- psd
- html
- [~] png/jpeg/qoi
upscaler
- anime4k
- waifu2x
- esrgan
translator
- baidu
- caiyun
- google
- m2m100
- mbart
- nllb
- none
- original
- papago
- sugoi
- jparacrawl
- youdao
- deepl
- qwen2
- chatgpt
- groq
- deepseek
- gemini
- sakura
cleanup code
more tests (100% test coverage)
more benchmarks
optimize code
[~] error handling
replace clipper2
~~replace opencv~~
ci
- cargo build
- gh publish
- cargo test
- cargo fmt
- cargo clippy
- cargo doc
- cargo tarpaulin
- pyo3 publish
  - macos arm64
  - macos x86_64
  - linux x86_64
  - linux arm64
  - windows x86_64
  - windows arm64 (no prebuilt clang)
  - ~~windows x86~~

Build

Preparation (all platforms)

Install Rust with rustup

Install CUDA 12.9

Install cuDNN

Preparation Ubuntu/Debian

# Replace ubuntu2004 with your release (e.g. ubuntu2204 or ubuntu2404)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-12-9 cudnn9

sudo apt-get install -y pkg-config libssl-dev libopencv-dev clang libclang-dev libfontconfig-dev

Preparation macOS

brew install llvm opencv
# Old macs only
brew install openssl@3
# Run this in every terminal session (only required for release builds, not for debug builds)
export OPENCV_LINK_LIBS=+static=opencv_core,static=opencv_imgproc,static=opencv_calib3d,static=libtegra_hal,tbb,static=ittnotify,framework=OpenCL,z

Preparation Windows

choco install opencv llvm

$env:OPENCV_LINK_LIBS = $libName # the opencv_world*.lib library name, e.g. opencv_world4110. It's the only .lib file under C:\tools\opencv if you use the prebuilt binaries
$env:OPENCV_LINK_PATHS = $libPath # the parent folder of the opencv_world*.lib file, probably "C:\tools\opencv\build\x64\vc16\lib"
$env:OPENCV_INCLUDE_PATHS = $includePath # most likely "C:\tools\opencv\build\include"
$env:Path = "C:\tools\opencv\build\x64\vc16\bin;" + $env:Path
$env:Path = "C:\Program Files\NVIDIA\CUDNN\v9.13\bin\12.9;" + $env:Path

Or set them permanently:

[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_LIBS", "opencv_world4110", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_LINK_PATHS", "C:\tools\opencv\build\x64\vc16\lib", "User")
[System.Environment]::SetEnvironmentVariable("OPENCV_INCLUDE_PATHS", "C:\tools\opencv\build\include", "User")
[System.Environment]::SetEnvironmentVariable("Path", "C:\tools\opencv\build\x64\vc16\bin;" + [System.Environment]::GetEnvironmentVariable("Path", "User"), "User")

If the build fails with a "path too long" error, see: Path too long error 1, Path too long error 2

Quick Start

git clone https://github.com/frederik-uni/manga-image-translator-rust --recursive

cargo r -p simple-runtime -- cli -i in -o out

Deploy

When releasing the application, include the following files:

- CUDA/cuDNN libraries (if used)
- OpenCV libraries
- ONNX Runtime execution providers
- the main binary

Binary Data Structure Version 2

Note:

All numbers are little-endian.

n indicates a previously read length.

? means variable size (compute from other definitions).

Export

Size (bytes)	Type	Description
9	_	unknown/reserved
4	`uint`	version
?	Image	embedded Image
?	Image	overlay Image
8	`uint`	number of patches
?×n	TextBlock	n blocks
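As a rough illustration of the layout above, the following Python sketch reads every Export field that precedes the text blocks and returns the offset at which the n TextBlock records start. It assumes the field sizes listed in the tables, a 1-byte `bool`, and no padding between fields; `read_export_header` is a hypothetical name, not part of the project.

```python
import struct

# Little-endian, standard sizes, no padding ("<"): width, height, raw, buffer length
_IMAGE_HEADER = struct.Struct("<HH?Q")  # 2 + 2 + 1 + 8 = 13 bytes


def _read_image(data: bytes, off: int):
    """Read one Image record at `off`; return (dict, offset just past it)."""
    width, height, raw, length = _IMAGE_HEADER.unpack_from(data, off)
    off += _IMAGE_HEADER.size
    buffer = data[off:off + length]
    if len(buffer) != length:
        raise ValueError("truncated image buffer")
    return {"width": width, "height": height, "raw": raw, "buffer": buffer}, off + length


def read_export_header(data: bytes):
    """Parse the Export fields before the text blocks.

    Returns (header dict, offset of the first TextBlock).
    """
    off = 9  # skip the 9 unknown/reserved bytes
    (version,) = struct.unpack_from("<I", data, off)
    off += 4
    embedded, off = _read_image(data, off)
    overlay, off = _read_image(data, off)
    (count,) = struct.unpack_from("<Q", data, off)
    off += 8
    header = {
        "version": version,
        "embedded": embedded,
        "overlay": overlay,
        "patch_count": count,  # n: number of TextBlock records that follow
    }
    return header, off
```

`struct.unpack_from` raises `struct.error` if the input is shorter than a fixed-size field, so truncated files fail loudly instead of returning partial data.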

Image

Size (bytes)	Type	Description
2	`uint`	width
2	`uint`	height
1	`bool`	raw
8	`uint`	buffer length
n	bytes	buffer
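A minimal Python sketch of reading and writing one Image record, assuming the field sizes in the table (2-byte width and height, 1-byte bool, 8-byte length) with no padding. `Image`, `read_image` and `write_image` are illustrative names, not the project's API. The meaning of `raw` (uncompressed pixels or an encoded file) is not specified here, so the buffer is passed through untouched.

```python
import struct
from dataclasses import dataclass

# width (u16), height (u16), raw (bool), buffer length (u64): 13 bytes, little-endian
_IMAGE_HEADER = struct.Struct("<HH?Q")


@dataclass
class Image:
    width: int
    height: int
    raw: bool
    buffer: bytes


def read_image(data: bytes, off: int = 0):
    """Parse one Image record at `off`; return (Image, offset just past it)."""
    width, height, raw, length = _IMAGE_HEADER.unpack_from(data, off)
    off += _IMAGE_HEADER.size
    buffer = data[off:off + length]
    if len(buffer) != length:
        raise ValueError(f"truncated image buffer: expected {length} bytes")
    return Image(width, height, raw, buffer), off + length


def write_image(img: Image) -> bytes:
    """Serialize an Image back into the same layout."""
    header = _IMAGE_HEADER.pack(img.width, img.height, img.raw, len(img.buffer))
    return header + img.buffer
```

Because `read_image` returns the offset after the record, the embedded and overlay images can be read back to back.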

4PTS (4 Points)

Size (bytes)	Type	Description
64	4×[int,int]	4 (x, y) coordinates
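The 64-byte size implies eight 8-byte values, so each coordinate is read here as a signed 64-bit integer. The sketch below also assumes the values are interleaved as x0, y0, x1, y1, x2, y2, x3, y3; `read_4pts` and `write_4pts` are hypothetical helpers.

```python
import struct
from typing import List, Tuple

_4PTS = struct.Struct("<8q")  # eight little-endian signed 64-bit ints = 64 bytes


def read_4pts(data: bytes, off: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Decode one 4PTS record into four (x, y) points; return (points, next offset)."""
    values = _4PTS.unpack_from(data, off)
    points = [(values[i], values[i + 1]) for i in range(0, 8, 2)]
    return points, off + _4PTS.size


def write_4pts(points) -> bytes:
    """Inverse of read_4pts: pack four (x, y) pairs into 64 bytes."""
    if len(points) != 4:
        raise ValueError("a 4PTS record needs exactly four points")
    flat = [c for point in points for c in point]
    return _4PTS.pack(*flat)
```

Since each record has a fixed size, n records can be decoded by calling `read_4pts` n times, advancing the offset by 64 bytes each time.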

TextBlock

Size (bytes)	Type	Description
8	`uint`	font size
8	`float`	angle
8	`float`	probability
1	_	unknown/reserved
1	`bool`	fg_color available
0 or 1	`uint`	fg_r (if available)
0 or 1	`uint`	fg_g (if available)
0 or 1	`uint`	fg_b (if available)
1	`bool`	bg_color available
0 or 1	`uint`	bg_r (if available)
0 or 1	`uint`	bg_g (if available)
0 or 1	`uint`	bg_b (if available)
8	`uint`	original text length
n	bytes	original text
8	`uint`	4PTS count
n×64	4PTS	4PTS data
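The sketch below parses a single TextBlock record, including the color channels that are present only when the preceding flag byte is set. Assumptions: `float` is an 8-byte IEEE 754 double, the original text is UTF-8, 4PTS coordinates are interleaved x/y pairs of signed 64-bit ints, and `TextBlock` / `read_text_block` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Color = Tuple[int, int, int]


@dataclass
class TextBlock:
    font_size: int
    angle: float
    probability: float
    fg_color: Optional[Color]
    bg_color: Optional[Color]
    text: str
    quads: List[List[Point]]  # one entry per 4PTS record


def _read_optional_rgb(data: bytes, off: int):
    """1-byte availability flag, followed by r, g, b bytes only if the flag is set."""
    (available,) = struct.unpack_from("<?", data, off)
    off += 1
    if not available:
        return None, off
    r, g, b = struct.unpack_from("<BBB", data, off)
    return (r, g, b), off + 3


def read_text_block(data: bytes, off: int = 0):
    """Parse one TextBlock starting at `off`; return (TextBlock, offset just past it)."""
    font_size, angle, probability = struct.unpack_from("<Qdd", data, off)
    off += 24 + 1  # three 8-byte fields, then 1 unknown/reserved byte
    fg, off = _read_optional_rgb(data, off)
    bg, off = _read_optional_rgb(data, off)

    (text_len,) = struct.unpack_from("<Q", data, off)
    off += 8
    raw_text = data[off:off + text_len]
    if len(raw_text) != text_len:
        raise ValueError("truncated original text")
    off += text_len
    text = raw_text.decode("utf-8")  # assumption: the text is UTF-8 encoded

    (count,) = struct.unpack_from("<Q", data, off)
    off += 8
    quads = []
    for _ in range(count):
        # assumption: x0, y0, x1, y1, x2, y2, x3, y3 as signed 64-bit ints
        v = struct.unpack_from("<8q", data, off)
        quads.append([(v[i], v[i + 1]) for i in range(0, 8, 2)])
        off += 64
    return TextBlock(font_size, angle, probability, fg, bg, text, quads), off
```

Because the function returns the offset after the record, the n text blocks of an Export can be read by calling it n times in sequence.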

Manga Image Translator Docs
