Quick recap: I've been building an AI tool that cleans, translates, and typesets manga speech bubbles. It started as an effort to read one of my favorite manga, One Punch Man, as soon as a chapter was released. I made significant progress after creating more than 4000 pages of training data from OPM to train the AI. Today I present a version of the tool that is vastly improved in almost every way.

Full-page cleaning, translation, and typesetting

Text recognition and translation that were gibberish last year are now almost fully comprehensible. It's not top-notch writing, but what matters is that you can understand what's going on on the page. Sometimes it's so good that I can't tell it was done by a machine. Below are examples of typesetting and translation done 100% by machine, with no human intervention.

Here's a demonstration of full-page automation with an OPM page. The result is a fully readable page with human-level typesetting in about 20 seconds.

DBS worked really well. There are minor name issues: sensu beans become "fairy bean", and Goku becomes "Wukong" (the Chinese Monkey King, who has the same name). But it's all readable.

Here are some text-dense pages with somewhat awkward but fully comprehensible translations.

This is an extremely text-dense page with 20 bubbles that took about 1 minute to fully process.

Some more random samples. Notice that there are some funny translations, like a sound effect that came with a dictionary definition.

The AI had trouble recognizing kanji-heavy bubbles. I left those bubbles as is whenever the translation confidence was too low.
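To make that fallback concrete, here is a minimal sketch of the idea, assuming each bubble comes with a translation and a confidence score between 0 and 1. The `Bubble` class, the function names, and the 0.6 cutoff are all made up for illustration; they are not the actual code or values from the app.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cutoff for illustration only; the real value is not stated in this post.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Bubble:
    """One detected speech bubble (illustrative structure, not the app's)."""
    original_text: str
    translation: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def should_typeset(bubble: Bubble, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Typeset only when the translation confidence meets the cutoff."""
    return bubble.confidence >= threshold


def plan_page(bubbles: List[Bubble],
              threshold: float = CONFIDENCE_THRESHOLD) -> List[Tuple[Bubble, str]]:
    """Pair each bubble with an action: typeset it or leave the original art."""
    return [
        (b, "typeset" if should_typeset(b, threshold) else "keep_original")
        for b in bubbles
    ]


if __name__ == "__main__":
    page = [
        Bubble("ワンパンマン", "One Punch Man", 0.92),
        Bubble("魑魅魍魎", "(unclear)", 0.21),  # kanji-heavy, low confidence
    ]
    for bubble, action in plan_page(page):
        print(f"{bubble.original_text}: {action}")
```

The point of the design is that a bubble left untouched is less harmful than a confidently typeset wrong translation, since the reader can still see the original text.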

Manual typesetting

I've built an editor as well so that the user can further improve typesetting aesthetics. It was designed so that the user only needs to make minor adjustments while the machine does 90% of the work. The app also provides the option of just downloading the cleaned bubble images if the user prefers manual typesetting. It was designed to save the user's time regardless of their workflow. Here are two pages done in under 40 seconds that needed only minor adjustments.

Here's an example of a page done completely manually with 3 different fonts.

Here's an example of machine-assisted translation plus manual typesetting.

Where I'm going from here

Pretty much every part of the app can be further improved: better typesetting aesthetics, better text recognition, a higher bubble detection rate, more languages, performance improvements, and even more ambitious things like automatic font selection and automatic redraws (yes, this is possible now with neural nets).

I work a busy full-time job, so I've only been able to work on my app for about one day a week over the past two years. I'd love to spend more time on this so I can further improve the app and allow everyone to read their favorite manga that much quicker. My app is currently only open to a trusted few. I was told I should start a Patreon to cover some of the cost of development and speed up the overall progress. So I did! If you like what you see and would like to support my efforts, you can now become a Patron! With your support, I will also be able to open this app for more people to test, which will ultimately lead to a public release. Patrons will receive early app development updates and can participate in private development discussions on Discord.

Patreon - Discord

One more thing: I will not support anyone using this tool for illegal purposes. The last thing I want is for some big publisher to shut me down before I can even release the app.
