I trained a neural net to detect speech bubbles in One Punch Man (and a little bit of Garou from part 1), in an effort to reduce the amount of work that scanlators have to do when a new chapter comes out.
I'm happy to report that the new neural net is now smart enough to distinguish Saitama's head from a bubble.
So much has improved since my last post. My new network is trained on almost 4000 pages of OPM (vs ~200 pages in the last post), up to the chapter where Saitama defeats Orochi. I tested the network on a new chapter, Atomic Samurai vs Black Sperm, and it detected every bubble with 1 false positive (95% precision). This chapter was not in the training data.
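For anyone who wants to score a detector on their own test chapter, here's a minimal sketch of the usual precision and recall definitions. The counts are hypothetical: I'm assuming 19 real bubbles that were all found plus 1 false detection, picked only because they match one false positive at 95%.

```python
from typing import Tuple


def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) for a set of detections."""
    detected = true_positives + false_positives   # everything the net flagged
    actual = true_positives + false_negatives     # every real bubble on the pages
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(true_positives=19,
                                     false_positives=1,
                                     false_negatives=0)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.95 recall=1.00"
```

A false positive (like a face mistaken for a bubble) lowers precision, while a missed bubble lowers recall.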
What's most exciting is that I've also been working on an editor, built on the speech bubble detector neural net, that automates speech bubble translation with one click.
The speech bubble neural net proposes bubbles. Some image processing and a custom algorithm then extract each bubble. The text is extracted and run through an OCR neural net, and the recognized text is sent to Google Translate. Lastly, the translated text is typeset with a custom line-breaking algorithm. The whole process is completely automated.
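The flow of those steps can be sketched roughly like this. It is only an outline of how the stages chain together, not the editor's real code: `translate_page`, `BubbleResult`, and every stage function passed in are placeholder names made up for illustration, and the demo uses stub stages instead of a real detector, OCR net, or Google Translate call.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) of a bubble


@dataclass
class BubbleResult:
    box: Box
    source_text: str
    translated_text: str
    lines: List[str]  # translated text after line breaking


def translate_page(page: Any,
                   detect: Callable[[Any], List[Box]],
                   extract: Callable[[Any, Box], Any],
                   ocr: Callable[[Any], str],
                   translate: Callable[[str], str],
                   typeset: Callable[[str, Box], List[str]]) -> List[BubbleResult]:
    """Run each proposed bubble through extract -> OCR -> translate -> typeset."""
    results = []
    for box in detect(page):
        region = extract(page, box)
        source = ocr(region)
        if not source.strip():
            continue  # OCR found no text, so there is nothing to translate
        translated = translate(source)
        results.append(BubbleResult(box, source, translated,
                                    typeset(translated, box)))
    return results


# Stub stages standing in for the real neural nets and translation service.
demo = translate_page(
    page="fake page",
    detect=lambda page: [(10, 20, 100, 60)],
    extract=lambda page, box: "fake region",
    ocr=lambda region: "ワンパン",
    translate=lambda text: "One punch",
    typeset=lambda text, box: text.split(),
)
print(demo[0].lines)  # prints "['One', 'punch']"
```

Passing the stages in as functions keeps the sketch independent of any particular model or translation service.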
As you can see, machine translation is still mostly garbage these days. Until machine translation AI can understand context, this is not really useful. OK, maybe just for the hardcore OPM fans who smash the F5 button when a new RAW chapter is about to come out.
Ultimately, I built the tool to eliminate the need for human typesetters, so a human can just focus on translation. Just to give you a taste of what my tool can do: it can detect all speech bubbles, erase them, and typeset any text you input to fit the shape of the bubble. Simply being able to automate erasing bubbles will save typesetters a ton of time.
Black Sperm's face was the 1 false positive detection. There was no Black Sperm in the training data. This is easily fixable by including him in the training data.
Here's a demonstration of the automated typesetter. The typesetter recognizes the oval of the bubble and auto-fits the text to it. It is able to intelligently infer things like the appropriate font size, the number of lines, and when to hyphenate. The image on the right was typeset by a human for comparison.
Here's another look at the hyphenation. It works, but not great. There is much to be improved.
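To give a rough idea of what fitting text to an oval involves, here's a minimal sketch. It is not the typesetter's actual algorithm: it assumes the bubble is a perfect ellipse, approximates every character as 0.6 of the font size wide, uses a simple greedy line breaker, and does no hyphenation at all. All names and constants are made up for illustration.

```python
import math
from typing import List, Optional, Tuple

CHAR_WIDTH = 0.6   # assumed average glyph width, as a fraction of font size
LINE_HEIGHT = 1.2  # assumed line height, as a fraction of font size


def line_widths(a: float, b: float, font_size: float, n_lines: int) -> List[float]:
    """Usable width of each of n_lines rows, stacked and centered vertically
    in an ellipse with horizontal semi-axis a and vertical semi-axis b."""
    height = font_size * LINE_HEIGHT
    top = -n_lines * height / 2
    widths = []
    for i in range(n_lines):
        # The row edge farthest from the center is where the ellipse is narrowest.
        y = max(abs(top + i * height), abs(top + (i + 1) * height))
        widths.append(2 * a * math.sqrt(1 - (y / b) ** 2) if y < b else 0.0)
    return widths


def break_lines(words: List[str], widths: List[float],
                font_size: float) -> Optional[List[str]]:
    """Greedily fill each row in order; return None if the words do not fit."""
    lines: List[str] = []
    current = ""
    for word in words:
        while True:
            if len(lines) >= len(widths):
                return None  # ran out of rows
            candidate = f"{current} {word}" if current else word
            if len(candidate) * font_size * CHAR_WIDTH <= widths[len(lines)]:
                current = candidate
                break
            if not current:
                return None  # one word is too wide for this row (no hyphenation)
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return lines


def fit_text(text: str, a: float, b: float, max_font: int = 40,
             min_font: int = 8) -> Optional[Tuple[int, List[str]]]:
    """Try the largest font first, and for each font the fewest rows first."""
    words = text.split()
    for font_size in range(max_font, min_font - 1, -1):
        max_lines = int(2 * b // (font_size * LINE_HEIGHT))
        for n_lines in range(1, max_lines + 1):
            lines = break_lines(words, line_widths(a, b, font_size, n_lines),
                                font_size)
            if lines is not None:
                return font_size, lines
    return None


result = fit_text("I will not lose to a mere bald man", a=120, b=80)
print(result)
```

Because the breaker is greedy, it can use fewer rows than it was offered, in which case the text would sit slightly above center. A real typesetter would re-center the rows and measure actual glyph widths instead of using a fixed ratio.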
The long-term goal is improving bubble detection accuracy and expanding the neural net data set beyond OPM. This will allow the tool to work on literally any manga.
There are a lot of things the neural net and editor cannot do yet, such as detecting overlay text and customizing fonts. These are things I hope to eventually add.
The next goal is to release this as a web application. The reason for this is accessibility. The desktop app requires a bunch of special software and hardware, which are hard to set up and can get really expensive. So having this run in the cloud will allow anyone to access the system.
That's all for this update! If you like what I'm doing and want to support me, you can buy my new game, Trap Labs. Shameless plug, I know. It just came out on iOS and Android a few days ago. It is an action puzzle game based on bounds from StarCraft. It features cross-device/platform online play and a hat that looks suspiciously like Watchdog Man, which you can only get after beating the final boss. Trust me, this game is much more challenging than it looks.
Seriously, your support would mean that I can continue to make amazing software like this.
I'm also the author of one of the most popular comic book readers on Windows 10, Comics++.
You can also support me through a donation here.