Legiblate
Use AI to guess a book's genre from its cover!
In 2020 I started the course "Deep Learning for Coders" from fast.ai. As one of my projects, I had to train an image classifier (specifically, by fine-tuning a ResNet model with fastai's wrapper around PyTorch).
I wanted to use a dataset that no one had tried before, so I acquired the book cover images from the paper Judging a Book by its Cover (2017). Citing the paper here:
@misc{iwana2017judging,
title={Judging a Book By its Cover},
author={Brian Kenji Iwana and Syed Tahseen Raza Rizvi and Sheraz Ahmed and Andreas Dengel and Seiichi Uchida},
year={2017},
eprint={1610.09204},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
The authors achieved 24.7% accuracy. As we'll see below, mine achieved 33.7% accuracy -- not amazing, but state of the art at the time, as far as I knew. I called this book cover judger "Legiblate", for unknown reasons.
I later decided to upload Legiblate to Heroku as a Streamlit app. I had to retrain a smaller resnet34 model for that, as the resnet50 version wouldn't fit on a free dyno. That model had 32.8% accuracy and you can find the Legiblate app here. Note that if the site has not been accessed recently it will take a few seconds to load while the free dyno boots up.
Unfortunately for code history, I did not record the process of cleaning and collating the data into cleaned.csv. That file has 56,977 lines and associates each image with one of 30 classes. It looks like this:
name,label
1438005687.jpg,Test Preparation
0060750715.jpg,Biographies & Memoirs
1580237959.jpg,Religion & Spirituality
0135137829.jpg,Arts & Photography
0312556411.jpg,Literature & Fiction
0393339157.jpg,Engineering & Transportation
0521456924.jpg,Science & Math
0898699223.jpg,Christian Books & Bibles
0545700272.jpg,Children's Books
1616494441.jpg,Self-Help
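Since the cleaning step was not recorded, a quick sanity check on the file is to count how many rows each class has. The following is a minimal sketch using only the standard library; the function name count_labels and the inline SAMPLE_CSV (just the ten rows shown above) are illustrative assumptions, not part of the original notebook.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for cleaned.csv: only the ten sample rows shown above.
SAMPLE_CSV = """name,label
1438005687.jpg,Test Preparation
0060750715.jpg,Biographies & Memoirs
1580237959.jpg,Religion & Spirituality
0135137829.jpg,Arts & Photography
0312556411.jpg,Literature & Fiction
0393339157.jpg,Engineering & Transportation
0521456924.jpg,Science & Math
0898699223.jpg,Christian Books & Bibles
0545700272.jpg,Children's Books
1616494441.jpg,Self-Help
"""

def count_labels(csv_file):
    """Return a Counter mapping each label to its number of rows."""
    reader = csv.DictReader(csv_file)  # uses the name,label header row
    return Counter(row["label"] for row in reader)

counts = count_labels(io.StringIO(SAMPLE_CSV))
for label, n in counts.most_common():
    print(f"{n:6d}  {label}")
```

To run it on the real file, pass an open file handle instead of the in-memory string, e.g. open("cleaned.csv", newline="").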
In the following cells I make my imports and gather the dataset.
from fastai import *
from fastai.vision.all import *
import ipywidgets as widgets
from pathlib import Path
data_folder = Path('/home/mage/Hacking/datasets/book-dataset/images/224x224')
imgs = get_image_files(data_folder)
imgs
ds = ImageDataLoaders.from_csv(
    '.',
    csv_fname='cleaned.csv',
    folder='/home/mage/Hacking/datasets/book-dataset/images/224x224/',
    batch_tfms=aug_transforms(),
    seed=42
)
ds.show_batch()
After checking that the batch looks right (i.e., the images match their labels), I instantiate a learner with the pretrained ResNet model and fine-tune for 4 epochs. With fastai v2 I did not have to pick a learning rate by hand: fine_tune uses a default base learning rate (and lr_find is available for tuning it), a significant improvement over v1.
learn = cnn_learner(ds, resnet50, metrics=accuracy)
learn.fine_tune(4)
Training on such a large dataset took a while (5 minutes per epoch), but this was before I knew to train on a sample dataset first. Luckily the accuracy leveled off at about 33%.
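For what it's worth, here is one way a smaller sample dataset could be drawn: keep a fixed number of randomly chosen rows per class, so every genre stays represented. This is a sketch under my own assumptions, not what the original notebook did; sample_per_class and the toy rows are hypothetical.

```python
import random
from collections import defaultdict

def sample_per_class(rows, per_class, seed=42):
    """Keep at most `per_class` randomly chosen (name, label) rows for each label."""
    rng = random.Random(seed)  # local RNG so the sample is reproducible
    by_label = defaultdict(list)
    for name, label in rows:
        by_label[label].append((name, label))
    sampled = []
    for label in sorted(by_label):  # fixed label order keeps the output deterministic
        items = by_label[label]
        rng.shuffle(items)
        sampled.extend(items[:per_class])
    return sampled

# Hypothetical rows standing in for the (name, label) pairs of cleaned.csv.
rows = [
    (f"img{i}.jpg", "Self-Help" if i % 3 else "Science & Math")
    for i in range(30)
]
subset = sample_per_class(rows, per_class=5)
print(len(subset), "rows kept out of", len(rows))
```

The sampled rows could then be written to a smaller CSV and used for quick experiments before committing to a full training run.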
Next I explore the predictions, to see which classes and samples are harder for the net to classify.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(15,15))
interp.plot_top_losses(9)
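With 30 classes the confusion matrix is hard to read by eye, so it can help to list the most frequently confused class pairs directly. The sketch below does this in plain Python from lists of actual and predicted labels; confused_pairs and the toy labels are illustrative assumptions. fastai's interpretation object also offers a most_confused method, which I believe reports similar information.

```python
from collections import Counter

def confused_pairs(actuals, preds, min_count=1):
    """Return (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actuals, preds) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]

# Hypothetical labels; in practice these would come from the validation set.
actuals = ["Self-Help", "Self-Help", "Self-Help", "Science & Math", "Science & Math"]
preds = [
    "Religion & Spirituality",
    "Religion & Spirituality",
    "Self-Help",
    "Science & Math",
    "Engineering & Transportation",
]

result = confused_pairs(actuals, preds)
for actual, predicted, n in result:
    print(f"{n:3d}  {actual} -> {predicted}")
```

Raising min_count filters out one-off mistakes and leaves only the systematic confusions.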