In 2020 I started the course "Deep Learning for Coders" from fast.ai. For one of my projects, I had to train an image classifier (specifically, fine-tune a ResNet model using fastai's wrapper around PyTorch).

I wanted to use a dataset that no one had tried before, so I acquired the book cover images from the paper Judging a Book by its Cover (2017). Citing the paper here:

@misc{iwana2017judging,
      title={Judging a Book By its Cover}, 
      author={Brian Kenji Iwana and Syed Tahseen Raza Rizvi and Sheraz Ahmed and Andreas Dengel and Seiichi Uchida},
      year={2017},
      eprint={1610.09204},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

The authors achieved 24.7% accuracy. As we'll see below, my model reached 33.7% accuracy -- not amazing, but SOTA at the time. I called this book cover judger "Legiblate", for reasons I no longer remember.

I later decided to deploy Legiblate to Heroku as a Streamlit app. For that I had to retrain a smaller resnet34 model, as the resnet50 version wouldn't fit on a free dyno. That model reached 32.8% accuracy, and you can find the Legiblate app here. Note that if the site hasn't been accessed recently, it will take a few seconds to load while the free dyno boots up.

Unfortunately for code history, I did not record the process of cleaning and collating the data into cleaned.csv. That file has 56,977 lines and associates each image with one of 30 classes. It looks like this:

name,label
1438005687.jpg,Test Preparation
0060750715.jpg,Biographies & Memoirs
1580237959.jpg,Religion & Spirituality
0135137829.jpg,Arts & Photography
0312556411.jpg,Literature & Fiction
0393339157.jpg,Engineering & Transportation
0521456924.jpg,Science & Math
0898699223.jpg,Christian Books & Bibles
0545700272.jpg,Children's Books
1616494441.jpg,Self-Help
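Since the original cleaning code is lost, here is a hypothetical reconstruction with pandas. The column layout of the raw listing file is an assumption based on how the dataset is distributed, not the script I actually ran; the idea is simply to keep a (filename, category) pair for every image that actually downloaded.

```python
import pandas as pd

# Assumed column layout of the raw listing file (the original has no header row).
COLS = ['asin', 'filename', 'image_url', 'title', 'author', 'category_id', 'category']

def clean_listing(raw, image_names):
    """Keep only (filename, category) pairs whose image file actually exists."""
    df = raw[['filename', 'category']].rename(columns={'filename': 'name',
                                                       'category': 'label'})
    df = df.dropna().drop_duplicates(subset='name')
    df = df[df['name'].isin(image_names)]  # drop rows whose download failed
    return df.reset_index(drop=True)

# Tiny in-memory example standing in for the real listing:
raw = pd.DataFrame([
    ('B000', '1438005687.jpg', 'url', 'Some Title',  'A. Author', 28, 'Test Preparation'),
    ('B001', '0060750715.jpg', 'url', 'Other Title', 'B. Author', 1,  'Biographies & Memoirs'),
    ('B002', 'missing.jpg',    'url', 'Lost Title',  'C. Author', 5,  'Science & Math'),
], columns=COLS)

cleaned = clean_listing(raw, image_names={'1438005687.jpg', '0060750715.jpg'})
# cleaned.to_csv('cleaned.csv', index=False)  # would produce the format shown above
```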

In the following cells I make my imports and gather the dataset.

from fastai.vision.all import *  # fastai v2 star-import (models, DataLoaders, transforms)
import ipywidgets as widgets
from pathlib import Path
data_folder = Path('/home/mage/Hacking/datasets/book-dataset/images/224x224')
imgs = get_image_files(data_folder)
imgs
(#57000) [Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/0323311482.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/1931257698.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/0764961888.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/1554947596.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/0939460009.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/0393330753.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/1598571621.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/0452258286.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/8498167868.jpg'),Path('/home/mage/Hacking/datasets/book-dataset/images/224x224/1593306970.jpg')...]
ds = ImageDataLoaders.from_csv('.', 
                                csv_fname='cleaned.csv',
                                folder='/home/mage/Hacking/datasets/book-dataset/images/224x224/',
                                batch_tfms=aug_transforms(),  # standard augmentations: flip, rotate, zoom, lighting
                                seed=42  # makes the default 20% validation split reproducible
                              )
ds.show_batch()
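Before training, it's also worth sanity-checking the class balance. A quick sketch, using the standard library rather than fastai, and assuming the cleaned.csv format shown earlier (here fed a stand-in built from the sample rows):

```python
import csv
import io
from collections import Counter

# Stand-in for cleaned.csv, reusing a few of the sample rows shown above.
sample = io.StringIO(
    "name,label\n"
    "1438005687.jpg,Test Preparation\n"
    "0060750715.jpg,Biographies & Memoirs\n"
    "1580237959.jpg,Religion & Spirituality\n"
)

# On the real file you'd pass open('cleaned.csv', newline='') instead.
counts = Counter(row['label'] for row in csv.DictReader(sample))
print(len(counts))  # number of distinct classes (30 on the full file)
print(counts.most_common(1))
```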

After checking that the batch looks right (i.e., the images match their labels), I instantiate a learner with the pretrained resnet model and fine-tune for 4 epochs. Fastai v2's fine_tune picks sensible learning rates and schedules automatically, a significant improvement over v1.

learn = cnn_learner(ds, resnet50, metrics=accuracy)
learn.fine_tune(4)
epoch  train_loss  valid_loss  accuracy  time
0      3.083939    2.753545    0.238262  04:15

epoch  train_loss  valid_loss  accuracy  time
0      2.667974    2.641955    0.268100  05:43
1      2.541682    2.470613    0.308644  05:43
2      2.242933    2.367178    0.331988  05:43
3      1.960798    2.373091    0.337429  05:43

Training on such a large dataset took a while (nearly 6 minutes per epoch), but this was before I knew to prototype on a sample of the data first. Luckily the validation loss converged well, with accuracy ending at 33.7%.
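If I were redoing this today, I'd iterate on a stratified sample first. A sketch of that idea with pandas (shown on a toy stand-in for cleaned.csv; sampling per class keeps rare genres from vanishing in the subsample):

```python
import pandas as pd

# Toy stand-in for cleaned.csv: 40 rows across two classes.
df = pd.DataFrame({
    'name': [f'{i:010d}.jpg' for i in range(40)],
    'label': ['Science & Math'] * 20 + ['Self-Help'] * 20,
})

# Keep the same fraction of every class (you'd use frac=0.1 on the real file).
subset = df.groupby('label').sample(frac=0.5, random_state=42)

print(subset['label'].value_counts().to_dict())
# subset.to_csv('cleaned_sample.csv', index=False), then point
# ImageDataLoaders.from_csv at it for fast iteration.
```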

Next I explore the predictions, to see which classes and samples are harder for the net to classify.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(15,15))
interp.plot_top_losses(9)  # the 9 highest-loss validation images
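Beyond the plots, per-class accuracy falls straight out of the confusion matrix (rows are actual classes, columns are predictions). A small numpy sketch on a toy 3-class matrix; with the real learner you could feed it the matrix from `interp.confusion_matrix()`:

```python
import numpy as np

# Toy confusion matrix: rows are actual classes, columns are predictions.
cm = np.array([
    [8, 1, 1],
    [2, 5, 3],
    [0, 4, 6],
])

# Diagonal = correct predictions; row sum = total samples of that class.
per_class_acc = np.diag(cm) / cm.sum(axis=1)
print(per_class_acc)  # [0.8 0.5 0.6]
```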