There are too many different standard book numbering systems. Therefore, we need a new universal standard book numbering system...

But right now what I need is a list of ISBNs for book lists from different eras. I opened all of these on my work computer, then stupidly closed them and went home; I should have just done this here entirely. Basically, I want a list for each of prehistory-1700, 1700-1900, 1900-1950, 1950-1990, 1990-2010 and 2010-2021. I don't know if I'll get equal amounts from each, but this distribution roughly matches the goodbooks-10k numbers and my intuitive sense of what people like to buy. It also plays into the idea of a canon that narrows over time, as I wrote about in Canon Curation.
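Bucketing books into those six eras is a one-liner with `pd.cut`. Here is a minimal sketch, assuming a numeric year column; the half-open `[start, end)` boundaries and the `ERA_BINS`/`ERA_LABELS`/`assign_era` names are my own choices, not something fixed by the plan. One caveat worth checking: if `publication_year` in `large_df` records when a particular edition came out, reissues of older books would land in the modern buckets.

```python
import numpy as np
import pandas as pd

# Era boundaries from the plan; treating each as a half-open interval
# [start, end) is an assumption, with 2021 folded into the last bin.
ERA_BINS = [-np.inf, 1700, 1900, 1950, 1990, 2010, 2022]
ERA_LABELS = ['prehistory-1700', '1700-1900', '1900-1950',
              '1950-1990', '1990-2010', '2010-2021']

def assign_era(years: pd.Series) -> pd.Series:
    """Map each year to an era label; missing or out-of-range years become NaN."""
    return pd.cut(years, bins=ERA_BINS, labels=ERA_LABELS, right=False)
```

Something like `assign_era(large_df['publication_year']).value_counts()` would then show how lopsided the eras are before picking books from each.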

I will need the ranked fiction canon, for the book titles, and my large books dataset, for getting their ISBNs by title. There might be more than one ISBN per title, but I think I can grab whichever edition has the highest number of reviews and assume that's the one to get (if a buyer wants a less popular edition, they can figure that out). While I'm at it, I might as well grab the same type of data that I used for graphing goodbooks-10k, so I don't have to repeat all these steps later if I want it.
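The "keep the most-reviewed edition" rule can be sketched as a sort followed by `drop_duplicates`. This assumes the `title`, `text_reviews_count` and `ratings_count` columns shown in `large_df`; using `ratings_count` as a tie-breaker is my own addition, and `most_reviewed_edition` is a hypothetical helper name.

```python
import pandas as pd

def most_reviewed_edition(books: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per title: the edition with the most text reviews.

    Assumes columns 'title', 'text_reviews_count' and 'ratings_count';
    ratings_count only breaks ties between equally reviewed editions.
    """
    ordered = books.sort_values(
        ['text_reviews_count', 'ratings_count'],
        ascending=[False, False],
        na_position='last',  # editions with unknown counts sort to the bottom
    )
    # drop_duplicates keeps the first row per title, i.e. the most-reviewed one
    return ordered.drop_duplicates(subset='title', keep='first').reset_index(drop=True)
```

Sorting once and deduplicating avoids a per-group `idxmax`, which can misbehave when every count in a group is missing.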

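import pandas as pd
# Read ISBNs as strings so leading zeros (e.g. 0689852959) and X check digits survive
large_df = pd.read_csv('../../records/cleaned_goodreads_books.csv', dtype={'isbn': str})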
large_df.tail()
Unnamed: 0 Unnamed: 0.1 Unnamed: 0.1.1 isbn text_reviews_count series language_code popular_shelves asin average_rating ... publication_month publication_year url image_url book_id ratings_count work_id title top_genre author_name
1215978 1215978 1215978 2360645 0689852959 1.0 [] NaN [{'count': '22', 'name': 'to-read'}, {'count':... NaN 4.36 ... 9.0 2002.0 https://www.goodreads.com/book/show/331839.Jac... https://s.gr-assets.com/assets/nophoto/book/11... 331839 18.0 25313618.0 Jacqueline Kennedy Onassis: Friend of the Arts biography Beatrice Gormley
1215979 1215979 1215979 2360647 0373126476 9.0 [] NaN [{'count': '78', 'name': 'to-read'}, {'count':... NaN 3.42 ... 7.0 2007.0 https://www.goodreads.com/book/show/2685097-th... https://s.gr-assets.com/assets/nophoto/book/11... 2685097 112.0 2710420.0 The Spaniard's Blackmailed Bride harlequin Trish Morey
1215980 1215980 1215980 2360651 178092870X 2.0 [] eng [{'count': '702', 'name': 'to-read'}, {'count'... NaN 3.50 ... 8.0 2015.0 https://www.goodreads.com/book/show/26168430-s... https://images.gr-assets.com/books/1440592011m... 26168430 6.0 46130263.0 Sherlock Holmes and the July Crisis mystery Arthur Conan Doyle
1215981 1215981 1215981 2360652 0765197456 6.0 [] NaN [{'count': '37', 'name': 'to-read'}, {'count':... NaN 4.00 ... 8.0 1996.0 https://www.goodreads.com/book/show/2342551.Th... https://s.gr-assets.com/assets/nophoto/book/11... 2342551 36.0 2349247.0 The Children's Classic Poetry Collection poetry Nicola Baxter
1215982 1215982 1215982 2360653 162378140X 17.0 ['658195'] eng [{'count': '56', 'name': 'to-read'}, {'count':... NaN 4.37 ... 4.0 2014.0 https://www.goodreads.com/book/show/22017381-1... https://images.gr-assets.com/books/1398621236m... 22017381 70.0 41332799.0 101 Nights: Volume One (101 Nights, #1-3) erotica S.E. Reign

5 rows × 28 columns

ranked_df = pd.read_csv('../assets/2021-07-27-found-canon.csv')
ranked_df
Unnamed: 0 Title Listed count Author
0 1 Ulysses 51.0 Joyce, James
1 2 The Great Gatsby 50.0 F. Scott Fitzgerald
2 3 One Hundred Years of Solitude 44.0 Gabriel Garcia Marquez
3 4 Lolita 43.0 Vladimir Nabokov
4 5 Nineteen Eighty Four 42.0 Orwell, George
... ... ... ... ...
4409 4410 Decline of the West 1.0 O
4410 4411 The History of the Standard Oil Company 1.0 I
4411 4412 Theory of Games and Economic Behavior 1.0 J
4412 4413 AA Big Book 1.0 B
4413 4414 Behaviorism 1.0 J

4414 rows × 4 columns

Ooh, looking at the end of this dataframe, my method for extracting author information seems to have gotten messed up somehow: the last few rows have only a single letter in the Author column. Better to just delete that column and reconstruct it from the data in large_df anyway.
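Before throwing the column away, a quick check can show how widespread the damage is. This is a minimal sketch under an assumed heuristic: an author string of one character or less (or a missing one) counts as broken. It won't catch the other inconsistency visible at the top of the list, where "Joyce, James" is stored last-name-first but "F. Scott Fitzgerald" is not. `flag_suspect_authors` is a hypothetical helper.

```python
import pandas as pd

def flag_suspect_authors(df: pd.DataFrame, col: str = 'Author') -> pd.Series:
    """Return a boolean mask of rows whose author looks like an extraction error.

    Assumed heuristic: missing values and strings of length <= 1 are flagged.
    """
    names = df[col].fillna('').astype(str).str.strip()
    return names.str.len() <= 1
```

Running `flag_suspect_authors(ranked_df).sum()` would give a rough count of affected rows.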

no_authors = ranked_df.drop(columns='Author')
no_authors
Unnamed: 0 Title Listed count
0 1 Ulysses 51.0
1 2 The Great Gatsby 50.0
2 3 One Hundred Years of Solitude 44.0
3 4 Lolita 43.0
4 5 Nineteen Eighty Four 42.0
... ... ... ...
4409 4410 Decline of the West 1.0
4410 4411 The History of the Standard Oil Company 1.0
4411 4412 Theory of Games and Economic Behavior 1.0
4412 4413 AA Big Book 1.0
4413 4414 Behaviorism 1.0

4414 rows × 3 columns
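To rebuild the author column from `large_df`, the ranked titles have to be matched against Goodreads titles, which rarely agree character for character ("Nineteen Eighty Four" versus a hyphenated "Nineteen Eighty-Four", for instance). Here is a sketch that assumes light normalization (lowercasing, punctuation to spaces, collapsed whitespace) is enough and that takes `author_name` and `isbn` from the most-reviewed edition. The helper names are hypothetical, and the matching rule is an assumption rather than a tested recipe for this dataset.

```python
import pandas as pd

def normalize_title(s: pd.Series) -> pd.Series:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return (s.fillna('').astype(str).str.lower()
             .str.replace(r'[^\w\s]', ' ', regex=True)
             .str.split().str.join(' '))

def attach_author_and_isbn(ranked: pd.DataFrame, books: pd.DataFrame) -> pd.DataFrame:
    """Left-join author_name and isbn onto the ranked list by normalized title.

    Assumes ranked has 'Title' and books has 'title', 'author_name', 'isbn'
    and 'text_reviews_count'. Unmatched titles keep NaN in the new columns.
    """
    books = books.dropna(subset=['title']).copy()
    books['title_key'] = normalize_title(books['title'])
    # One edition per normalized title: the one with the most text reviews
    best = (books.sort_values('text_reviews_count', ascending=False)
                 .drop_duplicates(subset='title_key'))
    ranked = ranked.copy()
    ranked['title_key'] = normalize_title(ranked['Title'])
    out = ranked.merge(best[['title_key', 'author_name', 'isbn']],
                       on='title_key', how='left')
    return out.drop(columns='title_key')
```

Something like `rebuilt = attach_author_and_isbn(no_authors, large_df)` followed by `rebuilt['isbn'].isna().sum()` would show how many canon titles still need manual matching.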