## Rationale

I have collated Goodreads review data with Amazon sales rankings. Now I'd like to see what kinds of insights can be drawn from how these variables interact.

## Explore the data

My plotting library here is Altair. It interoperates with my website framework, `fastpages`, to render interactive charts in the reader's browser. Unfortunately, this can put a heavy load on the browser, so by default Altair refuses to embed more than 5,000 rows of data in a chart.
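A minimal sketch of checking a dataframe against that limit before plotting. The constant `ALTAIR_DEFAULT_MAX_ROWS` and the helper `fits_altair_default` are hypothetical names for illustration; the value 5,000 mirrors Altair's default, which raises a `MaxRowsError` when a chart's data has more rows than that.

```python
import pandas as pd

# Hypothetical constant mirroring Altair's default row limit for embedded data
ALTAIR_DEFAULT_MAX_ROWS = 5000


def fits_altair_default(df: pd.DataFrame, limit: int = ALTAIR_DEFAULT_MAX_ROWS) -> bool:
    """Return True if df has few enough rows to embed under the default limit."""
    return len(df) <= limit


small = pd.DataFrame({'rank': range(4000)})
large = pd.DataFrame({'rank': range(500_000)})
print(fits_altair_default(small))  # True
print(fits_altair_default(large))  # False
```

Altair also offers `alt.data_transformers.disable_max_rows()` to lift the limit entirely, at the cost of exactly the browser load described above.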

I have other options for creating a huge graph of 500,000 books, but for now I'll take slices of the full dataframe to get an idea of the shape and texture of the data.
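One way to take those slices, sketched under the assumption that consecutive 5,000-row chunks are what's wanted (about 100 of them for 500,000 books). `iter_slices` is a hypothetical helper, and the toy frame stands in for the real data. A random sample via `DataFrame.sample` is an alternative that gives a less biased look at the whole.

```python
import pandas as pd


def iter_slices(df: pd.DataFrame, size: int = 5000):
    """Yield consecutive row slices of at most `size` rows each."""
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]


# Toy frame standing in for the full books dataframe
toy = pd.DataFrame({'text_reviews_count': range(12_000)})
sizes = [len(chunk) for chunk in iter_slices(toy)]
print(sizes)  # [5000, 5000, 2000]

# Alternative: one random 5,000-row slice, reproducible via random_state
random_rows = toy.sample(n=5000, random_state=0)
```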

```python
import pandas as pd
import altair as alt
```
```python
# Drop 'Unnamed: 0', the name pandas gives an unlabeled leading column
full_df = pd.read_csv('../../records/books-with-salesrank.csv', low_memory=False).drop(columns='Unnamed: 0')
```
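The `Unnamed: 0` column typically appears when a CSV was written with `to_csv`'s default `index=True`; I'm assuming that's how this file was produced. A small sketch on toy data (not the real CSV) showing where the column comes from, and `index_col=0` as an alternative to dropping it after the fact:

```python
import io

import pandas as pd

# Round-trip a tiny frame through CSV text, index included (to_csv's default)
buf = io.StringIO()
pd.DataFrame({'title_gr': ['A', 'B'], 'rank': [10, 20]}).to_csv(buf)
csv_text = buf.getvalue()

# Reading it back naively yields an extra 'Unnamed: 0' column
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'title_gr', 'rank']

# index_col=0 consumes that column as the index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))  # ['title_gr', 'rank']
```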

## Highly-reviewed books

Because there are so many books in here, I'm going to sort them by `text_reviews_count` and take the most-reviewed slice to start. That way I should recognize some of the titles.

```python
sorted_df = full_df.sort_values(by='text_reviews_count', ascending=False)

# Keep the 5,000 most-reviewed books (Altair's default row limit), renumbered 0..4999
plotting_df = sorted_df.head(5000).reset_index(drop=True)
```
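`DataFrame.nlargest` is a one-call shorthand for this sort-then-slice pattern. A toy comparison with made-up titles and counts (with ties, the two can order equal rows differently, since `nlargest` keeps the first occurrence by default):

```python
import pandas as pd

toy = pd.DataFrame({
    'title_gr': ['A', 'B', 'C', 'D', 'E'],
    'text_reviews_count': [30, 500, 7, 120, 45],
})

# Sort descending, then take the top rows
via_sort = toy.sort_values(by='text_reviews_count', ascending=False).head(3)
# nlargest selects the same rows in one call
via_nlargest = toy.nlargest(3, 'text_reviews_count')

print(via_nlargest['title_gr'].tolist())  # ['B', 'D', 'E']
```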

The following code sets up a chart in Altair. As I said, I'm using this library because it works with `fastpages` by default, but it's actually quite a nice tool. It's declarative rather than imperative: you tell it which variables to encode to which visual channels, and it figures out the details.

```python
# Scatter plot: sales rank vs. review count, colored by average rating
plotting_chart = alt.Chart(plotting_df).mark_point().encode(
    # Lower rank = better sales; reversing the log axis puts best-sellers on the right
    x=alt.X('rank', scale=alt.Scale(type='log', reverse=True)),
    y=alt.Y('text_reviews_count', scale=alt.Scale(type='log')),
    color=alt.Color('average_rating', scale=alt.Scale(scheme='yellowgreenblue')),
    tooltip=['title_gr', 'author_name', 'publication_year', 'category', 'rank'],
).properties(
    width=640,
    height=640,
).interactive()

# Draw rules at the mean of each variable to divide the chart into quadrants
x_line = alt.Chart(pd.DataFrame({'x': [int(plotting_df['rank'].mean())]})).mark_rule().encode(x='x')
y_line = alt.Chart(pd.DataFrame({'y': [int(plotting_df['text_reviews_count'].mean())]})).mark_rule().encode(y='y')

# Layer the rules over the scatter plot; the layers share scales
plotting_chart + x_line + y_line
```
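The two rules split the books into four quadrants. A sketch of labeling each book by quadrant in pandas, using made-up rows rather than the real data; the `quadrant` column and its label strings are my own illustrative choices:

```python
import pandas as pd

# Made-up rows; a lower 'rank' means better sales
toy = pd.DataFrame({
    'title_gr': ['A', 'B', 'C', 'D'],
    'rank': [5, 40_000, 8, 90_000],
    'text_reviews_count': [9_000, 7_000, 300, 200],
})

rank_mean = toy['rank'].mean()
reviews_mean = toy['text_reviews_count'].mean()

# Which side of each mean line a book falls on
sales = (toy['rank'] < rank_mean).map({True: 'strong sales', False: 'weak sales'})
buzz = (toy['text_reviews_count'] > reviews_mean).map({True: 'many reviews', False: 'few reviews'})
toy['quadrant'] = buzz + ', ' + sales

print(toy[['title_gr', 'quadrant']])
```

Because the chart's x axis is reversed, the "strong sales" books sit to the right of the vertical rule. On data this skewed, the mean can land far from the typical book; swapping `.mean()` for `.median()` is a one-word change if the quadrants look lopsided.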