# Minimum Distortion Embedding on book data

I have collected different types of data on books, most recently Amazon sales rank, and explored direct plots of those variables in Sales data exploration. Now I want to use minimum distortion embedding, or MDE, to project those multi-dimensional variables into a 2D embedding space, and plot that.

### MDE

Doing the embedding requires a subset of the data that is purely numerical, so let's grab those columns and convert them to an `ndarray`. We'll use the same slice of the dataset as in the previous post, for consistency. We can scale it up to the full dataset later.

MDE also wants no NaN values, so we can use `.dropna()` to drop those. Only a fraction of the dataset has missing values, so hopefully dropping them won't affect the result too much.
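To check how much data `.dropna()` actually removes, a quick sketch like the following compares row counts before and after. The toy frame is made up for illustration; only the column names come from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the book data (values are invented for illustration).
toy_df = pd.DataFrame({
    'num_pages': [320, np.nan, 210, 450],
    'average_rating': [4.1, 3.9, np.nan, 4.4],
    'rank': [1200, 560, 18000, np.nan],
})

# Drop any row that has a NaN in one of the numeric columns.
clean_df = toy_df.dropna()

# Fraction of rows lost to missing values.
dropped_fraction = 1 - len(clean_df) / len(toy_df)

# Per-column NaN counts show which variable is responsible for the loss.
nan_counts = toy_df.isna().sum()
```

Running the same two checks on the real `sorted_corner_df` would show whether "only a fraction" holds, and which column is responsible for most of the dropped rows.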

```
import pandas as pd
import altair as alt
sorted_corner_df = pd.read_csv('../../records/sorted_corner_df.csv')
numbers_df = sorted_corner_df[['publication_year', 'num_pages', 'average_rating', 'text_reviews_count', 'rank']].dropna()
len(numbers_df)
```

```
numbers_array = numbers_df.to_numpy()
```

```
import pymde
%matplotlib inline
```

```
mde = pymde.preserve_neighbors(numbers_array, verbose=True)
embedding = mde.embed(verbose=True)
```

```
pymde.plot(embedding, color_by=numbers_df['rank'])
```

A weird shape! Finally!
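One thing that might contribute to the shape: as far as I understand it, `pymde.preserve_neighbors` finds each point's nearest neighbors from Euclidean distances between rows of the array, so a column with a large numeric range (like `rank`) can outweigh a column with a small one (like `average_rating`). I haven't tested whether that matters here. The sketch below uses invented toy values and two hypothetical helper functions, `standardize` and `nearest_to_first`, to show how rescaling each column can change which row counts as the nearest neighbor.

```python
import numpy as np

# Toy stand-in for numbers_array: columns are (average_rating, rank).
# Values are invented for illustration.
toy_array = np.array([
    [3.60, 100.0],
    [4.60, 150.0],
    [3.65, 900.0],
    [4.50, 20000.0],
])

def standardize(x):
    """Rescale each column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def nearest_to_first(x):
    """Index of the row closest (Euclidean) to row 0, excluding row 0."""
    distances = np.linalg.norm(x[1:] - x[0], axis=1)
    return int(np.argmin(distances)) + 1

scaled_array = standardize(toy_array)

# On raw values the rank column dominates; after scaling, rating counts too.
raw_neighbor = nearest_to_first(toy_array)
scaled_neighbor = nearest_to_first(scaled_array)
```

On the raw values, the first book's nearest neighbor is the one with the closest rank, even though its rating is a full point higher. After standardizing, the nearest neighbor is the book with nearly the same rating and a moderately different rank. Passing a standardized copy of `numbers_array` to `pymde.preserve_neighbors` would be one way to see whether the embedding changes.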

These data points should be the same as in the blue-purple graph from my last post (`average_rating` vs `rank`), but arranged by the constraints of each numerical feature. Let's compare the two.

The following chart uses a common `brush` selection for two graphs with different encoded axes. That means if you select points in one graph, you should see the same points light up in the other.

```
numbers_df[['mde_x', 'mde_y']] = embedding.tolist()
embed_df = numbers_df.merge(sorted_corner_df)
```
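A note on the merge: with no `on` argument, `merge` joins on every column the two frames share, which here means the five numeric columns. If two books happened to match on all five, the merge would produce extra rows. A minimal alternative sketch, assuming the rows keep their original index through `.dropna()` (the pandas default), joins on the index instead. The toy data and the stand-in embedding array are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for sorted_corner_df; the first and last books share
# identical numeric values, and the second has a missing rating.
toy_full = pd.DataFrame({
    'title_gr': ['Book A', 'Book B', 'Book C', 'Book D'],
    'average_rating': [4.0, np.nan, 4.2, 4.0],
    'rank': [100, 200, 300, 100],
})

toy_numbers = toy_full[['average_rating', 'rank']].dropna()

# Stand-in for the MDE output: one (x, y) pair per remaining row.
toy_embedding = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Build the coordinates with the same index as the rows they describe.
coords = pd.DataFrame(
    toy_embedding, columns=['mde_x', 'mde_y'], index=toy_numbers.index
)

# Join on the index: every surviving row gets exactly one coordinate pair.
toy_embed_df = toy_full.join(coords, how='inner')

# For comparison, the column-based merge duplicates the two matching books.
merged_on_columns = toy_numbers.join(coords).merge(toy_full)
```

In the toy data the index join keeps three rows, while the column-based merge returns five because Book A and Book D match each other. Comparing `len(embed_df)` with `len(numbers_df)` on the real data would show whether this happens there.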

```
# One interval selection shared by both charts.
brush = alt.selection_interval()

chart1 = alt.Chart(embed_df).mark_point(filled=True).encode(
    size=alt.Size(
        'ratings_count',
        scale=alt.Scale(type='log', range=[.075, 75]),
    ),
    # Selected points keep their color; everything else turns light gray.
    color=alt.condition(
        brush,
        alt.Color(
            'text_reviews_count',
            type='quantitative',
            scale=alt.Scale(type='log', scheme='bluepurple'),
        ),
        alt.value('lightgray'),
    ),
    opacity=alt.OpacityValue(0.6),
    tooltip=['title_gr', 'author_name', 'publication_year', 'category',
             'rank', 'text_reviews_count', 'average_rating', 'ratings_count'],
).properties(
    width=420,
    height=340,
).add_selection(
    brush
)

# Top: MDE embedding. Bottom: average_rating vs rank (rank axis reversed).
mde_chart = chart1.encode(x='mde_x', y='mde_y')
rank_chart = chart1.encode(
    x=alt.X('rank', scale=alt.Scale(domain=[-10, 23000], reverse=True)),
    y=alt.Y('average_rating', scale=alt.Scale(type='linear', domain=[3.5, 4.7])),
)
mde_chart & rank_chart
```