Overview

Dataset statistics

Number of variables: 16
Number of observations: 533227
Missing cells: 909687
Missing cells (%): 10.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 65.1 MiB
Average record size in memory: 128.0 B
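
The headline figures above can be cross-checked against each other. The sketch below is only an illustration, not part of the report: it assumes that the missing-cell percentage is taken over all cells (rows times columns) and that 1 MiB is 2**20 bytes. The variable names are made up for the example.

```python
# Values copied from the "Dataset statistics" table of this report.
n_variables = 16
n_observations = 533227
missing_cells = 909687
total_mib = 65.1

# Assumption: the percentage is relative to rows x columns.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under those assumptions both derived values round to the figures shown in the table.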

Variable types

Categorical: 10
Numeric: 6

Warnings

isbn has a high cardinality: 532698 distinct values High cardinality
asin has a high cardinality: 513 distinct values High cardinality
title_gr has a high cardinality: 510621 distinct values High cardinality
author_name has a high cardinality: 237836 distinct values High cardinality
top_genre has a high cardinality: 36431 distinct values High cardinality
publisher has a high cardinality: 45434 distinct values High cardinality
format has a high cardinality: 348 distinct values High cardinality
description has a high cardinality: 464101 distinct values High cardinality
title_az has a high cardinality: 519053 distinct values High cardinality
category has a high cardinality: 609 distinct values High cardinality
ratings_count is highly correlated with text_reviews_count High correlation
text_reviews_count is highly correlated with ratings_count High correlation
asin has 532702 (99.9%) missing values Missing
publisher has 70023 (13.1%) missing values Missing
publication_year has 66131 (12.4%) missing values Missing
format has 70276 (13.2%) missing values Missing
num_pages has 85148 (16.0%) missing values Missing
description has 65982 (12.4%) missing values Missing
category has 18598 (3.5%) missing values Missing
publication_year is highly skewed (γ1 = 302.0827798) Skewed
ratings_count is highly skewed (γ1 = 153.7407412) Skewed
text_reviews_count is highly skewed (γ1 = 75.53875794) Skewed
isbn is uniformly distributed Uniform
asin is uniformly distributed Uniform
title_gr is uniformly distributed Uniform
description is uniformly distributed Uniform
title_az is uniformly distributed Uniform

Reproduction

Analysis started: 2021-09-16 14:35:08.042264
Analysis finished: 2021-09-16 14:37:13.661574
Duration: 2 minutes and 5.62 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
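
As a small worked check, the reported duration follows from the two timestamps. This is a sketch using only the Python standard library; the format string is an assumption about how the timestamps are written in this report.

```python
from datetime import datetime

# Assumed timestamp layout, matching the two values printed above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-09-16 14:35:08.042264", FMT)
finished = datetime.strptime("2021-09-16 14:37:13.661574", FMT)

# Elapsed wall-clock time in seconds, split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```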

Variables

isbn
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 532698
Distinct (%): > 99.9%
Missing: 525
Missing (%): 0.1%
Memory size: 4.1 MiB

Value                   Count
B00005VPKU              2
B00005XHOV              2
B00005WKI6              2
B0000CIKV8              2
1605983683              1
Other values (532693)   532693

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 5327020
Distinct characters: 36
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
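
A minimal sketch of how such a per-category breakdown could be computed with Python's standard `unicodedata` module. This is not necessarily how pandas-profiling derives its tables; `category_counts` is a hypothetical helper, and the standard library exposes general categories (such as `Nd`, `Lu`) but not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Nd', 'Lu')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the isbn values listed in this report.
counts = category_counts("076421280X")
print(counts)
```

Here `Nd` corresponds to "Decimal Number" and `Lu` to "Uppercase Letter" in the tables of this report.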

Unique

Unique: 532694
Unique (%): > 99.9%

Sample

1st row: 0312853122
2nd row: 0850308712
3rd row: 0425040887
4th row: 1934876569
5th row: 0922915113

Common Values

Value                   Count    Frequency (%)
B00005VPKU              2        < 0.1%
B00005XHOV              2        < 0.1%
B00005WKI6              2        < 0.1%
B0000CIKV8              2        < 0.1%
1605983683              1        < 0.1%
076421280X              1        < 0.1%
0142422037              1        < 0.1%
1612620108              1        < 0.1%
0425247198              1        < 0.1%
0440120306              1        < 0.1%
Other values (532688)   532688   99.9%
(Missing)               525      0.1%
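
The isbn column mixes 10-character ISBNs (some ending in `X`) with values that look like Amazon identifiers (the `B0000...` entries). As a hedged illustration, the sketch below applies the standard ISBN-10 check-digit rule (weights 10 down to 1, sum divisible by 11, `X` standing for 10 in the last position) to tell the two apart. `is_valid_isbn10` is a hypothetical helper, not something the report or pandas-profiling provides.

```python
def is_valid_isbn10(code: str) -> bool:
    """Return True if `code` satisfies the ISBN-10 checksum rule."""
    if len(code) != 10:
        return False
    total = 0
    for position, ch in enumerate(code):
        if ch in "0123456789":
            value = int(ch)
        elif ch in "Xx" and position == 9:
            value = 10  # 'X' is only allowed as the check digit
        else:
            return False
        # Weights run from 10 (first character) down to 1 (check digit).
        total += (10 - position) * value
    return total % 11 == 0


for candidate in ["076421280X", "0312853122", "B00005VPKU"]:
    print(candidate, is_valid_isbn10(candidate))
```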

Length

Figure: Histogram of lengths of the category
Value                   Count    Frequency (%)
b00005wki6              2        < 0.1%
b0000cikv8              2        < 0.1%
b00005vpku              2        < 0.1%
b00005xhov              2        < 0.1%
0727871447              1        < 0.1%
0195367685              1        < 0.1%
0441009751              1        < 0.1%
1610487664              1        < 0.1%
1445493292              1        < 0.1%
0312363834              1        < 0.1%
Other values (532688)   532688   > 99.9%

Most occurring characters

Value               Count    Frequency (%)
0                   813818   15.3%
1                   696286   13.1%
4                   501516   9.4%
5                   501006   9.4%
3                   484621   9.1%
8                   464353   8.7%
9                   460786   8.6%
7                   455473   8.6%
6                   451466   8.5%
2                   450720   8.5%
Other values (26)   46975    0.9%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     5280045   99.1%
Uppercase Letter   46975     0.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
X46793
99.6%
B40
 
0.1%
V11
 
< 0.1%
W9
 
< 0.1%
K9
 
< 0.1%
U9
 
< 0.1%
I8
 
< 0.1%
D7
 
< 0.1%
O7
 
< 0.1%
E7
 
< 0.1%
Other values (16)75
 
0.2%
Decimal Number
ValueCountFrequency (%)
0813818
15.4%
1696286
13.2%
4501516
9.5%
5501006
9.5%
3484621
9.2%
8464353
8.8%
9460786
8.7%
7455473
8.6%
6451466
8.6%
2450720
8.5%

Most occurring scripts

ValueCountFrequency (%)
Common5280045
99.1%
Latin46975
 
0.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
X46793
99.6%
B40
 
0.1%
V11
 
< 0.1%
W9
 
< 0.1%
K9
 
< 0.1%
U9
 
< 0.1%
I8
 
< 0.1%
D7
 
< 0.1%
O7
 
< 0.1%
E7
 
< 0.1%
Other values (16)75
 
0.2%
Common
ValueCountFrequency (%)
0813818
15.4%
1696286
13.2%
4501516
9.5%
5501006
9.5%
3484621
9.2%
8464353
8.8%
9460786
8.7%
7455473
8.6%
6451466
8.6%
2450720
8.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII5327020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0813818
15.3%
1696286
13.1%
4501516
9.4%
5501006
9.4%
3484621
9.1%
8464353
8.7%
9460786
8.6%
7455473
8.6%
6451466
8.5%
2450720
8.5%
Other values (26)46975
 
0.9%

asin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 513
Distinct (%): 97.7%
Missing: 532702
Missing (%): 99.9%
Memory size: 4.1 MiB

Value                Count
B00005X8YJ           2
B0000E9M76           2
B0000E94BF           2
B0000CJ9AH           2
B00005WJ1K           2
Other values (508)   515

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 5250
Distinct characters: 36
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 501
Unique (%): 95.4%

Sample

1st row: B0006BUGVG
2nd row: 1492208213
3rd row: B003YL4KIA
4th row: B003FPYZCG
5th row: B0006EUM12

Common Values

Value                Count    Frequency (%)
B00005X8YJ           2        < 0.1%
B0000E9M76           2        < 0.1%
B0000E94BF           2        < 0.1%
B0000CJ9AH           2        < 0.1%
B00005WJ1K           2        < 0.1%
B0000CL59T           2        < 0.1%
B00005XWPY           2        < 0.1%
B0000CIN9Z           2        < 0.1%
B0000CJS4X           2        < 0.1%
B00005VQ4S           2        < 0.1%
Other values (503)   505      0.1%
(Missing)            532702   99.9%

Length

Figure: Histogram of lengths of the category
ValueCountFrequency (%)
b0000cjs4x2
 
0.4%
b0000cl59t2
 
0.4%
b0000cin9z2
 
0.4%
b0000e94bf2
 
0.4%
b0000e9m762
 
0.4%
b0000cp0hs2
 
0.4%
b00005xwpy2
 
0.4%
b00005wj1k2
 
0.4%
b00005vq4s2
 
0.4%
b0000cj9ah2
 
0.4%
Other values (503)505
96.2%

Most occurring characters

ValueCountFrequency (%)
01287
24.5%
B493
 
9.4%
6250
 
4.8%
1250
 
4.8%
4197
 
3.8%
5196
 
3.7%
7188
 
3.6%
9161
 
3.1%
2148
 
2.8%
8142
 
2.7%
Other values (26)1938
36.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2947
56.1%
Uppercase Letter2303
43.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B493
21.4%
Q95
 
4.1%
M93
 
4.0%
W93
 
4.0%
C90
 
3.9%
O90
 
3.9%
A86
 
3.7%
Y83
 
3.6%
G82
 
3.6%
E81
 
3.5%
Other values (16)1017
44.2%
Decimal Number
ValueCountFrequency (%)
01287
43.7%
6250
 
8.5%
1250
 
8.5%
4197
 
6.7%
5196
 
6.7%
7188
 
6.4%
9161
 
5.5%
2148
 
5.0%
8142
 
4.8%
3128
 
4.3%

Most occurring scripts

ValueCountFrequency (%)
Common2947
56.1%
Latin2303
43.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
B493
21.4%
Q95
 
4.1%
M93
 
4.0%
W93
 
4.0%
C90
 
3.9%
O90
 
3.9%
A86
 
3.7%
Y83
 
3.6%
G82
 
3.6%
E81
 
3.5%
Other values (16)1017
44.2%
Common
ValueCountFrequency (%)
01287
43.7%
6250
 
8.5%
1250
 
8.5%
4197
 
6.7%
5196
 
6.7%
7188
 
6.4%
9161
 
5.5%
2148
 
5.0%
8142
 
4.8%
3128
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII5250
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01287
24.5%
B493
 
9.4%
6250
 
4.8%
1250
 
4.8%
4197
 
3.8%
5196
 
3.7%
7188
 
3.6%
9161
 
3.1%
2148
 
2.8%
8142
 
2.7%
Other values (26)1938
36.9%

title_gr
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 510621
Distinct (%): 95.8%
Missing: 2
Missing (%): < 0.1%
Memory size: 4.1 MiB

Value                   Count
Selected Poems          100
Collected Poems         43
Beauty and the Beast    25
Redemption              24
Broken                  22
Other values (510616)   533011

Length

Max length: 255
Median length: 34
Mean length: 39.51431572
Min length: 1

Characters and Unicode

Total characters: 21070021
Distinct characters: 492
Distinct categories: 22
Distinct scripts: 14
Distinct blocks: 27
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 496891
Unique (%): 93.2%

Sample

1st row: W.C. Fields: A Life on Film
2nd row: Runic Astrology: Starcraft and Timekeeping in the Northern Tradition
3rd row: The Wanting of Levine
4th row: All's Fairy in Love and War (Avalon: Web of Magic, #8)
5th row: The Devil's Notebook

Common Values

Value                   Count    Frequency (%)
Selected Poems          100      < 0.1%
Collected Poems         43       < 0.1%
Beauty and the Beast    25       < 0.1%
Redemption              24       < 0.1%
Broken                  22       < 0.1%
Coming Home             21       < 0.1%
Beowulf                 21       < 0.1%
Cinderella              20       < 0.1%
Dracula                 19       < 0.1%
The Complete Poems      19       < 0.1%
Other values (510611)   532911   99.9%

Length

Figure: Histogram of lengths of the category
ValueCountFrequency (%)
the297676
 
8.5%
of145105
 
4.2%
and94442
 
2.7%
a88615
 
2.5%
to55990
 
1.6%
in45683
 
1.3%
for32884
 
0.9%
131546
 
0.9%
221809
 
0.6%
21730
 
0.6%
Other values (130155)2654037
76.1%

Most occurring characters

ValueCountFrequency (%)
2960017
 
14.0%
e1919904
 
9.1%
o1290532
 
6.1%
a1175451
 
5.6%
i1145070
 
5.4%
n1108567
 
5.3%
r1079983
 
5.1%
t1065000
 
5.1%
s880432
 
4.2%
h661411
 
3.1%
Other values (482)7783654
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14248584
67.6%
Space Separator2960030
 
14.0%
Uppercase Letter2818295
 
13.4%
Other Punctuation586621
 
2.8%
Decimal Number231766
 
1.1%
Open Punctuation87668
 
0.4%
Close Punctuation87399
 
0.4%
Dash Punctuation46420
 
0.2%
Final Punctuation1033
 
< 0.1%
Math Symbol994
 
< 0.1%
Other values (12)1211
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
ا23
 
5.0%
م14
 
3.1%
ر14
 
3.1%
ل13
 
2.8%
ی12
 
2.6%
ن11
 
2.4%
ت10
 
2.2%
9
 
2.0%
9
 
2.0%
و9
 
2.0%
Other values (184)333
72.9%
Lowercase Letter
ValueCountFrequency (%)
e1919904
13.5%
o1290532
 
9.1%
a1175451
 
8.2%
i1145070
 
8.0%
n1108567
 
7.8%
r1079983
 
7.6%
t1065000
 
7.5%
s880432
 
6.2%
h661411
 
4.6%
l645544
 
4.5%
Other values (116)3276690
23.0%
Uppercase Letter
ValueCountFrequency (%)
T324401
 
11.5%
S253693
 
9.0%
A209115
 
7.4%
C198412
 
7.0%
M173257
 
6.1%
B162170
 
5.8%
W140361
 
5.0%
P136512
 
4.8%
D131113
 
4.7%
H125695
 
4.5%
Other values (50)963566
34.2%
Other Punctuation
ValueCountFrequency (%)
:209873
35.8%
,154857
26.4%
#78889
 
13.4%
'67900
 
11.6%
.33627
 
5.7%
&12729
 
2.2%
!11879
 
2.0%
?6485
 
1.1%
/4870
 
0.8%
"2578
 
0.4%
Other values (13)2934
 
0.5%
Nonspacing Mark
ValueCountFrequency (%)
́18
52.9%
3
 
8.8%
ָ2
 
5.9%
̄2
 
5.9%
̌2
 
5.9%
ֿ1
 
2.9%
̉1
 
2.9%
̈1
 
2.9%
̃1
 
2.9%
1
 
2.9%
Other values (2)2
 
5.9%
Decimal Number
ValueCountFrequency (%)
169063
29.8%
237482
16.2%
027898
12.0%
323777
 
10.3%
516191
 
7.0%
415681
 
6.8%
913412
 
5.8%
610523
 
4.5%
89204
 
4.0%
78535
 
3.7%
Math Symbol
ValueCountFrequency (%)
+745
74.9%
=140
 
14.1%
~60
 
6.0%
>17
 
1.7%
|16
 
1.6%
<10
 
1.0%
3
 
0.3%
1
 
0.1%
÷1
 
0.1%
±1
 
0.1%
Other Symbol
ValueCountFrequency (%)
®265
74.0%
40
 
11.2%
29
 
8.1%
14
 
3.9%
©5
 
1.4%
3
 
0.8%
°1
 
0.3%
1
 
0.3%
Dash Punctuation
ValueCountFrequency (%)
-45737
98.5%
451
 
1.0%
225
 
0.5%
5
 
< 0.1%
1
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$186
94.4%
¥3
 
1.5%
3
 
1.5%
¤2
 
1.0%
¢2
 
1.0%
£1
 
0.5%
Other Number
ValueCountFrequency (%)
½12
37.5%
³8
25.0%
²6
18.8%
¹5
15.6%
¼1
 
3.1%
Modifier Letter
ValueCountFrequency (%)
ʻ7
36.8%
ʼ5
26.3%
3
15.8%
ʿ3
15.8%
ʾ1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
(86749
99.0%
[911
 
1.0%
{7
 
< 0.1%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
)86483
99.0%
]906
 
1.0%
}9
 
< 0.1%
1
 
< 0.1%
Format
ValueCountFrequency (%)
3
37.5%
­3
37.5%
1
 
12.5%
1
 
12.5%
Spacing Mark
ValueCountFrequency (%)
3
42.9%
2
28.6%
ி1
 
14.3%
1
 
14.3%
Modifier Symbol
ValueCountFrequency (%)
`15
83.3%
´2
 
11.1%
^1
 
5.6%
Space Separator
ValueCountFrequency (%)
2960017
> 99.9%
 13
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
1016
98.4%
17
 
1.6%
Initial Punctuation
ValueCountFrequency (%)
19
55.9%
15
44.1%
Connector Punctuation
ValueCountFrequency (%)
_44
100.0%
Control
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin17066754
81.0%
Common4002644
 
19.0%
Arabic184
 
< 0.1%
Han116
 
< 0.1%
Cyrillic113
 
< 0.1%
Hiragana41
 
< 0.1%
Katakana36
 
< 0.1%
Hebrew35
 
< 0.1%
Hangul28
 
< 0.1%
Inherited26
 
< 0.1%
Other values (4)44
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1919904
 
11.2%
o1290532
 
7.6%
a1175451
 
6.9%
i1145070
 
6.7%
n1108567
 
6.5%
r1079983
 
6.3%
t1065000
 
6.2%
s880432
 
5.2%
h661411
 
3.9%
l645544
 
3.8%
Other values (133)6094860
35.7%
Common
ValueCountFrequency (%)
2960017
74.0%
:209873
 
5.2%
,154857
 
3.9%
(86749
 
2.2%
)86483
 
2.2%
#78889
 
2.0%
169063
 
1.7%
'67900
 
1.7%
-45737
 
1.1%
237482
 
0.9%
Other values (86)205594
 
5.1%
Han
ValueCountFrequency (%)
9
 
7.8%
9
 
7.8%
5
 
4.3%
4
 
3.4%
4
 
3.4%
4
 
3.4%
3
 
2.6%
2
 
1.7%
2
 
1.7%
2
 
1.7%
Other values (68)72
62.1%
Cyrillic
ValueCountFrequency (%)
о14
 
12.4%
е12
 
10.6%
а11
 
9.7%
к8
 
7.1%
с7
 
6.2%
н6
 
5.3%
у5
 
4.4%
р4
 
3.5%
т4
 
3.5%
в3
 
2.7%
Other values (23)39
34.5%
Arabic
ValueCountFrequency (%)
ا23
 
12.5%
م14
 
7.6%
ر14
 
7.6%
ل13
 
7.1%
ی12
 
6.5%
ن11
 
6.0%
ت10
 
5.4%
و9
 
4.9%
ه8
 
4.3%
س6
 
3.3%
Other values (21)64
34.8%
Hiragana
ValueCountFrequency (%)
5
 
12.2%
3
 
7.3%
3
 
7.3%
3
 
7.3%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
Other values (13)15
36.6%
Hebrew
ValueCountFrequency (%)
י4
 
11.4%
ת4
 
11.4%
ל3
 
8.6%
א2
 
5.7%
ָ2
 
5.7%
ש2
 
5.7%
ב2
 
5.7%
ד2
 
5.7%
ו2
 
5.7%
ז1
 
2.9%
Other values (11)11
31.4%
Katakana
ValueCountFrequency (%)
5
13.9%
3
 
8.3%
3
 
8.3%
3
 
8.3%
2
 
5.6%
2
 
5.6%
2
 
5.6%
2
 
5.6%
2
 
5.6%
2
 
5.6%
Other values (10)10
27.8%
Greek
ValueCountFrequency (%)
π3
21.4%
Ι2
14.3%
Ο1
 
7.1%
ρ1
 
7.1%
έ1
 
7.1%
σ1
 
7.1%
τ1
 
7.1%
ε1
 
7.1%
ι1
 
7.1%
α1
 
7.1%
Tamil
ValueCountFrequency (%)
3
17.6%
3
17.6%
3
17.6%
2
11.8%
1
 
5.9%
1
 
5.9%
ி1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Hangul
ValueCountFrequency (%)
5
17.9%
5
17.9%
5
17.9%
5
17.9%
5
17.9%
1
 
3.6%
1
 
3.6%
1
 
3.6%
Inherited
ValueCountFrequency (%)
́18
69.2%
̄2
 
7.7%
̌2
 
7.7%
̉1
 
3.8%
̈1
 
3.8%
̃1
 
3.8%
̂1
 
3.8%
Devanagari
ValueCountFrequency (%)
2
28.6%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Lao
ValueCountFrequency (%)
2
33.3%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII21064315
> 99.9%
Latin 1 Sup2927
 
< 0.1%
Punctuation1858
 
< 0.1%
Arabic180
 
< 0.1%
Latin Ext A150
 
< 0.1%
CJK116
 
< 0.1%
Cyrillic113
 
< 0.1%
Hiragana41
 
< 0.1%
Letterlike Symbols40
 
< 0.1%
Katakana39
 
< 0.1%
Other values (17)242
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2960017
 
14.1%
e1919904
 
9.1%
o1290532
 
6.1%
a1175451
 
5.6%
i1145070
 
5.4%
n1108567
 
5.3%
r1079983
 
5.1%
t1065000
 
5.1%
s880432
 
4.2%
h661411
 
3.1%
Other values (86)7777948
36.9%
Latin 1 Sup
ValueCountFrequency (%)
é812
27.7%
®265
 
9.1%
á252
 
8.6%
í237
 
8.1%
ó205
 
7.0%
ñ154
 
5.3%
è135
 
4.6%
ö93
 
3.2%
ü81
 
2.8%
ë69
 
2.4%
Other values (55)624
21.3%
Punctuation
ValueCountFrequency (%)
1016
54.7%
451
24.3%
225
 
12.1%
51
 
2.7%
29
 
1.6%
19
 
1.0%
17
 
0.9%
15
 
0.8%
13
 
0.7%
10
 
0.5%
Other values (6)12
 
0.6%
CJK
ValueCountFrequency (%)
9
 
7.8%
9
 
7.8%
5
 
4.3%
4
 
3.4%
4
 
3.4%
4
 
3.4%
3
 
2.6%
2
 
1.7%
2
 
1.7%
2
 
1.7%
Other values (68)72
62.1%
Latin Ext A
ValueCountFrequency (%)
ō28
18.7%
ā27
18.0%
Ō19
12.7%
ū18
12.0%
ī11
 
7.3%
œ7
 
4.7%
ć6
 
4.0%
ž5
 
3.3%
ă5
 
3.3%
Ž3
 
2.0%
Other values (16)21
14.0%
Diacriticals
ValueCountFrequency (%)
́18
69.2%
̄2
 
7.7%
̌2
 
7.7%
̉1
 
3.8%
̈1
 
3.8%
̃1
 
3.8%
̂1
 
3.8%
Letterlike Symbols
ValueCountFrequency (%)
40
100.0%
Arabic
ValueCountFrequency (%)
ا23
 
12.8%
م14
 
7.8%
ر14
 
7.8%
ل13
 
7.2%
ی12
 
6.7%
ن11
 
6.1%
ت10
 
5.6%
و9
 
5.0%
ه8
 
4.4%
س6
 
3.3%
Other values (20)60
33.3%
None
ValueCountFrequency (%)
π3
17.6%
Ι2
11.8%
Ο1
 
5.9%
ρ1
 
5.9%
έ1
 
5.9%
σ1
 
5.9%
τ1
 
5.9%
ε1
 
5.9%
ι1
 
5.9%
α1
 
5.9%
Other values (4)4
23.5%
Specials
ValueCountFrequency (%)
29
100.0%
Katakana
ValueCountFrequency (%)
5
12.8%
3
 
7.7%
3
 
7.7%
3
 
7.7%
3
 
7.7%
2
 
5.1%
2
 
5.1%
2
 
5.1%
2
 
5.1%
2
 
5.1%
Other values (11)12
30.8%
Hiragana
ValueCountFrequency (%)
5
 
12.2%
3
 
7.3%
3
 
7.3%
3
 
7.3%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
2
 
4.9%
Other values (13)15
36.6%
Cyrillic
ValueCountFrequency (%)
о14
 
12.4%
е12
 
10.6%
а11
 
9.7%
к8
 
7.1%
с7
 
6.2%
н6
 
5.3%
у5
 
4.4%
р4
 
3.5%
т4
 
3.5%
в3
 
2.7%
Other values (23)39
34.5%
Arrows
ValueCountFrequency (%)
3
100.0%
Misc Symbols
ValueCountFrequency (%)
14
77.8%
3
 
16.7%
1
 
5.6%
Hebrew
ValueCountFrequency (%)
י4
 
11.4%
ת4
 
11.4%
ל3
 
8.6%
א2
 
5.7%
ָ2
 
5.7%
ש2
 
5.7%
ב2
 
5.7%
ד2
 
5.7%
ו2
 
5.7%
ז1
 
2.9%
Other values (11)11
31.4%
Latin Ext B
ValueCountFrequency (%)
ǐ4
50.0%
ǒ2
25.0%
ǰ1
 
12.5%
ǎ1
 
12.5%
Latin Ext Additional
ValueCountFrequency (%)
3
13.6%
3
13.6%
2
 
9.1%
2
 
9.1%
2
 
9.1%
1
 
4.5%
1
 
4.5%
1
 
4.5%
ế1
 
4.5%
1
 
4.5%
Other values (5)5
22.7%
Currency Symbols
ValueCountFrequency (%)
3
100.0%
Modifier Letters
ValueCountFrequency (%)
ʻ7
43.8%
ʼ5
31.2%
ʿ3
18.8%
ʾ1
 
6.2%
Hangul
ValueCountFrequency (%)
5
17.9%
5
17.9%
5
17.9%
5
17.9%
5
17.9%
1
 
3.6%
1
 
3.6%
1
 
3.6%
Arabic PF A
ValueCountFrequency (%)
4
100.0%
Devanagari
ValueCountFrequency (%)
2
28.6%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Math Operators
ValueCountFrequency (%)
1
100.0%
IPA Ext
ValueCountFrequency (%)
ə2
100.0%
Tamil
ValueCountFrequency (%)
3
17.6%
3
17.6%
3
17.6%
2
11.8%
1
 
5.9%
1
 
5.9%
ி1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Lao
ValueCountFrequency (%)
2
33.3%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

author_name
Categorical

HIGH CARDINALITY

Distinct: 237836
Distinct (%): 44.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value                   Count
Anonymous               712
Francine Pascal         453
R.L. Stine              383
Walt Disney Company     366
Ann M. Martin           239
Other values (237831)   531074

Length

Max length: 100
Median length: 13
Mean length: 13.9286739
Min length: 1

Characters and Unicode

Total characters: 7427145
Distinct characters: 88
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 159646
Unique (%): 29.9%

Sample

1st row: Ronald J. Fields
2nd row: Nigel Pennick
3rd row: Michael Halberstam
4th row: Rachel Roberts
5th row: Anton Szandor LaVey

Common Values

Value                   Count    Frequency (%)
Anonymous               712      0.1%
Francine Pascal         453      0.1%
R.L. Stine              383      0.1%
Walt Disney Company     366      0.1%
Ann M. Martin           239      < 0.1%
Nora Roberts            227      < 0.1%
Carolyn Keene           220      < 0.1%
Brian Michael Bendis    219      < 0.1%
Stan Lee                215      < 0.1%
Jane Yolen              205      < 0.1%
Other values (237826)   529988   99.4%

Length

Figure: Histogram of lengths of the category
ValueCountFrequency (%)
john10636
 
0.9%
david9844
 
0.8%
michael7800
 
0.7%
j7216
 
0.6%
james7047
 
0.6%
robert6936
 
0.6%
m5900
 
0.5%
a5874
 
0.5%
l5041
 
0.4%
richard4613
 
0.4%
Other values (99691)1111763
94.0%

Most occurring characters

ValueCountFrequency (%)
e681394
 
9.2%
676111
 
9.1%
a651495
 
8.8%
n518704
 
7.0%
r493385
 
6.6%
i438978
 
5.9%
o379053
 
5.1%
l360059
 
4.8%
t261216
 
3.5%
s258050
 
3.5%
Other values (78)2708700
36.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5409485
72.8%
Uppercase Letter1226516
 
16.5%
Space Separator676111
 
9.1%
Other Punctuation106426
 
1.4%
Dash Punctuation7861
 
0.1%
Decimal Number328
 
< 0.1%
Open Punctuation169
 
< 0.1%
Close Punctuation168
 
< 0.1%
Modifier Symbol61
 
< 0.1%
Math Symbol10
 
< 0.1%
Other values (3)10
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M113626
 
9.3%
S102732
 
8.4%
J95082
 
7.8%
C87606
 
7.1%
B80787
 
6.6%
A73254
 
6.0%
R70925
 
5.8%
D70795
 
5.8%
L69189
 
5.6%
K56466
 
4.6%
Other values (16)406054
33.1%
Lowercase Letter
ValueCountFrequency (%)
e681394
12.6%
a651495
12.0%
n518704
9.6%
r493385
9.1%
i438978
 
8.1%
o379053
 
7.0%
l360059
 
6.7%
t261216
 
4.8%
s258050
 
4.8%
h208125
 
3.8%
Other values (16)1159026
21.4%
Other Punctuation
ValueCountFrequency (%)
.101294
95.2%
'4062
 
3.8%
"339
 
0.3%
,306
 
0.3%
&245
 
0.2%
/39
 
< 0.1%
:36
 
< 0.1%
*33
 
< 0.1%
!28
 
< 0.1%
;27
 
< 0.1%
Other values (2)17
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
072
22.0%
759
18.0%
345
13.7%
537
11.3%
134
10.4%
227
 
8.2%
417
 
5.2%
914
 
4.3%
813
 
4.0%
610
 
3.0%
Math Symbol
ValueCountFrequency (%)
~7
70.0%
+2
 
20.0%
>1
 
10.0%
Open Punctuation
ValueCountFrequency (%)
(160
94.7%
[9
 
5.3%
Close Punctuation
ValueCountFrequency (%)
)159
94.6%
]9
 
5.4%
Modifier Symbol
ValueCountFrequency (%)
`59
96.7%
^2
 
3.3%
Space Separator
ValueCountFrequency (%)
676111
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7861
100.0%
Currency Symbol
ValueCountFrequency (%)
$4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_5
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6636001
89.3%
Common791144
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e681394
 
10.3%
a651495
 
9.8%
n518704
 
7.8%
r493385
 
7.4%
i438978
 
6.6%
o379053
 
5.7%
l360059
 
5.4%
t261216
 
3.9%
s258050
 
3.9%
h208125
 
3.1%
Other values (42)2385542
35.9%
Common
ValueCountFrequency (%)
676111
85.5%
.101294
 
12.8%
-7861
 
1.0%
'4062
 
0.5%
"339
 
< 0.1%
,306
 
< 0.1%
&245
 
< 0.1%
(160
 
< 0.1%
)159
 
< 0.1%
072
 
< 0.1%
Other values (26)535
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII7427145
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e681394
 
9.2%
676111
 
9.1%
a651495
 
8.8%
n518704
 
7.0%
r493385
 
6.6%
i438978
 
5.9%
o379053
 
5.1%
l360059
 
4.8%
t261216
 
3.5%
s258050
 
3.5%
Other values (78)2708700
36.5%

top_genre
Categorical

HIGH CARDINALITY

Distinct: 36431
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

Value                  Count
fiction                23938
history                20885
mystery                18781
picture-books          17930
fantasy                17177
Other values (36426)   434516

Length

Max length: 35
Median length: 8
Mean length: 9.304279416
Min length: 1

Characters and Unicode

Total characters: 4961293
Distinct characters: 260
Distinct categories: 6
Distinct scripts: 10
Distinct blocks: 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24846
Unique (%): 4.7%

Sample

1st row: p
2nd row: runes
3rd row: read-gave-away
4th row: fantasy
5th row: occult

Common Values

Value                  Count    Frequency (%)
fiction                23938    4.5%
history                20885    3.9%
mystery                18781    3.5%
picture-books          17930    3.4%
fantasy                17177    3.2%
romance                16881    3.2%
historical-fiction     10529    2.0%
comics                 10393    1.9%
poetry                 9784     1.8%
young-adult            9497     1.8%
Other values (36421)   377432   70.8%

Length

Figure: Histogram of lengths of the category
ValueCountFrequency (%)
fiction23941
 
4.5%
history20885
 
3.9%
mystery18781
 
3.5%
picture-books17930
 
3.4%
fantasy17177
 
3.2%
romance16881
 
3.2%
historical-fiction10529
 
2.0%
comics10393
 
1.9%
poetry9784
 
1.8%
young-adult9497
 
1.8%
Other values (36365)377429
70.8%

Most occurring characters

ValueCountFrequency (%)
i458490
 
9.2%
o445678
 
9.0%
r368082
 
7.4%
e361096
 
7.3%
s349758
 
7.0%
t339992
 
6.9%
a334538
 
6.7%
n294160
 
5.9%
c275388
 
5.6%
-217399
 
4.4%
Other values (250)1516712
30.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4643035
93.6%
Dash Punctuation217399
 
4.4%
Uppercase Letter67556
 
1.4%
Decimal Number30683
 
0.6%
Other Letter1792
 
< 0.1%
Connector Punctuation828
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i458490
 
9.9%
o445678
 
9.6%
r368082
 
7.9%
e361096
 
7.8%
s349758
 
7.5%
t339992
 
7.3%
a334538
 
7.2%
n294160
 
6.3%
c275388
 
5.9%
l177287
 
3.8%
Other values (113)1238566
26.7%
Other Letter
ValueCountFrequency (%)
ا239
 
13.3%
ي168
 
9.4%
ر118
 
6.6%
ت117
 
6.5%
م107
 
6.0%
ل105
 
5.9%
و103
 
5.7%
ن79
 
4.4%
ب72
 
4.0%
ة67
 
3.7%
Other values (88)617
34.4%
Uppercase Letter
ValueCountFrequency (%)
N22507
33.3%
O9372
13.9%
F9372
13.9%
I9372
13.9%
C9372
13.9%
U3763
 
5.6%
K3763
 
5.6%
Б8
 
< 0.1%
Ø4
 
< 0.1%
Е2
 
< 0.1%
Other values (17)21
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
08305
27.1%
17532
24.5%
26697
21.8%
72107
 
6.9%
61307
 
4.3%
51239
 
4.0%
41136
 
3.7%
31070
 
3.5%
9691
 
2.3%
8599
 
2.0%
Dash Punctuation
ValueCountFrequency (%)
-217399
100.0%
Connector Punctuation
ValueCountFrequency (%)
_828
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4710210
94.9%
Common248910
 
5.0%
Arabic1692
 
< 0.1%
Cyrillic275
 
< 0.1%
Greek99
 
< 0.1%
Han78
 
< 0.1%
Hangul14
 
< 0.1%
Armenian8
 
< 0.1%
Hebrew5
 
< 0.1%
Katakana2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i458490
 
9.7%
o445678
 
9.5%
r368082
 
7.8%
e361096
 
7.7%
s349758
 
7.4%
t339992
 
7.2%
a334538
 
7.1%
n294160
 
6.2%
c275388
 
5.8%
l177287
 
3.8%
Other values (69)1305741
27.7%
Arabic
ValueCountFrequency (%)
ا239
14.1%
ي168
 
9.9%
ر118
 
7.0%
ت117
 
6.9%
م107
 
6.3%
ل105
 
6.2%
و103
 
6.1%
ن79
 
4.7%
ب72
 
4.3%
ة67
 
4.0%
Other values (29)517
30.6%
Cyrillic
ValueCountFrequency (%)
а35
12.7%
и28
 
10.2%
о21
 
7.6%
т19
 
6.9%
е19
 
6.9%
к18
 
6.5%
р14
 
5.1%
л14
 
5.1%
в10
 
3.6%
н10
 
3.6%
Other values (28)87
31.6%
Han
ValueCountFrequency (%)
6
 
7.7%
5
 
6.4%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
Other values (27)35
44.9%
Greek
ValueCountFrequency (%)
α12
 
12.1%
ο9
 
9.1%
ι8
 
8.1%
τ7
 
7.1%
κ7
 
7.1%
ά6
 
6.1%
ν5
 
5.1%
λ5
 
5.1%
π4
 
4.0%
ρ4
 
4.0%
Other values (19)32
32.3%
Hangul
ValueCountFrequency (%)
1
 
7.1%
1
 
7.1%
1
 
7.1%
1
 
7.1%
굿1
 
7.1%
1
 
7.1%
1
 
7.1%
1
 
7.1%
1
 
7.1%
1
 
7.1%
Other values (4)4
28.6%
Common
ValueCountFrequency (%)
-217399
87.3%
08305
 
3.3%
17532
 
3.0%
26697
 
2.7%
72107
 
0.8%
61307
 
0.5%
51239
 
0.5%
41136
 
0.5%
31070
 
0.4%
_828
 
0.3%
Other values (2)1290
 
0.5%
Hebrew
ValueCountFrequency (%)
ע1
20.0%
ב1
20.0%
ר1
20.0%
י1
20.0%
ת1
20.0%
Armenian
ValueCountFrequency (%)
ա3
37.5%
կ2
25.0%
հ1
 
12.5%
յ1
 
12.5%
ն1
 
12.5%
Katakana
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4958636
99.9%
Arabic1691
 
< 0.1%
Latin 1 Sup439
 
< 0.1%
Cyrillic275
 
< 0.1%
None98
 
< 0.1%
CJK78
 
< 0.1%
Latin Ext A40
 
< 0.1%
Hangul13
 
< 0.1%
Armenian8
 
< 0.1%
Hebrew5
 
< 0.1%
Other values (6)10
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i458490
 
9.2%
o445678
 
9.0%
r368082
 
7.4%
e361096
 
7.3%
s349758
 
7.1%
t339992
 
6.9%
a334538
 
6.7%
n294160
 
5.9%
c275388
 
5.6%
-217399
 
4.4%
Other values (35)1514055
30.5%
Latin 1 Sup
ValueCountFrequency (%)
á97
22.1%
é87
19.8%
ó53
12.1%
í27
 
6.2%
ç26
 
5.9%
ñ21
 
4.8%
è20
 
4.6%
ü15
 
3.4%
ö13
 
3.0%
ã12
 
2.7%
Other values (18)68
15.5%
Arabic
ValueCountFrequency (%)
ا239
14.1%
ي168
 
9.9%
ر118
 
7.0%
ت117
 
6.9%
م107
 
6.3%
ل105
 
6.2%
و103
 
6.1%
ن79
 
4.7%
ب72
 
4.3%
ة67
 
4.0%
Other values (28)516
30.5%
Latin Ext A
ValueCountFrequency (%)
ı13
32.5%
ė6
15.0%
ă4
 
10.0%
ğ3
 
7.5%
ż2
 
5.0%
ā2
 
5.0%
ł2
 
5.0%
ą1
 
2.5%
ş1
 
2.5%
Š1
 
2.5%
Other values (5)5
 
12.5%
Hebrew
ValueCountFrequency (%)
ע1
20.0%
ב1
20.0%
ר1
20.0%
י1
20.0%
ת1
20.0%
CJK
ValueCountFrequency (%)
6
 
7.7%
5
 
6.4%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
4
 
5.1%
Other values (27)35
44.9%
Cyrillic
ValueCountFrequency (%)
а35
12.7%
и28
 
10.2%
о21
 
7.6%
т19
 
6.9%
е19
 
6.9%
к18
 
6.5%
р14
 
5.1%
л14
 
5.1%
в10
 
3.6%
н10
 
3.6%
Other values (28)87
31.6%
Katakana
ValueCountFrequency (%)
1
50.0%
1
50.0%
None
ValueCountFrequency (%)
α12
 
12.2%
ο9
 
9.2%
ι8
 
8.2%
τ7
 
7.1%
κ7
 
7.1%
ά6
 
6.1%
ν5
 
5.1%
λ5
 
5.1%
π4
 
4.1%
ρ4
 
4.1%
Other values (18)31
31.6%
Arabic PF A
ValueCountFrequency (%)
1
100.0%
Armenian
ValueCountFrequency (%)
ա3
37.5%
կ2
25.0%
հ1
 
12.5%
յ1
 
12.5%
ն1
 
12.5%
Latin Ext B
ValueCountFrequency (%)
ț2
50.0%
ƒ2
50.0%
Hangul
ValueCountFrequency (%)
1
 
7.7%
1
 
7.7%
1
 
7.7%
굿1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%
1
 
7.7%
Other values (3)3
23.1%
Greek Ext
ValueCountFrequency (%)
1
100.0%
Compat Jamo
ValueCountFrequency (%)
1
100.0%
Latin Ext Additional
ValueCountFrequency (%)
1
100.0%

publisher
Categorical

HIGH CARDINALITY
MISSING

Distinct: 45434
Distinct (%): 9.8%
Missing: 70023
Missing (%): 13.1%
Memory size: 4.1 MiB

Value                                         Count
Createspace Independent Publishing Platform   10793
Createspace                                   5052
Harlequin                                     4872
Oxford University Press, USA                  3367
Berkley                                       2863
Other values (45429)                          436257

Length

Max length: 100
Median length: 15
Mean length: 16.82094282
Min length: 1

Characters and Unicode

Total characters: 7791528
Distinct characters: 91
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 29515
Unique (%): 6.4%

Sample

1st row: St. Martin's Press
2nd row: Berkley Publishing Group
3rd row: Seven Seas
4th row: Feral House
5th row: Simon & Schuster UK

Common Values

Value                                         Count    Frequency (%)
Createspace Independent Publishing Platform   10793    2.0%
Createspace                                   5052     0.9%
Harlequin                                     4872     0.9%
Oxford University Press, USA                  3367     0.6%
Berkley                                       2863     0.5%
Marvel                                        2834     0.5%
Penguin Books                                 2723     0.5%
CreateSpace                                   2575     0.5%
HarperCollins                                 2546     0.5%
St. Martin's Press                            2322     0.4%
Other values (45424)                          423257   79.4%
(Missing)                                     70023    13.1%

Length

Figure: Histogram of lengths of the category
ValueCountFrequency (%)
press83600
 
7.6%
books78366
 
7.1%
publishing52151
 
4.7%
university22599
 
2.1%
createspace20658
 
1.9%
16728
 
1.5%
independent12944
 
1.2%
platform12790
 
1.2%
house12362
 
1.1%
publishers11828
 
1.1%
Other values (27577)776858
70.6%

Most occurring characters

ValueCountFrequency (%)
e674363
 
8.7%
641739
 
8.2%
s589271
 
7.6%
o534297
 
6.9%
r517577
 
6.6%
i486629
 
6.2%
n467126
 
6.0%
a438320
 
5.6%
l309452
 
4.0%
t300085
 
3.9%
Other values (81)2832669
36.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5874479
75.4%
Uppercase Letter1169779
 
15.0%
Space Separator641739
 
8.2%
Other Punctuation77892
 
1.0%
Open Punctuation8593
 
0.1%
Close Punctuation8558
 
0.1%
Dash Punctuation6225
 
0.1%
Decimal Number3874
 
< 0.1%
Math Symbol366
 
< 0.1%
Connector Punctuation20
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P223828
19.1%
B132029
11.3%
C110944
 
9.5%
S85629
 
7.3%
H72363
 
6.2%
M56149
 
4.8%
A51068
 
4.4%
L44161
 
3.8%
I38911
 
3.3%
R38458
 
3.3%
Other values (16)316239
27.0%
Lowercase Letter
ValueCountFrequency (%)
e674363
11.5%
s589271
10.0%
o534297
 
9.1%
r517577
 
8.8%
i486629
 
8.3%
n467126
 
8.0%
a438320
 
7.5%
l309452
 
5.3%
t300085
 
5.1%
u209971
 
3.6%
Other values (16)1347388
22.9%
Other Punctuation
ValueCountFrequency (%)
.28630
36.8%
&16168
20.8%
,15277
19.6%
'12099
15.5%
/4500
 
5.8%
!529
 
0.7%
:345
 
0.4%
;122
 
0.2%
\81
 
0.1%
"72
 
0.1%
Other values (5)69
 
0.1%
Decimal Number
Value  Count  Frequency (%)
2      654    16.9%
1      601    15.5%
0      533    13.8%
4      488    12.6%
7      397    10.2%
3      339    8.8%
5      266    6.9%
9      256    6.6%
8      221    5.7%
6      119    3.1%
Math Symbol
ValueCountFrequency (%)
+336
91.8%
|17
 
4.6%
~6
 
1.6%
=3
 
0.8%
<2
 
0.5%
>2
 
0.5%
Open Punctuation
ValueCountFrequency (%)
(8588
99.9%
[5
 
0.1%
Close Punctuation
ValueCountFrequency (%)
)8552
99.9%
]6
 
0.1%
Space Separator
ValueCountFrequency (%)
641739
100.0%
Dash Punctuation
ValueCountFrequency (%)
-6225
100.0%
Connector Punctuation
ValueCountFrequency (%)
_20
100.0%
Modifier Symbol
ValueCountFrequency (%)
`3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7044258
90.4%
Common747270
 
9.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e674363
 
9.6%
s589271
 
8.4%
o534297
 
7.6%
r517577
 
7.3%
i486629
 
6.9%
n467126
 
6.6%
a438320
 
6.2%
l309452
 
4.4%
t300085
 
4.3%
P223828
 
3.2%
Other values (42)2503310
35.5%
Common
ValueCountFrequency (%)
641739
85.9%
.28630
 
3.8%
&16168
 
2.2%
,15277
 
2.0%
'12099
 
1.6%
(8588
 
1.1%
)8552
 
1.1%
-6225
 
0.8%
/4500
 
0.6%
2654
 
0.1%
Other values (29)4838
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII7791528
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e674363
 
8.7%
641739
 
8.2%
s589271
 
7.6%
o534297
 
6.9%
r517577
 
6.6%
i486629
 
6.2%
n467126
 
6.0%
a438320
 
5.6%
l309452
 
4.0%
t300085
 
3.9%
Other values (81)2832669
36.4%

publication_year
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 173
Distinct (%): < 0.1%
Missing: 66131
Missing (%): 12.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.325558
Minimum: 2
Maximum: 65535
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 2
5-th percentile: 1989
Q1: 2003
median: 2010
Q3: 2013
95-th percentile: 2016
Maximum: 65535
Range: 65533
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 134.2015492
Coefficient of variation (CV): 0.0668558962
Kurtosis: 119754.9375
Mean: 2007.325558
Median Absolute Deviation (MAD): 4
Skewness: 302.0827798
Sum: 937613739
Variance: 18010.0558
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
2013                36107   6.8%
2014                34603   6.5%
2012                33681   6.3%
2015                30774   5.8%
2011                28820   5.4%
2016                27909   5.2%
2010                24676   4.6%
2009                22862   4.3%
2008                21028   3.9%
2007                19680   3.7%
Other values (163)  186956  35.1%
(Missing)           66131   12.4%

Smallest values

Value  Count  Frequency (%)
2      1      < 0.1%
4      1      < 0.1%
9      1      < 0.1%
11     3      < 0.1%
12     2      < 0.1%
13     3      < 0.1%
14     6      < 0.1%
15     3      < 0.1%
16     7      < 0.1%
17     1      < 0.1%

Largest values

Value  Count  Frequency (%)
65535  1      < 0.1%
32014  1      < 0.1%
21012  1      < 0.1%
21011  1      < 0.1%
20187  1      < 0.1%
20158  1      < 0.1%
20156  1      < 0.1%
20136  1      < 0.1%
20114  1      < 0.1%
20013  1      < 0.1%
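
The extreme values of publication_year (a minimum of 2, a maximum of 65535) explain the very large skewness, and most of them cannot be real publication years. A minimal sketch of separating plausible from suspect years follows. The bounds `MIN_YEAR` and `MAX_YEAR` and the helper `split_years` are assumptions made for illustration; the report itself defines no validity range.

```python
# Hypothetical bounds: the report does not say what counts as a valid year.
MIN_YEAR = 1000
MAX_YEAR = 2021  # the report's plots are timestamped September 2021


def split_years(years, lo=MIN_YEAR, hi=MAX_YEAR):
    """Return (plausible, suspect) lists; None (missing) entries are skipped."""
    plausible, suspect = [], []
    for year in years:
        if year is None:
            continue
        if lo <= year <= hi:
            plausible.append(year)
        else:
            suspect.append(year)
    return plausible, suspect


# A few values taken from the frequency and extreme-value tables of this variable.
sample = [2013, 2014, 2, 17, 65535, 32014, 20187, None, 2007]
plausible, suspect = split_years(sample)
print(plausible)  # [2013, 2014, 2007]
print(suspect)    # [2, 17, 65535, 32014, 20187]
```

Whether suspect values should be dropped, set to missing, or repaired (20187 may be a mistyped 2018 or 2017) is a judgment call that this sketch leaves open.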

format
Categorical

HIGH CARDINALITY
MISSING

Distinct348
Distinct (%)0.1%
Missing70276
Missing (%)13.2%
Memory size4.1 MiB
Paperback
313662 
Hardcover
130079 
Mass Market Paperback
 
11790
Board Book
 
2829
Unknown Binding
 
1750
Other values (343)
 
2841

Length

Max length67
Median length9
Mean length9.354599083
Min length2

Characters and Unicode

Total characters: 4330721
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 236
Unique (%): 0.1%

Sample

1st rowPaperback
2nd rowPaperback
3rd rowPaperback
4th rowPaperback
5th rowHardcover

Common Values

ValueCountFrequency (%)
Paperback313662
58.8%
Hardcover130079
24.4%
Mass Market Paperback11790
 
2.2%
Board Book2829
 
0.5%
Unknown Binding1750
 
0.3%
Spiral-bound604
 
0.1%
Trade Paperback350
 
0.1%
Board book310
 
0.1%
Leather Bound214
 
< 0.1%
Novelty Book211
 
< 0.1%
Other values (338)1152
 
0.2%
(Missing)70276
 
13.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
paperback326081
66.1%
hardcover130157
 
26.4%
mass11796
 
2.4%
market11795
 
2.4%
book3406
 
0.7%
board3142
 
0.6%
binding1901
 
0.4%
unknown1751
 
0.4%
spiral-bound606
 
0.1%
trade375
 
0.1%
Other values (264)2062
 
0.4%

Most occurring characters

ValueCountFrequency (%)
a811028
18.7%
r603316
13.9%
e469742
10.8%
c456592
10.5%
k343215
7.9%
b327359
7.6%
p327247
7.6%
P326044
7.5%
o143525
 
3.3%
d136758
 
3.2%
Other values (59)385895
8.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3807503
87.9%
Uppercase Letter492218
 
11.4%
Space Separator30121
 
0.7%
Dash Punctuation654
 
< 0.1%
Other Punctuation161
 
< 0.1%
Decimal Number44
 
< 0.1%
Math Symbol8
 
< 0.1%
Open Punctuation6
 
< 0.1%
Close Punctuation6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a811028
21.3%
r603316
15.8%
e469742
12.3%
c456592
12.0%
k343215
9.0%
b327359
8.6%
p327247
8.6%
o143525
 
3.8%
d136758
 
3.6%
v130479
 
3.4%
Other values (16)58242
 
1.5%
Uppercase Letter
ValueCountFrequency (%)
P326044
66.2%
H130181
 
26.4%
M23613
 
4.8%
B8434
 
1.7%
U1780
 
0.4%
S720
 
0.1%
T392
 
0.1%
L384
 
0.1%
N244
 
< 0.1%
C119
 
< 0.1%
Other values (13)307
 
0.1%
Decimal Number
Value  Count  Frequency (%)
2      12     27.3%
1      10     22.7%
9      5      11.4%
6      5      11.4%
8      3      6.8%
4      3      6.8%
7      2      4.5%
3      2      4.5%
5      2      4.5%
Other Punctuation
ValueCountFrequency (%)
,74
46.0%
;30
18.6%
&29
 
18.0%
/15
 
9.3%
.10
 
6.2%
'3
 
1.9%
Space Separator
ValueCountFrequency (%)
30121
100.0%
Dash Punctuation
ValueCountFrequency (%)
-654
100.0%
Math Symbol
ValueCountFrequency (%)
+8
100.0%
Open Punctuation
ValueCountFrequency (%)
(6
100.0%
Close Punctuation
ValueCountFrequency (%)
)6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4299721
99.3%
Common31000
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a811028
18.9%
r603316
14.0%
e469742
10.9%
c456592
10.6%
k343215
8.0%
b327359
7.6%
p327247
7.6%
P326044
7.6%
o143525
 
3.3%
d136758
 
3.2%
Other values (39)354895
8.3%
Common
ValueCountFrequency (%)
30121
97.2%
-654
 
2.1%
,74
 
0.2%
;30
 
0.1%
&29
 
0.1%
/15
 
< 0.1%
212
 
< 0.1%
110
 
< 0.1%
.10
 
< 0.1%
+8
 
< 0.1%
Other values (10)37
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII4330721
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a811028
18.7%
r603316
13.9%
e469742
10.8%
c456592
10.5%
k343215
7.9%
b327359
7.6%
p327247
7.6%
P326044
7.5%
o143525
 
3.3%
d136758
 
3.2%
Other values (59)385895
8.9%
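
The Common Values table for format lists "Board Book" (2829) and "Board book" (310) as separate categories, so some of the 348 distinct labels probably differ only in casing. A minimal sketch of merging such variants follows. It assumes that case and surrounding whitespace are the only differences worth collapsing, and the helper name `merge_case_variants` is hypothetical, not something the report provides.

```python
from collections import Counter


def merge_case_variants(counts):
    """Sum the counts of labels that are equal after trimming and case-folding."""
    merged = Counter()
    for label, n in counts.items():
        merged[label.strip().casefold()] += n
    return merged


# Counts copied from the Common Values table for `format`.
observed = {
    "Paperback": 313662,
    "Board Book": 2829,
    "Board book": 310,
    "Trade Paperback": 350,
}
merged = merge_case_variants(observed)
print(merged["board book"])  # 3139
```

Labels such as "Trade Paperback" and "Mass Market Paperback" stay separate here. Folding them into "Paperback" would be a modelling decision, not a cleanup step.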

num_pages
Real number (ℝ≥0)

MISSING

Distinct: 1684
Distinct (%): 0.4%
Missing: 85148
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 259.9044655
Minimum: 0
Maximum: 32000
Zeros: 289
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 32
Q1: 160
median: 250
Q3: 335
95-th percentile: 501
Maximum: 32000
Range: 32000
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 172.8830227
Coefficient of variation (CV): 0.6651791164
Kurtosis: 2699.689895
Mean: 259.9044655
Median Absolute Deviation (MAD): 86
Skewness: 18.78706945
Sum: 116457733
Variance: 29888.53954
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
32                   15608   2.9%
224                  12687   2.4%
256                  12184   2.3%
288                  11442   2.1%
192                  10832   2.0%
320                  10638   2.0%
304                  9601    1.8%
240                  8858    1.7%
128                  8054    1.5%
352                  8021    1.5%
Other values (1674)  340154  63.8%
(Missing)            85148   16.0%

Smallest values

Value  Count  Frequency (%)
0      289    0.1%
1      51     < 0.1%
2      58     < 0.1%
3      19     < 0.1%
4      17     < 0.1%
5      26     < 0.1%
6      47     < 0.1%
7      22     < 0.1%
8      103    < 0.1%
9      12     < 0.1%

Largest values

Value  Count  Frequency (%)
32000  1      < 0.1%
13800  1      < 0.1%
9000   2      < 0.1%
7488   1      < 0.1%
6688   1      < 0.1%
5771   1      < 0.1%
5392   1      < 0.1%
5272   1      < 0.1%
5120   1      < 0.1%
5056   1      < 0.1%
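
Several of the statistics reported for num_pages are derived from others: the IQR is Q3 minus Q1, the range is maximum minus minimum, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the reported figures as a consistency check. The variable names are illustrative, and this is not a claim about how the profiling tool computes them internally.

```python
import math

# Figures copied from the num_pages statistics.
std = 172.8830227
mean = 259.9044655
q1, q3 = 160, 335
minimum, maximum = 0, 32000

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation

print(iqr, value_range)  # 175 32000
print(math.isclose(cv, 0.6651791164, rel_tol=1e-6))
print(math.isclose(variance, 29888.53954, rel_tol=1e-6))
```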

average_rating
Real number (ℝ≥0)

Distinct: 340
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.895517304
Minimum: 0
Maximum: 5
Zeros: 523
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.17
Q1: 3.65
median: 3.91
Q3: 4.17
95-th percentile: 4.6
Maximum: 5
Range: 5
Interquartile range (IQR): 0.52

Descriptive statistics

Standard deviation: 0.4565162789
Coefficient of variation (CV): 0.1171901556
Kurtosis: 6.707013872
Mean: 3.895517304
Median Absolute Deviation (MAD): 0.26
Skewness: -0.9907490602
Sum: 2077191.11
Variance: 0.2084071129
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
4                   22069   4.1%
5                   10284   1.9%
3.67                7239    1.4%
3.5                 7193    1.3%
4.5                 6956    1.3%
3.75                6694    1.3%
4.33                6388    1.2%
3.83                6302    1.2%
3.8                 6251    1.2%
3.88                6132    1.1%
Other values (330)  447718  84.0%

Smallest values

Value  Count  Frequency (%)
0      523    0.1%
1      324    0.1%
1.2    3      < 0.1%
1.25   4      < 0.1%
1.28   1      < 0.1%
1.33   10     < 0.1%
1.35   1      < 0.1%
1.38   1      < 0.1%
1.4    4      < 0.1%
1.44   1      < 0.1%

Largest values

Value  Count  Frequency (%)
5      10284  1.9%
4.97   3      < 0.1%
4.96   14     < 0.1%
4.95   17     < 0.1%
4.94   33     < 0.1%
4.93   54     < 0.1%
4.92   96     < 0.1%
4.91   87     < 0.1%
4.9    103    < 0.1%
4.89   194    < 0.1%

ratings_count
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 10386
Distinct (%): 1.9%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 450.3502117
Minimum: 0
Maximum: 1821802
Zeros: 877
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 12
median: 37
Q3: 133
95-th percentile: 1229
Maximum: 1821802
Range: 1821802
Interquartile range (IQR): 121

Descriptive statistics

Standard deviation: 7195.309053
Coefficient of variation (CV): 15.97714149
Kurtosis: 32986.3913
Mean: 450.3502117
Median Absolute Deviation (MAD): 31
Skewness: 153.7407412
Sum: 240138442
Variance: 51772472.36
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
2                     14620   2.7%
3                     14348   2.7%
4                     13462   2.5%
1                     13094   2.5%
5                     12756   2.4%
6                     11821   2.2%
7                     11320   2.1%
8                     10584   2.0%
9                     10011   1.9%
10                    9280    1.7%
Other values (10376)  411930  77.3%

Smallest values

Value  Count  Frequency (%)
0      877    0.2%
1      13094  2.5%
2      14620  2.7%
3      14348  2.7%
4      13462  2.5%
5      12756  2.4%
6      11821  2.2%
7      11320  2.1%
8      10584  2.0%
9      10011  1.9%

Largest values

Value    Count  Frequency (%)
1821802  1      < 0.1%
1792561  1      < 0.1%
1766895  1      < 0.1%
1638289  1      < 0.1%
1465770  1      < 0.1%
1200661  1      < 0.1%
928457   1      < 0.1%
748926   1      < 0.1%
675927   1      < 0.1%
673887   1      < 0.1%

text_reviews_count
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 2389
Distinct (%): 0.4%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.8285905
Minimum: 0
Maximum: 69096
Zeros: 82
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 7
Q3: 19
95-th percentile: 116
Maximum: 69096
Range: 69096
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 303.9130462
Coefficient of variation (CV): 8.252095507
Kurtosis: 10123.2784
Mean: 36.8285905
Median Absolute Deviation (MAD): 5
Skewness: 75.53875794
Sum: 19637962
Variance: 92363.13967
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
1                    73424   13.8%
2                    57610   10.8%
3                    44892   8.4%
4                    36097   6.8%
5                    29028   5.4%
6                    23983   4.5%
7                    20543   3.9%
8                    17392   3.3%
9                    15154   2.8%
10                   13209   2.5%
Other values (2379)  201894  37.9%

Smallest values

Value  Count  Frequency (%)
0      82     < 0.1%
1      73424  13.8%
2      57610  10.8%
3      44892  8.4%
4      36097  6.8%
5      29028  5.4%
6      23983  4.5%
7      20543  3.9%
8      17392  3.3%
9      15154  2.8%

Largest values

Value  Count  Frequency (%)
69096  1      < 0.1%
45741  1      < 0.1%
44094  1      < 0.1%
39960  1      < 0.1%
36121  1      < 0.1%
33535  1      < 0.1%
33361  1      < 0.1%
31273  1      < 0.1%
29204  1      < 0.1%
28613  1      < 0.1%
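
The report flags ratings_count and text_reviews_count as highly correlated, and both are strongly right-skewed. With distributions like these, a rank-based coefficient such as Spearman's is less dominated by a handful of enormous titles than Pearson's. The sketch below implements Spearman's coefficient in plain Python as Pearson's correlation of average ranks. The helper names are hypothetical, and the paired values are illustrative only: they are the reported quantiles of each column matched up, not real rows of the dataset.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative pairs only (min, Q1, median, Q3, 95th percentile, max).
ratings = [0, 12, 37, 133, 1229, 1821802]
reviews = [0, 3, 7, 19, 116, 69096]
rho = spearman(ratings, reviews)
```

Because the illustrative pairs increase together, `rho` comes out at 1 up to floating-point error. On the real data the coefficient would have to be computed from the actual rows.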

description
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct464101
Distinct (%)99.3%
Missing65982
Missing (%)12.4%
Memory size4.1 MiB
Boyds Mills Press publishes a wide range of high-quality fiction and nonfiction picture books, chapter books, novels, and nonfiction
 
35
A Simon & Schuster eBook
 
26
The library of America is dedicated to publishing America's best and most significant writing in handsome, enduring volumes, featuring authoritative texts. Hailed as the "finest-looking, longest-lasting editions ever made" (The New Republic), Library of America volumes make a fine gift for any occasion. Now, with exactly one hundred volumes to choose from, there is a perfect gift for everyone.
 
25
.
 
22
This is a pre-1923 historical reproduction that was curated for quality. Quality assurance was conducted on each of these books in an attempt to remove books with imperfections introduced by the digitization process. Though we have made best efforts - the books may have occasional errors that do not impede the reading experience. We believe this work is culturally important and have elected to bring the book back into print as part of our continuing commitment to the preservation of printed works worldwide.
 
20
Other values (464096)
467117 

Length

Max length29542
Median length780
Mean length866.3383835
Min length1

Characters and Unicode

Total characters: 404792278
Distinct characters: 98
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 461769
Unique (%): 98.8%

Sample

1st rowTo Kara's astonishment, she discovers that a portal has opened in her bedroom closet and two goblins have fallen through! They refuse to return to the fairy realms and be drafted for an impending war. In an attempt to roust the pesky creatures, Kara falls through the portal, smack into the middle of a huge war. Kara meets Queen Selinda, who appoints Kara as a Fairy Princess and assigns her an impossible task: to put an end to the war using her diplomatic skills. All's Fairy In Love And War is the eighth book in Avalon: Web of Magic, a twelve-book fantasy series for middle grade readers. Through their magical journey, the teenage heroines discover who they really are . . . and run into plenty of good guys, bad guys, and cute guys. Out of print for two years, Seven Seas is pleased to return the Avalon series to print in editions targeted for today's readers, with new manga-style covers and interior illustrations.
2nd rowWisdom, humor, and dark observations by the founder of the Church of Satan. LaVey ponders such topics as nonconformity, occult faddism, erotic politics, the "Goodguy badge," demoralization and the construction of artificial human companions.
3rd rowLondon, 1196. At the command of Richard the Lionheart, Sir John de Wolfe has left his beloved West Country for the Palace of Westminster, where he has been appointed Coroner of the Verge. But with the king overseas, embroiled in a costly war against King Philip of France, Sir John is dismayed to discover that the English court is a hotbed of greed, corruption and petty in-fighting. The murder of one of the palace clerks, stabbed in broad daylight and thrown into the River Thames, leads John to suspect that there's a conspiracy underway to overthrow King Richard. And with the visit of the dowager Queen Eleanor fast approaching, the new Coroner must risk his life to prove his suspicions are right, root out the traitors within and prevent a national catastrophe.
4th rowWhat is Heaven really going to be like? What will we look like? What will we do? Won't Heaven get boring after a while? We all have questions about what Heaven will be like, and after 25 years of extensive research, Dr. Randy Alcorn has the answers. In the most comprehensive and definitive book on Heaven to date, Randy invites you to picture Heaven the way Scripture describes it-- a bright, vibrant, and physical New Earth, free from sin, suffering, and death, and brimming with Christ's presence, wondrous natural beauty, and the richness of human culture as God intended it. God has put eternity in our hearts. Now, Randy Alcorn brings eternity to light in a way that will surprise you, spark your imagination, and change how you live life today. If you've always thought of Heaven as a realm of disembodied spirits, clouds, and eternal harp strumming, you're in for a wonderful surprise. This is a book about real people with real bodies enjoying close relationships with God and each other, eating, drinking, working, playing, traveling, worshiping, and discovering on a New Earth. Earth as God created it. Earth as he intended it to be. And the next time you hear someone say, "We cant begin to imagine what Heaven will be like," you'll be able to tell them, "I can."
5th rowIn Newbery Medalist Cynthia Rylant's classic bestseller, the author comforts readers young and old who have lost a dog. Recommended highly by pet lovers around the world, Dog Heaven not only comforts but also brings a tear to anyone who is devoted to a pet. From expansive fields where dogs can run and run to delicious biscuits no dog can resist, Rylant paints a warm and affectionate picture of the ideal place God would, of course, create for man's best friend. The first picture book illustrated by the author, Dog Heaven is enhanced by Rylant's bright, bold paintings that perfectly capture an afterlife sure to bring solace to anyone who is grieving.

Common Values

ValueCountFrequency (%)
Boyds Mills Press publishes a wide range of high-quality fiction and nonfiction picture books, chapter books, novels, and nonfiction35
 
< 0.1%
A Simon & Schuster eBook26
 
< 0.1%
The library of America is dedicated to publishing America's best and most significant writing in handsome, enduring volumes, featuring authoritative texts. Hailed as the "finest-looking, longest-lasting editions ever made" (The New Republic), Library of America volumes make a fine gift for any occasion. Now, with exactly one hundred volumes to choose from, there is a perfect gift for everyone.25
 
< 0.1%
.22
 
< 0.1%
This is a pre-1923 historical reproduction that was curated for quality. Quality assurance was conducted on each of these books in an attempt to remove books with imperfections introduced by the digitization process. Though we have made best efforts - the books may have occasional errors that do not impede the reading experience. We believe this work is culturally important and have elected to bring the book back into print as part of our continuing commitment to the preservation of printed works worldwide.20
 
< 0.1%
Readers assume the role of archaeologists, uncovering secrets of ancient civilizations. Stunning photographs and illustrations, plus detailed cutaways, maps and diagrams.18
 
< 0.1%
Many of the earliest books, particularly those dating back to the 1900s and before, are now extremely scarce and increasingly expensive. We are republishing these classic works in affordable, high quality, modern editions, using the original text and artwork.16
 
< 0.1%
Since 1973, Storey's Country Wisdom Bulletins have offered practical, hands-on instructions designed to help readers master dozens of country living skills quickly and easily. There are now more than 170 titles in this series, and their remarkable popularity reflects the common desire of country and city dwellers alike to cultivate personal independence in everyday life.16
 
< 0.1%
>15
 
< 0.1%
Important Note about PRINT ON DEMAND Editions: You are purchasing a print on demand edition of this book. This book is printed individually on uncoated (non-glossy) paper with the best quality printers available. The printing quality of this copy will vary from the original offset printing edition and may look more saturated. The information presented in this version is the same as the latest edition. Any pattern pullouts have been separated and presented as single pages. If the pullout patterns are missing, please contact c&t publishing.14
 
< 0.1%
Other values (464091)467038
87.6%
(Missing)65982
 
12.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the3813924
 
5.7%
and2510952
 
3.8%
of2227175
 
3.4%
a1802318
 
2.7%
to1742369
 
2.6%
in1177947
 
1.8%
is750420
 
1.1%
for577892
 
0.9%
her571520
 
0.9%
with568648
 
0.9%
Other values (969573)50733364
76.3%

Most occurring characters

ValueCountFrequency (%)
65210584
16.1%
e38664009
 
9.6%
t26815365
 
6.6%
a25736825
 
6.4%
o24049610
 
5.9%
i23574550
 
5.8%
n23453365
 
5.8%
s21746516
 
5.4%
r20891079
 
5.2%
h16031384
 
4.0%
Other values (88)118618991
29.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter313942331
77.6%
Space Separator65210584
 
16.1%
Uppercase Letter11809493
 
2.9%
Other Punctuation10057595
 
2.5%
Dash Punctuation1632372
 
0.4%
Decimal Number1063974
 
0.3%
Control814206
 
0.2%
Close Punctuation121451
 
< 0.1%
Open Punctuation120335
 
< 0.1%
Math Symbol10201
 
< 0.1%
Other values (3)9736
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T1106248
 
9.4%
A1039049
 
8.8%
S927119
 
7.9%
C749383
 
6.3%
B713357
 
6.0%
I666373
 
5.6%
M634520
 
5.4%
W599115
 
5.1%
H541918
 
4.6%
E490882
 
4.2%
Other values (16)4341529
36.8%
Lowercase Letter
ValueCountFrequency (%)
e38664009
12.3%
t26815365
 
8.5%
a25736825
 
8.2%
o24049610
 
7.7%
i23574550
 
7.5%
n23453365
 
7.5%
s21746516
 
6.9%
r20891079
 
6.7%
h16031384
 
5.1%
l13856392
 
4.4%
Other values (16)79123236
25.2%
Other Punctuation
ValueCountFrequency (%)
,4225582
42.0%
.3348584
33.3%
'1171603
 
11.6%
"522133
 
5.2%
?213537
 
2.1%
:189075
 
1.9%
!132794
 
1.3%
;106369
 
1.1%
*67580
 
0.7%
/32676
 
0.3%
Other values (5)47662
 
0.5%
Decimal Number
Value  Count   Frequency (%)
1      242696  22.8%
0      217710  20.5%
9      127767  12.0%
2      117022  11.0%
5      69531   6.5%
8      64696   6.1%
3      62241   5.8%
4      56494   5.3%
6      55023   5.2%
7      50794   4.8%
Math Symbol
ValueCountFrequency (%)
+3301
32.4%
~2419
23.7%
>1598
15.7%
=1420
13.9%
<957
 
9.4%
|506
 
5.0%
Control
ValueCountFrequency (%)
812784
99.8%
1420
 
0.2%
2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(113644
94.4%
[6557
 
5.4%
{134
 
0.1%
Close Punctuation
ValueCountFrequency (%)
)114708
94.4%
]6609
 
5.4%
}134
 
0.1%
Modifier Symbol
ValueCountFrequency (%)
`1676
98.4%
^28
 
1.6%
Space Separator
ValueCountFrequency (%)
65210584
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1632372
100.0%
Currency Symbol
ValueCountFrequency (%)
$3516
100.0%
Connector Punctuation
ValueCountFrequency (%)
_4516
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin325751824
80.5%
Common79040454
 
19.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e38664009
11.9%
t26815365
 
8.2%
a25736825
 
7.9%
o24049610
 
7.4%
i23574550
 
7.2%
n23453365
 
7.2%
s21746516
 
6.7%
r20891079
 
6.4%
h16031384
 
4.9%
l13856392
 
4.3%
Other values (42)90932729
27.9%
Common
ValueCountFrequency (%)
65210584
82.5%
,4225582
 
5.3%
.3348584
 
4.2%
-1632372
 
2.1%
'1171603
 
1.5%
812784
 
1.0%
"522133
 
0.7%
1242696
 
0.3%
0217710
 
0.3%
?213537
 
0.3%
Other values (36)1442869
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII404792278
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
65210584
16.1%
e38664009
 
9.6%
t26815365
 
6.6%
a25736825
 
6.4%
o24049610
 
5.9%
i23574550
 
5.8%
n23453365
 
5.8%
s21746516
 
5.4%
r20891079
 
5.2%
h16031384
 
4.0%
Other values (88)118618991
29.3%

title_az
Categorical

HIGH CARDINALITY
UNIFORM

Distinct519053
Distinct (%)97.3%
Missing4
Missing (%)< 0.1%
Memory size4.1 MiB
Selected Poems
 
35
Collected Poems
 
32
Redemption
 
25
Legacy
 
20
Second Chances
 
18
Other values (519048)
533093 

Length

Max length379
Median length39
Mean length44.41892416
Min length1

Characters and Unicode

Total characters: 23685192
Distinct characters: 94
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 510129
Unique (%): 95.7%

Sample

1st rowW. C. Fields: A Life on Film
2nd rowRunic Astrology: Starcraft and Timekeeping in the Northern Tradition
3rd rowThe Wanting of Levine
4th rowAvalon: Web of Magic Book 8: All's Fairy in Love and War
5th rowThe Devil's Notebook

Common Values

ValueCountFrequency (%)
Selected Poems35
 
< 0.1%
Collected Poems32
 
< 0.1%
Redemption25
 
< 0.1%
Legacy20
 
< 0.1%
Second Chances18
 
< 0.1%
Broken17
 
< 0.1%
Home16
 
< 0.1%
Obsession16
 
< 0.1%
The Promise16
 
< 0.1%
Little Red Riding Hood15
 
< 0.1%
Other values (519043)533013
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the309025
 
8.0%
of154640
 
4.0%
a113087
 
2.9%
and102121
 
2.7%
to58228
 
1.5%
in50047
 
1.3%
for34782
 
0.9%
book24984
 
0.6%
volume20396
 
0.5%
novel20234
 
0.5%
Other values (135950)2957537
76.9%

Most occurring characters

ValueCountFrequency (%)
3315188
 
14.0%
e2160188
 
9.1%
o1475970
 
6.2%
a1307667
 
5.5%
i1306447
 
5.5%
n1230020
 
5.2%
r1216610
 
5.1%
t1177696
 
5.0%
s1009525
 
4.3%
l746799
 
3.2%
Other values (84)8739082
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter16056817
67.8%
Space Separator3315188
 
14.0%
Uppercase Letter3204248
 
13.5%
Other Punctuation541186
 
2.3%
Decimal Number194866
 
0.8%
Open Punctuation159592
 
0.7%
Close Punctuation159546
 
0.7%
Dash Punctuation52447
 
0.2%
Math Symbol1013
 
< 0.1%
Currency Symbol191
 
< 0.1%
Other values (2)98
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T347056
 
10.8%
S292923
 
9.1%
A255492
 
8.0%
C223091
 
7.0%
M192547
 
6.0%
B187883
 
5.9%
P156819
 
4.9%
W149415
 
4.7%
L140768
 
4.4%
D140643
 
4.4%
Other values (16)1117611
34.9%
Lowercase Letter
ValueCountFrequency (%)
e2160188
13.5%
o1475970
 
9.2%
a1307667
 
8.1%
i1306447
 
8.1%
n1230020
 
7.7%
r1216610
 
7.6%
t1177696
 
7.3%
s1009525
 
6.3%
l746799
 
4.7%
h706069
 
4.4%
Other values (16)3719826
23.2%
Other Punctuation
ValueCountFrequency (%)
:247764
45.8%
,106012
19.6%
'72790
 
13.5%
.41805
 
7.7%
;19373
 
3.6%
&17698
 
3.3%
!12403
 
2.3%
#7264
 
1.3%
?6651
 
1.2%
/5978
 
1.1%
Other values (5)3448
 
0.6%
Decimal Number
Value  Count  Frequency (%)
1      54696  28.1%
0      29631  15.2%
2      29027  14.9%
3      16854  8.6%
5      13882  7.1%
9      13225  6.8%
4      12214  6.3%
6      9003   4.6%
8      8771   4.5%
7      7563   3.9%
Math Symbol
ValueCountFrequency (%)
+778
76.8%
~118
 
11.6%
=64
 
6.3%
|47
 
4.6%
>6
 
0.6%
Open Punctuation
ValueCountFrequency (%)
(158927
99.6%
[649
 
0.4%
{16
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
)158878
99.6%
]650
 
0.4%
}18
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
`33
91.7%
^3
 
8.3%
Space Separator
ValueCountFrequency (%)
3315188
100.0%
Dash Punctuation
ValueCountFrequency (%)
-52447
100.0%
Currency Symbol
ValueCountFrequency (%)
$191
100.0%
Connector Punctuation
ValueCountFrequency (%)
_62
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin19261065
81.3%
Common4424127
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2160188
 
11.2%
o1475970
 
7.7%
a1307667
 
6.8%
i1306447
 
6.8%
n1230020
 
6.4%
r1216610
 
6.3%
t1177696
 
6.1%
s1009525
 
5.2%
l746799
 
3.9%
h706069
 
3.7%
Other values (42)6924074
35.9%
Common
ValueCountFrequency (%)
3315188
74.9%
:247764
 
5.6%
(158927
 
3.6%
)158878
 
3.6%
,106012
 
2.4%
'72790
 
1.6%
154696
 
1.2%
-52447
 
1.2%
.41805
 
0.9%
029631
 
0.7%
Other values (32)185989
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII23685192
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3315188
 
14.0%
e2160188
 
9.1%
o1475970
 
6.2%
a1307667
 
5.5%
i1306447
 
5.5%
n1230020
 
5.2%
r1216610
 
5.1%
t1177696
 
5.0%
s1009525
 
4.3%
l746799
 
3.2%
Other values (84)8739082
36.9%

rank
Real number (ℝ≥0)

Distinct: 506628
Distinct (%): 95.1%
Missing: 293
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2797902.74
Minimum: 1
Maximum: 21667465
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 105503.65
Q1: 692429.75
median: 1709189.5
Q3: 3609696
95-th percentile: 9865773.25
Maximum: 21667465
Range: 21667464
Interquartile range (IQR): 2917266.25

Descriptive statistics

Standard deviation: 3239062.111
Coefficient of variation (CV): 1.15767502
Kurtosis: 6.117488824
Mean: 2797902.74
Median Absolute Deviation (MAD): 1242502.5
Skewness: 2.300125005
Sum: 1.491097499 × 10^12
Variance: 1.049152336 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
54330                  5       < 0.1%
121176                 4       < 0.1%
241034                 4       < 0.1%
210706                 4       < 0.1%
666303                 4       < 0.1%
938025                 4       < 0.1%
924084                 4       < 0.1%
811483                 4       < 0.1%
97045                  4       < 0.1%
1386168                4       < 0.1%
Other values (506618)  532893  99.9%
(Missing)              293     0.1%

Smallest values

Value  Count  Frequency (%)
1      2      < 0.1%
2      3      < 0.1%
3      1      < 0.1%
5      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
11     2      < 0.1%
20     1      < 0.1%
23     1      < 0.1%
25     1      < 0.1%

Largest values

Value     Count  Frequency (%)
21667465  1      < 0.1%
21639937  1      < 0.1%
21636805  1      < 0.1%
21624131  1      < 0.1%
21619988  1      < 0.1%
21618695  1      < 0.1%
21618557  1      < 0.1%
21608729  1      < 0.1%
21602557  1      < 0.1%
21602471  1      < 0.1%

category
Categorical

HIGH CARDINALITY
MISSING

Distinct609
Distinct (%)0.1%
Missing18598
Missing (%)3.5%
Memory size4.1 MiB
GenreFiction
 
37069
Literature&amp;Fiction
 
30273
GrowingUp&amp;FactsofLife
 
17297
UnitedStates
 
16486
Contemporary
 
15217
Other values (604)
398287 

Length

Max length48
Median length14
Mean length15.29573343
Min length3

Characters and Unicode

Total characters: 7871628
Distinct characters: 61
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 69
Unique (%): < 0.1%

Sample

1st rowBiographies&amp;Memoirs
2nd rowNewAge&Spirituality
3rd rowLiterature&amp;Fiction
4th rowScienceFiction&amp;Fantasy
5th rowWorld

Common Values

ValueCountFrequency (%)
GenreFiction37069
 
7.0%
Literature&amp;Fiction30273
 
5.7%
GrowingUp&amp;FactsofLife17297
 
3.2%
UnitedStates16486
 
3.1%
Contemporary15217
 
2.9%
Mystery14510
 
2.7%
GraphicNovels12992
 
2.4%
Americas11523
 
2.2%
Fantasy10351
 
1.9%
ChristianLiving10121
 
1.9%
Other values (599)338790
63.5%
(Missing)18598
 
3.5%
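
The sample rows for category mix a literal `&amp;` entity (Biographies&amp;Memoirs) with a plain ampersand (NewAge&Spirituality), which suggests that some labels were stored HTML-escaped. A hedged sketch of normalizing them with Python's standard `html.unescape` follows. It assumes the entity is an escaping leftover and carries no meaning of its own.

```python
import html

# Labels copied from the Sample and Common Values listings for `category`.
labels = ["Biographies&amp;Memoirs", "NewAge&Spirituality", "Literature&amp;Fiction"]

# html.unescape leaves a bare "&" untouched and decodes "&amp;" to "&".
cleaned = [html.unescape(label) for label in labels]
print(cleaned)
# ['Biographies&Memoirs', 'NewAge&Spirituality', 'Literature&Fiction']
```

If escaped and unescaped spellings of the same category both occur, unescaping would also lower the distinct count of 609.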

Length
