diff --git "a/reports/311_data_1.html" "b/reports/311_data_1.html"
new file mode 100644
--- /dev/null
+++ "b/reports/311_data_1.html"
@@ -0,0 +1,11810 @@
+311 Service Calls Report

Overview

Dataset statistics

Number of variables: 23
Number of observations: 7631721
Missing cells: 68145580
Missing cells (%): 38.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 GiB
Average record size in memory: 184.0 B
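
The missing-cell percentage follows from the counts above: missing cells divided by the total number of cells (variables × observations). A minimal Python sketch of that arithmetic, using only figures from this table (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 23
n_observations = 7_631_721
missing_cells = 68_145_580

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

The result rounds to the 38.8% shown in the table.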

Variable types

Numeric: 1
DateTime: 1
Text: 11
Categorical: 10

Alerts

Facility Type is highly imbalanced (68.4%) [Imbalance]
Status is highly imbalanced (93.3%) [Imbalance]
Vehicle Type is highly imbalanced (74.8%) [Imbalance]
Closed Date has 143947 (1.9%) missing values [Missing]
Descriptor has 129525 (1.7%) missing values [Missing]
Location Type has 1699240 (22.3%) missing values [Missing]
Landmark has 7629562 (> 99.9%) missing values [Missing]
Facility Type has 5217452 (68.4%) missing values [Missing]
Vehicle Type has 7631568 (> 99.9%) missing values [Missing]
Taxi Company Borough has 7625165 (99.9%) missing values [Missing]
Taxi Pick Up Location has 7592481 (99.5%) missing values [Missing]
Bridge Highway Name has 7618111 (99.8%) missing values [Missing]
Bridge Highway Direction has 7618125 (99.8%) missing values [Missing]
Road Ramp has 7618286 (99.8%) missing values [Missing]
Bridge Highway Segment has 7615395 (99.8%) missing values [Missing]
Unique Key has unique values [Unique]

Reproduction

Analysis started: 2024-04-22 15:30:54.003259
Analysis finished: 2024-04-22 15:44:48.702554
Duration: 13 minutes and 54.7 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json

Variables

Unique Key
Real number (ℝ)

UNIQUE 

Distinct: 7631721
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36901967
Minimum: 32305076
Maximum: 52179269
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 58.2 MiB

Quantile statistics

Minimum: 32305076
5-th percentile: 32778445
Q1: 34613509
Median: 36920091
Q3: 39151185
95-th percentile: 41007169
Maximum: 52179269
Range: 19874193
Interquartile range (IQR): 4537676

Descriptive statistics

Standard deviation: 2655089.9
Coefficient of variation (CV): 0.07194982
Kurtosis: -1.0864008
Mean: 36901967
Median Absolute Deviation (MAD): 2268547
Skewness: 0.034143271
Sum: 2.8162551 × 10^14
Variance: 7.0495023 × 10^12
Monotonicity: Not monotonic
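
Several of the figures above are derived from one another: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short Python sketch that re-derives them from the reported values (inputs are the rounded figures printed in the report, so the last digits may differ slightly):

```python
# Figures copied from the Unique Key statistics above.
minimum, maximum = 32_305_076, 52_179_269
q1, q3 = 34_613_509, 39_151_185
mean, std = 36_901_967, 2_655_089.9

value_range = maximum - minimum  # reported as Range
iqr = q3 - q1                    # reported as Interquartile range (IQR)
cv = std / mean                  # reported as Coefficient of variation (CV)
variance = std ** 2              # reported as Variance

print(value_range, iqr, round(cv, 8), f"{variance:.4e}")
```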
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
38237851 1
 
< 0.1%
34826212 1
 
< 0.1%
34826224 1
 
< 0.1%
34826223 1
 
< 0.1%
34826222 1
 
< 0.1%
34826221 1
 
< 0.1%
34826220 1
 
< 0.1%
34826219 1
 
< 0.1%
34826218 1
 
< 0.1%
34826217 1
 
< 0.1%
Other values (7631711) 7631711
> 99.9%
ValueCountFrequency (%)
32305076 1
< 0.1%
32305086 1
< 0.1%
32305088 1
< 0.1%
32305113 1
< 0.1%
32305114 1
< 0.1%
32305125 1
< 0.1%
32305135 1
< 0.1%
32305138 1
< 0.1%
32305139 1
< 0.1%
32305154 1
< 0.1%
ValueCountFrequency (%)
52179269 1
< 0.1%
52179256 1
< 0.1%
52179246 1
< 0.1%
52179245 1
< 0.1%
52179244 1
< 0.1%
52179235 1
< 0.1%
52179234 1
< 0.1%
52179233 1
< 0.1%
52179232 1
< 0.1%
52179231 1
< 0.1%
Distinct: 5657411
Distinct (%): 74.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.2 MiB
Minimum: 2016-01-01 00:00:00
Maximum: 2018-12-31 23:59:56
Histogram with fixed size bins (bins=50)

Closed Date
Text

MISSING 

Distinct: 3171371
Distinct (%): 42.4%
Missing: 143947
Missing (%): 1.9%
Memory size: 58.2 MiB

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 164731028
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
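
As an illustration of those character properties, the general category of each character can be looked up with Python's standard `unicodedata` module. The sketch below is not the profiler's own code; it simply counts categories for one Closed Date value taken from this report's sample rows.

```python
import unicodedata
from collections import Counter

# Sample value taken from the "1st row" entry of the Closed Date variable.
sample = "01/24/2018 12:00:00 AM"

# unicodedata.category returns a two-letter general category code, e.g.
# "Nd" (Decimal Number), "Po" (Other Punctuation),
# "Zs" (Space Separator), "Lu" (Uppercase Letter).
category_counts = Counter(unicodedata.category(ch) for ch in sample)

print(dict(category_counts))
```

For this 22-character value the counts are 14 decimal numbers, 4 other punctuation marks, 2 spaces and 2 uppercase letters, which matches the 63.6% / 18.2% / 9.1% / 9.1% split reported for the whole column.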

Unique

Unique: 1978311
Unique (%): 26.4%

Sample

1st row: 01/24/2018 12:00:00 AM
2nd row: 01/21/2018 12:00:00 AM
3rd row: 01/20/2018 10:02:00 PM
4th row: 01/19/2018 12:00:00 AM
5th row: 01/20/2018 07:41:00 PM
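
Closed Date is profiled as Text because its values are fixed-width strings rather than parsed timestamps. A minimal sketch of parsing them with the standard library, assuming every value follows the month/day/year 12-hour pattern seen in the sample rows (the format string is inferred from those samples, not stated by the report):

```python
from datetime import datetime

# Format inferred from the sample rows: MM/DD/YYYY hh:mm:ss AM|PM.
CLOSED_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

samples = ["01/24/2018 12:00:00 AM", "01/20/2018 10:02:00 PM"]
parsed = [datetime.strptime(s, CLOSED_DATE_FORMAT) for s in samples]

for raw, ts in zip(samples, parsed):
    print(raw, "->", ts.isoformat())
```

Note that `12:00:00 AM` parses to midnight, so the large share of `12:00:00` tokens in this column probably marks date-only entries rather than events recorded at noon.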
ValueCountFrequency (%)
am 3860303
 
17.2%
pm 3627471
 
16.1%
12:00:00 1198410
 
5.3%
10:00:00 16923
 
0.1%
11:00:00 16300
 
0.1%
01:00:00 13934
 
0.1%
09:00:00 13758
 
0.1%
01:45:00 13363
 
0.1%
02:00:00 13107
 
0.1%
10:30:00 13066
 
0.1%
Other values (45421) 13676687
60.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 32503924 | 19.7%
1 | 21457693 | 13.0%
2 | 17201338 | 10.4%
/ | 14975548 | 9.1%
(space) | 14975548 | 9.1%
: | 14975548 | 9.1%
M | 7487774 | 4.5%
8 | 5606542 | 3.4%
7 | 5223507 | 3.2%
3 | 5154427 | 3.1%
Other values (6) | 25169179 | 15.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 104828836 | 63.6%
Other Punctuation | 29951096 | 18.2%
Space Separator | 14975548 | 9.1%
Uppercase Letter | 14975548 | 9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 32503924
31.0%
1 21457693
20.5%
2 17201338
16.4%
8 5606542
 
5.3%
7 5223507
 
5.0%
3 5154427
 
4.9%
6 5042739
 
4.8%
5 4793207
 
4.6%
4 4611931
 
4.4%
9 3233528
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
M 7487774
50.0%
A 3860303
25.8%
P 3627471
24.2%
Other Punctuation
ValueCountFrequency (%)
/ 14975548
50.0%
: 14975548
50.0%
Space Separator
ValueCountFrequency (%)
14975548
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 149755480
90.9%
Latin 14975548
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 32503924
21.7%
1 21457693
14.3%
2 17201338
11.5%
/ 14975548
10.0%
14975548
10.0%
: 14975548
10.0%
8 5606542
 
3.7%
7 5223507
 
3.5%
3 5154427
 
3.4%
6 5042739
 
3.4%
Other values (3) 12638666
 
8.4%
Latin
ValueCountFrequency (%)
M 7487774
50.0%
A 3860303
25.8%
P 3627471
24.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 164731028
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 32503924
19.7%
1 21457693
13.0%
2 17201338
10.4%
/ 14975548
9.1%
14975548
9.1%
: 14975548
9.1%
M 7487774
 
4.5%
8 5606542
 
3.4%
7 5223507
 
3.2%
3 5154427
 
3.1%
Other values (6) 25169179
15.3%

Agency
Categorical

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.2 MiB
NYPD: 2167548
HPD: 1778185
DOT: 877335
DSNY: 831025
DEP: 585582
Other values (25): 1392046

Length

Max length: 5
Median length: 3
Mean length: 3.450203
Min length: 3

Characters and Unicode

Total characters: 26330987
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DSNY
2nd row: DSNY
3rd row: DSNY
4th row: DSNY
5th row: DSNY

Common Values

Value | Count | Frequency (%)
NYPD | 2167548 | 28.4%
HPD | 1778185 | 23.3%
DOT | 877335 | 11.5%
DSNY | 831025 | 10.9%
DEP | 585582 | 7.7%
DOB | 399518 | 5.2%
DPR | 316820 | 4.2%
DOHMH | 201343 | 2.6%
DOF | 149812 | 2.0%
DHS | 88780 | 1.2%
Other values (20) | 235773 | 3.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
nypd 2167548
28.4%
hpd 1778185
23.3%
dot 877335
11.5%
dsny 831025
 
10.9%
dep 585582
 
7.7%
dob 399518
 
5.2%
dpr 316820
 
4.2%
dohmh 201343
 
2.6%
dof 149812
 
2.0%
dhs 88780
 
1.2%
Other values (20) 235773
 
3.1%

Most occurring characters

ValueCountFrequency (%)
D 7490189
28.4%
P 4848210
18.4%
N 2999777
11.4%
Y 2999777
11.4%
H 2325398
 
8.8%
O 1636139
 
6.2%
T 990234
 
3.8%
S 922330
 
3.5%
E 596426
 
2.3%
B 399536
 
1.5%
Other values (11) 1122971
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 26328747
> 99.9%
Decimal Number 1344
 
< 0.1%
Dash Punctuation 896
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
D 7490189
28.4%
P 4848210
18.4%
N 2999777
11.4%
Y 2999777
11.4%
H 2325398
 
8.8%
O 1636139
 
6.2%
T 990234
 
3.8%
S 922330
 
3.5%
E 596426
 
2.3%
B 399536
 
1.5%
Other values (8) 1120731
 
4.3%
Decimal Number
ValueCountFrequency (%)
1 896
66.7%
3 448
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 896
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 26328747
> 99.9%
Common 2240
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
D 7490189
28.4%
P 4848210
18.4%
N 2999777
11.4%
Y 2999777
11.4%
H 2325398
 
8.8%
O 1636139
 
6.2%
T 990234
 
3.8%
S 922330
 
3.5%
E 596426
 
2.3%
B 399536
 
1.5%
Other values (8) 1120731
 
4.3%
Common
ValueCountFrequency (%)
- 896
40.0%
1 896
40.0%
3 448
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26330987
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D 7490189
28.4%
P 4848210
18.4%
N 2999777
11.4%
Y 2999777
11.4%
H 2325398
 
8.8%
O 1636139
 
6.2%
T 990234
 
3.8%
S 922330
 
3.5%
E 596426
 
2.3%
B 399536
 
1.5%
Other values (11) 1122971
 
4.3%
Distinct: 1373
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.2 MiB

Length

Max length: 82
Median length: 78
Mean length: 34.290636
Min length: 3

Characters and Unicode

Total characters: 261696566
Distinct characters: 69
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 342
Unique (%): < 0.1%

Sample

1st row: Department of Sanitation
2nd row: Department of Sanitation
3rd row: Department of Sanitation
4th row: Department of Sanitation
5th row: Department of Sanitation
ValueCountFrequency (%)
department 6917923
19.7%
of 4732019
13.5%
and 2411979
 
6.9%
new 2166530
 
6.2%
york 2166426
 
6.2%
city 2166367
 
6.2%
police 2166325
 
6.2%
development 1781227
 
5.1%
housing 1778111
 
5.1%
preservation 1778111
 
5.1%
Other values (1855) 7082746
20.2%

Most occurring characters

ValueCountFrequency (%)
e 31300846
12.0%
27516043
 
10.5%
t 25609180
 
9.8%
n 22229250
 
8.5%
o 19878380
 
7.6%
r 16991927
 
6.5%
a 16146107
 
6.2%
i 13205430
 
5.0%
m 9835462
 
3.8%
p 9812916
 
3.7%
Other values (59) 69171025
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 205632134
78.6%
Uppercase Letter 27977089
 
10.7%
Space Separator 27516043
 
10.5%
Dash Punctuation 402569
 
0.2%
Decimal Number 161941
 
0.1%
Other Punctuation 6752
 
< 0.1%
Open Punctuation 19
 
< 0.1%
Close Punctuation 19
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 31300846
15.2%
t 25609180
12.5%
n 22229250
10.8%
o 19878380
9.7%
r 16991927
8.3%
a 16146107
7.9%
i 13205430
 
6.4%
m 9835462
 
4.8%
p 9812916
 
4.8%
s 6292678
 
3.1%
Other values (16) 34329958
16.7%
Uppercase Letter
ValueCountFrequency (%)
D 8706702
31.1%
P 4915156
17.6%
C 2834178
 
10.1%
H 2329904
 
8.3%
N 2211407
 
7.9%
Y 2168172
 
7.7%
T 973118
 
3.5%
B 825310
 
2.9%
E 705860
 
2.5%
S 655743
 
2.3%
Other values (16) 1651539
 
5.9%
Decimal Number
ValueCountFrequency (%)
0 62358
38.5%
1 38414
23.7%
2 18367
 
11.3%
3 17112
 
10.6%
4 8020
 
5.0%
8 4971
 
3.1%
6 4934
 
3.0%
5 3615
 
2.2%
7 3466
 
2.1%
9 684
 
0.4%
Other Punctuation
ValueCountFrequency (%)
, 6644
98.4%
' 72
 
1.1%
: 36
 
0.5%
Space Separator
ValueCountFrequency (%)
27516043
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 402569
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 233609223
89.3%
Common 28087343
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 31300846
13.4%
t 25609180
11.0%
n 22229250
 
9.5%
o 19878380
 
8.5%
r 16991927
 
7.3%
a 16146107
 
6.9%
i 13205430
 
5.7%
m 9835462
 
4.2%
p 9812916
 
4.2%
D 8706702
 
3.7%
Other values (42) 59893023
25.6%
Common
ValueCountFrequency (%)
27516043
98.0%
- 402569
 
1.4%
0 62358
 
0.2%
1 38414
 
0.1%
2 18367
 
0.1%
3 17112
 
0.1%
4 8020
 
< 0.1%
, 6644
 
< 0.1%
8 4971
 
< 0.1%
6 4934
 
< 0.1%
Other values (7) 7911
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 261696566
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 31300846
12.0%
27516043
 
10.5%
t 25609180
 
9.8%
n 22229250
 
8.5%
o 19878380
 
7.6%
r 16991927
 
6.5%
a 16146107
 
6.2%
i 13205430
 
5.0%
m 9835462
 
3.8%
p 9812916
 
3.7%
Other values (59) 69171025
26.4%
Distinct: 271
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.2 MiB

Length

Max length: 41
Median length: 33
Mean length: 17.007385
Min length: 3

Characters and Unicode

Total characters: 129795617
Distinct characters: 59
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): < 0.1%

Sample

1st row: Request Large Bulky Item Collection
2nd row: Request Large Bulky Item Collection
3rd row: Request Large Bulky Item Collection
4th row: Request Large Bulky Item Collection
5th row: Request Large Bulky Item Collection
ValueCountFrequency (%)
1335244
 
7.7%
noise 1305443
 
7.5%
condition 1161910
 
6.7%
water 1012500
 
5.8%
residential 670050
 
3.9%
heat/hot 665315
 
3.8%
street 586764
 
3.4%
parking 467965
 
2.7%
illegal 443251
 
2.6%
blocked 392458
 
2.3%
Other values (356) 9299196
53.6%

Most occurring characters

ValueCountFrequency (%)
e 12875357
 
9.9%
i 9907988
 
7.6%
9708375
 
7.5%
t 7478979
 
5.8%
o 6580378
 
5.1%
n 6475264
 
5.0%
l 5876127
 
4.5%
a 5342985
 
4.1%
r 4861128
 
3.7%
s 4621470
 
3.6%
Other values (49) 56067566
43.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 81656963
62.9%
Uppercase Letter 35167695
27.1%
Space Separator 9708375
 
7.5%
Other Punctuation 1635452
 
1.3%
Dash Punctuation 1354422
 
1.0%
Open Punctuation 136354
 
0.1%
Close Punctuation 136354
 
0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12875357
15.8%
i 9907988
12.1%
t 7478979
9.2%
o 6580378
8.1%
n 6475264
7.9%
l 5876127
7.2%
a 5342985
 
6.5%
r 4861128
 
6.0%
s 4621470
 
5.7%
d 3115955
 
3.8%
Other values (16) 14521332
17.8%
Uppercase Letter
ValueCountFrequency (%)
T 3617955
 
10.3%
N 3023045
 
8.6%
A 3010699
 
8.6%
R 2915949
 
8.3%
S 2509008
 
7.1%
C 2450028
 
7.0%
E 2382909
 
6.8%
I 2312430
 
6.6%
O 1864243
 
5.3%
P 1607318
 
4.6%
Other values (15) 9474111
26.9%
Other Punctuation
ValueCountFrequency (%)
/ 1634828
> 99.9%
' 612
 
< 0.1%
. 12
 
< 0.1%
Space Separator
ValueCountFrequency (%)
9708375
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1354422
100.0%
Open Punctuation
ValueCountFrequency (%)
( 136354
100.0%
Close Punctuation
ValueCountFrequency (%)
) 136354
100.0%
Decimal Number
ValueCountFrequency (%)
4 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 116824658
90.0%
Common 12970959
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12875357
 
11.0%
i 9907988
 
8.5%
t 7478979
 
6.4%
o 6580378
 
5.6%
n 6475264
 
5.5%
l 5876127
 
5.0%
a 5342985
 
4.6%
r 4861128
 
4.2%
s 4621470
 
4.0%
T 3617955
 
3.1%
Other values (41) 49187027
42.1%
Common
ValueCountFrequency (%)
9708375
74.8%
/ 1634828
 
12.6%
- 1354422
 
10.4%
( 136354
 
1.1%
) 136354
 
1.1%
' 612
 
< 0.1%
. 12
 
< 0.1%
4 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 129795617
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12875357
 
9.9%
i 9907988
 
7.6%
9708375
 
7.5%
t 7478979
 
5.8%
o 6580378
 
5.1%
n 6475264
 
5.0%
l 5876127
 
4.5%
a 5342985
 
4.1%
r 4861128
 
3.7%
s 4621470
 
3.6%
Other values (49) 56067566
43.2%

Descriptor
Text

MISSING 

Distinct: 1314
Distinct (%): < 0.1%
Missing: 129525
Missing (%): 1.7%
Memory size: 58.2 MiB

Length

Max length: 80
Median length: 63
Mean length: 18.535133
Min length: 3

Characters and Unicode

Total characters: 139054204
Distinct characters: 76
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 73
Unique (%): < 0.1%

Sample

1st row: Request Large Bulky Item Collection
2nd row: Request Large Bulky Item Collection
3rd row: Request Large Bulky Item Collection
4th row: Request Large Bulky Item Collection
5th row: Request Large Bulky Item Collection
ValueCountFrequency (%)
loud 829704
 
4.3%
music/party 696660
 
3.6%
building 528467
 
2.7%
entire 448188
 
2.3%
access 393451
 
2.0%
389412
 
2.0%
no 368568
 
1.9%
street 333805
 
1.7%
collection 299475
 
1.5%
request 276089
 
1.4%
Other values (1704) 14841851
76.5%

Most occurring characters

ValueCountFrequency (%)
11904997
 
8.6%
e 9940675
 
7.1%
i 7775645
 
5.6%
t 7296442
 
5.2%
o 7112437
 
5.1%
n 6330243
 
4.6%
r 5970261
 
4.3%
a 5697013
 
4.1%
l 4813929
 
3.5%
s 4703019
 
3.4%
Other values (66) 67509543
48.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 84376607
60.7%
Uppercase Letter 37037039
26.6%
Space Separator 11904997
 
8.6%
Other Punctuation 2403910
 
1.7%
Decimal Number 1093995
 
0.8%
Open Punctuation 791269
 
0.6%
Close Punctuation 790028
 
0.6%
Dash Punctuation 656308
 
0.5%
Other Symbol 51
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 3185828
 
8.6%
N 2797927
 
7.6%
P 2646636
 
7.1%
I 2622547
 
7.1%
E 2579532
 
7.0%
A 2291255
 
6.2%
R 2258719
 
6.1%
S 2071969
 
5.6%
C 2057280
 
5.6%
B 2054680
 
5.5%
Other values (17) 12470666
33.7%
Lowercase Letter
ValueCountFrequency (%)
e 9940675
11.8%
i 7775645
 
9.2%
t 7296442
 
8.6%
o 7112437
 
8.4%
n 6330243
 
7.5%
r 5970261
 
7.1%
a 5697013
 
6.8%
l 4813929
 
5.7%
s 4703019
 
5.6%
c 4116077
 
4.9%
Other values (16) 20620866
24.4%
Decimal Number
ValueCountFrequency (%)
1 481900
44.0%
2 210660
19.3%
3 128852
 
11.8%
5 125819
 
11.5%
4 53074
 
4.9%
0 44382
 
4.1%
9 19778
 
1.8%
8 18752
 
1.7%
6 9047
 
0.8%
7 1731
 
0.2%
Other Punctuation
ValueCountFrequency (%)
/ 2090176
86.9%
: 178382
 
7.4%
, 100858
 
4.2%
. 21176
 
0.9%
\ 4842
 
0.2%
" 4842
 
0.2%
& 3624
 
0.2%
* 10
 
< 0.1%
Space Separator
ValueCountFrequency (%)
11904997
100.0%
Open Punctuation
ValueCountFrequency (%)
( 791269
100.0%
Close Punctuation
ValueCountFrequency (%)
) 790028
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 656308
100.0%
Other Symbol
ValueCountFrequency (%)
© 51
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 121413646
87.3%
Common 17640558
 
12.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 9940675
 
8.2%
i 7775645
 
6.4%
t 7296442
 
6.0%
o 7112437
 
5.9%
n 6330243
 
5.2%
r 5970261
 
4.9%
a 5697013
 
4.7%
l 4813929
 
4.0%
s 4703019
 
3.9%
c 4116077
 
3.4%
Other values (43) 57657905
47.5%
Common
ValueCountFrequency (%)
11904997
67.5%
/ 2090176
 
11.8%
( 791269
 
4.5%
) 790028
 
4.5%
- 656308
 
3.7%
1 481900
 
2.7%
2 210660
 
1.2%
: 178382
 
1.0%
3 128852
 
0.7%
5 125819
 
0.7%
Other values (13) 282167
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 139054102
> 99.9%
None 102
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11904997
 
8.6%
e 9940675
 
7.1%
i 7775645
 
5.6%
t 7296442
 
5.2%
o 7112437
 
5.1%
n 6330243
 
4.6%
r 5970261
 
4.3%
a 5697013
 
4.1%
l 4813929
 
3.5%
s 4703019
 
3.4%
Other values (64) 67509441
48.5%
None
ValueCountFrequency (%)
à 51
50.0%
© 51
50.0%

Location Type
Text

MISSING 

Distinct: 161
Distinct (%): < 0.1%
Missing: 1699240
Missing (%): 22.3%
Memory size: 58.2 MiB

Length

Max length: 36
Median length: 30
Mean length: 16.105777
Min length: 3

Characters and Unicode

Total characters: 95547217
Distinct characters: 59
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Sidewalk
2nd row: Sidewalk
3rd row: Sidewalk
4th row: Sidewalk
5th row: Sidewalk
ValueCountFrequency (%)
residential 2491260
27.1%
building 1860682
20.3%
street/sidewalk 1331865
14.5%
street 718155
 
7.8%
building/house 709171
 
7.7%
sidewalk 652507
 
7.1%
address 176877
 
1.9%
family 124163
 
1.4%
store/commercial 103650
 
1.1%
3 90303
 
1.0%
Other values (179) 925986
 
10.1%

Most occurring characters

ValueCountFrequency (%)
e 9218360
 
9.6%
I 7067894
 
7.4%
S 5969522
 
6.2%
i 5527392
 
5.8%
t 5407940
 
5.7%
l 4019177
 
4.2%
d 3973723
 
4.2%
D 3603967
 
3.8%
N 3592566
 
3.8%
L 3564165
 
3.7%
Other values (49) 43602511
45.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 47818150
50.0%
Uppercase Letter 41554795
43.5%
Space Separator 3252138
 
3.4%
Other Punctuation 2488411
 
2.6%
Decimal Number 203907
 
0.2%
Dash Punctuation 118757
 
0.1%
Math Symbol 67407
 
0.1%
Open Punctuation 21826
 
< 0.1%
Close Punctuation 21826
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 9218360
19.3%
i 5527392
11.6%
t 5407940
11.3%
l 4019177
8.4%
d 3973723
8.3%
a 3522275
 
7.4%
r 3094034
 
6.5%
k 2106450
 
4.4%
w 2051414
 
4.3%
s 1961602
 
4.1%
Other values (14) 6935783
14.5%
Uppercase Letter
ValueCountFrequency (%)
I 7067894
17.0%
S 5969522
14.4%
D 3603967
8.7%
N 3592566
8.6%
L 3564165
8.6%
E 3553648
8.6%
B 2735144
 
6.6%
R 2602326
 
6.3%
A 2011097
 
4.8%
U 1816876
 
4.4%
Other values (13) 5037590
12.1%
Other Punctuation
ValueCountFrequency (%)
/ 2415789
97.1%
. 46346
 
1.9%
, 22896
 
0.9%
' 3380
 
0.1%
Decimal Number
ValueCountFrequency (%)
3 90401
44.3%
1 56802
27.9%
2 56704
27.8%
Space Separator
ValueCountFrequency (%)
3252138
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 118757
100.0%
Math Symbol
ValueCountFrequency (%)
+ 67407
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21826
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21826
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 89372945
93.5%
Common 6174272
 
6.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 9218360
 
10.3%
I 7067894
 
7.9%
S 5969522
 
6.7%
i 5527392
 
6.2%
t 5407940
 
6.1%
l 4019177
 
4.5%
d 3973723
 
4.4%
D 3603967
 
4.0%
N 3592566
 
4.0%
L 3564165
 
4.0%
Other values (37) 37428239
41.9%
Common
ValueCountFrequency (%)
3252138
52.7%
/ 2415789
39.1%
- 118757
 
1.9%
3 90401
 
1.5%
+ 67407
 
1.1%
1 56802
 
0.9%
2 56704
 
0.9%
. 46346
 
0.8%
, 22896
 
0.4%
( 21826
 
0.4%
Other values (2) 25206
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 95547217
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 9218360
 
9.6%
I 7067894
 
7.4%
S 5969522
 
6.2%
i 5527392
 
5.8%
t 5407940
 
5.7%
l 4019177
 
4.2%
d 3973723
 
4.2%
D 3603967
 
3.8%
N 3592566
 
3.8%
L 3564165
 
3.7%
Other values (49) 43602511
45.6%

Landmark
Text

MISSING 

Distinct: 416
Distinct (%): 19.3%
Missing: 7629562
Missing (%): > 99.9%
Memory size: 58.2 MiB

Length

Max length: 32
Median length: 29
Mean length: 14.862899
Min length: 3

Characters and Unicode

Total characters: 32089
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 234
Unique (%): 10.8%

Sample

1st row: J F K AIRPORT
2nd row: LGA
3rd row: FT TOTTEN
4th row: FLUSHING MEADOWS CORONA PARK
5th row: CITI FIELD
ValueCountFrequency (%)
park 1036
 
19.0%
central 295
 
5.4%
airport 262
 
4.8%
j 156
 
2.9%
f 152
 
2.8%
k 151
 
2.8%
square 119
 
2.2%
prospect 105
 
1.9%
la 100
 
1.8%
guardia 100
 
1.8%
Other values (490) 2984
54.7%

Most occurring characters

ValueCountFrequency (%)
A 3577
 
11.1%
R 3509
 
10.9%
3307
 
10.3%
E 2197
 
6.8%
N 1844
 
5.7%
P 1822
 
5.7%
T 1808
 
5.6%
O 1666
 
5.2%
I 1608
 
5.0%
L 1510
 
4.7%
Other values (28) 9241
28.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 28666
89.3%
Space Separator 3307
 
10.3%
Decimal Number 109
 
0.3%
Other Punctuation 7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 3577
12.5%
R 3509
12.2%
E 2197
 
7.7%
N 1844
 
6.4%
P 1822
 
6.4%
T 1808
 
6.3%
O 1666
 
5.8%
I 1608
 
5.6%
L 1510
 
5.3%
K 1418
 
4.9%
Other values (16) 7707
26.9%
Decimal Number
ValueCountFrequency (%)
2 21
19.3%
1 15
13.8%
4 15
13.8%
9 14
12.8%
7 11
10.1%
6 8
 
7.3%
5 8
 
7.3%
3 7
 
6.4%
8 6
 
5.5%
0 4
 
3.7%
Space Separator
ValueCountFrequency (%)
3307
100.0%
Other Punctuation
ValueCountFrequency (%)
' 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28666
89.3%
Common 3423
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 3577
12.5%
R 3509
12.2%
E 2197
 
7.7%
N 1844
 
6.4%
P 1822
 
6.4%
T 1808
 
6.3%
O 1666
 
5.8%
I 1608
 
5.6%
L 1510
 
5.3%
K 1418
 
4.9%
Other values (16) 7707
26.9%
Common
ValueCountFrequency (%)
3307
96.6%
2 21
 
0.6%
1 15
 
0.4%
4 15
 
0.4%
9 14
 
0.4%
7 11
 
0.3%
6 8
 
0.2%
5 8
 
0.2%
' 7
 
0.2%
3 7
 
0.2%
Other values (2) 10
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32089
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 3577
 
11.1%
R 3509
 
10.9%
3307
 
10.3%
E 2197
 
6.8%
N 1844
 
5.7%
P 1822
 
5.7%
T 1808
 
5.6%
O 1666
 
5.2%
I 1608
 
5.0%
L 1510
 
4.7%
Other values (28) 9241
28.8%

Facility Type
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 5217452
Missing (%): 68.4%
Memory size: 58.2 MiB
Precinct: 2161151
DSNY Garage: 247123
School: 5995

Length

Max length: 11
Median length: 8
Mean length: 8.3021117
Min length: 6

Characters and Unicode

Total characters: 20043531
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Precinct
2nd row: Precinct
3rd row: Precinct
4th row: Precinct
5th row: Precinct

Common Values

Value | Count | Frequency (%)
Precinct | 2161151 | 28.3%
DSNY Garage | 247123 | 3.2%
School | 5995 | 0.1%
(Missing) | 5217452 | 68.4%
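
The report does not say how the 68.4% imbalance score in this variable's alert is derived. One reading that reproduces it is one minus the Shannon entropy of the non-missing class frequencies, normalized by log2 of the number of classes. The sketch below makes that assumption explicit; `imbalance_score` is a hypothetical helper written for this illustration, not a function from the profiler.

```python
import math

# Non-missing class counts copied from the Common Values table above.
counts = {"Precinct": 2_161_151, "DSNY Garage": 247_123, "School": 5_995}


def imbalance_score(class_counts):
    """Assumed definition: 1 - H(p) / log2(k), where H is Shannon entropy
    in bits and k is the number of classes. 0 = perfectly balanced,
    1 = all rows in a single class."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

Under this assumption the score comes out at roughly 68.4%, in line with the alert. That this equals the missing-value percentage of the same variable appears to be a coincidence, since missing rows are excluded from the calculation.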

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
precinct 2161151
81.2%
dsny 247123
 
9.3%
garage 247123
 
9.3%
school 5995
 
0.2%

Most occurring characters

ValueCountFrequency (%)
c 4328297
21.6%
e 2408274
12.0%
r 2408274
12.0%
P 2161151
10.8%
i 2161151
10.8%
n 2161151
10.8%
t 2161151
10.8%
a 494246
 
2.5%
S 253118
 
1.3%
G 247123
 
1.2%
Other values (8) 1259595
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16393647
81.8%
Uppercase Letter 3402761
 
17.0%
Space Separator 247123
 
1.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 4328297
26.4%
e 2408274
14.7%
r 2408274
14.7%
i 2161151
13.2%
n 2161151
13.2%
t 2161151
13.2%
a 494246
 
3.0%
g 247123
 
1.5%
o 11990
 
0.1%
h 5995
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
P 2161151
63.5%
S 253118
 
7.4%
G 247123
 
7.3%
N 247123
 
7.3%
Y 247123
 
7.3%
D 247123
 
7.3%
Space Separator
ValueCountFrequency (%)
247123
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19796408
98.8%
Common 247123
 
1.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 4328297
21.9%
e 2408274
12.2%
r 2408274
12.2%
P 2161151
10.9%
i 2161151
10.9%
n 2161151
10.9%
t 2161151
10.9%
a 494246
 
2.5%
S 253118
 
1.3%
G 247123
 
1.2%
Other values (7) 1012472
 
5.1%
Common
ValueCountFrequency (%)
247123
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20043531
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 4328297
21.6%
e 2408274
12.0%
r 2408274
12.0%
P 2161151
10.8%
i 2161151
10.8%
n 2161151
10.8%
t 2161151
10.8%
a 494246
 
2.5%
S 253118
 
1.3%
G 247123
 
1.2%
Other values (8) 1259595
 
6.3%

Status
Categorical

IMBALANCE 

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.2 MiB
Closed: 7430449
Pending: 69260
Open: 53839
Assigned: 50936
In Progress: 21036
Other values (6): 6201

Length

Max length: 16
Median length: 6
Mean length: 6.024083
Min length: 4

Characters and Unicode

Total characters: 45974121
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Closed
2nd row: Closed
3rd row: Closed
4th row: Closed
5th row: Closed

Common Values

Value | Count | Frequency (%)
Closed | 7430449 | 97.4%
Pending | 69260 | 0.9%
Open | 53839 | 0.7%
Assigned | 50936 | 0.7%
In Progress | 21036 | 0.3%
Started | 3233 | < 0.1%
Email Sent | 2903 | < 0.1%
Unassigned | 26 | < 0.1%
Closed - Testing | 19 | < 0.1%
Draft | 13 | < 0.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
closed | 7430468 | 97.1%
pending | 69260 | 0.9%
open | 53839 | 0.7%
assigned | 50936 | 0.7%
in | 21036 | 0.3%
progress | 21036 | 0.3%
started | 3233 | < 0.1%
email | 2903 | < 0.1%
sent | 2903 | < 0.1%
unassigned | 26 | < 0.1%
Other values (4) | 58 | < 0.1%
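
The word table above counts lower-cased tokens rather than whole categories. That explains why `closed` (7430468) exceeds the Closed category count (7430449): the 19 Closed - Testing rows each contribute one more `closed` token. A minimal sketch of that bookkeeping, assuming tokens come from lower-casing and splitting on whitespace (the report does not state its tokenizer, so this is an assumption):

```python
from collections import Counter

# Category counts copied from the Common Values table for Status.
status_counts = {
    "Closed": 7_430_449,
    "Pending": 69_260,
    "Open": 53_839,
    "Assigned": 50_936,
    "In Progress": 21_036,
    "Started": 3_233,
    "Email Sent": 2_903,
    "Unassigned": 26,
    "Closed - Testing": 19,
    "Draft": 13,
}

# Each row contributes one count to every whitespace-separated token
# of its lower-cased category label.
word_counts = Counter()
for status, count in status_counts.items():
    for token in status.lower().split():
        word_counts[token] += count

print(word_counts["closed"], word_counts["in"], word_counts["progress"])
```

Multi-word categories such as In Progress and Email Sent likewise appear as two separate tokens with identical counts.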

Most occurring characters

ValueCountFrequency (%)
e 7631734
16.6%
s 7574490
16.5%
d 7553930
16.4%
o 7451504
16.2%
l 7433371
16.2%
C 7430468
16.2%
n 267312
 
0.6%
g 141277
 
0.3%
i 123158
 
0.3%
P 90296
 
0.2%
Other values (17) 276581
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38294446
83.3%
Uppercase Letter 7655679
 
16.7%
Space Separator 23977
 
0.1%
Dash Punctuation 19
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 7631734
19.9%
s 7574490
19.8%
d 7553930
19.7%
o 7451504
19.5%
l 7433371
19.4%
n 267312
 
0.7%
g 141277
 
0.4%
i 123158
 
0.3%
p 53846
 
0.1%
r 45318
 
0.1%
Other values (5) 18506
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
C 7430468
97.1%
P 90296
 
1.2%
O 53839
 
0.7%
A 50936
 
0.7%
I 21036
 
0.3%
S 6136
 
0.1%
E 2903
 
< 0.1%
U 33
 
< 0.1%
T 19
 
< 0.1%
D 13
 
< 0.1%
Space Separator
ValueCountFrequency (%)
23977
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 45950125
99.9%
Common 23996
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 7631734
16.6%
s 7574490
16.5%
d 7553930
16.4%
o 7451504
16.2%
l 7433371
16.2%
C 7430468
16.2%
n 267312
 
0.6%
g 141277
 
0.3%
i 123158
 
0.3%
P 90296
 
0.2%
Other values (15) 252585
 
0.5%
Common
ValueCountFrequency (%)
23977
99.9%
- 19
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45974121
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 7631734
16.6%
s 7574490
16.5%
d 7553930
16.4%
o 7451504
16.2%
l 7433371
16.2%
C 7430468
16.2%
n 267312
 
0.6%
g 141277
 
0.3%
i 123158
 
0.3%
P 90296
 
0.2%
Other values (17) 276581
 
0.6%
Distinct81
Distinct (%)< 0.1%
Missing2238
Missing (%)< 0.1%
Memory size58.2 MiB
2024-04-22T11:44:52.661847image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/

Length

Max length801
Median length758
Mean length10.992296
Min length4

Characters and Unicode

Total characters83865533
Distinct characters67
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 01 BROOKLYN
2nd row: 03 STATEN ISLAND
3rd row: 11 QUEENS
4th row: 12 BRONX
5th row: 06 BROOKLYN
ValueCountFrequency (%)
brooklyn 2373911
15.2%
queens 1814028
 
11.6%
manhattan 1532407
 
9.8%
bronx 1384810
 
8.8%
12 668515
 
4.3%
01 631716
 
4.0%
03 604545
 
3.9%
05 586667
 
3.7%
unspecified 564638
 
3.6%
07 545379
 
3.5%
Other values (149) 4950638
31.6%

Most occurring characters

ValueCountFrequency (%)
N 9433879
 
11.2%
8027768
 
9.6%
O 6132656
 
7.3%
A 5393529
 
6.4%
0 5199786
 
6.2%
E 4026263
 
4.8%
T 3861112
 
4.6%
R 3758740
 
4.5%
B 3758728
 
4.5%
1 3481625
 
4.2%
Other values (57) 30791447
36.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 55933885
66.7%
Decimal Number 14256489
 
17.0%
Space Separator 8027768
 
9.6%
Lowercase Letter 5647040
 
6.7%
Other Punctuation 314
 
< 0.1%
Dash Punctuation 15
 
< 0.1%
Open Punctuation 11
 
< 0.1%
Close Punctuation 8
 
< 0.1%
Control 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 9433879
16.9%
O 6132656
11.0%
A 5393529
9.6%
E 4026263
 
7.2%
T 3861112
 
6.9%
R 3758740
 
6.7%
B 3758728
 
6.7%
L 2772056
 
5.0%
S 2610335
 
4.7%
U 2378689
 
4.3%
Other values (14) 11807898
21.1%
Lowercase Letter
ValueCountFrequency (%)
e 1129366
20.0%
i 1129329
20.0%
n 564690
10.0%
c 564677
10.0%
d 564670
10.0%
s 564665
10.0%
p 564659
10.0%
f 564655
10.0%
o 66
 
< 0.1%
r 49
 
< 0.1%
Other values (12) 214
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 5199786
36.5%
1 3481625
24.4%
2 1133111
 
7.9%
3 796173
 
5.6%
4 726700
 
5.1%
5 708310
 
5.0%
7 706904
 
5.0%
8 574226
 
4.0%
9 501247
 
3.5%
6 428407
 
3.0%
Other Punctuation
ValueCountFrequency (%)
, 174
55.4%
" 59
 
18.8%
/ 30
 
9.6%
: 28
 
8.9%
. 20
 
6.4%
\ 3
 
1.0%
Space Separator
ValueCountFrequency (%)
8027768
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15
100.0%
Open Punctuation
ValueCountFrequency (%)
( 11
100.0%
Close Punctuation
ValueCountFrequency (%)
) 8
100.0%
Control
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 61580925
73.4%
Common 22284608
 
26.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 9433879
15.3%
O 6132656
 
10.0%
A 5393529
 
8.8%
E 4026263
 
6.5%
T 3861112
 
6.3%
R 3758740
 
6.1%
B 3758728
 
6.1%
L 2772056
 
4.5%
S 2610335
 
4.2%
U 2378689
 
3.9%
Other values (36) 17454938
28.3%
Common
ValueCountFrequency (%)
8027768
36.0%
0 5199786
23.3%
1 3481625
15.6%
2 1133111
 
5.1%
3 796173
 
3.6%
4 726700
 
3.3%
5 708310
 
3.2%
7 706904
 
3.2%
8 574226
 
2.6%
9 501247
 
2.2%
Other values (11) 428758
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 83865533
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 9433879
 
11.2%
8027768
 
9.6%
O 6132656
 
7.3%
A 5393529
 
6.4%
0 5199786
 
6.2%
E 4026263
 
4.8%
T 3861112
 
4.6%
R 3758740
 
4.5%
B 3758728
 
4.5%
1 3481625
 
4.2%
Other values (57) 30791447
36.7%

Borough
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 2238
Missing (%): < 0.1%
Memory size: 58.2 MiB
BROOKLYN          2373911
QUEENS            1811212
MANHATTAN         1537823
BRONX             1382211
STATEN ISLAND     398135
Other values (3)  126191

Length

Max length: 13
Median length: 11
Mean length: 7.4938049
Min length: 4

Characters and Unicode

Total characters: 57173857
Distinct characters: 32
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
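
As a consistency check, the total character count and mean length reported for Borough can be reproduced from the value counts in this variable's Common Values table. The sketch below is an illustration only: the dictionary is transcribed from that table, and the assumption that missing cells are excluded from the mean is inferred from the numbers, not taken from the tool's documentation.

```python
# Value counts transcribed from the Borough "Common Values" table
# (missing cells are left out because they contribute no characters).
value_counts = {
    "BROOKLYN": 2373911,
    "QUEENS": 1811212,
    "MANHATTAN": 1537823,
    "BRONX": 1382211,
    "STATEN ISLAND": 398135,
    "Unspecified": 126188,
    "2016": 2,
    "2017": 1,
}

# Total characters = sum over values of (string length * occurrences).
total_characters = sum(len(value) * n for value, n in value_counts.items())

# Mean length is taken over non-missing rows only (an inferred assumption).
non_missing = sum(value_counts.values())
mean_length = total_characters / non_missing

print(total_characters)        # compare with "Total characters" above
print(round(mean_length, 7))   # compare with "Mean length" above
```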

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: BROOKLYN
2nd row: STATEN ISLAND
3rd row: QUEENS
4th row: BRONX
5th row: BROOKLYN

Common Values

Value          Count    Frequency (%)
BROOKLYN       2373911  31.1%
QUEENS         1811212  23.7%
MANHATTAN      1537823  20.2%
BRONX          1382211  18.1%
STATEN ISLAND  398135   5.2%
Unspecified    126188   1.7%
2016           2        < 0.1%
2017           1        < 0.1%
(Missing)      2238     < 0.1%
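
To make the relationship between the Count and Frequency (%) columns explicit, the sketch below recomputes the percentages from the Borough counts listed above. The helper name `frequency_label` is hypothetical, and the rule that shares below 0.1% are printed as "< 0.1%" is an assumption inferred from the table, not taken from the tool's source.

```python
# Counts from the Borough "Common Values" table, including missing cells.
counts = {
    "BROOKLYN": 2373911,
    "QUEENS": 1811212,
    "MANHATTAN": 1537823,
    "BRONX": 1382211,
    "STATEN ISLAND": 398135,
    "Unspecified": 126188,
    "2016": 2,
    "2017": 1,
    "(Missing)": 2238,
}

# The denominator is the full number of observations, missing rows included.
total = sum(counts.values())


def frequency_label(count, total):
    """Format a share of `total` the way the table appears to (assumed rule)."""
    pct = 100 * count / total
    if 0 < pct < 0.1:
        return "< 0.1%"
    return f"{pct:.1f}%"


for value, count in counts.items():
    print(f"{value:<14}{count:>9}  {frequency_label(count, total)}")
```

The counts sum to the number of observations given in the report overview, which suggests the percentages are taken over all rows rather than over non-missing rows only.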

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
brooklyn 2373911
29.6%
queens 1811212
22.6%
manhattan 1537823
19.2%
bronx 1382211
17.2%
staten 398135
 
5.0%
island 398135
 
5.0%
unspecified 126188
 
1.6%
2016 2
 
< 0.1%
2017 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N 9439250
16.5%
O 6130033
10.7%
A 5409739
9.5%
E 4020559
 
7.0%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.8%
S 2607482
 
4.6%
Y 2373911
 
4.2%
Other values (22) 13036677
22.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 55513830
97.1%
Lowercase Letter 1261880
 
2.2%
Space Separator 398135
 
0.7%
Decimal Number 12
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 9439250
17.0%
O 6130033
11.0%
A 5409739
9.7%
E 4020559
 
7.2%
T 3871916
 
7.0%
B 3756122
 
6.8%
R 3756122
 
6.8%
L 2772046
 
5.0%
S 2607482
 
4.7%
Y 2373911
 
4.3%
Other values (8) 11376650
20.5%
Lowercase Letter
ValueCountFrequency (%)
e 252376
20.0%
i 252376
20.0%
n 126188
10.0%
s 126188
10.0%
p 126188
10.0%
c 126188
10.0%
f 126188
10.0%
d 126188
10.0%
Decimal Number
ValueCountFrequency (%)
2 3
25.0%
0 3
25.0%
1 3
25.0%
6 2
16.7%
7 1
 
8.3%
Space Separator
ValueCountFrequency (%)
398135
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 56775710
99.3%
Common 398147
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 9439250
16.6%
O 6130033
10.8%
A 5409739
9.5%
E 4020559
 
7.1%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.9%
S 2607482
 
4.6%
Y 2373911
 
4.2%
Other values (16) 12638530
22.3%
Common
ValueCountFrequency (%)
398135
> 99.9%
2 3
 
< 0.1%
0 3
 
< 0.1%
1 3
 
< 0.1%
6 2
 
< 0.1%
7 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 57173857
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 9439250
16.5%
O 6130033
10.7%
A 5409739
9.5%
E 4020559
 
7.0%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.8%
S 2607482
 
4.6%
Y 2373911
 
4.2%
Other values (22) 13036677
22.8%
Distinct: 5
Distinct (%): < 0.1%
Missing: 3
Missing (%): < 0.1%
Memory size: 58.2 MiB
PHONE    4054186
ONLINE   1597490
UNKNOWN  1051355
MOBILE   838350
OTHER    90337

Length

Max length: 7
Median length: 5
Mean length: 5.5946957
Min length: 5

Characters and Unicode

Total characters: 42697140
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PHONE
2nd row: PHONE
3rd row: PHONE
4th row: PHONE
5th row: PHONE

Common Values

Value      Count    Frequency (%)
PHONE      4054186  53.1%
ONLINE     1597490  20.9%
UNKNOWN    1051355  13.8%
MOBILE     838350   11.0%
OTHER      90337    1.2%
(Missing)  3        < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
phone 4054186
53.1%
online 1597490
 
20.9%
unknown 1051355
 
13.8%
mobile 838350
 
11.0%
other 90337
 
1.2%

Most occurring characters

ValueCountFrequency (%)
N 10403231
24.4%
O 7631718
17.9%
E 6580363
15.4%
H 4144523
 
9.7%
P 4054186
 
9.5%
L 2435840
 
5.7%
I 2435840
 
5.7%
U 1051355
 
2.5%
K 1051355
 
2.5%
W 1051355
 
2.5%
Other values (4) 1857374
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 42697140
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 10403231
24.4%
O 7631718
17.9%
E 6580363
15.4%
H 4144523
 
9.7%
P 4054186
 
9.5%
L 2435840
 
5.7%
I 2435840
 
5.7%
U 1051355
 
2.5%
K 1051355
 
2.5%
W 1051355
 
2.5%
Other values (4) 1857374
 
4.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 42697140
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 10403231
24.4%
O 7631718
17.9%
E 6580363
15.4%
H 4144523
 
9.7%
P 4054186
 
9.5%
L 2435840
 
5.7%
I 2435840
 
5.7%
U 1051355
 
2.5%
K 1051355
 
2.5%
W 1051355
 
2.5%
Other values (4) 1857374
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42697140
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 10403231
24.4%
O 7631718
17.9%
E 6580363
15.4%
H 4144523
 
9.7%
P 4054186
 
9.5%
L 2435840
 
5.7%
I 2435840
 
5.7%
U 1051355
 
2.5%
K 1051355
 
2.5%
W 1051355
 
2.5%
Other values (4) 1857374
 
4.4%
Distinct: 3056
Distinct (%): < 0.1%
Missing: 3
Missing (%): < 0.1%
Memory size: 58.2 MiB

Length

Max length: 95
Median length: 11
Mean length: 11.09074
Min length: 6

Characters and Unicode

Total characters: 84641403
Distinct characters: 70
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 466
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
ValueCountFrequency (%)
unspecified 7572739
97.2%
park 33971
 
0.4%
17810
 
0.2%
playground 15042
 
0.2%
school 9027
 
0.1%
ps 4639
 
0.1%
center 2864
 
< 0.1%
central 2553
 
< 0.1%
beach 2406
 
< 0.1%
and 2272
 
< 0.1%
Other values (2963) 128820
 
1.7%

Most occurring characters

ValueCountFrequency (%)
e 15230538
18.0%
i 15191095
17.9%
n 7648417
9.0%
d 7613055
9.0%
s 7605818
9.0%
c 7601552
9.0%
p 7579073
9.0%
f 7577850
9.0%
U 7573384
8.9%
160559
 
0.2%
Other values (60) 860062
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 76671347
90.6%
Uppercase Letter 7772935
 
9.2%
Space Separator 160559
 
0.2%
Dash Punctuation 18132
 
< 0.1%
Decimal Number 16407
 
< 0.1%
Other Punctuation 1751
 
< 0.1%
Open Punctuation 136
 
< 0.1%
Close Punctuation 136
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 15230538
19.9%
i 15191095
19.8%
n 7648417
10.0%
d 7613055
9.9%
s 7605818
9.9%
c 7601552
9.9%
p 7579073
9.9%
f 7577850
9.9%
a 114472
 
0.1%
r 113107
 
0.1%
Other values (16) 396370
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
U 7573384
97.4%
P 62086
 
0.8%
S 26063
 
0.3%
C 16200
 
0.2%
B 11219
 
0.1%
M 9980
 
0.1%
H 9643
 
0.1%
R 9623
 
0.1%
F 6628
 
0.1%
A 6616
 
0.1%
Other values (16) 41493
 
0.5%
Decimal Number
ValueCountFrequency (%)
1 3798
23.1%
2 2067
12.6%
3 1611
9.8%
7 1548
9.4%
4 1426
 
8.7%
9 1302
 
7.9%
0 1266
 
7.7%
8 1263
 
7.7%
6 1077
 
6.6%
5 1049
 
6.4%
Other Punctuation
ValueCountFrequency (%)
' 1246
71.2%
. 394
 
22.5%
, 71
 
4.1%
: 40
 
2.3%
Space Separator
ValueCountFrequency (%)
160559
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 18132
100.0%
Open Punctuation
ValueCountFrequency (%)
( 136
100.0%
Close Punctuation
ValueCountFrequency (%)
) 136
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 84444282
99.8%
Common 197121
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 15230538
18.0%
i 15191095
18.0%
n 7648417
9.1%
d 7613055
9.0%
s 7605818
9.0%
c 7601552
9.0%
p 7579073
9.0%
f 7577850
9.0%
U 7573384
9.0%
a 114472
 
0.1%
Other values (42) 709028
 
0.8%
Common
ValueCountFrequency (%)
160559
81.5%
- 18132
 
9.2%
1 3798
 
1.9%
2 2067
 
1.0%
3 1611
 
0.8%
7 1548
 
0.8%
4 1426
 
0.7%
9 1302
 
0.7%
0 1266
 
0.6%
8 1263
 
0.6%
Other values (8) 4149
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 84641403
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 15230538
18.0%
i 15191095
17.9%
n 7648417
9.0%
d 7613055
9.0%
s 7605818
9.0%
c 7601552
9.0%
p 7579073
9.0%
f 7577850
9.0%
U 7573384
8.9%
160559
 
0.2%
Other values (60) 860062
 
1.0%

Park Borough
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 2241
Missing (%): < 0.1%
Memory size: 58.2 MiB
BROOKLYN       2373911
QUEENS         1811212
MANHATTAN      1537823
BRONX          1382211
STATEN ISLAND  398135

Length

Max length: 13
Median length: 11
Mean length: 7.4938063
Min length: 5

Characters and Unicode

Total characters: 57173845
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: STATEN ISLAND
3rd row: QUEENS
4th row: BRONX
5th row: BROOKLYN

Common Values

Value          Count    Frequency (%)
BROOKLYN       2373911  31.1%
QUEENS         1811212  23.7%
MANHATTAN      1537823  20.2%
BRONX          1382211  18.1%
STATEN ISLAND  398135   5.2%
Unspecified    126188   1.7%
(Missing)      2241     < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
brooklyn 2373911
29.6%
queens 1811212
22.6%
manhattan 1537823
19.2%
bronx 1382211
17.2%
staten 398135
 
5.0%
island 398135
 
5.0%
unspecified 126188
 
1.6%

Most occurring characters

ValueCountFrequency (%)
N 9439250
16.5%
O 6130033
10.7%
A 5409739
9.5%
E 4020559
 
7.0%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.8%
S 2607482
 
4.6%
K 2373911
 
4.2%
Other values (17) 13036665
22.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 55513830
97.1%
Lowercase Letter 1261880
 
2.2%
Space Separator 398135
 
0.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 9439250
17.0%
O 6130033
11.0%
A 5409739
9.7%
E 4020559
 
7.2%
T 3871916
 
7.0%
B 3756122
 
6.8%
R 3756122
 
6.8%
L 2772046
 
5.0%
S 2607482
 
4.7%
K 2373911
 
4.3%
Other values (8) 11376650
20.5%
Lowercase Letter
ValueCountFrequency (%)
e 252376
20.0%
i 252376
20.0%
n 126188
10.0%
s 126188
10.0%
p 126188
10.0%
c 126188
10.0%
f 126188
10.0%
d 126188
10.0%
Space Separator
ValueCountFrequency (%)
398135
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 56775710
99.3%
Common 398135
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 9439250
16.6%
O 6130033
10.8%
A 5409739
9.5%
E 4020559
 
7.1%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.9%
S 2607482
 
4.6%
K 2373911
 
4.2%
Other values (16) 12638530
22.3%
Common
ValueCountFrequency (%)
398135
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 57173845
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 9439250
16.5%
O 6130033
10.7%
A 5409739
9.5%
E 4020559
 
7.0%
T 3871916
 
6.8%
B 3756122
 
6.6%
R 3756122
 
6.6%
L 2772046
 
4.8%
S 2607482
 
4.6%
K 2373911
 
4.2%
Other values (17) 13036665
22.8%

Vehicle Type
Categorical

IMBALANCE  MISSING 

Distinct: 4
Distinct (%): 2.6%
Missing: 7631568
Missing (%): > 99.9%
Memory size: 58.2 MiB
Car Service              140
Green Taxi               10
Commuter Van             2
Ambulette / Paratransit  1
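
The Distinct (%) and Unique (%) figures for Vehicle Type only make sense relative to the non-missing rows, not the full dataset. The sketch below reproduces them under that assumption; the observation total is the one given in the report overview, the variable names are illustrative, and the denominator choice is inferred from the numbers rather than confirmed from the tool's documentation.

```python
n_observations = 7631721   # "Number of observations" from the report overview
n_missing = 7631568        # "Missing" for Vehicle Type
n_distinct = 4             # "Distinct" for Vehicle Type
n_unique = 1               # values occurring exactly once ("Unique")

# Only rows that actually carry a value enter the denominator (assumption).
n_non_missing = n_observations - n_missing

distinct_pct = 100 * n_distinct / n_non_missing
unique_pct = 100 * n_unique / n_non_missing
missing_pct = 100 * n_missing / n_observations

print(n_non_missing)
print(f"{distinct_pct:.1f}%", f"{unique_pct:.1f}%", f"{missing_pct:.3f}%")
```

The non-missing row count obtained this way equals the sum of the four category counts listed above (140 + 10 + 2 + 1).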

Length

Max length: 23
Median length: 11
Mean length: 11.026144
Min length: 10

Characters and Unicode

Total characters: 1687
Distinct characters: 24
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.7%

Sample

1st row: Car Service
2nd row: Green Taxi
3rd row: Car Service
4th row: Car Service
5th row: Car Service

Common Values

Value                    Count    Frequency (%)
Car Service              140      < 0.1%
Green Taxi               10       < 0.1%
Commuter Van             2        < 0.1%
Ambulette / Paratransit  1        < 0.1%
(Missing)                7631568  > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
car 140
45.6%
service 140
45.6%
green 10
 
3.3%
taxi 10
 
3.3%
commuter 2
 
0.7%
van 2
 
0.7%
ambulette 1
 
0.3%
1
 
0.3%
paratransit 1
 
0.3%

Most occurring characters

ValueCountFrequency (%)
e 304
18.0%
r 294
17.4%
a 155
9.2%
154
9.1%
i 151
9.0%
C 142
8.4%
S 140
8.3%
v 140
8.3%
c 140
8.3%
n 13
 
0.8%
Other values (14) 54
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1226
72.7%
Uppercase Letter 306
 
18.1%
Space Separator 154
 
9.1%
Other Punctuation 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 304
24.8%
r 294
24.0%
a 155
12.6%
i 151
12.3%
v 140
11.4%
c 140
11.4%
n 13
 
1.1%
x 10
 
0.8%
t 6
 
0.5%
m 5
 
0.4%
Other values (5) 8
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
C 142
46.4%
S 140
45.8%
T 10
 
3.3%
G 10
 
3.3%
V 2
 
0.7%
A 1
 
0.3%
P 1
 
0.3%
Space Separator
ValueCountFrequency (%)
154
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1532
90.8%
Common 155
 
9.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 304
19.8%
r 294
19.2%
a 155
10.1%
i 151
9.9%
C 142
9.3%
S 140
9.1%
v 140
9.1%
c 140
9.1%
n 13
 
0.8%
T 10
 
0.7%
Other values (12) 43
 
2.8%
Common
ValueCountFrequency (%)
154
99.4%
/ 1
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1687
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 304
18.0%
r 294
17.4%
a 155
9.2%
154
9.1%
i 151
9.0%
C 142
8.4%
S 140
8.3%
v 140
8.3%
c 140
8.3%
n 13
 
0.8%
Other values (14) 54
 
3.2%

Taxi Company Borough
Categorical

MISSING 

Distinct: 5
Distinct (%): 0.1%
Missing: 7625165
Missing (%): 99.9%
Memory size: 58.2 MiB
MANHATTAN      1928
BROOKLYN       1638
QUEENS         1506
BRONX          1194
STATEN ISLAND  290

Length

Max length: 13
Median length: 9
Mean length: 7.509457
Min length: 5

Characters and Unicode

Total characters: 49232
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BRONX
2nd row: BROOKLYN
3rd row: STATEN ISLAND
4th row: QUEENS
5th row: MANHATTAN

Common Values

Value          Count    Frequency (%)
MANHATTAN      1928     < 0.1%
BROOKLYN       1638     < 0.1%
QUEENS         1506     < 0.1%
BRONX          1194     < 0.1%
STATEN ISLAND  290      < 0.1%
(Missing)      7625165  99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
manhattan 1928
28.2%
brooklyn 1638
23.9%
queens 1506
22.0%
bronx 1194
17.4%
staten 290
 
4.2%
island 290
 
4.2%

Most occurring characters

ValueCountFrequency (%)
N 8774
17.8%
A 6364
12.9%
O 4470
9.1%
T 4436
9.0%
E 3302
 
6.7%
B 2832
 
5.8%
R 2832
 
5.8%
S 2086
 
4.2%
M 1928
 
3.9%
L 1928
 
3.9%
Other values (9) 10280
20.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 48942
99.4%
Space Separator 290
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 8774
17.9%
A 6364
13.0%
O 4470
9.1%
T 4436
9.1%
E 3302
 
6.7%
B 2832
 
5.8%
R 2832
 
5.8%
S 2086
 
4.3%
M 1928
 
3.9%
L 1928
 
3.9%
Other values (8) 9990
20.4%
Space Separator
ValueCountFrequency (%)
290
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 48942
99.4%
Common 290
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 8774
17.9%
A 6364
13.0%
O 4470
9.1%
T 4436
9.1%
E 3302
 
6.7%
B 2832
 
5.8%
R 2832
 
5.8%
S 2086
 
4.3%
M 1928
 
3.9%
L 1928
 
3.9%
Other values (8) 9990
20.4%
Common
ValueCountFrequency (%)
290
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 49232
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 8774
17.8%
A 6364
12.9%
O 4470
9.1%
T 4436
9.0%
E 3302
 
6.7%
B 2832
 
5.8%
R 2832
 
5.8%
S 2086
 
4.2%
M 1928
 
3.9%
L 1928
 
3.9%
Other values (9) 10280
20.9%

Taxi Pick Up Location
Text

MISSING 

Distinct: 2227
Distinct (%): 5.7%
Missing: 7592481
Missing (%): 99.5%
Memory size: 58.2 MiB

Length

Max length: 55
Median length: 5
Mean length: 8.825739
Min length: 5

Characters and Unicode

Total characters: 346322
Distinct characters: 58
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1955
Unique (%): 5.0%

Sample

1st row: JFK Airport
2nd row: Other
3rd row: La Guardia Airport
4th row: Other
5th row: Other
ValueCountFrequency (%)
other 28137
45.1%
airport 6679
 
10.7%
jfk 4198
 
6.7%
la 2482
 
4.0%
guardia 2482
 
4.0%
street 1909
 
3.1%
avenue 1376
 
2.2%
manhattan 1031
 
1.7%
station 1009
 
1.6%
and 973
 
1.6%
Other values (1441) 12151
19.5%

Most occurring characters

ValueCountFrequency (%)
r 46572
13.4%
t 38616
 
11.2%
e 30493
 
8.8%
O 30393
 
8.8%
h 28471
 
8.2%
24708
 
7.1%
A 15424
 
4.5%
i 11034
 
3.2%
E 10101
 
2.9%
a 9565
 
2.8%
Other values (48) 100945
29.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 193734
55.9%
Uppercase Letter 118624
34.3%
Space Separator 24708
 
7.1%
Decimal Number 8434
 
2.4%
Dash Punctuation 805
 
0.2%
Other Punctuation 17
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 30393
25.6%
A 15424
13.0%
E 10101
 
8.5%
T 8814
 
7.4%
N 7672
 
6.5%
S 5426
 
4.6%
K 4877
 
4.1%
F 4502
 
3.8%
J 4235
 
3.6%
R 3833
 
3.2%
Other values (16) 23347
19.7%
Lowercase Letter
ValueCountFrequency (%)
r 46572
24.0%
t 38616
19.9%
e 30493
15.7%
h 28471
14.7%
i 11034
 
5.7%
a 9565
 
4.9%
o 9173
 
4.7%
p 6679
 
3.4%
n 3753
 
1.9%
u 3150
 
1.6%
Other values (8) 6228
 
3.2%
Decimal Number
ValueCountFrequency (%)
1 1986
23.5%
2 1114
13.2%
3 906
10.7%
0 894
10.6%
5 743
 
8.8%
4 739
 
8.8%
7 587
 
7.0%
6 574
 
6.8%
8 472
 
5.6%
9 419
 
5.0%
Other Punctuation
ValueCountFrequency (%)
/ 14
82.4%
. 3
 
17.6%
Space Separator
ValueCountFrequency (%)
24708
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 805
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 312358
90.2%
Common 33964
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 46572
14.9%
t 38616
12.4%
e 30493
 
9.8%
O 30393
 
9.7%
h 28471
 
9.1%
A 15424
 
4.9%
i 11034
 
3.5%
E 10101
 
3.2%
a 9565
 
3.1%
o 9173
 
2.9%
Other values (34) 82516
26.4%
Common
ValueCountFrequency (%)
24708
72.7%
1 1986
 
5.8%
2 1114
 
3.3%
3 906
 
2.7%
0 894
 
2.6%
- 805
 
2.4%
5 743
 
2.2%
4 739
 
2.2%
7 587
 
1.7%
6 574
 
1.7%
Other values (4) 908
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 346322
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 46572
13.4%
t 38616
 
11.2%
e 30493
 
8.8%
O 30393
 
8.8%
h 28471
 
8.2%
24708
 
7.1%
A 15424
 
4.5%
i 11034
 
3.2%
E 10101
 
2.9%
a 9565
 
2.8%
Other values (48) 100945
29.1%

Bridge Highway Name
Text

MISSING 

Distinct: 81
Distinct (%): 0.6%
Missing: 7618111
Missing (%): 99.8%
Memory size: 58.2 MiB

Length

Max length: 42
Median length: 32
Mean length: 16.51543
Min length: 6

Characters and Unicode

Total characters: 224775
Distinct characters: 63
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: Henry Hudson Pkwy/Rt 9A
2nd row: Grand Central Pkwy
3rd row: Belt Pkwy
4th row: BQE/Gowanus Expwy
5th row: Henry Hudson Pkwy/Rt 9A
ValueCountFrequency (%)
expwy 6319
 
16.7%
pkwy 3699
 
9.8%
island 2122
 
5.6%
bqe/gowanus 1526
 
4.0%
dr 1316
 
3.5%
belt 1251
 
3.3%
cross 1221
 
3.2%
br 1151
 
3.1%
central 1042
 
2.8%
grand 1011
 
2.7%
Other values (127) 17069
45.2%

Most occurring characters

ValueCountFrequency (%)
24117
 
10.7%
n 14736
 
6.6%
r 13080
 
5.8%
w 12840
 
5.7%
y 12601
 
5.6%
a 11602
 
5.2%
e 10575
 
4.7%
o 9841
 
4.4%
s 9406
 
4.2%
E 7967
 
3.5%
Other values (53) 98010
43.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 149951
66.7%
Uppercase Letter 45203
 
20.1%
Space Separator 24117
 
10.7%
Other Punctuation 3468
 
1.5%
Decimal Number 1708
 
0.8%
Dash Punctuation 328
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 14736
 
9.8%
r 13080
 
8.7%
w 12840
 
8.6%
y 12601
 
8.4%
a 11602
 
7.7%
e 10575
 
7.1%
o 9841
 
6.6%
s 9406
 
6.3%
t 7313
 
4.9%
x 7261
 
4.8%
Other values (15) 40696
27.1%
Uppercase Letter
ValueCountFrequency (%)
E 7967
17.6%
B 5468
12.1%
P 4950
11.0%
R 3159
 
7.0%
D 2926
 
6.5%
G 2601
 
5.8%
C 2542
 
5.6%
I 2460
 
5.4%
W 1935
 
4.3%
Q 1775
 
3.9%
Other values (14) 9420
20.8%
Decimal Number
ValueCountFrequency (%)
9 816
47.8%
5 328
19.2%
1 246
 
14.4%
2 91
 
5.3%
4 76
 
4.4%
8 56
 
3.3%
3 54
 
3.2%
0 22
 
1.3%
7 19
 
1.1%
Other Punctuation
ValueCountFrequency (%)
/ 3191
92.0%
. 235
 
6.8%
, 42
 
1.2%
Space Separator
ValueCountFrequency (%)
24117
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 328
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 195154
86.8%
Common 29621
 
13.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 14736
 
7.6%
r 13080
 
6.7%
w 12840
 
6.6%
y 12601
 
6.5%
a 11602
 
5.9%
e 10575
 
5.4%
o 9841
 
5.0%
s 9406
 
4.8%
E 7967
 
4.1%
t 7313
 
3.7%
Other values (39) 85193
43.7%
Common
ValueCountFrequency (%)
24117
81.4%
/ 3191
 
10.8%
9 816
 
2.8%
- 328
 
1.1%
5 328
 
1.1%
1 246
 
0.8%
. 235
 
0.8%
2 91
 
0.3%
4 76
 
0.3%
8 56
 
0.2%
Other values (4) 137
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 224775
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
24117
 
10.7%
n 14736
 
6.6%
r 13080
 
5.8%
w 12840
 
5.7%
y 12601
 
5.6%
a 11602
 
5.2%
e 10575
 
4.7%
o 9841
 
4.4%
s 9406
 
4.2%
E 7967
 
3.5%
Other values (53) 98010
43.6%

Bridge Highway Direction
Categorical

MISSING 

Distinct: 50
Distinct (%): 0.4%
Missing: 7618125
Missing (%): 99.8%
Memory size: 58.2 MiB
North/Bronx Bound         1306
East/Long Island Bound    1182
East/Queens Bound         985
West/Staten Island Bound  709
West/Brooklyn Bound       596
Other values (45)         8818

Length

Max length: 30
Median length: 24
Mean length: 19.149382
Min length: 9

Characters and Unicode

Total characters: 260355
Distinct characters: 47
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: North/Bronx Bound
2nd row: East/Long Island Bound
3rd row: East/Queens Bound
4th row: East/Bronx Bound
5th row: South/Downtown

Common Values

Value                           Count    Frequency (%)
North/Bronx Bound               1306     < 0.1%
East/Long Island Bound          1182     < 0.1%
East/Queens Bound               985      < 0.1%
West/Staten Island Bound        709      < 0.1%
West/Brooklyn Bound             596      < 0.1%
West/Manhattan Bound            493      < 0.1%
West/Toward Triborough Br       478      < 0.1%
Northbound/Uptown               470      < 0.1%
South/Downtown                  468      < 0.1%
North/Westchester County Bound  454      < 0.1%
Other values (40)               6455     0.1%
(Missing)                       7618125  99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bound 8729
28.5%
island 2212
 
7.2%
br 1899
 
6.2%
north/bronx 1306
 
4.3%
east/long 1182
 
3.9%
triborough 1069
 
3.5%
east/queens 985
 
3.2%
west/staten 709
 
2.3%
to 613
 
2.0%
west/brooklyn 596
 
1.9%
Other values (57) 11357
37.0%

Most occurring characters

ValueCountFrequency (%)
o 32147
 
12.3%
n 25320
 
9.7%
t 20240
 
7.8%
u 18276
 
7.0%
17061
 
6.6%
d 15101
 
5.8%
B 13717
 
5.3%
r 13165
 
5.1%
s 11767
 
4.5%
/ 11178
 
4.3%
Other values (37) 82383
31.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 187774
72.1%
Uppercase Letter 43230
 
16.6%
Space Separator 17061
 
6.6%
Other Punctuation 11178
 
4.3%
Close Punctuation 556
 
0.2%
Open Punctuation 556
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 32147
17.1%
n 25320
13.5%
t 20240
10.8%
u 18276
9.7%
d 15101
8.0%
r 13165
7.0%
s 11767
 
6.3%
a 10573
 
5.6%
h 9653
 
5.1%
e 9148
 
4.9%
Other values (12) 22384
11.9%
Uppercase Letter
ValueCountFrequency (%)
B 13717
31.7%
S 4217
 
9.8%
W 3894
 
9.0%
N 3725
 
8.6%
E 3596
 
8.3%
T 3407
 
7.9%
I 2212
 
5.1%
L 1492
 
3.5%
Q 1333
 
3.1%
M 961
 
2.2%
Other values (11) 4676
 
10.8%
Space Separator
ValueCountFrequency (%)
17061
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 11178
100.0%
Close Punctuation
ValueCountFrequency (%)
) 556
100.0%
Open Punctuation
ValueCountFrequency (%)
( 556
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 231004
88.7%
Common 29351
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 32147
13.9%
n 25320
11.0%
t 20240
 
8.8%
u 18276
 
7.9%
d 15101
 
6.5%
B 13717
 
5.9%
r 13165
 
5.7%
s 11767
 
5.1%
a 10573
 
4.6%
h 9653
 
4.2%
Other values (33) 61045
26.4%
Common
ValueCountFrequency (%)
17061
58.1%
/ 11178
38.1%
) 556
 
1.9%
( 556
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 260355
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 32147
 
12.3%
n 25320
 
9.7%
t 20240
 
7.8%
u 18276
 
7.0%
17061
 
6.6%
d 15101
 
5.8%
B 13717
 
5.3%
r 13165
 
5.1%
s 11767
 
4.5%
/ 11178
 
4.3%
Other values (37) 82383
31.6%

Road Ramp
Categorical

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 7618286
Missing (%): 99.8%
Memory size: 58.2 MiB
Roadway  9716
Ramp     3719

Length

Max length: 7
Median length: 7
Mean length: 6.1695571
Min length: 4

Characters and Unicode

Total characters: 82888
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ramp
2nd row: Ramp
3rd row: Roadway
4th row: Roadway
5th row: Ramp

Common Values

Value      Count    Frequency (%)
Roadway    9716     0.1%
Ramp       3719     < 0.1%
(Missing)  7618286  99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
roadway 9716
72.3%
ramp 3719
 
27.7%

Most occurring characters

ValueCountFrequency (%)
a 23151
27.9%
R 13435
16.2%
o 9716
11.7%
d 9716
11.7%
w 9716
11.7%
y 9716
11.7%
m 3719
 
4.5%
p 3719
 
4.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 69453
83.8%
Uppercase Letter 13435
 
16.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 23151
33.3%
o 9716
14.0%
d 9716
14.0%
w 9716
14.0%
y 9716
14.0%
m 3719
 
5.4%
p 3719
 
5.4%
Uppercase Letter
ValueCountFrequency (%)
R 13435
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 82888
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 23151
27.9%
R 13435
16.2%
o 9716
11.7%
d 9716
11.7%
w 9716
11.7%
y 9716
11.7%
m 3719
 
4.5%
p 3719
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 82888
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 23151
27.9%
R 13435
16.2%
o 9716
11.7%
d 9716
11.7%
w 9716
11.7%
y 9716
11.7%
m 3719
 
4.5%
p 3719
 
4.5%
Distinct: 3879
Distinct (%): 23.8%
Missing: 7615395
Missing (%): 99.8%
Memory size: 58.2 MiB

Length

Max length100
Median length77
Mean length41.255666
Min length4

Characters and Unicode

Total characters673540
Distinct characters67
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2972
Unique (%): 18.2%

Sample

1st row: W 96 St (Exit 11)
2nd row: 1-1-1514825238
3rd row: 1-1-1514671280
4th row: Brooklyn-Queens Expwy (I-278) (Exit 4)
5th row: Flatbush Ave (Exit 11N) - Gateway National Recreation Area (Exit 12)

Value | Count | Frequency (%)
exit | 18059 | 14.4%
(blank) | 11155 | 8.9%
ave | 5916 | 4.7%
st | 5111 | 4.1%
blvd | 3342 | 2.7%
expwy | 3175 | 2.5%
pkwy | 2349 | 1.9%
east | 2171 | 1.7%
ny | 1322 | 1.1%
island | 1220 | 1.0%
Other values (3482) | 71485 | 57.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 109027 | 16.2%
t | 41602 | 6.2%
i | 28370 | 4.2%
E | 26833 | 4.0%
e | 26197 | 3.9%
) | 22880 | 3.4%
( | 22880 | 3.4%
x | 22617 | 3.4%
1 | 21606 | 3.2%
- | 20194 | 3.0%
Other values (57) | 331334 | 49.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 305280 | 45.3%
Space Separator | 109027 | 16.2%
Uppercase Letter | 98864 | 14.7%
Decimal Number | 88247 | 13.1%
Close Punctuation | 22880 | 3.4%
Open Punctuation | 22880 | 3.4%
Dash Punctuation | 20194 | 3.0%
Other Punctuation | 6168 | 0.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 41602 | 13.6%
i | 28370 | 9.3%
e | 26197 | 8.6%
x | 22617 | 7.4%
a | 19831 | 6.5%
n | 19491 | 6.4%
r | 18384 | 6.0%
o | 17139 | 5.6%
s | 14946 | 4.9%
d | 13407 | 4.4%
Other values (15) | 83296 | 27.3%

Uppercase Letter
Value | Count | Frequency (%)
E | 26833 | 27.1%
B | 9618 | 9.7%
A | 9560 | 9.7%
S | 8446 | 8.5%
W | 5507 | 5.6%
N | 4902 | 5.0%
R | 4571 | 4.6%
P | 4402 | 4.5%
I | 4006 | 4.1%
C | 3717 | 3.8%
Other values (15) | 17302 | 17.5%

Decimal Number
Value | Count | Frequency (%)
1 | 21606 | 24.5%
2 | 11678 | 13.2%
5 | 9599 | 10.9%
3 | 8422 | 9.5%
4 | 7380 | 8.4%
9 | 6484 | 7.3%
6 | 6360 | 7.2%
7 | 6245 | 7.1%
8 | 5934 | 6.7%
0 | 4539 | 5.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 4560 | 73.9%
. | 1532 | 24.8%
, | 76 | 1.2%

Space Separator
Value | Count | Frequency (%)
(space) | 109027 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 22880 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 22880 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 20194 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 404144 | 60.0%
Common | 269396 | 40.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 41602 | 10.3%
i | 28370 | 7.0%
E | 26833 | 6.6%
e | 26197 | 6.5%
x | 22617 | 5.6%
a | 19831 | 4.9%
n | 19491 | 4.8%
r | 18384 | 4.5%
o | 17139 | 4.2%
s | 14946 | 3.7%
Other values (40) | 168734 | 41.8%

Common
Value | Count | Frequency (%)
(space) | 109027 | 40.5%
) | 22880 | 8.5%
( | 22880 | 8.5%
1 | 21606 | 8.0%
- | 20194 | 7.5%
2 | 11678 | 4.3%
5 | 9599 | 3.6%
3 | 8422 | 3.1%
4 | 7380 | 2.7%
9 | 6484 | 2.4%
Other values (7) | 29246 | 10.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 673540 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 109027 | 16.2%
t | 41602 | 6.2%
i | 28370 | 4.2%
E | 26833 | 4.0%
e | 26197 | 3.9%
) | 22880 | 3.4%
( | 22880 | 3.4%
x | 22617 | 3.4%
1 | 21606 | 3.2%
- | 20194 | 3.0%
Other values (57) | 331334 | 49.2%
\ No newline at end of file