diff --git "a/src/assets/ml/eda-report.html" "b/src/assets/ml/eda-report.html"
new file mode 100644
--- /dev/null
+++ "b/src/assets/ml/eda-report.html"
@@ -0,0 +1,14035 @@
+Dataset

Overview

Dataset statistics

Number of variables: 5
Number of observations: 150
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.7%
Total size in memory: 6.0 KiB
Average record size in memory: 40.9 B

Variable types

Numeric: 4
Categorical: 1

Alerts

Dataset has 1 (0.7%) duplicate row (Duplicates)
sepal length (cm) is highly overall correlated with petal length (cm) and 2 other fields (High correlation)
petal length (cm) is highly overall correlated with sepal length (cm) and 2 other fields (High correlation)
petal width (cm) is highly overall correlated with sepal length (cm) and 2 other fields (High correlation)
target is highly overall correlated with sepal length (cm) and 2 other fields (High correlation)
target is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-06-16 12:39:08.352180
Analysis finished: 2023-06-16 12:39:09.921831
Duration: 1.57 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
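
The report does not record the command that produced it. As a minimal sketch, a comparable report could be generated as follows, assuming the data is scikit-learn's iris frame (the column names and the 150 x 5 shape match, but the source is not stated here) and that `ProfileReport` from ydata-profiling 4.x was used with default settings. The function name `build_report`, the title, and the output path are hypothetical.

```python
from pathlib import Path


def build_report(output_path: str = "eda-report.html") -> Path:
    """Profile the iris data and write an HTML report (sketch under assumptions)."""
    # Imports are local so this module loads even without the two packages.
    from sklearn.datasets import load_iris      # assumption: the data source
    from ydata_profiling import ProfileReport   # ydata-profiling 4.x

    # Assumption: four measurement columns plus an integer `target` column.
    df = load_iris(as_frame=True).frame

    # The title is a guess based on the report's "Dataset" heading.
    profile = ProfileReport(df, title="Dataset")
    profile.to_file(output_path)
    return Path(output_path)
```

Calling `build_report()` needs both packages installed. Timestamps, duration, and memory figures will differ from the ones listed above.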

Variables

sepal length (cm)
Real number (ℝ)

Distinct: 35
Distinct (%): 23.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.8433333
Minimum: 4.3
Maximum: 7.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB
[Figure: summary plot, SVG rendered 2023-06-16T12:39:09.956337 with Matplotlib v3.6.3, https://matplotlib.org/]

Quantile statistics

Minimum: 4.3
5-th percentile: 4.6
Q1: 5.1
Median: 5.8
Q3: 6.4
95-th percentile: 7.255
Maximum: 7.9
Range: 3.6
Interquartile range (IQR): 1.3

Descriptive statistics

Standard deviation: 0.82806613
Coefficient of variation (CV): 0.14171126
Kurtosis: -0.55206404
Mean: 5.8433333
Median Absolute Deviation (MAD): 0.7
Skewness: 0.31491096
Sum: 876.5
Variance: 0.68569351
Monotonicity: Not monotonic
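
Several of these figures follow from one another. The sketch below re-derives mean, range, IQR, CV, and variance from the other reported values for sepal length (cm). It is a consistency check on the numbers printed above, not the profiler's own computation, and the variable names are illustrative.

```python
import math

# Values copied from the sepal length (cm) statistics in this report.
n = 150                       # number of observations
total = 876.5                 # Sum
minimum, maximum = 4.3, 7.9
q1, q3 = 5.1, 6.4
std = 0.82806613              # Standard deviation

mean = total / n                  # Mean = Sum / observations
value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

# Compare with the reported values, allowing for rounding in the report.
assert math.isclose(mean, 5.8433333, abs_tol=1e-6)
assert math.isclose(value_range, 3.6, abs_tol=1e-9)
assert math.isclose(iqr, 1.3, abs_tol=1e-9)
assert math.isclose(cv, 0.14171126, abs_tol=1e-6)
assert math.isclose(variance, 0.68569351, abs_tol=1e-6)
```

The same relations can be checked for the other three numeric variables by swapping in their reported values.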
[Figure: histogram with fixed size bins (bins=35); Matplotlib v3.6.3]
Value              Count  Frequency (%)
5                  10     6.7%
5.1                9      6.0%
6.3                9      6.0%
5.7                8      5.3%
6.7                8      5.3%
5.8                7      4.7%
5.5                7      4.7%
6.4                7      4.7%
4.9                6      4.0%
5.4                6      4.0%
Other values (25)  73     48.7%

Value  Count  Frequency (%)
4.3    1      0.7%
4.4    3      2.0%
4.5    1      0.7%
4.6    4      2.7%
4.7    2      1.3%
4.8    5      3.3%
4.9    6      4.0%
5      10     6.7%
5.1    9      6.0%
5.2    4      2.7%

Value  Count  Frequency (%)
7.9    1      0.7%
7.7    4      2.7%
7.6    1      0.7%
7.4    1      0.7%
7.3    1      0.7%
7.2    3      2.0%
7.1    1      0.7%
7      1      0.7%
6.9    4      2.7%
6.8    3      2.0%

sepal width (cm)
Real number (ℝ)

Distinct: 23
Distinct (%): 15.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.0573333
Minimum: 2
Maximum: 4.4
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB
[Figure: summary plot; Matplotlib v3.6.3]

Quantile statistics

Minimum: 2
5-th percentile: 2.345
Q1: 2.8
Median: 3
Q3: 3.3
95-th percentile: 3.8
Maximum: 4.4
Range: 2.4
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.43586628
Coefficient of variation (CV): 0.1425642
Kurtosis: 0.22824904
Mean: 3.0573333
Median Absolute Deviation (MAD): 0.3
Skewness: 0.31896566
Sum: 458.6
Variance: 0.18997942
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=23); Matplotlib v3.6.3]
Value              Count  Frequency (%)
3                  26     17.3%
2.8                14     9.3%
3.2                13     8.7%
3.4                12     8.0%
3.1                11     7.3%
2.9                10     6.7%
2.7                9      6.0%
2.5                8      5.3%
3.5                6      4.0%
3.3                6      4.0%
Other values (13)  35     23.3%

Value  Count  Frequency (%)
2      1      0.7%
2.2    3      2.0%
2.3    4      2.7%
2.4    3      2.0%
2.5    8      5.3%
2.6    5      3.3%
2.7    9      6.0%
2.8    14     9.3%
2.9    10     6.7%
3      26     17.3%

Value  Count  Frequency (%)
4.4    1      0.7%
4.2    1      0.7%
4.1    1      0.7%
4      1      0.7%
3.9    2      1.3%
3.8    6      4.0%
3.7    3      2.0%
3.6    4      2.7%
3.5    6      4.0%
3.4    12     8.0%

petal length (cm)
Real number (ℝ)

Distinct: 43
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.758
Minimum: 1
Maximum: 6.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB
[Figure: summary plot; Matplotlib v3.6.3]

Quantile statistics

Minimum: 1
5-th percentile: 1.3
Q1: 1.6
Median: 4.35
Q3: 5.1
95-th percentile: 6.1
Maximum: 6.9
Range: 5.9
Interquartile range (IQR): 3.5

Descriptive statistics

Standard deviation: 1.7652982
Coefficient of variation (CV): 0.46974407
Kurtosis: -1.4021034
Mean: 3.758
Median Absolute Deviation (MAD): 1.25
Skewness: -0.27488418
Sum: 563.7
Variance: 3.1162779
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=43); Matplotlib v3.6.3]
Value              Count  Frequency (%)
1.4                13     8.7%
1.5                13     8.7%
5.1                8      5.3%
4.5                8      5.3%
1.6                7      4.7%
1.3                7      4.7%
5.6                6      4.0%
4.7                5      3.3%
4.9                5      3.3%
4                  5      3.3%
Other values (33)  73     48.7%

Value  Count  Frequency (%)
1      1      0.7%
1.1    1      0.7%
1.2    2      1.3%
1.3    7      4.7%
1.4    13     8.7%
1.5    13     8.7%
1.6    7      4.7%
1.7    4      2.7%
1.9    2      1.3%
3      1      0.7%

Value  Count  Frequency (%)
6.9    1      0.7%
6.7    2      1.3%
6.6    1      0.7%
6.4    1      0.7%
6.3    1      0.7%
6.1    3      2.0%
6      2      1.3%
5.9    2      1.3%
5.8    3      2.0%
5.7    3      2.0%

petal width (cm)
Real number (ℝ)

Distinct: 22
Distinct (%): 14.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1993333
Minimum: 0.1
Maximum: 2.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB
[Figure: summary plot; Matplotlib v3.6.3]

Quantile statistics

Minimum: 0.1
5-th percentile: 0.2
Q1: 0.3
Median: 1.3
Q3: 1.8
95-th percentile: 2.3
Maximum: 2.5
Range: 2.4
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 0.76223767
Coefficient of variation (CV): 0.63555114
Kurtosis: -1.340604
Mean: 1.1993333
Median Absolute Deviation (MAD): 0.7
Skewness: -0.10296675
Sum: 179.9
Variance: 0.58100626
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=22); Matplotlib v3.6.3]
Value              Count  Frequency (%)
0.2                29     19.3%
1.3                13     8.7%
1.8                12     8.0%
1.5                12     8.0%
1.4                8      5.3%
2.3                8      5.3%
1                  7      4.7%
0.4                7      4.7%
0.3                7      4.7%
2.1                6      4.0%
Other values (12)  41     27.3%

Value  Count  Frequency (%)
0.1    5      3.3%
0.2    29     19.3%
0.3    7      4.7%
0.4    7      4.7%
0.5    1      0.7%
0.6    1      0.7%
1      7      4.7%
1.1    3      2.0%
1.2    5      3.3%
1.3    13     8.7%

Value  Count  Frequency (%)
2.5    3      2.0%
2.4    3      2.0%
2.3    8      5.3%
2.2    3      2.0%
2.1    6      4.0%
2      6      4.0%
1.9    5      3.3%
1.8    12     8.0%
1.7    2      1.3%
1.6    4      2.7%

target
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 3
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

0: 50
1: 50
2: 50

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Length

[Figure: histogram of lengths of the category; Matplotlib v3.6.3]

Common Values (Plot)

[Figure: plot of common values; Matplotlib v3.6.3]
Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Most occurring characters

Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  150    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  150    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  150    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      50     33.3%
1      50     33.3%
2      50     33.3%

Interactions

[Figures: 16 pairwise interaction plots of the four numeric variables; Matplotlib v3.6.3]

Correlations

[Figure: correlation heatmap; Matplotlib v3.6.3]
variable           sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
sepal length (cm)  1.000              -0.167            0.882              0.834             0.617
sepal width (cm)   -0.167             1.000             -0.310             -0.289            0.446
petal length (cm)  0.882              -0.310            1.000              0.938             0.890
petal width (cm)   0.834              -0.289            0.938              1.000             0.924
target             0.617              0.446             0.890              0.924             1.000
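
The report does not say which coefficient fills this matrix. As a sketch, assuming the numeric-to-numeric entries are Spearman rank correlations (an assumption, not something stated in the report), the measure can be computed as Pearson's coefficient on tie-averaged ranks. The helper names below are illustrative, and the five input rows are taken from the head of the sample table purely as a toy input, so the result is not expected to match the full-data value of -0.167.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy input: sepal length and sepal width of sample rows 0 to 4.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6]
rho = spearman(sepal_length, sepal_width)
```

Entries involving the categorical target column may rest on a different association measure, so this sketch applies only to pairs of numeric columns.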

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

row  sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0    5.1                3.5               1.4                0.2               0
1    4.9                3.0               1.4                0.2               0
2    4.7                3.2               1.3                0.2               0
3    4.6                3.1               1.5                0.2               0
4    5.0                3.6               1.4                0.2               0
5    5.4                3.9               1.7                0.4               0
6    4.6                3.4               1.4                0.3               0
7    5.0                3.4               1.5                0.2               0
8    4.4                2.9               1.4                0.2               0
9    4.9                3.1               1.5                0.1               0

row  sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
140  6.7                3.1               5.6                2.4               2
141  6.9                3.1               5.1                2.3               2
142  5.8                2.7               5.1                1.9               2
143  6.8                3.2               5.9                2.3               2
144  6.7                3.3               5.7                2.5               2
145  6.7                3.0               5.2                2.3               2
146  6.3                2.5               5.0                1.9               2
147  6.5                3.0               5.2                2.0               2
148  6.2                3.4               5.4                2.3               2
149  5.9                3.0               5.1                1.8               2

Duplicate rows

Most frequently occurring

row  sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target  # duplicates
0    5.8                2.7               5.1                1.9               2       2
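
A minimal sketch of how such a duplicate listing can be derived, using only the standard library. The input is a small illustrative list, not the full dataset: the repeated row is the one shown in the table above, and the other two rows come from the sample table. The variable names are hypothetical.

```python
from collections import Counter

# Rows as (sepal length, sepal width, petal length, petal width, target).
rows = [
    (5.8, 2.7, 5.1, 1.9, 2),
    (6.7, 3.1, 5.6, 2.4, 2),
    (5.8, 2.7, 5.1, 1.9, 2),
    (5.9, 3.0, 5.1, 1.8, 2),
]

counts = Counter(rows)

# Rows seen more than once, with their occurrence count ("# duplicates").
duplicates = {row: n for row, n in counts.items() if n > 1}

# Surplus copies, i.e. rows that de-duplication would drop.
surplus = sum(n - 1 for n in duplicates.values())

# Share reported in the overview: 1 duplicate row out of 150 observations.
share_of_report = round(100 * 1 / 150, 1)
```

Under this reading, the "# duplicates" column counts occurrences of the row (2), while the overview counts the single surplus copy, which gives the 0.7% shown there.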
\ No newline at end of file