BSI PD ISO/IEC/TS 4213:2022
Information technology. Artificial Intelligence. Assessment of machine learning classification performance
Published By | Publication Date | Number of Pages |
---|---|---|
BSI | 2022 | 42 |
PDF Catalog
PDF Pages | PDF Title |
---|---|
2 | National foreword |
7 | Foreword |
8 | Introduction |
9 | 1 Scope; 2 Normative references; 3 Terms and definitions; 3.1 Classification and related terms; 3.2 Metrics and related terms |
11 | 4 Abbreviated terms |
12 | 5 General principles; 5.1 Generalized process for machine learning classification performance assessment; 5.2 Purpose of machine learning classification performance assessment |
13 | 5.3 Control criteria in machine learning classification performance assessment; 5.3.1 General; 5.3.2 Data representativeness and bias; 5.3.3 Preprocessing; 5.3.4 Training data |
14 | 5.3.5 Test and validation data; 5.3.6 Cross-validation; 5.3.7 Limiting information leakage; 5.3.8 Limiting channel effects |
15 | 5.3.9 Ground truth; 5.3.10 Machine learning algorithms, hyperparameters and parameters |
16 | 5.3.11 Evaluation environment; 5.3.12 Acceleration; 5.3.13 Appropriate baselines; 5.3.14 Machine learning classification performance context; 6 Statistical measures of performance; 6.1 General |
17 | 6.2 Base elements for metric computation; 6.2.1 General; 6.2.2 Confusion matrix; 6.2.3 Accuracy; 6.2.4 Precision, recall and specificity; 6.2.5 F1 score |
18 | 6.2.6 Fβ; 6.2.7 Kullback-Leibler divergence; 6.3 Binary classification; 6.3.1 General |
19 | 6.3.2 Confusion matrix for binary classification; 6.3.3 Accuracy for binary classification; 6.3.4 Precision, recall, specificity, F1 score and Fβ for binary classification; 6.3.5 Kullback-Leibler divergence for binary classification; 6.3.6 Receiver operating characteristic curve and area under the receiver operating characteristic curve |
20 | 6.3.7 Precision recall curve and area under the precision recall curve; 6.3.8 Cumulative response curve; 6.3.9 Lift curve; 6.4 Multi-class classification; 6.4.1 General; 6.4.2 Accuracy for multi-class classification; 6.4.3 Macro-average, weighted-average and micro-average |
22 | 6.4.4 Distribution difference or distance metrics; 6.5 Multi-label classification; 6.5.1 General; 6.5.2 Hamming loss |
23 | 6.5.3 Exact match ratio; 6.5.4 Jaccard index |
24 | 6.5.5 Distribution difference or distance metrics; 6.6 Computational complexity; 6.6.1 General; 6.6.2 Classification latency |
25 | 6.6.3 Classification throughput; 6.6.4 Classification efficiency; 6.6.5 Energy consumption |
26 | 7 Statistical tests of significance; 7.1 General |
27 | 7.2 Paired Student’s t-test; 7.3 Analysis of variance; 7.4 Kruskal-Wallis test; 7.5 Chi-squared test; 7.6 Wilcoxon signed-ranks test |
28 | 7.7 Fisher’s exact test; 7.8 Central limit theorem; 7.9 McNemar test; 7.10 Accommodating multiple comparisons; 7.10.1 General |
29 | 7.10.2 Bonferroni correction; 7.10.3 False discovery rate; 8 Reporting |
30 | Annex A (informative) Multi-class classification performance illustration |
32 | Annex B (informative) Illustration of ROC curve derived from classification results |
37 | Annex C (informative) Summary information on machine learning classification benchmark tests |
39 | Annex D (informative) Chance-corrected cause-specific mortality fraction |
40 | Bibliography |