# ML comparison report (2026-01-03)

Scope: compare all ML runs to date, keeping POEMS+MMN unless noted, and document leakage checks.

All runs use 5-fold stratified CV, winsorization (1-99%), median imputation, and center-balanced sample weights. Center is never used as a feature.

## Winner (performance + leakage checks)

Winner by balanced accuracy: age + sensory + derived | best model ExtraTrees | bal_acc 0.812 | AUROC 0.890 | dir `output/ml_benchmark_2026-01-03_age_sensory_with_derived`

Leakage signals for this family of runs: permutation baseline ~0.52-0.55; center-only baseline ~0.778; group-CV by center remains low (~0.48-0.59 bal_acc).

Note: age shows high single-feature AUC (~0.81) and has imbalanced missingness by label; treat as a strong but potentially confounded signal.

Leakage proxy: group-CV HGB balanced accuracy is highest for age+sensory+derived (0.590) and age_only_with_derived (0.590); no run shows a materially lower center-only baseline.

## Calculator Deployment (current best model)

- Model: ExtraTrees trained on age + sensory + derived features.
- Training dataset: `output/ml_benchmark_2026-01-03_age_sensory_with_derived/ml_dataset_hereditary_vs_inflammatory.csv`
- Trained model artifact: `output/final_model_2026-01-03/extra_trees_model.joblib`
- Calculator model path: `neuropathy_calculator/model/extra_trees_model.joblib`
- Calculator features: age, sex, sensory_absent, median/ulnar latency/NCV/CMAP + derived ratio/diff.
- Training script: `analysis/train_calculator_model.py`

## Training label counts by center (winner dataset)

Saved to: `output/ml_benchmark_2026-01-03_age_sensory_with_derived/label_counts_by_center.csv`

| center | hereditary | inflammatory | total |
| --- | --- | --- | --- |
| HCFMRP_USP | 172 | 25 | 197 |
| Humanitas_Milano | 0 | 97 | 97 |
| UFU | 12 | 14 | 26 |
| USP_SP | 4 | 14 | 18 |
| legacy_hcrp | 39 | 31 | 70 |

## Run log (all attempts)

Columns: run, records, excluded(no_label/poems/mmn), features, best_model, bal_acc, AUROC, notes, dir

| run | records | excl(no_label/poems/mmn) | features | best_model | bal_acc | AUROC | notes | dir |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline_388_base7 | 388 | 20/0/0 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | RandomForest | 0.763 | 0.849 | 20 excluded (no label) | output/ml_benchmark_2026-01-02_022102 |
| derived_no_poems | 395 | 0/13/0 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, median_ulnar_distal_latency_diff_gt_10ms | ExtraTrees | 0.751 | 0.845 | 13 POEMS excluded, 20 recovered by group | output/ml_benchmark_2026-01-03_130633 |
| derived_mmn_included | 408 | 0/0/0 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, diff_ncv>10 | ExtraTrees | 0.762 | 0.845 | POEMS+MMN included, 20 recovered by group | output/ml_benchmark_2026-01-03_mmn_included |
| derived_mmn_excluded | 388 | 0/0/20 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, diff_ncv>10 | SVC_RBF | 0.706 | 0.781 | MMN excluded, POEMS included | output/ml_benchmark_2026-01-03_mmn_excluded |
| base7_408 | 408 | 0/0/0 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | ExtraTrees | 0.744 | 0.835 | POEMS+MMN included, 20 recovered by group | output/ml_benchmark_2026-01-03_base7_408 |
| base7_no_poems | 395 | 0/13/0 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | ExtraTrees | 0.764 | 0.848 | POEMS excluded | output/ml_benchmark_2026-01-03_base7_no_poems |
| base7_no_mmn | 388 | 0/0/20 | sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | SVC_RBF | 0.711 | 0.782 | MMN excluded | output/ml_benchmark_2026-01-03_base7_no_mmn |
| age_only_with_derived | 408 | 0/0/0 | age, sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, diff_ncv>10 | RandomForest | 0.808 | 0.887 | POEMS+MMN included | output/ml_benchmark_2026-01-03_age_only_with_derived |
| age_only_no_derived | 408 | 0/0/0 | age, sex, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | HistGradientBoosting | 0.807 | 0.877 | POEMS+MMN included | output/ml_benchmark_2026-01-03_age_only_no_derived |
| sensory_only_with_derived | 408 | 0/0/0 | sex, sensory_absent, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, diff_ncv>10 | ExtraTrees | 0.761 | 0.835 | POEMS+MMN included | output/ml_benchmark_2026-01-03_sensory_only_with_derived |
| sensory_only_no_derived | 408 | 0/0/0 | sex, sensory_absent, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | RandomForest | 0.748 | 0.828 | POEMS+MMN included | output/ml_benchmark_2026-01-03_sensory_only_no_derived |
| age_sensory_with_derived | 408 | 0/0/0 | age, sex, sensory_absent, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp, ratio_amp, diff_ncv>10 | ExtraTrees | 0.812 | 0.890 | POEMS+MMN included | output/ml_benchmark_2026-01-03_age_sensory_with_derived |
| age_sensory_no_derived | 408 | 0/0/0 | age, sex, sensory_absent, med_lat, med_ncv, med_amp, uln_lat, uln_ncv, uln_amp | RandomForest | 0.801 | 0.892 | POEMS+MMN included | output/ml_benchmark_2026-01-03_age_sensory_no_derived |

## Leakage audit summary (POEMS+MMN kept runs)

Columns: run, perm_bal, center_only_bal, groupCV RF (bal/auc), HGB (bal/auc), XGB (bal/auc)

| run | perm_bal | center_only_bal | groupCV_RF | groupCV_HGB | groupCV_XGB |
| --- | --- | --- | --- | --- | --- |
| age_only_with_derived | 0.541 | 0.778 | 0.509/0.605 | 0.590/0.651 | 0.569/0.642 |
| age_only_no_derived | 0.537 | 0.778 | 0.535/0.607 | 0.580/0.621 | 0.573/0.628 |
| sensory_only_with_derived | 0.529 | 0.778 | 0.456/0.493 | 0.485/0.511 | 0.515/0.528 |
| sensory_only_no_derived | 0.539 | 0.778 | 0.481/0.494 | 0.496/0.513 | 0.518/0.530 |
| age_sensory_with_derived | 0.525 | 0.778 | 0.518/0.593 | 0.590/0.646 | 0.559/0.629 |
| age_sensory_no_derived | 0.546 | 0.778 | 0.523/0.591 | 0.572/0.619 | 0.572/0.625 |

## Age confound checks (dataset: output/ml_benchmark_2026-01-03_age_sensory_no_derived)

Outputs: output/age_checks_2026-01-03/

- KS test (label 0 vs 1): stat 0.490, p=2.76e-16 (age distributions differ overall).
- Significant within-center KS (p<0.05): HCFMRP_USP (stat 0.487), legacy_hcrp (stat 0.651).
- Age missingness by label: label 0 = 0.64%, label 1 = 15.46% (imbalanced).
- Age-only AUC: 0.812.
- Stratified CV (balanced accuracy / AUROC):
  - age_only: RF 0.741 / 0.835; HGB 0.737 / 0.823
  - base7: RF 0.734 / 0.832; HGB 0.695 / 0.791
  - base7+age: RF 0.794 / 0.889; HGB 0.807 / 0.877
- Group-CV by center (balanced accuracy / AUROC):
  - with_age: RF 0.492 / 0.562; HGB 0.601 / 0.637
  - without_age: RF 0.460 / 0.486; HGB 0.554 / 0.567

## Hyperparameter search (base7, POEMS/MMN exclusion scenarios)

Best per model (balanced accuracy):

| run | model | bal_acc | AUROC | params | dir |
| --- | --- | --- | --- | --- | --- |
| base7_no_poems | ExtraTrees | 0.778 | 0.848 | {"max_depth": 12, "max_features": 0.7, "min_samples_leaf": 1, "n_estimators": 600} | output/ml_hyperparam_2026-01-03_base7_no_poems |
| base7_no_poems | HistGradientBoosting | 0.705 | 0.802 | {"learning_rate": 0.1, "max_depth": null, "max_leaf_nodes": 15, "min_samples_leaf": 20} | output/ml_hyperparam_2026-01-03_base7_no_poems |
| base7_no_poems | RandomForest | 0.745 | 0.835 | {"max_depth": 12, "max_features": "sqrt", "min_samples_leaf": 2, "n_estimators": 600} | output/ml_hyperparam_2026-01-03_base7_no_poems |
| base7_no_poems | XGBoost | 0.730 | 0.804 | {"colsample_bytree": 0.8, "learning_rate": 0.1, "max_depth": 5, "min_child_weight": 1, "n_estimators": 400, "subsample": 0.8} | output/ml_hyperparam_2026-01-03_base7_no_poems |
| base7_no_mmn | ExtraTrees | 0.724 | 0.797 | {"max_depth": 12, "max_features": "sqrt", "min_samples_leaf": 1, "n_estimators": 300} | output/ml_hyperparam_2026-01-03_base7_no_mmn |
| base7_no_mmn | HistGradientBoosting | 0.684 | 0.749 | {"learning_rate": 0.03, "max_depth": null, "max_leaf_nodes": 15, "min_samples_leaf": 5} | output/ml_hyperparam_2026-01-03_base7_no_mmn |
| base7_no_mmn | RandomForest | 0.713 | 0.794 | {"max_depth": null, "max_features": 0.7, "min_samples_leaf": 2, "n_estimators": 300} | output/ml_hyperparam_2026-01-03_base7_no_mmn |
| base7_no_mmn | XGBoost | 0.704 | 0.763 | {"colsample_bytree": 0.8, "learning_rate": 0.03, "max_depth": 5, "min_child_weight": 1, "n_estimators": 400, "subsample": 0.8} | output/ml_hyperparam_2026-01-03_base7_no_mmn |

## Notes

- Derived features (ratio_amp, diff_ncv>10) did not show strong single-feature AUC (ratio_amp ~0.48; diff_ncv>10 ~0.60).
- Age is a strong single feature (AUC ~0.81) and shows imbalanced missingness by label; interpret center generalization with care.
- Center-only baseline remains high (~0.778) regardless of features, reflecting center imbalance in labels.
- One earlier run used a latency-difference derived feature (median_ulnar_distal_latency_diff_gt_10ms) before switching to NCV diff.

## Artifact Inventory (paths)

- Benchmarks:
  - `output/ml_benchmark_2026-01-02_022102/`
  - `output/ml_benchmark_2026-01-03_mmn_included/`
  - `output/ml_benchmark_2026-01-03_mmn_excluded/`
  - `output/ml_benchmark_2026-01-03_base7_408/`
  - `output/ml_benchmark_2026-01-03_base7_no_poems/`
  - `output/ml_benchmark_2026-01-03_base7_no_mmn/`
  - `output/ml_benchmark_2026-01-03_age_only_with_derived/`
  - `output/ml_benchmark_2026-01-03_age_only_no_derived/`
  - `output/ml_benchmark_2026-01-03_sensory_only_with_derived/`
  - `output/ml_benchmark_2026-01-03_sensory_only_no_derived/`
  - `output/ml_benchmark_2026-01-03_age_sensory_with_derived/`
  - `output/ml_benchmark_2026-01-03_age_sensory_no_derived/`
- Leakage audits: `output/ml_leakage_audit_2026-01-03_*` (see table above), plus `output/ml_leakage_audit_2026-01-03_mmn_excluded/`.
- Hyperparameter search: `output/ml_hyperparam_2026-01-03_base7_no_poems/`, `output/ml_hyperparam_2026-01-03_base7_no_mmn/`.
- Age confound checks: `output/age_checks_2026-01-03/` (KS tests, missingness, CVs, plots).
- Final model: `output/final_model_2026-01-03/` and `neuropathy_calculator/model/extra_trees_model.joblib`.
- Figure index: `knowledge_base/presentations/FIGURE_INDEX_2026-01-03.md`.
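
The Notes attribute the high center-only baseline to center imbalance in labels. A rough way to see that imbalance is to take the counts from the "Training label counts by center" table and score a rule that always predicts each center's majority label. The sketch below is an in-sample illustration only: the helper name `majority_by_center_balanced_accuracy` is hypothetical, and it is not expected to reproduce the ~0.778 figure, whose exact procedure is not described in this report.

```python
# Label counts copied from the "Training label counts by center" table.
COUNTS = {
    "HCFMRP_USP": {"hereditary": 172, "inflammatory": 25},
    "Humanitas_Milano": {"hereditary": 0, "inflammatory": 97},
    "UFU": {"hereditary": 12, "inflammatory": 14},
    "USP_SP": {"hereditary": 4, "inflammatory": 14},
    "legacy_hcrp": {"hereditary": 39, "inflammatory": 31},
}
LABELS = ("hereditary", "inflammatory")


def majority_by_center_balanced_accuracy(counts):
    """In-sample balanced accuracy of predicting each center's majority label.

    Hypothetical helper for illustration; no cross-validation, no sample weights.
    """
    # Total number of cases per label across all centers.
    totals = {label: sum(c[label] for c in counts.values()) for label in LABELS}
    # Cases the majority-by-center rule classifies correctly, per label.
    correct = {label: 0 for label in LABELS}
    for center_counts in counts.values():
        majority = max(LABELS, key=lambda label: center_counts[label])
        correct[majority] += center_counts[majority]
    # Balanced accuracy is the mean of the per-label recalls.
    recalls = {label: correct[label] / totals[label] for label in LABELS}
    return sum(recalls.values()) / len(recalls), recalls, totals


bal_acc, recalls, totals = majority_by_center_balanced_accuracy(COUNTS)
print("totals:", totals)
print("recalls:", {k: round(v, 3) for k, v in recalls.items()})
print("balanced accuracy:", round(bal_acc, 3))
```

On these counts the rule lands at roughly 0.81 balanced accuracy without seeing any clinical feature, which is consistent with the report's caution about reading stratified-CV scores as evidence of cross-center generalization.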
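
The leakage audits in this report rely on two checks: a label-permutation baseline and group-CV by center. The following is a minimal sketch of both, assuming scikit-learn and purely synthetic data. The function `cv_balanced_accuracy`, the model settings, and the data generator are all hypothetical and are not taken from the audit scripts; winsorization and center-balanced sample weights are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GroupKFold, StratifiedKFold
from sklearn.pipeline import make_pipeline


def cv_balanced_accuracy(X, y, splitter, groups=None, seed=0):
    """Mean balanced accuracy over the folds produced by `splitter`."""
    scores = []
    for train_idx, test_idx in splitter.split(X, y, groups):
        # Median imputation is fitted on the training fold only.
        model = make_pipeline(
            SimpleImputer(strategy="median"),
            ExtraTreesClassifier(n_estimators=100, random_state=seed),
        )
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(balanced_accuracy_score(y[test_idx], pred))
    return float(np.mean(scores))


# Synthetic stand-in data (hypothetical; NOT the study dataset).
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)                    # binary label
X = rng.normal(size=(n, 3)) + 1.5 * y[:, None]    # features shifted by label
X[rng.random(X.shape) < 0.05] = np.nan            # sprinkle missing values
centers = rng.integers(0, 5, size=n)              # 5 centers, unrelated to label

stratified = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# 1) Ordinary stratified CV on the real labels.
real_score = cv_balanced_accuracy(X, y, stratified)

# 2) Permutation baseline: shuffled labels should score near chance (0.5).
perm_score = cv_balanced_accuracy(X, rng.permutation(y), stratified)

# 3) Group-CV by center: each fold holds out one entire center.
group_score = cv_balanced_accuracy(X, y, GroupKFold(n_splits=5), groups=centers)

print(f"stratified={real_score:.3f} permuted={perm_score:.3f} group={group_score:.3f}")
```

In this synthetic setup the center is unrelated to the label, so the group-CV score should stay close to the stratified score. A large gap between the two, as in the leakage audit summary table, suggests that part of the stratified-CV performance depends on center-specific structure.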