# Figure Guidance for Dr. Pedro Jose Tomaselli (2026-01-03)

All figures reference the current ExtraTrees model trained with age, sensory absence,
median/ulnar motor metrics, and derived inter-nerve features. Unless noted otherwise
(Figure 6 uses leave-one-center-out validation), metrics come from stratified 5-fold CV
on N=408 labeled exams (227 hereditary, 181 inflammatory).
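
For reference, the stratified split can be sketched in plain Python. This is a simplified illustration, assuming stratification on the diagnosis label only: it shuffles each class separately and deals records round-robin into folds, so every fold keeps roughly the 227:181 ratio. The helper `stratified_folds` is hypothetical; scikit-learn's `StratifiedKFold` is one real implementation, and its exact assignment may differ.

```python
import random
from collections import Counter


def stratified_folds(labels, n_splits=5, seed=0):
    """Hypothetical helper: map each record index to a fold number.

    Indices are shuffled within each class and then dealt round-robin,
    so each fold receives a near-equal share of every class.
    """
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


labels = ["hereditary"] * 227 + ["inflammatory"] * 181
folds = stratified_folds(labels)
for k in range(5):
    counts = Counter(label for label, f in zip(labels, folds) if f == k)
    # prints: 0 46 37 / 1 46 36 / 2 45 36 / 3 45 36 / 4 45 36
    print(k, counts["hereditary"], counts["inflammatory"])
```

The per-fold counts do not depend on the seed; only which exams land in which fold does.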

## Figure 1 - Class Distribution (Hereditary vs Inflammatory)
1. Purpose: show the overall class balance for the multicenter cohort used in the final model.
2. Counts: 227 hereditary cases versus 181 inflammatory cases.
3. Hereditary cases make up 56% of the cohort (227/408), so the balance reduces the risk of a model that simply predicts the majority class.
4. Labels correspond to hereditary (CMT/HNPP) and inflammatory/paraneoplastic (CIDP/POEMS/MMN).
5. Reminder: only labeled records are included (N=408).
6. Suggested script: "The cohort is reasonably balanced, so the model is not starved of treatable cases."
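
The balance figures follow directly from the counts. A minimal sketch of the arithmetic, including the accuracy a trivial always-hereditary model would reach (a useful floor when reading the later accuracy figures):

```python
# Class counts stated in Figure 1.
counts = {"hereditary": 227, "inflammatory": 181}
total = sum(counts.values())  # 408 labeled exams

for label, n in counts.items():
    print(f"{label}: {n}/{total} = {n / total:.1%}")  # 55.6% and 44.4%

# A trivial model that always predicts the majority class reaches this accuracy,
# but its balanced accuracy is exactly 0.5 (recall 1.0 on one class, 0.0 on the other).
majority_accuracy = max(counts.values()) / total
print(f"majority-class accuracy: {majority_accuracy:.3f}")  # 0.556
```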

## Figure 2 - ROC Curve (Stratified CV, ExtraTrees)
1. Purpose: quantify separation performance across thresholds.
2. AUROC is 0.890: a randomly chosen inflammatory case receives a higher model score than a randomly chosen hereditary case about 89% of the time.
3. Based on 5-fold stratified cross-validation (in-distribution performance).
4. Center is not used as a feature; the model sees ENMG metrics plus age and sensory absence.
5. Suggested script: "The curve stays above the diagonal, indicating consistent signal beyond chance."
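
The 89% reading of AUROC is the pairwise-ranking (Mann-Whitney) interpretation. A minimal sketch that computes AUROC directly from that definition; `auroc_pairwise` is an illustrative helper and the scores are hypothetical toy values, not study outputs:

```python
def auroc_pairwise(scores_pos, scores_neg):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U convention).
    O(n_pos * n_neg), which is fine for a cohort of a few hundred exams.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of "inflammatory" (not study data).
inflammatory = [0.9, 0.8, 0.6, 0.4]
hereditary = [0.7, 0.3, 0.2, 0.1]
print(auroc_pairwise(inflammatory, hereditary))  # 14 of 16 pairs ranked correctly: 0.875
```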

## Figure 3 - Precision-Recall Curve (Stratified CV, ExtraTrees)
1. Purpose: show precision vs recall for the inflammatory class.
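2. Average precision (AP) is 0.846, well above the chance baseline of about 0.44 (the inflammatory prevalence, 181/408), so precision (PPV) stays high on average as recall increases.
3. Because precision is the share of flagged cases that are truly inflammatory, this plot reflects clinical review burden more directly than ROC.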
4. Suggested script: "We can tune sensitivity without overwhelming clinicians with false positives."
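
Average precision can be computed as the recall-weighted sum of precision over score thresholds, AP = sum over n of (R_n - R_(n-1)) * P_n. This matches the step-wise, non-interpolated definition documented for scikit-learn's `average_precision_score`; whether the 0.846 figure was produced that way is an assumption. The helper name and the toy scores below are hypothetical.

```python
def average_precision(scores, labels):
    """Step-wise AP: sum over thresholds of (recall gain) * precision.

    labels: 1 = inflammatory (positive class), 0 = hereditary.
    Records with tied scores enter together, as a single threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every record whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical out-of-fold scores (not study data).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(average_precision(scores, labels))
```

For these toy values AP is 0.8875, against a chance baseline equal to the toy prevalence of 0.5.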

## Figure 4 - Confusion Matrix (Stratified CV, ExtraTrees)
1. Purpose: visualize agreement with the reference diagnosis.
2. Out-of-fold counts (positive class = inflammatory): TN 193, FP 34, FN 41, TP 140.
3. Sensitivity for inflammatory cases is 140/181 (about 0.77); specificity for hereditary is 193/227 (about 0.85).
4. Balanced accuracy is 0.812, the mean of sensitivity and specificity; performance is not fully symmetric, since sensitivity trails specificity by about 8 points.
5. Suggested script: "False negatives are the most critical errors because inflammatory cases are treatable."
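
The Figure 4 rates can be rechecked from the confusion-matrix counts. A short sketch, assuming inflammatory is the positive class as in the text; PPV at this operating point is included as a derived extra:

```python
# Out-of-fold counts from Figure 4 (positive class = inflammatory).
tn, fp, fn, tp = 193, 34, 41, 140

sensitivity = tp / (tp + fn)   # recall on inflammatory: 140/181
specificity = tn / (tn + fp)   # recall on hereditary: 193/227
balanced_accuracy = (sensitivity + specificity) / 2
ppv = tp / (tp + fp)           # precision at this operating point: 140/174

# prints: sensitivity=0.773 specificity=0.850 balanced_accuracy=0.812 ppv=0.805
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"balanced_accuracy={balanced_accuracy:.3f} ppv={ppv:.3f}")
```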

## Figure 5 - Feature Importance (ExtraTrees)
1. Purpose: show which features drive predictions (center is not among the inputs).
2. Age and median/ulnar conduction metrics (NCV and distal latency) dominate importance.
3. CMAP amplitudes contribute strongly; derived ratio and difference features add smaller gains.
4. Suggested script: "The model relies on familiar ENMG metrics, augmented by age and sensory context."
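
Tree ensembles commonly report impurity-based importances (the `feature_importances_` attribute in scikit-learn), which can favor continuous features with many distinct values. Permutation importance on held-out data is a common cross-check. The sketch below is a generic, stdlib-only version of that technique, not the method behind Figure 5; the toy model, data, and helper names are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    model: any callable mapping a feature row (list) to a predicted label.
    A larger drop means the model leans more heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[5, 3], [6, 9], [7, 1], [8, 7], [1, 4], [2, 8], [3, 2], [4, 6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_model(row):
    # Hypothetical decision rule standing in for a fitted model; ignores feature 1.
    return 1 if row[0] > 4.5 else 0


print(permutation_importance(toy_model, rows, labels, n_features=2))
```

Because the toy model never reads feature 1, its importance comes out exactly 0.0, while shuffling feature 0 lowers accuracy.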

## Figure 6 - Leave-One-Center-Out Performance
1. Purpose: estimate performance on an unseen center.
2. Balanced accuracy for each held-out center (AUC in parentheses):
   - HCFMRP_USP: 0.661 (0.766)
   - UFU: 0.696 (0.839)
   - USP_SP: 0.589 (0.732)
   - legacy_hcrp: 0.688 (0.749)
3. Humanitas is excluded because it contains only one class in this dataset, so AUC is undefined when it is held out.
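4. Suggested script: "Balanced accuracy falls from 0.812 in pooled cross-validation to between 0.589 and 0.696 on unseen centers, so generalization across centers remains the main limitation; we expect additional data harmonization to improve it."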
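
A leave-one-center-out loop that skips single-class centers can be sketched as follows. The function name, center names, and labels in the example are hypothetical placeholders. In this sketch a skipped center's records still appear in the other folds' training sets; whether the reported analysis did the same is an assumption.

```python
from collections import defaultdict


def leave_one_center_out(centers, labels):
    """Yield (held_out_center, train_idx, test_idx) for each evaluable center.

    A center whose records all share one label is skipped: AUC is undefined
    without both classes, and balanced accuracy would reduce to one class's recall.
    """
    by_center = defaultdict(list)
    for idx, center in enumerate(centers):
        by_center[center].append(idx)
    for center, test_idx in sorted(by_center.items()):
        if len({labels[i] for i in test_idx}) < 2:
            continue  # single-class center: not evaluable when held out
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        yield center, train_idx, test_idx


# Hypothetical toy records (not the study's centers or counts).
centers = ["center_A", "center_A", "center_B", "center_B", "center_C", "center_C"]
labels = ["hereditary", "inflammatory", "hereditary", "inflammatory",
          "inflammatory", "inflammatory"]
for center, train_idx, test_idx in leave_one_center_out(centers, labels):
    print(center, len(train_idx), len(test_idx))  # center_C is skipped
```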