# Session Summary and PNS 2026 Abstract Draft (2026-01-03)

## What we did (end-to-end)
- Confirmed the final goal: fast day-to-day prediction of hereditary vs acquired demyelinating neuropathy without using center as a model feature.
- Audited missingness drivers and updated the feature set to include age and sensory absence, because both improved performance.
- Built and validated a reproducible ML pipeline with leakage checks and group-based generalization tests.
- Trained the best-performing ExtraTrees model and updated the public calculator.
- Deployed the public calculator behind the existing Traefik stack without touching other sites.
- Verified HTTPS certificate issuance and confirmed TLS works for the new host.

## What we learned (key decisions and rationale)
- Center is a strong confound: the diagnosis can be predicted with high accuracy from center alone, so center must never be used as a feature.
- Age-at-exam improves performance but shows class/center imbalance; it is kept with explicit caveats.
- Sensory absence improves performance; it is kept while monitoring center effects.
- Side-specific missingness is expected because CMT records the non-dominant side; using any-side
  (left if present, else right) preserves coverage without leakage.
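
The any-side rule can be sketched as a simple coalesce over the two sides. This is a minimal illustration, not the pipeline's actual code: the function name `any_side`, the record keys, and the use of `None` for a missing value are all assumptions.

```python
from typing import Optional


def any_side(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Return the left-side value if it was recorded, else the right-side value."""
    # Compare against None explicitly so that a recorded value of 0.0 is kept.
    return left if left is not None else right


# Hypothetical record layout: one value per nerve parameter and side.
record = {
    "median_cmap_left": None,   # left side not studied
    "median_cmap_right": 6.2,
    "ulnar_cmap_left": 7.9,
    "ulnar_cmap_right": 8.4,
}

median_cmap = any_side(record["median_cmap_left"], record["median_cmap_right"])
ulnar_cmap = any_side(record["ulnar_cmap_left"], record["ulnar_cmap_right"])
print(median_cmap, ulnar_cmap)
```

The rule uses only the measurement itself, never the center or the label, which is why it preserves coverage without introducing leakage.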

## Final modeling choices (locked)
- Task: binary classification (hereditary vs inflammatory/paraneoplastic demyelinating neuropathy).
- Input dataset: `output/final_dataset/final_dataset_labeled.jsonl` (408 labeled records used).
- Features (11): age, sex, and sensory absence; distal latency, motor conduction velocity, and CMAP
  amplitude for the median and ulnar nerves; plus two derived features: median/ulnar CMAP ratio and
  NCV difference > 10 m/s (any-side).
- Pre-processing: winsorization (1st-99th percentiles), median imputation, center-balanced sample
  weights for training only (center not used as a feature).
- Model: ExtraTrees, selected for best overall performance.
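
The pre-processing steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the pipeline's implementation: the function names are hypothetical, missing values are assumed to be `None`, and "center-balanced" is assumed to mean that every center contributes the same total weight.

```python
from collections import Counter
from statistics import median, quantiles
from typing import List, Optional, Sequence


def winsorize_and_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Clip one feature column to its 1st-99th percentiles, then fill gaps with the median."""
    observed = [v for v in values if v is not None]
    # quantiles(n=100) returns 99 cut points: the first is the 1st
    # percentile and the last is the 99th.
    cuts = quantiles(observed, n=100, method="inclusive")
    low, high = cuts[0], cuts[-1]
    clipped = [None if v is None else min(max(v, low), high) for v in values]
    fill = median(v for v in clipped if v is not None)
    return [fill if v is None else v for v in clipped]


def center_balanced_weights(centers: Sequence[str]) -> List[float]:
    """Weight each sample so that every center carries the same total weight (assumed scheme)."""
    counts = Counter(centers)
    n_samples, n_centers = len(centers), len(counts)
    return [n_samples / (n_centers * counts[c]) for c in centers]


column = [float(i) for i in range(1, 101)] + [None]  # hypothetical feature column
cleaned = winsorize_and_impute(column)
weights = center_balanced_weights(["A", "A", "A", "B"])  # hypothetical centers
print(min(cleaned), max(cleaned), weights)
```

In a cross-validated setting, the percentile cut-offs and the imputation median would normally be estimated on the training folds only and then applied to the held-out fold. The weights are passed to the model at fit time; center itself never enters the feature matrix.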

## Validation and leakage checks
- Stratified cross-validation: accuracy 0.816; balanced accuracy 0.812; AUROC 0.890; average precision 0.846.
- Permutation test (labels shuffled): balanced accuracy of about 0.52 to 0.55, close to the 0.50 chance level, suggesting no leakage.
- Group evaluation: leave-one-center-out showed lower performance, highlighting generalization limits
  when a new center is not represented in training.
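
The leave-one-center-out split can be illustrated as follows. This is a sketch with hypothetical center labels and a hypothetical helper name, not the evaluation code used for the results above; scikit-learn's `LeaveOneGroupOut` produces the same kind of split.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_center_out(
    centers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_center, train_indices, test_indices), one fold per center."""
    for held_out in sorted(set(centers)):
        train_idx = [i for i, c in enumerate(centers) if c != held_out]
        test_idx = [i for i, c in enumerate(centers) if c == held_out]
        yield held_out, train_idx, test_idx


centers = ["A", "A", "B", "C", "C", "C"]  # hypothetical center label per record
folds = list(leave_one_center_out(centers))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Each fold tests on a center the model has never seen, which is a stricter test than stratified cross-validation, where every center appears in both training and test data.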

## Artifacts and reproducibility
- ML summary and metrics: `docs/ML_PROCESSING_2026-01-02.md`
- Final model: `output/final_model_2026-01-03/extra_trees_model.joblib`
- Figures for congress: `knowledge_base/presentations/2026_neuropathy_*.png` and
  `knowledge_base/presentations/2026_neuropathy_ml_summary.pdf`
- Calculator deployment notes: `docs/NEUROPATHY_CALCULATOR_2026-01-02.md`
- ML comparison report: `docs/ML_COMPARISON_REPORT_2026-01-03.md`

## Deployment and TLS (no impact to other sites)
- The calculator is routed through the existing Traefik instance used by Louis, on the same
  external network, with its own router and headers.
- TLS certificate for `predictneuropathy.tpfbrain.com` is issued by the existing resolver and is valid.
- Any "not secure" browser warning is typically caused by accessing the site over HTTP or by cached
  browser state; HTTPS is confirmed to work. HSTS can be added for this host only, if desired.

## PNS 2026 Abstract Draft (for online submission)

Title:
PredictNeuropathy: A machine-learning tool to differentiate hereditary from acquired demyelinating neuropathies

Introduction * Hereditary demyelinating neuropathies and acquired inflammatory demyelinating neuropathies have different causes but can present with similar clinical and electrophysiological features, which complicates differential diagnosis and delays disease-specific treatment. This study aimed to develop and validate a quick, triage-level machine-learning algorithm to differentiate acquired from genetic demyelinating neuropathies using routine nerve conduction study data.

Methods * Data from 408 patients with confirmed hereditary demyelinating neuropathies (CMT1A, HNPP, CMT1B, males with CMTX1, and CMT4C) and acquired demyelinating neuropathies (CIDP, MMN and POEMS) were collected from multiple centers, yielding 227 genetic and 181 acquired cases. Structured electrophysiological variables were extracted using a standardized proforma. Features included age, sex, sensory response presence, and median and ulnar motor conduction parameters, plus two inter-nerve metrics: median-to-ulnar amplitude ratio and conduction velocity difference >=10 m/s. When bilateral studies were available, the left side was selected as the primary site. An ExtraTrees classifier with 500 trees was trained and evaluated using stratified five-fold cross-validation.

Results * Cross-validation yielded an accuracy of 0.816, a balanced accuracy of 0.812, an area under the receiver operating characteristic curve of 0.890, and an average precision of 0.846. At a 0.50 threshold, precision was 0.805, recall was 0.773, and the F1 score was 0.789. The confusion matrix showed 193 true negatives, 34 false positives, 41 false negatives, and 140 true positives, with a specificity of 0.850. Feature importance analysis highlighted age and the median and ulnar conduction metrics.

Conclusions * PredictNeuropathy is an automated machine-learning algorithm that achieves clinically useful accuracy using routine upper limb nerve conduction studies. It can support quick triage when neuromuscular expertise is limited, acting as a screening step to identify cases that need expert reassessment or as a confirmatory tool for non-specialists. The first version is already available online.
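
The threshold metrics in the Results paragraph can be re-derived from the reported confusion matrix. The short check below uses only the four counts from the draft. Since true positives plus false negatives equal 181, the positive class appears to be the acquired group; that reading is an inference from the counts, not something the draft states.

```python
# Counts taken from the Results paragraph of the abstract draft.
tn, fp, fn, tp = 193, 34, 41, 140

precision = tp / (tp + fp)
recall = tp / (tp + fn)                     # sensitivity
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("accuracy", accuracy),
    ("balanced accuracy", balanced_accuracy),
    ("F1", f1),
]:
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these values agree with the figures quoted in the draft (0.805, 0.773, 0.850, 0.816, 0.812, and 0.789).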