Reading the assumption report
Every mfgQC analysis checks its own assumptions and reports the outcome. It
warns, it quantifies the impact, and it recommends a next step — but it never
silently switches methods or transforms your data. If a capability study finds
your measurements are non-normal, it tells you and keeps computing the normal-theory
number you asked for; it does not quietly Box-Cox the data behind your back and hand
you a prettier Cpk. Auto-correction is opt-in only: a non-normal method runs only
when you pass method= explicitly. This is the "statistical guardrails" pillar, and
it is the deliberate difference between mfgQC and tools that transform-to-pass without
telling you. The assumption report is where that contract lives — so it is worth
learning to read.
The philosophy in the code's own words: "type hints, not decisions." Each check reports a binary verdict from the direct test of the assumption, plus two pieces of adjacent context — practical impact and the test's resolving power at your sample size. The context never flips the verdict, and the verdict never changes the analysis. You decide what to do.
Where the report comes from
Every result carries a list of structured AssumptionCheck records and renders them
in .report() under an Assumption checks block, followed by Recommendations.
The same data is available structurally via .to_dict() — consume that from code,
never parse the report text (see Quickstart).
import numpy as np
import pandas as pd
import mfgqc
rng = np.random.default_rng(11)  # fixed seed so the example is reproducible
df = pd.DataFrame({"bore": np.round(rng.normal(10.0, 0.05, size=15), 4)})
qc = mfgqc.load(df, measure="bore").spec(lower=9.8, upper=10.2)
print(qc.capability())  # no method= argument, so the normal method runs
Process Capability (method=normal)
==================================
n = 15 mean = 10.008
sigma (within) = n/a
sigma (overall) = 0.044324
Cp/Cpk sigma = overall
Cp = 1.504 95% CI (0.954, 2.05)
Cpk = 1.446 95% CI (0.884, 2.01) (Cpu=1.446, Cpl=1.562)
Pp = 1.504 Ppk = 1.446 (Ppu=1.446, Ppl=1.562)
Cpm = n/a
Assumption checks:
[PASS] normality (Anderson-Darling): AD=0.368, p=0.383; est. Cpk impact 3.4%; n=15 [low power]
Anatomy of an assumption line
Take that line apart, field by field. Each piece maps to a field on the
AssumptionCheck dataclass (mfgqc/assumptions.py):
[PASS] normality (Anderson-Darling): AD=0.368, p=0.383; est. Cpk impact 3.4%; n=15 [low power]
  │       │              │              │         │              │             │        │
  │       │              │              │         │              │             │        └─ reliability
  │       │              │              │         │              │             └─ n (sample size)
  │       │              │              │         │              └─ magnitude + magnitude_label
  │       │              │              │         └─ p_value
  │       │              │              └─ statistic
  │       │              └─ test
  │       └─ name
  └─ passed (verdict)
| Part of the line | AssumptionCheck field | What it means |
|---|---|---|
| [PASS] / [FAIL] | passed | The binary verdict from the direct test at \(\alpha = 0.05\). Nothing else in the line changes this. |
| normality | name | What was checked. |
| (Anderson-Darling) | test | The exact test used to reach the verdict. |
| AD=0.368 | statistic | The test statistic. |
| p=0.383 | p_value | P-value of the direct test. passed is simply p >= 0.05. Omitted when a test has no defined p-value. |
| est. Cpk impact 3.4% | magnitude + magnitude_label | The practical-impact / effect-size context: here, how much the capability index would move under a non-normal fit. |
| n=15 | n | The sample size the check ran on. |
| [low power] | reliability | The test's resolving power at this \(n\): ok (no marker), low power, or oversensitive. |
The magnitude_label varies by check — est. Cpk impact and skew for normality,
variance ratio for homogeneity (Levene), dispersion ratio for attribute charts,
lag-1 autocorr for independence, subgroup count/ndc for adequacy rules. The
label tells you what the magnitude number is.
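To make the field-to-text mapping concrete, here is a minimal sketch that rebuilds the two assumption lines shown on this page from a stand-in record. It is not the code in mfgqc/assumptions.py or mfgqc/_result.py: the class name CheckSketch, the function render_line, the stat_label field, the three-significant-figure formatting, and the choice to store the magnitude as a percentage are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 0.05  # significance level quoted in the table above


@dataclass
class CheckSketch:
    """Hypothetical stand-in for AssumptionCheck; field names follow the table."""
    passed: bool
    name: str
    test: str
    stat_label: str          # assumption: a short label such as "AD"
    statistic: float
    p_value: Optional[float]  # None when the test has no defined p-value
    magnitude: float         # assumption: stored as a percentage
    magnitude_label: str
    n: int
    reliability: str         # "ok", "low power", or "oversensitive"


def render_line(c: CheckSketch) -> str:
    """Render one report line in the layout dissected above."""
    verdict = "PASS" if c.passed else "FAIL"
    stats = [f"{c.stat_label}={c.statistic:.3g}"]
    if c.p_value is not None:
        stats.append(f"p={c.p_value:.3g}")
    line = (f"[{verdict}] {c.name} ({c.test}): " + ", ".join(stats)
            + f"; {c.magnitude_label} {c.magnitude:.1f}%; n={c.n}")
    if c.reliability != "ok":  # "ok" carries no marker
        line += f" [{c.reliability}]"
    return line


# The verdict comes from the p-value alone; magnitude and reliability never touch it.
passing = CheckSketch(0.383 >= ALPHA, "normality", "Anderson-Darling", "AD",
                      0.368, 0.383, 3.4, "est. Cpk impact", 15, "low power")
failing = CheckSketch(6.14e-07 >= ALPHA, "normality", "Anderson-Darling", "AD",
                      2.74, 6.14e-07, 89.1, "est. Cpk impact", 120, "ok")

print(render_line(passing))
print(render_line(failing))
```

In real code, read the structured fields from .to_dict() rather than re-deriving them from the text; the sketch only shows which field feeds which part of the line.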
[PASS] vs [FAIL]: a verdict, not an action
A FAIL is information, not an automatic method change
[FAIL] means the assumption was rejected at \(\alpha = 0.05\). It does not
mean mfgQC changed anything. The number above the assumption block was still
computed with the method you asked for. A FAIL is a flag that says "the reported
number rests on an assumption that didn't hold — read on, and decide." The
recommendation line tells you the conventional remedy; acting on it is your call.
The marker is driven purely by the direct test of the assumption, never by the context fields. This is deliberate: it stops a coincidentally-small impact estimate from issuing a false all-clear on grossly non-normal data. The verdict answers "did the assumption hold?"; the context answers "does it matter, and could the test even tell?"
[low power] and [oversensitive]: the reliability flag
The reliability flag is a fact about the test, not a judgment about your data:
- [low power]: the sample is small enough (\(n < 20\) for normality, with per-check thresholds) that the test has weak power to detect a real violation. A [PASS] carrying [low power] is weak evidence: the test did not reject, but at this \(n\) it might not have caught a violation that is genuinely there. Treat it as "no evidence against," not "confirmed."
- [oversensitive]: at very large \(n\) (\(> 5000\)), significance tests reject trivial, practically irrelevant departures. A [FAIL] carrying [oversensitive] is the cue to lean on the magnitude, not the p-value.
This is exactly why mfgQC reports the effect size alongside the p-value.
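The two flags can be sketched as a simple threshold rule plus a reading guide. This is an illustration only: the thresholds are the normality ones quoted above (other checks use their own), and the function names reliability_flag and how_to_read are hypothetical, not part of mfgQC.

```python
def reliability_flag(n: int, low: int = 20, high: int = 5000) -> str:
    """Classify a sample size using the normality thresholds quoted above."""
    if n < low:
        return "low power"
    if n > high:
        return "oversensitive"
    return "ok"


def how_to_read(passed: bool, flag: str) -> str:
    """Turn a verdict plus its reliability flag into a reading hint."""
    if passed and flag == "low power":
        return "weak evidence: no evidence against, not confirmed"
    if not passed and flag == "oversensitive":
        return "lean on the magnitude, not the p-value"
    return "read the verdict together with the magnitude"


for n, passed in [(15, True), (120, False), (6000, False)]:
    flag = reliability_flag(n)
    print(n, flag, "->", how_to_read(passed, flag))
```

Note that the flag depends only on the sample size, never on the data values, which is why it describes the test rather than the process.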
Magnitude: statistical vs practical significance
A statistically significant violation can be practically negligible, and a practically serious one can hide under a non-significant p-value at small \(n\). The p-value answers "is the departure real?"; the magnitude answers "is it big enough to care about?" mfgQC reports both so you can judge practical significance yourself.
In the [PASS] example above, est. Cpk impact 3.4% says: even if you switched to a
non-normal method, your Cpk would move by about 3% — not enough to change a sourcing
decision. Contrast that with the failing case below, where the same label reads
89.1%. Same statistic name, completely different engineering consequence — and you
can only see the difference because the magnitude is on the line.
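The gap between the two kinds of significance is easiest to see with a fixed effect size and a varying sample size. The sketch below uses a one-sample z-test of a mean shift with known sigma as a stand-in for the principle; it is not the Anderson-Darling test mfgQC runs, and the function name two_sided_p is hypothetical.

```python
import math


def two_sided_p(effect_in_sigmas: float, n: int) -> float:
    """Two-sided p-value of a z-test for a mean shift, given n observations."""
    z = abs(effect_in_sigmas) * math.sqrt(n)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# A trivial shift (0.02 sigma): invisible at small n, "significant" at huge n.
for n in (15, 120, 100_000):
    print(f"shift=0.02 sigma, n={n}: p={two_sided_p(0.02, n):.3g}")

# A practically serious shift (0.5 sigma) that small n fails to flag at alpha = 0.05.
print(f"shift=0.5 sigma, n=15: p={two_sided_p(0.5, 15):.3g}")
```

The effect size is identical across the first three lines; only the sample size moves the p-value. That is the reason to read the magnitude field before acting on a verdict.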
A FAIL, worked end to end
Here is a genuinely right-skewed positive measurement — the kind you get from flatness, surface roughness, or runout, where values are bounded at zero and tail to the right. Run the default capability study, then run the explicitly-chosen Box-Cox method:
import numpy as np, pandas as pd, mfgqc
rng = np.random.default_rng(42)
x = np.round(rng.gamma(shape=2.0, scale=0.4, size=120) + 0.2, 3)
df = pd.DataFrame({"flatness": x})
qc = mfgqc.load(df, measure="flatness").spec(upper=3.0)
print(qc.capability()) # default: method='normal'
print(qc.capability(method="boxcox")) # explicit opt-in
Process Capability (method=normal)
==================================
n = 120 mean = 0.94012
sigma (within) = n/a
sigma (overall) = 0.5121
Cp/Cpk sigma = overall
Cp = n/a
Cpk = 1.341 95% CI (1.16, 1.52) (Cpu=1.341, Cpl= n/a)
Pp = n/a Ppk = 1.341 (Ppu=1.341, Ppl= n/a)
Cpm = n/a
Assumption checks:
[FAIL] normality (Anderson-Darling): AD=2.74, p=6.14e-07; est. Cpk impact 89.1%; n=120
Recommendations:
- Data are not normal (AD=2.74, p=6.14e-07); for capability use a non-normal method (method='clements'/'johnson').
Process Capability (method=boxcox)
==================================
n = 120 mean = 0.94012
sigma (within) = n/a
sigma (overall) = 0.5121
Cp/Cpk sigma = box-cox (lambda=0.017)
Cp = n/a CI: n/a (non-normal method)
Cpk = 0.8366 CI: n/a (non-normal method) (Cpu=0.8366, Cpl= n/a)
Pp = n/a Ppk = 0.8366 (Ppu=0.8366, Ppl= n/a)
Cpm = n/a
Assumption checks:
[FAIL] normality (Anderson-Darling): AD=2.74, p=6.14e-07; est. Cpk impact 89.1%; n=120
Read the FAIL line: the Anderson-Darling test rejects normality hard
(p=6.14e-07), and the magnitude says it matters — est. Cpk impact 89.1%. The
default normal method reports Cpk = 1.341, comfortably above the usual 1.33 gate.
The legitimate Box-Cox number is Cpk = 0.8366. The normal-theory figure is about 60%
higher than that (1.341 / 0.8366 ≈ 1.60). That is the entire reason mfgQC refuses to transform
silently: had it auto-applied Box-Cox, you'd never have seen 1.341 and might never
have questioned it; had it auto-suppressed the transform, you'd have shipped a Cpk
that is fiction.
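The arithmetic behind that comparison can be checked by hand from the numbers printed in the two reports. The sketch assumes the textbook one-sided formula Cpu = (USL - mean) / (3 * sigma); whether mfgQC computes it in exactly this way is an assumption, although the formula does reproduce the reported normal-theory value. The Box-Cox value is copied from the report, not recomputed.

```python
usl = 3.0                # upper spec limit passed to .spec(upper=3.0)
mean = 0.94012           # from the report
sigma_overall = 0.5121   # from the report

# Textbook one-sided capability index (assumed formula, see lead-in).
cpu_normal = (usl - mean) / (3 * sigma_overall)

cpk_normal_reported = 1.341
cpk_boxcox_reported = 0.8366  # copied from the Box-Cox report

overstatement = cpk_normal_reported / cpk_boxcox_reported - 1
shortfall = 1 - cpk_boxcox_reported / cpk_normal_reported

print(f"Cpu from mean and sigma: {cpu_normal:.3f}")
print(f"normal-theory Cpk is {overstatement:.0%} higher than Box-Cox")
print(f"Box-Cox Cpk is {shortfall:.0%} lower than normal-theory")
```

With only an upper spec limit, Cpk equals Cpu, which is why the report shows Cpl as n/a.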
The default never transforms
Non-normal methods (boxcox, clements, johnson) run only when you pass
method= explicitly. qc.capability() with no argument always reports the
normal-theory number, FAIL flag and all. The choice to transform is yours, on the
record, in the provenance history (see Provenance model).
When to opt into a correction, and when not to
A FAIL gives you two legitimate responses. Picking the right one is an engineering judgment, not a statistical one.
DO choose a non-normal method when the shape is the true nature of the data
If the measurement is inherently skewed or bounded — flatness, roundness, taper,
particle counts, time-to-event — then normality was never the right model and the
non-normal index is the honest one. The flatness example above is exactly this
case: opt into method="boxcox" (or "clements"/"johnson"), and the choice is
recorded in the lineage. See
Non-normal capability for choosing among
the methods.
DON'T 'correct away' a FAIL that is signaling a special cause
A normality FAIL can also mean your process is not in control — a mixture of two streams, a drifting mean, a tool-change step, an outlier from a bad part. That is a special cause to investigate, not a distribution to transform. Box-Cox-ing an unstable process to make it look capable is precisely the malpractice mfgQC refuses to do silently — do not do it by hand either. Before reaching for a transform, plot the data on a control chart and confirm the process is stable. See Control charts.
The discipline: establish stability first, then assess capability. A capability index — normal or non-normal — only means something for an in-control process. If the control chart shows out-of-control signals, the assumption FAIL is a symptom of the instability; fix the process, don't reshape the math. If the chart is clean and the shape is genuinely non-normal, then opting into the matching method is the right, honest call — and mfgQC makes you make it on purpose.
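The "stability first" discipline can be sketched as a gate in front of the capability decision. The example below is a generic individuals (I) chart check using only the beyond-3-sigma rule, with sigma estimated from the average moving range; it is not mfgQC's control-chart API, and the function names individuals_chart_signals and next_step are hypothetical. A real chart would also apply run rules.

```python
from statistics import mean


def individuals_chart_signals(values):
    """Indices of points outside the 3-sigma limits of an individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for ranges of 2 points
    center = mean(values)
    ucl = center + 3 * sigma_hat
    lcl = center - 3 * sigma_hat
    return [i for i, v in enumerate(values) if v > ucl or v < lcl]


def next_step(values, normality_passed):
    """Stability gate: look for special causes before touching the method."""
    if individuals_chart_signals(values):
        return "investigate special cause"
    if not normality_passed:
        return "consider an explicit non-normal method"
    return "normal-theory capability is defensible"


stable = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0]
with_outlier = stable + [12.0]  # one bad part at the end

print(individuals_chart_signals(stable), next_step(stable, False))
print(individuals_chart_signals(with_outlier), next_step(with_outlier, False))
```

The ordering of the checks in next_step is the point: a normality FAIL is only treated as a question about the distribution once the chart shows no out-of-control signal.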
See also
- Quickstart: the load → spec → analysis flow and the result surface.
- Capability: the normal-method indices and sigma families.
- Non-normal capability: Box-Cox, Clements, and Johnson methods.
- Control charts: establish stability before judging capability.
- Provenance model: how an opted-in transform is recorded.
Sources: AssumptionCheck and the per-check logic in mfgqc/assumptions.py; the
report rendering (_assumption_line, report) in mfgqc/_result.py; the
explicit-opt-in method= gate in mfgqc/capability.py. The Anderson-Darling,
Box-Cox, Clements, and Nelson references are catalogued in the
bibliography.