Reading the assumption report

Every mfgQC analysis checks its own assumptions and reports the outcome. It warns, it quantifies the impact, and it recommends a next step — but it never silently switches methods or transforms your data. If a capability study finds your measurements are non-normal, it tells you and keeps computing the normal-theory number you asked for; it does not quietly Box-Cox the data behind your back and hand you a prettier Cpk. Auto-correction is opt-in only: a non-normal method runs only when you pass method= explicitly. This is the "statistical guardrails" pillar, and it is the deliberate difference between mfgQC and tools that transform-to-pass without telling you. The assumption report is where that contract lives — so it is worth learning to read.

The philosophy in the code's own words: "type hints, not decisions." Each check reports a binary verdict from the direct test of the assumption, plus two pieces of adjacent context — practical impact and the test's resolving power at your sample size. The context never flips the verdict, and the verdict never changes the analysis. You decide what to do.

Where the report comes from

Every result carries a list of structured AssumptionCheck records and renders them in .report() under an Assumption checks block, followed by Recommendations. The same data is available structurally via .to_dict() — consume that from code, never parse the report text (see Quickstart).

import numpy as np
import pandas as pd
import mfgqc

rng = np.random.default_rng(11)
df = pd.DataFrame({"bore": np.round(rng.normal(10.0, 0.05, size=15), 4)})
qc = mfgqc.load(df, measure="bore").spec(lower=9.8, upper=10.2)
print(qc.capability())

Process Capability (method=normal)
==================================
n = 15   mean = 10.008
sigma (within)  =   n/a
sigma (overall) = 0.044324
Cp/Cpk sigma    = overall

Cp  = 1.504  95% CI (0.954, 2.05)
Cpk = 1.446  95% CI (0.884, 2.01)   (Cpu=1.446, Cpl=1.562)
Pp  = 1.504    Ppk = 1.446   (Ppu=1.446, Ppl=1.562)
Cpm =   n/a

Assumption checks:
  [PASS] normality (Anderson-Darling): AD=0.368, p=0.383; est. Cpk impact 3.4%; n=15 [low power]
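
Consuming the structured form rather than the text might look like the sketch below. The exact layout of `.to_dict()` is not shown on this page, so the dictionary here is a hand-written, hypothetical stand-in: the `"assumptions"` key and the per-check key names are assumptions that mirror the `AssumptionCheck` fields described in the next section. Inspect the real output before relying on any of these names.

```python
# Hypothetical stand-in for the dictionary a result might return from .to_dict().
# The "assumptions" key and the per-check key names are assumptions, not a
# documented schema.
result = {
    "assumptions": [
        {
            "name": "normality",
            "test": "Anderson-Darling",
            "statistic": 0.368,
            "p_value": 0.383,
            "magnitude": 3.4,
            "magnitude_label": "est. Cpk impact",
            "n": 15,
            "reliability": "low power",
            "passed": True,
        },
    ],
}


def needs_attention(result_dict):
    """Names of checks that failed or whose test was not reliable at this n."""
    return [
        check["name"]
        for check in result_dict["assumptions"]
        if not check["passed"] or check["reliability"] != "ok"
    ]


print(needs_attention(result))
```

Filtering on fields keeps downstream code independent of how the report text is worded or laid out.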

Anatomy of an assumption line

Take that line apart, field by field. Each piece maps to a field on the AssumptionCheck dataclass (mfgqc/assumptions.py):

[PASS] normality (Anderson-Darling): AD=0.368, p=0.383; est. Cpk impact 3.4%; n=15 [low power]
  │        │            │              │       │              │                   │     │
  │        │            │              │       │              │                   │     └─ reliability
  │        │            │              │       │              │                   └─ n (sample size)
  │        │            │              │       │              └─ magnitude + magnitude_label
  │        │            │              │       └─ p_value
  │        │            │              └─ statistic
  │        │            └─ test
  │        └─ name
  └─ passed (verdict)

| Part of the line | `AssumptionCheck` field | What it means |
| --- | --- | --- |
| `[PASS]` / `[FAIL]` | `passed` | The binary verdict from the direct test at \(\alpha = 0.05\). Nothing else in the line changes this. |
| `normality` | `name` | What was checked. |
| `(Anderson-Darling)` | `test` | The exact test used to reach the verdict. |
| `AD=0.368` | `statistic` | The test statistic. |
| `p=0.383` | `p_value` | P-value of the direct test. `passed` is simply `p >= 0.05`. Omitted when a test has no defined p-value. |
| `est. Cpk impact 3.4%` | `magnitude` + `magnitude_label` | The practical-impact / effect-size context (here, how much the capability index would move under a non-normal fit). |
| `n=15` | `n` | The sample size the check ran on. |
| `[low power]` | `reliability` | The test's resolving power at this \(n\): `ok` (no marker), `low power`, or `oversensitive`. |

The magnitude_label varies by check — est. Cpk impact and skew for normality, variance ratio for homogeneity (Levene), dispersion ratio for attribute charts, lag-1 autocorr for independence, subgroup count/ndc for adequacy rules. The label tells you what the magnitude number is.
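
As a reading aid, the field-to-text mapping can be sketched in code. The `CheckSketch` class and `render` function below are hypothetical stand-ins, not mfgQC's implementation. The field names follow the table; the `stat_symbol` parameter, the `%` suffix on the magnitude, and the three-significant-digit formatting are assumptions chosen so that the sketch reproduces the two example lines on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckSketch:
    """Illustrative stand-in for AssumptionCheck (field names from the table)."""
    name: str
    test: str
    statistic: float
    p_value: Optional[float]
    magnitude: Optional[float]
    magnitude_label: Optional[str]
    n: int
    reliability: str  # "ok", "low power", or "oversensitive"
    passed: bool


def render(check: CheckSketch, stat_symbol: str = "AD") -> str:
    """Build one report-style line from the fields (formatting is assumed)."""
    marker = "PASS" if check.passed else "FAIL"
    stats = [f"{stat_symbol}={check.statistic:.3g}"]
    if check.p_value is not None:  # omitted when the test defines no p-value
        stats.append(f"p={check.p_value:.3g}")
    line = f"[{marker}] {check.name} ({check.test}): " + ", ".join(stats)
    if check.magnitude is not None:
        # Assumes a percentage magnitude, as in the Cpk-impact examples.
        line += f"; {check.magnitude_label} {check.magnitude:.3g}%"
    line += f"; n={check.n}"
    if check.reliability != "ok":  # "ok" prints no marker
        line += f" [{check.reliability}]"
    return line


pass_check = CheckSketch("normality", "Anderson-Darling", 0.368, 0.383,
                         3.4, "est. Cpk impact", 15, "low power", True)
fail_check = CheckSketch("normality", "Anderson-Darling", 2.74, 6.14e-07,
                         89.1, "est. Cpk impact", 120, "ok", False)
print(render(pass_check))
print(render(fail_check))
```

The sketch only shows which field feeds which part of the line. Code should read the fields directly rather than reverse this mapping.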

`[PASS]` vs `[FAIL]`: a verdict, not an action

A FAIL is information, not an automatic method change

[FAIL] means the assumption was rejected at \(\alpha = 0.05\). It does not mean mfgQC changed anything. The number above the assumption block was still computed with the method you asked for. A FAIL is a flag that says "the reported number rests on an assumption that didn't hold — read on, and decide." The recommendation line tells you the conventional remedy; acting on it is your call.

The marker is driven purely by the direct test of the assumption, never by the context fields. This is deliberate: it stops a coincidentally small impact estimate from issuing a false all-clear on grossly non-normal data. The verdict answers "did the assumption hold?"; the context answers "does it matter, and could the test even tell?"

`[low power]` and `[oversensitive]`: the reliability flag

The reliability flag is a fact about the test, not a judgment about your data:

- [low power]: the sample is small enough that the test has weak power to detect a real violation (the threshold is \(n < 20\) for normality; other checks have their own thresholds). A [PASS] carrying [low power] is weak evidence: the test did not reject, but at this \(n\) it might not have caught a violation that is genuinely there. Treat it as "no evidence against," not "confirmed."
- [oversensitive]: at very large \(n\) (\(> 5000\)), significance tests reject trivial, practically irrelevant departures. A [FAIL] carrying [oversensitive] is the cue to lean on the magnitude, not the p-value.

This is exactly why mfgQC reports the effect size alongside the p-value.
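
The thresholds quoted above for the normality check can be written as a small rule. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, only the two cut-offs (\(n < 20\) and \(n > 5000\)) come from this page, and other checks use their own thresholds, which are not listed here.

```python
def normality_reliability(n: int, low_n: int = 20, high_n: int = 5000) -> str:
    """Classify the resolving power of a normality test at sample size n.

    Cut-offs follow the text: below low_n the test is underpowered,
    above high_n it rejects trivial departures.
    """
    if n < low_n:
        return "low power"
    if n > high_n:
        return "oversensitive"
    return "ok"  # rendered with no marker in the report


for n in (15, 120, 10_000):
    print(n, normality_reliability(n))
```

The flag depends only on the sample size, never on the data values, which is why it describes the test rather than the process.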

Magnitude: statistical vs practical significance

A statistically significant violation can be practically negligible, and a practically serious one can hide under a non-significant p-value at small \(n\). The p-value answers "is the departure real?"; the magnitude answers "is it big enough to care about?" mfgQC reports both so you can judge practical significance yourself.

In the [PASS] example above, est. Cpk impact 3.4% says: even if you switched to a non-normal method, your Cpk would move by about 3% — not enough to change a sourcing decision. Contrast that with the failing case below, where the same label reads 89.1%. Same statistic name, completely different engineering consequence — and you can only see the difference because the magnitude is on the line.
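
One way to combine the three signals when reading a line is sketched below. The `reading` helper is hypothetical and lives on the user's side, not in mfgQC. The 10% materiality cut-off is an arbitrary assumption for illustration: what counts as a material shift in Cpk is an engineering decision, and the helper never alters the verdict itself.

```python
def reading(passed: bool, reliability: str, impact_pct: float,
            material_pct: float = 10.0) -> str:
    """Return a one-line reading of a check; the verdict is reported, not changed.

    material_pct is an assumed, user-chosen cut-off for a shift worth acting on.
    """
    size = "material" if impact_pct >= material_pct else "small"
    if passed and reliability == "low power":
        return f"no evidence against the assumption, but the test is weak; impact looks {size}"
    if passed:
        return f"assumption held; impact looks {size}"
    if reliability == "oversensitive":
        return f"rejected at very large n, so judge by magnitude; impact looks {size}"
    return f"assumption rejected; impact looks {size}"


print(reading(True, "low power", 3.4))   # the bore example
print(reading(False, "ok", 89.1))        # the flatness example
```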

A FAIL, worked end to end

Here is a genuinely right-skewed positive measurement, the kind you get from flatness, surface roughness, or runout, where values are bounded at zero and tail to the right. Run the default capability study, then run the explicitly chosen Box-Cox method:

import numpy as np
import pandas as pd
import mfgqc

rng = np.random.default_rng(42)
x = np.round(rng.gamma(shape=2.0, scale=0.4, size=120) + 0.2, 3)
df = pd.DataFrame({"flatness": x})
qc = mfgqc.load(df, measure="flatness").spec(upper=3.0)

print(qc.capability())                    # default: method='normal'
print(qc.capability(method="boxcox"))     # explicit opt-in

Process Capability (method=normal)
==================================
n = 120   mean = 0.94012
sigma (within)  =   n/a
sigma (overall) = 0.5121
Cp/Cpk sigma    = overall

Cp  =   n/a
Cpk = 1.341  95% CI (1.16, 1.52)   (Cpu=1.341, Cpl=  n/a)
Pp  =   n/a    Ppk = 1.341   (Ppu=1.341, Ppl=  n/a)
Cpm =   n/a

Assumption checks:
  [FAIL] normality (Anderson-Darling): AD=2.74, p=6.14e-07; est. Cpk impact 89.1%; n=120

Recommendations:
  - Data are not normal (AD=2.74, p=6.14e-07); for capability use a non-normal method (method='clements'/'johnson').

Process Capability (method=boxcox)
==================================
n = 120   mean = 0.94012
sigma (within)  =   n/a
sigma (overall) = 0.5121
Cp/Cpk sigma    = box-cox (lambda=0.017)

Cp  =   n/a  CI: n/a (non-normal method)
Cpk = 0.8366  CI: n/a (non-normal method)   (Cpu=0.8366, Cpl=  n/a)
Pp  =   n/a    Ppk = 0.8366   (Ppu=0.8366, Ppl=  n/a)
Cpm =   n/a

Assumption checks:
  [FAIL] normality (Anderson-Darling): AD=2.74, p=6.14e-07; est. Cpk impact 89.1%; n=120

Read the FAIL line: the Anderson-Darling test rejects normality decisively (p=6.14e-07), and the magnitude says it matters: est. Cpk impact 89.1%. The default normal method reports Cpk = 1.341, just above the usual 1.33 gate. The explicitly requested Box-Cox method reports Cpk = 0.8366, well below it. The normal-theory figure is about 60% higher than the Box-Cox one (1.341 / 0.8366 ≈ 1.60), so the normality assumption alone turns a failing process into a passing one. That is the entire reason mfgQC refuses to transform silently: had it auto-applied Box-Cox, you would never have seen 1.341 or learned that the verdict hinged on the choice of method; had it reported 1.341 without the FAIL flag, you would have shipped a Cpk that is fiction.
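
The gap between the two figures can be checked by hand from the values printed in the reports. The sketch below uses the standard one-sided index, Cpu = (USL - mean) / (3 sigma), with the overall sigma shown in the report; the variable names are illustrative, and the Box-Cox value is copied from the report rather than recomputed.

```python
# Values copied from the two reports above (upper spec only, so Cpk = Cpu).
usl = 3.0
mean = 0.94012
sigma_overall = 0.5121
cpk_boxcox = 0.8366  # as printed by the method="boxcox" report
gate = 1.33          # the usual acceptance threshold mentioned in the text

# Distance from the mean to the upper spec limit, in units of 3 sigma.
cpk_normal = (usl - mean) / (3 * sigma_overall)

overstatement = cpk_normal / cpk_boxcox - 1  # normal figure relative to Box-Cox
shortfall = 1 - cpk_boxcox / cpk_normal      # Box-Cox figure relative to normal

print(f"normal-theory Cpk   = {cpk_normal:.3f}")
print(f"normal vs Box-Cox   = +{overstatement:.0%}")
print(f"Box-Cox vs normal   = -{shortfall:.0%}")
print(f"passes gate (normal / Box-Cox): {cpk_normal >= gate} / {cpk_boxcox >= gate}")
```

The arithmetic reproduces the printed 1.341 and puts the disagreement at roughly 60% measured from the Box-Cox figure, or roughly 38% measured from the normal-theory figure. Either way, the two methods land on opposite sides of the 1.33 gate.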

The default never transforms

Non-normal methods (boxcox, clements, johnson) run only when you pass method= explicitly. qc.capability() with no argument always reports the normal-theory number, FAIL flag and all. The choice to transform is yours, on the record, in the provenance history (see Provenance model).

When to opt into a correction, and when not to

A FAIL gives you two legitimate responses. Picking the right one is an engineering judgment, not a statistical one.

DO choose a non-normal method when the shape is the true nature of the data

If the measurement is inherently skewed or bounded — flatness, roundness, taper, particle counts, time-to-event — then normality was never the right model and the non-normal index is the honest one. The flatness example above is exactly this case: opt into method="boxcox" (or "clements"/"johnson"), and the choice is recorded in the lineage. See Non-normal capability for choosing among the methods.

DON'T 'correct away' a FAIL that is signaling a special cause

A normality FAIL can also mean your process is not in control — a mixture of two streams, a drifting mean, a tool-change step, an outlier from a bad part. That is a special cause to investigate, not a distribution to transform. Box-Cox-ing an unstable process to make it look capable is precisely the malpractice mfgQC refuses to do silently — do not do it by hand either. Before reaching for a transform, plot the data on a control chart and confirm the process is stable. See Control charts.

The discipline: establish stability first, then assess capability. A capability index — normal or non-normal — only means something for an in-control process. If the control chart shows out-of-control signals, the assumption FAIL is a symptom of the instability; fix the process, don't reshape the math. If the chart is clean and the shape is genuinely non-normal, then opting into the matching method is the right, honest call — and mfgQC makes you make it on purpose.
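
The "stability first" step can be illustrated with a minimal individuals-chart check. This is a generic textbook sketch, not mfgQC's control-chart API: the function names are hypothetical, sigma is estimated from the average moving range divided by 1.128 (the d2 constant for a moving range of two), and only the basic rule of a point beyond the 3-sigma limits is applied.

```python
from statistics import mean


def individuals_limits(values):
    """Return (center, lcl, ucl) for an individuals chart."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / 1.128  # d2 for a moving range of two
    return center, center - 3 * sigma, center + 3 * sigma


def out_of_control(values):
    """Indices of points outside the 3-sigma limits (one rule only)."""
    _, lcl, ucl = individuals_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


stable = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0]
with_spike = [10.0, 10.1, 9.9, 10.0, 10.1, 11.5, 10.0, 10.1, 9.9, 10.0]

print(out_of_control(stable))      # no signals: capability is meaningful
print(out_of_control(with_spike))  # a signal: investigate before any transform
```

If a check like this returns any indices, the normality FAIL is more likely a symptom of instability than a property of the distribution, and a transform would only hide it.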
