0004 — face & eye/blink cull signal
Status: IN PROGRESS (2026-06-22) — companion to 0001-krites.md
(master design), 0002-interface-contracts.md
(the R-* test basis), and 0003-studio.md.
Decisions (2026-06-22, Matt). §9 open questions resolved: Q1: both model shapes, config-selected (face.onnx.strategy: ear | classifier); seed with EAR, classifier slots in behind the same interface for A/B on real frames (gates Layer 2 only). Q2: skip the pigo stopgap, go straight to ONNX-CPU. Q3: blink demotes to maybe, never auto-rejects (wedding-default keeps EyeOpenHard = 0). Q4: worst face above a min-size floor drives MinEyeOpen (background blinkers ignored).
Scope: the FaceAnalyzer provider seam and the eye/blink cull signal, the one Phase-1 cull capability (0001 §11) not yet built. This spec closes that gap. It deliberately splits the work into a platform-agnostic majority (interface, cull-signal logic, profile, pipeline wiring, fakes; all built and tested on the amd64 Linux dev box) and a small Apple-Silicon remainder (the CoreML execution provider + dylib packaging) deferred to an M-series Mac. It implements the runtime decision recorded in 0001 §13-Q1 (ONNX Runtime + CoreML EP via purego, no CGO, bundled dylib).
1. Why this spec, and the constraint that shapes it
Phase 1 is otherwise complete: ingest → quality (blur/exposure) + perceptual-hash
burst dedup → verdicts → studio review → export + XMP. The missing signal is
eyes/blinks (0001 §4.2, the "Eyes / blinks" row of the signal table): per
face, are the subject's eyes open or closed. A blink on a wedding frame is one of
the most common reasons a sharp, well-exposed shot is still unusable — so it is
core cull value, not a nice-to-have.
The honest constraint: the production target is macOS / arm64 with the
CoreML execution provider (0001 §13-Q6, Q1), and we are developing on
linux / amd64. This spec exists to prove — and pin in requirements — that
almost all of the eye/blink feature is platform-neutral and ships now, and
that the Apple-specific part is a configuration swap, not a reimplementation.
1.1 The platform split (load-bearing)
ONNX Runtime is cross-platform; the CoreML EP is the only Apple-specific
piece. The same model, the same purego binding, the same inference code run
under different execution providers:
| | dev (today) | prod (on the M-series Mac) |
|---|---|---|
| OS / arch | linux / amd64 | darwin / arm64 |
| ORT shared lib | libonnxruntime.so (linux-x64) | libonnxruntime.dylib (osx-arm64) |
| Execution provider | CPUExecutionProvider | CoreMLExecutionProvider |
| Numerics | reference | same (CoreML accelerates, doesn't change the maths) |
Because the CPU EP is expected to yield the same outputs as CoreML (within floating-point tolerance; parity is confirmed on-device in Layer 3, §8), accuracy is validated on Linux and the M-series changes only latency. This is what makes the deferral safe.
2. Package layout & the WASM boundary
0001 §7 already names pkg/analyze/face as "face / eye / expression
(provider-backed)" and §12.1 / just wasm-check require the deterministic engine
(including pkg/analyze/...) to stay cgo-free and WASM-compilable. A
purego+dylib backend is not WASM-compatible (0001 §12.1, explicit). The
resolution is a clean split:
pkg/analyze/face/              # PURE: interface + types + signal logic. WASM-safe.
  face.go                      # Analyzer interface, Face, Result, Options
  signal.go                    # Result → cull eye-signal projection (deterministic)
  face_test.go                 # table tests, fake analyzer
pkg/face/onnx/                 # NATIVE: purego + ORT dylib. NOT in wasm-check.
  onnx.go                      # Analyzer impl; EP selected from config
  models/                      # (referenced, not vendored; see §6)
  onnx_integration_test.go     # INT_TEST=1 gated; runs the real model on CPU EP
pkg/face/pigo/                 # OPTIONAL pure-Go stopgap (see §7). WASM-safe.
- R-FACE-1 (MUST) pkg/analyze/face contains only the Analyzer interface, its value types, and pure signal logic. It imports no native/cgo or purego+dylib code and stays in the just wasm-check build set.
- R-FACE-2 (MUST) Every concrete model-backed analyzer (pkg/face/onnx, any cloud adapter) lives in its own package behind face.Analyzer and is excluded from wasm-check (0001 §7, §12.1).
- R-FACE-3 (MUST) The analyzer is injected into the pipeline (functional option / struct field), never reached through a package-level mock hook (the keryx rule, 0001 §10), so tests stay t.Parallel()-safe.
3. The FaceAnalyzer interface (pure, ships now)
A provider-neutral contract over "find faces, report per-face eye state". Decode
is already the caller's job (the pipeline holds the decoded image.Image), so the
analyzer takes pixels, not a path.
// Package face is the provider seam for face / eye-state analysis. Pure: the
// interface and its types only; concrete model backends live in their own
// packages (R-FACE-2).
package face

import (
	"context"
	"image"
)

// EyeUnknown is the EyeOpen sentinel for "face detected, eyes not assessable"
// (R-FACE-5).
const EyeUnknown = -1

// Analyzer finds faces in a frame and reports per-face eye state. Implementations
// are model-backed (pkg/face/onnx) or faked in tests; the engine codes only to
// this interface.
type Analyzer interface {
	// Analyze reports the faces detected in img. It MUST be deterministic for a
	// given (img, model) and MUST honour ctx cancellation/timeout (R-GLOBAL-8).
	Analyze(ctx context.Context, img image.Image) (Result, error)
}

// Result is the per-frame face analysis.
type Result struct {
	Faces []Face
}

// Face is one detected face and its eye state.
type Face struct {
	// Box is the face bounding box in pixel coords.
	Box image.Rectangle
	// Confidence is the detector's face-presence score (0..1).
	Confidence float64
	// EyeOpen is the probability (0..1) that the eyes are open. 1 = wide open,
	// 0 = fully closed/blinking. A model that cannot judge eyes reports the
	// documented sentinel EyeUnknown (see R-FACE-5).
	EyeOpen float64
}
- R-FACE-4 (MUST) Analyze is deterministic given the same image and model (R-GLOBAL-7) and respects ctx with a timeout (R-GLOBAL-8).
- R-FACE-5 (MUST) EyeOpen is a per-face probability in [0,1]. When a backend detects a face but cannot assess eyes, it reports EyeOpen = -1 (unknown), and the cull-signal projection treats unknown as "do not penalise" (R-EYE-4). Unknown is never silently coerced to a number in [0,1].
- R-FACE-6 (MUST) A frame with no faces yields Result{Faces: nil} and is never penalised for eyes (a landscape/detail shot has no eyes to close).
4. The eye/blink cull signal (pure, ships now)
The deterministic projection from a face Result to the cull signal, plus the
verdict semantics. This is the heart of the feature and is entirely
platform-neutral.
4.1 New signal fields
pkg/cull.Signals gains:
// FaceCount is the number of faces large enough to drive the eye signal. It
// keeps a zero-value Signals safe: FaceCount == 0 means "no eyes judged", so a
// zero-valued MinEyeOpen (0.0) is not mistaken for "eyes fully closed".
FaceCount int
// MinEyeOpen is the lowest per-face eye-open probability (0..1) among the
// counted faces, or -1 (face.EyeUnknown) when eye state is unknown. The "worst"
// face drives the signal: one blinking subject spoils a group shot.
MinEyeOpen float64
EyeSignal in pkg/analyze/face computes both from a Result and the frame
bounds, counting only faces whose box clears a minimum-size fraction of the
frame (Q4 — background blinkers are ignored) and taking MinEyeOpen as the min
over those with a known EyeOpen. FaceCount is not redundant with the -1
sentinel: the zero value of MinEyeOpen is a valid probability (eyes closed),
so a presence flag is what makes a zero-value Signals skip the gate. The size
floor is passed in from the profile (MinFaceBox, §4.2).
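The projection described above can be sketched as follows. This is a minimal illustration, not the shipped signal.go: the EyeSignal signature, the trimmed Face type, and the choice to measure a face by the shorter side of its box are assumptions made here. Faces with unknown eyes still count toward FaceCount but never lower MinEyeOpen, which is one reading of the text above.

```go
package main

import (
	"fmt"
	"image"
)

// EyeUnknown mirrors the -1 sentinel from R-FACE-5.
const EyeUnknown = -1.0

// Face and Result are trimmed copies of the §3 types (Confidence omitted).
type Face struct {
	Box     image.Rectangle
	EyeOpen float64
}

type Result struct{ Faces []Face }

// shortSide returns the smaller dimension of r.
func shortSide(r image.Rectangle) int {
	if r.Dx() < r.Dy() {
		return r.Dx()
	}
	return r.Dy()
}

// EyeSignal projects a Result onto (FaceCount, MinEyeOpen). minFaceBox is the
// profile's MinFaceBox fraction. Assumption: a face's size is the shorter side
// of its box, compared against the frame's shorter side.
func EyeSignal(res Result, bounds image.Rectangle, minFaceBox float64) (int, float64) {
	floor := minFaceBox * float64(shortSide(bounds))
	count, worst := 0, EyeUnknown
	for _, f := range res.Faces {
		if float64(shortSide(f.Box)) < floor {
			continue // background face below the size floor: ignored (Q4)
		}
		count++
		if f.EyeOpen < 0 {
			continue // unknown eyes: counted as a face, never lowers the min
		}
		if worst == EyeUnknown || f.EyeOpen < worst {
			worst = f.EyeOpen
		}
	}
	return count, worst
}

// Demo fixtures: two foreground faces and one 40px background face.
var demoFrame = image.Rect(0, 0, 1000, 800)

var demoResult = Result{Faces: []Face{
	{Box: image.Rect(100, 100, 400, 400), EyeOpen: 0.9},
	{Box: image.Rect(500, 100, 800, 400), EyeOpen: 0.2},
	{Box: image.Rect(900, 10, 940, 50), EyeOpen: 0.0}, // below the 80px floor
}}

func main() {
	count, worst := EyeSignal(demoResult, demoFrame, 0.10)
	fmt.Println(count, worst) // prints: 2 0.2
}
```

The fully closed background face does not drag the signal to 0, and an empty Result yields (0, -1), so a faceless frame skips the eye gate.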
4.2 New profile knobs
pkg/cull.Profile gains (config-driven, never Go constants — 0001 §6):
// EyeOpenSoft is the eye-open soft floor (0..1): a frame whose worst face is
// below it (but at/above EyeOpenHard) is demoted to maybe ("subject blinking").
// 0 disables the eye signal entirely.
EyeOpenSoft float64 `yaml:"eye_open_soft" json:"eyeOpenSoft"`
// EyeOpenHard is the eye-open hard gate (0..1): below it the frame is rejected
// (eyes clearly shut). 0 disables the hard gate (soft-only).
EyeOpenHard float64 `yaml:"eye_open_hard" json:"eyeOpenHard"`
// MinFaceBox is the minimum face-box size as a fraction of the frame's smaller
// dimension (0..1) for a face to count toward the eye signal; faces below it
// (distant background guests) are ignored. 0 counts every detected face.
MinFaceBox float64 `yaml:"min_face_box" json:"minFaceBox"`
Seed values for wedding-default (tunable starting points, to be calibrated against
real previews, like the sharpness seeds):
| knob | seed | rationale |
|---|---|---|
| EyeOpenSoft | 0.50 | a likely blink → maybe, surfaced for review |
| EyeOpenHard | 0.0 (disabled) | no auto-reject on eyes by default; see R-EYE-3 |
| MinFaceBox | 0.10 | ignore faces below 10% of the frame's short side (background guests) |
4.3 Verdict semantics
cull.Resolve extends with the eye gate, slotted alongside the existing
hard/soft logic (focus, exposure):
- R-EYE-1 (MUST) When FaceCount > 0 and MinEyeOpen is known and EyeOpenHard > 0 and MinEyeOpen < EyeOpenHard → reject with reason "eyes closed (NN% open, below MM%)".
- R-EYE-2 (MUST) Else, when MinEyeOpen is known and EyeOpenSoft > 0 and MinEyeOpen < EyeOpenSoft → maybe with reason "subject blinking (NN% eye-open, below MM%)" (does not override an existing hard reject from focus/exposure).
- R-EYE-3 (MUST) The wedding-default seed sets EyeOpenHard = 0 (disabled): a blink demotes to maybe, never auto-rejects. Rationale: a closed-eye frame may still be the only record of a moment; the human disposes (R-CULL-3). Hard-reject-on-blink is opt-in by raising EyeOpenHard.
- R-EYE-4 (MUST) MinEyeOpen == -1 (no faces, or eyes unknown) never affects the verdict; focus/exposure/dedup decide as they do today.
- R-EYE-5 (MUST) Resolution stays pure given (Signals, Profile) (R-CULL-4); the eye fields are just more signal. Golden-fixture unit tests pin every branch.
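The eye branch of the requirements above might look like the sketch below. It follows R-EYE-1 and R-EYE-2 literally and covers only the eye gate; the eyeGate name, the Verdict type, and the trimmed Signals/Profile structs are illustrative stand-ins, not the real pkg/cull API. Merging with the focus/exposure verdicts is left out.

```go
package main

import (
	"fmt"
	"math"
)

// Verdict is a hypothetical stand-in for the cull verdict type.
type Verdict string

const (
	Keep   Verdict = "keep"
	Maybe  Verdict = "maybe"
	Reject Verdict = "reject"
)

// Signals and Profile carry only the eye fields from §4.1 and §4.2.
type Signals struct {
	FaceCount  int
	MinEyeOpen float64
}

type Profile struct {
	EyeOpenSoft float64
	EyeOpenHard float64
}

// pct renders a 0..1 probability as a whole percentage.
func pct(v float64) int { return int(math.Round(v * 100)) }

// eyeGate returns the eye verdict and its reason. (Keep, "") means the eye
// signal has nothing to say and the other signals decide.
func eyeGate(s Signals, p Profile) (Verdict, string) {
	if s.FaceCount == 0 || s.MinEyeOpen < 0 {
		return Keep, "" // no faces or unknown eyes: never penalised (R-EYE-4)
	}
	if p.EyeOpenHard > 0 && s.MinEyeOpen < p.EyeOpenHard {
		return Reject, fmt.Sprintf("eyes closed (%d%% open, below %d%%)",
			pct(s.MinEyeOpen), pct(p.EyeOpenHard)) // R-EYE-1
	}
	if p.EyeOpenSoft > 0 && s.MinEyeOpen < p.EyeOpenSoft {
		return Maybe, fmt.Sprintf("subject blinking (%d%% eye-open, below %d%%)",
			pct(s.MinEyeOpen), pct(p.EyeOpenSoft)) // R-EYE-2
	}
	return Keep, ""
}

func main() {
	weddingDefault := Profile{EyeOpenSoft: 0.50, EyeOpenHard: 0} // seeds from §4.2
	v, reason := eyeGate(Signals{FaceCount: 2, MinEyeOpen: 0.2}, weddingDefault)
	fmt.Println(v, reason) // prints: maybe subject blinking (20% eye-open, below 50%)
}
```

Because the function depends only on (Signals, Profile), each branch can be pinned by a table test with no analyzer or model in the loop, which is the property R-EYE-5 asks for.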
4.4 Burst ranking uses eyes (pure, ships now)
Within a near-duplicate burst (pkg/pipeline best-of-burst), the kept frame is
currently the sharpest. Eyes should inform that pick:
- R-EYE-6 (SHOULD) Among a near-duplicate cluster, best-of-burst prefers the frame with open eyes before sharpness when one frame is blinking and another is not (a marginally softer open-eyed frame beats a sharp blink). Deterministic and unit-tested (R-DUP-2). The exact ranking weight is a tuning detail seeded conservatively and left profile-adjustable.
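One way to read R-EYE-6 is a two-key ordering: not-blinking first, then sharpness. The sketch below takes that reading as an assumption. The Frame struct, the bestOfBurst name, and the use of a single blink threshold are illustrative, and the real ranking weight is left open by the spec.

```go
package main

import "fmt"

// Frame is a hypothetical per-frame summary used for burst ranking.
type Frame struct {
	Name       string
	Sharpness  float64
	FaceCount  int
	MinEyeOpen float64 // -1 when unknown
}

// blinking reports whether the frame's worst counted face is below the blink
// threshold. Frames with no faces or unknown eyes are never "blinking".
func blinking(f Frame, threshold float64) bool {
	return f.FaceCount > 0 && f.MinEyeOpen >= 0 && f.MinEyeOpen < threshold
}

// bestOfBurst returns the index of the frame to keep, or -1 for an empty
// burst. Ordering: non-blinking beats blinking, then higher sharpness wins;
// ties keep the earliest frame, so the pick is deterministic.
func bestOfBurst(burst []Frame, threshold float64) int {
	if len(burst) == 0 {
		return -1
	}
	best := 0
	for i := 1; i < len(burst); i++ {
		bi, bb := blinking(burst[i], threshold), blinking(burst[best], threshold)
		if bi != bb {
			if !bi {
				best = i // open eyes beat a blink regardless of sharpness
			}
			continue
		}
		if burst[i].Sharpness > burst[best].Sharpness {
			best = i
		}
	}
	return best
}

func main() {
	burst := []Frame{
		{Name: "a.jpg", Sharpness: 0.90, FaceCount: 1, MinEyeOpen: 0.2}, // sharp blink
		{Name: "b.jpg", Sharpness: 0.80, FaceCount: 1, MinEyeOpen: 0.9}, // softer, eyes open
	}
	fmt.Println(burst[bestOfBurst(burst, 0.50)].Name) // prints: b.jpg
}
```

A strict ordering like this treats any open-eyed frame as better than any blink, however soft it is. A profile-adjustable weight, as the requirement anticipates, would bound how much sharpness may be traded away.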
5. Pipeline wiring (pure orchestration, ships now)
pkg/pipeline.Cull gains an optional injected face.Analyzer:
- R-FACE-7 (MUST) When no analyzer is injected, Cull behaves exactly as today (eye signal absent, MinEyeOpen = -1); eye support is purely additive and cannot regress the model-free cull.
- R-FACE-8 (MUST) When an analyzer is injected, judge calls Analyze(ctx, img) on the same decoded image used for quality/hash, projects the eye signal (§4.1), and feeds it to Resolve. A per-frame analyzer error or timeout degrades gracefully to MinEyeOpen = -1 for that frame (the cull still completes; the frame is logged, not failed). Eyes are a signal, not a gate on the run.
- R-FACE-9 (MUST) Analysis runs against the cull-time image (the embedded JPEG preview per 0001 §13-Q4), not a full RAW decode.
- R-FACE-10 (SHOULD) Per-frame eye results are cached under .krites/analysis/ like other signals (R-GLOBAL-6), keyed by frame + model id, so --reanalyze controls recompute.
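The injection and graceful-degrade behaviour in R-FACE-3, R-FACE-7, and R-FACE-8 can be illustrated with a functional option and a fake analyzer. Culler, Option, WithFaceAnalyzer, the 2-second timeout, and the demo helper are hypothetical names and values for this sketch only. The §4.1 size floor is omitted to keep it short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"image"
	"log"
	"time"
)

const EyeUnknown = -1.0

type Face struct{ EyeOpen float64 }
type Result struct{ Faces []Face }

// Analyzer is the §3 interface.
type Analyzer interface {
	Analyze(ctx context.Context, img image.Image) (Result, error)
}

// Culler and its option are hypothetical; they stand in for pkg/pipeline.
type Culler struct {
	analyzer Analyzer
	timeout  time.Duration
}

type Option func(*Culler)

// WithFaceAnalyzer injects the analyzer; there is no package-level hook.
func WithFaceAnalyzer(a Analyzer) Option { return func(c *Culler) { c.analyzer = a } }

func NewCuller(opts ...Option) *Culler {
	c := &Culler{timeout: 2 * time.Second} // placeholder per-frame timeout
	for _, o := range opts {
		o(c)
	}
	return c
}

// eyeSignal returns (FaceCount, MinEyeOpen) for one frame and never fails the run.
func (c *Culler) eyeSignal(ctx context.Context, img image.Image) (int, float64) {
	if c.analyzer == nil {
		return 0, EyeUnknown // no analyzer injected: behave as today
	}
	ctx, cancel := context.WithTimeout(ctx, c.timeout)
	defer cancel()
	res, err := c.analyzer.Analyze(ctx, img)
	if err != nil {
		log.Printf("face analysis skipped for frame: %v", err)
		return 0, EyeUnknown // degrade for this frame only
	}
	worst := EyeUnknown
	for _, f := range res.Faces {
		if f.EyeOpen >= 0 && (worst == EyeUnknown || f.EyeOpen < worst) {
			worst = f.EyeOpen
		}
	}
	return len(res.Faces), worst
}

// fakeAnalyzer is a canned test double that honours ctx cancellation.
type fakeAnalyzer struct {
	res Result
	err error
}

func (f fakeAnalyzer) Analyze(ctx context.Context, _ image.Image) (Result, error) {
	if err := ctx.Err(); err != nil {
		return Result{}, err
	}
	return f.res, f.err
}

var errDemo = errors.New("model not loaded")

// demo runs eyeSignal against a tiny blank frame.
func demo(c *Culler) (int, float64) {
	return c.eyeSignal(context.Background(), image.NewGray(image.Rect(0, 0, 4, 4)))
}

func main() {
	fmt.Println(demo(NewCuller()))
	fmt.Println(demo(NewCuller(WithFaceAnalyzer(fakeAnalyzer{res: Result{Faces: []Face{{0.9}, {0.3}}}}))))
	fmt.Println(demo(NewCuller(WithFaceAnalyzer(fakeAnalyzer{err: errDemo}))))
}
```

Because each Culler owns its analyzer, parallel tests can each inject their own fake without sharing state.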
The studio already renders verdict reasons and a providers indicator; eye reasons
and the faces provider status flow through the existing surfaces with no new UI
contract (the 0003 filter chips already list blinks).
6. The ONNX adapter (pkg/face/onnx) — built and validated on Linux
This is the native backend. It is fully developed and accuracy-validated on linux/amd64 with the CPU EP; the M-series adds the CoreML EP.
- R-MLR-1 (MUST) Inference uses ONNX Runtime via purego, with no CGO (0001 §13-Q1). The ORT shared library is loaded at runtime (libonnxruntime.so on linux dev, …dylib on darwin prod); its path is resolved from config with a documented default.
- R-MLR-2 (MUST) The execution provider is config-selected: cpu (dev default) or coreml (darwin). Selecting an EP unavailable on the host falls back to CPU with a logged disclosure, never a hard failure, so a dev build on Linux and a misconfigured Mac both still run.
- R-MLR-3 (MUST) The real-model test is INT_TEST=1-gated, in *_integration_test.go, run via just test-integration (0001 §10). It runs on CPU EP on Linux in CI/dev and asserts known fixtures (open-eyed → high EyeOpen; blink → low). Not build-tag gated.
- R-MLR-4 (MUST) Models are not vendored into the repo (for size and licensing reasons); they are fetched/resolved out-of-band to a cache dir and referenced by id + checksum. The adapter validates the checksum before load.
- R-MLR-5 (SHOULD) The adapter exposes a model id (name + version) surfaced in the analysis cache key (R-FACE-10) so a model change invalidates stale results.
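The EP fallback (R-MLR-2) and the checksum gate (R-MLR-4) are plain logic that can be sketched without touching ONNX Runtime at all. The function names here are hypothetical. SHA-256 is an assumption, since the spec says only "checksum", and EP availability is approximated by GOOS, whereas a real adapter would presumably probe the loaded ORT library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"runtime"
)

// resolveEP maps the configured execution provider to one usable on the host.
// A non-empty disclosure means a fallback happened and should be logged.
// Assumption: coreml is treated as available only when goos is "darwin".
func resolveEP(configured, goos string) (ep, disclosure string) {
	switch {
	case configured == "coreml" && goos == "darwin":
		return "coreml", ""
	case configured == "" || configured == "cpu":
		return "cpu", "" // dev default
	default:
		return "cpu", fmt.Sprintf(
			"execution provider %q unavailable on %s; falling back to cpu",
			configured, goos)
	}
}

// verifyModel checks model bytes against an expected hex digest before load.
// Assumption: the checksum is SHA-256.
func verifyModel(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != wantHex {
		return fmt.Errorf("model checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	ep, note := resolveEP("coreml", runtime.GOOS)
	fmt.Println("ep:", ep)
	if note != "" {
		fmt.Println("disclosure:", note)
	}
	// "abc" stands in for model bytes; the digest is the SHA-256 of "abc".
	err := verifyModel([]byte("abc"),
		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
	fmt.Println("checksum ok:", err == nil)
}
```

Keeping both decisions in pure functions means the fallback and the checksum failure path can be unit-tested on the Linux dev box, leaving only the actual EP registration to the integration test.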
6.1 Model strategy — both, config-selected (§9-Q1 resolved)
Two shapes, both ONNX, both runnable on CPU EP, both behind face.Analyzer
so they are interchangeable and A/B-comparable on real frames:
- ear: detector + landmarks → eye-aspect-ratio. A face-landmark model (68-pt / mesh) → compute EAR (eye height ÷ width) geometrically → map to EyeOpen. One model; the open/closed decision is our pure, unit-tested code (testable + tunable on Linux without the model). Weaker on extreme head poses.
- classifier: detector + eye-state classifier. A small face detector (YuNet / SCRFD / BlazeFace-class) → crop eye regions → a lightweight open/closed CNN. Direct EyeOpen; typically more pose/lighting-robust; two models, and the decision is opaque to our tests.
- R-MLR-6 (MUST) The strategy is config-selected (face.onnx.strategy: ear | classifier); both emit the same face.Result so the engine and cull signal are identical regardless. Default: ear (most validated on Linux, smallest model surface). The classifier is wired for comparison, not abandoned; Hailey A/Bs them on her own frames once on the M-series.
Both share the same purego/ORT plumbing in pkg/face/onnx; the strategy is a
graph + post-processing switch, not a separate package.
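For the ear strategy, the geometric step is small enough to sketch. The version below uses the common six-landmark eye-aspect-ratio form (two vertical lid distances over twice the corner-to-corner width), which is one concrete reading of "eye height ÷ width". The landmark ordering, the linear EAR → EyeOpen mapping, and the 0.15 / 0.30 calibration bounds are placeholders assumed for illustration, not values from this spec.

```go
package main

import (
	"fmt"
	"math"
)

// EyeUnknown mirrors the -1 sentinel from R-FACE-5.
const EyeUnknown = -1.0

// Point is a 2-D landmark in pixel coordinates.
type Point struct{ X, Y float64 }

func dist(a, b Point) float64 { return math.Hypot(a.X-b.X, a.Y-b.Y) }

// EAR computes the eye aspect ratio from six landmarks. Assumed ordering:
// p[0] outer corner, p[1] and p[2] upper lid, p[3] inner corner, p[4] and
// p[5] lower lid, with p[1]/p[5] and p[2]/p[4] vertically paired.
// ok is false for a degenerate eye (zero width), which the caller should
// report as unknown rather than coerce to a number.
func EAR(p [6]Point) (ear float64, ok bool) {
	width := dist(p[0], p[3])
	if width == 0 {
		return 0, false
	}
	return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2 * width), true
}

// EyeOpenFromEAR maps an EAR value linearly onto 0..1 between a "closed" and
// an "open" calibration point, clamping outside that range. A calibration
// with open <= closed is treated as unknown.
func EyeOpenFromEAR(ear, closed, open float64) float64 {
	if open <= closed {
		return EyeUnknown
	}
	v := (ear - closed) / (open - closed)
	return math.Max(0, math.Min(1, v))
}

func main() {
	// A synthetic wide-open eye: 4 units wide, 2 units tall at both lid pairs.
	eye := [6]Point{{0, 0}, {1, 1}, {3, 1}, {4, 0}, {3, -1}, {1, -1}}
	ear, ok := EAR(eye)
	fmt.Println(ear, ok) // prints: 0.5 true
	// 0.15 and 0.30 are placeholder bounds, to be calibrated on real frames.
	fmt.Println(EyeOpenFromEAR(ear, 0.15, 0.30)) // prints: 1
}
```

How the two eyes of one face combine into a single EyeOpen (minimum, mean, or something else) is not specified above and is left out of the sketch.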
7. Optional pure-Go stopgap (pkg/face/pigo)
To exercise the whole feature end-to-end on Linux with real detections (not
just canned fakes) before the ONNX path lands — and to keep a WASM-safe backend
in the catalog — a pigo-based adapter (pure-Go face + pupil detection, cgo-free)
can sit behind face.Analyzer. It approximates eye state from pupil-detection
confidence / geometry; weaker than a CNN but adds zero native dependency.
- R-FACE-11 (MAY) A pkg/face/pigo adapter implements face.Analyzer in pure Go for dev/demo and as a WASM-capable fallback. It is swappable for pkg/face/onnx with no call-site change and is clearly labelled lower-fidelity.
Whether to build it at all was decided in §9-Q2: skip it. The ONNX-CPU adapter already runs on Linux, so a throwaway dependency buys little once §6 lands, and the fake analyzer already de-risks the pure integration.
8. Implementation plan (TDD/BDD, mapped to the platform split)
Layer 1 — platform-agnostic, no models (build + ship on Linux):
1. pkg/analyze/face: Analyzer interface, types, signal.go projection, fake.
Mocks via just mocks. (R-FACE-1..6)
2. pkg/cull: add FaceCount/MinEyeOpen to Signals; EyeOpenSoft/EyeOpenHard/
MinFaceBox to Profile; extend Resolve; seed wedding-default. Golden
unit tests for every branch. (R-EYE-1..5)
3. pkg/pipeline: inject optional face.Analyzer; nil-safe; graceful per-frame
degrade; eye-aware best-of-burst. Unit tests with a fake. (R-FACE-7,8,
R-EYE-6)
Layer 2 — the real adapter, built + validated on Linux (CPU EP):
4. pkg/face/onnx: purego ORT binding, config-driven EP (cpu default), model
load + checksum, Analyze. (R-MLR-1..6)
5. Wire the analyzer into the cull command + studio cull action (config-
selected provider, off by default), so the binary can produce eye verdicts.
6. Pick + wire the model(s) per §6.1; INT_TEST=1 integration test asserting
fixtures on CPU EP in CI. Calibrate seeds against real previews.
7. BDD: extend features/culling.feature with blink scenarios (open-eyed
keep, blink → maybe, group-with-one-blink, background-blinker-ignored), driven
by the real provider against committed eye fixtures. Deferred to here — not
Layer 1 — because the e2e suite drives the compiled binary, which has no
analyzer until step 5 (injecting a fake into the shipping binary would be a
test-only hook, which the keryx rule forbids).
Layer 3 — Apple-Silicon remainder (deferred to the M-series Mac):
8. Wire + validate the CoreML EP; confirm CPU↔CoreML parity on-device.
9. Bundle the osx-arm64 ORT dylib + goreleaser packaging / signing /
notarization; latency benchmark on the Neural Engine.
Definition of done for the Linux-shippable slice: Layers 1–2 green under
just ci (incl. wasm-check) and just test-integration; docs updated; the
studio shows eye reasons. Layer 3 is tracked but explicitly out of this slice.
8.1 Deferred polish (from the Layer-1 code review)
Tracked, low-priority refinements not blocking the slice:
- Profile validation: EyeOpenHard <= EyeOpenSoft. A profile misconfigured with hard > soft auto-rejects a band meant only to warn. Belongs to the broader cull-profile validation story (krites profile …), not a per-feature guard. Until then, the seeds are correct (hard = 0).
- Eye-aware demotion reason. When best-of-burst keeps an open-eyed frame over a sharper blink, the dropped frame's reason is generic ("kept the best of the burst"); it could name why (open eyes) so the pick doesn't look like a sharpness regression.
- Graded burst ranking. Ranking classifies eyes as a binary at the blink threshold, so among two sub-threshold blinks the less-shut one isn't preferred. Faithful to R-EYE-6's "open vs blink" wording; a monotonic compare would generalise it. The spec notes the weight is conservative and profile-adjustable.
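The first deferred item, the hard <= soft check, could eventually take a shape like the sketch below. validateEyeKnobs and the trimmed Profile are hypothetical names. Applying the ordering check only when the soft floor is enabled is an assumption, since the list above does not say how a disabled soft floor (0) should interact with an enabled hard gate.

```go
package main

import (
	"errors"
	"fmt"
)

// Profile carries only the eye knobs from §4.2.
type Profile struct {
	EyeOpenSoft float64
	EyeOpenHard float64
	MinFaceBox  float64
}

// validateEyeKnobs checks that every knob lies in 0..1 and that the hard gate
// does not exceed the soft floor. Assumption: the ordering check applies only
// when EyeOpenSoft is enabled (> 0).
func validateEyeKnobs(p Profile) error {
	knobs := []struct {
		name string
		v    float64
	}{
		{"eye_open_soft", p.EyeOpenSoft},
		{"eye_open_hard", p.EyeOpenHard},
		{"min_face_box", p.MinFaceBox},
	}
	for _, k := range knobs {
		if !(k.v >= 0 && k.v <= 1) { // written this way so NaN is rejected too
			return fmt.Errorf("%s = %v: must be within 0..1", k.name, k.v)
		}
	}
	if p.EyeOpenSoft > 0 && p.EyeOpenHard > p.EyeOpenSoft {
		return errors.New("eye_open_hard must not exceed eye_open_soft")
	}
	return nil
}

func main() {
	seeds := Profile{EyeOpenSoft: 0.50, EyeOpenHard: 0, MinFaceBox: 0.10}
	fmt.Println(validateEyeKnobs(seeds)) // prints: <nil>

	bad := Profile{EyeOpenSoft: 0.50, EyeOpenHard: 0.60, MinFaceBox: 0.10}
	fmt.Println(validateEyeKnobs(bad)) // prints: eye_open_hard must not exceed eye_open_soft
}
```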
9. Open questions — all resolved (2026-06-22, Matt)
- Q1: model shape. ✅ Both, config-selected (face.onnx.strategy: ear | classifier); seed ear, classifier slots in behind the same interface for A/B (§6.1, R-MLR-6). Gates Layer 2 only.
- Q2: pigo stopgap? ✅ Skip; go straight to ONNX-CPU (pkg/face/onnx runs on Linux via CPU EP; the fake covers Layer 1). pkg/face/pigo is not built (R-FACE-11 stays a MAY, unexercised).
- Q3: eye-reject default. ✅ Maybe, never auto-reject: wedding-default keeps EyeOpenHard = 0 (R-EYE-3). Hard-reject is opt-in.
- Q4: multi-face weighting. ✅ Worst face above a min-size floor: a MinFaceBox profile knob; MinEyeOpen is the min EyeOpen over faces whose box clears the floor, so background blinkers don't demote a portrait.