0004 — face & eye/blink cull signal

Status: IN PROGRESS (2026-06-22) — companion to 0001-krites.md (master design), 0002-interface-contracts.md (the R-* test basis), and 0003-studio.md.

Decisions (2026-06-22, Matt). §9 open questions resolved: Q1 — both model shapes, config-selected (face.onnx.strategy: ear | classifier); seed with EAR, classifier slots in behind the same interface for A/B on real frames (gates Layer 2 only). Q2 — skip the pigo stopgap, go straight to ONNX-CPU. Q3 — blink demotes to maybe, never auto-rejects (wedding-default keeps EyeOpenHard = 0). Q4 — worst face above a min-size floor drives MinEyeOpen (background blinkers ignored).

Scope: the FaceAnalyzer provider seam and the eye/blink cull signal — the one Phase-1 cull capability (0001 §11) not yet built. This spec closes that gap. It deliberately splits the work into a platform-agnostic majority (interface, cull-signal logic, profile, pipeline wiring, fakes — all built and tested on the amd64 Linux dev box) and a small Apple-Silicon remainder (the CoreML execution provider + dylib packaging) deferred to an M-series Mac. It implements the runtime decision recorded in 0001 §13-Q1 (ONNX Runtime + CoreML EP via purego, no CGO, bundled dylib).


1. Why this spec, and the constraint that shapes it

Phase 1 is otherwise complete: ingest → quality (blur/exposure) + perceptual-hash burst dedup → verdicts → studio review → export + XMP. The missing signal is eyes/blinks (0001 §4.2, the "Eyes / blinks" row of the signal table): per face, are the subject's eyes open or closed. A blink on a wedding frame is one of the most common reasons a sharp, well-exposed shot is still unusable — so it is core cull value, not a nice-to-have.

The honest constraint: the production target is macOS / arm64 with the CoreML execution provider (0001 §13-Q6, Q1), and we are developing on linux / amd64. This spec exists to prove — and pin in requirements — that almost all of the eye/blink feature is platform-neutral and ships now, and that the Apple-specific part is a configuration swap, not a reimplementation.

1.1 The platform split (load-bearing)

ONNX Runtime is cross-platform; the CoreML EP is the only Apple-specific piece. The same model, the same purego binding, the same inference code run under different execution providers:

| | dev (today) | prod (on the M-series Mac) |
|---|---|---|
| OS / arch | linux / amd64 | darwin / arm64 |
| ORT shared lib | libonnxruntime.so (linux-x64) | libonnxruntime.dylib (osx-arm64) |
| Execution provider | CPUExecutionProvider | CoreMLExecutionProvider |
| Numerics | reference | same (CoreML accelerates, doesn't change the maths) |

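Because the CPU EP is expected to yield the same outputs as CoreML up to floating-point tolerance, accuracy is validated on Linux; the M-series should change only latency. CPU↔CoreML parity is confirmed on-device in Layer 3 (§8). This is what makes the deferral safe.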


2. Package layout & the WASM boundary

0001 §7 already names pkg/analyze/face as "face / eye / expression (provider-backed)" and §12.1 / just wasm-check require the deterministic engine (including pkg/analyze/...) to stay cgo-free and WASM-compilable. A purego+dylib backend is not WASM-compatible (0001 §12.1, explicit). The resolution is a clean split:

pkg/analyze/face/            # PURE — interface + types + signal logic. WASM-safe.
  face.go                    #   Analyzer interface, Face, Result, Options
  signal.go                  #   Result → cull eye-signal projection (deterministic)
  face_test.go               #   table tests, fake analyzer
pkg/face/onnx/               # NATIVE — purego + ORT dylib. NOT in wasm-check.
  onnx.go                    #   Analyzer impl; EP selected from config
  models/                    #   (referenced, not vendored — see §6)
  onnx_integration_test.go   #   INT_TEST=1 gated; runs the real model on CPU EP
pkg/face/pigo/               # OPTIONAL pure-Go stopgap (see §7). WASM-safe.
  • R-FACE-1 (MUST) pkg/analyze/face contains only the Analyzer interface, its value types, and pure signal logic. It imports no native/cgo or purego+dylib code and stays in the just wasm-check build set.
  • R-FACE-2 (MUST) Every concrete model-backed analyzer (pkg/face/onnx, any cloud adapter) lives in its own package behind face.Analyzer and is excluded from wasm-check (0001 §7, §12.1).
  • R-FACE-3 (MUST) The analyzer is injected into the pipeline (functional option / struct field), never reached through a package-level mock hook (the keryx rule, 0001 §10) — so tests stay t.Parallel()-safe.
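
A minimal sketch of the injection R-FACE-3 asks for, assuming a functional-option style. Culler, Option, NewCuller, WithFaceAnalyzer, HasEyes and stubAnalyzer are hypothetical names used for illustration only; the real pipeline type may look different.

```go
package main

import (
	"context"
	"fmt"
	"image"
)

// Face and Result are trimmed copies of the §3 value types.
type Face struct{ EyeOpen float64 }

type Result struct{ Faces []Face }

// Analyzer mirrors the face.Analyzer seam from §3.
type Analyzer interface {
	Analyze(ctx context.Context, img image.Image) (Result, error)
}

// Culler is a hypothetical stand-in for the pipeline's configuration.
type Culler struct {
	faces Analyzer // nil means "no eye signal"
}

// Option configures a Culler at construction time.
type Option func(*Culler)

// WithFaceAnalyzer injects the analyzer. Nothing is stored at package level,
// so two tests can build differently configured Cullers in parallel.
func WithFaceAnalyzer(a Analyzer) Option {
	return func(c *Culler) { c.faces = a }
}

// NewCuller applies the options in order.
func NewCuller(opts ...Option) *Culler {
	c := &Culler{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// HasEyes reports whether an analyzer was injected.
func (c *Culler) HasEyes() bool { return c.faces != nil }

// stubAnalyzer is a do-nothing implementation for the example.
type stubAnalyzer struct{}

func (stubAnalyzer) Analyze(context.Context, image.Image) (Result, error) {
	return Result{}, nil
}

func main() {
	plain := NewCuller()
	withEyes := NewCuller(WithFaceAnalyzer(stubAnalyzer{}))
	fmt.Println(plain.HasEyes(), withEyes.HasEyes())
}
```

Because each Culler owns its analyzer, a test that injects a fake never affects another test running alongside it.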

3. The FaceAnalyzer interface (pure, ships now)

A provider-neutral contract over "find faces, report per-face eye state". Decode is already the caller's job (the pipeline holds the decoded image.Image), so the analyzer takes pixels, not a path.

// Package face is the provider seam for face / eye-state analysis. Pure: the
// interface and its types only — concrete model backends live in their own
// packages (R-FACE-2).
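package face

import (
    "context"
    "image"
)

// EyeUnknown is the EyeOpen sentinel for a face whose eyes cannot be assessed
// (R-FACE-5).
const EyeUnknown = -1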

// Analyzer finds faces in a frame and reports per-face eye state. Implementations
// are model-backed (pkg/face/onnx) or faked in tests; the engine codes only to
// this interface.
type Analyzer interface {
    // Analyze reports the faces detected in img. It MUST be deterministic for a
    // given (img, model) and MUST honour ctx cancellation/timeout (R-GLOBAL-8).
    Analyze(ctx context.Context, img image.Image) (Result, error)
}

// Result is the per-frame face analysis.
type Result struct {
    Faces []Face
}

// Face is one detected face and its eye state.
type Face struct {
    // Box is the face bounding box in pixel coords.
    Box image.Rectangle
    // Confidence is the detector's face-presence score (0..1).
    Confidence float64
    // EyeOpen is the probability (0..1) that the eyes are open. 1 = wide open,
    // 0 = fully closed/blinking. A model that cannot judge eyes reports a
    // documented sentinel (see R-FACE-5).
    EyeOpen float64
}
  • R-FACE-4 (MUST) Analyze is deterministic given the same image and model (R-GLOBAL-7) and respects ctx with a timeout (R-GLOBAL-8).
  • R-FACE-5 (MUST) EyeOpen is a per-face probability in [0,1]. When a backend detects a face but cannot assess eyes, it reports EyeOpen = -1 (unknown), and the cull-signal projection treats unknown as "do not penalise" (R-EYE-4). Unknown is never silently coerced to a number in [0,1].
  • R-FACE-6 (MUST) A frame with no faces yields Result{Faces: nil} and is never penalised for eyes (a landscape/detail shot has no eyes to close).

4. The eye/blink cull signal (pure, ships now)

The deterministic projection from a face Result to the cull signal, plus the verdict semantics. This is the heart of the feature and is entirely platform-neutral.

4.1 New signal fields

pkg/cull.Signals gains:

// FaceCount is the number of faces large enough to drive the eye signal. It
// keeps a zero-value Signals safe: FaceCount == 0 means "no eyes judged", so a
// zero-valued MinEyeOpen (0.0) is not mistaken for "eyes fully closed".
FaceCount int
// MinEyeOpen is the lowest per-face eye-open probability (0..1) among the
// counted faces, or -1 (face.EyeUnknown) when eye state is unknown. The "worst"
// face drives the signal: one blinking subject spoils a group shot.
MinEyeOpen float64

EyeSignal in pkg/analyze/face computes both from a Result and the frame bounds, counting only faces whose box clears a minimum-size fraction of the frame (Q4 — background blinkers are ignored) and taking MinEyeOpen as the min over those with a known EyeOpen. FaceCount is not redundant with the -1 sentinel: the zero value of MinEyeOpen is a valid probability (eyes closed), so a presence flag is what makes a zero-value Signals skip the gate. The size floor is passed in from the profile (MinFaceBox, §4.2).

4.2 New profile knobs

pkg/cull.Profile gains (config-driven, never Go constants — 0001 §6):

// EyeOpenSoft is the eye-open soft floor (0..1): a frame whose worst face is
// below it (but at/above EyeOpenHard) is demoted to maybe ("subject blinking").
// 0 disables the eye signal entirely.
EyeOpenSoft float64 `yaml:"eye_open_soft" json:"eyeOpenSoft"`
// EyeOpenHard is the eye-open hard gate (0..1): below it the frame is rejected
// (eyes clearly shut). 0 disables the hard gate (soft-only).
EyeOpenHard float64 `yaml:"eye_open_hard" json:"eyeOpenHard"`
// MinFaceBox is the minimum face-box size as a fraction of the frame's smaller
// dimension (0..1) for a face to count toward the eye signal; faces below it
// (distant background guests) are ignored. 0 counts every detected face.
MinFaceBox float64 `yaml:"min_face_box" json:"minFaceBox"`

Seed values for wedding-default (tunable starting points, calibrated against real previews — like the sharpness seeds):

| knob | seed | rationale |
|---|---|---|
| EyeOpenSoft | 0.50 | a likely blink → maybe, surfaced for review |
| EyeOpenHard | 0.0 (disabled) | no auto-reject on eyes by default (see R-EYE-3) |
| MinFaceBox | 0.10 | ignore faces below 10% of the frame's short side (background guests) |

4.3 Verdict semantics

cull.Resolve extends with the eye gate, slotted alongside the existing hard/soft logic (focus, exposure):

  • R-EYE-1 (MUST) When FaceCount > 0 and MinEyeOpen is known and EyeOpenHard > 0 and MinEyeOpen < EyeOpenHard → reject with reason "eyes closed (NN% open, below MM%)".
  • R-EYE-2 (MUST) Else when MinEyeOpen is known and EyeOpenSoft > 0 and MinEyeOpen < EyeOpenSoft → maybe with reason "subject blinking (NN% eye-open, below MM%)" (does not override an existing hard reject from focus/exposure).
  • R-EYE-3 (MUST) The wedding-default seed sets EyeOpenHard = 0 (disabled): a blink demotes to maybe, never auto-rejects. Rationale: a closed-eye frame may still be the only record of a moment; the human disposes (R-CULL-3). Hard-reject-on-blink is opt-in by raising EyeOpenHard.
  • R-EYE-4 (MUST) MinEyeOpen == -1 (no faces, or eyes unknown) never affects the verdict — focus/exposure/dedup decide as they do today.
  • R-EYE-5 (MUST) Resolution stays pure given (Signals, Profile) (R-CULL-4); the eye fields are just more signal. Golden-fixture unit tests pin every branch.
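
The eye branch of these rules can be sketched as a pure function. This covers only the eye gate in isolation; how it is merged with the focus/exposure verdicts inside cull.Resolve is not shown, and eyeGate, Verdict and the constant names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Signals and Profile carry only the eye-related fields from §4.1 and §4.2.
type Signals struct {
	FaceCount  int
	MinEyeOpen float64
}

type Profile struct {
	EyeOpenSoft float64
	EyeOpenHard float64
}

// Verdict and its values are hypothetical names for the example.
type Verdict string

const (
	Keep   Verdict = "keep"
	Maybe  Verdict = "maybe"
	Reject Verdict = "reject"
)

// pct renders a 0..1 probability as a whole percentage.
func pct(v float64) int { return int(math.Round(v * 100)) }

// eyeGate returns the verdict the eye signal alone would contribute.
func eyeGate(s Signals, p Profile) (Verdict, string) {
	// R-EYE-4: no counted faces or unknown eyes never affect the verdict.
	if s.FaceCount == 0 || s.MinEyeOpen < 0 {
		return Keep, ""
	}
	switch {
	case p.EyeOpenHard > 0 && s.MinEyeOpen < p.EyeOpenHard: // R-EYE-1
		return Reject, fmt.Sprintf("eyes closed (%d%% open, below %d%%)",
			pct(s.MinEyeOpen), pct(p.EyeOpenHard))
	case p.EyeOpenSoft > 0 && s.MinEyeOpen < p.EyeOpenSoft: // R-EYE-2
		return Maybe, fmt.Sprintf("subject blinking (%d%% eye-open, below %d%%)",
			pct(s.MinEyeOpen), pct(p.EyeOpenSoft))
	}
	return Keep, ""
}

func main() {
	weddingDefault := Profile{EyeOpenSoft: 0.50, EyeOpenHard: 0}
	v, reason := eyeGate(Signals{FaceCount: 2, MinEyeOpen: 0.2}, weddingDefault)
	fmt.Println(v, reason)
}
```

A zero-value Signals passes through untouched because FaceCount is 0, which is the case §4.1 introduces the field to protect.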

4.4 Burst ranking uses eyes (pure, ships now)

Within a near-duplicate burst (pkg/pipeline best-of-burst), the kept frame is currently the sharpest. Eyes should inform that pick:

  • R-EYE-6 (SHOULD) Among a near-duplicate cluster, best-of-burst prefers the frame with open eyes before sharpness when one frame is blinking and another is not (a marginally softer open-eyed frame beats a sharp blink). Deterministic and unit-tested (R-DUP-2). The exact ranking weight is a tuning detail seeded conservatively and left profile-adjustable.
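
One way to read R-EYE-6 is a lexicographic comparison: non-blinking first, then sharpness. The sketch below assumes exactly that, with eyes treated as a binary at a blink threshold and the frame name as a final tiebreak for determinism. Frame, better, BestOfBurst and blinkBelow are hypothetical names, and the actual ranking weight may be softer than a strict ordering.

```go
package main

import "fmt"

// Frame carries the per-frame signals the burst pick needs.
type Frame struct {
	Name       string
	Sharpness  float64
	FaceCount  int
	MinEyeOpen float64 // -1 when unknown
}

// blinking is true only when eyes were judged and fall below the threshold.
func blinking(f Frame, blinkBelow float64) bool {
	return f.FaceCount > 0 && f.MinEyeOpen >= 0 && f.MinEyeOpen < blinkBelow
}

// better reports whether a should be kept over b.
func better(a, b Frame, blinkBelow float64) bool {
	ab, bb := blinking(a, blinkBelow), blinking(b, blinkBelow)
	if ab != bb {
		return bb // the frame that is not blinking wins
	}
	if a.Sharpness != b.Sharpness {
		return a.Sharpness > b.Sharpness
	}
	return a.Name < b.Name // stable tiebreak keeps the pick deterministic
}

// BestOfBurst returns the preferred frame, or false for an empty burst.
func BestOfBurst(burst []Frame, blinkBelow float64) (Frame, bool) {
	if len(burst) == 0 {
		return Frame{}, false
	}
	best := burst[0]
	for _, f := range burst[1:] {
		if better(f, best, blinkBelow) {
			best = f
		}
	}
	return best, true
}

func main() {
	burst := []Frame{
		{Name: "a.jpg", Sharpness: 0.9, FaceCount: 1, MinEyeOpen: 0.1}, // sharp blink
		{Name: "b.jpg", Sharpness: 0.7, FaceCount: 1, MinEyeOpen: 0.9}, // softer, eyes open
	}
	best, _ := BestOfBurst(burst, 0.5)
	fmt.Println(best.Name)
}
```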

5. Pipeline wiring (pure orchestration, ships now)

pkg/pipeline.Cull gains an optional injected face.Analyzer:

  • R-FACE-7 (MUST) When no analyzer is injected, Cull behaves exactly as today (eye signal absent, MinEyeOpen = -1) — eye support is purely additive and cannot regress the model-free cull.
  • R-FACE-8 (MUST) When an analyzer is injected, judge calls Analyze(ctx, img) on the same decoded image used for quality/hash, projects the eye signal (§4.1), and feeds it to Resolve. A per-frame analyzer error or timeout degrades gracefully to MinEyeOpen = -1 for that frame (the cull still completes; the frame is logged, not failed) — eyes are a signal, not a gate on the run.
  • R-FACE-9 (MUST) Analysis runs against the cull-time image — the embedded JPEG preview per 0001 §13-Q4 — not a full RAW decode.
  • R-FACE-10 (SHOULD) Per-frame eye results are cached under .krites/analysis/ like other signals (R-GLOBAL-6), keyed by frame + model id, so --reanalyze controls recompute.

The studio already renders verdict reasons and a providers indicator; eye reasons and the faces provider status flow through the existing surfaces with no new UI contract (the 0003 filter chips already list blinks).


6. The ONNX adapter (pkg/face/onnx) — built and validated on Linux

This is the native backend. It is fully developed and accuracy-validated on linux/amd64 with the CPU EP; the M-series adds the CoreML EP.

  • R-MLR-1 (MUST) Inference uses ONNX Runtime via purego, no CGO (0001 §13-Q1). The ORT shared library is loaded at runtime (libonnxruntime.so on linux dev, …dylib on darwin prod); its path is resolved from config with a documented default.
  • R-MLR-2 (MUST) The execution provider is config-selected: cpu (dev default) or coreml (darwin). Selecting an EP unavailable on the host falls back to CPU with a logged disclosure, never a hard failure — so a dev build on Linux and a misconfigured Mac both still run.
  • R-MLR-3 (MUST) The real-model test is INT_TEST=1-gated, in *_integration_test.go, run via just test-integration (0001 §10) — it runs on CPU EP on Linux in CI/dev and asserts known fixtures (open-eyed → high EyeOpen; blink → low). Not build-tag gated.
  • R-MLR-4 (MUST) Models are not vendored into the repo (size + licensing); they are fetched/resolved out-of-band to a cache dir and referenced by id + checksum. The adapter validates the checksum before load.
  • R-MLR-5 (SHOULD) The adapter exposes a model id (name + version) surfaced in the analysis cache key (R-FACE-10) so a model change invalidates stale results.
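
The fallback rule in R-MLR-2 can be sketched independently of ONNX Runtime. Everything here is illustrative: EP, available and resolveEP are hypothetical names, and the availability check is a stand-in keyed on the OS alone, where a real adapter would presumably ask the loaded library which providers it supports.

```go
package main

import (
	"fmt"
	"log"
	"runtime"
)

// EP is a config-level execution-provider name.
type EP string

const (
	EPCPU    EP = "cpu"
	EPCoreML EP = "coreml"
)

// available is a stand-in check (assumption): CPU everywhere, CoreML on
// darwin only. Unknown names are never available.
func available(ep EP, goos string) bool {
	switch ep {
	case EPCPU:
		return true
	case EPCoreML:
		return goos == "darwin"
	}
	return false
}

// resolveEP returns the provider to use and, when it had to fall back, a
// disclosure message for the log. It never returns an error.
func resolveEP(requested EP, goos string) (EP, string) {
	if available(requested, goos) {
		return requested, ""
	}
	return EPCPU, fmt.Sprintf("execution provider %q unavailable on %s; falling back to cpu", requested, goos)
}

func main() {
	ep, note := resolveEP(EPCoreML, runtime.GOOS)
	if note != "" {
		log.Print(note)
	}
	fmt.Println("using", ep)
}
```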

6.1 Model strategy — both, config-selected (§9-Q1 resolved)

Two shapes, both ONNX, both runnable on CPU EP, both behind face.Analyzer so they are interchangeable and A/B-comparable on real frames:

  1. ear — detector + landmarks → eye-aspect-ratio. A face-landmark model (68-pt / mesh) → compute EAR (eye height ÷ width) geometrically → map to EyeOpen. One model; the open/closed decision is our pure, unit-tested code (testable + tunable on Linux without the model). Weaker on extreme head poses.
  2. classifier — detector + eye-state classifier. A small face detector (YuNet / SCRFD / BlazeFace-class) → crop eye regions → a lightweight open/closed CNN. Direct EyeOpen; typically more pose/lighting-robust; two models, and the decision is opaque to our tests.

  • R-MLR-6 (MUST) The strategy is config-selected (face.onnx.strategy: ear | classifier); both emit the same face.Result so the engine and cull signal are identical regardless. Default: ear (most validated on Linux, smallest model surface). The classifier is wired for comparison, not abandoned: Hailey A/Bs them on her own frames once on the M-series.

Both share the same purego/ORT plumbing in pkg/face/onnx; the strategy is a graph + post-processing switch, not a separate package.
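
For the ear strategy, the geometric step can be sketched with one common six-landmark formulation of the eye aspect ratio: the two vertical lid distances divided by twice the corner-to-corner width. This is an assumption about which EAR variant is used. The linear mapping to EyeOpen and its thresholds (0.15 closed, 0.30 open) are illustrative values, not calibrated seeds, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a 2-D landmark in pixel coordinates.
type Point struct{ X, Y float64 }

func dist(a, b Point) float64 { return math.Hypot(a.X-b.X, a.Y-b.Y) }

// EAR computes the eye aspect ratio from six landmarks ordered p1..p6:
// p1 and p4 are the eye corners, p2 and p3 lie on the upper lid, p5 and p6 on
// the lower lid (p2 above p6, p3 above p5).
func EAR(eye [6]Point) float64 {
	width := dist(eye[0], eye[3])
	if width == 0 {
		return 0 // degenerate landmarks
	}
	return (dist(eye[1], eye[5]) + dist(eye[2], eye[4])) / (2 * width)
}

// EyeOpenFromEAR maps an EAR to 0..1 by linear interpolation between a
// "closed" and an "open" EAR, clamped at both ends. A misconfigured range
// reports -1 (unknown) rather than inventing a probability.
func EyeOpenFromEAR(ear, closedEAR, openEAR float64) float64 {
	if openEAR <= closedEAR {
		return -1
	}
	t := (ear - closedEAR) / (openEAR - closedEAR)
	return math.Max(0, math.Min(1, t))
}

func main() {
	open := [6]Point{{0, 0}, {2, -1}, {4, -1}, {6, 0}, {4, 1}, {2, 1}}
	shut := [6]Point{{0, 0}, {2, 0}, {4, 0}, {6, 0}, {4, 0}, {2, 0}}
	for _, eye := range [][6]Point{open, shut} {
		ear := EAR(eye)
		fmt.Printf("EAR=%.3f EyeOpen=%.2f\n", ear, EyeOpenFromEAR(ear, 0.15, 0.30))
	}
}
```

Because this step is plain geometry, it can be table-tested on Linux with hand-built landmarks and no model loaded.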


7. Optional pure-Go stopgap (pkg/face/pigo)

To exercise the whole feature end-to-end on Linux with real detections (not just canned fakes) before the ONNX path lands — and to keep a WASM-safe backend in the catalog — a pigo-based adapter (pure-Go face + pupil detection, cgo-free) can sit behind face.Analyzer. It approximates eye state from pupil-detection confidence / geometry; weaker than a CNN but adds zero native dependency.

  • R-FACE-11 (MAY) A pkg/face/pigo adapter implements face.Analyzer in pure Go for dev/demo and as a WASM-capable fallback. It is swappable for pkg/face/onnx with no call-site change and is clearly labelled lower-fidelity.

Decision on whether to build it at all: §9-Q2. Default recommendation: skip it — the ONNX-CPU adapter already runs on Linux, so a throwaway dep buys little once §6 lands; the fake analyzer already de-risks the pure integration.


8. Implementation plan (TDD/BDD, mapped to the platform split)

Layer 1 (platform-agnostic, no models; build + ship on Linux):

  1. pkg/analyze/face: Analyzer interface, types, signal.go projection, fake. Mocks via just mocks. (R-FACE-1..6)
  2. pkg/cull: add FaceCount/MinEyeOpen to Signals; EyeOpenSoft/EyeOpenHard to Profile; extend Resolve; seed wedding-default. Golden unit tests for every branch. (R-EYE-1..5)
  3. pkg/pipeline: inject optional face.Analyzer; nil-safe; graceful per-frame degrade; eye-aware best-of-burst. Unit tests with a fake. (R-FACE-7,8, R-EYE-6)

Layer 2 (the real adapter, built + validated on Linux with the CPU EP):

  4. pkg/face/onnx: purego ORT binding, config-driven EP (cpu default), model load + checksum, Analyze. (R-MLR-1..6)
  5. Wire the analyzer into the cull command + studio cull action (config-selected provider, off by default), so the binary can produce eye verdicts.
  6. Pick + wire the model(s) per §6.1; INT_TEST=1 integration test asserting fixtures on CPU EP in CI. Calibrate seeds against real previews.
  7. BDD: extend features/culling.feature with blink scenarios (open-eyed keep, blink → maybe, group-with-one-blink, background-blinker-ignored), driven by the real provider against committed eye fixtures. Deferred to here, not Layer 1, because the e2e suite drives the compiled binary, which has no analyzer until step 5 (injecting a fake into the shipping binary would be a test-only hook, which the keryx rule forbids).

Layer 3 (Apple-Silicon remainder, deferred to the M-series Mac):

  8. Wire + validate the CoreML EP; confirm CPU↔CoreML parity on-device.
  9. Bundle the osx-arm64 ORT dylib + goreleaser packaging / signing / notarization; latency benchmark on the Neural Engine.

Definition of done for the Linux-shippable slice: Layers 1–2 green under just ci (incl. wasm-check) and just test-integration; docs updated; the studio shows eye reasons. Layer 3 is tracked but explicitly out of this slice.

8.1 Deferred polish (from the Layer-1 code review)

Tracked, low-priority refinements not blocking the slice:

  • Profile validation EyeOpenHard <= EyeOpenSoft. A profile misconfigured with hard > soft auto-rejects a band meant only to warn. Belongs to the broader cull-profile validation story (krites profile …), not a per-feature guard. Until then, the seeds are correct (hard = 0).
  • Eye-aware demotion reason. When best-of-burst keeps an open-eyed frame over a sharper blink, the dropped frame's reason is generic ("kept the best of the burst"); it could name why (open eyes) so the pick doesn't look like a sharpness regression.
  • Graded burst ranking. Ranking classifies eyes as a binary at the blink threshold, so among two sub-threshold blinks the less-shut one isn't preferred. Faithful to R-EYE-6's "open vs blink" wording; a monotonic compare would generalise it. The spec notes the weight is conservative and profile-adjustable.
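
The deferred profile check from the first item could be as small as the sketch below. ValidateEyeKnobs is a hypothetical name, and the choice to skip the ordering check when EyeOpenSoft is 0 (disabled) is an assumption, since the text does not say how a disabled soft floor interacts with a non-zero hard gate.

```go
package main

import "fmt"

// Profile carries only the eye knobs from §4.2.
type Profile struct {
	EyeOpenSoft float64
	EyeOpenHard float64
	MinFaceBox  float64
}

// ValidateEyeKnobs checks that each knob is within [0, 1] and that an enabled
// hard gate does not sit above the soft floor.
func ValidateEyeKnobs(p Profile) error {
	knobs := []struct {
		name  string
		value float64
	}{
		{"eye_open_soft", p.EyeOpenSoft},
		{"eye_open_hard", p.EyeOpenHard},
		{"min_face_box", p.MinFaceBox},
	}
	for _, k := range knobs {
		if k.value < 0 || k.value > 1 {
			return fmt.Errorf("%s = %v: must be within [0, 1]", k.name, k.value)
		}
	}
	// Assumption: only compare the two floors when the soft floor is enabled.
	if p.EyeOpenSoft > 0 && p.EyeOpenHard > p.EyeOpenSoft {
		return fmt.Errorf("eye_open_hard (%v) must not exceed eye_open_soft (%v)",
			p.EyeOpenHard, p.EyeOpenSoft)
	}
	return nil
}

func main() {
	weddingDefault := Profile{EyeOpenSoft: 0.50, EyeOpenHard: 0, MinFaceBox: 0.10}
	fmt.Println(ValidateEyeKnobs(weddingDefault))
	fmt.Println(ValidateEyeKnobs(Profile{EyeOpenSoft: 0.30, EyeOpenHard: 0.60}))
}
```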

9. Open questions — all resolved (2026-06-22, Matt)

  • Q1 (model shape): both, config-selected (face.onnx.strategy: ear | classifier); seed ear, classifier slots in behind the same interface for A/B (§6.1, R-MLR-6). Gates Layer 2 only.
  • Q2 (pigo stopgap?): skip; go straight to ONNX-CPU (pkg/face/onnx runs on Linux via CPU EP; the fake covers Layer 1). pkg/face/pigo is not built (R-FACE-11 stays a MAY, unexercised).
  • Q3 (eye-reject default): maybe, never auto-reject; wedding-default keeps EyeOpenHard = 0 (R-EYE-3). Hard-reject is opt-in.
  • Q4 (multi-face weighting): worst face above a min-size floor, set by a MinFaceBox profile knob; MinEyeOpen is the min EyeOpen over faces whose box clears the floor, so background blinkers don't demote a portrait.