Method / 03

Constraint Indexing

After parsing and augmentation, SatIR projects each trial's SMT constraints into a quantifier-free CNF formula and stores it as a first-class relational object in a database — making retrieval over hundreds of thousands of trials a standard SQL query.

Indexing / 01

The Clinical Trial Matching Task

Formally, matching is a constraint-satisfaction problem over trials and patients. The challenge is doing it at scale.

A Clinical Trial Matching task takes a medical ontology O, trials C, patient records P, and a match objective θ, and seeks all trial–patient pairs satisfying the objective and all formal constraints:

CTM(O, T, P, θ) = Match(c, p), ∀(c, p) ∈ SatIR(Φ(O, T, P, θ))

SatIR takes the combined SMT constraints T_C, P_C, and θ, and returns as few non-matching pairs as possible while guaranteeing that every true match is returned. Key principle: if all parsed constraints are necessary conditions, SatIR achieves 100% recall with respect to the SMT constraints.

Running a full SMT solver pairwise over every (trial, patient) combination is not feasible at scale. SatIR's solution is to project each entity's constraints into a simpler relational form that can be stored offline and evaluated at query time using database joins — without sacrificing recall.

Indexing / 02

Conservative SMT → CNF Projection

Non-canonical predicates — introduced by the parser for concepts not in the ontology — cannot be matched reliably across trials and patients. SatIR eliminates them via quantifier elimination, yielding a recall-safe CNF formula over canonical predicates only.

Patient records contain concrete facts, not Boolean clauses, so quantifier elimination can be applied independently to each trial and each patient. Elimination removes variables and subformulas involving non-canonical predicates while preserving entailment — so every solution to the original SMT constraints is also a solution to the projected formula.

The projection is conservative: when a clause cannot be safely translated (e.g., a disjunction where one arm contains a non-canonical predicate), the entire clause is dropped rather than truncated. This weakens the retrieval filter, increasing false positives but never introducing false negatives. The resulting formula is written in CNF:

CNF(e) = d₁ ∧ d₂ ∧ … ∧ d_m
where each d_i = a_i1 ∨ a_i2 ∨ … ∨ a_in

Each atomic constraint a_ij uses only canonical predicates and retained canonical qualifier fields. On the trial side, clinically acceptable alternatives — such as ontology-supported generalizations introduced during augmentation — are materialized as explicit disjunctions within the clauses. The patient side remains sparse, contributing only the projected atoms directly supported by the record.

Indexing / 03

CNF as First-Class Relational Objects

Each projected CNF is stored in five relational tables — one per level of the CNF hierarchy. This lets constraints be stored once offline and evaluated repeatedly at query time via joins.

SatIR maintains two databases: a trial database C_D built offline, and a patient database P_D built at query time. Both share the same five-table schema, making retrieval a symmetric relational join between them.

Table	Signature	Role
ECNF	(e, cnf)	Links each entity (trial or patient) to its projected CNF identifier
CNFD	(cnf, d)	Expands each CNF into its m disjunctive clauses, one row per clause
DA	(d, a)	Expands each clause into its n member atoms, one row per atom
AB	(a, π, cmp, t)	Stores Boolean atoms: canonical predicate π, comparison field cmp, qualifier t
AN	(a, π, cmp, t)	Stores numerical atoms: same schema, with t as (lower, upper, inclusive) bounds

Trial-side alternatives (from augmentation) are stored as multiple atoms linked to the same clause via DA — so disjunctive flexibility is encoded structurally rather than requiring special runtime logic. New trials can be indexed incrementally without re-processing existing entries.

← Constraint Augmentation Constraint Execution →