Constraint Indexing
After parsing and augmentation, SatIR projects each trial's SMT constraints into a quantifier-free CNF formula and stores it as a first-class relational object in a database — making retrieval over hundreds of thousands of trials a standard SQL query.
A Clinical Trial Matching task takes a medical ontology O, trials C, patient records P, and a match objective θ, and seeks all trial–patient pairs satisfying the objective and all formal constraints:
CTM(O, T, P, θ) = Match(c, p), ∀(c, p) ∈ SatIR(Φ(O, T, P, θ))
SatIR takes the combined SMT constraints TC, PC, and θ, and returns as few non-matching pairs as possible while guaranteeing that every true match is returned. Key principle: if all parsed constraints are necessary conditions, SatIR achieves 100% recall with respect to the SMT constraints.
Running a full SMT solver pairwise over every (trial, patient) combination is not feasible at scale. SatIR's solution is to project each entity's constraints into a simpler relational form that can be stored offline and evaluated at query time using database joins — without sacrificing recall.
Patient records contain concrete facts, not Boolean clauses, so quantifier elimination can be applied independently to each trial and each patient. Elimination removes variables and subformulas involving non-canonical predicates while preserving entailment — so every solution to the original SMT constraints is also a solution to the projected formula.
The projection is conservative: when a clause cannot be safely translated (e.g., a disjunction where one arm contains a non-canonical predicate), the entire clause is dropped rather than truncated. This weakens the retrieval filter, increasing false positives but never introducing false negatives. The resulting formula is written in CNF:
CNF(e) = d₁ ∧ d₂ ∧ … ∧ dm
where each di = ai1 ∨ ai2 ∨ … ∨ ain
Each atomic constraint aij uses only canonical predicates and retained canonical qualifier fields. On the trial side, clinically acceptable alternatives — such as ontology-supported generalizations introduced during augmentation — are materialized as explicit disjunctions within the clauses. The patient side remains sparse, contributing only the projected atoms directly supported by the record.
SatIR maintains two databases: a trial database CD built offline, and a patient database PD built at query time. Both share the same five-table schema, making retrieval a symmetric relational join between them.
| Table | Signature | Role |
|---|---|---|
| ECNF | (e, cnf) | Links each entity (trial or patient) to its projected CNF identifier |
| CNFD | (cnf, d) | Expands each CNF into its m disjunctive clauses, one row per clause |
| DA | (d, a) | Expands each clause into its n member atoms, one row per atom |
| AB | (a, π, cmp, t) | Stores Boolean atoms: canonical predicate π, comparison field cmp, qualifier t |
| AN | (a, π, cmp, t) | Stores numerical atoms: same schema, with t as (lower, upper, inclusive) bounds |
Trial-side alternatives (from augmentation) are stored as multiple atoms linked to the same clause via DA — so disjunctive flexibility is encoded structurally rather than requiring special runtime logic. New trials can be indexed incrementally without re-processing existing entries.