Methodology
How a drug page is built, end to end.
1 · Source retrieval
Per-regulator scrapers run on cron schedules and respect each source’s rate limits. Raw payloads are retained for an audit trail.
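As a rough illustration of the two ideas in this step, the sketch below spaces out requests to a single source and stores each raw payload under its content hash. It is not our production scraper: the names `MinIntervalLimiter` and `archive_payload`, the directory layout, and the fixed-interval policy are all assumptions made for the example.

```python
import hashlib
import json
import time
from pathlib import Path


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests to one source."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def archive_payload(archive_dir, source, payload: bytes, fetched_at: str) -> Path:
    """Write the untouched payload plus a small metadata file (hypothetical layout)."""
    digest = hashlib.sha256(payload).hexdigest()
    target = Path(archive_dir) / source / f"{digest}.bin"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    meta = {
        "source": source,
        "sha256": digest,
        "fetched_at": fetched_at,
        "bytes": len(payload),
    }
    target.with_suffix(".json").write_text(json.dumps(meta))
    return target
```

A scraper would call `limiter.wait()` before each request and `archive_payload(...)` on the response body before any parsing. Naming files by hash means an unchanged payload maps to the same file on every run.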
2 · Normalization
Regulator-specific fields are mapped to a consistent schema. Active ingredients are normalized via RxNorm and the WHO INN list; brands are cross-referenced to their ingredient per country; classification follows the WHO ATC hierarchy.
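The mapping can be pictured as one field table per regulator feeding a single record type. The sketch below is illustrative only: the regulator keys, source field names, `DrugRecord`, and the tiny in-memory synonym table (standing in for real RxNorm and INN lookups) are assumptions. The ATC helper relies only on the published structure of ATC codes, whose five levels are 1, 3, 4, 5, and 7 characters long.

```python
from dataclasses import dataclass

# Hypothetical per-regulator field maps: source field name -> schema field name.
FIELD_MAPS = {
    "regulator_a": {"proprietary_name": "brand", "substance": "ingredient", "atc": "atc_code"},
    "regulator_b": {"trade_name": "brand", "active_substance": "ingredient", "atc_code": "atc_code"},
}

# Stand-in for RxNorm / INN lookups: local name -> canonical ingredient name.
INGREDIENT_SYNONYMS = {"acetaminophen": "paracetamol"}


@dataclass(frozen=True)
class DrugRecord:
    country: str
    brand: str
    ingredient: str
    atc_code: str


def atc_levels(code):
    """Return the ATC hierarchy prefixes for a code, broadest first."""
    code = code.strip().upper()
    return [code[:n] for n in (1, 3, 4, 5, 7) if len(code) >= n]


def normalize(regulator, country, raw):
    """Map one regulator-specific row onto the common schema."""
    mapping = FIELD_MAPS[regulator]
    fields = {target: raw[src] for src, target in mapping.items()}
    name = fields["ingredient"].strip().lower()
    return DrugRecord(
        country=country,
        brand=fields["brand"].strip(),
        ingredient=INGREDIENT_SYNONYMS.get(name, name),
        atc_code=fields["atc_code"].strip().upper(),
    )
```

For example, `atc_levels("N02BE01")` yields `["N", "N02", "N02B", "N02BE", "N02BE01"]`, which lets a page be filed under every level of the hierarchy. Keeping `country` on the record reflects that a brand is tied to its ingredient per country.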
3 · AI-assisted compilation
A language model receives structured data plus verbatim source text and is prompted to compile, not author, each page from fixed templates, preserving verbatim quotes in safety-critical sections.
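One way to picture the "compile, not author" contract is a fixed prompt template whose only variable parts are the structured data, the source text, and a rule that switches to word-for-word reproduction for safety-critical sections. The template wording, the section names in `SAFETY_CRITICAL`, and `build_prompt` are hypothetical. The sketch stops at building the prompt string and makes no model call.

```python
import json
from string import Template

# Assumed section names; the real list of safety-critical sections may differ.
SAFETY_CRITICAL = {"warnings", "contraindications"}

# Fixed template: only the $-placeholders change between pages.
PROMPT = Template(
    "Compile the page section '$section' from the material below.\n"
    "Use only the supplied data and source text; do not add facts.\n"
    "$verbatim_rule\n"
    "STRUCTURED DATA:\n$data\n"
    "SOURCE TEXT:\n<<<\n$source\n>>>\n"
)


def build_prompt(section, record, source_text):
    """Fill the fixed template for one section of one page."""
    rule = (
        "Reproduce the source text word for word inside quotation marks."
        if section in SAFETY_CRITICAL
        else "Summarize in plain language, staying within the source text."
    )
    return PROMPT.substitute(
        section=section,
        verbatim_rule=rule,
        data=json.dumps(record, sort_keys=True, indent=2),
        source=source_text.strip(),
    )
```

Because the template is fixed, any difference between two prompts for the same section can be traced back to the data or source text supplied, which keeps the output auditable against its inputs.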
4 · Quality verification
Before anything is published, each compiled page must clear a series of automated quality checks: grounding against the cited source, verbatim preservation for safety-critical sections, citation discipline (no source names in body text), readability, and length discipline. Content that fails a check is held back rather than published.
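The gate can be thought of as a list of independent checks where any single failure holds the page back. The sketch below covers three of the simpler checks (verbatim preservation, citation discipline, and length). Grounding and readability need more machinery and are omitted. `Page`, `SOURCE_NAMES`, `MAX_WORDS`, and the function names are assumptions for illustration, not our actual thresholds or code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    body: str                 # compiled page text
    source_text: str          # verbatim regulator text it was compiled from
    verbatim_quotes: list = field(default_factory=list)  # safety-critical quotes


SOURCE_NAMES = ("DailyMed",)  # assumed list of names that must not appear in body text
MAX_WORDS = 1200              # assumed length budget


def _squash(text):
    """Collapse whitespace so line wrapping does not affect comparisons."""
    return re.sub(r"\s+", " ", text).strip()


def check_verbatim(page):
    """Every safety-critical quote must appear unchanged in source and body."""
    source, body = _squash(page.source_text), _squash(page.body)
    return all(_squash(q) in source and _squash(q) in body for q in page.verbatim_quotes)


def check_citation_discipline(page):
    """Body text must not mention source names."""
    return not any(
        re.search(rf"\b{re.escape(name)}\b", page.body, re.IGNORECASE)
        for name in SOURCE_NAMES
    )


def check_length(page):
    return len(page.body.split()) <= MAX_WORDS


CHECKS = {
    "verbatim": check_verbatim,
    "citation_discipline": check_citation_discipline,
    "length": check_length,
}


def run_checks(page):
    """Return the names of failed checks; an empty list means publishable."""
    return [name for name, check in CHECKS.items() if not check(page)]
```

A publishing step would proceed only when `run_checks(page)` returns an empty list, and would otherwise record the failed check names alongside the held page.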
5 · Publishing
Approved content is cached and rendered as static pages. Sitemap lastmod uses the regulator’s last revision date - not our publish date.
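To make the lastmod rule concrete, here is a minimal sketch that builds a sitemap with Python's standard `xml.etree.ElementTree`. The page dictionary keys (`loc`, `regulator_revised`, `published`) and the example URL are assumptions. The namespace is the one defined by the sitemap protocol.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML where <lastmod> is the regulator's revision date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Deliberately ignore page["published"]: lastmod reflects the source, not us.
        ET.SubElement(url, "lastmod").text = page["regulator_revised"].isoformat()
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    {
        "loc": "https://example.org/drugs/paracetamol",  # hypothetical URL
        "regulator_revised": date(2024, 3, 1),
        "published": date(2024, 6, 15),
    }
])
```

With this choice, republishing a page without a change at the regulator leaves its lastmod untouched, so crawlers are only told about a change when the underlying label has actually been revised.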
Limitations we disclose
- Information may lag - some regulators publish updates infrequently. Each page shows the last regulator revision date.
- Pill identification is US-only (DailyMed images).
- Comprehensive drug-drug interactions are not in V1.
- Translation accuracy depends on the clarity of the source label.
- Errors happen - all known errors are logged publicly in /corrections.