Event Taxonomy
CTWise Event Taxonomy is a curated classification system that automatically maps manufacturing events to regulatory requirements with evidence-backed confidence scores.
What is the Event Taxonomy?
The Event Taxonomy is a structured knowledge base of 39 quality event types organized across 10+ categories, designed to classify manufacturing events and map them to relevant CFR regulations and ICH guidelines.
Key Facts
| Attribute | Detail |
|---|---|
| Event Types | 39 curated manufacturing event types |
| Categories | 10+ compliance domains (environmental, documentation, equipment, quality systems, etc.) |
| Product Types | Drug, Food, Device, Supplement, API |
| CFR Mappings | Product-type-specific regulatory mappings |
| Confidence Scoring | 0.00-1.00 scale with 0.60 minimum threshold |
| Scoring Algorithm | Hybrid keyword matching + Bedrock Titan v2 embeddings with Platt sigmoid calibration |
| Compound Observations | Top-N classification for multi-issue observations (up to 5 distinct event types) |
| Format | JSONL stored in S3 with pointer versioning |
| Current Version | v3.0 (39 event types) |
Event Type Structure
Each event type in the taxonomy includes:
- Event ID -- unique identifier (e.g., "pest_control")
- Category -- compliance domain (e.g., "environmental_controls")
- Severity -- risk level (critical, high, medium, low)
- Keywords -- classification terms (multi-word phrases, domain terms)
- CFR Mappings -- product-type-specific regulation references
- ICH Guidelines -- international guideline cross-references
- Confidence Weights -- TF-IDF rarity scores for distinctive terms
Why It Matters
The Problem
Manufacturing organizations face these challenges when classifying quality events:
- Manual classification -- Quality teams spend hours categorizing events, slowing response time
- Inconsistent classification -- Different analysts classify the same event type differently
- Missing regulatory context -- Events are logged without connecting them to specific CFR requirements
- No confidence measurement -- Classification decisions lack quantifiable evidence scores
- Product-type confusion -- Same event maps to different regulations for drugs vs. food vs. devices
The Cost of Misclassification
| Issue | Typical Impact |
|---|---|
| Delayed CAPA initiation | Regulatory observation escalation |
| Wrong CFR cited in response | FDA Form 483 follow-up citation |
| Missed trend analysis | Repeated violations, Warning Letter risk |
| Incomplete investigation | OAI classification at next inspection |
| Product-type mismatch | Citing 21 CFR 211 (drug) for food facility event |
The Solution
Event Taxonomy provides:
- Automated classification -- Classify events in under 500ms via API
- Standardized categories -- Consistent event types across all facilities
- Evidence-backed scoring -- Confidence scores with transparent methodology
- Regulatory mapping -- Direct connection to applicable CFR sections and ICH guidelines
- Product-type awareness -- Correct regulation mapping for drug/food/device/supplement/API
How It Works -- Hybrid Entity Resolution Algorithm
The Event Taxonomy uses a hybrid 6-step entity resolution algorithm combining keyword matching with neural embeddings to classify manufacturing events:
Classification Flow
Input Text → Extract Keywords → Infer Product Type → Keyword Score → Embedding Score → Hybrid Merge → Apply Threshold → Return Match
Step-by-Step Process
Step 1: Extract Keywords
- Remove 50+ domain stopwords ("the", "a", "found", "observed", etc.)
- Extract multi-word phrases (bigrams, trigrams)
- Preserve technical terms ("pest control", "batch record", "data integrity")
- Normalize text (lowercase, punctuation removal)
- Multi-word plural normalization ("audit trails" matches "audit trail", "media fills" matches "media fill")
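The extraction rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the production implementation: `STOPWORDS` is an abbreviated, hypothetical list (the real one has 50+ entries), `singularize` is a naive stand-in for the plural normalization, and phrases are built from the tokens that remain after stopword removal.

```python
import re

# Hypothetical, abbreviated stopword list; the real list has 50+ entries.
STOPWORDS = {"the", "a", "an", "in", "of", "during", "found", "observed", "was", "were"}


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def singularize(token: str) -> str:
    """Naive plural normalization: drop a trailing 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def extract_keywords(text: str) -> dict[str, list[str]]:
    """Return single keywords plus bigram and trigram phrases."""
    kept = [singularize(t) for t in normalize(text) if t not in STOPWORDS]
    phrases = [
        " ".join(kept[i:i + n])
        for n in (2, 3)
        for i in range(len(kept) - n + 1)
    ]
    return {"keywords": kept, "phrases": phrases}
```

With this sketch, "Audit trails were disabled" yields the phrase "audit trail", which is how a plural input can match a singular taxonomy keyword.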
Step 2: Infer Product Type
Detect product type from text keywords:
| Product Type | Detection Keywords |
|---|---|
| drug | pharmaceutical, tablet, capsule, injectable, API |
| food | food, beverage, HACCP, allergen, pathogen |
| supplement | dietary supplement, vitamin, herbal |
| api | active pharmaceutical ingredient, bulk drug |
| device | medical device, implant, diagnostic, 510(k) |
Default: drug (if no keywords match)
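A minimal sketch of this detection step follows. The keyword lists come from the table above; the whole-word matching and the order in which product types are checked (more specific phrases first) are assumptions made for illustration, as is the function name `infer_product_type`.

```python
import re

# Keywords from the detection table. Check order is an assumption:
# "api" is tested first so "active pharmaceutical ingredient" is not
# captured by the drug keyword "pharmaceutical".
PRODUCT_KEYWORDS = {
    "api": ["active pharmaceutical ingredient", "bulk drug"],
    "supplement": ["dietary supplement", "vitamin", "herbal"],
    "device": ["medical device", "implant", "diagnostic", "510(k)"],
    "food": ["food", "beverage", "haccp", "allergen", "pathogen"],
    "drug": ["pharmaceutical", "tablet", "capsule", "injectable", "api"],
}


def infer_product_type(text: str, default: str = "drug") -> str:
    """Return the first product type whose keyword appears as a whole term."""
    lowered = text.lower()
    for product_type, keywords in PRODUCT_KEYWORDS.items():
        for keyword in keywords:
            # Lookarounds instead of \b so terms like "510(k)" match cleanly.
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            if re.search(pattern, lowered):
                return product_type
    return default
```

Whole-term matching keeps "api" from firing on words such as "rapid"; inputs with no detection keyword fall back to `drug`.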
Step 3: Keyword Score (Fast Path)
For each of the 39 event types, calculate a keyword-based composite score:
Scoring Formula:
base_score = (0.4 x coverage_score) + (0.6 x absolute_score)
keyword_score = base_score x (1.0 + rarity_boost x 0.4)
Where:
- coverage_score = ratio of matched keywords to total event keywords
- absolute_score = count of matched keywords / max possible matches
- rarity_boost = TF-IDF weight for distinctive terms (e.g., "spider" = high rarity)
- Multi-word fuzzy matching supports morphological variants (prefix-based, min 5-char prefix, 60% length ratio)
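As a worked illustration, the scoring formula can be written as a small function. The parameter names are hypothetical, and how "max possible matches" is counted is not specified here, so it is passed in as an argument rather than derived.

```python
def keyword_score(
    matched: int,
    event_keyword_count: int,
    max_possible: int,
    rarity_boost: float,
) -> float:
    """Composite keyword score following the formula above."""
    if event_keyword_count <= 0 or max_possible <= 0:
        raise ValueError("keyword counts must be positive")
    coverage_score = matched / event_keyword_count
    absolute_score = matched / max_possible
    base_score = 0.4 * coverage_score + 0.6 * absolute_score
    return base_score * (1.0 + rarity_boost * 0.4)
```

For example, 4 matched keywords out of 10 event keywords, with 5 possible matches and a rarity boost of 0.15, gives a base score of 0.64 and a keyword score of 0.64 x 1.06 = 0.6784.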
Step 4: Embedding Score (Semantic Path)
For ambiguous or low-keyword-match inputs, compute a neural embedding similarity score:
Embedding Pipeline:
input_text → Amazon Bedrock Titan Text Embedding v2 (1024-dim) → cosine similarity vs taxonomy embeddings
Platt Sigmoid Calibration:
calibrated_confidence = 1 / (1 + exp(-(A x cosine_sim + B)))
Where A = 9.44 and B = -3.03 (fitted from labeled calibration data). The calibrated embedding score is then blended with the keyword score using alpha = 0.3.
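The similarity and calibration steps can be sketched with the standard library alone. The constants are the A and B values quoted above; the function names are illustrative, and the embedding vectors themselves (1024-dimensional in production) would come from the embedding model rather than from this code.

```python
import math

PLATT_A = 9.44
PLATT_B = -3.03


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def calibrate(cosine_sim: float) -> float:
    """Platt sigmoid: map raw cosine similarity to a calibrated confidence."""
    return 1.0 / (1.0 + math.exp(-(PLATT_A * cosine_sim + PLATT_B)))
```

With these parameters the sigmoid crosses 0.5 at a cosine similarity of about 0.32 (3.03 / 9.44), so similarities well above that value map to confidences close to 1.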
Hybrid Merge:
final_score = (alpha x keyword_score) + ((1 - alpha) x embedding_score)
This hybrid approach ensures high-confidence keyword matches are preserved while allowing semantic understanding for novel phrasings.
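The merge itself is a weighted average. A minimal sketch, assuming the alpha of 0.3 stated above and using illustrative input values:

```python
ALPHA = 0.3  # weight given to the keyword score


def hybrid_score(keyword_score: float, embedding_score: float, alpha: float = ALPHA) -> float:
    """Blend keyword and calibrated embedding scores."""
    return alpha * keyword_score + (1.0 - alpha) * embedding_score
```

For a candidate with a keyword score of 0.20 and an embedding score of 0.52, this gives 0.3 x 0.20 + 0.7 x 0.52 = 0.424.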
Step 5: Apply Threshold
- Minimum confidence: 0.60
- Events scoring below 0.60 are rejected
- Top-scoring event above threshold is selected
- If no events exceed threshold, return "unclassified"
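The threshold rules can be sketched as a single selection function. Treating a score of exactly 0.60 as passing is an assumption; the rules above only state that scores below 0.60 are rejected.

```python
MIN_CONFIDENCE = 0.60


def select_match(scores: dict[str, float], threshold: float = MIN_CONFIDENCE) -> str:
    """Return the top-scoring event type at or above the threshold."""
    passing = {event: score for event, score in scores.items() if score >= threshold}
    if not passing:
        return "unclassified"
    return max(passing, key=passing.get)
```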
Step 6: Return Match
Return classified event with:
- Event type ID and category
- Severity level
- Product-type-specific CFR mappings
- ICH guideline references
- Confidence score
- Source provenance (taxonomy version, algorithm version)
Compound Observation Support (Top-N Classification)
Real-world 483 observations frequently describe multiple issues in a single sentence. The classifier supports top-N classification to capture all relevant event types:
Request:
{
"event": "The firm's QU failed to ensure CGMP compliance, failed to ensure adequate investigations, and failed to establish adequate systems for document control",
"product_type": "drug",
"top_n": 3
}
Response includes:
- Primary classification: Highest-scoring event type (e.g., quality_unit_failure)
- Secondary classifications: Up to N-1 additional distinct event types above the confidence threshold, each with its own confidence score, category, severity, and applicable CFR sections
Top-N Rules:
- Each returned classification is a distinct event type (no duplicates)
- All secondary classifications must meet the 0.60 minimum confidence threshold
- Maximum of 5 classifications per request (top_n range: 1-5)
- Secondary classifications include applicable_cfr_sections for immediate regulatory context
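These rules amount to filtering by threshold, ranking, and truncating. The sketch below assumes per-event hybrid scores are already available as a dictionary, which guarantees distinct event types; the function name and the sample scores are illustrative.

```python
def top_n_classifications(
    scores: dict[str, float],
    top_n: int = 1,
    threshold: float = 0.60,
) -> list[tuple[str, float]]:
    """Return up to top_n (event_type, score) pairs, highest score first."""
    if not 1 <= top_n <= 5:
        raise ValueError("top_n must be between 1 and 5")
    passing = [(event, score) for event, score in scores.items() if score >= threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return passing[:top_n]
```

The first element is the primary classification and the rest are secondary. Fewer than `top_n` results come back when not enough event types clear the threshold.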
Event Categories
The taxonomy organizes 39 event types across 17 categories:
| Category | Event Types | Example Events |
|---|---|---|
| environmental_controls | 3 | pest_control, temperature_excursion, environmental_monitoring_excursion |
| documentation | 3 | data_integrity_failure, batch_record_incomplete, sop_deviation |
| equipment | 3 | equipment_not_calibrated, equipment_maintenance, equipment_cleaning |
| manufacturing | 2 | cross_contamination, in_process_control |
| personnel | 2 | operator_training_gap, personnel_hygiene |
| labeling | 3 | label_mix_up, packaging_material, label_control |
| laboratory | 5 | laboratory_testing_failure, microbial_contamination, stability_program, laboratory_controls_deficiency, release_testing_failure |
| quality_systems | 3 | capa_failure, investigation_deficiency, quality_unit_failure |
| quality_management | 4 | complaint_handling, deviation_management, change_control, change_control_failure |
| food_safety | 2 | haccp_plan, allergen_control |
| validation | 2 | process_validation, validation_failure |
| medical_device | 2 | design_control, dhr_incomplete |
| facilities | 1 | facility_design |
| utilities | 1 | water_system_failure |
| supply_chain | 1 | supplier_qualification_gap |
| stability | 1 | stability_testing_gap |
| complaints | 1 | complaint_handling_deficiency |
Product Type Coverage
CFR mappings adapt to product type, ensuring correct regulation citation:
Example: Pest Control Event
| Product Type | CFR Mapping | Regulation Title |
|---|---|---|
| drug | 21 CFR 211.56 | Buildings and Facilities -- Sanitation |
| food | 21 CFR 117.35 | Sanitary Operations |
| supplement | 21 CFR 111.15 | What sanitation requirements apply to your physical plant and grounds? |
| api | 21 CFR 211.56 | Buildings and Facilities -- Sanitation |
| device | 21 CFR 820.70 | Production and Process Controls |
Example: Calibration Failure Event
| Product Type | CFR Mapping | Regulation Title |
|---|---|---|
| drug | 21 CFR 211.68 | Automatic, Mechanical, and Electronic Equipment |
| food | 21 CFR 117.160 | Calibration of Process Monitoring and Control Instruments |
| supplement | 21 CFR 111.160 | What requirements apply to laboratory methods, facilities, and controls? |
| api | 21 CFR 211.68 | Automatic, Mechanical, and Electronic Equipment |
| device | 21 CFR 820.72 | Inspection, Measuring, and Test Equipment |
Example Classification Flow
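Here's a complete example. Keyword and embedding scoring (Steps 3 and 4 above) are shown together as a single hybrid scoring step, so the walkthrough has five steps: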
Input Event
"Spider found in manufacturing area during routine inspection"
Step 1: Keywords Extracted
{
"extracted_keywords": ["spider", "manufacturing", "area", "routine", "inspection"],
"stopwords_removed": ["found", "in", "during"],
"multi_word_phrases": ["manufacturing area", "routine inspection"]
}
Step 2: Product Type Inferred
{
"product_type": "drug",
"reasoning": "No specific product keywords detected, using default",
"confidence": 0.50
}
Step 3: Hybrid Scoring
{
"scored_events": [
{
"event_type": "pest_control",
"category": "environmental_controls",
"keyword_score": 0.85,
"embedding_score": 0.99,
"hybrid_score": 0.97,
"matched_keywords": ["spider", "pest", "manufacturing", "area"],
"rarity_boost": 0.15,
"scoring_method": "hybrid_keyword_embedding"
},
{
"event_type": "environmental_monitoring_excursion",
"category": "environmental_controls",
"keyword_score": 0.20,
"embedding_score": 0.52,
"hybrid_score": 0.42,
"matched_keywords": ["area"],
"rarity_boost": 0.02,
"scoring_method": "hybrid_keyword_embedding"
}
]
}
Step 4: Threshold Check
{
"threshold": 0.60,
"passed_events": [
{
"event_type": "pest_control",
"score": 0.97,
"status": "PASS"
}
],
"rejected_events": [
{
"event_type": "environmental_monitoring_excursion",
"score": 0.42,
"status": "FAIL (below threshold)"
}
]
}
Step 5: Return Match
{
"event_type": "pest_control",
"category": "environmental_controls",
"severity": "critical",
"confidence": 0.97,
"cfr_mappings": [
{
"cfr": "21 CFR 211.56",
"title": "Buildings and Facilities -- Sanitation",
"product_type": "drug"
}
],
"ich_mappings": [
{
"guideline": "ICH Q7",
"section": "3.1",
"title": "Buildings and Facilities"
}
],
"matched_keywords": ["spider", "pest", "manufacturing", "area"],
"algorithm_version": "v3.0",
"scoring_method": "hybrid_keyword_embedding"
}
APIs That Use Event Taxonomy
The Event Taxonomy powers multiple CTWise intelligence endpoints:
1. Event Classification API
Classify a single event description into a taxonomy event type:
API: POST /v1/kg/classify
curl -X POST https://api.ctwise.ai/v1/kg/classify \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event": "Spider found in manufacturing area",
"product_type": "drug"
}'
Response:
{
"event_classification": {
"event_type": "pest_control",
"event_category": "environmental_controls",
"severity": "critical",
"confidence": 0.97
},
"applicable_cfr_sections": ["21 CFR 211.56"]
}
Compound Observation (Top-N) Example
For observations describing multiple issues, request up to 5 classifications:
curl -X POST https://api.ctwise.ai/v1/kg/classify \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event": "The QU failed to ensure CGMP compliance, failed to ensure adequate investigations, and failed to establish adequate systems for document control",
"product_type": "drug",
"top_n": 3
}'
Response:
{
"event_classification": {
"event_type": "quality_unit_failure",
"event_category": "quality_systems",
"severity": "critical",
"confidence": 0.95
},
"applicable_cfr_sections": ["21 CFR 211.22"],
"top_n_classifications": [
{
"event_type": "investigation_deficiency",
"event_category": "quality_systems",
"severity": "critical",
"confidence": 0.88,
"applicable_cfr_sections": ["21 CFR 211.192"]
},
{
"event_type": "change_control_failure",
"event_category": "quality_management",
"severity": "high",
"confidence": 0.72,
"applicable_cfr_sections": ["21 CFR 211.100"]
}
]
}
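On the client side, it is often convenient to flatten a response like the one above into a single ranked list. The helper below is a hypothetical sketch that relies only on the field names shown in the sample response; `SAMPLE_RESPONSE` is an abbreviated copy of it.

```python
# Abbreviated copy of the sample response above.
SAMPLE_RESPONSE = {
    "event_classification": {"event_type": "quality_unit_failure", "confidence": 0.95},
    "applicable_cfr_sections": ["21 CFR 211.22"],
    "top_n_classifications": [
        {
            "event_type": "investigation_deficiency",
            "confidence": 0.88,
            "applicable_cfr_sections": ["21 CFR 211.192"],
        },
        {
            "event_type": "change_control_failure",
            "confidence": 0.72,
            "applicable_cfr_sections": ["21 CFR 211.100"],
        },
    ],
}


def flatten_classifications(response: dict) -> list[dict]:
    """Return the primary classification followed by any secondary ones."""
    primary = dict(response["event_classification"])
    # The primary result's CFR sections sit at the top level of the response.
    primary["applicable_cfr_sections"] = response.get("applicable_cfr_sections", [])
    return [primary, *response.get("top_n_classifications", [])]


for item in flatten_classifications(SAMPLE_RESPONSE):
    print(item["event_type"], item["confidence"], item["applicable_cfr_sections"])
```

Because `top_n_classifications` is read with a default, the same helper also works for single-classification responses that omit the field.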
2. Full Investigation API
Classify an event and get complete regulatory context with similar 483 observations:
API: POST /v1/intelligence/investigate
curl -X POST https://api.ctwise.ai/v1/intelligence/investigate \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event_text": "Temperature excursion in cold storage",
"product_type": "drug",
"include_similar_observations": true,
"include_regulatory_text": true
}'
Response includes:
- Event classification with confidence score
- Product-type-specific CFR mappings
- ICH guideline cross-references
- Similar 483 observations from FDA inspections
- Full eCFR regulation text
- Risk assessment with evidence chain
3. Trending Event Types API
Analyze trending event types across time periods:
API: GET /v1/analytics/trends
curl -X GET "https://api.ctwise.ai/v1/analytics/trends?event_type=pest_control&period=6m" \
-H "X-Api-Key: YOUR_API_KEY"
Response:
{
"event_type": "pest_control",
"category": "environmental_controls",
"trend_data": [
{
"month": "2026-01",
"occurrence_count": 23,
"facilities_affected": 18,
"avg_confidence": 0.94
},
{
"month": "2026-02",
"occurrence_count": 31,
"facilities_affected": 24,
"avg_confidence": 0.96
}
],
"trend_direction": "increasing",
"change_percentage": 34.8
}
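The `change_percentage` in this sample is consistent with a simple first-to-last comparison of `occurrence_count`. The sketch below shows that arithmetic; whether the API actually computes the value this way is an assumption, and the function name is illustrative.

```python
# Occurrence counts taken from the sample response above.
TREND_DATA = [
    {"month": "2026-01", "occurrence_count": 23},
    {"month": "2026-02", "occurrence_count": 31},
]


def change_percentage(trend_data: list[dict]) -> float:
    """Percent change in occurrence_count from the first to the last period."""
    first = trend_data[0]["occurrence_count"]
    last = trend_data[-1]["occurrence_count"]
    if first == 0:
        raise ValueError("cannot compute a percentage change from zero")
    return round((last - first) / first * 100, 1)


print(change_percentage(TREND_DATA))  # (31 - 23) / 23 * 100, rounded to 34.8
```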
Versioning & Updates
Storage Format
Event Taxonomy uses JSONL (JSON Lines) format stored in Amazon S3:
s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/
├── current.json # Pointer to active version
├── v3/
│ └── event-taxonomy-v3.0.jsonl # 39 event types
├── v2/
│ └── event-taxonomy-v2.0.jsonl # 30 event types (archived)
└── v1/
└── event-taxonomy.jsonl # 24 event types (archived)
Pointer Versioning
current.json contains a version pointer:
{
"version": "v3.0",
"released": "2026-03-13",
"event_count": 39,
"s3_path": "s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/v3/event-taxonomy-v3.0.jsonl"
}
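A consumer of the taxonomy would read the pointer first and then load the file it names. The sketch below covers only the parsing side, using the standard library: it resolves the `{env}` placeholder, splits the S3 path into bucket and key, and parses JSONL text. The environment name `prod` and the `event_id` field in the sample lines are hypothetical, and the actual S3 download is left out.

```python
import json
from urllib.parse import urlparse

# Pointer content as shown above.
POINTER_JSON = """{
  "version": "v3.0",
  "released": "2026-03-13",
  "event_count": 39,
  "s3_path": "s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/v3/event-taxonomy-v3.0.jsonl"
}"""


def resolve_pointer(pointer_json: str, env: str) -> tuple[str, str]:
    """Return (bucket, key) for the active taxonomy version."""
    pointer = json.loads(pointer_json)
    parsed = urlparse(pointer["s3_path"].replace("{env}", env))
    return parsed.netloc, parsed.path.lstrip("/")


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


bucket, key = resolve_pointer(POINTER_JSON, env="prod")  # "prod" is hypothetical
print(bucket, key)
```

Reading through the pointer means a new taxonomy version can be activated by rewriting `current.json` alone, without moving or renaming the archived files.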
Version History
| Version | Released | Event Types | Key Changes |
|---|---|---|---|
| v3.0 | 2026-03-13 | 39 | Added 9 event types (quality_systems, supply_chain categories); hybrid keyword+embedding scoring; Platt sigmoid calibration; top-N classification for compound observations; multi-word plural keyword matching; enriched keywords for data_integrity, env_monitoring, cross_contamination, operator_training |
| v2.0 | 2025-12-01 | 30 | Added food safety, medical device, utilities categories; enhanced scoring algorithm |
| v1.0 | 2024-06-15 | 24 | Initial release with core GMP event types |
Update Process
- Quarterly Review -- CTWise team reviews FDA inspection trends
- Event Type Evaluation -- Identify emerging event patterns
- Validation -- Test new event types against 483 observation corpus
- Release -- Deploy new version with backward-compatible versioning
- API Transparency -- API responses include the algorithm_version field
Relationship to 483 Intelligence
The Event Taxonomy enables automated classification of 483 observations, connecting raw inspection findings to specific regulatory requirements.
How They Work Together
| Capability | 483 Intelligence | Event Taxonomy |
|---|---|---|
| What it answers | "What violations did FDA cite?" | "What type of event is this?" |
| Data source | FDA inspection observations | Curated event classification system |
| Output | 483 citations with CFR references | Event type with confidence score |
| Use together | Search for similar 483 observations... | ...based on classified event type |
Example: Combined Workflow
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.ctwise.ai/v1"
headers = {"X-Api-Key": API_KEY, "Content-Type": "application/json"}

# Step 1: Classify your manufacturing event
classification = requests.post(
    f"{BASE_URL}/kg/classify",
    headers=headers,
    json={
        "event": "HPLC system not calibrated for 6 months",
        "product_type": "drug",
    },
    timeout=30,
).json()
# Returns: event_type=equipment_not_calibrated, applicable_cfr_sections=["21 CFR 211.68"]

# Step 2: Find similar 483 observations
observations = requests.post(
    f"{BASE_URL}/483/observations/search",
    headers=headers,
    json={
        "query": "calibration failure HPLC",
        "filters": {"cfr": "21 CFR 211.68"},
        "top_k": 20,
    },
    timeout=30,
).json()
# Returns: up to 20 matching 483 citations with similarity scores

# Step 3: Get full regulatory text (the CFR reference is URL-encoded in the path)
regulation = requests.get(
    f"{BASE_URL}/kg/regulations/21%20CFR%20211.68",
    headers=headers,
    timeout=30,
).json()
# Returns: Full eCFR text, ICH cross-references, enforcement statistics
Integration Benefits
- Automated root cause analysis -- Connect your event to similar FDA observations
- Evidence-backed CAPA -- Reference specific 483 citations in corrective actions
- Predictive compliance -- See which events frequently lead to OAI classifications
- Regulatory trend awareness -- Track if your event type is trending in inspections
Getting Started
Ready to use the Event Taxonomy for automated event classification?
- KG Intelligence Overview -- Understand Knowledge Graph capabilities
- Event Classification API -- Classify events with confidence scores
- Full Investigation API -- Get complete regulatory context
- Trending Analysis API -- Track event type trends over time
- 483 Quickstart Guide -- Search similar 483 observations
- API Reference -- Complete endpoint documentation