Event Taxonomy

CTWise Event Taxonomy is a curated classification system that automatically maps manufacturing events to regulatory requirements with evidence-backed confidence scores.


What is the Event Taxonomy?

The Event Taxonomy is a structured knowledge base of 39 quality event types organized across 10+ categories, designed to classify manufacturing events and map them to relevant CFR regulations and ICH guidelines.

Key Facts

| Attribute | Detail |
| --- | --- |
| Event Types | 39 curated manufacturing event types |
| Categories | 10+ compliance domains (environmental, documentation, equipment, quality systems, etc.) |
| Product Types | Drug, Food, Device, Supplement, API |
| CFR Mappings | Product-type-specific regulatory mappings |
| Confidence Scoring | 0.00-1.00 scale with 0.60 minimum threshold |
| Scoring Algorithm | Hybrid keyword matching + Bedrock Titan v2 embeddings with Platt sigmoid calibration |
| Compound Observations | Top-N classification for multi-issue observations (up to 5 distinct event types) |
| Format | JSONL stored in S3 with pointer versioning |
| Current Version | v3.0 (39 event types) |

Event Type Structure

Each event type in the taxonomy includes:

  • Event ID -- unique identifier (e.g., "pest_control")
  • Category -- compliance domain (e.g., "environmental_controls")
  • Severity -- risk level (critical, high, medium, low)
  • Keywords -- classification terms (multi-word phrases, domain terms)
  • CFR Mappings -- product-type-specific regulation references
  • ICH Guidelines -- international guideline cross-references
  • Confidence Weights -- TF-IDF rarity scores for distinctive terms

Why It Matters

The Problem

Manufacturing organizations face these challenges when classifying quality events:

  • Manual classification -- Quality teams spend hours categorizing events, slowing response time
  • Inconsistent classification -- Different analysts classify the same event type differently
  • Missing regulatory context -- Events are logged without connecting them to specific CFR requirements
  • No confidence measurement -- Classification decisions lack quantifiable evidence scores
  • Product-type confusion -- Same event maps to different regulations for drugs vs. food vs. devices

The Cost of Misclassification

| Issue | Typical Impact |
| --- | --- |
| Delayed CAPA initiation | Regulatory observation escalation |
| Wrong CFR cited in response | FDA Form 483 follow-up citation |
| Missed trend analysis | Repeated violations, Warning Letter risk |
| Incomplete investigation | OAI classification at next inspection |
| Product-type mismatch | Citing 21 CFR 211 (drug) for food facility event |

The Solution

Event Taxonomy provides:

  • Automated classification -- Classify events in under 500ms via API
  • Standardized categories -- Consistent event types across all facilities
  • Evidence-backed scoring -- Confidence scores with transparent methodology
  • Regulatory mapping -- Direct connection to applicable CFR sections and ICH guidelines
  • Product-type awareness -- Correct regulation mapping for drug/food/device/supplement/API

How It Works -- Hybrid Entity Resolution Algorithm

The Event Taxonomy uses a hybrid 6-step entity resolution algorithm combining keyword matching with neural embeddings to classify manufacturing events:

Classification Flow

Input Text → Extract Keywords → Infer Product Type → Keyword Score → Embedding Score → Hybrid Merge → Apply Threshold → Return Match

Step-by-Step Process

Step 1: Extract Keywords

  • Remove 50+ domain stopwords ("the", "a", "found", "observed", etc.)
  • Extract multi-word phrases (bigrams, trigrams)
  • Preserve technical terms ("pest control", "batch record", "data integrity")
  • Normalize text (lowercase, punctuation removal)
  • Multi-word plural normalization ("audit trails" matches "audit trail", "media fills" matches "media fill")
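
The keyword extraction step above can be sketched roughly as follows. This is a minimal illustration, not the CTWise implementation: the `STOPWORDS` set is an abbreviated, hypothetical stand-in for the 50+ domain stopwords, and `singularize` and `extract_keywords` are hypothetical helper names using a deliberately naive plural rule.

```python
import re

# Hypothetical, abbreviated stopword list; the real list has 50+ entries.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "during", "found", "observed", "was", "were"}

def singularize(token: str) -> str:
    """Naive plural normalization, e.g. 'trails' -> 'trail', 'fills' -> 'fill'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def extract_keywords(text: str) -> dict:
    # Normalize: lowercase, then replace punctuation with spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Drop stopwords, then normalize plurals on the remaining tokens.
    tokens = [singularize(t) for t in cleaned.split() if t not in STOPWORDS]
    # Build bigrams and trigrams from the filtered token stream.
    phrases = [
        " ".join(tokens[i:i + n])
        for n in (2, 3)
        for i in range(len(tokens) - n + 1)
    ]
    return {"keywords": tokens, "phrases": phrases}

result = extract_keywords("Audit trails were disabled during media fills.")
print(result["keywords"])  # ['audit', 'trail', 'disabled', 'media', 'fill']
```

Because plurals are normalized before phrases are built, "audit trails" yields the phrase "audit trail" and "media fills" yields "media fill", which is the behavior the last bullet describes.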

Step 2: Infer Product Type

Detect product type from text keywords:

| Product Type | Detection Keywords |
| --- | --- |
| drug | pharmaceutical, tablet, capsule, injectable, API |
| food | food, beverage, HACCP, allergen, pathogen |
| supplement | dietary supplement, vitamin, herbal |
| api | active pharmaceutical ingredient, bulk drug |
| device | medical device, implant, diagnostic, 510(k) |

Default: drug (if no keywords match)
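
A simple way to picture this detection step is a keyword lookup with a default. The sketch below uses the detection keywords from the table; the function name `infer_product_type` and the tie-breaking rule (longer phrases count for more, so "active pharmaceutical ingredient" outweighs the single word "pharmaceutical") are assumptions for illustration, not documented behavior.

```python
import re

# Detection keywords taken from the table above (lowercased).
PRODUCT_KEYWORDS = {
    "drug": ["pharmaceutical", "tablet", "capsule", "injectable", "api"],
    "food": ["food", "beverage", "haccp", "allergen", "pathogen"],
    "supplement": ["dietary supplement", "vitamin", "herbal"],
    "api": ["active pharmaceutical ingredient", "bulk drug"],
    "device": ["medical device", "implant", "diagnostic", "510(k)"],
}

def infer_product_type(text: str, default: str = "drug") -> str:
    lowered = text.lower()
    scores = {}
    for product_type, keywords in PRODUCT_KEYWORDS.items():
        score = 0
        for kw in keywords:
            # Whole-word match only, so "api" does not fire on "capital".
            if re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", lowered):
                # Assumption: weight a hit by its word count.
                score += len(kw.split())
        scores[product_type] = score
    best = max(scores, key=scores.get)
    # Fall back to the documented default when nothing matched.
    return best if scores[best] > 0 else default

print(infer_product_type("Allergen cross-contact on beverage line"))  # food
print(infer_product_type("Spider found in manufacturing area"))       # drug
```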

Step 3: Keyword Score (Fast Path)

For each of the 39 event types, calculate a keyword-based composite score:

Scoring Formula:

base_score = (0.4 x coverage_score) + (0.6 x absolute_score)
keyword_score = base_score x (1.0 + rarity_boost x 0.4)

Where:

  • coverage_score = ratio of matched keywords to total event keywords
  • absolute_score = count of matched keywords / max possible matches
  • rarity_boost = TF-IDF weight for distinctive terms (e.g., "spider" = high rarity)
  • Multi-word fuzzy matching supports morphological variants (prefix-based, min 5-char prefix, 60% length ratio)
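
As a worked example of the scoring formula, the sketch below plugs numbers into the two equations. The function signature is hypothetical, "max possible matches" is modeled as a plain `max_matches` parameter, and the final cap at 1.0 is an assumption made to stay on the documented 0.00-1.00 scale.

```python
def keyword_score(matched: int, total_keywords: int,
                  max_matches: int, rarity_boost: float) -> float:
    """Composite keyword score following the two formulas above."""
    # coverage_score: matched keywords relative to the event's keyword list.
    coverage_score = matched / total_keywords if total_keywords else 0.0
    # absolute_score: matched keywords relative to the maximum possible matches.
    absolute_score = min(matched / max_matches, 1.0) if max_matches else 0.0
    base_score = 0.4 * coverage_score + 0.6 * absolute_score
    # Assumption: cap the boosted score at 1.0.
    return min(base_score * (1.0 + rarity_boost * 0.4), 1.0)

# 4 of 8 event keywords matched, 10 possible matches, rarity boost 0.15:
# base = 0.4 * 0.5 + 0.6 * 0.4 = 0.44; boosted = 0.44 * 1.06 = 0.4664
print(round(keyword_score(4, 8, 10, 0.15), 4))  # 0.4664
```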

Step 4: Embedding Score (Semantic Path)

For ambiguous or low-keyword-match inputs, compute a neural embedding similarity score:

Embedding Pipeline:

input_text → Amazon Bedrock Titan Text Embedding v2 (1024-dim) → cosine similarity vs taxonomy embeddings

Platt Sigmoid Calibration:

calibrated_confidence = 1 / (1 + exp(-(A x cosine_sim + B)))

Where A = 9.44 and B = -3.03 (fitted from labeled calibration data). The calibrated embedding score is then blended with the keyword score using alpha = 0.3.
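
The calibration formula can be tried out with a few lines of Python. The sketch uses the documented constants; the short vectors are toy stand-ins for the 1024-dimensional Titan embeddings, and the function names are illustrative only.

```python
import math

PLATT_A = 9.44   # documented slope
PLATT_B = -3.03  # documented intercept

def cosine_similarity(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Define similarity as 0.0 when either vector is all zeros.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def calibrate(cosine_sim: float, a: float = PLATT_A, b: float = PLATT_B) -> float:
    """Platt sigmoid: maps a raw cosine similarity to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * cosine_sim + b)))

# Toy 3-dimensional vectors standing in for real embeddings.
sim = cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.8, 0.3])
print(round(sim, 3), round(calibrate(sim), 3))
```

With these constants the sigmoid crosses 0.5 at a cosine similarity of about 0.32 (3.03 / 9.44), so similarities well above that value are pushed toward 1.0 and those below it toward 0.0.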

Hybrid Merge:

final_score = (alpha x keyword_score) + ((1 - alpha) x embedding_score)

This hybrid approach ensures high-confidence keyword matches are preserved while allowing semantic understanding for novel phrasings.
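
The merge itself is a weighted average. A minimal sketch with the documented alpha of 0.3 (the function name is illustrative):

```python
def hybrid_score(keyword_score: float, embedding_score: float, alpha: float = 0.3) -> float:
    """Weighted blend: alpha on the keyword path, (1 - alpha) on the embedding path."""
    return alpha * keyword_score + (1.0 - alpha) * embedding_score

# A weak keyword match (0.20) with a moderate embedding score (0.52):
# 0.3 * 0.20 + 0.7 * 0.52 = 0.424
print(round(hybrid_score(0.20, 0.52), 3))  # 0.424
```

With alpha = 0.3 the embedding score carries 70% of the weight, so a low keyword score alone does not rule out a candidate.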

Step 5: Apply Threshold

  • Minimum confidence: 0.60
  • Events scoring below 0.60 are rejected
  • Top-scoring event above threshold is selected
  • If no events exceed threshold, return "unclassified"
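
The threshold rules above amount to a filter followed by a maximum. In this sketch, `select_match` is a hypothetical helper, and treating a score of exactly 0.60 as passing is an assumption (the text only says that scores below 0.60 are rejected).

```python
MIN_CONFIDENCE = 0.60

def select_match(scores: dict, threshold: float = MIN_CONFIDENCE):
    """Return (event_type, score) for the best passing event, else ('unclassified', None)."""
    # Assumption: a score exactly at the threshold passes.
    passed = {event: s for event, s in scores.items() if s >= threshold}
    if not passed:
        return "unclassified", None
    best = max(passed, key=passed.get)
    return best, passed[best]

print(select_match({"pest_control": 0.97, "environmental_monitoring_excursion": 0.42}))
# ('pest_control', 0.97)
print(select_match({"environmental_monitoring_excursion": 0.42}))
# ('unclassified', None)
```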

Step 6: Return Match

Return classified event with:

  • Event type ID and category
  • Severity level
  • Product-type-specific CFR mappings
  • ICH guideline references
  • Confidence score
  • Source provenance (taxonomy version, algorithm version)

Compound Observation Support (Top-N Classification)

Real-world 483 observations frequently describe multiple issues in a single sentence. The classifier supports top-N classification to capture all relevant event types:

Request:

{
"event": "The firm's QU failed to ensure CGMP compliance, failed to ensure adequate investigations, and failed to establish adequate systems for document control",
"product_type": "drug",
"top_n": 3
}

Response includes:

  • Primary classification: Highest-scoring event type (e.g., quality_unit_failure)
  • Secondary classifications: Up to N-1 additional distinct event types above the confidence threshold, each with its own confidence score, category, severity, and applicable CFR sections

Top-N Rules:

  • Each returned classification is a distinct event type (no duplicates)
  • All secondary classifications must meet the 0.60 minimum confidence threshold
  • Maximum of 5 classifications per request (top_n range: 1-5)
  • Secondary classifications include applicable_cfr_sections for immediate regulatory context
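
The top-N rules can be sketched as a sort, a threshold filter, and a slice. The helper name and the choice to raise `ValueError` for an out-of-range `top_n` are assumptions; how the API itself handles invalid values is not stated here.

```python
def top_n_classifications(scores: dict, top_n: int = 1, threshold: float = 0.60) -> list:
    """Return up to top_n (event_type, score) pairs, best first."""
    if not 1 <= top_n <= 5:
        # Assumption: reject values outside the documented 1-5 range.
        raise ValueError("top_n must be between 1 and 5")
    # Dict keys are unique, so every returned event type is distinct.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    passed = [(event, score) for event, score in ranked if score >= threshold]
    return passed[:top_n]

scores = {
    "quality_unit_failure": 0.95,
    "investigation_deficiency": 0.88,
    "change_control_failure": 0.72,
    "sop_deviation": 0.41,  # below threshold, never returned
}
primary, *secondary = top_n_classifications(scores, top_n=3)
print(primary)    # ('quality_unit_failure', 0.95)
print(secondary)  # [('investigation_deficiency', 0.88), ('change_control_failure', 0.72)]
```

Requesting `top_n=5` with the same scores would still return three entries, because only three event types clear the 0.60 threshold.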

Event Categories

The taxonomy organizes 39 event types across 10+ categories:

| Category | Event Types | Example Events |
| --- | --- | --- |
| environmental_controls | 3 | pest_control, temperature_excursion, environmental_monitoring_excursion |
| documentation | 3 | data_integrity_failure, batch_record_incomplete, sop_deviation |
| equipment | 3 | equipment_not_calibrated, equipment_maintenance, equipment_cleaning |
| manufacturing | 2 | cross_contamination, in_process_control |
| personnel | 2 | operator_training_gap, personnel_hygiene |
| labeling | 3 | label_mix_up, packaging_material, label_control |
| laboratory | 5 | laboratory_testing_failure, microbial_contamination, stability_program, laboratory_controls_deficiency, release_testing_failure |
| quality_systems | 3 | capa_failure, investigation_deficiency, quality_unit_failure |
| quality_management | 4 | complaint_handling, deviation_management, change_control, change_control_failure |
| food_safety | 2 | haccp_plan, allergen_control |
| validation | 2 | process_validation, validation_failure |
| medical_device | 2 | design_control, dhr_incomplete |
| facilities | 1 | facility_design |
| utilities | 1 | water_system_failure |
| supply_chain | 1 | supplier_qualification_gap |
| stability | 1 | stability_testing_gap |
| complaints | 1 | complaint_handling_deficiency |

Product Type Coverage

CFR mappings adapt to product type, ensuring correct regulation citation:

Example: Pest Control Event

| Product Type | CFR Mapping | Regulation Title |
| --- | --- | --- |
| drug | 21 CFR 211.56 | Buildings and Facilities -- Sanitation |
| food | 21 CFR 117.35 | Sanitary Operations |
| supplement | 21 CFR 111.20 | What sanitation requirements apply to your physical plant and grounds? |
| api | 21 CFR 211.56 | Buildings and Facilities -- Sanitation |
| device | 21 CFR 820.70 | Production and Process Controls |

Example: Calibration Failure Event

| Product Type | CFR Mapping | Regulation Title |
| --- | --- | --- |
| drug | 21 CFR 211.68 | Automatic, Mechanical, and Electronic Equipment |
| food | 21 CFR 117.160 | Calibration of Process Monitoring and Control Instruments |
| supplement | 21 CFR 111.160 | What requirements apply to laboratory methods, facilities, and controls? |
| api | 21 CFR 211.68 | Automatic, Mechanical, and Electronic Equipment |
| device | 21 CFR 820.72 | Inspection, Measuring, and Test Equipment |
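
Conceptually, product-type-aware mapping is a two-level lookup: event type first, then product type. The sketch below encodes only the two example tables; the dictionary layout, the `lookup_cfr` helper, and the use of `equipment_not_calibrated` as the event ID for the calibration example are assumptions for illustration.

```python
# Mappings copied from the two example tables above.
CFR_MAPPINGS = {
    "pest_control": {
        "drug": "21 CFR 211.56",
        "food": "21 CFR 117.35",
        "supplement": "21 CFR 111.20",
        "api": "21 CFR 211.56",
        "device": "21 CFR 820.70",
    },
    # Assumed event ID for the "Calibration Failure Event" example.
    "equipment_not_calibrated": {
        "drug": "21 CFR 211.68",
        "food": "21 CFR 117.160",
        "supplement": "21 CFR 111.160",
        "api": "21 CFR 211.68",
        "device": "21 CFR 820.72",
    },
}

def lookup_cfr(event_type: str, product_type: str = "drug") -> str:
    """Return the CFR citation for an event type under a given product type."""
    try:
        return CFR_MAPPINGS[event_type][product_type]
    except KeyError:
        raise KeyError(f"No CFR mapping for {event_type!r} / {product_type!r}") from None

print(lookup_cfr("pest_control", "food"))                # 21 CFR 117.35
print(lookup_cfr("equipment_not_calibrated", "device"))  # 21 CFR 820.72
```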

Example Classification Flow

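Here's a complete example. It is shown in 5 steps because keyword scoring and embedding scoring (Steps 3 and 4 above) are combined into a single hybrid scoring step: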

Input Event

"Spider found in manufacturing area during routine inspection"

Step 1: Keywords Extracted

{
"extracted_keywords": ["spider", "manufacturing", "area", "routine", "inspection"],
"stopwords_removed": ["found", "in", "during"],
"multi_word_phrases": ["manufacturing area", "routine inspection"]
}

Step 2: Product Type Inferred

{
"product_type": "drug",
"reasoning": "No specific product keywords detected, using default",
"confidence": 0.50
}

Step 3: Hybrid Scoring

{
"scored_events": [
{
"event_type": "pest_control",
"category": "environmental_controls",
"keyword_score": 0.85,
"embedding_score": 0.99,
"hybrid_score": 0.97,
"matched_keywords": ["spider", "pest", "manufacturing", "area"],
"rarity_boost": 0.15,
"scoring_method": "hybrid_keyword_embedding"
},
{
"event_type": "environmental_monitoring_excursion",
"category": "environmental_controls",
"keyword_score": 0.20,
"embedding_score": 0.52,
"hybrid_score": 0.42,
"matched_keywords": ["area"],
"rarity_boost": 0.02,
"scoring_method": "hybrid_keyword_embedding"
}
]
}

Step 4: Threshold Check

{
"threshold": 0.60,
"passed_events": [
{
"event_type": "pest_control",
"score": 0.97,
"status": "PASS"
}
],
"rejected_events": [
{
"event_type": "environmental_monitoring_excursion",
"score": 0.42,
"status": "FAIL (below threshold)"
}
]
}

Step 5: Return Match

{
"event_type": "pest_control",
"category": "environmental_controls",
"severity": "critical",
"confidence": 0.97,
"cfr_mappings": [
{
"cfr": "21 CFR 211.56",
"title": "Buildings and Facilities -- Sanitation",
"product_type": "drug"
}
],
"ich_mappings": [
{
"guideline": "ICH Q7",
"section": "3.1",
"title": "Buildings and Facilities"
}
],
"matched_keywords": ["spider", "pest", "manufacturing", "area"],
"algorithm_version": "v3.0",
"scoring_method": "hybrid_keyword_embedding"
}

APIs That Use Event Taxonomy

The Event Taxonomy powers multiple CTWise intelligence endpoints:

1. Event Classification API

Classify a single event description into a taxonomy event type:

API: POST /v1/kg/classify

curl -X POST https://api.ctwise.ai/v1/kg/classify \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event": "Spider found in manufacturing area",
"product_type": "drug"
}'

Response:

{
"event_classification": {
"event_type": "pest_control",
"event_category": "environmental_controls",
"severity": "critical",
"confidence": 0.97
},
"applicable_cfr_sections": ["21 CFR 211.56"]
}
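
On the client side, a response shaped like the one above can be reduced to a one-line summary. This sketch only touches the fields shown in the example response; the `summarize` helper is hypothetical, and the handling of missing or low-confidence classifications is an assumption, since the exact shape of an unclassified response is not shown here.

```python
def summarize(response: dict, min_confidence: float = 0.60) -> str:
    """Build a short human-readable summary from a classify response."""
    cls = response.get("event_classification") or {}
    event_type = cls.get("event_type", "unclassified")
    confidence = cls.get("confidence", 0.0)
    # Assumption: treat missing or low-confidence results as unclassified.
    if event_type == "unclassified" or confidence < min_confidence:
        return "unclassified"
    sections = ", ".join(response.get("applicable_cfr_sections", []))
    return f"{event_type} ({cls.get('severity')}, confidence {confidence:.2f}): {sections}"

example = {
    "event_classification": {
        "event_type": "pest_control",
        "event_category": "environmental_controls",
        "severity": "critical",
        "confidence": 0.97,
    },
    "applicable_cfr_sections": ["21 CFR 211.56"],
}
print(summarize(example))  # pest_control (critical, confidence 0.97): 21 CFR 211.56
```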

Compound Observation (Top-N) Example

For observations describing multiple issues, request up to 5 classifications:

curl -X POST https://api.ctwise.ai/v1/kg/classify \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event": "The QU failed to ensure CGMP compliance, failed to ensure adequate investigations, and failed to establish adequate systems for document control",
"product_type": "drug",
"top_n": 3
}'

Response:

{
"event_classification": {
"event_type": "quality_unit_failure",
"event_category": "quality_systems",
"severity": "critical",
"confidence": 0.95
},
"applicable_cfr_sections": ["21 CFR 211.22"],
"top_n_classifications": [
{
"event_type": "investigation_deficiency",
"event_category": "quality_systems",
"severity": "critical",
"confidence": 0.88,
"applicable_cfr_sections": ["21 CFR 211.192"]
},
{
"event_type": "change_control_failure",
"event_category": "quality_management",
"severity": "high",
"confidence": 0.72,
"applicable_cfr_sections": ["21 CFR 211.100"]
}
]
}

2. Full Investigation API

Classify an event and get complete regulatory context with similar 483 observations:

API: POST /v1/intelligence/investigate

curl -X POST https://api.ctwise.ai/v1/intelligence/investigate \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event_text": "Temperature excursion in cold storage",
"product_type": "drug",
"include_similar_observations": true,
"include_regulatory_text": true
}'

Response includes:

  • Event classification with confidence score
  • Product-type-specific CFR mappings
  • ICH guideline cross-references
  • Similar 483 observations from FDA inspections
  • Full eCFR regulation text
  • Risk assessment with evidence chain

3. Trending Analysis API

Analyze trending event types across time periods:

API: GET /v1/analytics/trends

curl -X GET "https://api.ctwise.ai/v1/analytics/trends?event_type=pest_control&period=6m" \
-H "X-Api-Key: YOUR_API_KEY"

Response:

{
"event_type": "pest_control",
"category": "environmental_controls",
"trend_data": [
{
"month": "2026-01",
"occurrence_count": 23,
"facilities_affected": 18,
"avg_confidence": 0.94
},
{
"month": "2026-02",
"occurrence_count": 31,
"facilities_affected": 24,
"avg_confidence": 0.96
}
],
"trend_direction": "increasing",
"change_percentage": 34.8
}
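
The `change_percentage` in this sample is consistent with a simple first-to-last comparison of `occurrence_count`: (31 - 23) / 23 x 100 ≈ 34.8. The sketch below reproduces that arithmetic; that the API computes the value this way is an assumption inferred from the sample numbers.

```python
def change_percentage(trend_data: list) -> float:
    """Percent change in occurrence_count from the first to the last period."""
    first = trend_data[0]["occurrence_count"]
    last = trend_data[-1]["occurrence_count"]
    if first == 0:
        raise ValueError("Cannot compute a percentage change from a zero baseline")
    return round((last - first) / first * 100, 1)

trend_data = [
    {"month": "2026-01", "occurrence_count": 23},
    {"month": "2026-02", "occurrence_count": 31},
]
print(change_percentage(trend_data))  # 34.8
```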

Versioning & Updates

Storage Format

Event Taxonomy uses JSONL (JSON Lines) format stored in Amazon S3:

s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/
├── current.json                      # Pointer to active version
├── v3/
│   └── event-taxonomy-v3.0.jsonl     # 39 event types
├── v2/
│   └── event-taxonomy-v2.0.jsonl     # 30 event types (archived)
└── v1/
    └── event-taxonomy.jsonl          # 24 event types (archived)

Pointer Versioning

current.json contains a version pointer:

{
"version": "v3.0",
"released": "2026-03-13",
"event_count": 39,
"s3_path": "s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/v3/event-taxonomy-v3.0.jsonl"
}
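
A consumer of the pointer file would typically read `current.json`, fill in the environment, and split the S3 URI into bucket and key before fetching the JSONL file. The sketch below does only the parsing, with no S3 calls; the `resolve_pointer` helper and the environment name `dev` are hypothetical.

```python
import json
from urllib.parse import urlparse

# Contents of current.json as shown above.
POINTER_JSON = """
{
  "version": "v3.0",
  "released": "2026-03-13",
  "event_count": 39,
  "s3_path": "s3://ctwise-data-lake-{env}/483-intelligence/datasets/event-taxonomy/v3/event-taxonomy-v3.0.jsonl"
}
"""

def resolve_pointer(pointer_json: str, env: str):
    """Return (version, bucket, key) for the active taxonomy file."""
    pointer = json.loads(pointer_json)
    # Substitute the {env} placeholder with a concrete environment name.
    s3_path = pointer["s3_path"].replace("{env}", env)
    parsed = urlparse(s3_path)
    return pointer["version"], parsed.netloc, parsed.path.lstrip("/")

version, bucket, key = resolve_pointer(POINTER_JSON, env="dev")  # "dev" is hypothetical
print(version, bucket, key)
```

Keeping the pointer separate from the data files means a new taxonomy version can be activated, or rolled back, by rewriting one small JSON file.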

Version History

| Version | Released | Event Types | Key Changes |
| --- | --- | --- | --- |
| v3.0 | 2026-03-13 | 39 | Added 9 event types (quality_systems, supply_chain categories); hybrid keyword+embedding scoring; Platt sigmoid calibration; top-N classification for compound observations; multi-word plural keyword matching; enriched keywords for data_integrity, env_monitoring, cross_contamination, operator_training |
| v2.0 | 2025-12-01 | 30 | Added food safety, medical device, utilities categories; enhanced scoring algorithm |
| v1.0 | 2024-06-15 | 24 | Initial release with core GMP event types |

Update Process

  1. Quarterly Review -- CTWise team reviews FDA inspection trends
  2. Event Type Evaluation -- Identify emerging event patterns
  3. Validation -- Test new event types against 483 observation corpus
  4. Release -- Deploy new version with backward-compatible versioning
  5. API Transparency -- API responses include algorithm_version field

Relationship to 483 Intelligence

The Event Taxonomy enables automated classification of 483 observations, connecting raw inspection findings to specific regulatory requirements.

How They Work Together

| Capability | 483 Intelligence | Event Taxonomy |
| --- | --- | --- |
| What it answers | "What violations did FDA cite?" | "What type of event is this?" |
| Data source | FDA inspection observations | Curated event classification system |
| Output | 483 citations with CFR references | Event type with confidence score |
| Use together | Search for similar 483 observations... | ...based on classified event type |

Example: Combined Workflow

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.ctwise.ai/v1"
headers = {"X-Api-Key": API_KEY, "Content-Type": "application/json"}

# Step 1: Classify your manufacturing event
classification = requests.post(
    f"{BASE_URL}/kg/classify",
    headers=headers,
    json={
        "event": "HPLC system not calibrated for 6 months",
        "product_type": "drug",
    },
).json()
# Returns: event_type=equipment_not_calibrated, cfr_sections=["21 CFR 211.68"]

# Step 2: Find similar 483 observations
observations = requests.post(
    f"{BASE_URL}/483/observations/search",
    headers=headers,
    json={
        "query": "calibration failure HPLC",
        "filters": {"cfr": "21 CFR 211.68"},
        "top_k": 20,
    },
).json()
# Returns: 47 matching 483 citations with similarity scores

# Step 3: Get full regulatory text
regulation = requests.get(
    f"{BASE_URL}/kg/regulations/21%20CFR%20211.68",
    headers=headers,
).json()
# Returns: Full eCFR text, ICH cross-references, enforcement statistics

Integration Benefits

  • Automated root cause analysis -- Connect your event to similar FDA observations
  • Evidence-backed CAPA -- Reference specific 483 citations in corrective actions
  • Predictive compliance -- See which events frequently lead to OAI classifications
  • Regulatory trend awareness -- Track if your event type is trending in inspections

Getting Started

Ready to use the Event Taxonomy for automated event classification?

  1. KG Intelligence Overview -- Understand Knowledge Graph capabilities
  2. Event Classification API -- Classify events with confidence scores
  3. Full Investigation API -- Get complete regulatory context
  4. Trending Analysis API -- Track event type trends over time
  5. 483 Quickstart Guide -- Search similar 483 observations
  6. API Reference -- Complete endpoint documentation