Semantic Search
The CTWise API uses AI-powered semantic search to understand the meaning of your queries, not just their keywords.
Overview
Traditional regulatory databases require exact keyword matches. CTWise uses Amazon Bedrock Titan Text Embeddings v2 and AWS S3 Vectors to understand what you're actually looking for.
The Problem with Keyword Search
Query: "informed consent pediatric"
Keyword Result: Only documents containing BOTH exact words
Missed: "assent procedures for minors", "parental permission requirements"
The Semantic Search Advantage
Query: "What are the requirements for informed consent in pediatric trials?"
Semantic Result:
1. FDA-INFORMED-CONSENT-2024 (score: 0.56) - Informed consent guidance
2. ICH-E11(R1) (score: 0.50) - Pediatric population guidance
3. FDA-PEDIATRIC-2023 (score: 0.44) - Pediatric study plans
Why: AI understands the MEANING relates to consent + children + trials
How It Works
1. Query Embedding
Your natural language query is converted to a 1024-dimensional vector using Amazon Bedrock Titan:
Query: "What guidance exists for adaptive trial designs?"
│
└─► Titan Embed → [0.12, -0.45, 0.78, ...] (1024 dimensions)
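The embedding step above can be sketched with boto3. This is a hedged sketch, not CTWise's internal code: the request fields (`inputText`, `dimensions`, `normalize`) are Titan Text Embeddings v2's documented options, and the live call requires AWS credentials, so it is shown commented out here.

```python
import json

def build_embed_request(query: str) -> str:
    # Request body for Titan Text Embeddings v2, matching the model ID
    # reported in the response metadata later on this page.
    return json.dumps({
        "inputText": query,
        "dimensions": 1024,   # match the index's vector dimension
        "normalize": True,    # unit-length vectors suit cosine similarity
    })

# With AWS credentials configured, the call would look like this
# (not executed in this sketch):
#
#   import boto3
#   bedrock = boto3.client("bedrock-runtime")
#   resp = bedrock.invoke_model(
#       modelId="amazon.titan-embed-text-v2:0",
#       body=build_embed_request("What guidance exists for adaptive trial designs?"),
#   )
#   vector = json.loads(resp["body"].read())["embedding"]  # 1024 floats

body = build_embed_request("What guidance exists for adaptive trial designs?")
print(json.loads(body)["dimensions"])  # 1024
```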
2. Vector Similarity Search
AWS S3 Vectors performs approximate nearest neighbor search against pre-indexed regulatory rules:
Query Vector → S3 Vectors Index
│
├─► FDA-ADAPTIVE-2019 → similarity: 0.7769
├─► ICH-E20 → similarity: 0.5411
└─► FDA-DMC-2024-DRAFT → similarity: 0.3955
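S3 Vectors performs the approximate nearest neighbor search internally; the underlying metric it approximates can be illustrated in a few lines. A minimal sketch, assuming the indexes are configured for cosine similarity (toy 3-dimensional vectors stand in for the real 1024-dimensional embeddings):

```python
from math import sqrt

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|), in [-1, 1];
    # higher means the two vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.5, 1.0]), 4))  # 0.9428
```

An exhaustive scan like this is O(n) per query; approximate nearest neighbor indexes trade a little recall for much faster lookups at scale.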
3. Ranked Results
Results are returned sorted by semantic similarity with confidence scores:
```json
{
  "results": [
    {
      "rule_id": "FDA-ADAPTIVE-2019",
      "title": "Adaptive Designs for Clinical Trials of Drugs and Biologics",
      "similarity_score": 0.7769,
      "source": "fda"
    }
  ]
}
```
Natural Language Query Examples
Regulatory Concept Queries
| Query | Top Result | Score |
|---|---|---|
| "What are the requirements for informed consent in pediatric trials?" | FDA-INFORMED-CONSENT-2024 | 0.56 |
| "How should I handle adverse event reporting?" | ICH-E2A | 0.44 |
| "What statistical methods are acceptable for phase 3?" | ICH-E9(R1) | 0.46 |
| "GCP guidelines for investigator responsibilities" | ICH-E6(R3) | 0.55 |
Process-Oriented Queries
| Query | Top Result | Score |
|---|---|---|
| "What guidance exists for adaptive trial designs?" | FDA-ADAPTIVE-2019 | 0.78 |
| "How do I establish a Data Safety Monitoring Board?" | FDA-DMC-2024-DRAFT | 0.58 |
| "Explain protocol amendment procedures" | ICH-E6(R3) | 0.47 |
| "What training is required for clinical investigators?" | ICH-E6(R2) | 0.45 |
Domain-Specific Queries
| Query | Top Result | Score |
|---|---|---|
| "Tell me about blinding requirements in controlled trials" | ICH-E10 | 0.43 |
| "What are the monitoring requirements for multi-site studies?" | ICH-E6(R3) | 0.52 |
| "How should biomarker data be collected and analyzed?" | ICH-E16 | 0.41 |
API Usage
Semantic Search Endpoint
POST Method (Recommended):
```bash
curl -X POST https://api.ctwise.ai/v1/semantic-search \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the requirements for informed consent in pediatric trials?",
    "sources": ["fda", "ich"],
    "top_k": 5,
    "min_similarity": 0.25
  }'
```
GET Method (Alternative via query parameters):
```bash
curl "https://api.ctwise.ai/v1/rules/search?q=informed+consent+pediatric&sources=fda,ich&limit=5" \
  -H "X-Api-Key: YOUR_API_KEY"
```
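The same POST request can be assembled from Python. A sketch under stated assumptions: the endpoint URL and headers come from the curl example above, while `build_search_request` is a hypothetical helper (not part of any CTWise SDK); the network call itself is shown commented out because it needs a real API key.

```python
import json

API_URL = "https://api.ctwise.ai/v1/semantic-search"

def build_search_request(query, sources=None, top_k=5, min_similarity=0.25):
    """Assemble the URL, headers, and JSON payload for a semantic search."""
    headers = {
        "X-Api-Key": "YOUR_API_KEY",  # replace with your key
        "Content-Type": "application/json",
    }
    payload = {"query": query, "top_k": top_k, "min_similarity": min_similarity}
    if sources:
        payload["sources"] = sources
    return API_URL, headers, payload

# To actually send it (requires the third-party `requests` package):
#
#   import requests
#   url, headers, payload = build_search_request(
#       "What are the requirements for informed consent in pediatric trials?",
#       sources=["fda", "ich"],
#   )
#   results = requests.post(url, headers=headers, json=payload).json()["results"]
```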
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language question |
| sources | string[] | No | Filter by source (fda, ich, ema, who) |
| top_k | integer | No | Number of results (default: 5, max: 50) |
| min_similarity | float | No | Minimum similarity threshold (default: 0.25) |
Response
```json
{
  "query": "What are the requirements for informed consent in pediatric trials?",
  "results": [
    {
      "rule_id": "FDA-INFORMED-CONSENT-2024",
      "title": "Informed Consent: Guidance for IRBs, Clinical Investigators, and Sponsors",
      "source": "fda",
      "similarity_score": 0.5594,
      "effective_date": "2024-01-01"
    },
    {
      "rule_id": "ICH-E11(R1)",
      "title": "Clinical Investigation of Medicinal Products in the Pediatric Population",
      "source": "ich",
      "similarity_score": 0.5022,
      "effective_date": "2017-09-14"
    }
  ],
  "query_metadata": {
    "execution_time_ms": 380,
    "embedding_model": "amazon.titan-embed-text-v2:0",
    "indexes_searched": ["fda-tier1", "ich-tier1"],
    "total_results": 5
  }
}
```
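Because results arrive sorted by similarity_score, a client can take the best match directly. A minimal sketch, using a trimmed copy of the response above:

```python
import json

# Trimmed version of the response shown above.
response_body = json.loads("""
{
  "results": [
    {"rule_id": "FDA-INFORMED-CONSENT-2024", "source": "fda", "similarity_score": 0.5594},
    {"rule_id": "ICH-E11(R1)", "source": "ich", "similarity_score": 0.5022}
  ]
}
""")

# Results are pre-sorted by similarity, so the best match is first.
top = response_body["results"][0]
print(top["rule_id"], top["similarity_score"])  # FDA-INFORMED-CONSENT-2024 0.5594
```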
Similarity Scoring
Score Interpretation
| Score Range | Meaning | Recommendation |
|---|---|---|
| 0.70+ | High confidence match | Directly relevant |
| 0.50-0.70 | Good match | Review for relevance |
| 0.25-0.50 | Partial match | May be related |
| < 0.25 | Below threshold | Not returned |
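The bands in the table translate directly into a small helper. A sketch (the band labels are taken from the table; the function itself is illustrative, not part of the API):

```python
def interpret_score(score: float) -> str:
    """Map a similarity score to the bands in the table above."""
    if score >= 0.70:
        return "high confidence match"
    if score >= 0.50:
        return "good match"
    if score >= 0.25:
        return "partial match"
    return "below threshold"  # the API would not return such a result

print(interpret_score(0.7769))  # high confidence match
print(interpret_score(0.44))    # partial match
```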
Configuring Thresholds
For different use cases, adjust the min_similarity parameter:
| Use Case | Threshold | Rationale |
|---|---|---|
| Broad discovery | 0.20 | Find loosely related rules |
| Standard search | 0.25 | Balanced precision/recall |
| Precise matching | 0.40 | High-confidence matches only |
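The effect of raising min_similarity can be seen on the "adaptive trial designs" scores from earlier on this page. A small sketch, filtering client-side rather than re-querying (on this particular result set the 0.20 and 0.25 thresholds happen to coincide):

```python
# Scores from the "adaptive trial designs" example above.
scores = {
    "FDA-ADAPTIVE-2019": 0.7769,
    "ICH-E20": 0.5411,
    "FDA-DMC-2024-DRAFT": 0.3955,
}

def above(threshold):
    # Keep only results at or above the threshold, preserving rank order.
    return [rule for rule, score in scores.items() if score >= threshold]

print(len(above(0.20)))  # broad discovery: 3
print(len(above(0.25)))  # standard search: 3
print(len(above(0.40)))  # precise matching: 2 (drops FDA-DMC-2024-DRAFT)
```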
Cross-Source Discovery
Semantic search excels at finding related rules across different regulatory authorities:
Example: Informed Consent
A single query about "informed consent requirements" returns:
FDA Results:
├── FDA-INFORMED-CONSENT-2024 (0.56)
└── FDA-PEDIATRIC-2023 (0.44)
ICH Results:
├── ICH-E11(R1) (0.50) - Pediatric
├── ICH-E6(R3) (0.39) - GCP
└── ICH-E8(R1) (0.29) - General Considerations
Why this matters: Traditional keyword search would require separate queries to each regulatory body. Semantic search understands the concept spans multiple sources.
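Grouping a mixed-source result list by regulatory authority is a one-pass operation. A sketch using the informed-consent results above:

```python
from collections import defaultdict

# Mixed-source results from the informed-consent example above.
results = [
    {"rule_id": "FDA-INFORMED-CONSENT-2024", "source": "fda", "similarity_score": 0.56},
    {"rule_id": "ICH-E11(R1)", "source": "ich", "similarity_score": 0.50},
    {"rule_id": "FDA-PEDIATRIC-2023", "source": "fda", "similarity_score": 0.44},
    {"rule_id": "ICH-E6(R3)", "source": "ich", "similarity_score": 0.39},
    {"rule_id": "ICH-E8(R1)", "source": "ich", "similarity_score": 0.29},
]

by_source = defaultdict(list)
for r in results:
    by_source[r["source"]].append(r["rule_id"])

print(sorted(by_source))      # ['fda', 'ich']
print(len(by_source["ich"]))  # 3
```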
Performance Characteristics
| Metric | Value | Notes |
|---|---|---|
| Average response time | 380ms | Including embedding generation |
| P95 response time | < 600ms | Under load |
| Embedding model | Titan Text v2 | 1024 dimensions |
| Vector database | AWS S3 Vectors | Native AWS integration |
| Indexes available | FDA, ICH, EMA, WHO | Tier-dependent access |
Best Practices
1. Ask Complete Questions
Good: "What are the requirements for informed consent in pediatric trials?"
Poor: "informed consent pediatric"
2. Include Context
Good: "What statistical methods are acceptable for phase 3 oncology trials?"
Poor: "statistics trials"
3. Use Natural Language
Good: "How should adverse events be reported to the FDA?"
Poor: "AE reporting FDA"
4. Specify Domains When Known
```python
# If you know you want FDA guidance specifically
response = search(
    query="adaptive trial design requirements",
    sources=["fda"],  # Limits search scope
    top_k=10
)
```
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Embedding Model | Amazon Bedrock Titan Text Embeddings v2 | 1024-dimensional semantic encoding |
| Vector Database | AWS S3 Vectors | Cosine similarity search |
| Query Processing | AWS Lambda (ARM64) | Cost-optimized inference |
| Indexes | FDA-tier1, ICH-tier1, EMA-tier1, WHO-tier1 | Pre-computed regulatory rule vectors |
Verified Performance (2025-12-18)
| Metric | Result |
|---|---|
| Tests executed | 20 natural language queries |
| Success rate | 95% (19/20 returned results) |
| Highest score | 0.7769 ("adaptive trial designs") |
| Average score | 0.41 |
| Average response | 380ms |
Evidence: See /aws_mp_set_up/products/ctwise/nlp_evidence/NLP_EVIDENCE_SUMMARY.md
Related Documentation
- Requirements Search Endpoint - Keyword-based search
- Getting Started Guide - First API call
- Authentication - API key setup