Semantic Search
CTWise API uses AI-powered semantic search to understand the meaning of your queries, not just keywords.
Overview
Traditional regulatory databases require exact keyword matches. CTWise uses Amazon Bedrock Titan Text Embeddings v2 and AWS S3 Vectors to understand what you're actually looking for.
The Problem with Keyword Search
```text
Query: "informed consent pediatric"
Keyword Result: Only documents containing ALL three exact words
Missed: "assent procedures for minors", "parental permission requirements"
```
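A minimal sketch of strict keyword (AND) matching shows the failure mode. The `keyword_match` helper and the three document titles are hypothetical, not CTWise code or data. Real keyword engines usually add stemming and other normalization, which this sketch omits.

```python
def keyword_match(query: str, documents: list[str]) -> list[str]:
    """Return the documents that contain every query term (case-insensitive AND)."""
    terms = query.lower().split()
    matches = []
    for doc in documents:
        words = set(doc.lower().split())
        if all(term in words for term in terms):
            matches.append(doc)
    return matches


# Hypothetical corpus: only the first title contains all three query words.
DOCUMENTS = [
    "Informed consent requirements in pediatric trials",
    "Assent procedures for minors",
    "Parental permission requirements",
]

hits = keyword_match("informed consent pediatric", DOCUMENTS)
print(hits)  # ['Informed consent requirements in pediatric trials']
```

The two relevant titles about minors are dropped because they share no words with the query.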
The Semantic Search Advantage
```text
Query: "What are the requirements for informed consent in pediatric trials?"

Semantic Result:
1. FDA-INFORMED-CONSENT-2024 (score: 0.56) - Informed consent guidance
2. ICH-E11(R1) (score: 0.50) - Pediatric population guidance
3. FDA-PEDIATRIC-2023 (score: 0.44) - Pediatric study plans

Why: the model matches the MEANING of the query (consent + children + trials), not just its literal words
```
How It Works
1. Query Embedding
Your natural-language query is converted to a 1024-dimensional vector using Amazon Bedrock Titan:
```text
Query: "What guidance exists for adaptive trial designs?"
   │
   └─► Titan Embed → [0.12, -0.45, 0.78, ...] (1024 dimensions)
```
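The embedding call can be sketched as follows, assuming the service invokes Bedrock's `InvokeModel` with the Titan Text Embeddings V2 model ID `amazon.titan-embed-text-v2:0` (the ID reported as `embedding_model` in the response metadata). The `build_titan_body` and `embed_query` helpers are hypothetical; the client is passed in so the sketch runs without AWS credentials, and in practice it would be a `boto3.client("bedrock-runtime")`.

```python
import json
from typing import Any

MODEL_ID = "amazon.titan-embed-text-v2:0"


def build_titan_body(text: str, dimensions: int = 1024) -> str:
    """Serialize a Titan Text Embeddings V2 request body."""
    # normalize=True asks Titan for a unit-length vector, so the cosine
    # similarity of two normalized vectors equals their dot product.
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})


def embed_query(client: Any, text: str, dimensions: int = 1024) -> list[float]:
    """Embed `text` via a bedrock-runtime client and return the raw vector."""
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_titan_body(text, dimensions),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream whose JSON payload holds an "embedding" list.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

With AWS credentials configured, `embed_query(boto3.client("bedrock-runtime"), "What guidance exists for adaptive trial designs?")` should return a list of 1024 floats.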
2. Vector Similarity Search
AWS S3 Vectors performs an approximate nearest-neighbor (ANN) search against the pre-indexed regulatory rules:
```text
Query Vector → S3 Vectors Index
   │
   ├─► FDA-ADAPTIVE-2019  → similarity: 0.7769
   ├─► ICH-E20            → similarity: 0.5411
   └─► FDA-DMC-2024-DRAFT → similarity: 0.3955
```
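To illustrate what a similarity score measures, the sketch below ranks a query vector against a small index by cosine similarity using exact (brute-force) search. It is a stand-in: S3 Vectors uses an approximate index rather than scanning every vector, and the distance metric configured for the CTWise indexes is not stated here, so cosine is an assumption. The three-dimensional vectors and `RULE-*` IDs are toy values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: list[float], index: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Score every indexed vector and return the k best (rule_id, score) pairs."""
    scored = [(rule_id, cosine_similarity(query, vec)) for rule_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for 1024-dimensional Titan embeddings.
toy_index = {
    "RULE-A": [0.9, 0.1, 0.0],
    "RULE-B": [0.5, 0.5, 0.5],
    "RULE-C": [0.0, 0.0, 1.0],
}
top = exact_top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

An ANN index trades a small chance of missing a true neighbor for not having to score the entire collection on every query.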
3. Ranked Results
Results are returned sorted by similarity score, highest first:
```json
{
  "results": [
    {
      "rule_id": "FDA-ADAPTIVE-2019",
      "title": "Adaptive Designs for Clinical Trials of Drugs and Biologics",
      "similarity_score": 0.7769,
      "source": "fda"
    }
  ]
}
```
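The filtering and ordering behind this response can be sketched as follows. The `rank_results` helper is hypothetical; it assumes the min_similarity cutoff is inclusive and is applied before top_k truncation, which the documentation does not specify. The scores come from the similarity search step, plus one made-up low-scoring hit (`RULE-X`) to show the threshold at work.

```python
def rank_results(hits: list[dict], top_k: int = 5, min_similarity: float = 0.25) -> dict:
    """Drop hits below min_similarity, sort the rest by score (highest first), keep top_k."""
    kept = [hit for hit in hits if hit["similarity_score"] >= min_similarity]
    kept.sort(key=lambda hit: hit["similarity_score"], reverse=True)
    return {"results": kept[:top_k]}


hits = [
    {"rule_id": "FDA-DMC-2024-DRAFT", "similarity_score": 0.3955, "source": "fda"},
    {"rule_id": "FDA-ADAPTIVE-2019", "similarity_score": 0.7769, "source": "fda"},
    {"rule_id": "RULE-X", "similarity_score": 0.12, "source": "fda"},  # hypothetical
    {"rule_id": "ICH-E20", "similarity_score": 0.5411, "source": "ich"},
]
response = rank_results(hits, top_k=2)
print([r["rule_id"] for r in response["results"]])  # ['FDA-ADAPTIVE-2019', 'ICH-E20']
```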
Natural Language Query Examples
Regulatory Concept Queries
| Query | Top Result | Score |
|---|---|---|
| "What are the requirements for informed consent in pediatric trials?" | FDA-INFORMED-CONSENT-2024 | 0.56 |
| "How should I handle adverse event reporting?" | ICH-E2A | 0.44 |
| "What statistical methods are acceptable for phase 3?" | ICH-E9(R1) | 0.46 |
| "GCP guidelines for investigator responsibilities" | ICH-E6(R3) | 0.55 |
Process-Oriented Queries
| Query | Top Result | Score |
|---|---|---|
| "What guidance exists for adaptive trial designs?" | FDA-ADAPTIVE-2019 | 0.78 |
| "How do I establish a Data Safety Monitoring Board?" | FDA-DMC-2024-DRAFT | 0.58 |
| "Explain protocol amendment procedures" | ICH-E6(R3) | 0.47 |
| "What training is required for clinical investigators?" | ICH-E6(R2) | 0.45 |
Domain-Specific Queries
| Query | Top Result | Score |
|---|---|---|
| "Tell me about blinding requirements in controlled trials" | ICH-E10 | 0.43 |
| "What are the monitoring requirements for multi-site studies?" | ICH-E6(R3) | 0.52 |
| "How should biomarker data be collected and analyzed?" | ICH-E16 | 0.41 |
API Usage
Semantic Search Endpoint
POST Method (Recommended):
```bash
curl -X POST https://api.ctwise.ai/v1/semantic-search \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the requirements for informed consent in pediatric trials?",
    "sources": ["fda", "ich"],
    "top_k": 5,
    "min_similarity": 0.25
  }'
```
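For callers working in Python, the same POST request can be built with only the standard library. This is a sketch: `build_search_request` and `semantic_search` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and error handling is omitted.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "https://api.ctwise.ai/v1/semantic-search"


def build_search_request(
    api_key: str,
    query: str,
    sources: Optional[list[str]] = None,
    top_k: int = 5,
    min_similarity: float = 0.25,
) -> urllib.request.Request:
    """Build, without sending, a POST request equivalent to the curl example."""
    payload: dict[str, Any] = {"query": query, "top_k": top_k, "min_similarity": min_similarity}
    if sources:  # sources is optional, so only send it when a filter is given
        payload["sources"] = sources
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def semantic_search(api_key: str, query: str, **options: Any) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_search_request(api_key, query, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Usage: `semantic_search("YOUR_API_KEY", "What are the requirements for informed consent in pediatric trials?", sources=["fda", "ich"])`.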
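GET Method (Alternative, via query parameters on /v1/rules/search):
```bash
curl "https://api.ctwise.ai/v1/rules/search?q=informed+consent+pediatric&sources=fda,ich&limit=5" \
  -H "x-api-key: YOUR_API_KEY"
```
Note that the GET form uses q and limit in place of query and top_k, and passes sources as a comma-separated list.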
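The GET URL can be assembled with `urllib.parse.urlencode`; passing `safe=","` keeps the comma in sources unescaped so the result matches the curl example exactly. The `build_search_url` helper is hypothetical, and the request still needs the x-api-key header when it is sent.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://api.ctwise.ai/v1/rules/search"


def build_search_url(q: str, sources: Optional[list[str]] = None, limit: int = 5) -> str:
    """Build the GET search URL; spaces become '+' and commas are left as-is."""
    params: dict[str, object] = {"q": q}
    if sources:
        params["sources"] = ",".join(sources)
    params["limit"] = limit
    return f"{SEARCH_URL}?{urlencode(params, safe=',')}"


url = build_search_url("informed consent pediatric", ["fda", "ich"])
```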
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language question |
| sources | string[] | No | Filter by source (fda, ich, ema, who) |
| top_k | integer | No | Number of results (default: 5, max: 50) |
| min_similarity | float | No | Minimum similarity threshold (default: 0.25) |
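A client can check these documented constraints before sending a request. The `validate_params` helper below is a hypothetical sketch; it assumes top_k must be at least 1 (only the maximum of 50 is documented) and does not range-check min_similarity, whose valid range is not stated.

```python
from typing import Iterable, Optional

ALLOWED_SOURCES = {"fda", "ich", "ema", "who"}
MAX_TOP_K = 50


def validate_params(
    query: str,
    sources: Optional[Iterable[str]] = None,
    top_k: int = 5,
    min_similarity: float = 0.25,
) -> dict:
    """Return a request payload, raising ValueError if a documented limit is violated."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    payload: dict = {"query": query, "top_k": top_k, "min_similarity": float(min_similarity)}
    if sources is not None:
        source_list = list(sources)
        unknown = set(source_list) - ALLOWED_SOURCES
        if unknown:
            raise ValueError(f"unknown sources: {sorted(unknown)}")
        payload["sources"] = source_list
    # Assumption: the lower bound of 1 is not documented; only the max of 50 is.
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    return payload
```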
Example response (abridged: two of the five results are shown):
```json
{
  "query": "What are the requirements for informed consent in pediatric trials?",
  "results": [
    {
      "rule_id": "FDA-INFORMED-CONSENT-2024",
      "title": "Informed Consent: Guidance for IRBs, Clinical Investigators, and Sponsors",
      "source": "fda",
      "similarity_score": 0.5594,
      "effective_date": "2024-01-01"
    },
    {
      "rule_id": "ICH-E11(R1)",
      "title": "Clinical Investigation of Medicinal Products in the Pediatric Population",
      "source": "ich",
      "similarity_score": 0.5022,
      "effective_date": "2017-09-14"
    }
  ],
  "query_metadata": {
    "execution_time_ms": 380,
    "embedding_model": "amazon.titan-embed-text-v2:0",
    "indexes_searched": ["fda-tier1", "ich-tier1"],
    "total_results": 5
  }
}
```
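Client code may prefer typed objects over raw dictionaries. This hypothetical sketch maps each entry in results to a `RuleMatch` dataclass and returns query_metadata unchanged; treating effective_date as optional is an assumption in case some rules omit it.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RuleMatch:
    """One entry from the "results" array of a semantic-search response."""
    rule_id: str
    title: str
    source: str
    similarity_score: float
    effective_date: Optional[str] = None  # assumed optional


def parse_response(raw: str) -> tuple[list[RuleMatch], dict[str, Any]]:
    """Decode a response body into typed matches plus the query_metadata dict."""
    data = json.loads(raw)
    matches = [
        RuleMatch(
            rule_id=item["rule_id"],
            title=item["title"],
            source=item["source"],
            similarity_score=float(item["similarity_score"]),
            effective_date=item.get("effective_date"),
        )
        for item in data.get("results", [])
    ]
    return matches, data.get("query_metadata", {})
```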
Similarity Scoring
Score Interpretation
| Score Range | Meaning | Recommendation |
|---|---|---|
| 0.70+ | High confidence match | Directly relevant |
| 0.50-0.70 | Good match | Review for relevance |
| 0.25-0.50 | Partial match | May be related |
| < 0.25 | Below threshold | Not returned at the default min_similarity |
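The bands in the score interpretation table translate directly into a lookup. The sketch assumes each band includes its lower bound (so exactly 0.70 counts as a high confidence match), since the table's ranges share their endpoints; the `interpret_score` name is hypothetical.

```python
def interpret_score(score: float) -> str:
    """Return the table's Meaning label for a similarity score."""
    # Assumption: each band includes its lower bound.
    if score >= 0.70:
        return "High confidence match"
    if score >= 0.50:
        return "Good match"
    if score >= 0.25:
        return "Partial match"
    return "Below threshold"


labels = [interpret_score(s) for s in (0.7769, 0.5594, 0.3955, 0.12)]
```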
Configuring Thresholds
Adjust the min_similarity parameter to suit your use case:
| Use Case | Threshold | Rationale |
|---|---|---|
| Broad discovery | 0.20 | Find loosely related rules |
| Standard search | 0.25 | Balanced precision/recall |
| Precise matching | 0.40 | Exclude weaker partial matches |
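To see how the presets trade recall for precision, this hypothetical sketch counts how many of the five scores from the Cross-Source Discovery example survive each threshold, assuming an inclusive cutoff. The preset keys are illustrative names, not API values.

```python
# Threshold presets from the table above (keys are illustrative names).
THRESHOLDS = {"broad_discovery": 0.20, "standard_search": 0.25, "precise_matching": 0.40}

# Scores from the cross-source informed-consent example.
SCORES = {
    "FDA-INFORMED-CONSENT-2024": 0.56,
    "ICH-E11(R1)": 0.50,
    "FDA-PEDIATRIC-2023": 0.44,
    "ICH-E6(R3)": 0.39,
    "ICH-E8(R1)": 0.29,
}


def surviving(scores: dict[str, float], threshold: float) -> list[str]:
    """Rule IDs whose score meets the threshold, highest score first."""
    kept = [rule_id for rule_id, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda rule_id: scores[rule_id], reverse=True)


for name, threshold in THRESHOLDS.items():
    print(f"{name} ({threshold}): {len(surviving(SCORES, threshold))} results")
```

At 0.40, ICH-E6(R3) (0.39) and ICH-E8(R1) (0.29) drop out, leaving three results.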
Cross-Source Discovery
Semantic search excels at finding related rules across different regulatory authorities:
Example: Informed Consent
A single query about informed consent requirements in pediatric trials returns:
```text
FDA Results:
├── FDA-INFORMED-CONSENT-2024 (0.56)
└── FDA-PEDIATRIC-2023 (0.44)
ICH Results:
├── ICH-E11(R1) (0.50) - Pediatric
├── ICH-E6(R3) (0.39) - GCP
└── ICH-E8(R1) (0.29) - General Considerations
```
Why this matters: With keyword search you would typically need separate queries phrased in each regulatory body's own terminology. Semantic search recognizes that the same concept spans multiple sources and returns them together.
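A client can reproduce this per-authority view from a single flat results list by grouping on the source field. The `group_by_source` helper is a hypothetical sketch; the input reuses the scores from the example above.

```python
from collections import defaultdict


def group_by_source(results: list[dict]) -> dict[str, list[dict]]:
    """Bucket results by their "source" field, highest similarity first in each bucket."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for result in results:
        grouped[result["source"]].append(result)
    for bucket in grouped.values():
        bucket.sort(key=lambda r: r["similarity_score"], reverse=True)
    return dict(grouped)


results = [
    {"rule_id": "ICH-E6(R3)", "source": "ich", "similarity_score": 0.39},
    {"rule_id": "FDA-INFORMED-CONSENT-2024", "source": "fda", "similarity_score": 0.56},
    {"rule_id": "ICH-E8(R1)", "source": "ich", "similarity_score": 0.29},
    {"rule_id": "ICH-E11(R1)", "source": "ich", "similarity_score": 0.50},
    {"rule_id": "FDA-PEDIATRIC-2023", "source": "fda", "similarity_score": 0.44},
]

for source, bucket in group_by_source(results).items():
    print(source.upper(), [r["rule_id"] for r in bucket])
```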
Performance Characteristics
| Metric | Value | Notes |
|---|---|---|
| Average response time | 380 ms | Including embedding generation |
| P95 response time | < 600 ms | Under load |
| Embedding model | Titan Text Embeddings v2 | 1024 dimensions |
| Vector database | AWS S3 Vectors | Native AWS integration |
| Indexes available | FDA, ICH, EMA, WHO | Tier-dependent access |
Best Practices
1. Ask Complete Questions
Good: "What are the requirements for informed consent in pediatric trials?"
Poor: "informed consent pediatric"
2. Include Context
Good: "What statistical methods are acceptable for phase 3 oncology trials?"
Poor: "statistics trials"