Knowledge Query Tools¶
This section documents tools for information retrieval and knowledge querying in DeepCritical.
Overview¶
Knowledge Query tools provide capabilities for retrieving information from various knowledge sources, including web search, databases, and structured knowledge bases.
Available Tools¶
Web Search Tools¶
WebSearchTool¶
Performs web searches and retrieves relevant information.
Location: DeepResearch.src.tools.websearch_tools.WebSearchTool
Capabilities: - Multi-engine search (Google, DuckDuckGo, Bing) - Content extraction and summarization - Relevance filtering - Result ranking and deduplication
Usage:
from DeepResearch.src.tools.websearch_tools import WebSearchTool
tool = WebSearchTool()
result = await tool.run({
"query": "machine learning applications",
"num_results": 10,
"engines": ["google", "duckduckgo"]
})
Parameters: - query
: Search query string - num_results
: Number of results to return (default: 10) - engines
: List of search engines to use - max_age_days
: Maximum age of results in days - language
: Language for search results
ChunkedSearchTool¶
Performs chunked searches for large query sets.
Location: DeepResearch.src.tools.websearch_tools.ChunkedSearchTool
Capabilities: - Large-scale search operations - Query chunking and parallel processing - Result aggregation and deduplication - Memory-efficient processing
Usage:
from DeepResearch.src.tools.websearch_tools import ChunkedSearchTool
tool = ChunkedSearchTool()
result = await tool.run({
"queries": ["query1", "query2", "query3"],
"chunk_size": 5,
"max_concurrent": 3
})
Database Query Tools¶
DatabaseQueryTool¶
Executes queries against structured databases.
Location: DeepResearch.src.tools.database_tools.DatabaseQueryTool
Capabilities: - SQL query execution - Result formatting and validation - Connection management - Query optimization
Supported Databases: - PostgreSQL - MySQL - SQLite - Neo4j (graph database)
Usage:
from DeepResearch.src.tools.database_tools import DatabaseQueryTool
tool = DatabaseQueryTool()
result = await tool.run({
"connection_string": "postgresql://user:pass@localhost/db",
"query": "SELECT * FROM research_data WHERE topic = %s",
"parameters": ["machine_learning"],
"max_rows": 1000
})
Knowledge Base Tools¶
KnowledgeBaseQueryTool¶
Queries structured knowledge bases and ontologies.
Location: DeepResearch.src.tools.knowledge_base_tools.KnowledgeBaseQueryTool
Capabilities: - Ontology querying (GO, MeSH, etc.) - Semantic search - Relationship traversal - Knowledge graph navigation
Usage:
from DeepResearch.src.tools.knowledge_base_tools import KnowledgeBaseQueryTool
tool = KnowledgeBaseQueryTool()
result = await tool.run({
"ontology": "GO",
"query_type": "term_search",
"search_term": "protein kinase activity",
"max_results": 50
})
Document Search Tools¶
DocumentSearchTool¶
Searches through document collections and corpora.
Location: DeepResearch.src.tools.document_tools.DocumentSearchTool
Capabilities: - Full-text search across documents - Metadata filtering - Relevance ranking - Multi-format support (PDF, DOC, TXT)
Usage:
from DeepResearch.src.tools.document_tools import DocumentSearchTool
tool = DocumentSearchTool()
result = await tool.run({
"collection": "research_papers",
"query": "deep learning protein structure",
"filters": {
"year": {"gte": 2020},
"journal": "Nature"
},
"max_results": 20
})
Tool Integration¶
Agent Integration¶
Knowledge Query tools integrate seamlessly with DeepCritical agents:
from DeepResearch.agents import SearchAgent
agent = SearchAgent()
result = await agent.execute(
"Find recent papers on CRISPR gene editing",
dependencies=AgentDependencies()
)
Workflow Integration¶
Tools can be used in research workflows:
from DeepResearch.app import main
result = await main(
question="What are the latest developments in quantum computing?",
flows={"deepsearch": {"enabled": True}},
tool_config={
"web_search": {
"engines": ["google", "arxiv"],
"max_results": 50
}
}
)
Configuration¶
Tool Configuration¶
Configure Knowledge Query tools in configs/tools/knowledge_query.yaml
:
knowledge_query:
web_search:
default_engines: ["google", "duckduckgo"]
max_results: 20
cache_results: true
cache_ttl_hours: 24
database:
connection_pool_size: 10
query_timeout_seconds: 30
enable_query_logging: true
knowledge_base:
supported_ontologies: ["GO", "MeSH", "ChEBI"]
default_endpoint: "https://api.geneontology.org"
cache_enabled: true
Performance Tuning¶
performance:
search:
max_concurrent_requests: 5
request_timeout_seconds: 10
retry_attempts: 3
database:
connection_pool_size: 20
statement_cache_size: 100
query_optimization: true
caching:
enabled: true
ttl_seconds: 3600
max_cache_size_mb: 512
Best Practices¶
Search Optimization¶
- Query Formulation: Use specific, well-formed queries
- Result Filtering: Apply relevance filters to reduce noise
- Source Diversity: Use multiple search engines/sources
- Caching: Enable caching for frequently accessed data
Database Queries¶
- Parameterized Queries: Always use parameterized queries
- Index Usage: Ensure proper database indexing
- Connection Pooling: Use connection pooling for efficiency
- Query Limits: Set reasonable result limits
Knowledge Base Queries¶
- Ontology Awareness: Understand ontology structure and relationships
- Semantic Matching: Use semantic search capabilities
- Result Validation: Validate ontology term mappings
- Version Handling: Handle ontology version changes
Error Handling¶
Common Errors¶
Search Failures:
try:
result = await web_search_tool.run({"query": "complex query"})
except SearchTimeoutError:
# Handle timeout
result = await web_search_tool.run({
"query": "complex query",
"timeout": 60
})
Database Connection Issues:
try:
result = await db_tool.run({"query": "SELECT * FROM data"})
except ConnectionError:
# Retry with different connection
result = await db_tool.run({
"query": "SELECT * FROM data",
"connection_string": backup_connection
})
Knowledge Base Unavailability:
try:
result = await kb_tool.run({"ontology": "GO", "term": "kinase"})
except OntologyUnavailableError:
# Fallback to alternative source
result = await kb_tool.run({
"ontology": "GO",
"term": "kinase",
"fallback_source": "local_cache"
})
Monitoring and Metrics¶
Tool Metrics¶
Knowledge Query tools provide comprehensive metrics:
# Get tool metrics
metrics = tool.get_metrics()
print(f"Total queries: {metrics['total_queries']}")
print(f"Success rate: {metrics['success_rate']:.2%}")
print(f"Average response time: {metrics['avg_response_time']:.2f}s")
print(f"Cache hit rate: {metrics['cache_hit_rate']:.2%}")
Performance Monitoring¶
# Enable performance monitoring
tool.enable_monitoring()
# Get performance report
report = tool.get_performance_report()
for query_type, stats in report.items():
print(f"{query_type}: {stats['count']} queries, "
f"{stats['avg_time']:.2f}s avg time")
Security Considerations¶
Input Validation¶
All Knowledge Query tools validate inputs:
# Automatic input validation
result = await tool.run({
"query": user_input, # Automatically validated
"max_results": 100 # Range checked
})
Output Sanitization¶
Results are sanitized to prevent injection:
# Safe result handling
if result.success:
safe_data = result.get_sanitized_data()
# Use safe_data for further processing
Access Control¶
Configure access controls for sensitive data sources:
access_control:
database:
allowed_queries: ["SELECT", "SHOW"]
blocked_tables: ["sensitive_data"]
knowledge_base:
allowed_ontologies: ["GO", "MeSH"]
require_authentication: true
Related Documentation¶
- Tool Registry - Tool registration and management
- Web Search Integration - Web search capabilities
- RAG Tools - Retrieval-augmented generation
- Bioinformatics Tools - Domain-specific tools