Code Execution Flow¶
The Code Execution Flow provides intelligent code generation, execution, and automatic error correction capabilities for natural language programming tasks.
Overview¶
The Code Execution Flow implements a sophisticated workflow that can:

- Generate code (Python, Bash, etc.) from natural language descriptions
- Execute code in isolated environments (Docker, local, Jupyter)
- Automatically analyze execution errors and improve code
- Provide iterative error correction with detailed improvement history
Architecture¶
```mermaid
graph TD
    A[User Request] --> B[Initialize]
    B --> C[Generate Code]
    C --> D[Execute Code]
    D --> E{Execution Success?}
    E -->|Yes| F[Format Response]
    E -->|No| G[Analyze Error]
    G --> H[Improve Code]
    H --> I[Execute Improved Code]
    I --> J{Max Attempts Reached?}
    J -->|No| D
    J -->|Yes| F
    F --> K[Final Response]
```
Configuration¶
Basic Configuration¶
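A minimal setup only enables the flow and relies on the remaining defaults. A sketch, assuming the same configuration file and keys as the advanced example below:

```yaml
# configs/statemachines/flows/code_execution.yaml
enabled: true

execution:
  use_docker: true

improvement:
  enabled: true
```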
Advanced Configuration¶
```yaml
# configs/statemachines/flows/code_execution.yaml
enabled: true

# Code generation settings
generation:
  model: "anthropic:claude-sonnet-4-0"
  temperature: 0.7
  max_tokens: 2000
  timeout: 60

# Execution settings
execution:
  use_docker: true
  use_jupyter: false
  timeout: 120
  max_retries: 3

# Error improvement settings
improvement:
  enabled: true
  max_attempts: 3
  model: "anthropic:claude-sonnet-4-0"
  focus: "fix_errors"  # fix_errors, optimize, robustness

# Response formatting
response:
  include_improvement_history: true
  show_performance_metrics: true
  format: "markdown"  # markdown, json, plain
```
Usage Examples¶
Basic Code Generation and Execution¶
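A minimal invocation sketch, assuming the flow is toggled with `flows.code_execution.enabled` as in the integration examples further down:

```bash
uv run deepresearch \
  flows.code_execution.enabled=true \
  question="Write a Python function that computes the nth Fibonacci number"
```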
With Automatic Error Correction¶
```bash
uv run deepresearch \
  question="Create a script that processes CSV data and generates statistics" \
  flows.code_execution.improvement.enabled=true
```
Multi-Language Support¶
```bash
uv run deepresearch \
  question="Create a bash script that monitors system resources" \
  flows.code_execution.generation.language=bash
```
Advanced Configuration¶
```bash
uv run deepresearch \
  --config-name=code_execution_advanced \
  question="Implement a machine learning model for classification" \
  flows.code_execution.execution.use_docker=true \
  flows.code_execution.improvement.max_attempts=5
```
Code Generation Capabilities¶
Supported Languages¶
- Python: General-purpose programming, data analysis, ML/AI
- Bash: System administration, automation, file processing
- Auto-detection: Automatically determines appropriate language based on request
Generation Features¶
- Context-aware: Considers request complexity and requirements
- Best practices: Includes error handling, documentation, and optimization
- Modular design: Creates reusable, well-structured code
- Security considerations: Avoids potentially harmful operations
Execution Environments¶
Docker Execution (Recommended)¶
- Isolated environment: Secure code execution in containers
- Dependency management: Automatic handling of required packages
- Resource limits: Configurable CPU, memory, and timeout limits
- Multi-language support: Consistent execution across languages
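These limits are typically set in the execution section of the flow configuration. A sketch of what that might look like, assuming the memory_limit key used in the Troubleshooting overrides and a hypothetical cpu_limit key:

```yaml
# Hedged sketch: memory_limit appears in the Troubleshooting overrides;
# cpu_limit is a hypothetical key shown only for illustration.
execution:
  use_docker: true
  timeout: 120        # seconds per execution
  memory_limit: 2g    # container memory cap
  cpu_limit: 1.0      # hypothetical: fraction of a CPU core
```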
Local Execution¶
- Direct execution: Run code directly on host system
- Performance: Lower overhead, faster execution
- Dependencies: Requires manual dependency management
- Security: Less isolated, potential system impact
Jupyter Execution¶
- Interactive environment: Code runs in a live Jupyter kernel rather than a one-shot process
- Stateful computation: Variables and results persist across executions
- Rich output: Support for plots, images, HTML, LaTeX, and other rich content types
Error Analysis and Improvement¶
Automatic Error Detection¶
The system automatically detects and categorizes errors:
- Syntax Errors: Code parsing and structure issues
- Runtime Errors: Execution-time failures (undefined variables, type errors, etc.)
- Logical Errors: Incorrect algorithms or logic flow
- Environment Errors: Missing dependencies, permission issues, resource limits
- Import Errors: Missing modules or packages
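Conceptually, categorization maps the exception raised during execution onto one of these classes; logical errors, which produce wrong output rather than exceptions, are detected from the output instead. The mapping below is an illustrative sketch, not the flow's actual implementation:

```python
def categorize_error(exc: BaseException) -> str:
    """Illustrative mapping from Python exceptions to the error categories above."""
    if isinstance(exc, SyntaxError):
        return "syntax"
    if isinstance(exc, (ImportError, ModuleNotFoundError)):
        return "import"
    if isinstance(exc, (PermissionError, OSError, MemoryError)):
        return "environment"
    # Undefined variables, type errors, etc. surface at execution time
    return "runtime"
```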
Intelligent Code Improvement¶
The Code Improvement Agent provides:
Error Analysis¶
- Root Cause Identification: Determines the underlying cause of failures
- Impact Assessment: Evaluates the severity and scope of the error
- Recommendation Generation: Provides specific steps for resolution
Code Enhancement¶
- Error Fixes: Corrects syntax, logical, and runtime errors
- Robustness Improvements: Adds error handling and validation
- Performance Optimization: Improves efficiency and resource usage
- Best Practices: Applies language-specific coding standards
Iterative Improvement¶
- Multi-step Refinement: Progressive improvement attempts
- History Tracking: Detailed record of all improvement attempts
- Convergence Detection: Stops when code executes successfully
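The control flow behind these three behaviors is: generate, execute, then improve on failure until the code succeeds or the attempt budget runs out. The sketch below is purely illustrative; `generate`, `execute`, and `improve` are hypothetical callables standing in for the flow's internal agents, not documented APIs:

```python
async def run_with_improvement(generate, execute, improve, request, max_attempts=3):
    """Illustrative loop: generate code, execute it, improve on failure."""
    code = await generate(request)
    history = []
    result = None
    for attempt in range(1, max_attempts + 1):
        result = await execute(code)
        if result.success:          # convergence: stop on the first successful run
            break
        fix = await improve(code, result.error)   # analyze the error, produce fixed code
        history.append({"attempt": attempt, "error": result.error, "fix": fix.summary})
        code = fix.improved_code
    return code, result, history
```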
Response Formatting¶
Success Response¶
````markdown
**✅ Execution Successful**

**Generated Python Code:**
```python
def fibonacci(n):
    """Calculate the nth Fibonacci number."""
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Example usage
result = fibonacci(10)
print(f"Fibonacci(10) = {result}")
```

**Execution Result:**
Fibonacci(10) = 55

**Performance:**
- Generation: 2.34s
- Execution: 0.12s
- Total: 2.46s
````
Error with Improvement Response¶
````markdown
**❌ Execution Failed**

**Error:** NameError: name 'undefined_variable' is not defined
**Error Type:** runtime
**Root Cause:** Undefined variable reference
**Improvement Attempts:** 1

**Improved Python Code:**
```python
def process_data(data):
    """Process input data and return statistics."""
    if not data:
        return {"error": "No data provided"}
    try:
        # Calculate basic statistics
        total = sum(data)
        count = len(data)
        average = total / count
        return {
            "total": total,
            "count": count,
            "average": average
        }
    except Exception as e:
        return {"error": f"Processing failed: {str(e)}"}

# Example usage with error handling
data = [1, 2, 3, 4, 5]
result = process_data(data)
print(f"Statistics: {result}")
```

✅ Success after 1 iteration!

**Execution Result:**
Statistics: {'total': 15, 'count': 5, 'average': 3.0}

**Improvement History:**

Attempt 1:
- Error: NameError: name 'undefined_variable' is not defined
- Fix: Added proper variable initialization, error handling, and documentation
````
Advanced Features¶
Custom Execution Environments¶
```python
# CodeBlock is assumed to be importable from the same coding utilities module
from DeepResearch.src.utils.coding import CodeBlock, DockerCommandLineCodeExecutor

# Custom Docker execution
executor = DockerCommandLineCodeExecutor(
    timeout=300,
    work_dir="/workspace",
    image="python:3.11-slim",
    auto_remove=True
)

result = await executor.execute_code_blocks([
    CodeBlock(code="pip install numpy pandas", language="bash"),
    CodeBlock(code="import numpy as np; print('NumPy version:', np.__version__)", language="python")
])
```
Interactive Jupyter Sessions¶
```python
# JupyterConnectionInfo is assumed to be exported alongside the executor
from DeepResearch.src.utils.jupyter import JupyterCodeExecutor, JupyterConnectionInfo

# Create Jupyter executor
executor = JupyterCodeExecutor(
    connection_info=JupyterConnectionInfo(
        host="localhost",
        port=8888,
        token="your-token"
    )
)

# Execute with state persistence
result = await executor.execute_code_blocks([
    CodeBlock(code="x = 42", language="python"),
    CodeBlock(code="y = x * 2; print(f'y = {y}')", language="python")
])
```
Batch Processing¶
```python
from DeepResearch.src.agents.code_execution_orchestrator import CodeExecutionOrchestrator

orchestrator = CodeExecutionOrchestrator()

# Process multiple requests
requests = [
    "Calculate factorial using recursion",
    "Create a data visualization script",
    "Implement a sorting algorithm"
]

results = []
for request in requests:
    result = await orchestrator.process_request(
        request,
        enable_improvement=True,
        max_iterations=3
    )
    results.append(result)
```
Integration with Other Flows¶
With PRIME Flow¶
```bash
uv run deepresearch \
  flows.prime.enabled=true \
  flows.code_execution.enabled=true \
  question="Design a protein and generate the analysis code"
```
With Bioinformatics Flow¶
```bash
uv run deepresearch \
  flows.bioinformatics.enabled=true \
  flows.code_execution.enabled=true \
  question="Analyze gene expression data and create visualization scripts"
```
With DeepSearch Flow¶
```bash
uv run deepresearch \
  flows.deepsearch.enabled=true \
  flows.code_execution.enabled=true \
  question="Research machine learning algorithms and implement comparison scripts"
```
Best Practices¶
Code Generation¶
- Clear Specifications: Provide detailed, unambiguous requirements
- Context Information: Include relevant constraints and requirements
- Language Preferences: Specify preferred programming language when needed
- Example Outputs: Describe expected input/output formats
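As an illustration of these points, a request that names the language, the input format, and the expected output gives the generator far more to work with than a one-line prompt (the CSV columns below are invented for the example):

```bash
uv run deepresearch \
  flows.code_execution.generation.language=python \
  question="Read sales.csv (columns: date, region, amount), compute the total and average amount per region, and print the result as a markdown table"
```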
Error Handling¶
- Enable Improvements: Always enable automatic error correction
- Reasonable Limits: Set appropriate maximum improvement attempts
- Review Results: Examine improvement history for learning opportunities
- Iterative Refinement: Use iterative improvement for complex tasks
Execution Environment¶
- Docker First: Prefer Docker execution for security and isolation
- Resource Planning: Configure appropriate resource limits
- Dependency Management: Handle required packages explicitly
- Timeout Settings: Set reasonable execution timeouts
Performance Optimization¶
- Caching: Enable result caching for repeated operations
- Parallel Execution: Use batch processing for multiple tasks
- Resource Monitoring: Monitor execution time and resource usage
- Optimization: Enable code optimization features
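For parallel execution, the Batch Processing example above can be driven with asyncio.gather, assuming CodeExecutionOrchestrator.process_request can be awaited concurrently (a sketch, not a documented guarantee):

```python
import asyncio

from DeepResearch.src.agents.code_execution_orchestrator import CodeExecutionOrchestrator

async def run_batch(requests):
    """Process independent requests concurrently instead of sequentially."""
    orchestrator = CodeExecutionOrchestrator()
    tasks = [
        orchestrator.process_request(request, enable_improvement=True, max_iterations=3)
        for request in requests
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_batch([
    "Calculate factorial using recursion",
    "Create a data visualization script",
]))
```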
Troubleshooting¶
Common Issues¶
Code Generation Failures:
```bash
# Increase generation timeout and model temperature
flows.code_execution.generation.timeout=120
flows.code_execution.generation.temperature=0.8
```
Execution Timeouts:
```bash
# Increase execution timeout and resource limits
flows.code_execution.execution.timeout=300
flows.code_execution.execution.memory_limit=2g
```
Improvement Loops:
```bash
# Limit improvement attempts and enable debugging
flows.code_execution.improvement.max_attempts=2
flows.code_execution.improvement.debug=true
```
Docker Issues:
```bash
# Check Docker availability and use local execution as fallback
flows.code_execution.execution.use_docker=false
flows.code_execution.execution.local_fallback=true
```
Debug Mode¶
```bash
# Enable detailed logging and debugging
uv run deepresearch \
  question="Debug this code generation" \
  hydra.verbose=true \
  flows.code_execution.improvement.debug=true \
  flows.code_execution.response.show_debug_info=true
```
Performance Metrics¶
Execution Statistics¶
- Generation Time: Time to generate initial code
- Execution Time: Time to execute generated code
- Improvement Time: Time spent on error analysis and code improvement
- Total Time: End-to-end processing time
- Success Rate: Percentage of successful executions
- Improvement Efficiency: Average improvements per attempt
Quality Metrics¶
- Code Quality Score: Automated assessment of generated code
- Error Reduction: Percentage reduction in errors through improvement
- Robustness Score: Assessment of error handling and validation
- Performance Score: Execution efficiency and resource usage
Security Considerations¶
Code Execution Security¶
- Container Isolation: All code executes in isolated Docker containers
- Resource Limits: Configurable CPU, memory, and network restrictions
- Permission Control: Limited filesystem and network access
- Command Filtering: Blocking potentially harmful operations
Input Validation¶
- Code Analysis: Static analysis of generated code for security issues
- Dependency Scanning: Checking for malicious or vulnerable packages
- Sandboxing: Additional security layers for sensitive operations
Future Enhancements¶
Planned Features¶
- Multi-language Support: Expanded language support (R, Julia, etc.)
- Interactive Debugging: Step-through debugging capabilities
- Code Review Integration: Automated code review and suggestions
- Performance Profiling: Detailed performance analysis and optimization
- Collaborative Coding: Multi-user code development and review
For more detailed API documentation, see the Agents API and Tools API.