Reflection Data Management
Table of Contents
- Introduction
- Project Structure
- Core Components
- Data Model and Schema
- Entity Relationships
- Data Access Patterns
- Sample Queries
- Performance Considerations
- Data Lifecycle and Privacy
- Migration and Schema Updates
Introduction
This document provides comprehensive documentation for the reflection data management system within the RAVANA repository. The system enables self-reflection, insight generation, and self-modification capabilities for an AI agent. It stores reflection outcomes in a persistent JSON file and supports automated analysis of failures to propose and test code changes. While no formal SQL database schema exists for reflection entities, the system leverages a file-based storage mechanism and integrates with a broader knowledge service for insight persistence.
Project Structure
The reflection system is primarily contained within the modules/agent_self_reflection directory. It interacts with core components such as the LLM interface, knowledge service, and decision engine. The data is stored locally in JSON format, and the system supports command-line execution for reflection and self-modification workflows.
Core Components
The core functionality of the reflection system is distributed across several files:
- main.py: Entry point for generating reflections and initiating self-modification.
- reflection_db.py: Handles persistence of reflection entries to reflections.json.
- self_modification.py: Implements logic for detecting actionable reflections, extracting bug reports, generating code patches, and testing changes.
- reflection_prompts.py: Defines the prompt template used to structure reflection content.
Reflection entries are created via reflect_on_task() in main.py, which formats a prompt using REFLECTION_PROMPT, sends it to the LLM, and saves the structured response using save_reflection().
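In sketch form, the flow looks roughly like this (the call_llm helper is a stand-in for the actual LLM interface, and the exact REFLECTION_PROMPT placeholders are assumptions; the real signatures live in main.py and reflection_prompts.py):
from datetime import datetime, timezone

from reflection_db import save_reflection
from reflection_prompts import REFLECTION_PROMPT

def reflect_on_task(task_summary, outcome):
    # Build the structured prompt, query the LLM, and persist the result.
    prompt = REFLECTION_PROMPT.format(task_summary=task_summary, outcome=outcome)
    reflection = call_llm(prompt)  # hypothetical LLM interface call
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_summary": task_summary,
        "outcome": outcome,
        "reflection": reflection,
    }
    save_reflection(entry)
    return entry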
Data Model and Schema
ReflectionEntry
Reflection data is stored as a list of JSON objects in reflections.json. Each entry follows this structure:
{
  "timestamp": "2025-07-01T00:00:00Z",
  "task_summary": "Test bug in dummy_function",
  "outcome": "Function failed on edge case.",
  "reflection": "1. What failed?\n- The function 'dummy_function' in 'llm.py' does not handle empty input correctly.\n2. ..."
}
Field Definitions:
- timestamp: ISO 8601 formatted UTC timestamp of the reflection.
- task_summary: Brief description of the task performed.
- outcome: Description of the result or output of the task.
- reflection: LLM-generated structured response containing numbered sections (1. What worked?, 2. What failed?, etc.).
No formal database schema (e.g., SQLModel) exists for ReflectionEntry. The data is managed via simple file I/O operations in reflection_db.py.
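The persistence layer in reflection_db.py can be sketched as follows (the exact path construction and error handling are assumptions):
import json
import os

REFLECTIONS_FILE = os.path.join(os.path.dirname(__file__), "reflections.json")

def load_reflections():
    # Return all stored entries, or an empty list before the first reflection.
    if not os.path.exists(REFLECTIONS_FILE):
        return []
    with open(REFLECTIONS_FILE) as f:
        return json.load(f)

def save_reflection(entry):
    # Append-only writes: read the full array, extend it, rewrite the file.
    entries = load_reflections()
    entries.append(entry)
    with open(REFLECTIONS_FILE, 'w') as f:
        json.dump(entries, f, indent=2)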
InsightRecord
Insights derived from reflections are not stored as a separate entity within the reflection module. Instead, they are passed to the KnowledgeService via add_knowledge() for integration into the broader knowledge base.
Example insight generation:
self.agi_system.knowledge_service.add_knowledge(
    content=insight,
    source="reflection",
    category="insight"
)
Field Definitions:
- content: Textual insight generated from reflection analysis.
- source: Origin of the knowledge (e.g., "reflection").
- category: Classification (e.g., "insight").
Entity Relationships
There is no formal relational schema for reflection entities. However, logical relationships exist through data flow and processing:
- ReflectionEntry → InsightRecord: A reflection entry may lead to an insight, which is then stored in the knowledge base via KnowledgeService.
- ReflectionEntry → CodeChangeProposal: Actionable reflections (those indicating failure) are analyzed to extract bug information, which leads to a proposed code patch.
- CodeChangeProposal → AuditLog: Each patch attempt is recorded in self_modification_audit.json, including test results and application status.
These relationships are processed sequentially by run_self_modification() in self_modification.py.
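The pipeline can be summarized in pseudocode (all helper names besides find_actionable_reflections() are illustrative, not the actual functions in self_modification.py):
def run_self_modification():
    # Sequential flow: reflection -> bug report -> patch -> test -> audit.
    for reflection in find_actionable_reflections():
        bug_report = extract_bug_report(reflection)   # hypothetical helper
        patch = generate_patch(bug_report)            # hypothetical helper
        test_result = test_patch(patch)               # hypothetical helper
        # Every attempt is audited, whether or not the patch is applied.
        record_audit(reflection, patch, test_result)  # writes self_modification_audit.json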
Data Access Patterns
Reflection Loop
The reflection loop follows this pattern:
- After task completion, reflect_on_task() is called with task_summary and outcome.
- An LLM prompt is generated using REFLECTION_PROMPT.
- The LLM response is saved as a new ReflectionEntry via save_reflection().
Analytics Tools
Analytics are performed by loading all reflections via load_reflections() and filtering or analyzing them in memory. For example, find_actionable_reflections() scans the reflection field for keywords like "fail", "error", or "bug".
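A sketch of that scan (the exact keyword list and matching rules in self_modification.py may differ):
from reflection_db import load_reflections

ACTIONABLE_KEYWORDS = ("fail", "error", "bug")

def find_actionable_reflections():
    # Linear scan: keep entries whose reflection text mentions a failure keyword.
    return [
        e for e in load_reflections()
        if any(kw in e.get('reflection', '').lower() for kw in ACTIONABLE_KEYWORDS)
    ]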
Knowledge Integration
Insights from reflections are pushed to the KnowledgeService, which manages a separate database (not detailed here) for long-term knowledge storage.
Sample Queries
Since data is stored in JSON, queries are implemented in Python:
Retrieve All Reflections
from reflection_db import load_reflections
entries = load_reflections()
Retrieve Historical Reflections (Last 7 Days)
from datetime import datetime, timedelta, timezone
from reflection_db import load_reflections

def get_recent_reflections(days=7):
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    entries = load_reflections()
    # fromisoformat() cannot parse a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return [
        e for e in entries
        if datetime.fromisoformat(e['timestamp'].replace('Z', '+00:00')) > cutoff
    ]
Find Trending Insights (Frequent Failure Patterns)
import re
from collections import Counter
from reflection_db import load_reflections

def get_common_failures():
    entries = load_reflections()
    failures = []
    for entry in entries:
        reflection = entry.get('reflection', '')
        # Extract the "What failed?" section, up to the next numbered
        # section or the end of the text.
        match = re.search(r"\d+\.\s*What failed\?\s*(.*?)(?=\n\d+\.|\Z)", reflection, re.DOTALL)
        if match:
            failures.append(match.group(1).strip())
    return Counter(failures).most_common(5)
Performance Considerations
Query Optimization
- Full Load: All reflections are loaded into memory; performance degrades as the file grows.
- Indexing: No indexing is implemented. Time-series queries rely on linear scans.
- Optimization Suggestion: Migrate to a lightweight database (e.g., SQLite) with indexed timestamps.
Caching Strategies
- No explicit caching is implemented. The JSON file is read on every access.
- Recommendation: Cache parsed reflections in memory during a session to avoid repeated I/O (see the sketch below).
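A minimal session cache might look like this (module-level cache; the names are illustrative):
from reflection_db import load_reflections

_reflection_cache = None

def load_reflections_cached():
    # Parse reflections.json at most once per process.
    global _reflection_cache
    if _reflection_cache is None:
        _reflection_cache = load_reflections()
    return _reflection_cache

def invalidate_reflection_cache():
    # Call after save_reflection() so later reads see the new entry.
    global _reflection_cache
    _reflection_cache = None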
Retention Policies
- No automatic retention or archival is implemented.
- Recommendation: Implement log rotation or archival to reflections_archive.json for entries older than 90 days.
Data Lifecycle and Privacy
Data Lifecycle
- Creation: Generated after task completion.
- Modification: Not directly modified; new entries are appended.
- Archival: Not implemented.
- Deletion: Manual only.
Archival Procedures
No automated archival exists. A future enhancement could periodically move entries older than a cutoff into an archive file.
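A possible sketch, assuming load_reflections() and REFLECTIONS_FILE are importable from reflection_db (the archive file name and merge behavior are illustrative):
import json
from datetime import datetime, timedelta, timezone

from reflection_db import load_reflections, REFLECTIONS_FILE

ARCHIVE_FILE = "reflections_archive.json"  # hypothetical archive location

def archive_old_reflections(days=90):
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    keep, old = [], []
    for e in load_reflections():
        ts = datetime.fromisoformat(e['timestamp'].replace('Z', '+00:00'))
        (old if ts < cutoff else keep).append(e)
    if old:
        # Merge with any existing archive instead of overwriting it.
        try:
            with open(ARCHIVE_FILE) as f:
                old = json.load(f) + old
        except FileNotFoundError:
            pass
        with open(ARCHIVE_FILE, 'w') as f:
            json.dump(old, f, indent=2)
        with open(REFLECTIONS_FILE, 'w') as f:
            json.dump(keep, f, indent=2)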
Privacy Controls
- No encryption or access controls are implemented.
- Reflections may contain self-assessment content.
- Recommendation: Add optional encryption for sensitive reflections and access logging.
Migration and Schema Updates
Current State
- Storage: reflections.json (flat JSON array)
- No versioning or migration system.
Example Migration: Add Version Field
To add schema versioning:
- Update Entry Structure:
{
  "version": 1,
  "timestamp": "...",
  "task_summary": "...",
  "outcome": "...",
  "reflection": "..."
}
- Migration Script:
import json
from reflection_db import load_reflections, REFLECTIONS_FILE  # assumes reflection_db exposes the file path

def migrate_to_v1():
    entries = load_reflections()
    for entry in entries:
        # Tag legacy entries that predate versioning.
        if 'version' not in entry:
            entry['version'] = 1
    with open(REFLECTIONS_FILE, 'w') as f:
        json.dump(entries, f, indent=2)
- Update Writers:
Modify reflect_on_task() to include "version": 1, as shown below.
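The entry constructed inside reflect_on_task() would then carry the version field (fragment; variable names follow the sketch in Core Components):
entry = {
    "version": 1,  # schema version for future migrations
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "task_summary": task_summary,
    "outcome": outcome,
    "reflection": reflection,
}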