Reflection Data Management
Table of Contents
- Introduction
- Project Structure
- Core Components
- Data Model and Schema
- Entity Relationships
- Data Access Patterns
- Sample Queries
- Performance Considerations
- Data Lifecycle and Privacy
- Migration and Schema Updates
Introduction
This document describes the reflection data management system within the RAVANA repository. The system enables self-reflection, insight generation, and self-modification capabilities for an AI agent. It stores reflection outcomes in a persistent JSON file and supports automated analysis of failures to propose and test code changes. While no formal SQL database schema exists for reflection entities, the system relies on file-based storage and integrates with a broader knowledge service for insight persistence.
Project Structure
The reflection system is primarily contained within the modules/agent_self_reflection directory. It interacts with core components such as the LLM interface, knowledge service, and decision engine. The data is stored locally in JSON format, and the system supports command-line execution for reflection and self-modification workflows.
Core Components
The core functionality of the reflection system is distributed across several files:
- main.py: Entry point for generating reflections and initiating self-modification.
- reflection_db.py: Handles persistence of reflection entries to reflections.json.
- self_modification.py: Implements logic for detecting actionable reflections, extracting bug reports, generating code patches, and testing changes.
- reflection_prompts.py: Defines the prompt template used to structure reflection content.
Reflection entries are created via reflect_on_task() in main.py, which formats a prompt using REFLECTION_PROMPT, sends it to the LLM, and saves the structured response using save_reflection().
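In outline, the flow looks like this (a minimal sketch: the placeholder names accepted by REFLECTION_PROMPT and the injected call_llm interface are assumptions, not confirmed signatures):

from datetime import datetime, timezone

from reflection_prompts import REFLECTION_PROMPT
from reflection_db import save_reflection

def reflect_on_task(task_summary, outcome, call_llm):
    # Fill the structured prompt template with the task context.
    # (Placeholder names are assumed; check REFLECTION_PROMPT itself.)
    prompt = REFLECTION_PROMPT.format(task_summary=task_summary, outcome=outcome)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_summary": task_summary,
        "outcome": outcome,
        "reflection": call_llm(prompt),  # LLM interface passed in for clarity
    }
    save_reflection(entry)
    return entry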
Data Model and Schema
ReflectionEntry
Reflection data is stored as a list of JSON objects in reflections.json. Each entry follows this structure:
{
  "timestamp": "2025-07-01T00:00:00Z",
  "task_summary": "Test bug in dummy_function",
  "outcome": "Function failed on edge case.",
  "reflection": "1. What failed?\n- The function 'dummy_function' in 'llm.py' does not handle empty input correctly.\n2. ..."
}
Field Definitions:
- timestamp: ISO 8601 formatted UTC timestamp of the reflection.
- task_summary: Brief description of the task performed.
- outcome: Description of the result or output of the task.
- reflection: LLM-generated structured response containing numbered sections (1. What worked?, 2. What failed?, etc.).
No formal database schema (e.g., SQLModel) exists for ReflectionEntry. The data is managed via simple file I/O operations in reflection_db.py.
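A minimal sketch of what that file I/O might look like (the function names come from the module; REFLECTIONS_FILE and the function bodies are assumptions):

import json
import os

# Assumed location of the flat JSON array described above.
REFLECTIONS_FILE = os.path.join(os.path.dirname(__file__), "reflections.json")

def load_reflections():
    # Return an empty list if no reflections have been saved yet.
    if not os.path.exists(REFLECTIONS_FILE):
        return []
    with open(REFLECTIONS_FILE, "r") as f:
        return json.load(f)

def save_reflection(entry):
    # Append-only: read the full list, add the new entry, rewrite the file.
    entries = load_reflections()
    entries.append(entry)
    with open(REFLECTIONS_FILE, "w") as f:
        json.dump(entries, f, indent=2)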
InsightRecord
Insights derived from reflections are not stored as a separate entity within the reflection module. Instead, they are passed to the KnowledgeService
via add_knowledge()
for integration into the broader knowledge base.
Example insight generation:
self.agi_system.knowledge_service.add_knowledge(
    content=insight,
    source="reflection",
    category="insight"
)
Field Definitions:
- content: Textual insight generated from reflection analysis.
- source: Origin of the knowledge (e.g., "reflection").
- category: Classification (e.g., "insight").
Entity Relationships
There is no formal relational schema for reflection entities. However, logical relationships exist through data flow and processing:
- ReflectionEntry → InsightRecord: A reflection entry may lead to an insight, which is then stored in the knowledge base via KnowledgeService.
- ReflectionEntry → CodeChangeProposal: Actionable reflections (those indicating failure) are analyzed to extract bug information, which leads to a proposed code patch.
- CodeChangeProposal → AuditLog: Each patch attempt is recorded in self_modification_audit.json, including test results and application status.
These relationships are processed sequentially by run_self_modification() in self_modification.py, as outlined below.
Data Access Patterns
Reflection Loop
The reflection loop follows this pattern:
- After task completion, reflect_on_task() is called with task_summary and outcome.
- An LLM prompt is generated using REFLECTION_PROMPT.
- The LLM response is saved as a new ReflectionEntry via save_reflection().
Analytics Tools
Analytics are performed by loading all reflections via load_reflections() and filtering or analyzing them in memory. For example, find_actionable_reflections() scans the reflection field for keywords like "fail", "error", or "bug".
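A plausible sketch of that scan (the keyword list matches the description above; the exact implementation may differ):

from reflection_db import load_reflections

ACTIONABLE_KEYWORDS = ("fail", "error", "bug")

def find_actionable_reflections():
    # Linear scan: keep any entry whose reflection text mentions a failure keyword.
    entries = load_reflections()
    return [
        e for e in entries
        if any(kw in e.get("reflection", "").lower() for kw in ACTIONABLE_KEYWORDS)
    ]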
Knowledge Integration
Insights from reflections are pushed to the KnowledgeService, which manages a separate database (not detailed here) for long-term knowledge storage.
Sample Queries
Since data is stored in JSON, queries are implemented in Python:
Retrieve All Reflections
from reflection_db import load_reflections
entries = load_reflections()
Retrieve Historical Reflections (Last 7 Days)
from datetime import datetime, timedelta
from reflection_db import load_reflections

def get_recent_reflections(days=7):
    cutoff = datetime.now().astimezone() - timedelta(days=days)
    entries = load_reflections()
    # Normalize a trailing "Z" so fromisoformat parses on Python < 3.11.
    return [
        e for e in entries
        if datetime.fromisoformat(e['timestamp'].replace("Z", "+00:00")) > cutoff
    ]
Find Trending Insights (Frequent Failure Patterns)
import re
from collections import Counter
from reflection_db import load_reflections

def get_common_failures():
    entries = load_reflections()
    failures = []
    for entry in entries:
        reflection = entry.get('reflection', '')
        # Extract the "What failed?" section, stopping at the next numbered
        # heading or the end of the text.
        match = re.search(r"\d+\.\s*What failed\?\s*(.*?)(?=\n\d+\.|\Z)", reflection, re.DOTALL)
        if match:
            failures.append(match.group(1).strip())
    return Counter(failures).most_common(5)
Performance Considerations
Query Optimization
- Full Load: All reflections are loaded into memory; performance degrades as the file grows.
- Indexing: No indexing is implemented. Time-series queries rely on linear scans.
- Optimization Suggestion: Migrate to a lightweight database (e.g., SQLite) with indexed timestamps, as sketched below.
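A minimal sketch of such a migration target (hypothetical; SQLite is not part of the current codebase):

import sqlite3

def init_db(path="reflections.db"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS reflections (
            id INTEGER PRIMARY KEY,
            timestamp TEXT NOT NULL,
            task_summary TEXT,
            outcome TEXT,
            reflection TEXT
        )
    """)
    # The index supports time-range queries without full scans.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_reflections_ts ON reflections (timestamp)")
    return conn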
Caching Strategies
- No explicit caching is implemented. The JSON file is read on every access.
- Recommendation: Cache parsed reflections in memory during a session to avoid repeated I/O, as sketched below.
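A session-level cache could be a thin wrapper over the existing functions (a sketch, assuming the load_reflections/save_reflection API described above):

from reflection_db import load_reflections, save_reflection

_cache = None

def load_reflections_cached():
    # Parse the JSON file at most once per session.
    global _cache
    if _cache is None:
        _cache = load_reflections()
    return _cache

def save_reflection_cached(entry):
    # Write through to disk, then keep the in-memory copy in sync.
    save_reflection(entry)
    if _cache is not None:
        _cache.append(entry)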
Retention Policies
- No automatic retention or archival is implemented.
- Recommendation: Implement log rotation or archival to reflections_archive.json for entries older than 90 days.
Data Lifecycle and Privacy
Data Lifecycle
- Creation: Generated after task completion.
- Modification: Not directly modified; new entries are appended.
- Archival: Not implemented.
- Deletion: Manual only.
Archival Procedures
No automated archival exists. A future enhancement could include:
def archive_old_reflections(days=90):
    # Move entries older than `days` to the archive file
    pass
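A fuller sketch of this helper (assumptions: reflections_archive.json as the archive location, REFLECTIONS_FILE exported by reflection_db, and ISO 8601 timestamps with a trailing "Z"):

import json
from datetime import datetime, timedelta, timezone

from reflection_db import load_reflections, REFLECTIONS_FILE

ARCHIVE_FILE = "reflections_archive.json"  # assumed archive location

def archive_old_reflections(days=90):
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    entries = load_reflections()

    def ts(e):
        # Normalize the trailing "Z" so fromisoformat parses on older Pythons.
        return datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))

    old = [e for e in entries if ts(e) < cutoff]
    recent = [e for e in entries if ts(e) >= cutoff]
    # Note: this overwrites any existing archive; merge instead if needed.
    with open(ARCHIVE_FILE, "w") as f:
        json.dump(old, f, indent=2)
    with open(REFLECTIONS_FILE, "w") as f:
        json.dump(recent, f, indent=2)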
Privacy Controls
- No encryption or access controls are implemented.
- Reflections may contain self-assessment content.
- Recommendation: Add optional encryption for sensitive reflections and access logging.
Migration and Schema Updates
Current State
- Storage: reflections.json (flat JSON array)
- No versioning or migration system.
Example Migration: Add Version Field
To add schema versioning:
- Update Entry Structure:
{
  "version": 1,
  "timestamp": "...",
  "task_summary": "...",
  "outcome": "...",
  "reflection": "..."
}
- Migration Script:
import json
from reflection_db import load_reflections, REFLECTIONS_FILE  # REFLECTIONS_FILE assumed exported

def migrate_to_v1():
    entries = load_reflections()
    for entry in entries:
        # Stamp any legacy entry that predates versioning.
        if 'version' not in entry:
            entry['version'] = 1
    with open(REFLECTIONS_FILE, 'w') as f:
        json.dump(entries, f, indent=2)
- Update Writers: Modify reflect_on_task() to include "version": 1 in each new entry, as sketched below.
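For example, the entry construction inside reflect_on_task() would gain one field (a sketch mirroring the structure above; task_summary, outcome, and reflection are the function's local values):

from datetime import datetime, timezone

entry = {
    "version": 1,  # new: schema version for future migrations
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "task_summary": task_summary,
    "outcome": outcome,
    "reflection": reflection,
}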