Database Schema
Update Summary
Changes Made
- Added new entity definitions for VeryLongTermMemory, MemoryPattern, MemoryConsolidation, StrategicKnowledge, and related junction tables
- Updated relationships and schema diagram to include new VLTM entities and many-to-many relationships
- Added new section on Very Long-Term Memory System architecture
- Updated indexing strategy to include new VLTM indexes
- Enhanced data lifecycle policies to include VLTM retention policies
- Added new section on strategic knowledge management
Table of Contents
- Introduction
- Entity Definitions
- Very Long-Term Memory System
- Relationships and Schema Diagram
- Data Access Patterns and ORM Usage
- Indexing Strategy for Performance-Critical Queries
- Data Lifecycle Policies and Retention
- Data Security, Encryption, and Backup Procedures
- Migration Strategies for Schema Evolution
- Conclusion
Introduction
This document provides comprehensive documentation for the database schema used in the RAVANA AGI system. The schema is designed to support an autonomous artificial general intelligence system that processes information, makes decisions, reflects on its behavior, and evolves over time. The database stores various types of operational, emotional, and cognitive data that enable the system to maintain context, learn from experience, and generate intelligent responses. This documentation details all entities, their relationships, access patterns, performance considerations, and lifecycle management policies.
Entity Definitions
The database schema consists of several key entities that capture different aspects of the AGI's operation and cognition. Each entity is defined using SQLModel, which provides a Pythonic interface for database interactions.
Article
Stores information about news articles and other content sources that the system monitors.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the article
- title: str - Title of the article
- link: str - URL of the article
- published: str - Publication timestamp in ISO format
- source: str - Source of the article (e.g., CNN, BBC)
- fetched_at: str - Timestamp when the article was fetched by the system
Constraints:
- Primary key on id
- Uniqueness is enforced at the application level based on the link field to prevent duplicate articles
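These field definitions map directly onto a SQLModel class. The following is a minimal sketch, assuming database/models.py declares the Article model with exactly the fields documented above:

from sqlmodel import Field, SQLModel

class Article(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    title: str
    link: str          # duplicate detection is performed on this field at the application level
    published: str     # ISO-format publication timestamp
    source: str        # e.g. CNN, BBC
    fetched_at: str    # ISO-format timestamp of when the article was ingested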
Event
Represents significant events detected by the system through analysis of articles and other inputs.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the event
- timestamp: str - Timestamp of the event in ISO format
- description: str - Natural language description of the event
- keywords: str - Comma-separated keywords associated with the event
- cluster_id: int - Identifier for the cluster to which this event belongs
Constraints:
- Primary key on id
Summary
Stores compressed knowledge and summaries generated by the system, serving as a mechanism for knowledge retention and deduplication.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the summary
- timestamp: str - Creation timestamp in ISO format
- summary_text: str - The actual summary content
- source: str | None = Field(default="unknown") - Source of the summary (e.g., system, user)
- category: str | None = Field(default="misc") - Category of the summary (e.g., compression, reflection)
- content_hash: str | None = Field(default=None) - SHA-256 hash of the summary text for deduplication
Constraints:
- Primary key on id
- The content_hash field enables deduplication by letting the system check whether an identical summary already exists
ActionLog
Records all actions performed by the AGI system, providing an audit trail of its operations.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the log entry
- timestamp: str - Timestamp of the action in ISO format
- action_name: str - Name of the action performed
- params: str - JSON string containing the parameters passed to the action
- status: str - Status of the action ('success' or 'failure')
- result: str - JSON string containing the result of the action
Constraints:
- Primary key on id
MoodLog
Captures the emotional state of the AGI system at specific points in time.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the mood log entry
- timestamp: str - Timestamp of the mood recording in ISO format
- mood_vector: str - JSON string representing the mood vector (a multidimensional representation of emotional state)
Constraints:
- Primary key on id
SituationLog
Records situations generated by the system for decision-making and reflection purposes.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the situation log
- timestamp: str - Timestamp of the situation creation in ISO format
- situation_type: str - Type of situation (e.g., curiosity, reflection)
- prompt: str - The prompt that generated the situation
- context: str - JSON string containing additional context for the situation
Constraints:
- Primary key on id
DecisionLog
Stores decisions made by the AGI system in response to specific situations.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the decision log
- timestamp: str - Timestamp of the decision in ISO format
- situation_id: int | None = Field(default=None, foreign_key="situationlog.id") - Foreign key referencing the situation that prompted the decision
- raw_response: str - The raw response from the decision-making process
Constraints:
- Primary key on id
- Foreign key relationship with SituationLog on the situation_id field
ExperimentLog
Records experiments conducted by the AGI system to test hypotheses or explore new ideas.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the experiment log
- timestamp: str - Timestamp of the experiment in ISO format
- hypothesis: str - The hypothesis being tested
- results: str - JSON string containing the results of the experiment
Constraints:
- Primary key on id
Very Long-Term Memory System
The Very Long-Term Memory (VLTM) system has been enhanced with new entities and relationships to support advanced knowledge management and strategic reasoning.
VeryLongTermMemory
Core very long-term memory record that stores compressed cognitive content.
Fields:
- memory_id: str - Primary key, UUID identifier for the memory
- memory_type: MemoryType - Enum indicating the type of memory (strategic_knowledge, architectural_insight, etc.)
- created_at: datetime - Timestamp when the memory was created
- last_accessed: datetime - Timestamp of last access
- promoted_at: datetime - Timestamp when memory was promoted to long-term storage
- access_count: int - Number of times the memory has been accessed
- importance_score: float - Calculated importance score (0.0-1.0)
- strategic_value: float - Strategic value for long-term planning (0.0-1.0)
- compressed_content: str - JSON string of compressed memory data
- metadata_info: str - JSON string of metadata
- source_session: str - Identifier of the session that created the memory
- related_memories: str - JSON array of related memory IDs
- retention_category: str - Retention policy category
Constraints:
- Primary key on memory_id
- Importance score and strategic value constrained between 0.0 and 1.0
- JSON fields validated for proper format
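As with the core entities, these fields translate directly into a SQLModel class. A condensed sketch follows; only a subset of fields is shown, the enum values come from the memory types named above, and the exact class layout in core/vltm_data_models.py is an assumption:

from datetime import datetime
from enum import Enum
from sqlmodel import Field, SQLModel

class MemoryType(str, Enum):
    STRATEGIC_KNOWLEDGE = "strategic_knowledge"
    ARCHITECTURAL_INSIGHT = "architectural_insight"

class VeryLongTermMemory(SQLModel, table=True):
    memory_id: str = Field(primary_key=True)          # UUID string
    memory_type: MemoryType
    created_at: datetime
    importance_score: float = Field(ge=0.0, le=1.0)   # validated range 0.0-1.0
    strategic_value: float = Field(ge=0.0, le=1.0)    # validated range 0.0-1.0
    compressed_content: str                           # JSON string of compressed memory data
    metadata_info: str                                # JSON string of metadata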
MemoryPattern
Patterns extracted from memories through analysis of memory content and relationships.
Fields:
- pattern_id: str - Primary key, UUID identifier for the pattern
- pattern_type: PatternType - Enum indicating the type of pattern (temporal, causal, behavioral, etc.)
- pattern_description: str - Natural language description of the pattern
- confidence_score: float - Confidence in pattern validity (0.0-1.0)
- pattern_data: str - JSON string of pattern-specific data
- discovered_at: datetime - Timestamp when the pattern was discovered
- supporting_memories: str - JSON array of memory IDs that support this pattern
- validation_count: int - Number of times the pattern has been validated
- last_validated: Optional[datetime] - Timestamp of last validation
- source_memory_id: Optional[str] - Foreign key to the source memory
Constraints:
- Primary key on pattern_id
- Confidence score constrained between 0.0 and 1.0
- JSON fields validated for proper format
MemoryConsolidation
Records of memory consolidation processes that optimize storage and extract higher-level knowledge.
Fields:
- consolidation_id: str - Primary key, UUID identifier for the consolidation
- consolidation_date: datetime - Timestamp of the consolidation process
- consolidation_type: ConsolidationType - Enum indicating the type (daily, weekly, monthly, etc.)
- memories_processed: int - Number of memories processed in this consolidation
- patterns_extracted: int - Number of patterns extracted
- compression_ratio: float - Ratio of original size to compressed size
- consolidation_results: str - JSON string of consolidation results
- processing_time_seconds: float - Time taken to process the consolidation
- success: bool - Whether the consolidation was successful
- error_message: Optional[str] - Error message if consolidation failed
Constraints:
- Primary key on consolidation_id
StrategicKnowledge
High-level strategic knowledge derived from patterns and used for long-term planning.
Fields:
- knowledge_id: str - Primary key, UUID identifier for the knowledge
- knowledge_domain: str - Domain of the knowledge (e.g., architecture, performance, learning)
- knowledge_summary: str - Natural language summary of the knowledge
- confidence_level: float - Confidence in the knowledge (0.0-1.0)
- last_updated: datetime - Timestamp of last update
- source_patterns: str - JSON array of pattern IDs that contributed to this knowledge
- knowledge_structure: str - JSON string of structured knowledge representation
- validation_score: float - Score based on validation attempts (0.0-1.0)
- application_count: int - Number of times this knowledge has been applied
Constraints:
- Primary key on knowledge_id
- Confidence level and validation score constrained between 0.0 and 1.0
ConsolidationPattern
Junction table linking memory consolidations and patterns, enabling many-to-many relationships.
Fields:
- consolidation_id: str - Foreign key to memory_consolidations table, part of primary key
- pattern_id: str - Foreign key to memory_patterns table, part of primary key
- extraction_confidence: float - Confidence in the pattern extraction during consolidation
Constraints:
- Composite primary key on consolidation_id and pattern_id
- Foreign key relationships to memory_consolidations and memory_patterns tables
PatternStrategicKnowledge
Junction table linking memory patterns and strategic knowledge, enabling many-to-many relationships.
Fields:
- pattern_id: str - Foreign key to memory_patterns table, part of primary key
- knowledge_id: str - Foreign key to strategic_knowledge table, part of primary key
- contribution_weight: float - Weight of the pattern's contribution to the strategic knowledge
Constraints:
- Composite primary key on pattern_id and knowledge_id
- Foreign key relationships to memory_patterns and strategic_knowledge tables
ConsolidationMetrics
Performance metrics for consolidation processes.
Fields:
- metric_id: str - Primary key, UUID identifier for the metric
- consolidation_id: str - Foreign key to memory_consolidations table
- metric_name: str - Name of the metric
- metric_value: float - Value of the metric
- metric_unit: str - Unit of measurement
- timestamp: datetime - Timestamp when the metric was recorded
Constraints:
- Primary key on metric_id
- Foreign key relationship to memory_consolidations table
Relationships and Schema Diagram
The database schema entities are related through foreign key relationships and conceptual associations that reflect the AGI system's cognitive processes. Among the core operational tables, the only explicit foreign key is DecisionLog.situation_id, which references SituationLog. In the VLTM subsystem, ConsolidationPattern and PatternStrategicKnowledge act as junction tables for the many-to-many relationships between consolidations, patterns, and strategic knowledge, MemoryPattern optionally references its source memory, and ConsolidationMetrics references MemoryConsolidation. Articles, events, and summaries are associated conceptually through the processing pipeline rather than through database constraints.
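Since the schema diagram itself is not reproduced here, the sketch below shows how two representative relationships are declared in SQLModel: the DecisionLog foreign key documented above and the ConsolidationPattern junction table. The VLTM table names follow those used in the junction-table descriptions; the exact declarations in the codebase may differ.

from sqlmodel import Field, SQLModel

class DecisionLog(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    timestamp: str
    situation_id: int | None = Field(default=None, foreign_key="situationlog.id")
    raw_response: str

class ConsolidationPattern(SQLModel, table=True):
    # Composite primary key models the many-to-many link between consolidations and patterns
    consolidation_id: str = Field(foreign_key="memory_consolidations.consolidation_id", primary_key=True)
    pattern_id: str = Field(foreign_key="memory_patterns.pattern_id", primary_key=True)
    extraction_confidence: float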
Data Access Patterns and ORM Usage
The system uses SQLModel as an ORM (Object-Relational Mapper) to interact with the database, providing a clean, Pythonic interface for data operations.
ORM Configuration
The database engine is configured in engine.py using SQLAlchemy's create_engine function with the database URL obtained from the configuration system.
from sqlmodel import create_engine, SQLModel
from core.config import Config
engine = create_engine(Config.DATABASE_URL, echo=True)
def create_db_and_tables():
    SQLModel.metadata.create_all(engine)
The create_db_and_tables() function initializes the database schema by creating all tables defined in the SQLModel classes.
Transaction Boundaries
Data access follows a consistent pattern using context managers to ensure proper transaction boundaries and resource cleanup. The DataService class in data_service.py implements methods for saving various types of data with proper transaction management.
def save_action_log(self, action_name: str, params: dict, status: str, result: any):
    """Saves a record of an executed action to the database."""
    with Session(self.engine) as session:
        action_log = ActionLog(
            timestamp=datetime.utcnow().isoformat(),
            action_name=action_name,
            params=json.dumps(params),
            status=status,
            result=json.dumps(result)
        )
        session.add(action_log)
        session.commit()
Key characteristics of the data access pattern:
- Each operation uses a context manager (with Session(...) as session) to ensure the session is properly closed
- Transactions are committed explicitly with session.commit()
- JSON serialization is used for complex data structures stored in text fields
- Operations are atomic; each service method opens its own session and writes only to the table it is responsible for
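Reads follow the same session-per-operation pattern. The helper below is a sketch rather than an existing method in the codebase; it assumes the ActionLog model from database/models.py:

from sqlmodel import Session, select
from database.models import ActionLog

def get_recent_action_logs(engine, limit: int = 50):
    """Return the most recent ActionLog rows (illustrative helper, not part of DataService)."""
    with Session(engine) as session:
        statement = (
            select(ActionLog)
            .order_by(ActionLog.timestamp.desc())  # ISO-8601 timestamps sort chronologically as strings
            .limit(limit)
        )
        return session.exec(statement).all()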
Data Service Implementation
The DataService class provides a service layer interface for database operations, abstracting the underlying ORM details from the rest of the application.
Key methods:
- fetch_and_save_articles(): Fetches articles from configured RSS feeds and saves new ones to the database
- detect_and_save_events(): Analyzes articles to detect significant events and saves them
- save_action_log(): Records system actions
- save_mood_log(): Records the system's emotional state
- save_situation_log(): Records generated situations
- save_decision_log(): Records decisions made by the system
- save_experiment_log(): Records experiments and their results
The service layer ensures consistent data handling and provides a single point of control for database interactions.
Indexing Strategy for Performance-Critical Queries
The system employs a multi-layered approach to optimize performance-critical queries, particularly for memory retrieval operations.
ChromaDB for Semantic Memory Retrieval
For performance-critical memory retrieval operations, the system uses ChromaDB, an embedding database optimized for semantic search. This is implemented in the episodic_memory module.
Key features:
- Uses sentence-transformers with the 'all-MiniLM-L6-v2' model to generate 384-dimensional embeddings
- Stores embeddings in a persistent ChromaDB collection with automatic embedding generation
- Supports similarity-based queries for retrieving relevant memories
chroma_collection = chroma_client.get_or_create_collection(
    name=CHROMA_COLLECTION,
    embedding_function=sentence_transformer_ef
)
Query process:
- A query text is converted to an embedding vector
- The system finds the nearest neighbors in the embedding space
- Results are returned based on cosine similarity
- A similarity threshold filters out irrelevant results
The get_relevant_memories_api function implements this retrieval pattern, shown here in simplified form:
def get_relevant_memories_api(request: QueryRequest):
    # Query ChromaDB for the nearest neighbors of the query text
    results = chroma_collection.query(
        query_texts=[request.query_text],
        n_results=request.top_n
    )
    relevant = []
    # ChromaDB returns cosine distances; convert each to a similarity score
    for doc, dist in zip(results["documents"][0], results["distances"][0]):
        similarity = 1 - dist
        if similarity >= request.similarity_threshold:
            relevant.append({"text": doc, "similarity": similarity})
    return relevant
VLTM Database Indexing
The VLTM system implements comprehensive indexing strategies to optimize query performance for very long-term memory operations.
Index types:
- B-tree indexes: On frequently queried scalar fields like timestamps, memory types, and confidence scores
- GIN indexes: On JSONB fields for efficient querying of structured data
- Text search indexes: Using tsvector for full-text search capabilities
- Composite indexes: On frequently queried field combinations
Key indexes:
- idx_vltm_memory_type: On memory_type for filtering by memory category
- idx_vltm_created_at: On created_at for time-based queries
- idx_vltm_importance_score: On importance_score for prioritizing important memories
- idx_patterns_type: On pattern_type for filtering by pattern category
- idx_strategic_domain: On knowledge_domain for domain-specific queries
- idx_vltm_metadata_gin: GIN index on metadata for efficient JSON querying
- idx_patterns_description_text: Text search index on pattern descriptions
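The statements below sketch how these indexes could be created, assuming a PostgreSQL backend as implied by the GIN and tsvector index types. Index and column names follow the list above; the main VLTM table name (very_long_term_memories) and the use of a JSONB metadata column are assumptions about database/vltm_schema.sql.

from sqlalchemy import text

# DDL statements mirroring the documented index names (table names are assumptions)
VLTM_INDEX_DDL = [
    "CREATE INDEX IF NOT EXISTS idx_vltm_memory_type ON very_long_term_memories (memory_type)",
    "CREATE INDEX IF NOT EXISTS idx_vltm_created_at ON very_long_term_memories (created_at)",
    "CREATE INDEX IF NOT EXISTS idx_vltm_importance_score ON very_long_term_memories (importance_score)",
    "CREATE INDEX IF NOT EXISTS idx_patterns_type ON memory_patterns (pattern_type)",
    "CREATE INDEX IF NOT EXISTS idx_strategic_domain ON strategic_knowledge (knowledge_domain)",
    # GIN index assumes the metadata column is stored as JSONB
    "CREATE INDEX IF NOT EXISTS idx_vltm_metadata_gin ON very_long_term_memories USING GIN (metadata)",
    # Expression index supporting full-text search over pattern descriptions
    "CREATE INDEX IF NOT EXISTS idx_patterns_description_text ON memory_patterns "
    "USING GIN (to_tsvector('english', pattern_description))",
]

def create_vltm_indexes(engine):
    # Execute the DDL inside a single transaction
    with engine.begin() as conn:
        for ddl in VLTM_INDEX_DDL:
            conn.execute(text(ddl))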
Data Lifecycle Policies and Retention
The system implements automated data lifecycle management through background tasks that handle retention, consolidation, and archival.
Knowledge Compression
The system runs a periodic knowledge compression task that summarizes recent interactions and creates higher-level abstractions.
Configuration:
- Frequency: Every 24 hours (86,400 seconds)
- Triggered by: knowledge_compression_task() in the main system loop
Process:
- Retrieves recent summaries from the database
- Uses an LLM to compress and synthesize the information
- Creates new Summary records with category "compression"
- Uses content hashing to prevent duplication
async def knowledge_compression_task(self):
    """Background task to compress knowledge every 24 hours."""
    while not self._shutdown.is_set():
        try:
            summary = await asyncio.to_thread(self.knowledge_service.compress_and_save_knowledge)
            logger.info(f"Compressed and saved knowledge summary.")
        except Exception as e:
            logger.error(f"Error in knowledge compression: {e}")
        try:
            await asyncio.sleep(86400)
        except asyncio.CancelledError:
            break
Memory Consolidation
The system performs periodic memory consolidation to optimize storage and retrieval efficiency.
Configuration:
- Frequency: Every 6 hours (21,600 seconds)
- Triggered by: memory_consolidation_task() in the main system loop
Process:
- Selects a batch of memories for consolidation
- Uses an LLM to analyze, merge, deduplicate, and generalize memories
- Creates consolidated memories with type 'long-term-consolidated'
- Deletes the original, redundant memories
The consolidation process follows these rules:
- Merge related memories into comprehensive statements
- Remove duplicates, keeping the most detailed version
- Generalize specific facts into broader categories
- Never lose critical information
- Never merge unrelated facts
async def memory_consolidation_task(self):
    """Periodically consolidates memories to optimize retrieval and relevance."""
    while not self._shutdown.is_set():
        try:
            logger.info("Starting memory consolidation...")
            consolidation_result = await self.memory_service.consolidate_memories()
            logger.info(f"Memory consolidation finished. Report: {consolidation_result}")
        except Exception as e:
            logger.error(f"Error during memory consolidation: {e}", exc_info=True)
        try:
            await asyncio.sleep(21600)
        except asyncio.CancelledError:
            break
VLTM Retention Policies
The Very Long-Term Memory system implements configurable retention policies based on memory type and importance.
Policy configuration:
- Strategic knowledge: Permanent retention, compression after 90 days
- Critical failures: Permanent retention, compression after 30 days
- Successful improvements: 2-year retention, compression after 180 days
- Code patterns: 1-year retention, compression after 90 days
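One way to represent these policies in code is a small mapping from retention category to policy values. The structure and names below are assumptions for illustration; the values mirror the policy list above.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RetentionPolicy:
    retention_days: Optional[int]   # None means permanent retention
    compress_after_days: int

# Hypothetical policy table; keys correspond to retention categories
VLTM_RETENTION_POLICIES = {
    "strategic_knowledge": RetentionPolicy(retention_days=None, compress_after_days=90),
    "critical_failure": RetentionPolicy(retention_days=None, compress_after_days=30),
    "successful_improvement": RetentionPolicy(retention_days=730, compress_after_days=180),
    "code_pattern": RetentionPolicy(retention_days=365, compress_after_days=90),
}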
Retention rules:
- Memories are automatically compressed based on age and type
- High-importance memories are retained permanently
- Low-importance memories may be archived or deleted based on policy
- Compression reduces storage footprint while preserving essential information
Data Security, Encryption, and Backup Procedures
The system's approach to data security, encryption, and backup is primarily defined by its configuration and deployment environment.
Database Configuration
The database URL is configured through environment variables, allowing for flexible deployment configurations.
import os

class Config:
    DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///ravana_agi.db")
Default configuration:
- SQLite database file named ravana_agi.db
- Stored in the working directory
- No built-in encryption at rest
Security implications:
- The default SQLite configuration provides no encryption for data at rest
- Production deployments should use environment variables to specify a database URL with appropriate security features
- Supported databases include PostgreSQL, MySQL, and others that can provide encryption at rest
Data Protection Measures
The system implements several data protection measures:
Deduplication with hashing:
- Uses SHA-256 hashing to identify duplicate content
- Hashes are stored in the content_hash field of the Summary table
- Prevents redundant storage of identical information
import hashlib
content_hash = hashlib.sha256(content.encode('utf-8')).hexdigest()
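A sketch of the corresponding duplicate check, assuming the Summary model from database/models.py; the helper name and query shape are illustrative rather than taken from the codebase:

from sqlmodel import Session, select
from database.models import Summary

def summary_exists(engine, content_hash: str) -> bool:
    # Look for a stored summary with the same SHA-256 content hash
    with Session(engine) as session:
        statement = select(Summary).where(Summary.content_hash == content_hash)
        return session.exec(statement).first() is not None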
No explicit backup procedures:
- The codebase does not contain explicit backup procedures
- Backup strategy would depend on the underlying database system and deployment environment
- For SQLite, backup would involve copying the database file
- For server-based databases, standard database backup procedures would apply
Migration Strategies for Schema Evolution
The system's architecture suggests several approaches for handling schema evolution, though explicit migration tools are not implemented in the current codebase.
Current State
The system uses SQLModel's metadata.create_all() method to create tables, which only creates missing tables and does not alter tables that already exist.
def create_db_and_tables():
    SQLModel.metadata.create_all(engine)
This approach has limitations:
- Does not modify existing columns
- Does not add constraints to existing tables
- Does not handle data migration between schema versions
- Risk of data loss or corruption during schema changes
Recommended Migration Strategy
For production use, the following migration strategy is recommended:
1. Use Alembic for database migrations:
- Alembic is the standard migration tool for SQLAlchemy (which SQLModel is built upon)
- Provides versioned migration scripts
- Supports both automated and manual migration generation
- Handles forward and backward migrations
2. Migration process:
- Create a new migration script for each schema change
- Include data migration logic when necessary
- Test migrations on a copy of production data
- Apply migrations in a controlled manner
3. Example migration workflow:
# Generate migration
alembic revision --autogenerate -m "Add mood intensity field"
# Review and edit migration script
# Apply migration
alembic upgrade head
4. Zero-downtime considerations:
- Design schema changes to be backward compatible
- Use additive changes when possible (adding columns, not removing)
- Implement feature flags for new functionality
- Deploy schema changes separately from application changes when possible
5. Data migration best practices:
- Always backup data before migrations
- Test migrations on representative data sets
- Monitor application behavior after migrations
- Have a rollback plan for each migration
While the current codebase does not implement a formal migration system, adopting Alembic or a similar tool would provide a robust solution for schema evolution.
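As a concrete illustration of the Alembic recommendation, the excerpt below sketches the key wiring in alembic/env.py that lets autogenerate compare SQLModel's metadata against the live database; the import path for the models module is an assumption.

from sqlmodel import SQLModel

import database.models  # noqa: F401  (imported so all table classes register on the metadata)

# Alembic's autogenerate diffs this metadata against the current database schema
target_metadata = SQLModel.metadata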
Conclusion
The RAVANA AGI system employs a comprehensive database schema designed to support an autonomous artificial intelligence that can process information, make decisions, and evolve over time. The schema captures various aspects of the system's operation, including external inputs (articles), internal states (mood logs), cognitive processes (situations, decisions), and learning mechanisms (summaries, experiments).
Key strengths of the current implementation include:
- Clean separation of concerns through well-defined entities
- Effective use of JSON fields for flexible data storage
- Integration of semantic search through ChromaDB for performance-critical memory retrieval
- Automated data lifecycle management through periodic compression and consolidation
- Enhanced Very Long-Term Memory system with many-to-many relationships between consolidations, patterns, and strategic knowledge
Areas for improvement include:
- Implementing a formal database migration system like Alembic
- Enhancing data security with encryption at rest
- Establishing explicit backup procedures
- Adding database indexes on frequently queried fields of the core operational tables (the VLTM schema already defines its own indexes)
The system's architecture demonstrates a sophisticated approach to AGI data management, balancing structured storage with flexible, semantic retrieval mechanisms to support intelligent behavior.
Referenced Files in This Document
- database/models.py
- database/engine.py
- services/data_service.py
- services/knowledge_service.py
- core/config.py
- modules/episodic_memory/memory.py
- services/memory_service.py
- modules/knowledge_compression/compressed_memory.py
- core/vltm_data_models.py - Updated in recent commit
- database/vltm_schema.sql - Updated in recent commit