RAVANA AGI

Database Schema

Update Summary

Changes Made

  • Added new entity definitions for VeryLongTermMemory, MemoryPattern, MemoryConsolidation, StrategicKnowledge, and related junction tables
  • Updated relationships and schema diagram to include new VLTM entities and many-to-many relationships
  • Added new section on Very Long-Term Memory System architecture
  • Updated indexing strategy to include new VLTM indexes
  • Enhanced data lifecycle policies to include VLTM retention policies
  • Added new section on strategic knowledge management

Table of Contents

  1. Introduction
  2. Entity Definitions
  3. Very Long-Term Memory System
  4. Relationships and Schema Diagram
  5. Data Access Patterns and ORM Usage
  6. Indexing Strategy for Performance-Critical Queries
  7. Data Lifecycle Policies and Retention
  8. Data Security, Encryption, and Backup Procedures
  9. Migration Strategies for Schema Evolution
  10. Conclusion

Introduction

This document provides comprehensive documentation for the database schema used in the RAVANA AGI system. The schema is designed to support an autonomous artificial general intelligence system that processes information, makes decisions, reflects on its behavior, and evolves over time. The database stores various types of operational, emotional, and cognitive data that enable the system to maintain context, learn from experience, and generate intelligent responses. This documentation details all entities, their relationships, access patterns, performance considerations, and lifecycle management policies.

Entity Definitions

The database schema consists of several key entities that capture different aspects of the AGI's operation and cognition. Each entity is defined using SQLModel, which provides a Pythonic interface for database interactions.

Article

Stores information about news articles and other content sources that the system monitors.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the article
  • title: str - Title of the article
  • link: str - URL of the article
  • published: str - Publication timestamp in ISO format
  • source: str - Source of the article (e.g., CNN, BBC)
  • fetched_at: str - Timestamp when the article was fetched by the system

Constraints:

  • Primary key on id
  • Uniqueness is enforced at the application level based on the link field to prevent duplicate articles
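
As an illustration, the Article table can be declared in SQLModel roughly as follows; this is a minimal sketch reconstructed from the fields above, not a verbatim copy of the source:

from sqlmodel import SQLModel, Field

class Article(SQLModel, table=True):
    """News article or other content source monitored by the system (sketch)."""
    id: int | None = Field(default=None, primary_key=True)
    title: str
    link: str
    published: str   # ISO-format publication timestamp
    source: str      # e.g., CNN, BBC
    fetched_at: str  # ISO-format fetch timestamp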

Event

Represents significant events detected by the system through analysis of articles and other inputs.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the event
  • timestamp: str - Timestamp of the event in ISO format
  • description: str - Natural language description of the event
  • keywords: str - Comma-separated keywords associated with the event
  • cluster_id: int - Identifier for the cluster to which this event belongs

Constraints:

  • Primary key on id

Summary

Stores compressed knowledge and summaries generated by the system, serving as a mechanism for knowledge retention and deduplication.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the summary
  • timestamp: str - Creation timestamp in ISO format
  • summary_text: str - The actual summary content
  • source: str | None = Field(default="unknown") - Source of the summary (e.g., system, user)
  • category: str | None = Field(default="misc") - Category of the summary (e.g., compression, reflection)
  • content_hash: str | None = Field(default=None) - SHA-256 hash of the summary text for deduplication

Constraints:

  • Primary key on id
  • The content_hash field enables deduplication by allowing the system to check if a similar summary already exists

ActionLog

Records all actions performed by the AGI system, providing an audit trail of its operations.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the log entry
  • timestamp: str - Timestamp of the action in ISO format
  • action_name: str - Name of the action performed
  • params: str - JSON string containing the parameters passed to the action
  • status: str - Status of the action ('success' or 'failure')
  • result: str - JSON string containing the result of the action

Constraints:

  • Primary key on id

MoodLog

Captures the emotional state of the AGI system at specific points in time.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the mood log entry
  • timestamp: str - Timestamp of the mood recording in ISO format
  • mood_vector: str - JSON string representing the mood vector (a multidimensional representation of emotional state)

Constraints:

  • Primary key on id

SituationLog

Records situations generated by the system for decision-making and reflection purposes.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the situation log
  • timestamp: str - Timestamp of the situation creation in ISO format
  • situation_type: str - Type of situation (e.g., curiosity, reflection)
  • prompt: str - The prompt that generated the situation
  • context: str - JSON string containing additional context for the situation

Constraints:

  • Primary key on id

DecisionLog

Stores decisions made by the AGI system in response to specific situations.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the decision log
  • timestamp: str - Timestamp of the decision in ISO format
  • situation_id: int | None = Field(default=None, foreign_key="situationlog.id") - Foreign key referencing the situation that prompted the decision
  • raw_response: str - The raw response from the decision-making process

Constraints:

  • Primary key on id
  • Foreign key relationship with SituationLog on the situation_id field
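
Expressed in SQLModel, the foreign key is declared directly on the field; a sketch based on the field list above:

from sqlmodel import SQLModel, Field

class DecisionLog(SQLModel, table=True):
    """Decision made in response to a generated situation (sketch)."""
    id: int | None = Field(default=None, primary_key=True)
    timestamp: str
    situation_id: int | None = Field(default=None, foreign_key="situationlog.id")
    raw_response: str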

ExperimentLog

Records experiments conducted by the AGI system to test hypotheses or explore new ideas.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the experiment log
  • timestamp: str - Timestamp of the experiment in ISO format
  • hypothesis: str - The hypothesis being tested
  • results: str - JSON string containing the results of the experiment

Constraints:

  • Primary key on id

Very Long-Term Memory System

The Very Long-Term Memory (VLTM) system has been enhanced with new entities and relationships to support advanced knowledge management and strategic reasoning.

VeryLongTermMemory

Core very long-term memory record that stores compressed cognitive content.

Fields:

  • memory_id: str - Primary key, UUID identifier for the memory
  • memory_type: MemoryType - Enum indicating the type of memory (strategic_knowledge, architectural_insight, etc.)
  • created_at: datetime - Timestamp when the memory was created
  • last_accessed: datetime - Timestamp of last access
  • promoted_at: datetime - Timestamp when memory was promoted to long-term storage
  • access_count: int - Number of times the memory has been accessed
  • importance_score: float - Calculated importance score (0.0-1.0)
  • strategic_value: float - Strategic value for long-term planning (0.0-1.0)
  • compressed_content: str - JSON string of compressed memory data
  • metadata_info: str - JSON string of metadata
  • source_session: str - Identifier of the session that created the memory
  • related_memories: str - JSON array of related memory IDs
  • retention_category: str - Retention policy category

Constraints:

  • Primary key on memory_id
  • Importance score and strategic value constrained between 0.0 and 1.0
  • JSON fields validated for proper format
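
A condensed SQLModel sketch of this entity, reconstructed from the field list (the enum-typed field is shown as a plain string and the value-range constraints as Field bounds; the actual source may differ):

from datetime import datetime
from sqlmodel import SQLModel, Field

class VeryLongTermMemory(SQLModel, table=True):
    """Compressed long-term memory record (sketch)."""
    memory_id: str = Field(primary_key=True)         # UUID string
    memory_type: str                                 # MemoryType value, e.g. "strategic_knowledge"
    created_at: datetime
    last_accessed: datetime
    promoted_at: datetime
    access_count: int = 0
    importance_score: float = Field(ge=0.0, le=1.0)  # constrained to 0.0-1.0
    strategic_value: float = Field(ge=0.0, le=1.0)   # constrained to 0.0-1.0
    compressed_content: str                          # JSON string of compressed memory data
    metadata_info: str                               # JSON string of metadata
    source_session: str
    related_memories: str                            # JSON array of related memory IDs
    retention_category: str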

MemoryPattern

Patterns extracted from memories through analysis of memory content and relationships.

Fields:

  • pattern_id: str - Primary key, UUID identifier for the pattern
  • pattern_type: PatternType - Enum indicating the type of pattern (temporal, causal, behavioral, etc.)
  • pattern_description: str - Natural language description of the pattern
  • confidence_score: float - Confidence in pattern validity (0.0-1.0)
  • pattern_data: str - JSON string of pattern-specific data
  • discovered_at: datetime - Timestamp when the pattern was discovered
  • supporting_memories: str - JSON array of memory IDs that support this pattern
  • validation_count: int - Number of times the pattern has been validated
  • last_validated: Optional[datetime] - Timestamp of last validation
  • source_memory_id: Optional[str] - Foreign key to the source memory

Constraints:

  • Primary key on pattern_id
  • Confidence score constrained between 0.0 and 1.0
  • JSON fields validated for proper format

MemoryConsolidation

Records of memory consolidation processes that optimize storage and extract higher-level knowledge.

Fields:

  • consolidation_id: str - Primary key, UUID identifier for the consolidation
  • consolidation_date: datetime - Timestamp of the consolidation process
  • consolidation_type: ConsolidationType - Enum indicating the type (daily, weekly, monthly, etc.)
  • memories_processed: int - Number of memories processed in this consolidation
  • patterns_extracted: int - Number of patterns extracted
  • compression_ratio: float - Ratio of original size to compressed size
  • consolidation_results: str - JSON string of consolidation results
  • processing_time_seconds: float - Time taken to process the consolidation
  • success: bool - Whether the consolidation was successful
  • error_message: Optional[str] - Error message if consolidation failed

Constraints:

  • Primary key on consolidation_id

StrategicKnowledge

High-level strategic knowledge derived from patterns and used for long-term planning.

Fields:

  • knowledge_id: str - Primary key, UUID identifier for the knowledge
  • knowledge_domain: str - Domain of the knowledge (e.g., architecture, performance, learning)
  • knowledge_summary: str - Natural language summary of the knowledge
  • confidence_level: float - Confidence in the knowledge (0.0-1.0)
  • last_updated: datetime - Timestamp of last update
  • source_patterns: str - JSON array of pattern IDs that contributed to this knowledge
  • knowledge_structure: str - JSON string of structured knowledge representation
  • validation_score: float - Score based on validation attempts (0.0-1.0)
  • application_count: int - Number of times this knowledge has been applied

Constraints:

  • Primary key on knowledge_id
  • Confidence level and validation score constrained between 0.0 and 1.0

ConsolidationPattern

Junction table linking memory consolidations and patterns, enabling many-to-many relationships.

Fields:

  • consolidation_id: str - Foreign key to memory_consolidations table, part of primary key
  • pattern_id: str - Foreign key to memory_patterns table, part of primary key
  • extraction_confidence: float - Confidence in the pattern extraction during consolidation

Constraints:

  • Composite primary key on consolidation_id and pattern_id
  • Foreign key relationships to memory_consolidations and memory_patterns tables
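
A sketch of the junction model with its composite primary key (table and column names follow the naming used in this document; the actual source may differ):

from sqlmodel import SQLModel, Field

class ConsolidationPattern(SQLModel, table=True):
    """Links a consolidation run to a pattern it extracted (sketch)."""
    consolidation_id: str = Field(foreign_key="memory_consolidations.consolidation_id", primary_key=True)
    pattern_id: str = Field(foreign_key="memory_patterns.pattern_id", primary_key=True)
    extraction_confidence: float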

PatternStrategicKnowledge

Junction table linking memory patterns and strategic knowledge, enabling many-to-many relationships.

Fields:

  • pattern_id: str - Foreign key to memory_patterns table, part of primary key
  • knowledge_id: str - Foreign key to strategic_knowledge table, part of primary key
  • contribution_weight: float - Weight of the pattern's contribution to the strategic knowledge

Constraints:

  • Composite primary key on pattern_id and knowledge_id
  • Foreign key relationships to memory_patterns and strategic_knowledge tables

ConsolidationMetrics

Performance metrics for consolidation processes.

Fields:

  • metric_id: str - Primary key, UUID identifier for the metric
  • consolidation_id: str - Foreign key to memory_consolidations table
  • metric_name: str - Name of the metric
  • metric_value: float - Value of the metric
  • metric_unit: str - Unit of measurement
  • timestamp: datetime - Timestamp when the metric was recorded

Constraints:

  • Primary key on metric_id
  • Foreign key relationship to memory_consolidations table

Relationships and Schema Diagram

The database schema entities are related through foreign key relationships and conceptual associations that reflect the AGI system's cognitive processes. Documented relationships include the DecisionLog foreign key to SituationLog, the MemoryPattern reference to its source memory, the ConsolidationPattern junction between memory consolidations and patterns, the PatternStrategicKnowledge junction between patterns and strategic knowledge, and the ConsolidationMetrics foreign key to memory consolidations.

Data Access Patterns and ORM Usage

The system uses SQLModel as an ORM (Object-Relational Mapper) to interact with the database, providing a clean, Pythonic interface for data operations.

ORM Configuration

The database engine is configured in engine.py using SQLAlchemy's create_engine function with the database URL obtained from the configuration system.

from sqlmodel import create_engine, SQLModel
from core.config import Config

engine = create_engine(Config.DATABASE_URL, echo=True)

def create_db_and_tables():
    SQLModel.metadata.create_all(engine)

The create_db_and_tables() function initializes the database schema by creating all tables defined in the SQLModel classes.

Transaction Boundaries

Data access follows a consistent pattern using context managers to ensure proper transaction boundaries and resource cleanup. The DataService class in data_service.py implements methods for saving various types of data with proper transaction management.

def save_action_log(self, action_name: str, params: dict, status: str, result: any):
    """Saves a record of an executed action to the database."""
    with Session(self.engine) as session:
        action_log = ActionLog(
            timestamp=datetime.utcnow().isoformat(),
            action_name=action_name,
            params=json.dumps(params),
            status=status,
            result=json.dumps(result)
        )
        session.add(action_log)
        session.commit()

Key characteristics of the data access pattern:

  • Each operation uses a context manager (with Session(...) as session) to ensure the session is properly closed
  • Transactions are committed explicitly with session.commit()
  • JSON serialization is used for complex data structures stored in text fields
  • Each operation is atomic and limited to a single, narrowly scoped unit of work
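
Reads follow the same session-scoped pattern. As an illustration (this helper is not part of the source), recent action logs could be fetched with SQLModel's select():

from sqlmodel import Session, select

def get_recent_action_logs(engine, limit: int = 50):
    """Return the most recent action log entries (illustrative helper)."""
    with Session(engine) as session:
        statement = (
            select(ActionLog)
            .order_by(ActionLog.timestamp.desc())  # ISO-8601 strings sort chronologically
            .limit(limit)
        )
        return session.exec(statement).all()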

Data Service Implementation

The DataService class provides a service layer interface for database operations, abstracting the underlying ORM details from the rest of the application.

Key methods:

  • fetch_and_save_articles(): Fetches articles from configured RSS feeds and saves new ones to the database
  • detect_and_save_events(): Analyzes articles to detect significant events and saves them
  • save_action_log(): Records system actions
  • save_mood_log(): Records the system's emotional state
  • save_situation_log(): Records generated situations
  • save_decision_log(): Records decisions made by the system
  • save_experiment_log(): Records experiments and their results

The service layer ensures consistent data handling and provides a single point of control for database interactions.
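
For example, save_mood_log() can be expected to mirror the save_action_log() pattern shown earlier; the following is a sketch based on the MoodLog fields, not the actual implementation:

def save_mood_log(self, mood_vector: dict):
    """Saves the current mood vector as a MoodLog row (sketch)."""
    with Session(self.engine) as session:
        mood_log = MoodLog(
            timestamp=datetime.utcnow().isoformat(),
            mood_vector=json.dumps(mood_vector)
        )
        session.add(mood_log)
        session.commit()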

Indexing Strategy for Performance-Critical Queries

The system employs a multi-layered approach to optimize performance-critical queries, particularly for memory retrieval operations.

ChromaDB for Semantic Memory Retrieval

For performance-critical memory retrieval operations, the system uses ChromaDB, an embedding database optimized for semantic search. This is implemented in the episodic_memory module.

Key features:

  • Uses sentence-transformers with the 'all-MiniLM-L6-v2' model to generate 384-dimensional embeddings
  • Stores embeddings in a persistent ChromaDB collection with automatic embedding generation
  • Supports similarity-based queries for retrieving relevant memories

chroma_collection = chroma_client.get_or_create_collection(
    name=CHROMA_COLLECTION,
    embedding_function=sentence_transformer_ef
)
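
Memories are written to the same collection before they can be retrieved. A hedged sketch using ChromaDB's standard add() call (the document text, metadata keys, and ID scheme are illustrative):

import uuid

chroma_collection.add(
    documents=["The user prefers concise summaries in the morning."],
    metadatas=[{"type": "episodic", "created_at": "2024-01-01T08:00:00"}],
    ids=[str(uuid.uuid4())]  # one unique ID per memory; the embedding is generated automatically
)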

Query process:

  1. A query text is converted to an embedding vector
  2. The system finds the nearest neighbors in the embedding space
  3. Results are returned based on cosine similarity
  4. A similarity threshold filters out irrelevant results

The get_relevant_memories_api function implements this retrieval pattern (shown here in simplified form):

def get_relevant_memories_api(request: QueryRequest):
    results = chroma_collection.query(
        query_texts=[request.query_text],
        n_results=request.top_n
    )

    relevant = []
    # Convert each distance to a similarity score and keep only results above the threshold
    for doc, dist in zip(results["documents"][0], results["distances"][0]):
        similarity = 1 - dist
        if similarity >= request.similarity_threshold:
            relevant.append({"text": doc, "similarity": similarity})
    return relevant

VLTM Database Indexing

The VLTM system implements comprehensive indexing strategies to optimize query performance for very long-term memory operations.

Index types:

  • B-tree indexes: On frequently queried scalar fields like timestamps, memory types, and confidence scores
  • GIN indexes: On JSONB fields for efficient querying of structured data
  • Text search indexes: Using tsvector for full-text search capabilities
  • Composite indexes: On frequently queried field combinations

Key indexes:

  • idx_vltm_memory_type: On memory_type for filtering by memory category
  • idx_vltm_created_at: On created_at for time-based queries
  • idx_vltm_importance_score: On importance_score for prioritizing important memories
  • idx_patterns_type: On pattern_type for filtering by pattern category
  • idx_strategic_domain: On knowledge_domain for domain-specific queries
  • idx_vltm_metadata_gin: GIN index on the metadata_info JSON field for efficient querying of structured data
  • idx_patterns_description_text: Text search index on pattern descriptions
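
As an illustration, a few of the B-tree indexes above could be declared against the ORM models with SQLAlchemy's Index construct (a sketch under the assumption that the VLTM models are SQLModel classes; the GIN and text-search indexes are PostgreSQL-specific and are typically created in migration scripts):

from sqlalchemy import Index

# B-tree indexes on frequently filtered columns, registered on the mapped tables
Index("idx_vltm_memory_type", VeryLongTermMemory.memory_type)
Index("idx_vltm_created_at", VeryLongTermMemory.created_at)
Index("idx_vltm_importance_score", VeryLongTermMemory.importance_score)
Index("idx_patterns_type", MemoryPattern.pattern_type)
Index("idx_strategic_domain", StrategicKnowledge.knowledge_domain)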

Data Lifecycle Policies and Retention

The system implements automated data lifecycle management through background tasks that handle retention, consolidation, and archival.

Knowledge Compression

The system runs a periodic knowledge compression task that summarizes recent interactions and creates higher-level abstractions.

Configuration:

  • Frequency: Every 24 hours (86,400 seconds)
  • Triggered by: knowledge_compression_task() in the main system loop

Process:

  1. Retrieves recent summaries from the database
  2. Uses an LLM to compress and synthesize the information
  3. Creates new Summary records with category "compression"
  4. Uses content hashing to prevent duplication

async def knowledge_compression_task(self):
    """Background task to compress knowledge every 24 hours."""
    while not self._shutdown.is_set():
        try:
            summary = await asyncio.to_thread(self.knowledge_service.compress_and_save_knowledge)
            logger.info(f"Compressed and saved knowledge summary.")
        except Exception as e:
            logger.error(f"Error in knowledge compression: {e}")

        try:
            await asyncio.sleep(86400)
        except asyncio.CancelledError:
            break

Memory Consolidation

The system performs periodic memory consolidation to optimize storage and retrieval efficiency.

Configuration:

  • Frequency: Every 6 hours (21,600 seconds)
  • Triggered by: memory_consolidation_task() in the main system loop

Process:

  1. Selects a batch of memories for consolidation
  2. Uses an LLM to analyze, merge, deduplicate, and generalize memories
  3. Creates consolidated memories with type 'long-term-consolidated'
  4. Deletes the original, redundant memories

The consolidation process follows these rules:

  • Merge related memories into comprehensive statements
  • Remove duplicates, keeping the most detailed version
  • Generalize specific facts into broader categories
  • Never lose critical information
  • Never merge unrelated facts

async def memory_consolidation_task(self):
    """Periodically consolidates memories to optimize retrieval and relevance."""
    while not self._shutdown.is_set():
        try:
            logger.info("Starting memory consolidation...")
            consolidation_result = await self.memory_service.consolidate_memories()
            logger.info(f"Memory consolidation finished. Report: {consolidation_result}")
        except Exception as e:
            logger.error(f"Error during memory consolidation: {e}", exc_info=True)

        try:
            await asyncio.sleep(21600)
        except asyncio.CancelledError:
            break

VLTM Retention Policies

The Very Long-Term Memory system implements configurable retention policies based on memory type and importance.

Policy configuration:

  • Strategic knowledge: Permanent retention, compression after 90 days
  • Critical failures: Permanent retention, compression after 30 days
  • Successful improvements: 2-year retention, compression after 180 days
  • Code patterns: 1-year retention, compression after 90 days

Retention rules:

  • Memories are automatically compressed based on age and type
  • High-importance memories are retained permanently
  • Low-importance memories may be archived or deleted based on policy
  • Compression reduces storage footprint while preserving essential information
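
These policies can be summarized as a configuration mapping; the key names below are illustrative and not taken from the source:

# Hypothetical representation of the retention policies described above.
# retention_days=None means permanent retention.
VLTM_RETENTION_POLICIES = {
    "strategic_knowledge":    {"retention_days": None, "compress_after_days": 90},
    "critical_failure":       {"retention_days": None, "compress_after_days": 30},
    "successful_improvement": {"retention_days": 730,  "compress_after_days": 180},
    "code_pattern":           {"retention_days": 365,  "compress_after_days": 90},
}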

Data Security, Encryption, and Backup Procedures

The system's approach to data security, encryption, and backup is primarily defined by its configuration and deployment environment.

Database Configuration

The database URL is configured through environment variables, allowing for flexible deployment configurations.

import os

class Config:
    DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///ravana_agi.db")

Default configuration:

  • SQLite database file named ravana_agi.db
  • Stored in the working directory
  • No built-in encryption at rest

Security implications:

  • The default SQLite configuration provides no encryption for data at rest
  • Production deployments should use environment variables to specify a database URL with appropriate security features
  • Supported databases include PostgreSQL, MySQL, and others that can provide encryption at rest
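
For example, a deployment could select a server-based database with encryption in transit by supplying DATABASE_URL through the environment before start-up; the connection string below is purely illustrative:

import os

# Illustrative only: a TLS-enforcing PostgreSQL URL supplied via the environment,
# which Config.DATABASE_URL picks up instead of the SQLite default.
os.environ.setdefault(
    "DATABASE_URL",
    "postgresql+psycopg2://ravana:<password>@db.internal:5432/ravana_agi?sslmode=require",
)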

Data Protection Measures

The system implements several data protection measures:

Deduplication with hashing:

  • Uses SHA-256 hashing to identify duplicate content
  • Hashes are stored in the content_hash field of the Summary table
  • Prevents redundant storage of identical information

content_hash = hashlib.sha256(content.encode('utf-8')).hexdigest()
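
Before a new summary is saved, the hash can be compared against existing records; a sketch of that check (the query shape is assumed, not taken from the source):

import hashlib
from datetime import datetime
from sqlmodel import Session, select

def save_summary_if_new(engine, summary_text: str) -> bool:
    """Persist a summary only if an identical one is not already stored (sketch)."""
    content_hash = hashlib.sha256(summary_text.encode('utf-8')).hexdigest()
    with Session(engine) as session:
        existing = session.exec(
            select(Summary).where(Summary.content_hash == content_hash)
        ).first()
        if existing is not None:
            return False  # duplicate content; skip the write
        session.add(Summary(
            timestamp=datetime.utcnow().isoformat(),
            summary_text=summary_text,
            category="compression",
            content_hash=content_hash,
        ))
        session.commit()
        return True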

No explicit backup procedures:

  • The codebase does not contain explicit backup procedures
  • Backup strategy would depend on the underlying database system and deployment environment
  • For SQLite, backup would involve copying the database file
  • For server-based databases, standard database backup procedures would apply

Migration Strategies for Schema Evolution

The system's architecture suggests several approaches for handling schema evolution, though explicit migration tools are not implemented in the current codebase.

Current State

The system uses SQLModel's create_all() method to create tables, which only creates missing tables but does not handle schema changes to existing tables.

def create_db_and_tables():
    SQLModel.metadata.create_all(engine)

This approach has limitations:

  • Does not modify existing columns
  • Does not add constraints to existing tables
  • Does not handle data migration between schema versions
  • Risk of data loss or corruption during schema changes

Recommended Migration Strategy

For production use, the following migration strategy is recommended:

1. Use Alembic for database migrations:

  • Alembic is the standard migration tool for SQLAlchemy (which SQLModel is built upon)
  • Provides versioned migration scripts
  • Supports both automated and manual migration generation
  • Handles forward and backward migrations

2. Migration process:

  • Create a new migration script for each schema change
  • Include data migration logic when necessary
  • Test migrations on a copy of production data
  • Apply migrations in a controlled manner

3. Example migration workflow:

# Generate migration
alembic revision --autogenerate -m "Add mood intensity field"

# Review and edit migration script
# Apply migration
alembic upgrade head
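
The generated script is then reviewed before being applied; a sketch of what the migration for the example above might contain (revision identifiers and the column are placeholders):

# versions/xxxx_add_mood_intensity_field.py (sketch)
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4"
down_revision = None  # the previous revision ID goes here

def upgrade():
    # Additive, backward-compatible change: a new nullable column on moodlog
    op.add_column("moodlog", sa.Column("intensity", sa.Float(), nullable=True))

def downgrade():
    op.drop_column("moodlog", "intensity")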

4. Zero-downtime considerations:

  • Design schema changes to be backward compatible
  • Use additive changes when possible (adding columns, not removing)
  • Implement feature flags for new functionality
  • Deploy schema changes separately from application changes when possible

5. Data migration best practices:

  • Always backup data before migrations
  • Test migrations on representative data sets
  • Monitor application behavior after migrations
  • Have a rollback plan for each migration

While the current codebase does not implement a formal migration system, adopting Alembic or a similar tool would provide a robust solution for schema evolution.

Conclusion

The RAVANA AGI system employs a comprehensive database schema designed to support an autonomous artificial intelligence that can process information, make decisions, and evolve over time. The schema captures various aspects of the system's operation, including external inputs (articles), internal states (mood logs), cognitive processes (situations, decisions), and learning mechanisms (summaries, experiments).

Key strengths of the current implementation include:

  • Clean separation of concerns through well-defined entities
  • Effective use of JSON fields for flexible data storage
  • Integration of semantic search through ChromaDB for performance-critical memory retrieval
  • Automated data lifecycle management through periodic compression and consolidation
  • Enhanced Very Long-Term Memory system with many-to-many relationships between consolidations, patterns, and strategic knowledge

Areas for improvement include:

  • Implementing a formal database migration system like Alembic
  • Enhancing data security with encryption at rest
  • Establishing explicit backup procedures
  • Potentially adding database indexes on frequently queried fields

The system's architecture demonstrates a sophisticated approach to AGI data management, balancing structured storage with flexible, semantic retrieval mechanisms to support intelligent behavior.
