RAVANA AGI

Database Schema

Update Summary

Changes Made

  • Added new entity definitions for VeryLongTermMemory, MemoryPattern, MemoryConsolidation, StrategicKnowledge, and related junction tables
  • Updated relationships and schema diagram to include new VLTM entities and many-to-many relationships
  • Added new section on Very Long-Term Memory System architecture
  • Updated indexing strategy to include new VLTM indexes
  • Enhanced data lifecycle policies to include VLTM retention policies
  • Added new section on strategic knowledge management

Table of Contents

  1. Introduction
  2. Entity Definitions
  3. Very Long-Term Memory System
  4. Relationships and Schema Diagram
  5. Data Access Patterns and ORM Usage
  6. Indexing Strategy for Performance-Critical Queries
  7. Data Lifecycle Policies and Retention
  8. Data Security, Encryption, and Backup Procedures
  9. Migration Strategies for Schema Evolution
  10. Conclusion

Introduction

This document provides comprehensive documentation for the database schema used in the RAVANA AGI system. The schema is designed to support an autonomous artificial general intelligence system that processes information, makes decisions, reflects on its behavior, and evolves over time. The database stores various types of operational, emotional, and cognitive data that enable the system to maintain context, learn from experience, and generate intelligent responses. This documentation details all entities, their relationships, access patterns, performance considerations, and lifecycle management policies.

Entity Definitions

The database schema consists of several key entities that capture different aspects of the AGI's operation and cognition. Each entity is defined using SQLModel, which provides a Pythonic interface for database interactions.

Article

Stores information about news articles and other content sources that the system monitors.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the article
  • title: str - Title of the article
  • link: str - URL of the article
  • published: str - Publication timestamp in ISO format
  • source: str - Source of the article (e.g., CNN, BBC)
  • fetched_at: str - Timestamp when the article was fetched by the system

Constraints:

  • Primary key on id
  • Uniqueness is enforced at the application level based on the link field to prevent duplicate articles
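
As an illustration, the Article table can be declared in SQLModel roughly as follows; this is a minimal sketch reconstructed from the fields above, not a verbatim copy of the source:

from sqlmodel import SQLModel, Field

class Article(SQLModel, table=True):
    """News article or other content source monitored by the system (sketch)."""
    id: int | None = Field(default=None, primary_key=True)
    title: str
    link: str
    published: str   # ISO-format publication timestamp
    source: str      # e.g., CNN, BBC
    fetched_at: str  # ISO-format fetch timestamp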

Event

Represents significant events detected by the system through analysis of articles and other inputs.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the event
  • timestamp: str - Timestamp of the event in ISO format
  • description: str - Natural language description of the event
  • keywords: str - Comma-separated keywords associated with the event
  • cluster_id: int - Identifier for the cluster to which this event belongs

Constraints:

  • Primary key on id

Summary

Stores compressed knowledge and summaries generated by the system, serving as a mechanism for knowledge retention and deduplication.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the summary
  • timestamp: str - Creation timestamp in ISO format
  • summary_text: str - The actual summary content
  • source: str | None = Field(default="unknown") - Source of the summary (e.g., system, user)
  • category: str | None = Field(default="misc") - Category of the summary (e.g., compression, reflection)
  • content_hash: str | None = Field(default=None) - SHA-256 hash of the summary text for deduplication

Constraints:

  • Primary key on id
  • The content_hash field enables deduplication by allowing the system to check if a similar summary already exists

ActionLog

Records all actions performed by the AGI system, providing an audit trail of its operations.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the log entry
  • timestamp: str - Timestamp of the action in ISO format
  • action_name: str - Name of the action performed
  • params: str - JSON string containing the parameters passed to the action
  • status: str - Status of the action ('success' or 'failure')
  • result: str - JSON string containing the result of the action

Constraints:

  • Primary key on id

MoodLog

Captures the emotional state of the AGI system at specific points in time.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the mood log entry
  • timestamp: str - Timestamp of the mood recording in ISO format
  • mood_vector: str - JSON string representing the mood vector (a multidimensional representation of emotional state)

Constraints:

  • Primary key on id

SituationLog

Records situations generated by the system for decision-making and reflection purposes.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the situation log
  • timestamp: str - Timestamp of the situation creation in ISO format
  • situation_type: str - Type of situation (e.g., curiosity, reflection)
  • prompt: str - The prompt that generated the situation
  • context: str - JSON string containing additional context for the situation

Constraints:

  • Primary key on id

DecisionLog

Stores decisions made by the AGI system in response to specific situations.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the decision log
  • timestamp: str - Timestamp of the decision in ISO format
  • situation_id: int | None = Field(default=None, foreign_key="situationlog.id") - Foreign key referencing the situation that prompted the decision
  • raw_response: str - The raw response from the decision-making process

Constraints:

  • Primary key on id
  • Foreign key relationship with SituationLog on the situation_id field
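
Expressed in SQLModel, the foreign key is declared directly on the field; a sketch based on the field list above:

from sqlmodel import SQLModel, Field

class DecisionLog(SQLModel, table=True):
    """Decision made in response to a generated situation (sketch)."""
    id: int | None = Field(default=None, primary_key=True)
    timestamp: str
    situation_id: int | None = Field(default=None, foreign_key="situationlog.id")
    raw_response: str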

ExperimentLog

Records experiments conducted by the AGI system to test hypotheses or explore new ideas.

Fields:

  • id: int | None = Field(default=None, primary_key=True) - Unique identifier for the experiment log
  • timestamp: str - Timestamp of the experiment in ISO format
  • hypothesis: str - The hypothesis being tested
  • results: str - JSON string containing the results of the experiment

Constraints:

  • Primary key on id

Very Long-Term Memory System

The Very Long-Term Memory (VLTM) system has been enhanced with new entities and relationships to support advanced knowledge management and strategic reasoning.

VeryLongTermMemory

Core very long-term memory record that stores compressed cognitive content.

Fields:

  • memory_id: str - Primary key, UUID identifier for the memory
  • memory_type: MemoryType - Enum indicating the type of memory (strategic_knowledge, architectural_insight, etc.)
  • created_at: datetime - Timestamp when the memory was created
  • last_accessed: datetime - Timestamp of last access
  • promoted_at: datetime - Timestamp when memory was promoted to long-term storage
  • access_count: int - Number of times the memory has been accessed
  • importance_score: float - Calculated importance score (0.0-1.0)
  • strategic_value: float - Strategic value for long-term planning (0.0-1.0)
  • compressed_content: str - JSON string of compressed memory data
  • metadata_info: str - JSON string of metadata
  • source_session: str - Identifier of the session that created the memory
  • related_memories: str - JSON array of related memory IDs
  • retention_category: str - Retention policy category

Constraints:

  • Primary key on memory_id
  • Importance score and strategic value constrained between 0.0 and 1.0
  • JSON fields validated for proper format
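
A condensed SQLModel sketch of this entity, reconstructed from the field list (the enum-typed field is shown as a plain string and the value-range constraints as Field bounds; the actual source may differ):

from datetime import datetime
from sqlmodel import SQLModel, Field

class VeryLongTermMemory(SQLModel, table=True):
    """Compressed long-term memory record (sketch)."""
    memory_id: str = Field(primary_key=True)         # UUID string
    memory_type: str                                 # MemoryType value, e.g. "strategic_knowledge"
    created_at: datetime
    last_accessed: datetime
    promoted_at: datetime
    access_count: int = 0
    importance_score: float = Field(ge=0.0, le=1.0)  # constrained to 0.0-1.0
    strategic_value: float = Field(ge=0.0, le=1.0)   # constrained to 0.0-1.0
    compressed_content: str                          # JSON string of compressed memory data
    metadata_info: str                               # JSON string of metadata
    source_session: str
    related_memories: str                            # JSON array of related memory IDs
    retention_category: str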

MemoryPattern

Patterns extracted from memories through analysis of memory content and relationships.

Fields:

  • pattern_id: str - Primary key, UUID identifier for the pattern
  • pattern_type: PatternType - Enum indicating the type of pattern (temporal, causal, behavioral, etc.)
  • pattern_description: str - Natural language description of the pattern
  • confidence_score: float - Confidence in pattern validity (0.0-1.0)
  • pattern_data: str - JSON string of pattern-specific data
  • discovered_at: datetime - Timestamp when the pattern was discovered
  • supporting_memories: str - JSON array of memory IDs that support this pattern
  • validation_count: int - Number of times the pattern has been validated
  • last_validated: Optional[datetime] - Timestamp of last validation
  • source_memory_id: Optional[str] - Foreign key to the source memory

Constraints:

  • Primary key on pattern_id
  • Confidence score constrained between 0.0 and 1.0
  • JSON fields validated for proper format

MemoryConsolidation

Records of memory consolidation processes that optimize storage and extract higher-level knowledge.

Fields:

  • consolidation_id: str - Primary key, UUID identifier for the consolidation
  • consolidation_date: datetime - Timestamp of the consolidation process
  • consolidation_type: ConsolidationType - Enum indicating the type (daily, weekly, monthly, etc.)
  • memories_processed: int - Number of memories processed in this consolidation
  • patterns_extracted: int - Number of patterns extracted
  • compression_ratio: float - Ratio of original size to compressed size
  • consolidation_results: str - JSON string of consolidation results
  • processing_time_seconds: float - Time taken to process the consolidation
  • success: bool - Whether the consolidation was successful
  • error_message: Optional[str] - Error message if consolidation failed

Constraints:

  • Primary key on consolidation_id

StrategicKnowledge

High-level strategic knowledge derived from patterns and used for long-term planning.

Fields:

  • knowledge_id: str - Primary key, UUID identifier for the knowledge
  • knowledge_domain: str - Domain of the knowledge (e.g., architecture, performance, learning)
  • knowledge_summary: str - Natural language summary of the knowledge
  • confidence_level: float - Confidence in the knowledge (0.0-1.0)
  • last_updated: datetime - Timestamp of last update
  • source_patterns: str - JSON array of pattern IDs that contributed to this knowledge
  • knowledge_structure: str - JSON string of structured knowledge representation
  • validation_score: float - Score based on validation attempts (0.0-1.0)
  • application_count: int - Number of times this knowledge has been applied

Constraints:

  • Primary key on knowledge_id
  • Confidence level and validation score constrained between 0.0 and 1.0

ConsolidationPattern

Junction table linking memory consolidations and patterns, enabling many-to-many relationships.

Fields:

  • consolidation_id: str - Foreign key to memory_consolidations table, part of primary key
  • pattern_id: str - Foreign key to memory_patterns table, part of primary key
  • extraction_confidence: float - Confidence in the pattern extraction during consolidation

Constraints:

  • Composite primary key on consolidation_id and pattern_id
  • Foreign key relationships to memory_consolidations and memory_patterns tables
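
A sketch of the junction model with its composite primary key (table and column names follow the naming used in this document; the actual source may differ):

from sqlmodel import SQLModel, Field

class ConsolidationPattern(SQLModel, table=True):
    """Links a consolidation run to a pattern it extracted (sketch)."""
    consolidation_id: str = Field(foreign_key="memory_consolidations.consolidation_id", primary_key=True)
    pattern_id: str = Field(foreign_key="memory_patterns.pattern_id", primary_key=True)
    extraction_confidence: float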

PatternStrategicKnowledge

Junction table linking memory patterns and strategic knowledge, enabling many-to-many relationships.

Fields:

  • pattern_id: str - Foreign key to memory_patterns table, part of primary key
  • knowledge_id: str - Foreign key to strategic_knowledge table, part of primary key
  • contribution_weight: float - Weight of the pattern's contribution to the strategic knowledge

Constraints:

  • Composite primary key on pattern_id and knowledge_id
  • Foreign key relationships to memory_patterns and strategic_knowledge tables

ConsolidationMetrics

Performance metrics for consolidation processes.

Fields:

  • metric_id: str - Primary key, UUID identifier for the metric
  • consolidation_id: str - Foreign key to memory_consolidations table
  • metric_name: str - Name of the metric
  • metric_value: float - Value of the metric
  • metric_unit: str - Unit of measurement
  • timestamp: datetime - Timestamp when the metric was recorded

Constraints:

  • Primary key on metric_id
  • Foreign key relationship to memory_consolidations table

Relationships and Schema Diagram

The database schema entities are related through foreign key relationships and conceptual associations that reflect the AGI system's cognitive processes. Documented relationships include the DecisionLog foreign key to SituationLog, the MemoryPattern reference to its source memory, the ConsolidationPattern junction between memory consolidations and patterns, the PatternStrategicKnowledge junction between patterns and strategic knowledge, and the ConsolidationMetrics foreign key to memory consolidations.

Data Access Patterns and ORM Usage

The system uses SQLModel as an ORM (Object-Relational Mapper) to interact with the database, providing a clean, Pythonic interface for data operations.

ORM Configuration

The database engine is configured in engine.py using SQLAlchemy's create_engine function with the database URL obtained from the configuration system.

from sqlmodel import create_engine, SQLModel
from core.config import Config

engine = create_engine(Config.DATABASE_URL, echo=True)

def create_db_and_tables():
    SQLModel.metadata.create_all(engine)

The create_db_and_tables() function initializes the database schema by creating all tables defined in the SQLModel classes.

Transaction Boundaries

Data access follows a consistent pattern using context managers to ensure proper transaction boundaries and resource cleanup. The DataService class in data_service.py implements methods for saving various types of data with proper transaction management.

def save_action_log(self, action_name: str, params: dict, status: str, result: any):
    """Saves a record of an executed action to the database."""
    with Session(self.engine) as session:
        action_log = ActionLog(
            timestamp=datetime.utcnow().isoformat(),
            action_name=action_name,
            params=json.dumps(params),
            status=status,
            result=json.dumps(result)
        )
        session.add(action_log)
        session.commit()

Key characteristics of the data access pattern:

  • Each operation uses a context manager (with Session(...) as session) to ensure the session is properly closed
  • Transactions are committed explicitly with session.commit()
  • JSON serialization is used for complex data structures stored in text fields
  • Each operation is atomic and limited to a single, narrowly scoped unit of work
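
Reads follow the same session-scoped pattern. As an illustration (this helper is not part of the source), recent action logs could be fetched with SQLModel's select():

from sqlmodel import Session, select

def get_recent_action_logs(engine, limit: int = 50):
    """Return the most recent action log entries (illustrative helper)."""
    with Session(engine) as session:
        statement = (
            select(ActionLog)
            .order_by(ActionLog.timestamp.desc())  # ISO-8601 strings sort chronologically
            .limit(limit)
        )
        return session.exec(statement).all()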

Data Service Implementation

The DataService class provides a service layer interface for database operations, abstracting the underlying ORM details from the rest of the application.

Key methods:

  • fetch_and_save_articles(): Fetches articles from configured RSS feeds and saves new ones to the database
  • detect_and_save_events(): Analyzes articles to detect significant events and saves them
  • save_action_log(): Records system actions
  • save_mood_log(): Records the system's emotional state
  • save_situation_log(): Records generated situations
  • save_decision_log(): Records decisions made by the system
  • save_experiment_log(): Records experiments and their results

The service layer ensures consistent data handling and provides a single point of control for database interactions.
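
For example, save_mood_log() can be expected to mirror the save_action_log() pattern shown earlier; the following is a sketch based on the MoodLog fields, not the actual implementation:

def save_mood_log(self, mood_vector: dict):
    """Saves the current mood vector as a MoodLog row (sketch)."""
    with Session(self.engine) as session:
        mood_log = MoodLog(
            timestamp=datetime.utcnow().isoformat(),
            mood_vector=json.dumps(mood_vector)
        )
        session.add(mood_log)
        session.commit()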

Indexing Strategy for Performance-Critical Queries

The system employs a multi-layered approach to optimize performance-critical queries, particularly for memory retrieval operations.

ChromaDB for Semantic Memory Retrieval

For performance-critical memory retrieval operations, the system uses ChromaDB, an embedding database optimized for semantic search. This is implemented in the episodic_memory module.

Key features:

  • Uses sentence-transformers with the 'all-MiniLM-L6-v2' model to generate 384-dimensional embeddings
  • Stores embeddings in a persistent ChromaDB collection with automatic embedding generation
  • Supports similarity-based queries for retrieving relevant memories

chroma_collection = chroma_client.get_or_create_collection(
    name=CHROMA_COLLECTION,
    embedding_function=sentence_transformer_ef
)
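
Memories are written to the same collection before they can be retrieved. A hedged sketch using ChromaDB's standard add() call (the document text, metadata keys, and ID scheme are illustrative):

import uuid

chroma_collection.add(
    documents=["The user prefers concise summaries in the morning."],
    metadatas=[{"type": "episodic", "created_at": "2024-01-01T08:00:00"}],
    ids=[str(uuid.uuid4())]  # one unique ID per memory; the embedding is generated automatically
)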

Query process:

  1. A query text is converted to an embedding vector
  2. The system finds the nearest neighbors in the embedding space
  3. Results are returned based on cosine similarity
  4. A similarity threshold filters out irrelevant results

The get_relevant_memories_api function implements this retrieval pattern (shown here in simplified form):

def get_relevant_memories_api(request: QueryRequest):
    results = chroma_collection.query(
        query_texts=[request.query_text],
        n_results=request.top_n
    )

    relevant = []
    # Convert each distance to a similarity score and keep only results above the threshold
    for doc, dist in zip(results["documents"][0], results["distances"][0]):
        similarity = 1 - dist
        if similarity >= request.similarity_threshold:
            relevant.append({"text": doc, "similarity": similarity})
    return relevant

VLTM Database Indexing

The VLTM system implements comprehensive indexing strategies to optimize query performance for very long-term memory operations.

Index types:

  • B-tree indexes: On frequently queried scalar fields like timestamps, memory types, and confidence scores
  • GIN indexes: On JSONB fields for efficient querying of structured data
  • Text search indexes: Using tsvector for full-text search capabilities
  • Composite indexes: On frequently queried field combinations

Key indexes:

  • idx_vltm_memory_type: On memory_type for filtering by memory category
  • idx_vltm_created_at: On created_at for time-based queries
  • idx_vltm_importance_score: On importance_score for prioritizing important memories
  • idx_patterns_type: On pattern_type for filtering by pattern category
  • idx_strategic_domain: On knowledge_domain for domain-specific queries
  • idx_vltm_metadata_gin: GIN index on the metadata_info JSON field for efficient querying of structured data
  • idx_patterns_description_text: Text search index on pattern descriptions
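
As an illustration, a few of the B-tree indexes above could be declared against the ORM models with SQLAlchemy's Index construct (a sketch under the assumption that the VLTM models are SQLModel classes; the GIN and text-search indexes are PostgreSQL-specific and are typically created in migration scripts):

from sqlalchemy import Index

# B-tree indexes on frequently filtered columns, registered on the mapped tables
Index("idx_vltm_memory_type", VeryLongTermMemory.memory_type)
Index("idx_vltm_created_at", VeryLongTermMemory.created_at)
Index("idx_vltm_importance_score", VeryLongTermMemory.importance_score)
Index("idx_patterns_type", MemoryPattern.pattern_type)
Index("idx_strategic_domain", StrategicKnowledge.knowledge_domain)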

Data Lifecycle Policies and Retention

The system implements automated data lifecycle management through background tasks that handle retention, consolidation, and archival.

Knowledge Compression

The system runs a periodic knowledge compression task that summarizes recent interactions and creates higher-level abstractions.

Configuration:

  • Frequency: Every 24 hours (86,400 seconds)
  • Triggered by: knowledge_compression_task() in the main system loop

Process:

  1. Retrieves recent summaries from the database
  2. Uses an LLM to compress and synthesize the information
  3. Creates new Summary records with category "compression"
  4. Uses content hashing to prevent duplication

async def knowledge_compression_task(self):
    """Background task to compress knowledge every 24 hours."""
    while not self._shutdown.is_set():
        try:
            summary = await asyncio.to_thread(self.knowledge_service.compress_and_save_knowledge)
            logger.info(f"Compressed and saved knowledge summary.")
        except Exception as e:
            logger.error(f"Error in knowledge compression: {e}")

        try:
            await asyncio.sleep(86400)
        except asyncio.CancelledError:
            break

Memory Consolidation

The system performs periodic memory consolidation to optimize storage and retrieval efficiency.

Configuration:

  • Frequency: Every 6 hours (21,600 seconds)
  • Triggered by: memory_consolidation_task() in the main system loop

Process:

  1. Selects a batch of memories for consolidation
  2. Uses an LLM to analyze, merge, deduplicate, and generalize memories
  3. Creates consolidated memories with type 'long-term-consolidated'
  4. Deletes the original, redundant memories

The consolidation process follows these rules:

  • Merge related memories into comprehensive statements
  • Remove duplicates, keeping the most detailed version
  • Generalize specific facts into broader categories
  • Never lose critical information
  • Never merge unrelated facts

async def memory_consolidation_task(self):
    """Periodically consolidates memories to optimize retrieval and relevance."""
    while not self._shutdown.is_set():
        try:
            logger.info("Starting memory consolidation...")
            consolidation_result = await self.memory_service.consolidate_memories()
            logger.info(f"Memory consolidation finished. Report: {consolidation_result}")
        except Exception as e:
            logger.error(f"Error during memory consolidation: {e}", exc_info=True)

        try:
            await asyncio.sleep(21600)
        except asyncio.CancelledError:
            break

VLTM Retention Policies

The Very Long-Term Memory system implements configurable retention policies based on memory type and importance.

Policy configuration:

  • Strategic knowledge: Permanent retention, compression after 90 days
  • Critical failures: Permanent retention, compression after 30 days
  • Successful improvements: 2-year retention, compression after 180 days
  • Code patterns: 1-year retention, compression after 90 days

Retention rules:

  • Memories are automatically compressed based on age and type
  • High-importance memories are retained permanently
  • Low-importance memories may be archived or deleted based on policy
  • Compression reduces storage footprint while preserving essential information
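
These policies can be summarized as a configuration mapping; the key names below are illustrative and not taken from the source:

# Hypothetical representation of the retention policies described above.
# retention_days=None means permanent retention.
VLTM_RETENTION_POLICIES = {
    "strategic_knowledge":    {"retention_days": None, "compress_after_days": 90},
    "critical_failure":       {"retention_days": None, "compress_after_days": 30},
    "successful_improvement": {"retention_days": 730,  "compress_after_days": 180},
    "code_pattern":           {"retention_days": 365,  "compress_after_days": 90},
}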

Data Security, Encryption, and Backup Procedures

The system's approach to data security, encryption, and backup is primarily defined by its configuration and deployment environment.

Database Configuration

The database URL is configured through environment variables, allowing for flexible deployment configurations.

import os

class Config:
    DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///ravana_agi.db")

Default configuration:

  • SQLite database file named ravana_agi.db
  • Stored in the working directory
  • No built-in encryption at rest

Security implications:

  • The default SQLite configuration provides no encryption for data at rest
  • Production deployments should use environment variables to specify a database URL with appropriate security features
  • Supported databases include PostgreSQL, MySQL, and others that can provide encryption at rest
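
For example, a deployment could select a server-based database with encryption in transit by supplying DATABASE_URL through the environment before start-up; the connection string below is purely illustrative:

import os

# Illustrative only: a TLS-enforcing PostgreSQL URL supplied via the environment,
# which Config.DATABASE_URL picks up instead of the SQLite default.
os.environ.setdefault(
    "DATABASE_URL",
    "postgresql+psycopg2://ravana:<password>@db.internal:5432/ravana_agi?sslmode=require",
)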

Data Protection Measures

The system implements several data protection measures:

Deduplication with hashing:

  • Uses SHA-256 hashing to identify duplicate content
  • Hashes are stored in the content_hash field of the Summary table
  • Prevents redundant storage of identical information

content_hash = hashlib.sha256(content.encode('utf-8')).hexdigest()
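
Before a new summary is saved, the hash can be compared against existing records; a sketch of that check (the query shape is assumed, not taken from the source):

import hashlib
from datetime import datetime
from sqlmodel import Session, select

def save_summary_if_new(engine, summary_text: str) -> bool:
    """Persist a summary only if an identical one is not already stored (sketch)."""
    content_hash = hashlib.sha256(summary_text.encode('utf-8')).hexdigest()
    with Session(engine) as session:
        existing = session.exec(
            select(Summary).where(Summary.content_hash == content_hash)
        ).first()
        if existing is not None:
            return False  # duplicate content; skip the write
        session.add(Summary(
            timestamp=datetime.utcnow().isoformat(),
            summary_text=summary_text,
            category="compression",
            content_hash=content_hash,
        ))
        session.commit()
        return True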

No explicit backup procedures:

  • The codebase does not contain explicit backup procedures
  • Backup strategy would depend on the underlying database system and deployment environment
  • For SQLite, backup would involve copying the database file
  • For server-based databases, standard database backup procedures would apply

Migration Strategies for Schema Evolution

The system's architecture suggests several approaches for handling schema evolution, though explicit migration tools are not implemented in the current codebase.

Current State

The system uses SQLModel's create_all() method to create tables, which only creates missing tables but does not handle schema changes to existing tables.

def create_db_and_tables():
    SQLModel.metadata.create_all(engine)

This approach has limitations:

  • Does not modify existing columns
  • Does not add constraints to existing tables
  • Does not handle data migration between schema versions
  • Risk of data loss or corruption during schema changes

Recommended Migration Strategy

For production use, the following migration strategy is recommended:

1. Use Alembic for database migrations:

  • Alembic is the standard migration tool for SQLAlchemy (which SQLModel is built upon)
  • Provides versioned migration scripts
  • Supports both automated and manual migration generation
  • Handles forward and backward migrations

2. Migration process:

  • Create a new migration script for each schema change
  • Include data migration logic when necessary
  • Test migrations on a copy of production data
  • Apply migrations in a controlled manner

3. Example migration workflow:

# Generate migration
alembic revision --autogenerate -m "Add mood intensity field"

# Review and edit migration script
# Apply migration
alembic upgrade head
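
The generated script is then reviewed before being applied; a sketch of what the migration for the example above might contain (revision identifiers and the column are placeholders):

# versions/xxxx_add_mood_intensity_field.py (sketch)
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4"
down_revision = None  # the previous revision ID goes here

def upgrade():
    # Additive, backward-compatible change: a new nullable column on moodlog
    op.add_column("moodlog", sa.Column("intensity", sa.Float(), nullable=True))

def downgrade():
    op.drop_column("moodlog", "intensity")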

4. Zero-downtime considerations:

  • Design schema changes to be backward compatible
  • Use additive changes when possible (adding columns, not removing)
  • Implement feature flags for new functionality
  • Deploy schema changes separately from application changes when possible

5. Data migration best practices:

  • Always backup data before migrations
  • Test migrations on representative data sets
  • Monitor application behavior after migrations
  • Have a rollback plan for each migration

While the current codebase does not implement a formal migration system, adopting Alembic or a similar tool would provide a robust solution for schema evolution.

Conclusion

The RAVANA AGI system employs a comprehensive database schema designed to support an autonomous artificial intelligence that can process information, make decisions, and evolve over time. The schema captures various aspects of the system's operation, including external inputs (articles), internal states (mood logs), cognitive processes (situations, decisions), and learning mechanisms (summaries, experiments).

Key strengths of the current implementation include:

  • Clean separation of concerns through well-defined entities
  • Effective use of JSON fields for flexible data storage
  • Integration of semantic search through ChromaDB for performance-critical memory retrieval
  • Automated data lifecycle management through periodic compression and consolidation
  • Enhanced Very Long-Term Memory system with many-to-many relationships between consolidations, patterns, and strategic knowledge

Areas for improvement include:

  • Implementing a formal database migration system like Alembic
  • Enhancing data security with encryption at rest
  • Establishing explicit backup procedures
  • Potentially adding database indexes on frequently queried fields

The system's architecture demonstrates a sophisticated approach to AGI data management, balancing structured storage with flexible, semantic retrieval mechanisms to support intelligent behavior.
