Database Schema
Update Summary
Changes Made
- Added new entity definitions for VeryLongTermMemory, MemoryPattern, MemoryConsolidation, StrategicKnowledge, and related junction tables
- Updated relationships and schema diagram to include new VLTM entities and many-to-many relationships
- Added new section on Very Long-Term Memory System architecture
- Updated indexing strategy to include new VLTM indexes
- Enhanced data lifecycle policies to include VLTM retention policies
- Added new section on strategic knowledge management
Table of Contents
- Introduction
- Entity Definitions
- Very Long-Term Memory System
- Relationships and Schema Diagram
- Data Access Patterns and ORM Usage
- Indexing Strategy for Performance-Critical Queries
- Data Lifecycle Policies and Retention
- Data Security, Encryption, and Backup Procedures
- Migration Strategies for Schema Evolution
- Conclusion
Introduction
This document provides comprehensive documentation for the database schema used in the RAVANA AGI system. The schema is designed to support an autonomous artificial general intelligence system that processes information, makes decisions, reflects on its behavior, and evolves over time. The database stores various types of operational, emotional, and cognitive data that enable the system to maintain context, learn from experience, and generate intelligent responses. This documentation details all entities, their relationships, access patterns, performance considerations, and lifecycle management policies.
Entity Definitions
The database schema consists of several key entities that capture different aspects of the AGI's operation and cognition. Each entity is defined using SQLModel, which provides a Pythonic interface for database interactions.
Article
Stores information about news articles and other content sources that the system monitors.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the article
- title: str - Title of the article
- link: str - URL of the article
- published: str - Publication timestamp in ISO format
- source: str - Source of the article (e.g., CNN, BBC)
- fetched_at: str - Timestamp when the article was fetched by the system
Constraints:
- Primary key on id
- Uniqueness is enforced at the application level based on the link field to prevent duplicate articles
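As a concrete illustration, the Article entity maps to a SQLModel class along the following lines. This is a reconstruction from the fields listed above, not a verbatim quote; the authoritative definition lives in database/models.py.

from sqlmodel import SQLModel, Field

class Article(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    title: str
    link: str          # duplicates are filtered at the application level
    published: str     # ISO-format publication timestamp
    source: str        # e.g., CNN, BBC
    fetched_at: str    # ISO-format fetch timestamp

The remaining log entities follow the same declaration pattern, differing only in their field lists.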
Event
Represents significant events detected by the system through analysis of articles and other inputs.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the event
- timestamp: str - Timestamp of the event in ISO format
- description: str - Natural language description of the event
- keywords: str - Comma-separated keywords associated with the event
- cluster_id: int - Identifier for the cluster to which this event belongs
Constraints:
- Primary key on id
Summary
Stores compressed knowledge and summaries generated by the system, serving as a mechanism for knowledge retention and deduplication.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the summary
- timestamp: str - Creation timestamp in ISO format
- summary_text: str - The actual summary content
- source: str | None = Field(default="unknown") - Source of the summary (e.g., system, user)
- category: str | None = Field(default="misc") - Category of the summary (e.g., compression, reflection)
- content_hash: str | None = Field(default=None) - SHA-256 hash of the summary text for deduplication
Constraints:
- Primary key on id
- The content_hash field enables deduplication by allowing the system to check if a similar summary already exists
ActionLog
Records all actions performed by the AGI system, providing an audit trail of its operations.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the log entry
- timestamp: str - Timestamp of the action in ISO format
- action_name: str - Name of the action performed
- params: str - JSON string containing the parameters passed to the action
- status: str - Status of the action ('success' or 'failure')
- result: str - JSON string containing the result of the action
Constraints:
- Primary key on id
MoodLog
Captures the emotional state of the AGI system at specific points in time.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the mood log entry
- timestamp: str - Timestamp of the mood recording in ISO format
- mood_vector: str - JSON string representing the mood vector (a multidimensional representation of emotional state)
Constraints:
- Primary key on id
SituationLog
Records situations generated by the system for decision-making and reflection purposes.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the situation log
- timestamp: str - Timestamp of the situation creation in ISO format
- situation_type: str - Type of situation (e.g., curiosity, reflection)
- prompt: str - The prompt that generated the situation
- context: str - JSON string containing additional context for the situation
Constraints:
- Primary key on id
DecisionLog
Stores decisions made by the AGI system in response to specific situations.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the decision log
- timestamp: str - Timestamp of the decision in ISO format
- situation_id: int | None = Field(default=None, foreign_key="situationlog.id") - Foreign key referencing the situation that prompted the decision
- raw_response: str - The raw response from the decision-making process
Constraints:
- Primary key on id
- Foreign key relationship with SituationLog on the situation_id field
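Because DecisionLog is the only core entity with a database-level foreign key, a reconstructed sketch of its SQLModel declaration is shown below; database/models.py remains the source of truth.

from sqlmodel import SQLModel, Field

class DecisionLog(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    timestamp: str
    # Links each decision back to the SituationLog row that prompted it
    situation_id: int | None = Field(default=None, foreign_key="situationlog.id")
    raw_response: str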
ExperimentLog
Records experiments conducted by the AGI system to test hypotheses or explore new ideas.
Fields:
- id: int | None = Field(default=None, primary_key=True) - Unique identifier for the experiment log
- timestamp: str - Timestamp of the experiment in ISO format
- hypothesis: str - The hypothesis being tested
- results: str - JSON string containing the results of the experiment
Constraints:
- Primary key on id
Very Long-Term Memory System
The Very Long-Term Memory (VLTM) system has been enhanced with new entities and relationships to support advanced knowledge management and strategic reasoning.
VeryLongTermMemory
Core very long-term memory record that stores compressed cognitive content.
Fields:
- memory_id: str - Primary key, UUID identifier for the memory
- memory_type: MemoryType - Enum indicating the type of memory (strategic_knowledge, architectural_insight, etc.)
- created_at: datetime - Timestamp when the memory was created
- last_accessed: datetime - Timestamp of last access
- promoted_at: datetime - Timestamp when memory was promoted to long-term storage
- access_count: int - Number of times the memory has been accessed
- importance_score: float - Calculated importance score (0.0-1.0)
- strategic_value: float - Strategic value for long-term planning (0.0-1.0)
- compressed_content: str - JSON string of compressed memory data
- metadata_info: str - JSON string of metadata
- source_session: str - Identifier of the session that created the memory
- related_memories: str - JSON array of related memory IDs
- retention_category: str - Retention policy category
Constraints:
- Primary key on memory_id
- Importance score and strategic value constrained between 0.0 and 1.0
- JSON fields validated for proper format
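A hedged sketch of how this entity could be declared with SQLModel is shown below. The enum members, defaults, and table mapping are assumptions inferred from the descriptions above; core/vltm_data_models.py and database/vltm_schema.sql hold the authoritative definitions.

from datetime import datetime
from enum import Enum

from sqlmodel import SQLModel, Field

class MemoryType(str, Enum):
    # Member names assumed from the description above; the real enum defines more values
    STRATEGIC_KNOWLEDGE = "strategic_knowledge"
    ARCHITECTURAL_INSIGHT = "architectural_insight"

class VeryLongTermMemory(SQLModel, table=True):
    memory_id: str = Field(primary_key=True)           # UUID string
    memory_type: MemoryType
    created_at: datetime
    last_accessed: datetime
    promoted_at: datetime
    access_count: int = 0                               # default assumed
    importance_score: float = Field(ge=0.0, le=1.0)     # constrained to 0.0-1.0
    strategic_value: float = Field(ge=0.0, le=1.0)      # constrained to 0.0-1.0
    compressed_content: str                             # JSON string
    metadata_info: str                                  # JSON string
    source_session: str
    related_memories: str                               # JSON array of memory IDs
    retention_category: str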
MemoryPattern
Patterns extracted from memories through analysis of memory content and relationships.
Fields:
- pattern_id: str - Primary key, UUID identifier for the pattern
- pattern_type: PatternType - Enum indicating the type of pattern (temporal, causal, behavioral, etc.)
- pattern_description: str - Natural language description of the pattern
- confidence_score: float - Confidence in pattern validity (0.0-1.0)
- pattern_data: str - JSON string of pattern-specific data
- discovered_at: datetime - Timestamp when the pattern was discovered
- supporting_memories: str - JSON array of memory IDs that support this pattern
- validation_count: int - Number of times the pattern has been validated
- last_validated: Optional[datetime] - Timestamp of last validation
- source_memory_id: Optional[str] - Foreign key to the source memory
Constraints:
- Primary key on pattern_id
- Confidence score constrained between 0.0 and 1.0
- JSON fields validated for proper format
MemoryConsolidation
Records of memory consolidation processes that optimize storage and extract higher-level knowledge.
Fields:
- consolidation_id: str - Primary key, UUID identifier for the consolidation
- consolidation_date: datetime - Timestamp of the consolidation process
- consolidation_type: ConsolidationType - Enum indicating the type (daily, weekly, monthly, etc.)
- memories_processed: int - Number of memories processed in this consolidation
- patterns_extracted: int - Number of patterns extracted
- compression_ratio: float - Ratio of original size to compressed size
- consolidation_results: str - JSON string of consolidation results
- processing_time_seconds: float - Time taken to process the consolidation
- success: bool - Whether the consolidation was successful
- error_message: Optional[str] - Error message if consolidation failed
Constraints:
- Primary key on consolidation_id
StrategicKnowledge
High-level strategic knowledge derived from patterns and used for long-term planning.
Fields:
- knowledge_id: str - Primary key, UUID identifier for the knowledge
- knowledge_domain: str - Domain of the knowledge (e.g., architecture, performance, learning)
- knowledge_summary: str - Natural language summary of the knowledge
- confidence_level: float - Confidence in the knowledge (0.0-1.0)
- last_updated: datetime - Timestamp of last update
- source_patterns: str - JSON array of pattern IDs that contributed to this knowledge
- knowledge_structure: str - JSON string of structured knowledge representation
- validation_score: float - Score based on validation attempts (0.0-1.0)
- application_count: int - Number of times this knowledge has been applied
Constraints:
- Primary key on knowledge_id
- Confidence level and validation score constrained between 0.0 and 1.0
ConsolidationPattern
Junction table linking memory consolidations and patterns, enabling many-to-many relationships.
Fields:
- consolidation_id: str - Foreign key to memory_consolidations table, part of primary key
- pattern_id: str - Foreign key to memory_patterns table, part of primary key
- extraction_confidence: float - Confidence in the pattern extraction during consolidation
Constraints:
- Composite primary key on consolidation_id and pattern_id
- Foreign key relationships to memory_consolidations and memory_patterns tables
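In SQLModel, the composite primary key can be expressed by marking both foreign-key columns as primary keys, roughly as in the sketch below (column references are inferred from the descriptions above; the actual definition lives in core/vltm_data_models.py).

from sqlmodel import SQLModel, Field

class ConsolidationPattern(SQLModel, table=True):
    # Both columns form the composite primary key and reference their parent tables
    consolidation_id: str = Field(foreign_key="memory_consolidations.consolidation_id", primary_key=True)
    pattern_id: str = Field(foreign_key="memory_patterns.pattern_id", primary_key=True)
    extraction_confidence: float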
PatternStrategicKnowledge
Junction table linking memory patterns and strategic knowledge, enabling many-to-many relationships.
Fields:
- pattern_id: str - Foreign key to memory_patterns table, part of primary key
- knowledge_id: str - Foreign key to strategic_knowledge table, part of primary key
- contribution_weight: float - Weight of the pattern's contribution to the strategic knowledge
Constraints:
- Composite primary key on pattern_id and knowledge_id
- Foreign key relationships to memory_patterns and strategic_knowledge tables
ConsolidationMetrics
Performance metrics for consolidation processes.
Fields:
- metric_id: str - Primary key, UUID identifier for the metric
- consolidation_id: str - Foreign key to memory_consolidations table
- metric_name: str - Name of the metric
- metric_value: float - Value of the metric
- metric_unit: str - Unit of measurement
- timestamp: datetime - Timestamp when the metric was recorded
Constraints:
- Primary key on metric_id
- Foreign key relationship to memory_consolidations table
Relationships and Schema Diagram
The database schema entities are related through foreign key relationships and conceptual associations that reflect the AGI system's cognitive processes. The key relationships are:
- DecisionLog.situation_id references SituationLog.id, linking each decision to the situation that prompted it
- ConsolidationPattern links memory consolidations to the patterns extracted from them (many-to-many)
- PatternStrategicKnowledge links memory patterns to the strategic knowledge they contribute to (many-to-many)
- ConsolidationMetrics.consolidation_id references the memory_consolidations table, attaching performance metrics to each consolidation run
- Articles, Events, and Summaries are associated at the application level (events are detected from articles, and summaries compress prior content) rather than through database-level foreign keys
Data Access Patterns and ORM Usage
The system uses SQLModel as an ORM (Object-Relational Mapper) to interact with the database, providing a clean, Pythonic interface for data operations.
ORM Configuration
The database engine is configured in engine.py using SQLAlchemy's create_engine function (re-exported by SQLModel) with the database URL obtained from the configuration system.
from sqlmodel import create_engine, SQLModel
from core.config import Config

engine = create_engine(Config.DATABASE_URL, echo=True)

def create_db_and_tables():
    SQLModel.metadata.create_all(engine)
The create_db_and_tables() function initializes the database schema by creating all tables defined in the SQLModel classes.
Transaction Boundaries
Data access follows a consistent pattern using context managers to ensure proper transaction boundaries and resource cleanup. The DataService class in data_service.py implements methods for saving various types of data with proper transaction management.
# Module-level imports used by this method in services/data_service.py
from datetime import datetime
import json

from sqlmodel import Session

from database.models import ActionLog

def save_action_log(self, action_name: str, params: dict, status: str, result: any):
    """Saves a record of an executed action to the database."""
    with Session(self.engine) as session:
        action_log = ActionLog(
            timestamp=datetime.utcnow().isoformat(),
            action_name=action_name,
            params=json.dumps(params),
            status=status,
            result=json.dumps(result)
        )
        session.add(action_log)
        session.commit()
Key characteristics of the data access pattern:
- Each operation uses a context manager (with Session(...) as session) to ensure the session is properly closed
- Transactions are committed explicitly with session.commit()
- JSON serialization is used for complex data structures stored in text fields
- Operations are atomic and follow the principle of least privilege
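The read path follows the same session pattern. The sketch below is illustrative rather than a quote of the service code; it fetches the most recent action logs with SQLModel's select:

from sqlmodel import Session, select

from database.models import ActionLog

def get_recent_action_logs(engine, limit: int = 20) -> list[ActionLog]:
    # ISO-8601 timestamps sort lexicographically, so ordering by the string column works
    with Session(engine) as session:
        statement = (
            select(ActionLog)
            .order_by(ActionLog.timestamp.desc())
            .limit(limit)
        )
        return list(session.exec(statement))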
Data Service Implementation
The DataService class provides a service layer interface for database operations, abstracting the underlying ORM details from the rest of the application.
Key methods:
- fetch_and_save_articles(): Fetches articles from configured RSS feeds and saves new ones to the database
- detect_and_save_events(): Analyzes articles to detect significant events and saves them
- save_action_log(): Records system actions
- save_mood_log(): Records the system's emotional state
- save_situation_log(): Records generated situations
- save_decision_log(): Records decisions made by the system
- save_experiment_log(): Records experiments and their results
The service layer ensures consistent data handling and provides a single point of control for database interactions.
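A typical call into the service layer might look like the following; the constructor signature and parameter values are illustrative assumptions:

from database.engine import engine
from services.data_service import DataService

data_service = DataService(engine)  # constructor signature assumed
data_service.save_action_log(
    action_name="fetch_articles",
    params={"feed_url": "https://example.com/rss"},   # illustrative parameters
    status="success",
    result={"new_articles": 3},
)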
Indexing Strategy for Performance-Critical Queries
The system employs a multi-layered approach to optimize performance-critical queries, particularly for memory retrieval operations.
ChromaDB for Semantic Memory Retrieval
For performance-critical memory retrieval operations, the system uses ChromaDB, an embedding database optimized for semantic search. This is implemented in the episodic_memory module.
Key features:
- Uses sentence-transformers with the 'all-MiniLM-L6-v2' model to generate 384-dimensional embeddings
- Stores embeddings in a persistent ChromaDB collection with automatic embedding generation
- Supports similarity-based queries for retrieving relevant memories
chroma_collection = chroma_client.get_or_create_collection(
    name=CHROMA_COLLECTION,
    embedding_function=sentence_transformer_ef
)
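The client and embedding function referenced above are presumably constructed along these lines; the persistence path, collection name, and variable names are assumptions, and modules/episodic_memory/memory.py remains the source of truth.

import chromadb
from chromadb.utils import embedding_functions

CHROMA_COLLECTION = "episodic_memory"  # assumed collection name

# 384-dimensional embeddings from the all-MiniLM-L6-v2 sentence-transformers model
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
chroma_client = chromadb.PersistentClient(path="chroma_db")  # assumed on-disk location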
Query process:
- A query text is converted to an embedding vector
- The system finds the nearest neighbors in the embedding space
- Results are returned based on cosine similarity
- A similarity threshold filters out irrelevant results
The get_relevant_memories_api function implements this retrieval pattern (shown here in simplified form):
def get_relevant_memories_api(request: QueryRequest):
    results = chroma_collection.query(
        query_texts=[request.query_text],
        n_results=request.top_n
    )
    relevant_memories = []
    # Convert each distance to a similarity and filter by the requested threshold
    for doc, dist in zip(results["documents"][0], results["distances"][0]):
        similarity = 1 - dist
        if similarity >= request.similarity_threshold:
            relevant_memories.append({"text": doc, "similarity": similarity})
    return relevant_memories
VLTM Database Indexing
The VLTM system implements comprehensive indexing strategies to optimize query performance for very long-term memory operations.
Index types:
- B-tree indexes: On frequently queried scalar fields like timestamps, memory types, and confidence scores
- GIN indexes: On JSONB fields for efficient querying of structured data
- Text search indexes: Using tsvector for full-text search capabilities
- Composite indexes: On frequently queried field combinations
Key indexes:
- idx_vltm_memory_type: On memory_type for filtering by memory category
- idx_vltm_created_at: On created_at for time-based queries
- idx_vltm_importance_score: On importance_score for prioritizing important memories
- idx_patterns_type: On pattern_type for filtering by pattern category
- idx_strategic_domain: On knowledge_domain for domain-specific queries
- idx_vltm_metadata_gin: GIN index on metadata for efficient JSON querying
- idx_patterns_description_text: Text search index on pattern descriptions
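For illustration, two of the indexes above could be created with DDL along the following lines. The table and column names are assumptions (PostgreSQL syntax); database/vltm_schema.sql holds the authoritative statements.

from sqlalchemy import text

# Table name "very_long_term_memories" is assumed; adjust to match database/vltm_schema.sql
with engine.begin() as conn:
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_vltm_memory_type "
        "ON very_long_term_memories (memory_type)"
    ))
    # GIN index over the JSON metadata; assumes the column is stored as (or cast to) jsonb
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_vltm_metadata_gin "
        "ON very_long_term_memories USING GIN ((metadata_info::jsonb))"
    ))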
Data Lifecycle Policies and Retention
The system implements automated data lifecycle management through background tasks that handle retention, consolidation, and archival.
Knowledge Compression
The system runs a periodic knowledge compression task that summarizes recent interactions and creates higher-level abstractions.
Configuration:
- Frequency: Every 24 hours (86,400 seconds)
- Triggered by: knowledge_compression_task() in the main system loop
Process:
- Retrieves recent summaries from the database
- Uses an LLM to compress and synthesize the information
- Creates new Summary records with category "compression"
- Uses content hashing to prevent duplication
async def knowledge_compression_task(self):
    """Background task to compress knowledge every 24 hours."""
    while not self._shutdown.is_set():
        try:
            summary = await asyncio.to_thread(self.knowledge_service.compress_and_save_knowledge)
            logger.info(f"Compressed and saved knowledge summary.")
        except Exception as e:
            logger.error(f"Error in knowledge compression: {e}")
        try:
            await asyncio.sleep(86400)
        except asyncio.CancelledError:
            break
Memory Consolidation
The system performs periodic memory consolidation to optimize storage and retrieval efficiency.
Configuration:
- Frequency: Every 6 hours (21,600 seconds)
- Triggered by: memory_consolidation_task() in the main system loop
Process:
- Selects a batch of memories for consolidation
- Uses an LLM to analyze, merge, deduplicate, and generalize memories
- Creates consolidated memories with type 'long-term-consolidated'
- Deletes the original, redundant memories
The consolidation process follows these rules:
- Merge related memories into comprehensive statements
- Remove duplicates, keeping the most detailed version
- Generalize specific facts into broader categories
- Never lose critical information
- Never merge unrelated facts
async def memory_consolidation_task(self):
    """Periodically consolidates memories to optimize retrieval and relevance."""
    while not self._shutdown.is_set():
        try:
            logger.info("Starting memory consolidation...")
            consolidation_result = await self.memory_service.consolidate_memories()
            logger.info(f"Memory consolidation finished. Report: {consolidation_result}")
        except Exception as e:
            logger.error(f"Error during memory consolidation: {e}", exc_info=True)
        try:
            await asyncio.sleep(21600)
        except asyncio.CancelledError:
            break
VLTM Retention Policies
The Very Long-Term Memory system implements configurable retention policies based on memory type and importance.
Policy configuration:
- Strategic knowledge: Permanent retention, compression after 90 days
- Critical failures: Permanent retention, compression after 30 days
- Successful improvements: 2-year retention, compression after 180 days
- Code patterns: 1-year retention, compression after 90 days
Retention rules:
- Memories are automatically compressed based on age and type
- High-importance memories are retained permanently
- Low-importance memories may be archived or deleted based on policy
- Compression reduces storage footprint while preserving essential information
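One way the retention policy table above could be represented in configuration is sketched below; the structure and key names are assumptions rather than the project's actual format.

# Hypothetical retention configuration mirroring the policies listed above;
# retention_days=None means the memory is retained permanently
RETENTION_POLICIES = {
    "strategic_knowledge":    {"retention_days": None, "compress_after_days": 90},
    "critical_failure":       {"retention_days": None, "compress_after_days": 30},
    "successful_improvement": {"retention_days": 730,  "compress_after_days": 180},
    "code_pattern":           {"retention_days": 365,  "compress_after_days": 90},
}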
Data Security, Encryption, and Backup Procedures
The system's approach to data security, encryption, and backup is primarily defined by its configuration and deployment environment.
Database Configuration
The database URL is configured through environment variables, allowing for flexible deployment configurations.
import os

class Config:
    DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///ravana_agi.db")
Default configuration:
- SQLite database file named ravana_agi.db
- Stored in the working directory
- No built-in encryption at rest
Security implications:
- The default SQLite configuration provides no encryption for data at rest
- Production deployments should use environment variables to specify a database URL with appropriate security features
- Supported databases include PostgreSQL, MySQL, and others that can provide encryption at rest
Data Protection Measures
The system implements several data protection measures:
Deduplication with hashing:
- Uses SHA-256 hashing to identify duplicate content
- Hashes are stored in the content_hash field of the Summary table
- Prevents redundant storage of identical information
import hashlib

content_hash = hashlib.sha256(content.encode('utf-8')).hexdigest()
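A hedged sketch of the corresponding duplicate check is shown below; the actual logic lives in services/knowledge_service.py and may differ in detail.

from sqlmodel import Session, select

from database.models import Summary

def summary_exists(engine, content_hash: str) -> bool:
    # True if a summary with the same SHA-256 content hash is already stored
    with Session(engine) as session:
        match = session.exec(
            select(Summary).where(Summary.content_hash == content_hash)
        ).first()
        return match is not None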
No explicit backup procedures:
- The codebase does not contain explicit backup procedures
- Backup strategy would depend on the underlying database system and deployment environment
- For SQLite, backup would involve copying the database file
- For server-based databases, standard database backup procedures would apply
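For the default SQLite deployment, a minimal backup can use the standard library's online backup API, as in this sketch (file paths illustrative):

import sqlite3

def backup_sqlite(src_path: str = "ravana_agi.db", dst_path: str = "ravana_agi_backup.db") -> None:
    # Produces a consistent copy even if the database is being written to
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()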
Migration Strategies for Schema Evolution
The system's architecture suggests several approaches for handling schema evolution, though explicit migration tools are not implemented in the current codebase.
Current State
The system uses SQLModel's create_all() method to create tables, which only creates missing tables and does not handle schema changes to existing tables.
def create_db_and_tables():
    SQLModel.metadata.create_all(engine)
This approach has limitations:
- Does not modify existing columns
- Does not add constraints to existing tables
- Does not handle data migration between schema versions
- Risk of data loss or corruption during schema changes
Recommended Migration Strategy
For production use, the following migration strategy is recommended:
1. Use Alembic for database migrations:
- Alembic is the standard migration tool for SQLAlchemy (which SQLModel is built upon)
- Provides versioned migration scripts
- Supports both automated and manual migration generation
- Handles forward and backward migrations
2. Migration process:
- Create a new migration script for each schema change
- Include data migration logic when necessary
- Test migrations on a copy of production data
- Apply migrations in a controlled manner
3. Example migration workflow:
# Generate migration
alembic revision --autogenerate -m "Add mood intensity field"
# Review and edit migration script
# Apply migration
alembic upgrade head
4. Zero-downtime considerations:
- Design schema changes to be backward compatible
- Use additive changes when possible (adding columns, not removing)
- Implement feature flags for new functionality
- Deploy schema changes separately from application changes when possible
5. Data migration best practices:
- Always backup data before migrations
- Test migrations on representative data sets
- Monitor application behavior after migrations
- Have a rollback plan for each migration
While the current codebase does not implement a formal migration system, adopting Alembic or a similar tool would provide a robust solution for schema evolution.
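If Alembic were adopted, the only SQLModel-specific wiring would be pointing Alembic's env.py at the shared metadata, roughly as in the sketch below (assumes an alembic/ directory created by alembic init and that database/ is importable as a package).

# alembic/env.py (excerpt)
from sqlmodel import SQLModel

from database import models  # noqa: F401 -- importing registers all tables on SQLModel.metadata

target_metadata = SQLModel.metadata  # lets --autogenerate diff the models against the database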
Conclusion
The RAVANA AGI system employs a comprehensive database schema designed to support an autonomous artificial intelligence that can process information, make decisions, and evolve over time. The schema captures various aspects of the system's operation, including external inputs (articles), internal states (mood logs), cognitive processes (situations, decisions), and learning mechanisms (summaries, experiments).
Key strengths of the current implementation include:
- Clean separation of concerns through well-defined entities
- Effective use of JSON fields for flexible data storage
- Integration of semantic search through ChromaDB for performance-critical memory retrieval
- Automated data lifecycle management through periodic compression and consolidation
- Enhanced Very Long-Term Memory system with many-to-many relationships between consolidations, patterns, and strategic knowledge
Areas for improvement include:
- Implementing a formal database migration system like Alembic
- Enhancing data security with encryption at rest
- Establishing explicit backup procedures
- Potentially adding database indexes on frequently queried fields
The system's architecture demonstrates a sophisticated approach to AGI data management, balancing structured storage with flexible, semantic retrieval mechanisms to support intelligent behavior.
Referenced Files in This Document
- database/models.py
- database/engine.py
- services/data_service.py
- services/knowledge_service.py
- core/config.py
- modules/episodic_memory/memory.py
- services/memory_service.py
- modules/knowledge_compression/compressed_memory.py
- core/vltm_data_models.py - Updated in recent commit
- database/vltm_schema.sql - Updated in recent commit