doc/AGDB/1_AGDB_Project_Outline.md · SlappAI/Singularity at main

1. Node Schema in JSON File

Defining the Schema: The JSON schema would indeed serve as the blueprint for each node type, mapping attributes and features for the dataset. This schema could be similar to a CSV header row, where each column represents an attribute or feature. So, for time series data, you'd have columns for Unix time, feature1, feature2, etc.
Embedding the Schema: The schema definition could be embedded at the top of the JSON file, allowing the AGDB to recognize the structure without relying on external definitions. This would allow each node to parse its data dynamically based on the schema without hardcoding specific attributes.
Populating Nodes from CSV: With a structured CSV, the AGDB could read each row, map it according to the schema, and create a node. The node's JSON representation would have a key for each attribute, making it easy to look up specific data points by attribute.
Defining Feature Weights: Each feature or attribute in the schema could include metadata like weights, importance, or even data types (e.g., numerical, categorical). This would enable AGDB to know how to interpret and prioritize different features without needing additional processing steps.

2. Storage Structure: NetworkX vs. JSON File

NetworkX vs. JSON: Storing the entire graph in JSON makes it portable and easy to manipulate programmatically. However, loading the JSON data into NetworkX (or a similar library) might be beneficial for actual computations or complex graph operations.
Option 1 - JSON as Primary Storage: Store the data in JSON, and use NetworkX as a secondary tool for visualization and complex pathfinding. This keeps your data lightweight and portable while enabling more powerful graph analysis when needed.
Option 2 - Native NetworkX Storage: Store the data directly in NetworkX’s native structure, using Python pickling or another serialization format. This might increase complexity but could be optimized for fast querying.
Hybrid Approach: JSON for static storage, with dynamic loading into NetworkX as required. This would give you the benefits of both systems without being locked into one.

3. Query Language Development

High-Level Query Structure: Your high-level query structure should start with a straightforward approach that breaks down queries into essential components. For example:
- get-node [type] -path [time domain]/[timestamp] - Returns a node at a specific timestamp in a specific domain.
- get-node [type] -range [start timestamp] to [end timestamp] - Allows for querying a range of nodes based on time.
- get-attribute [node] -attribute [attribute_name] - Pulls a specific attribute from a node.
Dynamic Time Navigation: For example, if you wanted a node at 10:45 but the closest checkpoint is 10:40, you could define a traversal rule in the query language to navigate to 10:40 and add +5 to reach 10:45. This could be expressed as:
- navigate-time [start_time] +[interval] or navigate-time [start_time] -[interval]
Attribute-Based Filtering: Your query language should support filtering nodes based on attributes. For instance:
- filter-nodes [type] where [attribute] = [value] - Filters nodes where a specific attribute meets a condition, enabling complex attribute-based querying.
Feature Weighting in Queries: The query language could include parameters for feature weights, allowing users to emphasize specific attributes or relationships in the query results. This could look like:
- get-node [type] -path [time] -weights [attribute1:weight1, attribute2:weight2]
Combining Queries: Building in support for combining queries (e.g., finding nodes that match both time and attribute filters) will make your query language far more powerful. Combining queries would be key for real-world applications, enabling more nuanced data exploration.

4. Data and Query Examples

Example 1: Simple Time-Based Query
- Query: get-node ts -path BTC/2024/11/04/10/45
- Process: The system checks for the nearest checkpoint at 10:40, navigates forward by +5 minutes, and returns the data for 10:45.
- Outcome: You get the node data for BTC at the specified time.
Example 2: Attribute Query with Filtering
- Query: filter-nodes ts where volume > 1000000
- Process: The system scans nodes in the time series to find those with a volume attribute over one million.
- Outcome: Returns all time nodes where volume exceeds the threshold, potentially for further analysis.
Example 3: Weighted Attribute Query
- Query: get-node ts -path BTC/2024/11/04/10/45 -weights {price: 0.8, volume: 0.2}
- Process: The query prioritizes the “price” attribute by 80% and “volume” by 20%, returning the node data with adjusted attribute significance.
- Outcome: A context-rich node that highlights prioritized attributes in the results.

5. AGDB Project Outline

Define Schema and Standards: Create a standard JSON schema for node attributes and feature weighting, as well as rules for predefined relationships.
Establish Query Language Syntax: Finalize the syntax and structure of your query language, detailing basic commands, syntax for filtering, weighting, and time navigation.
Develop Query Parsing and Execution: Build a parser that interprets query strings and executes them against the JSON or NetworkX data. This would include handling time navigation, filtering, and weighting.
Documentation and Example Use Cases: Document each query type with clear examples and descriptions. Include example use cases, like time series analysis for financial data, patient data in healthcare, or any domain-specific application.
Test Cases and Validation: Develop test cases to ensure that the queries return expected results and that performance is optimized for various datasets.