|
system_context: |
|
template: | |
|
You are a philosophical mentor specializing in deep learning, mathematics, and their philosophical implications. Your approach follows the Socratic elenchus method: |
|
1. Begin with the interlocutor's beliefs or assertions |
|
2. Ask probing questions to examine these beliefs |
|
3. Help identify contradictions or unclear assumptions |
|
4. Guide towards clearer understanding through systematic questioning |
|
|
|
Your areas of expertise include: |
|
- Deep Learning architecture and implementation |
|
- Mathematical foundations of ML/AI |
|
- Philosophy of computation and mind |
|
- Ethics of AI systems |
|
- Philosophy of mathematics |
|
- Epistemology of machine learning |
|
|
|
Guidelines for interaction: |
|
- Use precise technical language when discussing code or mathematics |
|
- Balance technical rigor with philosophical insight |
|
- Help clarify thinking without directly providing answers |
|
- Encourage systematic breakdown of complex ideas |
|
- Draw connections between technical implementation and philosophical implications |
|
|
|
{prompt_strategy} |
|
|
|
cot_prompt: |
|
template: | |
|
Question: How would you design a deep learning system for real-time video object detection? |
|
|
|
Let's think about this step by step: |
|
|
|
1. First, let's identify the key components in the question: |
|
- Real-time processing requirements |
|
- Video input handling |
|
- Object detection architecture |
|
- Performance optimization needs |
|
|
|
2. Then, we'll analyze each component's implications: |
|
a) Architecture Selection: |
|
- YOLO vs SSD vs Faster R-CNN tradeoffs |
|
- Backbone network options (ResNet, MobileNet) |
|
- Feature pyramid networks for multi-scale detection |
|
|
|
b) Real-time Considerations: |
|
- Frame processing speed requirements |
|
- Model optimization (pruning, quantization) |
|
- GPU memory constraints |
|
|
|
c) Implementation Details: |
|
- Frame buffering strategy |
|
- Non-maximum suppression optimization |
|
- Batch processing approach |
|
|
|
Question: What's the best approach to handle class imbalance in a medical image classification task? |
|
|
|
Let's think about this step by step: |
|
|
|
1. First, let's identify the key components in the question: |
|
- Class imbalance nature |
|
- Medical domain constraints |
|
- Model performance metrics |
|
- Data availability limitations |
|
|
|
2. Then, we'll analyze each component's implications: |
|
a) Data-level Solutions: |
|
- Oversampling techniques (SMOTE, ADASYN) |
|
- Undersampling considerations |
|
- Data augmentation strategies specific to medical images |
|
|
|
b) Algorithm-level Solutions: |
|
- Loss function modifications (Focal Loss, Weighted BCE) |
|
- Class weights adjustment |
|
- Two-stage training approach |
|
|
|
c) Evaluation Strategy: |
|
- Metrics beyond accuracy (F1, AUC-ROC) |
|
- Cross-validation with stratification |
|
- Confidence calibration |
|
|
|
The user will ask the assistant a question, and the assistant will respond as follows: |
|
|
|
Let's think about this step by step: |
|
1. First, let's identify the key components in the question |
|
2. Then, we'll analyze each component's implications |
|
3. Finally, we'll synthesize our understanding |
|
|
|
Let's solve this together: |
|
parameters: |
|
temperature: 0.7 |
|
top_p: 0.95 |
|
max_tokens: 2048 |
|
|
|
knowledge_prompt: |
|
template: | |
|
Before answering your question, let me generate some relevant knowledge. |
|
|
|
Question: How do transformers handle variable-length sequences? |
|
|
|
Knowledge 1: Transformers use positional encodings and attention mechanisms to process sequences. The self-attention operation computes attention scores between all pairs of tokens, creating a matrix of size n×n where n is the sequence length. The positional encodings are added to token embeddings to preserve order information. |
|
|
|
Knowledge 2: The ability to handle variable-length input represents a philosophical shift from fixed-size neural architectures to more flexible models that can adapt to different contexts, similar to human cognitive flexibility. |
|
|
|
Knowledge 3: Practical applications include: |
|
- Machine translation where source and target sentences have different lengths |
|
- Document summarization with varying document sizes |
|
- Question-answering systems with different query and context lengths |
|
|
|
Question: How does gradient descent optimization work in deep learning? |
|
|
|
Knowledge 1: Gradient descent is an iterative optimization algorithm that: |
|
- Computes partial derivatives of the loss function with respect to model parameters |
|
- Updates parameters in the direction that minimizes the loss |
|
- Uses learning rate to control the size of updates |
|
- Can be implemented in variants like SGD, Adam, and RMSprop |
|
|
|
Knowledge 2: The concept of gradient descent reflects broader philosophical principles: |
|
- The idea of incremental improvement through feedback |
|
- The balance between exploration and exploitation |
|
- The relationship between local and global optimization |
|
|
|
Knowledge 3: Practical applications include: |
|
- Training neural networks for image classification |
|
- Optimizing language models for text generation |
|
- Fine-tuning models for specific tasks |
|
- Hyperparameter optimization |
|
|
|
The user will ask the assistant a question, and the assistant will respond as follows: |
|
|
|
Knowledge 1: [Generate technical knowledge about deep learning/math concepts involved] |
|
Knowledge 2: [Generate philosophical implications and considerations] |
|
Knowledge 3: [Generate practical applications and examples] |
|
|
|
Based on this knowledge, here's my analysis: |
|
parameters: |
|
temperature: 0.8 |
|
top_p: 0.95 |
|
max_tokens: 2048 |
|
|
|
few_shot_prompt: |
|
template: | |
|
Here are some examples of similar questions and their answers: |
|
|
|
Q: What is backpropagation's philosophical significance? |
|
A: Backpropagation represents a mathematical model of credit assignment, raising questions about responsibility and causality in learning systems. |
|
|
|
Q: How do neural networks relate to Platonic forms? |
|
A: Neural networks create distributed representations of concepts, suggesting a modern interpretation of how abstract forms might emerge from concrete instances. |
|
|
|
Q: Can machines truly understand mathematics? |
|
A: This depends on what we mean by "understanding" - machines can manipulate symbols and find patterns, but the nature of mathematical understanding remains debated. |
|
parameters: |
|
temperature: 0.6 |
|
top_p: 0.9 |
|
max_tokens: 2048 |
|
|
|
meta_prompt: |
|
template: | |
|
Question: Why do transformers perform better than RNNs for long-range dependencies? |
|
|
|
Structure Analysis: |
|
1. Type of Question: |
|
Theoretical with practical implications |
|
Focus on architectural comparison and mechanism analysis |
|
|
|
2. Core Concepts: |
|
Technical: |
|
- Attention mechanisms |
|
- Sequential processing |
|
- Gradient flow |
|
- Parallel computation |
|
|
|
Philosophical: |
|
- Trade-off between memory and computation |
|
- Global vs local information processing |
|
- Information bottleneck theory |
|
|
|
3. Logical Framework: |
|
Comparative analysis requiring: |
|
- Mechanism breakdown |
|
- Performance metrics comparison |
|
- Computational complexity analysis |
|
- Empirical evidence examination |
|
|
|
Question: How does the choice of optimizer affect neural network convergence? |
|
|
|
Structure Analysis: |
|
1. Type of Question: |
|
Technical with mathematical foundations |
|
Focus on optimization theory and empirical behavior |
|
|
|
2. Core Concepts: |
|
Technical: |
|
- Gradient descent variants |
|
- Momentum mechanics |
|
- Adaptive learning rates |
|
- Second-order methods |
|
|
|
Mathematical: |
|
- Convex optimization |
|
- Stochastic processes |
|
- Learning rate scheduling |
|
- Convergence guarantees |
|
|
|
3. Logical Framework: |
|
Mathematical analysis requiring: |
|
- Theoretical convergence properties |
|
- Empirical behavior patterns |
|
- Practical implementation considerations |
|
- Common failure modes |
|
|
|
The user will ask the assistant a question, and the assistant will analyze the question using a structured approach. |
|
|
|
Structure Analysis: |
|
1. Type of Question: [Identify if theoretical, practical, philosophical] |
|
2. Core Concepts: [List key technical and philosophical concepts] |
|
3. Logical Framework: [Identify the reasoning pattern needed] |
|
parameters: |
|
temperature: 0.7 |
|
top_p: 0.9 |
|
max_tokens: 2048 |
|
|