Flow vs Task Architecture

Conceptual Hierarchy

┌─────────────────────────────────────────────────────────┐
│                      FLOW                               │
│  (User-facing concept: Complete workflow)               │
│                                                         │
│  Example: "Breast Cancer Detection Flow"                │
│                                                         │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌──────┐  │
│  │  Step 1 │───▶│  Step 2 │───▶│  Step 3 │───▶│Step 4│  │
│  │  Data   │    │Training │    │  Model  │    │Deploy│  │
│  └─────────┘    └─────────┘    └─────────┘    └──────┘  │
│       │              │              │             │     │
└───────┼──────────────┼──────────────┼─────────────┼─────┘
        │              │              │             │
        ▼              ▼              ▼             ▼
    ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐
    │ Task 1 │    │ Task 2 │    │ Task 3 │    │ Task 4 │
    │Node A  │    │Node A  │    │Center  │    │Center  │
    └────────┘    └────────┘    └────────┘    └────────┘
    ┌────────┐    ┌────────┐                            
    │ Task 1 │    │ Task 2 │     (Backend execution units)
    │Node B  │    │Node B  │                            
    └────────┘    └────────┘

Key Differences

Flow

What users create: “I want to train a breast cancer detection model”
Visible in UI: Progress bars, step visualization, results
Business-goal oriented: Complete end-to-end workflow
Contains multiple steps: Data → Process → Output

Task

What the system executes: Computational units running on nodes
Backend concept: Not directly visible to users
Technical execution: Actual work being done
Atomic units: Single responsibility (train, preprocess, aggregate)
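
The Flow/Task split above can be pictured as a small data model. The following is only a sketch under stated assumptions: the class and field names (`Flow`, `Step`, `Task`, `tasks`) are hypothetical and not taken from the codebase. It illustrates that users supply the Flow and its steps, while Tasks are attached by the system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Backend execution unit: one responsibility, pinned to one node."""
    type: str  # e.g. "image-normalization"
    node: str  # e.g. "node-a"


@dataclass
class Step:
    """User-visible stage of a Flow; the system attaches Tasks to it."""
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Flow:
    """User-facing workflow: what the user creates and monitors."""
    name: str
    steps: List[Step] = field(default_factory=list)


# The user describes the Flow and its steps only.
flow = Flow("Breast Cancer Detection Flow", [Step("Data"), Step("Training")])

# The system, not the user, populates the tasks of each step.
flow.steps[0].tasks.append(Task("image-normalization", "node-a"))
flow.steps[0].tasks.append(Task("image-normalization", "node-b"))
```

Keeping `tasks` empty by default mirrors the idea that a step can exist before any of its tasks have been created.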

Example Breakdown

User Creates: “Lung Cancer Detection Flow”

Flow:
  name: "Lung Cancer Detection Flow"
  type: "training"
  config:
    datasets: ["hospital-a-lung", "hospital-b-lung"]  # One dataset per participating site
    algorithm: "FedAvg"                                # Federated averaging
    epochs: 100                                        # Appears as rounds 1-100 in the training tasks
  steps:
    - name: "Data Preprocessing"
      type: "process"
      
    - name: "Federated Training"
      type: "process"
        
    - name: "Model Validation"
      type: "process"
      
    - name: "Model Packaging"
      type: "output"

System Executes: Tasks

# Generated from Flow Step 1: Data Preprocessing
Task_1A:
  type: "image-normalization"
  node: "hospital-a-edge"
  input: "hospital-a-lung"
  params: {target_size: [512, 512], normalize: true}
  
Task_1B:
  type: "image-normalization"
  node: "hospital-b-edge"
  input: "hospital-b-lung"
  params: {target_size: [512, 512], normalize: true}

Task_1C:
  type: "feature-extraction"
  node: "hospital-a-edge"
  input: Task_1A.output
  params: {model: "resnet50-imagenet"}

Task_1D:
  type: "feature-extraction"
  node: "hospital-b-edge"
  input: Task_1B.output
  params: {model: "resnet50-imagenet"}

# Generated from Flow Step 2: Federated Training
Task_2A:
  type: "local-model-training"
  node: "hospital-a-edge"
  input: Task_1C.output
  algorithm: "FedAvg"
  rounds: 1-100
  
Task_2B:
  type: "local-model-training"
  node: "hospital-b-edge"
  input: Task_1D.output
  algorithm: "FedAvg"
  rounds: 1-100
  
Task_2C:
  type: "model-aggregation"
  node: "central-server"
  algorithm: "FedAvg"
  inputs: [Task_2A.model_updates, Task_2B.model_updates]
  round: current

# And so on...
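
One way the first step could be expanded into the four preprocessing tasks listed above is sketched here. This is not the actual backend logic: the function name `decompose_preprocessing`, the `DATASET_NODES` lookup, and the task-naming scheme are assumptions made for illustration, chosen so that the output matches the `Task_1A` to `Task_1D` example.

```python
from typing import Dict, List

# Hypothetical lookup: which edge node hosts which dataset.
DATASET_NODES: Dict[str, str] = {
    "hospital-a-lung": "hospital-a-edge",
    "hospital-b-lung": "hospital-b-edge",
}

SUFFIXES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def decompose_preprocessing(datasets: List[str]) -> Dict[str, dict]:
    """Expand the "Data Preprocessing" step into two tasks per dataset."""
    n = len(datasets)
    if 2 * n > len(SUFFIXES):
        raise ValueError("this sketch only names up to 26 tasks per step")
    tasks: Dict[str, dict] = {}
    for i, dataset in enumerate(datasets):
        node = DATASET_NODES[dataset]
        norm_id = f"Task_1{SUFFIXES[i]}"      # normalization tasks come first
        feat_id = f"Task_1{SUFFIXES[n + i]}"  # then feature-extraction tasks
        tasks[norm_id] = {
            "type": "image-normalization",
            "node": node,
            "input": dataset,
            "params": {"target_size": [512, 512], "normalize": True},
        }
        tasks[feat_id] = {
            "type": "feature-extraction",
            "node": node,  # stays on the same node as its input
            "input": f"{norm_id}.output",
            "params": {"model": "resnet50-imagenet"},
        }
    return tasks


tasks = decompose_preprocessing(["hospital-a-lung", "hospital-b-lung"])
for task_id in sorted(tasks):
    print(task_id, tasks[task_id]["type"], tasks[task_id]["node"])
```

Each feature-extraction task is placed on the same node as the normalization task it reads from, so raw data never leaves the hospital that owns it.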

UI/UX Implications

What Users See (as entry points):

Create new Flow (primary entry point)
Monitor Flow progress
View Flow results
Use Flow templates

What Users CAN See (when monitoring):

Task execution status within a Flow
Task-level logs and errors
Resource usage per task
Task retry information

What Users DON’T See (as entry points):

“Create new Task” (Tasks are created by the system)
Task management pages
Direct task configuration

Backend Handles:

Flow → Task decomposition
Task scheduling and distribution
Task execution and monitoring
Result aggregation back to Flow
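
Task scheduling can be thought of as ordering tasks by their input dependencies and running independent tasks in parallel. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group the example tasks into waves. The `deps` mapping and the `execution_waves` helper are hypothetical, and treating aggregation as a single task that follows both training tasks is a simplification: in the example, aggregation actually runs once per round.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks whose output it consumes.
deps: Dict[str, Set[str]] = {
    "Task_1A": set(),
    "Task_1B": set(),
    "Task_1C": {"Task_1A"},
    "Task_1D": {"Task_1B"},
    "Task_2A": {"Task_1C"},
    "Task_2B": {"Task_1D"},
    "Task_2C": {"Task_2A", "Task_2B"},  # simplified: ignores per-round looping
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks within a wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished to unlock the next
    return waves


waves = execution_waves(deps)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A real scheduler would dispatch each ready task to its node and call `done` as individual tasks finish, rather than waiting for a whole wave.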

Example: Flow Monitoring UI

When users monitor a running Flow, they can see task details:

Breast Cancer Detection Flow - Running
│
├─ Step: Data Preprocessing [Completed]
│  └─ Tasks:
│      ├─ Image normalization (Node A) ✓ - 5,000 WSI processed
│      ├─ Image normalization (Node B) ✓ - 3,200 WSI processed
│      ├─ Feature extraction (Node A) ✓ - 128GB features
│      └─ Feature extraction (Node B) ✓ - 82GB features
│
├─ Step: Federated Training [Active - 67%]
│  └─ Tasks:
│      ├─ Local model training (Node A) - Round 67/100, Loss: 0.342
│      ├─ Local model training (Node B) - Round 67/100, Loss: 0.358
│      └─ Model aggregation (Central) - Aggregating round 66
│
├─ Step: Model Validation [Pending]
│  └─ Tasks: (will be created when training completes)
│
└─ Step: Model Packaging [Pending]

This view presents tasks as execution details nested under each step, not as primary concepts.

Benefits of This Architecture

Simplicity for Users: Users think in terms of complete workflows, not technical tasks
Flexibility for the System: The backend can optimize task execution without changing the user experience
Scalability: Tasks can be parallelized across nodes transparently
Maintainability: Clear separation between business logic (Flows) and execution logic (Tasks)

Implementation Note

In the codebase:

UI components work with Flow objects
Backend API decomposes each Flow into Task objects
Task scheduler manages execution
Task results bubble up to Flow status
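
How task results might bubble up to a Flow status is sketched below. The status strings for steps and flows follow the monitoring example ("Completed", "Active", "Pending", "Running"); the lowercase task states, the "Failed" state, and the function names `step_status` and `flow_status` are assumptions made for illustration.

```python
from typing import Dict, List


def step_status(task_states: List[str]) -> str:
    """Derive a step's status from the states of its tasks."""
    if not task_states:
        return "Pending"  # tasks have not been created yet
    if "failed" in task_states:
        return "Failed"
    if all(state == "completed" for state in task_states):
        return "Completed"
    if all(state == "pending" for state in task_states):
        return "Pending"
    return "Active"


def flow_status(steps: Dict[str, List[str]]) -> str:
    """Derive the Flow's status from the statuses of its steps."""
    statuses = [step_status(states) for states in steps.values()]
    if "Failed" in statuses:
        return "Failed"
    if statuses and all(status == "Completed" for status in statuses):
        return "Completed"
    if all(status == "Pending" for status in statuses):
        return "Pending"
    return "Running"


# Task states mirroring the monitoring example earlier in this document.
steps = {
    "Data Preprocessing": ["completed"] * 4,
    "Federated Training": ["running", "running", "running"],
    "Model Validation": [],
    "Model Packaging": [],
}
print({name: step_status(states) for name, states in steps.items()})
print(flow_status(steps))
```

Because the UI only reads the derived statuses, the backend is free to change how many tasks a step produces without affecting what users see.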