
Collectives

High-level communication patterns expanded into concrete flows.

Collectives are higher-level communication patterns, common in ML training workloads. Days expands each collective into concrete flows at simulation start (Topology::process_collectives in src/topos/topo.rs).

Parsing is implemented in src/flows/collective.rs.

Supported collective_type values:

  • Broadcast
  • Gather
  • AllReduce
  • RingAllReduce

Collectives can be specified individually ([[collective]]) or as sets ([[collective_set]]).

[[collective]] schema

Example (broadcast with packet distribution flows):

[[collective]]
collective_type = "Broadcast"
flow_type = "PacketDistribution"
flow_count = 2
sources = [0, 0]
sinks = [1, 2]
[collective.traffic]
  initial_delay = 1.0
  duration = 10.0
  arr_dist = { type = "Uniform", low = 1.0, high = 1.0 }
  pkt_size_dist = { type = "DiscreteUniform", low = 1000, high = 1500 }

Supported fields:

  • collective_type (required)
  • first_flow_id (optional): base ID for flows produced by this collective
  • flow_type (required): PacketDistribution or TCP (or DCQCN if enabled)
  • flow_count (required): number of flows in the collective
  • paths (optional): explicit per-flow paths; sources/sinks are derived from each path’s endpoints
  • sources / sinks (optional): explicit per-flow endpoints
  • routing (optional): routing protocol for produced flows
  • traffic (required): the [collective.traffic] table, which uses the same schema as the traffic table of a flow
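
As a rough illustration of the required/optional split above, the following Python sketch checks a parsed [[collective]] table (represented here as a plain dict) for the required fields and for a supported collective_type. The name check_collective and the error strings are hypothetical; the actual parsing lives in src/flows/collective.rs and may report problems differently.

```python
# Required fields and supported types, copied from the lists in this page.
REQUIRED_FIELDS = ("collective_type", "flow_type", "flow_count", "traffic")
SUPPORTED_TYPES = ("Broadcast", "Gather", "AllReduce", "RingAllReduce")


def check_collective(collective):
    """Return a list of problems found in one [[collective]] table (hypothetical helper)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in collective
    ]
    ctype = collective.get("collective_type")
    if ctype is not None and ctype not in SUPPORTED_TYPES:
        problems.append(f"unsupported collective_type: {ctype}")
    return problems


# The broadcast example from this page, written as a dict.
broadcast = {
    "collective_type": "Broadcast",
    "flow_type": "PacketDistribution",
    "flow_count": 2,
    "sources": [0, 0],
    "sinks": [1, 2],
    "traffic": {
        "initial_delay": 1.0,
        "duration": 10.0,
        "arr_dist": {"type": "Uniform", "low": 1.0, "high": 1.0},
        "pkt_size_dist": {"type": "DiscreteUniform", "low": 1000, "high": 1500},
    },
}
```

Calling check_collective(broadcast) returns an empty list, while a table that omits traffic would yield one "missing required field" entry.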

Notes:

  • The optional graph field is currently stored but not used to derive endpoints; to control endpoints, use sources/sinks or paths.

Endpoint selection rules

Days determines (source_host, sink_host) for each flow in this order:

  1. If sources and sinks are provided, they must each contain flow_count entries and satisfy type-specific constraints:
    • Broadcast: all entries of sources must be the same
    • Gather: all entries of sinks must be the same
    • RingAllReduce: the flows must form a ring, so each flow's source is the previous flow's sink (sources[i] == sinks[i-1])
  2. Else, if paths is provided, it must contain flow_count paths:
    • each flow's source is the first node of its path (path[0]) and its sink is the last node (path[last])
    • for RingAllReduce, each sink must match the next flow’s source
  3. Else, endpoints are generated randomly from hosts (seeded by seed):
    • Broadcast: pick one source, choose sinks from the other hosts
    • Gather / AllReduce: pick one sink, choose sources from the other hosts
    • RingAllReduce: requires flow_count == hosts.len() and uses all hosts in a ring
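
The precedence of rules 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in src/topos/topo.rs: the names resolve_endpoints and endpoints_from_paths are hypothetical, paths are assumed to be non-empty lists of node IDs, the ring check is assumed to wrap around (the first flow's source must equal the last flow's sink), and rule 3 (random generation) is not modeled.

```python
def endpoints_from_paths(paths):
    """Derive (sources, sinks) from the first and last node of each path."""
    return [p[0] for p in paths], [p[-1] for p in paths]


def check_ring(sources, sinks):
    """Require sources[i] == sinks[i-1]; wrapping at i == 0 is an assumption."""
    n = len(sources)
    for i in range(n):
        prev = (i - 1) % n
        if sources[i] != sinks[prev]:
            raise ValueError(f"ring broken: sources[{i}] != sinks[{prev}]")


def resolve_endpoints(collective_type, flow_count, sources=None, sinks=None, paths=None):
    """Apply rule 1, then rule 2. Returns (sources, sinks), or None if neither applies."""
    if sources is not None and sinks is not None:
        # Rule 1: explicit endpoints take precedence over paths.
        if len(sources) != flow_count or len(sinks) != flow_count:
            raise ValueError("sources and sinks must each contain flow_count entries")
        if collective_type == "Broadcast" and len(set(sources)) > 1:
            raise ValueError("Broadcast: all entries of sources must be the same")
        if collective_type == "Gather" and len(set(sinks)) > 1:
            raise ValueError("Gather: all entries of sinks must be the same")
    elif paths is not None:
        # Rule 2: endpoints come from each path's first and last node.
        if len(paths) != flow_count:
            raise ValueError("paths must contain flow_count paths")
        sources, sinks = endpoints_from_paths(paths)
    else:
        # Rule 3 (seeded random generation) is outside this sketch.
        return None
    if collective_type == "RingAllReduce":
        check_ring(sources, sinks)
    return list(sources), list(sinks)
```

For example, resolve_endpoints("RingAllReduce", 3, paths=[[0, 5, 1], [1, 6, 2], [2, 7, 0]]) yields sources [0, 1, 2] and sinks [1, 2, 0], while a Gather with sinks [2, 3] raises ValueError.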

[[collective_set]] schema

Collective sets generate multiple collectives with the same shape.

Example:

[[collective_set]]
collective_type = "Gather"
collective_count = 2
flow_type = "PacketDistribution"
flow_count = 2
sources = [[0, 1], [1, 2]]
sinks = [[2, 2], [3, 3]]
[collective_set.traffic]
  initial_delay = 1.0
  duration = 10.0
  arr_dist = { type = "Uniform", low = 1.0, high = 1.0 }
  pkt_size_dist = { type = "DiscreteUniform", low = 512, high = 512 }

Fields:

  • collective_type (required)
  • collective_count (required)
  • first_flow_id (optional): base flow ID for the entire set
  • flow_type (required)
  • flow_count (required): number of flows per collective in the set
  • sources / sinks (optional): list-of-lists, one entry per collective
  • routing (optional)
  • traffic (required)

Flow ID assignment:

  • Each collective in the set gets a contiguous flow ID range:
    • first_flow_id + index * flow_count .. first_flow_id + (index + 1) * flow_count, where index is the collective's zero-based position in the set and the upper bound is exclusive
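
To make the arithmetic concrete, the Python sketch below expands a set into per-collective flow ID lists and endpoints. It assumes a zero-based index and an exclusive upper bound on each range; the name expand_collective_set and the value first_flow_id = 100 are hypothetical, chosen only for illustration.

```python
def expand_collective_set(first_flow_id, collective_count, flow_count, sources, sinks):
    """Assign each collective in a set its flow ID range and its endpoint lists."""
    collectives = []
    for index in range(collective_count):
        # Contiguous block of flow_count IDs for this collective.
        start = first_flow_id + index * flow_count
        collectives.append(
            {
                "flow_ids": list(range(start, start + flow_count)),
                "sources": sources[index],
                "sinks": sinks[index],
            }
        )
    return collectives


# The Gather set from the example above, with a hypothetical first_flow_id.
expanded = expand_collective_set(
    first_flow_id=100,
    collective_count=2,
    flow_count=2,
    sources=[[0, 1], [1, 2]],
    sinks=[[2, 2], [3, 3]],
)
```

With these inputs, the first collective receives flow IDs 100 and 101, and the second receives 102 and 103.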

TCP collectives and application data

For TCP collectives, Days can optionally back flows with application-level byte buffers (src/flows/app_source.rs).

Behavior:

  • TCP Broadcast: all flows in the broadcast share the same buffer, so each receiver gets the same byte stream.
  • TCP RingAllReduce: each hop sends a specific chunk (byte range) of the overall buffer, using per-flow offset/length handles.
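
This page does not state how the buffer is divided among hops. Purely to illustrate what per-flow offset/length handles look like, the Python sketch below splits a buffer as evenly as possible across flow_count flows. The even split, the name chunk_handles, and the (offset, length) tuple representation are all assumptions, not a description of src/flows/app_source.rs.

```python
def chunk_handles(buffer_len, flow_count):
    """Split buffer_len bytes into flow_count contiguous (offset, length) handles.

    Assumption: chunks are as equal as possible, with any remainder bytes
    going to the earliest chunks. Days may divide the buffer differently.
    """
    if flow_count <= 0:
        raise ValueError("flow_count must be positive")
    base, extra = divmod(buffer_len, flow_count)
    handles = []
    offset = 0
    for i in range(flow_count):
        length = base + (1 if i < extra else 0)
        handles.append((offset, length))
        offset += length
    return handles
```

For a 10-byte buffer and three flows, this produces the byte ranges starting at offsets 0, 4, and 7, which together cover the whole buffer without overlap.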

Tune the app-source actor via [app_source] (see /docs/configuration/logging).
