- replace class GraphNode with model-specific types (e.g. ThreadEventNode,
  ProcessLeaveNode, and OffloadEventNode) to avoid additional allocations
- handle data transfers between host and device as device tasks