NVidia Meeting 2015-3-4

From XVis

Revision as of 17:00, 4 March 2015

Agenda

  • Weds, March 4th, 1-5pm: VTK-m design review (Ken Moreland)
  • Thurs, March 5th, 8am-noon: updates from NVIDIA

Design review

Issues raised in design review

How to handle multiple devices with one host?

We discussed the VTK-m strategy for supporting multiple devices from one host (e.g., a single node of Summit). Options presented were:

  • one MPI task for each device (i.e., multiple MPI tasks per node)
    • minus: may be lots of MPI tasks
    • minus: may be incongruent with sim code's usage of MPI
    • minus: hard boundaries between devices
    • plus: easy to implement
  • one MPI task per node, with (for example) threading to manage access to multiple devices
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • minus: hard boundaries between devices (??)
    • plus/minus: implementation easier? (depends on details)
  • one MPI task per node, devices are treated as one giant device
    • plus: fewer MPI tasks
    • minus: could lend itself to inefficient patterns (reaching across device memories)
  • one MPI task per node, devices are aware of other devices and can coordinate with each other
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • plus: no boundaries between devices
    • minus: large implementation effort (?)
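
To make the first two options concrete, here is a minimal Python sketch. It is not VTK-m code and was not presented at the meeting. All function names are hypothetical, the node and device counts in the usage note are arbitrary, ranks are assumed to be packed node by node, and plain host threads stand in for per-device work queues.

```python
from concurrent.futures import ThreadPoolExecutor


def mpi_task_count(num_nodes, devices_per_node, one_task_per_device):
    """Number of MPI tasks implied by a layout (hypothetical helper)."""
    if one_task_per_device:
        return num_nodes * devices_per_node  # option 1: one task per device
    return num_nodes                         # options 2-4: one task per node


def device_for_rank(rank, devices_per_node):
    """Option 1: each rank drives exactly one device.

    Assumes ranks are packed node by node, so the position of a rank
    within its node selects the device.
    """
    return rank % devices_per_node


def run_on_all_devices(blocks, devices_per_node, work):
    """Option 2: one task per node, one host thread per device.

    Blocks are dealt round-robin to devices. Each thread only touches
    its own blocks, which is the "hard boundary" noted in the list above.
    """
    per_device = [blocks[d::devices_per_node] for d in range(devices_per_node)]
    with ThreadPoolExecutor(max_workers=devices_per_node) as pool:
        # One entry per device, each holding that device's results in order.
        return list(pool.map(lambda part: [work(b) for b in part], per_device))
```

For example, with 4 nodes of 6 devices each, `mpi_task_count(4, 6, True)` gives 24 tasks while `mpi_task_count(4, 6, False)` gives 4, which is the "lots of MPI tasks" trade-off. The third and fourth options are not sketched because they depend on how devices share or exchange memory, which these notes leave open.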

What use case are we optimizing for?