The sub-cycles that ensure every particle finishes passing between domains within a time-step should be implemented with blocking receives. Blocking receives inherently synchronize all domains, so the sub-cycles need neither barriers nor busy-waiting loops and avoid the unnecessary overhead of both.
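As a rough illustration of the idea, the sketch below runs one time-step over a 1D periodic chain of domains. It is not the actual implementation: Python threads stand in for domains, `queue.Queue.get()` stands in for a blocking receive, and sends are assumed to be buffered. The name `run_time_step`, the `(x, remaining)` particle layout, and the rule that every domain sends exactly one (possibly empty) message to every other domain per sub-cycle are all assumptions made for the example. Each message also carries the sender's count of particles passed on, so all domains see the same total and leave the loop in the same sub-cycle.

```python
import queue
import threading


def run_time_step(initial, n_domains):
    """Advance all particles through one time-step (illustrative sketch).

    initial[d] lists (x, remaining) pairs for domain d, which covers
    [d, d + 1). Particles move in +x with periodic wrap-around.
    Assumes n_domains >= 2.
    """
    # One FIFO channel per ordered (source, destination) pair, so a receive
    # names its source, much like a point-to-point blocking receive.
    chan = {(s, d): queue.Queue()
            for s in range(n_domains) for d in range(n_domains) if s != d}
    finished = [[] for _ in range(n_domains)]  # final positions per domain
    subcycles = [0] * n_domains                # sub-cycles run per domain

    def domain(rank):
        others = [d for d in range(n_domains) if d != rank]
        right = (rank + 1) % n_domains
        active = list(initial[rank])
        while True:
            subcycles[rank] += 1
            outgoing = []
            for x, remaining in active:
                to_edge = (rank + 1) - x
                if remaining < to_edge:
                    # Particle stops inside this domain.
                    finished[rank].append(x + remaining)
                else:
                    # Particle crosses into the right-hand neighbour.
                    outgoing.append((float(right), remaining - to_edge))
            # Send exactly one message to every other domain, empty or not,
            # tagged with how many particles this domain passed on.
            for d in others:
                batch = outgoing if d == right else []
                chan[(rank, d)].put((batch, len(outgoing)))
            # Blocking receives: wait here until each peer's message for
            # this sub-cycle arrives. No barrier, no polling loop.
            active = []
            total_sent = len(outgoing)
            for s in others:
                batch, n_sent = chan[(s, rank)].get()
                active.extend(batch)
                total_sent += n_sent
            # Every domain computes the same total, so all stop together.
            if total_sent == 0:
                break

    threads = [threading.Thread(target=domain, args=(r,))
               for r in range(n_domains)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished, subcycles
```

Because a domain cannot start sub-cycle k + 1 until it has received every peer's message for sub-cycle k, no domain can run ahead of the others. The per-pair channels matter for the same reason: with a single shared inbox, a fast sender's next-cycle message could be mistaken for a slower sender's current one.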