Core AI ComputeStream Init Function question.

Question

Created 2w

Does this API only work for inference running on the GPU? If the inference runs on the ANE, can I still use it? I ask because the commandQueue parameter is an MTLCommandQueue. https://developer.apple.com/documentation/coreai/computestream/init(commandqueue:)

Answer 1

Engineer OP

Apple

1w

Accepted Answer

Yes, you can still use the API regardless of where the model ends up running. However, how quickly InferenceFunction.encode returns may depend on where the model runs. For work encoded to asynchronous compute units such as the GPU and the Neural Engine, the call can generally encode the work and return quickly while the actual compute runs asynchronously. You can later wait for completion by calling await computeStream.currentWorkCompleted() or by waiting on the asynchronous output values. If the model runs fully on the CPU, however, the InferenceFunction.encode call may block until all compute is completed.

As for why the compute stream supports initialization from an MTLCommandQueue: that is there to help synchronize with external GPU pipelines. If the model runs on the Neural Engine, events are used to synchronize with the Metal queue.
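
To make the event-based synchronization a bit more concrete, here is a minimal sketch in plain Metal of how an MTLSharedEvent can order work on a Metal queue behind work produced elsewhere. This is not Core AI code and not necessarily how the framework does it internally. The second queue (producerQueue) is only an assumed stand-in for whatever external producer signals the event, and the names renderQueue, producerQueue, and ready are made up for illustration.

```swift
import Metal

// Assumption: plain Metal only. `producerQueue` stands in for an external
// producer (for example an inference runtime) that signals the event.
guard let device = MTLCreateSystemDefaultDevice(),
      let renderQueue = device.makeCommandQueue(),
      let producerQueue = device.makeCommandQueue(),
      let event = device.makeSharedEvent() else {
    fatalError("Metal is not available on this device")
}

// The event value that means "the producer's output is ready".
let ready: UInt64 = 1

// Consumer: work in this command buffer does not start until the event
// reaches `ready`.
let consumer = renderQueue.makeCommandBuffer()!
consumer.encodeWaitForEvent(event, value: ready)
// ... encode render or compute passes that read the producer's output here ...
consumer.commit()

// Producer: signals the event after the work encoded before the signal.
let producer = producerQueue.makeCommandBuffer()!
// ... encode the work that produces the data here ...
producer.encodeSignalEvent(event, value: ready)
producer.commit()

// Block the CPU until the consumer has run, which implies the event fired.
consumer.waitUntilCompleted()
print("signaled value:", event.signaledValue)
```

A shared event can also be signaled from outside the GPU timeline, which is presumably what makes this kind of mechanism a fit for hardware that does not run Metal command buffers.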