wwdc21-10040 | Apple Developer Forums

Detect people, faces, and poses using Vision

Discuss the WWDC21 session Detect people, faces, and poses using Vision.

Post

Replies

Boosts

Views

Activity

After exporting an action classifier from Create ML and importing it into Xcode, how do you use it to make predictions?

I followed Apple's guidance in their articles Creating an Action Classifier Model, Gathering Training Videos for an Action Classifier, and Building an Action Classifier Data Source. With this Core ML model file now imported in Xcode, how do I use it to classify video frames? For each video frame I call:

    do {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
        try requestHandler.perform([self.detectHumanBodyPoseRequest])
    } catch {
        print("Unable to perform the request: \(error.localizedDescription).")
    }

But it's unclear to me how to use the results of the VNDetectHumanBodyPoseRequest, which come back as the type [VNHumanBodyPoseObservation]?. How would I feed the results into my custom classifier, which has an automatically generated model class, TennisActionClassifier.swift? The classifier makes predictions on the frames' body poses, labeling the action as either playing a rally/point or not playing.
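One way the pieces described above could fit together is sketched below: each frame's pose is converted with `keypointsMultiArray()`, buffered until a full window is collected, then concatenated and passed to the generated class. This is a sketch under assumptions, not a confirmed answer: the window size of 60 is a placeholder that has to match the prediction window the model was trained with, the `ActionPredictor` type is hypothetical, and the `prediction(poses:)` input name is what Create ML action classifiers typically generate, so check it against the generated TennisActionClassifier.swift.

```swift
import CoreML
import Vision

/// Buffers per-frame body poses and classifies each full window.
/// Hypothetical helper; not part of Vision or Create ML.
final class ActionPredictor {
    private let classifier: TennisActionClassifier
    // Assumption: must equal the prediction window chosen when training the model.
    private let windowSize = 60
    private var poseWindow: [MLMultiArray] = []

    init() throws {
        classifier = try TennisActionClassifier(configuration: MLModelConfiguration())
    }

    /// Call once per frame with the results of the body pose request.
    /// Returns a label and its probability once a full window is available.
    func add(_ observations: [VNHumanBodyPoseObservation]?) throws
        -> (label: String, confidence: Double)? {
        // Use the first detected person. Frames without a pose are skipped
        // in this sketch; padding them instead is another option.
        guard let pose = observations?.first else { return nil }
        poseWindow.append(try pose.keypointsMultiArray())
        guard poseWindow.count == windowSize else { return nil }

        // Stack the per-frame arrays along the first axis into one model input.
        let input = MLMultiArray(concatenating: poseWindow, axis: 0, dataType: .float32)
        poseWindow.removeAll()

        // Assumption: the generated class names its input `poses`.
        let output = try classifier.prediction(poses: input)
        return (output.label, output.labelProbabilities[output.label] ?? 0)
    }
}
```

After `requestHandler.perform(...)` succeeds, the request's `results` could be handed to `add(_:)`. The buffer is cleared after each prediction here, so predictions do not overlap; a sliding window would keep part of the buffer instead.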

Machine Learning & AI Core ML Core ML Create ML wwdc21-10037 wwdc21-10040

655

Dec ’21

Accurately getting timestamps for start and end of tennis rallies

I'm building a feature to automatically edit out all the downtime of a tennis video. I have a partial implementation that stores the start and end times of Vision trajectory detections and writes only those segments to an AVFoundation export session. I've encountered a major issue: the returned trajectories end whenever the ball bounces, so each segment is just one tennis shot and nowhere close to an entire rally with multiple bounces. I'm unsure whether I should continue down the trajectory route, maybe stitching together the trajectories and somehow splitting only at the start and end of a rally. Any general guidance would be appreciated. Is there a different Vision or ML approach that would more accurately model the start and end time of a rally? I considered creating a custom action classifier to classify frames as either "playing tennis" or "inactivity," but I started with Apple's trajectory detection since it was already built and trained. Maybe a custom classifier is needed, but I'm not sure.
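The stitching idea mentioned above can be sketched as a gap-based merge: consecutive trajectory segments are joined into one rally when the pause between them is short. This is only an illustration of that idea, not a tested rally detector. The `Segment` type, the `mergeIntoRallies` function, and the 2-second gap are all hypothetical, and times are plain seconds rather than `CMTime` to keep the example self-contained.

```swift
/// A detected time span in seconds, for example one ball trajectory.
struct Segment: Equatable {
    var start: Double
    var end: Double
}

/// Joins segments into rallies: a segment that starts at most `maxGap`
/// seconds after the previous one ends extends the current rally.
func mergeIntoRallies(_ segments: [Segment], maxGap: Double) -> [Segment] {
    var rallies: [Segment] = []
    for segment in segments.sorted(by: { $0.start < $1.start }) {
        if let last = rallies.last, segment.start - last.end <= maxGap {
            // Close enough to the previous shot: extend the current rally.
            rallies[rallies.count - 1].end = max(last.end, segment.end)
        } else {
            // Pause too long (or first segment): start a new rally.
            rallies.append(segment)
        }
    }
    return rallies
}

// Two shots half a second apart, then a long pause before the next shot.
let shots = [
    Segment(start: 1.0, end: 2.0),
    Segment(start: 2.5, end: 3.5),
    Segment(start: 12.0, end: 13.0),
]
let rallies = mergeIntoRallies(shots, maxGap: 2.0)
```

The merged spans could then be converted to time ranges for the export session. The gap threshold would need tuning against real footage, since serves and slow lobs can leave longer pauses between detected trajectories.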

Machine Learning & AI Core ML Core ML Vision wwdc21-10039 wwdc21-10040

634

Dec ’21

Vision's pose model architecture

Could I please ask what the deep learning architecture of Apple's custom pose models available through Vision is, at least in broad terms (for example, the model behind VNDetectHumanBodyPoseRequest)? Or is it based on a publicly used architecture (such as ResNet), with only modifications or a custom Apple dataset? I was not able to find this information anywhere in the Apple documentation, and it would be highly beneficial to know, as we are using this data in research that we want to publish as a paper. Thanks in advance!

Machine Learning & AI General Vision Machine Learning Core ML wwdc21-10040

773

Aug ’21

What are the settings that can be adjusted for Apple Vision framework for Hand gestures?

I can run the example given by Apple (https://developer.apple.com/documentation/vision/vndetecthumanhandposerequest). However, I am not sure what options there are to fine-tune its behaviour. I know you can filter by confidence level; are there other ways to control the points to make the detection more consistent?
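Besides confidence filtering, the request itself exposes `maximumHandCount`, and jitter can also be reduced on the app side. The sketch below shows one such app-side approach, assuming an exponential moving average combined with a confidence gate. It is not a Vision setting: `JointSample`, `JointSmoother`, and the threshold and smoothing values are hypothetical and would need tuning.

```swift
/// A joint location together with the confidence reported for it.
struct JointSample {
    var x: Double
    var y: Double
    var confidence: Double
}

/// Exponential moving average over one joint's position that ignores
/// low-confidence samples. Hypothetical helper; not part of Vision.
struct JointSmoother {
    let minConfidence: Double   // samples below this are ignored
    let alpha: Double           // 0...1; higher follows new samples more closely
    private(set) var current: (x: Double, y: Double)?

    init(minConfidence: Double = 0.3, alpha: Double = 0.5) {
        self.minConfidence = minConfidence
        self.alpha = alpha
        self.current = nil
    }

    /// Feeds one sample and returns the smoothed position, if any exists yet.
    mutating func update(with sample: JointSample) -> (x: Double, y: Double)? {
        // Keep the previous estimate when the new sample is unreliable.
        guard sample.confidence >= minConfidence else { return current }
        if let previous = current {
            current = (previous.x + alpha * (sample.x - previous.x),
                       previous.y + alpha * (sample.y - previous.y))
        } else {
            current = (sample.x, sample.y)
        }
        return current
    }
}

var smoother = JointSmoother(minConfidence: 0.3, alpha: 0.5)
_ = smoother.update(with: JointSample(x: 0.0, y: 0.0, confidence: 0.9))
_ = smoother.update(with: JointSample(x: 1.0, y: 1.0, confidence: 0.1)) // ignored
let smoothed = smoother.update(with: JointSample(x: 1.0, y: 0.5, confidence: 0.9))
```

One smoother per joint would be kept across frames, fed with each recognized point's location and confidence. A lower `alpha` gives steadier points at the cost of more lag behind fast hand movement.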

Machine Learning & AI General wwdc21-10040

669

Jul ’21