Performance profiling
I am working on real-time image processing and am looking for the fastest way to receive a pixel buffer, process its data (in this case, a grayscale conversion), and then return the data for further processing. The GPUImage framework seemed like a strong candidate for this task.
I am trying to measure the total elapsed time from sending data to GPUImage, through processing, to receiving the result from a GPUImageRawDataOutput. My timing procedure is below:
    // Get a pixel buffer
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    int bufferWidth = (int)CVPixelBufferGetWidth(pixelBuffer);
    int bufferHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
    GLubyte *pixel = (GLubyte *)CVPixelBufferGetBaseAddress(pixelBuffer);

    // Get the time
    NSDate *start = [NSDate date];

    GPUImageRawDataInput *rawDataInput = [[GPUImageRawDataInput alloc] initWithBytes:pixel size:CGSizeMake(bufferWidth, bufferHeight)];
    GPUImageGrayscaleFilter *customFilter = [[GPUImageGrayscaleFilter alloc] init];
    GPUImageRawDataOutput *rawDataOutput = [[GPUImageRawDataOutput alloc] initWithImageSize:CGSizeMake(bufferWidth, bufferHeight) resultsInBGRAFormat:YES];

    [rawDataInput addTarget:customFilter];
    [customFilter addTarget:rawDataOutput];

    [rawDataOutput setNewFrameAvailableBlock:^{
        GLubyte *outputBytes = [rawDataOutput rawBytesForImage];
        NSInteger bytesPerRow = [rawDataOutput bytesPerRowInOutput];

        // Get the time elapsed (timeIntervalSinceNow is negative for a date in the past)
        NSTimeInterval interval = -[start timeIntervalSinceNow];

        // Do more processing...
    }];

    // Send the uploaded bytes through the filter chain so the block above fires
    [rawDataInput processData];
However, when I do this, I measure roughly 300 ms, even for low-resolution images. Are the start and stop points for my timer in the wrong locations? Or is there significant overhead associated with creating the GPUImage objects? Any direction would be appreciated.
(As a further note, I have read the post here, but my use case requires me to keep ownership of the pixel buffer, so attaching a GPUImageView as an output target isn't an option.)
There's significant overhead to allocating and initializing the raw data input, output, and filter objects, and that is what consumes almost all of your 300 ms here. You don't want to recreate these each time you filter something. Instead, create them once and just update the data as needed by using -updateDataFromBytes:size: on the raw data input.
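A minimal sketch of that approach, assuming BGRA pixel buffers of a constant size with no row padding (bytes per row equal to width * 4). The `GrayscalePipeline` class and its method names are hypothetical. The GPUImage calls are the ones from the question plus `-updateDataFromBytes:size:` and `-processData`. The chain is built from the first frame, and every later frame only re-uploads its bytes, so the timer covers upload, filtering, and readback rather than object creation. It is also assumed that the upload copies the bytes before returning, which is why the buffer is unlocked right after `-processData`.

```objc
#import <Foundation/Foundation.h>
#import <CoreVideo/CoreVideo.h>
#import <QuartzCore/QuartzCore.h>  // CACurrentMediaTime()
#import "GPUImage.h"               // adjust to how GPUImage is integrated in your project

// Hypothetical helper: owns one input -> grayscale -> output chain and reuses it
// for every frame. Assumes BGRA frames of a constant size with no row padding.
@interface GrayscalePipeline : NSObject
- (void)processPixelBuffer:(CVPixelBufferRef)pixelBuffer;
@end

@implementation GrayscalePipeline {
    GPUImageRawDataInput *_rawDataInput;
    GPUImageGrayscaleFilter *_grayscaleFilter;
    GPUImageRawDataOutput *_rawDataOutput;
    CFTimeInterval _frameStart;  // written per frame, read in the output block (not synchronized)
}

// One-time setup: the expensive allocation happens only for the first frame.
- (void)buildChainWithBytes:(GLubyte *)bytes size:(CGSize)size {
    _rawDataInput = [[GPUImageRawDataInput alloc] initWithBytes:bytes size:size];
    _grayscaleFilter = [[GPUImageGrayscaleFilter alloc] init];
    _rawDataOutput = [[GPUImageRawDataOutput alloc] initWithImageSize:size
                                                  resultsInBGRAFormat:YES];
    [_rawDataInput addTarget:_grayscaleFilter];
    [_grayscaleFilter addTarget:_rawDataOutput];

    // Capture self weakly so the block stored on the output does not retain its owner.
    __weak GrayscalePipeline *weakSelf = self;
    [_rawDataOutput setNewFrameAvailableBlock:^{
        GrayscalePipeline *strongSelf = weakSelf;
        if (strongSelf == nil) { return; }
        GPUImageRawDataOutput *output = strongSelf->_rawDataOutput;
        GLubyte *outputBytes = [output rawBytesForImage];
        NSInteger bytesPerRow = [output bytesPerRowInOutput];
        CFTimeInterval elapsed = CACurrentMediaTime() - strongSelf->_frameStart;
        NSLog(@"Frame processed in %.2f ms (%ld bytes per row)",
              elapsed * 1000.0, (long)bytesPerRow);
        // Do more processing with outputBytes here...
        (void)outputBytes;
    }];
}

- (void)processPixelBuffer:(CVPixelBufferRef)pixelBuffer {
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    GLubyte *bytes = (GLubyte *)CVPixelBufferGetBaseAddress(pixelBuffer);
    CGSize size = CGSizeMake(CVPixelBufferGetWidth(pixelBuffer),
                             CVPixelBufferGetHeight(pixelBuffer));

    _frameStart = CACurrentMediaTime();
    if (_rawDataInput == nil) {
        [self buildChainWithBytes:bytes size:size];           // first frame only
    } else {
        [_rawDataInput updateDataFromBytes:bytes size:size];  // every later frame
    }
    [_rawDataInput processData];

    // Assumes the bytes were already copied by the upload, so the caller keeps the buffer.
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
}

@end
```

Keep one `GrayscalePipeline` instance alive for the whole capture session and call `-processPixelBuffer:` once per frame. The first frame's timing still includes setup, so judge per-frame cost from the frames after it. If the frame size can change, the chain would need to be rebuilt, since the output size is fixed when `GPUImageRawDataOutput` is created.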
You'll see much, much faster processing times once you're just sending in and extracting the raw data without recreating the whole filter chain each time.