glGenerateMipmap and raw pixel buffer
The code below creates a simple GPUImage processing chain that supports native pixel access, with the following structure: GPUImageVideoCamera -> GPUImageGrayscaleFilter -> GPUImageRawDataOutput -> [native processing block] -> GPUImageRawDataInput -> GPUImageView. The native processing callback block essentially implements the approach outlined in the post http://www.sunsetlakesoftware.com/forum/cpu-processing-between-gpuimage-..., and everything works fine if I disable my DO_MIPMAP_IMAGE_PYRAMID flag: I'm able to extract the raw pixel buffer, draw to it, and display the modified buffer on the screen.
At this point I would also like to create a fast approximation of a Gaussian image pyramid, to facilitate coarse-to-fine CPU processing, using hardware-accelerated mipmaps via the glGenerateMipmap call. I'm assuming glGenerateMipmap will be faster than OpenCV or the like, but I'll be curious to run benchmarks once it is working. (It looks like the best filtering option offered by OpenGL ES is linear, not Gaussian, but this will probably be fine for my needs, perhaps coupled with an initial Gaussian blur at level 0.) When I turn on the DO_MIPMAP_IMAGE_PYRAMID flag, the enabled code attempts to resize the texture in GPUImageRawDataOutput to the nearest power-of-2 dimensions, and then retrieve the mipmap levels via glFramebufferTexture2D(*,*,*,*,level) as outlined in http://mmmovania.blogspot.com/2011/02/implementing-histogram-pyramid.html. The mipmap part isn't quite working: the call to glGenerateMipmap() consistently results in glGetError()==1282, which indicates "invalid operation". I'm sure I have some basic OpenGL calls out of order here. GPUImage seems to be working well, but I can't quite figure out how to fit this piece into the framework properly. I'm assuming it makes sense to do this in the GPUImageRawDataOutput newFrameAvailableBlock callback, since pyramids would probably be used most frequently with CPU processing.
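For reference, here is a minimal sketch of the call order I believe glGenerateMipmap expects, assuming a complete power-of-two texture is already bound to GL_TEXTURE_2D (OpenGL ES 2.0 only guarantees mipmap generation for power-of-two textures, so that assumption may itself be my problem):

GLuint textureName; // assumed: a complete, power-of-two RGBA texture
glBindTexture(GL_TEXTURE_2D, textureName); // bind before any texture parameter calls
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glGenerateMipmap(GL_TEXTURE_2D); // derives levels 1..N from level 0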
The alternative would be to create a padded texture containing all pyramid levels in the same image, as illustrated here: http://upload.wikimedia.org/wikipedia/commons/5/5c/MipMap_Example_STS101.... That could be passed around as a texture in the current framework, and it would have the advantage that single filter operations (e.g., Ixx+Iyy) could process all pyramid levels in parallel, with nice scale-space properties. I may try that next if I can get the basic approach outlined below working first. Any input regarding proper use of the glGenerateMipmap call would be appreciated.
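For bookkeeping with that packed layout, something like the following hypothetical helper (not part of GPUImage) would give the sub-rectangle of each level, assuming the levels are packed left to right:

// Hypothetical helper: level 0 occupies x in [0, baseW), level 1 the next
// baseW/2 columns, and so on; each level halves both dimensions.
static void packedPyramidLevelRect(int baseW, int baseH, int level,
                                   int *x, int *y, int *w, int *h)
{
    *x = 0;
    for (int i = 0; i < level; i++)
    {
        *x += baseW >> i; // skip past all finer levels
    }
    *y = 0;
    *w = baseW >> level;
    *h = baseH >> level;
}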
#define DO_NATIVE_PROCESSING 1
#define DO_MIPMAP_IMAGE_PYRAMID 1

// Convenience pixel access struct
struct BGRA
{
    unsigned char b, g, r, a;
};

GLint checkGLError()
{
    GLint error = glGetError();
    if (error != 0)
    {
        exit(EXIT_FAILURE);
    }
    return error;
}

// Based on http://stackoverflow.com/questions/946700/glpaint-save-image
// Data provider callback: frees the pixel buffer once the CGImage is done with it
static void releasePixelBuffer(void *info, const void *data, size_t size)
{
    free((void *)data);
}

+(UIImage *) makeImageFromGLViewUsingWidth:(int)width AndHeight:(int)height
{
    NSInteger myDataLength = width * height * 4;
    GLubyte *buffer = (GLubyte *) malloc(myDataLength);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    checkGLError();
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer);
    checkGLError();
    CGDataProviderRef provider = CGDataProviderCreateWithData(NULL, buffer, myDataLength, releasePixelBuffer);
    int bitsPerComponent = 8;
    int bitsPerPixel = 32;
    int bytesPerRow = width * 4; // 4 bytes per RGBA pixel
    CGColorSpaceRef colorSpaceRef = CGColorSpaceCreateDeviceRGB();
    CGBitmapInfo bitmapInfo = kCGBitmapByteOrderDefault;
    CGColorRenderingIntent renderingIntent = kCGRenderingIntentDefault;
    CGImageRef imageRef = CGImageCreate(width, height, bitsPerComponent, bitsPerPixel, bytesPerRow, colorSpaceRef, bitmapInfo, provider, NULL, NO, renderingIntent);
    UIImage *myImage = [UIImage imageWithCGImage:imageRef];
    CGImageRelease(imageRef);
    CGColorSpaceRelease(colorSpaceRef);
    CGDataProviderRelease(provider);
    return myImage;
}

+(void) saveImageToDisk:(UIImage *)image WithFilename:(NSString *)path
{
    NSData *imageData = [NSData dataWithData:UIImagePNGRepresentation(image)];
    [imageData writeToFile:path atomically:YES];
}

- (void)viewDidLoad
{
#if 0
    NSString *const kAVCaptureSessionPreset = AVCaptureSessionPreset1920x1080;
    m_ImageWidth = 1080;  // TODO: link these to the AVCaptureSessionPreset*x* string, or retrieve them from GPUImageVideoCamera
    m_ImageHeight = 1920;
#else
    NSString *const kAVCaptureSessionPreset = AVCaptureSessionPreset1280x720;
    m_ImageWidth = 720;   // TODO: link these to the AVCaptureSessionPreset*x* string, or retrieve them from GPUImageVideoCamera
    m_ImageHeight = 1280;
#endif

    [super viewDidLoad];
    vc = [[GPUImageVideoCamera alloc] initWithSessionPreset:kAVCaptureSessionPreset cameraPosition:AVCaptureDevicePositionBack];
    vc.outputImageOrientation = UIInterfaceOrientationPortrait;
    gf = [[GPUImageGrayscaleFilter alloc] init];
    GPUImageView *v = [[GPUImageView alloc] init];

#if DO_NATIVE_PROCESSING
#if DO_MIPMAP_IMAGE_PYRAMID
    // Calculate next power of two for glGenerateMipmap() call in setNewFrameAvailableBlock
    int width2 = pow(2, ceil(log2(m_ImageWidth)));   /* 720 -> 1024, 1280 -> 2048 */
    int height2 = pow(2, ceil(log2(m_ImageHeight)));
    rawDataOutput = [[GPUImageRawDataOutput alloc] initWithImageSize:CGSizeMake(width2, height2) resultsInBGRAFormat:YES];
#else
    rawDataOutput = [[GPUImageRawDataOutput alloc] initWithImageSize:CGSizeMake(m_ImageWidth, m_ImageHeight) resultsInBGRAFormat:YES];
#endif

    rawDataInput = [[GPUImageRawDataInput alloc] initWithBytes:[rawDataOutput rawBytesForImage] size:CGSizeMake(m_ImageWidth, m_ImageHeight)];

    __block int counter = 0, width = m_ImageWidth, height = m_ImageHeight; // block-friendly width and height parameters
    __weak GPUImageRawDataOutput *rawDataOutputSelf = rawDataOutput;       // avoid retain cycle warning >= iOS 5.0 w/ ARC
    [rawDataOutput setNewFrameAvailableBlock:^{

        NSLog(@"Frame: %d", counter++);

#if DO_MIPMAP_IMAGE_PYRAMID
        // Trying to create a fast image pyramid using glGenerateMipmap(GL_TEXTURE_2D) and FBOs.
        // Basic mipmap level extraction from
        // http://mmmovania.blogspot.com/2011/02/implementing-histogram-pyramid.html
        // It seems mipmap setup must come after a call to glBindTexture, as in:
        //     glBindTexture(GL_TEXTURE_2D, textureName);
        // ERROR: all calls to glGenerateMipmap fail with return code 1282

        [rawDataOutputSelf rawBytesForImage]; // Trigger output to load texture

        GLuint fboIDs[6];
        GLint textureName;
        glGetIntegerv(GL_TEXTURE_BINDING_2D, &textureName); // Get the current texture id
        checkGLError();
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
        checkGLError();
        glBindTexture(GL_TEXTURE_2D, textureName);
        checkGLError();
        glHint(GL_GENERATE_MIPMAP_HINT, GL_NICEST); // GL_NICEST==GL_LINEAR on iOS 5.0?
        checkGLError();
        glGenerateMipmap(GL_TEXTURE_2D);
        checkGLError(); // ERROR: glGetError()==1282

        glGenFramebuffers(6, &fboIDs[0]);
        checkGLError();

        int w = width, h = height; // Use width and height of frame, not padded texture dimensions
        for (GLuint i = 0; i < 6; i++)
        {
            glBindFramebuffer(GL_FRAMEBUFFER, fboIDs[i]);
            checkGLError();
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, textureName, i);
            checkGLError();
            if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
            {
                NSLog(@"Framebuffer for level %u incomplete\n", i);
                exit(EXIT_FAILURE);
            }

            // Save each mipmap level for inspection
            UIImage *image = [ViewController makeImageFromGLViewUsingWidth:w AndHeight:h];
#if 0 /* save to disk */
            NSString *dir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0];
            NSString *name = [NSString stringWithFormat:@"mipmap%ld.png", (long)i];
            NSString *path = [dir stringByAppendingPathComponent:name];
            [ViewController saveImageToDisk:image WithFilename:path];
#else /* save to album */
            UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil);
#endif
            w /= 2;
            h /= 2;
        }
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        checkGLError();
        glDeleteFramebuffers(6, fboIDs); // avoid leaking 6 FBOs per frame
        checkGLError();
#endif

        NSUInteger bytesPerRow = [rawDataOutputSelf bytesPerRowInOutput];
        //NSLog(@"Bytes per row: %lu", (unsigned long)bytesPerRow);
        GLubyte *rawDataOutputBytes = [rawDataOutputSelf rawBytesForImage]; // retrieve address of raw pixels
        for (unsigned int y = 0; y < height; y++)
        {
            struct BGRA *ptr = (struct BGRA *) &rawDataOutputBytes[y * bytesPerRow];
            for (unsigned int x = 0; x < width; x++, ptr++)
            {
                // For now just add scrolling bands
                if ((y % 100) == (counter % 100))
                {
                    ptr->r = 255;
                    ptr->g = 255;
                    ptr->b = 255;
                    ptr->a = 255;
                }
            }
        }

        // Trigger input callback as in
        // http://www.sunsetlakesoftware.com/forum/cpu-processing-between-gpuimage-filters
        [rawDataInput updateDataFromBytes:rawDataOutputBytes size:CGSizeMake((bytesPerRow / 4), m_ImageHeight)];
        [rawDataInput processData];
    }];

    [vc addTarget:gf];
    [gf addTarget:rawDataOutput];
    [rawDataInput addTarget:v];
#else
    [vc addTarget:gf];
    [gf addTarget:v];
#endif

    self.view = v;
    [vc startCameraCapture];
}
Terrific. Adding -forceProcessingAtSize: to the GPUImageGrayscaleFilter with the nearest power-of-2 dimensions seems to satisfy the call to glGenerateMipmap; the filter output appears to be resized to the new dimensions. In my case I'll have to replace this with a border/padding operation to preserve the original image aspect ratio, but this gets me going. I need to debug the rest and will hopefully post the working code soon. The resize modification is shown below. I'll keep a lookout in case you post a pyramid filter; I use this structure pretty extensively in OpenCV.
#if DO_MIPMAP_IMAGE_PYRAMID
    // Calculate next power of two for glGenerateMipmap() call in setNewFrameAvailableBlock
    int width2 = pow(2, ceil(log2(m_ImageWidth)));   /* 720 -> 1024, 1280 -> 2048 */
    int height2 = pow(2, ceil(log2(m_ImageHeight)));
    rawDataOutput = [[GPUImageRawDataOutput alloc] initWithImageSize:CGSizeMake(width2, height2) resultsInBGRAFormat:YES];
    [gf forceProcessingAtSize:CGSizeMake(width2, height2)]; // Use filter to force dimensions to nearest power of 2
#else
UPDATE: The call to glGenerateMipmap(GL_TEXTURE_2D) takes about 0.56 seconds (ouch!) on an iPhone 4S running iOS 5.0 with a 1280 -> 2048 texture. A few posts online suggest that this function is typically hardware accelerated, but I haven't found anything specific to the iPhone, and this benchmark certainly suggests otherwise. Unless there is a missing magic configuration parameter, the next step may be to try implementing the pyramid by iteratively low-pass filtering (as in GPUImageGaussianBlurFilter) and explicitly resizing the texture for each pyramid level. I'm curious whether you've seen similar results in your tests.
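If I go that route, the chain might look something like the sketch below (untested; the level count and the use of -forceProcessingAtSize: for the explicit per-level resize are my assumptions):

// Sketch: chain Gaussian blurs, rendering each level at half the previous resolution
NSMutableArray *pyramidLevels = [NSMutableArray array];
GPUImageOutput<GPUImageInput> *previousLevel = gf; // grayscale filter as level 0 (1024x1024 assumed)
CGSize levelSize = CGSizeMake(512, 512);           // level 1 is half of the base
for (int level = 1; level <= 4; level++)
{
    GPUImageGaussianBlurFilter *blur = [[GPUImageGaussianBlurFilter alloc] init];
    [blur forceProcessingAtSize:levelSize]; // low-pass filter, then downsample
    [previousLevel addTarget:blur];
    [pyramidLevels addObject:blur];
    previousLevel = blur;
    levelSize = CGSizeMake(levelSize.width / 2.0, levelSize.height / 2.0);
}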
The slow glGenerateMipmap was verified by others on the Apple Developer Forums. Since I only need grayscale pyramids, I've implemented a pyramid filter class using GPUImageFilterGroup to perform progressive Gaussian blurring, using each channel of an RGBA texture as a pyramid level (a skeleton of the filter group is sketched after the channel mapping below). This is along the lines of a SIFT implementation you had mentioned in earlier correspondence.
This is something like the following MATLAB code:
texture(1,:,:) = I;
texture(2,:,:) = conv2(texture(1,:,:), G, 'same');       % I*G
texture(2,1:end/2,1:end/2) = texture(2,1:2:end,1:2:end); % decimate 1->2
texture(3,:,:) = conv2(texture(2,:,:), G, 'same');       % I*G*G
texture(3,1:end/2,1:end/2) = texture(3,1:2:end,1:2:end); % decimate 2->3
texture(4,:,:) = conv2(texture(3,:,:), G, 'same');       % I*G*G*G
texture(4,1:end/2,1:end/2) = texture(4,1:2:end,1:2:end); % decimate 3->4
where I is the input grayscale image, G is a Gaussian kernel (I use a separable version in the actual implementation), and the channels map to levels as:
texture(1,:,:) = image.r;
texture(2,:,:) = image.g;
texture(3,:,:) = image.b;
texture(4,:,:) = image.a;
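The filter group itself is structured roughly like this (a skeleton only; the class name is mine, and the channel-packing/decimation shaders discussed next are omitted):

@interface GPUImagePyramidFilter : GPUImageFilterGroup
@end

@implementation GPUImagePyramidFilter
- (id)init
{
    if (!(self = [super init]))
    {
        return nil;
    }

    // One blur stage per pyramid level; each stage feeds the next
    GPUImageOutput<GPUImageInput> *previousFilter = nil;
    for (int level = 0; level < 4; level++)
    {
        GPUImageGaussianBlurFilter *blur = [[GPUImageGaussianBlurFilter alloc] init];
        [self addFilter:blur];
        if (previousFilter != nil)
        {
            [previousFilter addTarget:blur];
        }
        previousFilter = blur;
    }

    self.initialFilters = [NSArray arrayWithObject:[self filterAtIndex:0]];
    self.terminalFilter = previousFilter;
    return self;
}
@end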
The catch is that I can't seem to find a way to perform the sub-sampling operation on each pyramid level without destroying the previous pyramid levels (color channels) in some way. As an example, the following vertex and fragment shader pair will nicely decimate the green channel, but it doesn't preserve the values in the other color channels.
NSString *const kGPUImageDecimationVertexShaderString = SHADER_STRING
(
 attribute vec4 position;
 attribute vec4 inputTextureCoordinate;
 varying vec2 textureCoordinate;

 void main()
 {
     gl_Position = vec4((position.xy * 0.5), 0.0, 1.0);
     textureCoordinate = inputTextureCoordinate.xy;
 }
);

NSString *const kGPUImageShaderStringGreen = SHADER_STRING
(
 precision mediump float;
 varying vec2 textureCoordinate;
 uniform sampler2D inputImageTexture;

 void main()
 {
     gl_FragColor.g = texture2D(inputImageTexture, textureCoordinate).g;
 }
);
I can fix this somewhat by manually retrieving the pixel at the decimated position before overwriting the green value, and then assigning the original colors along with the modified green value, as demonstrated in the modified fragment shader below:
vec4 reducedPixel = texture2D(inputImageTexture, reducedCoordinate);
vec4 originalPixel = texture2D(inputImageTexture, originalCoordinate);
gl_FragColor = vec4(reducedPixel.r, originalPixel.g, reducedPixel.b, reducedPixel.a);
but this still destroys the original higher-resolution color channels (in this case red). This would be solved by the ability to preinitialize the render texture, or by the ability to write two outputs from the fragment shader (multiple render targets), which isn't supported in OpenGL ES 2.0. So I'm hoping there is a way to achieve the former, such that a no-op fragment shader would still preserve the original texture. That way I could selectively modify a single channel and leave the other color channels intact.
This could also be achieved by the ability to run multiple fragment shaders over the same target, one to fill in the undecimated pixels and a second for the decimated pixels, but it isn't clear to me whether this is possible either.
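One workaround I'm considering (an assumption on my part, not verified): use two-input filter plumbing along the lines of GPUImageTwoInputFilter, feeding the previous pyramid texture back in as a second input so its channels can be copied through while only the decimated channel is replaced:

NSString *const kGPUImageChannelPreservingDecimationFragmentShaderString = SHADER_STRING
(
 precision mediump float;
 varying vec2 textureCoordinate;
 varying vec2 textureCoordinate2;
 uniform sampler2D inputImageTexture;  // source for the new (decimated) level
 uniform sampler2D inputImageTexture2; // previous pyramid texture to preserve

 void main()
 {
     vec4 previous = texture2D(inputImageTexture2, textureCoordinate2);
     float decimated = texture2D(inputImageTexture, textureCoordinate).g;
     gl_FragColor = vec4(previous.r, decimated, previous.b, previous.a);
 }
);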
Are you sure that the raw data output texture is the active one that you're getting in glGetIntegerv(GL_TEXTURE_BINDING_2D, &textureName)? That makes some assumptions about whatever is currently bound that might not be correct. In fact, there is no output texture for the raw data output on iOS 4.x and the output texture used on iOS 5.x is never bound, so I don't think it will be picked up by that call. I think the texture you're getting there is the one being fed into the raw data output, which is most likely not a power of two in size.
Rather than use a raw data output, you might be able to use a standard filter with -forceProcessingAtSize: and then grab its output texture using -textureForOutput. With a forced processing at a power of two size, you should be able to activate mipmap generation for that texture either manually or automatically on rendering. I did something like this recently when exploring Gaussian pyramids myself. I may even create a specific filter subclass just to do this.
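Roughly, something like this (a sketch, not tested code; the sizes are placeholders):

GPUImageGrayscaleFilter *filter = [[GPUImageGrayscaleFilter alloc] init];
[filter forceProcessingAtSize:CGSizeMake(1024, 2048)]; // power-of-two output
// ... after a frame has been rendered through the filter:
glBindTexture(GL_TEXTURE_2D, [filter textureForOutput]);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glGenerateMipmap(GL_TEXTURE_2D); // should now succeed on the power-of-two texture
glBindTexture(GL_TEXTURE_2D, 0);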