Concurrent filter initialization and usage
Hi there,
In my app I present the user with a palette of filters from GPUImage - say, 16 of them.
All these filters are applied to the same still image, and the user sees a thumbnail of each filtered image.
That's it.
Now, I'm allocating, initializing, configuring, and applying all 16 filters concurrently, using dispatch_async on a background queue.
As soon as a filter completes processing, the corresponding thumbnail is shown.
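For concreteness, each thumbnail is produced by something like the following (the filter type, the parameter values, `stillImage`, and the `showThumbnail:atIndex:` callback are just illustrative):

    // Illustrative sketch of my current setup; filter type, parameters,
    // and the UI callback are placeholders.
    dispatch_queue_t backgroundQueue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    for (NSUInteger i = 0; i < 16; i++) {
        dispatch_async(backgroundQueue, ^{
            // Allocate, configure, and apply one filter entirely inside this block.
            GPUImagePicture *source = [[GPUImagePicture alloc] initWithImage:stillImage];
            GPUImageSepiaFilter *filter = [[GPUImageSepiaFilter alloc] init];
            filter.intensity = 0.25 + 0.05 * i; // per-thumbnail parameters
            [source addTarget:filter];
            [source processImage];
            UIImage *thumbnail = [filter imageFromCurrentlyProcessedOutput];

            dispatch_async(dispatch_get_main_queue(), ^{
                [self showThumbnail:thumbnail atIndex:i]; // update the palette UI
            });
        });
    }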
Things work mostly fine, but I'm having some strange issues:
1) if I use the same filter two or more times with different parameters, I get the same visual result for all of them, as if one computation's result were overriding all the others;
2) if I include filter groups such as GPUImageSmoothToonFilter or GPUImageSoftEleganceFilter, I sometimes get corrupted visual results for these groups;
3) if I throw some CIFilters into the mix, the app sometimes crashes.
So I'm wondering: is GPUImage thread-safe? What's going wrong with my concurrent initializations?
Notice that it mostly works; there are just a few corner cases where it fails.
The whole framework used to be completely unsafe to use around any kind of threading. I changed that about a month ago, when I implemented a core serial dispatch queue that runs on a background thread for doing the image processing. I have to use a single queue, because that needs to lock around the shared OpenGL ES context used by the framework.
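In outline, the pattern is the following (a simplified sketch, not the framework's actual internals):

    // Simplified sketch of the serial-queue pattern, not GPUImage's actual code.
    // Every operation that touches the shared OpenGL ES context goes through
    // one serial GCD queue, so no two GL operations ever run concurrently.
    static dispatch_queue_t openGLESContextQueue(void)
    {
        static dispatch_queue_t queue;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            queue = dispatch_queue_create("com.example.openGLESContextQueue", NULL);
        });
        return queue;
    }

    static void runOnContextQueue(void (^block)(void))
    {
        // Serializing on one queue stands in for locking around the GL context.
        dispatch_sync(openGLESContextQueue(), block);
    }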
This context should be distinct from Core Image's OpenGL ES context (unless you are somehow sharing the two), so there should be no conflicts there. Are you sure this isn't a memory-related crash? Others have run into those when running the two in parallel.
As far as the odd artifacts with multiple filters of the same type, that's a known bug in an optimization I implemented a couple of weeks ago. I tried to avoid having to recompile shader programs of the same type by caching them, but that has led to problems where setting parameters on one shader program sets them for all instances of that program. I'm working on a fix for that. This is unrelated to any threading issues.
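The failure mode looks roughly like this (again a simplified sketch, not the actual caching code):

    // Simplified sketch of the caching bug. Filters built from identical
    // shader sources get the *same* program object back, so uniforms set
    // via one filter instance leak into all the others.
    static NSMutableDictionary *shaderProgramCache;

    GLProgram *cachedProgram(NSString *vertexShader, NSString *fragmentShader)
    {
        if (shaderProgramCache == nil) {
            shaderProgramCache = [NSMutableDictionary dictionary];
        }
        NSString *key = [vertexShader stringByAppendingString:fragmentShader];
        GLProgram *program = [shaderProgramCache objectForKey:key];
        if (program == nil) {
            program = [[GLProgram alloc] initWithVertexShaderString:vertexShader
                                               fragmentShaderString:fragmentShader];
            [shaderProgramCache setObject:program forKey:key];
        }
        return program; // shared instance: per-filter parameters now collide
    }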
In reality, you gain little by trying to parallelize any processing using this framework, because only one thing can use the GPU at a time. The threading benefits have already been realized by moving the framework's processing off of the main thread and by allowing for certain CPU and GPU actions to run in parallel (leading to a ~10-20% speedup with many filters, and preventing the main thread from being blocked or blocking the processing in the framework). Still, it should be safe to attempt to initialize and use several filters at once, because I've tried to protect against undesirable changes in the OpenGL ES state machine. Perhaps I haven't caught all of the cases here.
Hi Brad,
thank you for your answer and for the great framework.
Maybe you're right about the crashes I see when using Core Image and GPUImage concurrently: that may be a memory-related issue.
About the parameter issue: OK, I'll wait for the fix.
I understand that I'm not gaining much by parallelizing the creation/parametrization/usage of the filters; I'm now using them serially, and that's fine. However, it seems to me that the most time-consuming part is the shader compilation, which, if I'm right, happens during the filter allocation/initialization phase. Where does this compilation happen, on the GPU or the CPU? Is it parallelizable in principle, if you need to set up a set of filters?
The slow shader compilation was why I tried to cache the shader programs for reuse, but as you can see that had some side-effects. There's a balance to be struck in there somewhere, but I still need to think about it.
Shader compilation is done on the CPU, via the OpenGL ES driver, but it needs GPU resources. If I recall correctly, I tried to parallelize this and ended up causing crashes due to shared resources, but I may need to explore that again.
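Concretely, the step in question is just the standard GL compile sequence, executed by the driver on whichever thread has the context current:

    #import <Foundation/Foundation.h>
    #import <OpenGLES/ES2/gl.h>

    // The driver runs this on the CPU, but it requires a current OpenGL ES
    // context on the calling thread -- which is why spreading it across
    // threads that share one context is what gets you into trouble.
    GLuint compileShader(GLenum shaderType, const char *shaderSource)
    {
        GLuint shader = glCreateShader(shaderType);
        glShaderSource(shader, 1, &shaderSource, NULL);
        glCompileShader(shader);

        GLint compileStatus = 0;
        glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
        if (compileStatus == 0) {
            GLchar infoLog[512];
            glGetShaderInfoLog(shader, sizeof(infoLog), NULL, infoLog);
            NSLog(@"Shader compile failed: %s", infoLog);
            glDeleteShader(shader);
            return 0;
        }
        return shader;
    }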
I was able to answer this myself; I just wanted to share.
So, it seems that the underpinnings of GPUImage are not thread-safe, as they rely on a shared OpenGL ES context.
The framework itself guards against possible thread-safety issues by serializing all the key operations on a serial GCD queue (called "com.sunsetlakesoftware.GPUImage.openGLESContextQueue").
This means that by dispatching my 16 filters asynchronously and concurrently I only had the *impression* of having them work in parallel; in fact, everything was being serialized by the framework's queue.
This is sufficient to explain issue (3): CIFilter is (of course) not using the same serialization queue, and is possibly causing race conditions on the use of OpenGL ES resources.
Issues (1) and (2) are more complex: the typical three-step workflow (allocate, parametrize, use filter) is dispatched by the GPUImage framework as three (or more) separate blocks on that serial queue. If you concurrently ask to allocate/parametrize/use several filters, these steps from different filters may be interleaved, with unpredictable results (see the sketch below).
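If that's right, the workaround is to make the three steps for one filter a single unit of work and run those units one after another - the same loop as in my first post, but on a serial queue of my own (a sketch, assuming the framework's calls return only after their internal blocks complete):

    // Sketch of the workaround: the whole allocate-parametrize-use sequence
    // for one filter runs as a single unit, and units run one at a time on
    // my own serial queue, so steps from different filters cannot interleave.
    dispatch_queue_t filterQueue =
        dispatch_queue_create("com.example.filterThumbnails", NULL); // serial

    for (NSUInteger i = 0; i < 16; i++) {
        dispatch_async(filterQueue, ^{
            GPUImagePicture *source = [[GPUImagePicture alloc] initWithImage:stillImage];
            GPUImageSepiaFilter *filter = [[GPUImageSepiaFilter alloc] init]; // placeholder type
            filter.intensity = 0.25 + 0.05 * i;
            [source addTarget:filter];
            [source processImage]; // one filter finishes before the next begins
            UIImage *thumbnail = [filter imageFromCurrentlyProcessedOutput];

            dispatch_async(dispatch_get_main_queue(), ^{
                [self showThumbnail:thumbnail atIndex:i]; // hypothetical UI hook
            });
        });
    }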
The bottom line seems to be that it's only safe to use one filter (or chain of filters) at a time.
Or am I missing something?