Saturday, February 6, 2010

Writing High Performance Code on the iPhone

Over the last few months a lot of people have asked me how we write our code for the iPhone. Do we use straight C99? Or do we use C++? If so, do we use exceptions? RTTI? STL? So instead of responding by email, I decided to detail it in an article here.

Now please keep in mind that this isn't meant to be a bible on iPhone or mobile/embedded game development. You should always solve the problem with the right tools, and no two games are alike. Would you really try to develop a casual puzzle game using the exact same methods you would use to build a cutting edge shooter?

Also note that the majority of people that have been asking me these questions are PC/Mac developers who are either looking at or have already started iPhone development. So they are the target audience of this article. For the majority of developers who have worked on mobile/embedded devices and game consoles there will likely be nothing new here.

So let's get down to business.

What language did we use to build our iPhone engine?
We use a mix of C and C++ that I believe is commonly referred to as C with classes. The majority of the code would actually be classified as C, but we group data and the functions that operate on it together in a class (and use namespaces) rather than in a struct with free functions. The decision to allow classes was made simply because the development team preferred it, and because we knew we would otherwise end up reimplementing a lot of what C++ gives us with objects.

Exception Handling
We have exceptions disabled in the compiler settings for all projects (-fno-exceptions in GCC). The current version of GCC used in the iPhone SDK has zero cost exceptions, but you should understand that this does not mean zero overhead. What it means is that there is no execution time overhead when an exception does not occur. Great, so we have free error handling? Not quite! The compiler still generates additional code, and every function that can be unwound now carries extra information required for stack unwinding. On an embedded device like the iPhone, that extra code and those unwind tables take resources away from the precious little that we have. Meanwhile, we lose very little by disabling exceptions, as there are very few errors that should actually be classified as exceptional on the iPhone. Considering that you are shipping for an embedded device, your software has a single point of origin (the App Store), and users are unable to modify the file system, the majority of errors will be developer errors. Of course there are some exceptional situations, such as running low on memory, failing to allocate memory, or a dropped network connection, but this is a very small group.

We use assertions very heavily in order to catch developer errors, and functions return error codes in places where that functionality is needed. There are a lot of C++ programmers who rely heavily on exceptions, which is fine 99% of the time on the PC where you have near infinite resources (although this can be debatable), but on an embedded device I need a smaller footprint and all the resources I can get my hands on. If you are interested in exception handling and want to know the cost and implementation details, I encourage you to read the Exception Handling ABI for the ARM Architecture.
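
As a rough sketch of the split described above (the function and the error codes are made up for illustration and are not taken from the engine), an assertion guards against developer errors while an error code reports a condition the caller is expected to handle:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical error codes; the names are illustrative only.
enum class LoadResult { Ok, FileNotFound, ReadError };

// Developer errors (null arguments) trip an assert in debug builds.
// Conditions that can legitimately happen at run time are returned
// to the caller as an error code instead of being thrown.
LoadResult GetFileSize(const char* path, long* outSize)
{
    assert(path != nullptr);
    assert(outSize != nullptr);

    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr)
        return LoadResult::FileNotFound;

    long size = -1;
    if (std::fseek(file, 0, SEEK_END) == 0)
        size = std::ftell(file);
    std::fclose(file);

    if (size < 0)
        return LoadResult::ReadError;

    *outSize = size;
    return LoadResult::Ok;
}
```

The caller checks the returned code explicitly, so nothing in this path needs unwind tables, and the asserts compile away when NDEBUG is defined.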

Virtual Functions, RTTI, and Multiple Inheritance
All of our projects have RTTI disabled in the compiler settings (-fno-rtti in GCC) for a few reasons. The first is that while a failed dynamic_cast on a pointer returns null, a failed dynamic_cast on a reference throws an exception (std::bad_cast), and we have exceptions disabled. Another reason is that some compiler implementations add roughly 40 bytes of overhead per class (although many have much better implementations, and I have not researched GCC to know exactly how it handles this). The base object model in our engine gives us the non-generic RTTI functionality that we need with only 4 bytes of overhead. Even so, we do not use it often, because as a rule we never delay type checking to run time when it can take place at compile time. This way we not only get a smaller footprint and faster execution, but also catch developer errors earlier.
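
One way such a lightweight scheme could look is sketched below. This is an assumption for illustration, not the engine's actual object model: the type names, the TypeId enum, and CheckedCast are all hypothetical. Each object stores a 4-byte type identifier, and a cast helper compares it instead of relying on compiler RTTI.

```cpp
#include <cstdint>

// Hypothetical type identifiers, one per concrete class.
enum class TypeId : std::uint32_t { Sprite, Emitter };

// Base object: the stored type id is the only per-object overhead.
struct Object {
    explicit Object(TypeId id) : typeId(id) {}
    TypeId typeId;
};

struct Sprite : Object {
    static constexpr TypeId kTypeId = TypeId::Sprite;
    Sprite() : Object(kTypeId), frame(0) {}
    int frame;
};

struct Emitter : Object {
    static constexpr TypeId kTypeId = TypeId::Emitter;
    Emitter() : Object(kTypeId), particleCount(0) {}
    int particleCount;
};

// Returns the object as T* when its stored id matches T exactly,
// otherwise nullptr. Needs neither dynamic_cast nor exceptions.
template <typename T>
T* CheckedCast(Object* object)
{
    if (object != nullptr && object->typeId == T::kTypeId)
        return static_cast<T*>(object);
    return nullptr;
}
```

This sketch only matches exact types. Supporting "is a kind of" queries across a deeper hierarchy would need something extra, such as a parent id lookup table.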

I'm not going to say much regarding multiple inheritance / virtual inheritance other than that we don't use this functionality on any platform, and we design our code so as not to need it.

Virtual functions are one of the great things that C++ gives us over C, but we use them extremely sparingly. Of course there are the usual rules of not using virtual functions in performance critical areas, tight loops, and so on, but we also go a little further. As our engine runs on multiple platforms, there is a need for different platform specific subsystems such as rendering. This is easily handled in C++ by creating an interface of virtual functions and then having your platform specific classes inherit from it. But this is not completely free, and on some platforms the performance is extremely bad. You could end up defeating branch prediction and, in the worst cases, thrashing the cache or flushing the instruction pipeline. Dereferencing the pointer may not be the main concern on some platforms, but the compiler skipping optimizations and inlining can be. What we do in these situations is use platform #defines to include the platform specific header and then either typedef the platform class to the common name or inherit from it. Andre Weissflog from Radon Labs made a great post about this on his blog and how he switched to this method with the Nebula 3 engine. You are sacrificing the abstract class / virtual interface design in this way, but anyone with a PPC device (whether it be Mac, PS3, or 360) will tell you that there is definitely a cost with a missed branch or flush. I did not test the performance impact on an ARM device like the iPhone, simply because a small benchmark will show you nothing, and at the time we made this decision we didn't have a large codebase to test with.
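
A minimal sketch of the typedef variant follows. The macro names and renderer classes are hypothetical, and both classes are defined in one file here only to keep the example self-contained; in practice each would live in its own header that is #included under the platform check.

```cpp
// Hypothetical platform switch. A real build system would define exactly
// one of these; the example falls back to the desktop path when neither is set.
#if !defined(ENGINE_PLATFORM_IPHONE) && !defined(ENGINE_PLATFORM_DESKTOP)
#define ENGINE_PLATFORM_DESKTOP 1
#endif

// Platform specific classes share the same non-virtual interface.
class GLESRenderer {
public:
    GLESRenderer() : framesDrawn_(0) {}
    void DrawFrame() { ++framesDrawn_; }  // would issue GLES calls here
    int FramesDrawn() const { return framesDrawn_; }
private:
    int framesDrawn_;
};

class GLRenderer {
public:
    GLRenderer() : framesDrawn_(0) {}
    void DrawFrame() { ++framesDrawn_; }  // would issue desktop GL calls here
    int FramesDrawn() const { return framesDrawn_; }
private:
    int framesDrawn_;
};

// The rest of the engine only ever refers to Renderer. The concrete type
// is fixed at compile time, so calls are direct and can be inlined.
#if defined(ENGINE_PLATFORM_IPHONE)
typedef GLESRenderer Renderer;
#else
typedef GLRenderer Renderer;
#endif
```

The trade-off is that nothing enforces the shared interface: if the platform classes drift apart, the mismatch only shows up as a compile error on the affected platform.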

Memory Management
This is the one area that can actually differ from regular game console development (at least up until the 360/PS3, but even those systems have better memory guarantees). In most cases you would do all of the memory management in your engine by allocating various heaps for different uses and making sure to minimize fragmentation. So what makes the iPhone any different?

While consoles have dedicated amounts of memory that you can count on (note that with the 360/PS3 you have to account for background system usage and events such as the dashboard/XMB, which use memory), the iPhone will not always give you what you ask for on application launch. On top of that, when iPhone OS detects that it is running low on memory it sends a warning to all currently running applications (your game as well as background applications such as phone, mail, etc.), because otherwise an incoming call or SMS could fail. If you receive this notification you had better free up some resources as soon as possible, otherwise the iPhone will force quit your process. With the iPod Touch and iPad this is much less of a concern as there are no phone interruptions, although they will still give you low memory warnings.

This is still an area that we are constantly working on and optimizing. Currently we are using nedmalloc as an allocator due to the extremely low memory fragmentation (we gained a performance boost just by using this rather than the default allocator). On top of that we have various pools for application, level, and frame lifetime allocations as well as pools for specific uses such as particle effects.
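
To illustrate the idea of a frame lifetime pool, here is a minimal linear (bump) allocator sketch. It is an assumption about how such a pool could be written, not the engine's implementation, and the FrameAllocator name and its interface are hypothetical. Allocations simply advance an offset into a preallocated buffer, and the whole pool is released at once at the end of the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Linear allocator over a caller supplied buffer. Individual allocations
// are never freed; Reset() releases everything in one step.
class FrameAllocator {
public:
    FrameAllocator(void* buffer, std::size_t capacity)
        : base_(static_cast<std::uint8_t*>(buffer)), capacity_(capacity), offset_(0) {}

    // Returns aligned memory, or nullptr when the pool is exhausted.
    // The alignment must be a power of two.
    void* Allocate(std::size_t size, std::size_t alignment = 16)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        const std::uintptr_t current = reinterpret_cast<std::uintptr_t>(base_) + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - current);

        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding)
            return nullptr;

        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Call once per frame, after everything allocated this frame is dead.
    void Reset() { offset_ = 0; }

    std::size_t Used() const { return offset_; }

private:
    std::uint8_t* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Because nothing is freed individually, this kind of pool cannot fragment. The backing buffer itself would come from a general purpose allocator, and objects placed in the pool must not need their destructors run.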

I will write another post strictly on this issue in the future once I have more time to analyze the exact performance impact and issues that we are running into.

STL
When developing on the PC or Mac I love the STL because it is simple to use and gives you fully debugged and working containers and algorithms. Unfortunately, because our engine and games are built for multiple platforms, the majority of which are embedded devices, we do not use it. Why? Well, as I mentioned earlier, we have exception handling and RTTI disabled in the compiler settings. Also note that memory management isn't the most fun thing in the world to watch when using the STL on a constrained platform. The STL is a great library for generic cases on a variety of platforms, but for us custom containers were needed.

We did not want to start from scratch and wanted to use containers that were written, debugged, tested, and preferably used in shipped titles. So where could we find this magical library? Again I will point you to the blog of Andre Weissflog where you can download the Nebula 3 SDK which is MIT licensed and use/modify the containers how you see fit. Sure we could have built our own container library from scratch instead, which would have also meant testing and debugging, but time is not infinite and we chose to concentrate on the engine, graphics, and gameplay. This library gave us the functionality we needed with very little modification.
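
For readers who have not worked with containers designed for builds without exceptions, the sketch below shows the general flavor. It is not the Nebula 3 API; FixedArray and its methods are hypothetical names chosen for illustration. Storage is fixed at compile time, so there are no hidden heap allocations, and failure is reported through a return value rather than a throw.

```cpp
#include <cassert>
#include <cstddef>

// Fixed capacity array: no heap allocation and no exceptions.
// T must be default constructible and copy assignable.
template <typename T, std::size_t Capacity>
class FixedArray {
public:
    FixedArray() : size_(0) {}

    // Returns false instead of throwing when the array is full.
    bool PushBack(const T& value)
    {
        if (size_ == Capacity)
            return false;
        items_[size_++] = value;
        return true;
    }

    // Out of range access is a developer error, so it is asserted.
    T& operator[](std::size_t index)
    {
        assert(index < size_);
        return items_[index];
    }

    const T& operator[](std::size_t index) const
    {
        assert(index < size_);
        return items_[index];
    }

    std::size_t Size() const { return size_; }
    void Clear() { size_ = 0; }

private:
    T items_[Capacity];
    std::size_t size_;
};
```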

We are using more of a C style approach adding in the convenience of C++ for working with objects. Please don't confuse this with "we used C++ for object oriented programming" because we could have accomplished the exact same design in straight C99. Objects would be structures instead of classes and functions would have a this pointer as the initial argument.
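
The equivalence can be sketched with a small, hypothetical timer type (the names are illustrative only). The first version is written the way it would be in C, with the object pointer passed explicitly; the second is the same design expressed as a class inside a namespace.

```cpp
// C style: a plain struct plus free functions that take the object as
// their first argument. In C99 this would be "typedef struct { ... } CTimer;".
struct CTimer {
    float elapsed;
};

void CTimer_Init(CTimer* self) { self->elapsed = 0.0f; }
void CTimer_Update(CTimer* self, float deltaSeconds) { self->elapsed += deltaSeconds; }
float CTimer_Elapsed(const CTimer* self) { return self->elapsed; }

// "C with classes": identical data and logic, but the compiler passes
// the this pointer implicitly and the namespace replaces the name prefix.
namespace engine {

class Timer {
public:
    Timer() : elapsed_(0.0f) {}
    void Update(float deltaSeconds) { elapsed_ += deltaSeconds; }
    float Elapsed() const { return elapsed_; }
private:
    float elapsed_;
};

} // namespace engine
```

Neither version uses virtual functions, so both have the same memory layout (a single float) and the same call cost. The class form mainly buys constructors, access control, and less typing.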

We sacrificed some common C++ idioms, designs, and functionality in order to gain maximum performance on the iPhone and other embedded devices because our games push the hardware to its limit.

I barely scratched the surface with this introductory article and really only covered the very basic details. There are still many more topics to cover such as SIMD using VFP and NEON (you will need this for animation and skinning performance whether you like it or not), various game loop structures on the different devices, optimizing mesh data and packing vertex data with your asset pipeline, memory alignment, using compressed PVRTC textures, and many more. These topics require an in depth explanation and implementation details / samples so I have saved them for a series of future articles.

Also here are some great reference points for console engine development, graphics, and the iPhone:

Oolong Engine - An MIT licensed iPhone game engine owned and managed by Wolfgang Engel, co-founder and CTO of Confetti Special Effects and former lead graphics programmer at Rockstar San Diego.

Diary of a Graphics Programmer - The personal blog of Wolfgang Engel containing extremely good information on graphics programming and iPhone development.

The Brain Dump - The personal blog of Andre Weissflog, co-founder and technical director at Radon Labs. Home of the MIT licensed Nebula 3 engine and a source of very good information on cross platform and console game engine development.

Gamedev.Net OpenGL Forums - Not only do these forums contain years worth of great information on GL and GLES but it is a great community to get help with problems you are unable to solve.

iPhone 3G Rendering Pipeline Video - A video by the lead iPhone developer Renaldas Zioma at Unity Technologies giving an in depth view of the iPhone GPU and drivers.

Good Coding!
