This post was actually written some time ago; alas, XE2 Update 1 didn’t change much.
I’ve been looking at the 3D side of FireMonkey, and by that I mean strictly the 3D side, not the UI components or the 2D. Here are some observations, most born from maintaining and developing 3D software in C++ and later with GLScene, and with an eye to eventually porting some of GLScene’s code to FireMonkey (after all, most of GLScene’s code is actually linear algebra, mesh manipulation, file format imports, etc. and not OpenGL-specific).
Note that everything below is fixable, most of it quite trivially, but Embarcadero has to lead by baking it into the framework, in the core classes & components. If they don’t and you implement it yourself, you’ll end up having to duplicate huge portions of the framework. Such a duplication will be a PITA when they realize they need it in the framework, and implement it… in a fashion that will likely be incompatible with yours.
Also note that implementing some of these will require either interface-breaking changes, which are still possible now IMHO as FMX is still young, or hacks/workarounds later on (that is what happened with GLScene, so they might as well avoid it if they can).
FireMonkey is officially pitched at “Business” 3D (as opposed to 3D games), which isn’t that far from what GLScene is used for: even if GLScene is used for gaming purposes, its bread and butter was as much business as it was gaming (cf. the galleries here and here).
Assuming we’re restricting the scope to real-time rendering engines, what differentiates a game engine from a business 3D engine? In terms of pure functionality and capability, there is little that is specific: Unreal Engine f.i. encompasses a broad range of visualization and UI applications. The most differentiating factor is what the engine processes:
- A 3D game engine typically sits at the end of an assets tool-chain, and handles “ready to use” meshes, textures, shaders and other assets. The tool-chain is supposed to pre-compile and prepare everything, so that the game engine only has to deal with the rendering and interactivity.
- A business 3D engine on the other hand sits at a higher level: it has to handle raw assets, which come out of simulations, data crunching, image libraries, etc. and do what’s needed to render them in a robust, quality fashion, while gracefully handling a variety of situations and corner cases.
In the case of FireMonkey, target platforms are mobile devices, iPad and business machine GPUs: all these are rather low-end hardware, in terms of capability, performance and available memory. In other words, FMX can’t rely on having a powerful GPU with plenty of super-fast video RAM, but rather has to deal with paltry integrated chipsets which share RAM with the CPU.
Scene Graph
Like GLScene, FMX is based on a scene graph, and more precisely a variant with roots and concepts from the ancient 3DStudio (DOS era). You can see cut-down versions of them in FMX, the Camera/Target approach and the Dummy being the most obvious. Similar concepts exist in other scene graphs, but typically with different terminology and slight differences (cf. Ogre, Blender, OSG…).
The FMX scene-graph was however simplified/crippled in several ways:
- the primary design-time orientation is absolute angles. That’s problematic because rotations aren’t commutative in 3D space, and there are such things as gimbal lock to cause trouble. While relative rotations remain practical beyond simple demos, absolute angles do not hold up in real-world use: you need a well-defined orientation, which means vectors (i.e. a matrix) or quaternions.
- the camera model has been over-simplified, leaving out such key aspects as field of view, depth of view and near plane bias. While the first two are key for obvious reasons, the near plane bias is just as important. Because of the maths behind the depth buffer, it is the single most important factor governing the numerical accuracy of the depth buffer (and thus minimizing the artifacts known as Z-fighting).
- the scene graph is rendered hierarchically (see below).
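To make the near plane point above concrete, here is a minimal sketch. It assumes a standard perspective projection with a 24-bit fixed-point depth buffer, and estimates the smallest distance difference the buffer can resolve at a given eye distance. The function name and the sample near/far values are mine, purely for illustration; nothing here is FMX API.

```python
def depth_resolution(z, near, far, bits=24):
    """Approximate smallest distance step the depth buffer can
    distinguish at eye distance z (same units as near/far).

    Assumes the usual perspective mapping to window depth:
        d(z) = far / (far - near) * (1 - near / z)
    """
    step = 1.0 / (2 ** bits - 1)                    # one depth-buffer increment
    slope = far * near / ((far - near) * z * z)     # d'(z), derivative of d(z)
    return step / slope


if __name__ == "__main__":
    far = 1000.0
    for near in (0.01, 0.1, 1.0):
        res = depth_resolution(100.0, near, far)
        print(f"near={near:<5} resolution at z=100: {res:.6f}")
```

With the far plane fixed, dividing the near plane distance by ten makes the resolution at a given distance roughly ten times coarser, which is why a camera model that hides the near plane takes away the main lever against Z-fighting.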
The absolute angles orientation existed in GLScene from very early on, and over the years, grew to become a major source of frustration for users, ending up with tutorials dedicated to explaining why they were frustrating, and why you should move away from them.
Unless you’re an airman or accustomed to working in a roll/pitch/turn environment, rotations in 3D just won’t always behave as you expect they should. It was kind of a letdown to see this mistake repeated so prominently in FMX.
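A quick way to convince yourself that the order of rotations matters: the sketch below (plain Python, with helper names of my own choosing) applies the same two 90° rotations to the same vector in both orders.

```python
import math


def rot_x(deg):
    """Rotation matrix around the X axis (right-handed, column vectors)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(deg):
    """Rotation matrix around the Y axis (right-handed, column vectors)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def mat_mul(a, b):
    """3x3 matrix product a * b (b is applied first to a column vector)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Transform vector v by matrix m, rounding away floating-point noise."""
    return tuple(round(sum(m[i][k] * v[k] for k in range(3)), 6)
                 for i in range(3))


v = (0.0, 0.0, 1.0)
a = apply(mat_mul(rot_y(90), rot_x(90)), v)  # X rotation first, then Y
b = apply(mat_mul(rot_x(90), rot_y(90)), v)  # Y rotation first, then X
print("X then Y:", a)
print("Y then X:", b)
```

The two results point along different axes, so a set of "absolute angles" only defines an orientation once the order of application is fixed, and even then it remains exposed to gimbal lock.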
The hierarchical scene graph rendering was actually GLScene’s original approach, it’s one that feels quite simple and natural, but it also grew over the years to be a factor holding back the library, and had to be worked/hacked around in different ways. Once again, it was a disappointment to see that FMX was based on GLScene’s old approach.
A better solution is to separate the rendering from the scene graph, this is useful and even required in various scenarios:
- required to render semi-transparent objects: all techniques for handling opacity require a rendering order that is not purely hierarchical
- facilitates handling of visibility culling, i.e. not rendering what isn’t visible, because it’s off-camera, beyond the field of view, occluded by opaque objects, etc.
- required for deferred shading and other multi-pass approaches (either for shading, shadow volumes, etc.)
As a consequence, FireMonkey can’t even render scene graphs containing semi-transparent objects correctly if you don’t manually order them… That’s a major letdown.
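For illustration, here is a minimal sketch of what decoupling the render order from the hierarchy can look like: the graph is flattened into a list, opaque objects go first, and semi-transparent ones follow sorted back to front. The `Node` class and `render_order` function are hypothetical, positions are assumed to be already in world space, and per-object sorting is only an approximation (it does not handle intersecting or very large objects).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    position: tuple              # world-space position (simplification)
    opacity: float = 1.0         # 1.0 = fully opaque
    children: list = field(default_factory=list)


def flatten(node):
    """Walk the scene graph depth-first, yielding every node."""
    yield node
    for child in node.children:
        yield from flatten(child)


def render_order(root, camera_pos):
    """Opaque nodes front to back, then transparent nodes back to front."""
    def dist2(n):
        return sum((p - c) ** 2 for p, c in zip(n.position, camera_pos))

    nodes = list(flatten(root))
    opaque = sorted((n for n in nodes if n.opacity >= 1.0), key=dist2)
    transparent = sorted((n for n in nodes if n.opacity < 1.0),
                         key=dist2, reverse=True)
    return opaque + transparent


root = Node("root", (0, 0, 0), children=[
    Node("glass", (0, 0, 1), opacity=0.5),
    Node("wall", (0, 0, 10), children=[
        Node("smoke", (0, 0, 5), opacity=0.3),
    ]),
])
names = [n.name for n in render_order(root, camera_pos=(0, 0, 0))]
print(names)
```

Note how "smoke" is a child of "wall" in the hierarchy yet ends up rendered between "wall" and "glass": a purely hierarchical traversal cannot produce that order without manual reordering of the graph.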
Materials and Textures
This is another major roadblock: FMX is using approaches similar to the original GLScene ones, which proved problematic and later had to be worked around.
The default material is very limited, and defined in the objects themselves. A better approach (like later GLScene) would be to have materials in a material library (which is to materials what an action list is to actions: it allows reuse and centralizes them). With Delphi XE2 property editors, this could have been done in a painless and convenient fashion.
The standard texture model is too limited as well: not only are textures not shared (they should live in a library), they also lack basic properties. Sharing textures is key when you don’t have a lot of video memory, or when that memory is slow and not dedicated (like on an iPad, or a business PC). You also need at the very least to be able to control texture wrapping/clamping and texture filtering (mipmap generation and trilinear filtering at the very least).
The lack of mipmapping and anisotropic filtering has implications on performance and rendering quality, and the lack of built-in texture compression support has performance implications. The COLLADA Viewer demo f.i. exhibits aliasing and pixel shimmering issues because of it.
Provision for 3D textures should also be made: those are almost useless in games, but useful in a business 3D engine (f.i. in medical visualization).
Shaders are for all practical purposes handled as if FMX was a pure game engine: you need pre-compiled shaders, and you don’t have a unified cross-platform high-level shader compiler (like Cg) or generator component (writing shaders by hand gets old very fast, as it quickly becomes a combinatorial problem).
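As a rough sketch of the material library idea (all class and field names below are hypothetical, and this is neither GLScene’s nor FMX’s actual API): objects hold only a material name, and the library owns the single shared definition.

```python
from dataclasses import dataclass


@dataclass
class Material:
    diffuse: tuple = (1.0, 1.0, 1.0, 1.0)   # RGBA
    texture_name: str = ""                   # resolved through a texture library


class MaterialLibrary:
    """Central, named store of materials shared by all scene objects."""

    def __init__(self):
        self._materials = {}

    def add(self, name, material):
        self._materials[name] = material
        return material

    def get(self, name):
        return self._materials[name]


@dataclass
class SceneObject:
    name: str
    material_name: str                       # reference by name, not a private copy

    def material(self, library):
        return library.get(self.material_name)


library = MaterialLibrary()
library.add("steel", Material(diffuse=(0.6, 0.6, 0.7, 1.0)))
beam = SceneObject("beam", "steel")
bolt = SceneObject("bolt", "steel")

# Editing the library entry once changes every object that references it.
library.get("steel").diffuse = (0.8, 0.2, 0.2, 1.0)
print(beam.material(library).diffuse, bolt.material(library).diffuse)
```

The point of the indirection is that reuse and centralized editing come for free, exactly as with an action list.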
All in all, materials aren’t well supported by FMX at design-time, you’re left with having to write your own code to manage material libraries. That just isn’t what you would expect from a general purpose or business 3D engine.
Once again, FMX goes for a limited approach similar to that of early GLScene versions: having a specific mesh object for each mesh format, and no standardized mesh format (well, there is an embryo of one, but it’s too limited, and the COLLADA viewer f.i. skirts it, as the 3DS demo before it did too). This is compounded by the scene-graph and materials limitations: the mesh object has to handle its own rendering and its own materials.
This is problematic because it means any form of advanced mesh-based algorithm has to be written against specific mesh objects. This impacts everything mesh-related, from imports/exports, manipulations, optimizations and animations (skeletal animations, morphs, etc.) to rendering (extracting silhouettes, BSP, bounding boxes, occlusion, etc.) to interaction (collision testing, etc.).
Cadencing and Animations
When doing animations, be it a simple following of a spline path or simulations, you need to refresh the display periodically, after having all the scene elements updated.
Typically that involves a so called “game timer”, which triggers at a fixed frequency (usually that of the display, or a fraction of it), along with a frame stepping/progression logic that can handle frame skipping (so that you don’t end up having the UI lag the user when the hardware can’t keep up). You also need a time reference, preferably global and not looping.
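A minimal sketch of such a progression logic, under my own assumptions (the `Cadencer` name is borrowed from GLScene’s terminology, but this code is not GLScene’s or FMX’s implementation): the scene is updated in fixed time steps, several steps can run between two rendered frames when the hardware falls behind, and the backlog is capped so the UI never stalls trying to catch up.

```python
class Cadencer:
    """Fixed-step progression with frame skipping (illustrative sketch)."""

    def __init__(self, step, max_steps_per_frame=5):
        self.step = step                      # fixed update interval, in seconds
        self.max_steps = max_steps_per_frame  # cap on catch-up work per frame
        self.accumulator = 0.0
        self.last_time = None

    def advance(self, now, update):
        """Call once per display refresh with a monotonic time in seconds.

        Runs update(step) as many times as needed and returns that count;
        the caller renders once afterwards.
        """
        if self.last_time is None:            # first call: just take a reference
            self.last_time = now
            return 0
        self.accumulator += now - self.last_time
        self.last_time = now

        steps = 0
        while self.accumulator >= self.step and steps < self.max_steps:
            update(self.step)
            self.accumulator -= self.step
            steps += 1
        if self.accumulator >= self.step:
            # Hardware can't keep up: drop the backlog rather than lag the UI.
            self.accumulator = 0.0
        return steps


if __name__ == "__main__":
    ticks = []
    cadencer = Cadencer(step=0.25, max_steps_per_frame=4)
    for now in (0.0, 0.5, 10.0, 10.25):
        ran = cadencer.advance(now, ticks.append)
        print(f"t={now}: {ran} update(s)")
```

In a real loop, `now` would come from a global, non-looping clock (in Python, `time.monotonic()` plays that role), which is exactly the time reference mentioned above.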
Well, FMX only has an embryo of such an architecture, and it is not pervasive. Also the cross-platform time reference (TPlatform.GetTick) returns a single precision floating-point value that can loop… double ouch. Might as well take the opportunity of a new framework to Do It Right.
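Beyond looping, a single precision time reference also loses resolution as it grows. The sketch below uses Python’s `struct` module to measure the gap between consecutive single precision values; treating the tick as a number of seconds is my assumption for the sake of the example, not something checked against the FMX source.

```python
import struct


def single_spacing(x):
    """Gap between x (rounded to single precision) and the next single above it.

    Valid for positive, finite x.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


if __name__ == "__main__":
    for label, seconds in (("1 minute", 60.0), ("1 hour", 3600.0),
                           ("1 day", 86400.0), ("1 week", 604800.0)):
        ms = single_spacing(seconds) * 1000.0
        print(f"after {label:<8}: smallest time step = {ms:.4f} ms")
```

Under that assumption, after a day of uptime the smallest representable time step is about 7.8 ms, i.e. roughly half a frame at 60 Hz, so frame deltas computed from such a reference become visibly quantized. A double precision or integer tick avoids the issue.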
In GLScene, cadencing wasn’t there initially, and adding it after the fact took time, especially to make it pervasive. It’s a pity FMX didn’t build it right into the framework.
All the above points are fixable, but they’re also fundamental missing aspects for doing 3D with FireMonkey, if you don’t want to replicate huge portions of the framework (cf. the COLLADA Viewer sample).
They mean that if you want to achieve anything beyond a few poorly textured objects, you’ll need to design and write a lot of custom code rather than rely on the framework… with obvious implications of obsolescence and compatibility issues whenever FMX finally gets the features as standard.