Defining a Complete Reference Architecture to Help Accelerate Full-Scale Adoption of Augmented Reality
A report compiled by respected analyst firm Technavio estimates that the worldwide AR market will grow by over $70 billion between now and 2024.
There is a plethora of potential use cases that AR is likely to enable in the years ahead. In an educational environment, AR could be used to help capture pupils’ imagination – allowing them to visualise things that are perhaps too complicated for teachers to explain verbally. It likewise offers many benefits in training, with the wearing of AR headsets permitting staff or apprentices to practise operating items of equipment, carrying out maintenance/repair work, or undertaking other tasks that could have serious costs or safety implications if they were being done for real at that stage. There is also a wealth of possibilities for AR when it comes to product design and architecture, enabling form to be better matched with function before any physical prototypes or mock-ups are started on. Studies have even shown AR to have great value in the treatment of severe psychological trauma. Elsewhere, opportunities are already being acted upon in areas like retail, interactive museum exhibitions, entertainment, gaming and healthcare, to name just a few.
The imminent roll-out of initial 5G mobile communication infrastructure will be a major catalyst in accelerating the progression of AR – providing the higher bandwidth and lower latency levels needed to deliver compelling, lag-free user experiences. The advances being witnessed in image sensing, accelerometer/gyroscope modules and wearable technology are also set to play a big part in AR’s future traction.
As the global AR business continues to develop at pace, with new innovations emerging on a regular basis, it is expected that the way in which deployments are embarked upon will change significantly. A large number of pilot projects and experiments using AR technology have demonstrated the benefits of AR for defined use cases, with measurable return on investment. Interoperability will be essential to go to the next level and deploy AR for mainstream applications. Interoperability is the ability of technology components from different providers to communicate and exchange data without translation, conversion or delays. It will limit vendor lock-in, which is often a barrier to deployment at scale. Until now it has generally been the case that complete systems were purchased from one single vendor. Moving forwards, with new players entering the sector and a wider array of products on offer, there will be increasing interest in sourcing different component parts from multiple vendors. By taking this approach, end-users will be able to take greater advantage of the numerous options available, and thereby construct systems that better meet their specific application criteria and can evolve more easily by swapping a component for a higher-performance or better-suited one as and when it becomes available. Based on such dynamics, there is a pressing need for interoperability throughout the AR ecosystem.
There is a growing consensus that a comprehensive functional reference architecture for AR will help its adoption. It should cover all the different hardware and software components that would be used in the construction of AR systems (including sensing, processing, rendering, etc.) so that full interoperability may be achieved (through the use of well-defined interfaces). The upshot of all this would be that AR-based systems and services become much easier to implement, with the possibility of porting different AR components (offered by multiple vendors) from one platform to another. New AR platforms will thus be able to co-exist alongside legacy ones, and the barriers that the use of different proprietary technologies currently poses will no longer be an issue.
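To make the idea of a well-defined interface more concrete, the following is a minimal sketch of how two trackers from different vendors could be swapped without touching the application code. It is illustrative only: the names Pose, PoseProvider, VendorATracker, VendorBTracker and place_label are hypothetical and are not taken from ETSI GS ARF 003 or from any real product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class Pose:
    """Device position (metres) and orientation (unit quaternion w, x, y, z)."""
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float, float]


class PoseProvider(Protocol):
    """Hypothetical shared interface: any component able to report a Pose."""

    def current_pose(self) -> Pose: ...


class VendorATracker:
    """Stand-in for one vendor's tracker (returns a fixed pose for the demo)."""

    def current_pose(self) -> Pose:
        return Pose((0.0, 1.6, 0.0), (1.0, 0.0, 0.0, 0.0))


class VendorBTracker:
    """Stand-in for a second vendor's tracker exposing the same interface."""

    def current_pose(self) -> Pose:
        return Pose((0.5, 1.6, -2.0), (1.0, 0.0, 0.0, 0.0))


def place_label(tracker: PoseProvider,
                offset: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Place a virtual label at a fixed offset from the device position.

    Orientation is ignored to keep the sketch short.
    """
    x, y, z = tracker.current_pose().position
    dx, dy, dz = offset
    return (x + dx, y + dy, z + dz)


if __name__ == "__main__":
    # The application code is identical whichever tracker is plugged in.
    for tracker in (VendorATracker(), VendorBTracker()):
        print(type(tracker).__name__, place_label(tracker, (0.0, 0.0, -1.0)))
```

Because place_label depends only on the shared interface, replacing one tracker with the other requires no change to the calling code, which is the kind of component swap described above.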
In response to this, an ETSI Industry Specification Group (ISG) was founded towards the end of 2017; as a first step, the group identified and defined the components and functionalities that such an architecture should encompass. This spring, it published the ETSI GS ARF 003 group specification – which introduces the characteristics of an AR system and describes the functional building blocks of a generic AR reference architecture and their mutual relationships.
Formulating a Common AR Architecture
In the next few paragraphs we outline each of the key functional building blocks included in the ETSI AR framework architecture, as featured in ETSI GS ARF 003. Each is described in turn, with details given of its purpose and how the blocks relate to one another.
To give a preliminary overview, before going slightly more in depth, the proposed ETSI framework breaks up AR systems into three key layers. Figure 1 gives a graphic representation of the high-level architecture of an AR system.
The three layers are:
• A hardware-based upper layer – Here tracking sensors are tasked with capturing data on the position and orientation of the AR device, so that virtual content relating to the user can be registered in real-time within a real-world setting. Though currently RGB cameras plus inertial measurement units (featuring accelerometers and gyroscopes) and GPS are generally relied on, more sophisticated approaches are now starting to be employed. It is expected that these will incorporate complementary sensing apparatus – such as dedicated vision sensors (depth sensors and event cameras), or exteroceptive sensors (IR/laser tracking, etc.). This sensory aspect has to be complemented by some form of processor element, usually either a graphics processing unit (GPU) or a vision processing unit (VPU). The data intensive nature of this work means that a considerable amount of resource is going to be required here. Then there are the interaction interfaces, which allow the user to interact with the virtual content, and the rendering interfaces, which are responsible for rendering the virtual content to the user.
• A software-based middle layer – This comprises a number of different parts. The ‘Vision Engine’ is responsible for mixing together virtual and real-world content. Based on input from the sensors, it constructs a 3D representation of the real-world scene that is to be augmented. It then localises the AR device in relation to the real world, and localises the objects present within the scene in relation to the position and orientation of the AR device. Alongside that there is the ‘3D Rendering Engine’, which maintains an up-to-date internal 3D representation of the virtual scene and renders it. This representation is continuously updated, using data on object movements and user interactions.
• Finally, there is the lower layer – which consists of all the relevant data related to the interactive virtual content that augments the real world and to the representation of the real world reconstructed by the AR system.
Figure 1: Global overview of the architecture of an AR system
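The three layers can be pictured as a simple per-frame data flow: the hardware layer supplies sensor data, the software layer localises the device and renders the virtual content, and the data layer holds both the virtual content and the reconstructed world. The sketch below is a deliberately simplified illustration under stated assumptions: every class and field name is hypothetical (the specification defines functions, not Python classes), the device position is assumed to arrive already estimated, and orientation is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


# Hardware layer: what the sensors deliver for one capture (illustrative fields).
@dataclass
class SensorFrame:
    timestamp: float
    device_position: Vec3  # assumed to be already estimated from camera/IMU/GPS
    observed_points: List[Vec3] = field(default_factory=list)


# Data layer: virtual content plus the reconstructed real world.
@dataclass
class SceneData:
    virtual_objects: Dict[str, Vec3] = field(default_factory=dict)
    world_points: List[Vec3] = field(default_factory=list)


# Software layer, part 1: toy stand-in for the Vision Engine.
class VisionEngine:
    def process(self, frame: SensorFrame, data: SceneData) -> Vec3:
        # Grow the world representation, then report where the device is.
        data.world_points.extend(frame.observed_points)
        return frame.device_position


# Software layer, part 2: toy stand-in for the 3D Rendering Engine.
class RenderingEngine:
    def render(self, device_position: Vec3, data: SceneData) -> Dict[str, Vec3]:
        # Express each virtual object relative to the device (rotation ignored).
        px, py, pz = device_position
        return {
            name: (x - px, y - py, z - pz)
            for name, (x, y, z) in data.virtual_objects.items()
        }


def run_frame(frame: SensorFrame, data: SceneData,
              vision: VisionEngine, renderer: RenderingEngine) -> Dict[str, Vec3]:
    """One pass through the layers: sensors, then vision, then rendering."""
    device_position = vision.process(frame, data)
    return renderer.render(device_position, data)


if __name__ == "__main__":
    scene = SceneData(virtual_objects={"arrow": (1.0, 2.0, 3.0)})
    capture = SensorFrame(0.0, (1.0, 0.0, 1.0), [(0.0, 0.0, 5.0)])
    print(run_frame(capture, scene, VisionEngine(), RenderingEngine()))
```

Keeping the scene data separate from the engines mirrors the layering in the text: either engine could be replaced while the stored content stays the same.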
There is a series of functions and sub-functions defined within the functional reference architecture as specified in ETSI GS ARF 003. These deal with data pertaining to the AR device or the objects present within the real-world scene. Here are some of the key ones to be aware of:
• Within the ‘World Capture’ function, which provides information about the real world and the AR device itself as captured by various sensors, are the following sub-functions:
    • The ‘Positioning’ sub-function is tasked with delivering the location of the AR device and potentially its orientation with respect to the coordinate reference system in place.
    • The ‘Movement and Orientation’ sub-function is focused on the movement and orientation of the AR device.
    • The ‘Visual’ sub-function is responsible for delivering any image produced by imaging equipment (e.g. RGB cameras, depth sensors, or event cameras).
• Relying on access to sensor data, the ‘World Analysis’ function is concerned with the AR system’s understanding of the real world. Its key sub-functions include the following:
    • ‘AR Device Relocalization’ and ‘AR Device Tracking’ are two key sub-functions that provide the position and orientation of the AR device in order to align/register virtual content with the real world. The relocalization sub-function is used when the AR system has no prior knowledge of where it is (at initialization or when tracking has failed), while the tracking sub-function benefits from the position and orientation of the AR device estimated from previous image captures.
    • The ‘3D Mapping’ sub-function is in charge of constructing a detailed 3D representation of the real world – derived from a set of 3D features acquired by sensor devices (using triangulation of the features that match in at least two frames).
    • The ‘Object Recognition and Identification’ sub-function enables the identity of different real-world objects to be derived. This can be done by accessing current imaging data or through access to prior knowledge that has already been collected on the object (and is stored for reference).
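The division of labour between the relocalization and tracking sub-functions described above can be sketched as a small piece of control logic: track when a previous pose is available, and fall back to relocalization at start-up or when tracking fails. This is an illustrative sketch only. The class PoseEstimator and the two callables it receives are hypothetical placeholders for real algorithms, and the pose is reduced to a position for brevity.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

Pose = Tuple[float, float, float]  # position only, to keep the sketch short


class PoseEstimator:
    """Chooses between tracking and relocalization for each image capture."""

    def __init__(self,
                 track: Callable[[object, Pose], Optional[Pose]],
                 relocalize: Callable[[object], Optional[Pose]]) -> None:
        self._track = track            # needs the previously estimated pose
        self._relocalize = relocalize  # needs no prior knowledge
        self._last_pose: Optional[Pose] = None

    def update(self, image: object) -> Tuple[str, Optional[Pose]]:
        pose: Optional[Pose] = None
        mode = "tracking"
        if self._last_pose is not None:
            # Normal case: refine the pose estimated at the previous capture.
            pose = self._track(image, self._last_pose)
        if pose is None:
            # Initialization, or tracking has failed: start from scratch.
            mode = "relocalization"
            pose = self._relocalize(image)
        self._last_pose = pose
        return mode, pose


if __name__ == "__main__":
    # Toy stand-ins: tracking fails on a blurred image, otherwise moves 1 m in x.
    def toy_track(image: object, previous: Pose) -> Optional[Pose]:
        if image == "blurred":
            return None
        return (previous[0] + 1.0, previous[1], previous[2])

    def toy_relocalize(image: object) -> Optional[Pose]:
        return (0.0, 0.0, 0.0)

    estimator = PoseEstimator(toy_track, toy_relocalize)
    for image in ("frame0", "frame1", "blurred", "frame3"):
        print(image, estimator.update(image))
```

Passing the two algorithms in as callables keeps the switching logic independent of how either sub-function is actually implemented.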
Other functions and sub-functions within the framework reference architecture relate to data storage, object segmentation, scene meshing, preparation of the assets needed for the AR experiences, gesture recognition, haptic feedback, audio effects, AR authoring, content optimisation, scene management, etc. These are described in the full specification.
By adhering to a consistent and cohesive architecture, future AR deployments (as well as the services emanating from them) will prove to be more effective. There will be fewer technical challenges to tackle when it comes to integrating components from different vendors. As a result, the whole AR ecosystem will be strengthened significantly. The recently published ETSI group specification represents the first step toward formulating the desired framework for the interoperability of AR components, systems and services. The aim is for this, and future iterations, to encourage the adoption of AR technology and make this a vibrant and inventive market segment for everyone to benefit from.