29 Mar 2019

ROAMING: FROM NETWORK TO NETWORK IN REAL-TIME


 

Authors
Xavier Bush

Jesper Lindström
Dr. Christian Dombrowski

 

In the first three blog posts of this series, we learnt that wireless technologies can, in fact, deliver the performance required by time-critical automation processes. However, we need to pause here and recall that one of the main purposes of making machines mobile is to make the shop floor flexible, and thus reconfigurable. Naturally, whenever there is a mobile station, a very important question arises: will this mobile station have the same quality of service across its entire working area? In other words, will it retain the low latency and high robustness it needs to perform its time-critical task?

The most popular techniques for ensuring that a mobile station maintains service and coverage regardless of its position are roaming and mesh networks. The goal of this blog post is to analyse both techniques and to explain why mesh networks are not optimal for time-critical applications, which currently makes roaming the best solution for mobile stations in industrial automation setups.

 

Mesh networks: a clever technique that overuses wireless resources

In recent years, the concept of mesh networks has become increasingly popular. Its flexibility and its use in Wi-Fi and WirelessHART networks account for both its popularity and its fit for numerous use cases. The concept is clever. As users, we have more than once found ourselves sitting in a room where the Wi-Fi signal from the router is weak, making our user experience painful. Using devices located between our device and the router to forward the information is therefore a smart solution.

With this technique, the coverage of a simple Wi-Fi network increases with the number of stations. If we assume an average range of 20 m per station, a network of 5 stations could reach 100 m of coverage when the stations are conveniently placed. As all stations can be used to forward packets, this solution is highly flexible, although some pre-configuration is needed.

 

A mesh network composed of AGVs, forklifts and a human (master)

 

However, mesh networks have certain flaws that make them unsuitable for time-critical automation processes. To begin with, the latency that a single transmission might experience is too large and too unpredictable for most automation applications. In the example with two hop stations between the transmitter (TX) and the receiver (RX), we would need to multiply the standard latency of a single transmission by 3. Since cycle times in current industrial applications are heavily optimised, this solution does not scale. Relaying a single packet via two hop stations also means having the packet on the air three times, which occupies the wireless channel three times as long and prevents other stations from transmitting in that time. Besides, in case of repeated failures, the mesh protocol tries to update its routes, and these routing table updates may cause connection losses. Consequently, latency would increase with the number of stations, making networks with a high device density infeasible in such use cases.
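The multiplication described above can be sketched in a few lines of Python. This is only a best-case model: it assumes every transmission takes the same time and succeeds at the first attempt, and it ignores retransmissions and route updates. The function name `mesh_cost` and the 2 ms single-transmission latency are hypothetical values chosen for illustration.

```python
def mesh_cost(single_tx_latency_ms: float, hop_stations: int) -> tuple[float, int]:
    """Best-case cost of relaying one packet through a mesh.

    Returns (end-to-end latency in ms, number of times the packet is on the air).
    Assumes equal, loss-free transmissions (an idealisation, not a measurement).
    """
    if hop_stations < 0:
        raise ValueError("hop_stations must be >= 0")
    # TX -> hop 1 -> ... -> hop n -> RX means n + 1 transmissions.
    transmissions = hop_stations + 1
    return single_tx_latency_ms * transmissions, transmissions


# Hypothetical single-transmission latency of 2 ms.
for hop_stations in range(4):
    latency_ms, on_air = mesh_cost(2.0, hop_stations)
    print(f"{hop_stations} hop station(s): {latency_ms} ms, {on_air} transmission(s) on the air")
```

Even in this optimistic model, both the latency and the channel occupation grow linearly with the number of hop stations.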

In addition, mesh networks cannot guarantee continuous service, especially when potential hop stations are mobile. Let’s consider a typical Industry 4.0 example: automated guided vehicles (AGVs). AGVs usually work on a large shop floor with many obstacles between them and the central control unit (CCU). A mesh network would therefore rely on having enough AGVs distributed all over the working area to guarantee connectivity among them and to the CCU. However, as the other AGVs are also mobile, they might all end up in an area where they have connectivity among themselves, but not to the CCU. In this scenario, the AGVs would enter the emergency state and stop working, forcing a human to go to the AGVs’ position and recover them manually. It goes without saying that this situation would not comply with time-critical requirements.
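The AGV scenario can be illustrated with a small reachability check. The sketch below is a simplification: it assumes a flat 2D shop floor, a fixed radio range and no obstacles, and all coordinates, names and the 20 m range are hypothetical.

```python
from collections import deque
from math import dist


def reachable_from_ccu(ccu, agvs, radio_range_m):
    """Return the names of AGVs that can reach the CCU directly or via other AGVs.

    ccu: (x, y) position of the central control unit.
    agvs: dict mapping AGV name -> (x, y) position.
    Two stations are assumed connected if they are within radio_range_m.
    """
    nodes = {"CCU": ccu, **agvs}
    visited = {"CCU"}
    queue = deque(["CCU"])
    # Breadth-first search over the "within radio range" graph.
    while queue:
        current = queue.popleft()
        for name, position in nodes.items():
            if name not in visited and dist(nodes[current], position) <= radio_range_m:
                visited.add(name)
                queue.append(name)
    return visited - {"CCU"}


# AGVs spread out along the floor: each one relays for the next.
spread = {"agv1": (15, 0), "agv2": (30, 0), "agv3": (45, 0)}
# AGVs clustered far away: connected to each other, but not to the CCU.
clustered = {"agv1": (40, 0), "agv2": (45, 5), "agv3": (50, 0)}

print(sorted(reachable_from_ccu((0, 0), spread, 20)))
print(sorted(reachable_from_ccu((0, 0), clustered, 20)))
```

In the clustered layout the result is empty: every AGV still has neighbours, yet none of them has a path to the CCU.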

 

All AGVs and forklifts out of range of the CCU (the human)

 

These two characteristics make mesh networks an unsuitable solution for time-critical automation applications. Therefore, we need to explore other possibilities.

 

Roaming: a known solution for automation applications

If we take roaming to mean moving about aimlessly, then a technology with roaming capabilities is one that allows its users or devices to move around without losing connectivity.

As consumers, we commonly refer to roaming as the possibility of getting mobile phone service when we travel beyond the borders of the country of our home service provider. Nevertheless, the technical concept of roaming, or handover, is also used at a much smaller scale. To continue with the mobile phone example, cellular networks use roaming every time a user moves from one base station to another. Each base station provides service to a specific area and, whenever a user leaves that area, the user's device needs to connect to a different base station so as not to lose connectivity. Naturally, a consumer does not want the connection to be interrupted, especially if a phone call is in progress during the handover. Therefore, the handover needs to complete within a time span short enough to be imperceptible to the consumer (typically around 50 ms with outliers). Consequently, the base stations need to be placed carefully to give users continuous connectivity.

Nowadays, cellular technologies are not yet used in production processes in factories. Although 5G, with its Ultra-Reliable Low-Latency Communications (URLLC), targets precisely this purpose, it is not yet implemented, and it looks as if we still need to wait some time for that to happen. Luckily, the roaming technique can also be applied in non-cellular technologies such as WLANs. To use this technique, the wireless technology needs to provide the two characteristics already mentioned: continuous service (full area coverage) and low-latency handover.

 

Backbone connection: Continuous Service

To allow for continuous service, WLANs offer a network structure that somewhat follows the concept of a cellular network: networks with a backbone connection. Reusing the AGV example, a backbone connection is nothing but a non-mobile station directly connected, usually via cable, to the CCU. When placed well, these backbone-connected stations cover the whole area of the shop floor, providing continuous service to all the mobile stations. Hence, no AGV would enter the emergency state because of a connectivity loss to the CCU.

The principle of the backbone is very similar to that of cellular networks: whenever an AGV that is connected to one backbone station finds another backbone access point that offers better connectivity, the AGV roams to that access point for better performance. But this roaming has to be done in a low-latency manner.
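The roaming rule described above, "switch when another backbone station offers better connectivity", could look roughly like the following sketch. It is not the algorithm of any particular product. It assumes connectivity is measured as received signal strength (RSSI, in dBm), and it adds a switching margin to avoid flipping back and forth between two similar stations; both the metric and the 3 dB margin are assumptions made for illustration.

```python
def choose_backbone_station(current: str, rssi_dbm: dict[str, float], margin_db: float = 3.0) -> str:
    """Pick the backbone station a mobile station should be connected to.

    current: name of the station currently in use.
    rssi_dbm: measured signal strength per visible station (higher is better).
    margin_db: hypothetical minimum improvement required before roaming.
    """
    if not rssi_dbm:
        return current  # nothing visible, nothing to roam to
    best = max(rssi_dbm, key=rssi_dbm.get)
    # A station that is no longer visible counts as infinitely weak.
    current_rssi = rssi_dbm.get(current, float("-inf"))
    if best != current and rssi_dbm[best] >= current_rssi + margin_db:
        return best
    return current


print(choose_backbone_station("A", {"A": -70.0, "B": -60.0}))  # B is clearly better
print(choose_backbone_station("A", {"A": -70.0, "B": -68.0}))  # improvement below the margin
```

The first call roams to station B, while the second stays on A because the improvement is smaller than the margin.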

 

Two EchoRings with backbone connectivity

 

Low-latency roaming

Once continuous service is guaranteed, the network needs to make sure that the handover completes within the deadline required by the application. In other words, the handover needs to be transparent to the application.

At this point, we should pause again for a moment and talk about “who” decides whether the handover needs to take place. There are two main approaches: the application-driven decision and the network-driven decision.

The application-driven decision allows for better control of the overall network management, as the application decides when to switch, to which network, and whether to switch at all. However, the latency introduced during the process makes it unsuitable for time-critical automation purposes. As we can see in the following figure, whenever the application of a device sends a request to switch networks, the information travels all the way from the Application Layer (Layer 7 of the OSI Model) to the Physical Layer (Layer 1 of the OSI Model) of the mobile device. The delay added by this journey includes processing and protocol translation times (each layer works with a different protocol).

The network-driven decision implies that all stations decide entirely on their own when to switch and to which network. To make this decision correctly, the network needs a continuous feed of environment data, which requires time and frequency resources. However, the upsides of this approach are substantial: it is totally transparent to the application, allowing the application to focus on other tasks. As the communication takes place between the MAC layer of the moving station and the MAC layer of the networks’ access points, the protocol translation time is negligible and no processing time is spent in the application.
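To make the difference between the two approaches concrete, the sketch below sums a per-layer delay over every OSI layer that a handover decision has to pass through. The per-layer delays are invented numbers, not measurements; only the structure of the comparison (Layer 7 down to Layer 1 versus staying at the MAC layer and below) follows the description above.

```python
# Hypothetical per-layer processing/translation delays in microseconds,
# keyed by OSI layer number (1 = Physical ... 7 = Application).
HYPOTHETICAL_DELAYS_US = {1: 10, 2: 20, 3: 50, 4: 50, 5: 30, 6: 30, 7: 200}


def decision_path_delay_us(per_layer_delay_us: dict[int, int], top_layer: int) -> int:
    """Sum the delays of all layers from top_layer down to Layer 1."""
    if not 1 <= top_layer <= 7:
        raise ValueError("top_layer must be an OSI layer number between 1 and 7")
    return sum(per_layer_delay_us[layer] for layer in range(1, top_layer + 1))


# Application-driven: the request starts at the Application Layer (Layer 7).
application_driven = decision_path_delay_us(HYPOTHETICAL_DELAYS_US, top_layer=7)
# Network-driven: the decision is taken at the MAC layer (part of Layer 2).
network_driven = decision_path_delay_us(HYPOTHETICAL_DELAYS_US, top_layer=2)

print(f"application-driven: {application_driven} us")
print(f"network-driven:     {network_driven} us")
```

Whatever the real per-layer figures are, the network-driven path can never be longer in this model, because it traverses a subset of the layers used by the application-driven path.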

 

Application-Driven vs Network-Driven Decisions

 

All in all, both approaches can be valid, and the choice strongly depends on the use case. What is crucial, however, is to guarantee that the handover takes place within the deadline of the application to avoid, for instance, a working device entering emergency mode.

 

EchoRing: roaming in real-time

Remember that EchoRing was specially designed to fulfil the needs of automation processes. Accordingly, EchoRing proposes a backbone topology to solve the continuous service issue mentioned above. However, EchoRing easily adapts to both ad-hoc and classical access network topologies. Using a backbone topology requires some network planning. For instance, it is essential to identify the areas that need better connectivity, so that the EchoRing stations connected to the backbone (called anchor stations) can be placed there. Once the number of networks and their positions are decided, the channels also need to be allocated carefully to each “ring”.

Regarding the handover alternatives, EchoRing supports both approaches, and it does so while meeting the deadlines of time-critical automation processes. This flexibility allows for a vast variety of use cases, regardless of their preferred approach.

Other mechanisms need to involve higher layers or trigger programs on the host that scan the environment, which delays the decision and can therefore lead to transmission errors during the handover phase. With this in mind, we conclude that EchoRing provides significant advantages for automation applications that require roaming capabilities.
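One possible way to approach the channel planning step is a greedy assignment in which rings with overlapping coverage never share a channel. The sketch below is a generic graph-colouring heuristic, not EchoRing's own planning method; the ring names, the neighbour relations and the channel numbers are hypothetical.

```python
def assign_channels(neighbours: dict[str, set[str]], channels: list[int]) -> dict[str, int]:
    """Greedily give each ring a channel that none of its neighbours uses.

    neighbours: ring name -> set of rings whose coverage overlaps with it.
    channels: available channel identifiers, in order of preference.
    """
    assignment: dict[str, int] = {}
    # Handle the most constrained rings (most neighbours) first.
    for ring in sorted(neighbours, key=lambda r: (-len(neighbours[r]), r)):
        used = {assignment[n] for n in neighbours[ring] if n in assignment}
        free = [c for c in channels if c not in used]
        if not free:
            raise ValueError(f"no interference-free channel left for {ring}")
        assignment[ring] = free[0]
    return assignment


# Three rings in a row: ring2 overlaps with both of its neighbours.
rings = {"ring1": {"ring2"}, "ring2": {"ring1", "ring3"}, "ring3": {"ring2"}}
plan = assign_channels(rings, channels=[36, 40])
print(plan)
```

In this layout two channels are enough, because ring1 and ring3 do not overlap and can reuse the same channel.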