Intra-cloud latency is real. Here’s what to do about it
We’ve all heard about inter-cloud latency, the delay in sending and receiving information to and from public clouds. Think of a remote client leveraging the crowded open Internet, or applications communicating between cloud brands, such as AWS and Azure.
The assumption is that application-to-application or application-to-data storage communications are pretty much latency-free within the same cloud brand. After all, the components live on the same cloud platform, and there should be a huge pipe between anything hosted inside it. By that logic, intra-cloud solutions on public cloud platforms should not have latency issues.
This is generally true, but not always. I’m running into a lot of instances where intra-cloud latency is hurting cloud-native application performance for larger enterprises. Moreover, those facing the latency issues don’t have the knowledge or tools to deal with these performance hindrances.
By comparison, inter-cloud latency is relatively easy to address. You increase the size of the pipe between the two communicating endpoints, or you shorten the distance between them. Public cloud providers are happy to sell you dedicated circuits, and you can move workloads to physically closer regions so packets don’t have to travel as far.
Intra-cloud latency is not that simple to diagnose and solve. I’m finding a few core causes to look for, including:
- Poor application design leads to “chatty” communications between system components (applications and data). This overwhelms the internal paths and leads to poorer performance.
- Public clouds have internal failings, such as narrow pipes between data centers in specific regions. Here’s a fictitious example: Although there may be no problem with latency between Australia and China, for some reason Australia and New Zealand have a huge latency issue, even though they are geographically closer together.
- The cloud-based application components themselves have problems, such as databases incapable of transmitting larger result sets back to the requesting applications. This is typically due to a limitation in the database itself, such as a poorly tuned cache, and not a network limitation.
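To make the first cause concrete, here is a minimal Python sketch of why chatty designs hurt even on a fast internal network. Everything in it is an assumption for illustration: the SimulatedLink class, the 2 ms round-trip figure, and the batch size are made up, not measurements from any provider. It charges a fixed cost per round trip and compares fetching 500 items one at a time with fetching them in batches of 100.

```python
from dataclasses import dataclass


@dataclass
class SimulatedLink:
    """Hypothetical link that charges a fixed cost per round trip."""
    rtt_ms: float
    round_trips: int = 0

    def call(self, keys):
        # One request/response exchange, however many keys it carries.
        self.round_trips += 1
        return {k: f"value-{k}" for k in keys}

    @property
    def network_ms(self):
        # Time spent purely on round trips (payload transfer time is ignored).
        return self.round_trips * self.rtt_ms


def fetch_chatty(link, keys):
    # One round trip per key: N keys cost N round trips.
    result = {}
    for k in keys:
        result.update(link.call([k]))
    return result


def fetch_batched(link, keys, batch_size=100):
    # One round trip per batch: N keys cost ceil(N / batch_size) round trips.
    result = {}
    for i in range(0, len(keys), batch_size):
        result.update(link.call(keys[i:i + batch_size]))
    return result


keys = list(range(500))
chatty = SimulatedLink(rtt_ms=2.0)
batched = SimulatedLink(rtt_ms=2.0)

# Both strategies return the same data; only the round-trip count differs.
assert fetch_chatty(chatty, keys) == fetch_batched(batched, keys)
print(chatty.round_trips, chatty.network_ms)    # 500 1000.0
print(batched.round_trips, batched.network_ms)  # 5 10.0
```

The model ignores payload size and server time on purpose. It shows only that the round-trip count, not bandwidth, dominates when components are chatty, which is why a bigger pipe alone does not fix this class of problem.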
How can you avoid intra-cloud latency?
Really, the best protection is to test, test, and test some more. Make sure to test the configuration that you’ll actually run after deployment. It does you little good to test within a single region when the applications and data will be scattered worldwide in production. Although it’s more expensive, I urge you to build these tests into your devops processes and toolchains so they are automated and run continuously.
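One way such an automated check might look is sketched below in Python, using only the standard library. It times TCP handshakes between two components and compares the 95th percentile against a latency budget. The function names, the budget, and the choice of TCP connect time as the metric are assumptions for illustration; a real pipeline would more likely time the application’s own requests.

```python
import math
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_budget(samples, budget_ms, pct=95):
    """Return (passed, observed) for the chosen percentile against the budget."""
    observed = percentile(samples, pct)
    return observed <= budget_ms, observed


def check_link(host, port, budget_ms, runs=20):
    """Sample the link several times and judge it against the budget."""
    samples = [tcp_connect_ms(host, port) for _ in range(runs)]
    return within_budget(samples, budget_ms)
```

A pipeline step could call check_link with the production host, port, and budget for each pair of components, run it from the region where the calling component will actually live, and fail the build when the first returned value is False.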
This is a tough one: a problem most people don’t think about because they logically assume it doesn’t exist. I’ve seen firsthand that it does, and it’s one more thing we need to think about as cloud architects.
Copyright © 2020 IDG Communications, Inc.