This article looks at how to enable multi-region support in a Python application.
Say you just finished development of your Python-based web application using Django and pushed it to production. After some time, a few of your users report that the application is slow. You check the logs and server metrics for resource usage -- everything looks good. There are no spikes in CPU, memory, or disk usage. You spin up your local development environment and attempt to reproduce the issue, but everything works fine locally. What's going on?
You forgot something critical yet easy to overlook: The laws that govern our universe. Information can only travel so fast from one place to another.
As developers, our primary responsibility is to ensure that information systems function correctly. These systems move data through time and space, and it's our job to manage and coordinate that movement. But data transfer is bound by fundamental physical laws, so it's never instantaneous -- and that's where problems can arise, and where it's our role to solve them.
For example, when you opened this page in your browser, the information was conveyed over time and space -- and it took some time to load.
Keep in mind that our local development environment didn't help us in this situation because it lives on our local machine: point A and point B are right next to each other, so the time it takes to transmit the information is negligible. In production, though, our application lives on a server somewhere in the world, and the distance between our browser and that server -- and therefore the transmission time -- is much larger.
Which tools can help us measure the time it takes to transmit information from the server to the browser? And how can we use them to improve our application's performance? This is what we're going to discuss in the next section.
This article assumes that you have ruled out any issues with long-winded processes, inefficient database queries, and other potential causes of slow performance. In other words, you've determined that the lag experienced by your users could be latency caused by the distance that packets have to travel from the browser to the server and back. This can be caused by a number of factors, including physical distance, network congestion, and limitations in the infrastructure used to transmit the data.
Before jumping into a solution, you should measure the current transmission speed -- i.e., the round-trip request/response time -- and set a benchmark that you can measure against:
So, we need a tool that can give us the total time that it takes to send a request to the server and then receive the response back in the browser. This is the concept of ping. A ping measures the round-trip time for messages sent from the originating host to a destination computer that are echoed back to the source.
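As a rough sketch of the idea, here's how you might measure HTTP round-trip time from a Python client using only the standard library (the URL and attempt count are placeholders you'd adapt to your setup):

```python
import time
import urllib.request


def measure_rtt_ms(url, attempts=3):
    """Send a few requests and return the best (lowest) round-trip time in ms.

    Taking the minimum over several attempts filters out one-off network jitter.
    """
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()  # wait for the full response body
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)
```

In the browser, you can take the same measurement from the Network tab of the developer tools, or with `fetch` plus `performance.now()`.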
Fortunately, for each of the most popular Python web frameworks -- Django, Flask, and FastAPI -- it's very easy to set up a view or route handler to address this. Let's look at some examples...
If you'd like to see a full Django example in action, check out the web-diagnostic repo. Follow the instructions in the readme to clone it down and set up the project.
With the Django development server running, visit http://localhost:8000 in your browser of choice. Then, after clicking Diagnostic and the Ping button, you should see the ping time in the browser:
Take note of the time in milliseconds (ms). For me, it's between 5 and 11 ms. The transmission time is tiny because the client and server are right next to each other.
Now visit https://web-diagnostic.fly.dev and open Diagnostic and click on the Ping button. This server is located in the Miami region, so depending on your location, the ping time should be much higher.
If you have a VPN, you can see what the response time will look like for users across the world. Here's what it looks like from an IP in Spain:
As you can see, the response time got even higher, because the server is located in the US while the user is in Spain. So it took more time for the information to travel from the server to the user.
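That extra latency has a hard physical floor. Light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), so you can sketch a lower bound on round-trip time from distance alone. The distances below are rough estimates:

```python
# Back-of-the-envelope lower bound on round-trip latency.
# Assumes ~200,000 km/s signal speed in fiber, i.e. ~200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km):
    # A round trip covers the distance twice; real latency is higher still
    # because of routing detours, congestion, and server processing time.
    return 2 * distance_km / FIBER_KM_PER_MS


print(min_rtt_ms(7100))  # Miami to Madrid, roughly 7,100 km: at least ~71 ms
print(min_rtt_ms(50))    # a nearby region, ~50 km: well under 1 ms
```

No amount of server-side optimization can get you below that floor -- the only lever is shrinking the distance itself, which is exactly what multi-region deployment does.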
While there are multiple ways to enable multi-region support, we'll focus on the top cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. These providers offer a range of services that can help us implement a multi-region architecture for our web applications.
In general, the architecture for deploying a multi-region application involves setting up multiple instances of the application in different regions around the world. These instances can then be connected to a load balancer, which distributes incoming traffic to the nearest instance based on the user's location.
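To make the "nearest instance" idea concrete, here's a toy sketch of geo-routing: pick the region with the smallest great-circle distance to the user. The region names and coordinates are hypothetical, and in practice a managed load balancer (e.g., AWS Route 53 latency-based routing) does this for you -- often based on measured latency rather than raw distance:

```python
import math

# Hypothetical regions with approximate (latitude, longitude) coordinates
REGIONS = {
    "us-east": (39.0, -77.5),       # northern Virginia
    "eu-west": (53.3, -6.3),        # Dublin
    "ap-southeast": (1.35, 103.8),  # Singapore
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def nearest_region(user_latlon):
    # Route the user to the region with the smallest great-circle distance
    return min(REGIONS, key=lambda name: haversine_km(REGIONS[name], user_latlon))
```

For example, a user in Madrid (40.4, -3.7) would be routed to `eu-west`, while a user in New York would land in `us-east`.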
Here are some services that these providers offer to help implement this architecture:
AWS provides a range of services that can help you manage your infrastructure in multiple regions. While you can provision and manage your own multi-region EC2 cluster and load balancer, you also have the option to use services like Elastic Container Service (ECS) and Elastic Beanstalk to manage the infrastructure for you. These services provide a convenient and fully-managed solution for deploying and scaling applications in different regions.
When we deploy our application to the cloud, we sometimes ignore the region option and select the default one. By now, you should realize how important it is to put your application as close to your users as possible to optimize performance.
Many applications have users all over the world. As developers, it's our responsibility to ask, "Where is my next user likely to be?" What's more, the internet is growing fast, and we should find the right architecture to support the demand for fast global connections -- delivering information in a way that is pleasant for our users.