January 30, 2023

This article looks at how to enable multi-region support in a Python application.

Problem

Say you just finished developing your Python-based web application with Django and pushed it to production. After some time, a few of your users report that the application is slow. You check the logs and server metrics for resource usage -- everything looks good. There are no spikes in CPU, memory, or disk usage. You spin up your local development environment and attempt to reproduce the issue, but everything works fine. What's going on?

You forgot something critical yet easy to overlook: The laws that govern our universe. Information can only travel so fast from one place to another.

As developers, our primary responsibility is to ensure that information systems function correctly. These systems move data through time and space, and it's our job to manage and coordinate that movement. But like any physical process, data transfer is governed by fundamental laws that keep it from being instantaneous. This is where problems can arise, and it's our role to solve them.

For example, when you opened this page in your browser, the information was conveyed over time and space -- and it took some time to load.

Keep in mind that our local development environment didn't surface this issue because both the client and the server live on our local machine: points A and B are so close together that transmission time is negligible. In production, however, the application lives on a server somewhere in the world, and the distance between the browser and the server -- and therefore the transmission time -- is much larger.

Which tools can help us measure the time it takes to transmit information from the server to the browser? And how can we use those measurements to improve our application's performance? That's what we'll discuss in the next section.

This article assumes that you have ruled out any issues with long-winded processes, inefficient database queries, and other potential causes of slow performance. In other words, you've determined that the lag experienced by your users could be latency caused by the distance that packets have to travel from the browser to the server and back. This can be caused by a number of factors, including physical distance, network congestion, and limitations in the infrastructure used to transmit the data.

Benchmark

Before jumping into a solution, you should measure the current transmission speed -- i.e., request + response time -- and set a benchmark that you can use to measure against:

Transmission Speed

So, we need a tool that can give us the total time that it takes to send a request to the server and then receive the response back in the browser. This is the concept of ping. A ping measures the round-trip time for messages sent from the originating host to a destination computer that are echoed back to the source.

Fortunately, for each of the most popular Python web frameworks -- Django, Flask, and FastAPI -- it's very easy to set up a view or route handler to address this. Let's look at some examples...

Django

```python
from django.http import JsonResponse


def ping(request):
    data = {'message': request.GET.get('ECHOMSG', '')}
    return JsonResponse(data)
```

Flask

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/ping')
def ping():
    data = {'message': request.args.get('ECHOMSG', '')}
    return jsonify(data)
```

FastAPI

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


@app.get('/ping')
def ping(request: Request):
    data = {'message': request.query_params.get('ECHOMSG', '')}
    return JSONResponse(content=data)
```

Client Example

You'll also need to set up a mechanism to calculate the request and response times. If your client is the browser, then you can set up a simple JavaScript function like so:

```javascript
function ping() {
  const startTime = new Date().getTime();  // start the timer
  const xhr = new XMLHttpRequest();        // create a new request
  xhr.open("GET", "/ping", true);          // call the ping endpoint
  xhr.onreadystatechange = function () {
    if (xhr.readyState == 4) {
      const endTime = new Date().getTime();  // stop the timer
      const time = endTime - startTime;      // round-trip time in ms
      console.log(time);
    }
  };
  xhr.send();
}
```
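If your client is a script rather than a browser, you can take the same measurement from Python using only the standard library. Here's a minimal sketch -- the `/ping` URL is assumed to point at one of the endpoints above, so adjust the host to match your deployment:

```python
import time
import urllib.request


def measure_ms(fn):
    """Time a single call to fn and return the elapsed milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000


def ping(url="http://localhost:8000/ping"):
    # Issue a GET to the ping endpoint and time the full round trip.
    return measure_ms(lambda: urllib.request.urlopen(url).read())
```

Calling `ping()` a handful of times and averaging the results gives a more stable number than a single sample.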

Full Example

If you'd like to see a full Django example in action, check out the web-diagnostic repo. Follow the instructions in the readme to clone it down and set up the project.

With the Django development server running, visit http://localhost:8000 in your browser of choice. Then, after clicking Diagnostic and the Ping button, you should see the ping time in the browser:

Local Ping

Take note of the time in milliseconds (ms). For me, it's between 5 and 11 ms. The transmission time is very small because the client and server are very close to each other.

Now visit https://web-diagnostic.fly.dev and open Diagnostic and click on the Ping button. This server is located in the Miami region, so depending on your location, the ping time should be much higher.

For example:

Time: 71 ms
Time: 76 ms
Time: 65 ms
Time: 67 ms
Time: 64 ms
Time: 75 ms
Time: 71 ms
Time: 70 ms
Time: 72 ms
Time: 75 ms

If you have a VPN, you can see what the response time will look like for users across the world. Here's what it looks like from an IP in Spain:

Time: 324 ms
Time: 320 ms
Time: 320 ms
Time: 330 ms
Time: 324 ms
Time: 319 ms
Time: 326 ms
Time: 321 ms
Time: 320 ms
Time: 324 ms

As you can see, the response times are much higher because the server is located in the US while the user is in Spain, so the information has much farther to travel.
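To put a single number on the difference, you can average the samples. A quick sketch using the measurements above:

```python
from statistics import mean

# Samples from the US and, via VPN, from Spain (values from the runs above)
miami = [71, 76, 65, 67, 64, 75, 71, 70, 72, 75]
spain = [324, 320, 320, 330, 324, 319, 326, 321, 320, 324]

print(mean(miami))  # 70.6
print(mean(spain))  # 322.8
```

Roughly a 4.5x difference in round-trip time, purely from distance.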

Multi-Region Support

While there are multiple ways to enable multi-region support, we'll focus on the top cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. These providers offer a range of services that can help us implement a multi-region architecture for our web applications.

In general, the architecture for deploying a multi-region application involves setting up multiple instances of the application in different regions around the world. These instances can then be connected to a load balancer, which distributes incoming traffic to the nearest instance based on the user's location.
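The routing decision itself can be sketched in a few lines. This toy example (the region names and latency values are made up) picks the region with the lowest measured latency for a given user, which is essentially what latency-based DNS routing and geo-aware load balancers do:

```python
def nearest_region(latencies_ms):
    # latencies_ms maps region name -> measured round-trip time in ms;
    # route the user to whichever region answers fastest.
    return min(latencies_ms, key=latencies_ms.get)


print(nearest_region({"us-east": 95, "eu-west": 30, "ap-south": 210}))  # eu-west
```

In practice the cloud providers measure these latencies for you; the services below implement this logic at the DNS or load-balancer level.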

Multi Region Architecture

Here are some services that these providers offer to help implement this architecture:

Amazon Web Services

  • Amazon EC2: Allows us to launch and manage virtual servers in multiple regions around the world.
  • Amazon Elastic Load Balancer (ELB): Helps distribute incoming traffic across multiple instances of the application in different regions.
  • Amazon Route 53: A Domain Name System (DNS) service that helps route users to the nearest instance based on their location.

AWS provides a range of services that can help you manage your infrastructure in multiple regions. While you can spin up and manage your own multi-region EC2 cluster and load balancer using the above services, you also have the option to use services like Elastic Container Service (ECS) and Elastic Beanstalk to manage the infrastructure for you. These services provide a convenient and fully-managed solution for deploying and scaling applications in different regions.

Google Cloud Platform

  • Google Compute Engine (GCE): Provides virtual machines in multiple regions around the world. If your application is Dockerized, instead of GCE you can use Google Kubernetes Engine (GKE), which allows you to run and manage Kubernetes clusters in multiple regions around the world.
  • Google Load Balancer: Distributes incoming traffic across multiple instances of the application in different regions.
  • Google Cloud DNS: A DNS service that helps route users to the nearest instance based on their location.

Microsoft Azure

  • Azure Virtual Machines: Allows us to launch and manage virtual servers in multiple regions around the world. If your application is Dockerized, instead of Virtual Machines you can use Azure Container Instances, which lets you quickly deploy and manage Docker containers in multiple regions around the world.
  • Azure Load Balancer: Helps distribute incoming traffic across multiple instances of the application in different regions.
  • Azure Traffic Manager: A DNS-based traffic management service that routes users to the nearest instance based on their location.

Conclusion

When we deploy our application to the cloud, we sometimes ignore the region option and accept the default. By now, you should realize how important it is to put your application as close to your users as possible to optimize performance.

Many applications have users all over the world. As developers, it's our responsibility to ask, "Where is my next user likely to be?" What's more, the internet is growing fast, and we should find the right architecture to support the demand for fast global connections and deliver information in a way that's pleasant for our users.