Mar 7, 2025

VPC networking: A Deep Dive in AWS Resources & Best Practices to Adopt

The utopian idea of serverless functions making up all your cloud infrastructure is compelling.

No servers, no network.

However, this ideal is rarely achievable for several reasons. The main reasons are usually security and compliance. You need private communication between your cloud environment and your on-premises environment, and for compliance reasons your databases can't be publicly accessible - even if they are protected by strong authentication.

For these and other reasons, cloud networking is often a large part of your cloud infrastructure.

Cloud networking, or simply networking, is a large topic. In this blog post we will learn about the main networking components on Amazon Web Services (AWS), how you can manage these using HashiCorp Terraform, and some best practices around cloud networking on AWS.

For a broader look at AWS best practices, check out the previous articles in this series, which cover Identity and Access Management (IAM) and DNS.

Networking on AWS

The main component of network infrastructure on AWS is the Virtual Private Cloud (VPC). You can think of a VPC as the equivalent of the network you have in your home or office.

A VPC can be subdivided into smaller networks called subnets. Traffic to and from the resources in your subnets, whether it comes from other subnets, other VPCs, or the internet, is controlled through security groups. A security group has one or many security group rules, each allowing traffic of a given protocol on a given port range. Security group rules can only allow traffic; anything that is not explicitly allowed is denied.
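To give a feel for what this looks like in Terraform, here is a minimal sketch of a security group with one inbound and one outbound rule. The variable example_vpc_id, the resource names, and the choice of HTTPS on port 443 are assumptions made for illustration.

```hcl
# Hypothetical input: the ID of an existing VPC.
variable "example_vpc_id" {
  type = string
}

resource "aws_security_group" "web" {
  name        = "web"
  description = "Example security group for a web workload"
  vpc_id      = var.example_vpc_id
}

# Inbound: allow HTTPS from anywhere.
resource "aws_vpc_security_group_ingress_rule" "https" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# Outbound: allow all traffic ("-1" means all protocols).
resource "aws_vpc_security_group_egress_rule" "all" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
}
```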

In larger cloud infrastructures on AWS it is common to connect multiple VPCs together in different network architectures. One example is the hub-and-spoke network architecture where you have a single VPC (the hub) at the center that one or many other VPCs (the spokes) are connected to. Each VPC, the hub and each spoke, is managed independently and often by different teams.

There are a number of options for connecting multiple VPCs together, or sometimes just for connecting small parts of VPCs together. Among these options are:

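VPC peering: you connect two VPCs together by establishing a peering connection. Peering is not transitive, so each pair of VPCs that needs to communicate requires its own peering connection. This makes it a good solution when you only have a few VPCs to connect.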
AWS Transit Gateway: you connect VPCs and other networking solutions to a Transit Gateway resource that allows the connected pieces to communicate with each other. This solution is ideal for a hub-and-spoke architecture.
AWS PrivateLink: you can expose a service in your VPC that other VPCs can connect to using VPC endpoints. This is ideal if you do not want to expose and connect your full VPC to other VPCs, but you want to allow others to access a given service you run in your VPC.
AWS Cloud WAN: this is a managed wide area network (WAN) service for building and operating a global network that connects your VPCs and on-premises locations from a central place.
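As an illustration of the simplest of these options, the following is a minimal sketch of a VPC peering connection in Terraform. The two variables are hypothetical inputs, and the sketch assumes both VPCs live in the same AWS account and region so that the connection can be accepted automatically.

```hcl
# Hypothetical inputs: the IDs of the two VPCs to connect.
variable "requester_vpc_id" {
  type = string
}

variable "accepter_vpc_id" {
  type = string
}

resource "aws_vpc_peering_connection" "example" {
  vpc_id      = var.requester_vpc_id
  peer_vpc_id = var.accepter_vpc_id

  # Automatic acceptance requires both VPCs to be in the same account and region.
  auto_accept = true

  tags = {
    Name = "peering-example"
  }
}
```

A peering connection on its own does not move any traffic. Each VPC also needs routes that send the other VPC's CIDR block to the peering connection, and the two CIDR blocks must not overlap.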

A VPC with many subnets is a good start for a cloud networking architecture. However, there are more details of a VPC we should be aware of.

Routing within the VPC is configured using route tables. A route table is simply a table of rules of the type "if you want to reach IP address range X, then the target is Y". To make this more concrete, consider the rule "if you want to reach 10.0.0.0/16, then the target is local because that is the address range of this VPC".

If you want the applications and resources located in the VPC to be able to reach the internet, you need to attach an internet gateway to the VPC. A subnet that has a direct route to an internet gateway (see the discussion of routes above) is called a public subnet. Conversely, a subnet with no direct route to an internet gateway is called a private subnet.

It is common that resources that you run in your private subnets still need to be able to reach the internet. There are a few ways to provide internet access to these resources. The most common way is to provision Network Address Translation (NAT) gateways in your public subnets. Then you add a route to the route table of each private subnet that says, "if you want to go to the internet at 0.0.0.0/0, first go to the NAT gateway in the public subnet". The traffic will then be directed from the NAT gateway to the internet via the internet gateway.

Using the components we have discussed so far (VPCs, subnets, security groups, route tables, internet gateways, NAT gateways) together with the options for how to connect VPCs together, we can set up complex cloud network architectures covering most needs.

There are many more components that fall under the networking category on AWS. Most of these are outside the scope of this blog post to avoid writing the longest blog post on the internet. A few of these are worth mentioning just for you to be aware of them:

Amazon CloudFront: this is a Content Delivery Network (CDN) service. With a CDN you can distribute content (often static files such as HTML pages, images, and videos) to edge locations around the world. An edge location is a smaller data center where the content is cached locally to allow faster delivery to consumers close to the edge location.
Load balancers: to expose our applications to users on the internet it is common to front them with a load balancer. The load balancer will distribute traffic among different instances of your applications. The two most common load balancing services on AWS are Application Load Balancer (ALB) and Network Load Balancer (NLB). The main difference is that the ALB works at layer 7 of the OSI network model, and the NLB at layer 4.
AWS WAF (Web Application Firewall): a service to protect your applications from common web security threats. AWS WAF is attached to other AWS resources, among them API gateways and ALBs.
AWS Network Firewall: a firewall service used to filter and inspect traffic in your VPC environments, and traffic coming in to and leaving your cloud environment. AWS Network Firewall is often used in a hub-and-spoke networking architecture to inspect traffic flowing between different spokes.

Managing AWS networking resources with Terraform

The AWS provider for Terraform offers full support for automating networking setup on AWS. The basic components of a VPC network on AWS are shown in the following figure.

This is a common VPC design with public and private subnets in each availability zone. This figure shows three availability zones, but some regions could have more than this. Each subnet has a route table, but the public subnets will share the same route table. The private subnets will each have their own route tables (we will learn the reason for this later on). A NAT gateway is provisioned in each public subnet. Note that with Terraform we will create a public and private subnet in each availability zone of the AWS region we are working with.

Creating the network architecture shown in the above figure using Terraform requires many resources – more resources than what is visible in the figure, in fact. In this section we will go through how to configure these resources.

We start by defining two variables that we will need. The first variable is for the AWS region we want to use and the other variable is for the VPC Classless Inter-Domain Routing (CIDR) block. A CIDR block is a block of IP addresses that the VPC consists of.

The variables are defined like this:

variable "aws_region" {
  type = string
}

variable "vpc_cidr_block" {
  type = string
}
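The walkthrough does not show a provider configuration, so here is a minimal sketch of one. The version constraint is an assumption rather than something stated in the article: the domain argument used on the aws_eip resource further down belongs to version 5 of the AWS provider, so the sketch pins that major version.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumed version constraint, adjust to what you actually use.
      version = "~> 5.0"
    }
  }
}

# Use the region from the aws_region variable defined above.
provider "aws" {
  region = var.aws_region
}
```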

The main resource that all other resources depend on is the VPC resource:

resource "aws_vpc" "default" {
  cidr_block = var.vpc_cidr_block
  tags = {
    Name = "vpc-${var.aws_region}"
  }
}

As is common with other resources on AWS, we can set a "Name" tag to give the resource a friendly name that is displayed in the AWS console. We will do this for most resources we create in this walkthrough.

Next we want to create the subnets. To get a list of all the availability zones of the AWS region we have selected we use the aws_availability_zones data source:

data "aws_availability_zones" "available" {
  state = "available"
}

Using the output from this data source we can create all the subnets, one public and one private for each availability zone. First we must transform the data coming from this data source a bit:

locals {
  azs    = data.aws_availability_zones.available.names
  az_map = zipmap(local.azs, range(length(local.azs)))
}

The main point here is that each availability zone will be assigned an index number (e.g. eu-west-1a = 0, eu-west-1b = 1, etc.). With the data in this form we create the subnets:

resource "aws_subnet" "public" {
  for_each = local.az_map
  vpc_id            = aws_vpc.default.id
  cidr_block        = cidrsubnet(var.vpc_cidr_block, 8, each.value * 2)
  availability_zone = each.key
  tags = {
    Name = "public-subnet-${each.key}"
  }
}

resource "aws_subnet" "private" {
  for_each = local.az_map
  vpc_id            = aws_vpc.default.id
  cidr_block        = cidrsubnet(var.vpc_cidr_block, 8, each.value * 2 + 1)
  availability_zone = each.key
  tags = {
    Name = "private-subnet-${each.key}"
  }
}

Each subnet requires a small slice of the VPC CIDR block. To compute this slice we have used the cidrsubnet function.
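To make the index arithmetic concrete, here is a small worked sketch using hard-coded values. The availability zone names and the 10.0.0.0/16 CIDR block are assumptions chosen for illustration; the example_ locals are not part of the walkthrough. The comments show what each expression evaluates to.

```hcl
locals {
  example_azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

  # zipmap pairs each availability zone with its index:
  # { "eu-west-1a" = 0, "eu-west-1b" = 1, "eu-west-1c" = 2 }
  example_az_map = zipmap(local.example_azs, range(length(local.example_azs)))

  # Public subnets take the even /24 slices:
  # eu-west-1a = "10.0.0.0/24", eu-west-1b = "10.0.2.0/24", eu-west-1c = "10.0.4.0/24"
  example_public_cidrs = {
    for az, i in local.example_az_map : az => cidrsubnet("10.0.0.0/16", 8, i * 2)
  }

  # Private subnets take the odd /24 slices:
  # eu-west-1a = "10.0.1.0/24", eu-west-1b = "10.0.3.0/24", eu-west-1c = "10.0.5.0/24"
  example_private_cidrs = {
    for az, i in local.example_az_map : az => cidrsubnet("10.0.0.0/16", 8, i * 2 + 1)
  }
}
```

The second argument to cidrsubnet is the number of bits added to the prefix length, so a /16 VPC yields /24 subnets here. With a smaller VPC CIDR block the same configuration produces smaller subnets, which is worth checking before you pick a value for vpc_cidr_block.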

Before we can set up route tables to connect to these new subnets we must set up the internet gateway and the NAT gateways. The internet gateway is simple:

resource "aws_internet_gateway" "default" {
  vpc_id = aws_vpc.default.id
  tags = {
    Name = "igw-${var.aws_region}"
  }
}

We only need to specify what VPC it should be part of, and give it a friendly name tag. Note that we only need a single internet gateway since it is attached to the VPC itself, not to individual subnets.

The NAT gateways are a bit more complex. Each NAT gateway requires an Elastic IP (EIP) resource. Since we are creating an EIP and a NAT gateway for each availability zone (technically for each public subnet) we need to use for_each again:

resource "aws_eip" "nat" {
  for_each = local.az_map
  domain   = "vpc"
  tags = {
    Name = "eip-natgw-${each.key}"
  }
}

resource "aws_nat_gateway" "default" {
  for_each      = local.az_map
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = aws_subnet.public[each.key].id
  tags = {
    # Include the availability zone so each NAT gateway gets a unique name
    Name = "natgw-${each.key}"
  }
  depends_on = [
    aws_internet_gateway.default,
  ]
}

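We have configured the NAT gateways to depend on the internet gateway resource. A NAT gateway relies on the internet gateway of its VPC to reach the internet, but none of its arguments reference the internet gateway, so Terraform cannot infer the ordering on its own. This is why we make the dependency explicit with depends_on.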

We are now ready to set up the last pieces, and these are the route tables. We start with the public route table. This route table should only route traffic inside of the VPC and out to the internet:

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.default.id
  # Local route for traffic inside the VPC (the target "local" is set via gateway_id)
  route {
    cidr_block = var.vpc_cidr_block
    gateway_id = "local"
  }
  # Everything else goes to the internet gateway
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.default.id
  }
  tags = {
    Name = "rt-public-${var.aws_region}"
  }
}

The first route, directing VPC traffic inside the VPC, could be left out. AWS will add it automatically. However, for completeness it makes sense to include it in your Terraform configuration.

The route table must be associated with the public subnets. This is done through the use of the aws_route_table_association resource type:

resource "aws_route_table_association" "public" {
  for_each       = local.az_map
  route_table_id = aws_route_table.public.id
  subnet_id      = aws_subnet.public[each.key].id
}

The private subnet route table is a bit different. We want traffic destined to the internet to be routed to the NAT gateway in the same availability zone. This is to avoid any cross-availability zone data transfer charges.

To achieve this we need one route table per availability zone (three in our example), so we use for_each again:

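resource "aws_route_table" "private" {
  for_each = local.az_map
  vpc_id   = aws_vpc.default.id
  # Local route for traffic inside the VPC (the target "local" is set via gateway_id)
  route {
    cidr_block = var.vpc_cidr_block
    gateway_id = "local"
  }
  # Internet-bound traffic goes to the NAT gateway in the same availability zone
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.default[each.key].id
  }
  tags = {
    # Include the availability zone so each route table gets a unique name
    Name = "rt-private-${each.key}"
  }
}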

resource "aws_route_table_association" "private" {
  for_each       = local.az_map
  route_table_id = aws_route_table.private[each.key].id
  subnet_id      = aws_subnet.private[each.key].id
}

As is clear from the above walkthrough there are a lot of resources required to create a functioning VPC. However, creating a Terraform module of the above code will simplify future VPC provisioning.
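As a sketch of what that could look like, assume the code above has been moved into a local module directory. The path ./modules/vpc and the input values below are hypothetical; the two input names match the variables defined at the start of the walkthrough.

```hcl
module "vpc" {
  # Hypothetical local path to a module containing the resources above.
  source = "./modules/vpc"

  # Example values, not taken from the article.
  aws_region     = "eu-west-1"
  vpc_cidr_block = "10.0.0.0/16"
}
```

In practice the module would also expose outputs, such as the VPC ID and the subnet IDs, so that other configurations can place resources in the network.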

As mentioned in the overview of networking on AWS, there are a lot more networking resources we can create using Terraform. What we have seen so far only scratches the surface of what is available.

Best practices for networking on AWS

There are a number of best practices to keep in mind for networking on AWS. In this section we will cover a few of the most important things to consider when working with networking resources on AWS.

Isolate workloads using VPCs and subnets

Some resources on AWS can function without a VPC resource. A great example of this is a Lambda function.

For security reasons it could be a good idea to run most or all of your workloads inside a VPC so that they are not directly accessible over the public internet. Use VPCs and subnets to isolate workloads based on your needs.

Plan your VPC design

The design of your VPCs and how they are connected is one of the most important decisions you make, and one of the most difficult to change afterwards.

You should choose a VPC design that fits your intended use case. A popular design pattern is the hub-and-spoke model. A central VPC (the hub) contains shared components. Applications and services are deployed to their own VPC (a spoke). Traffic between spokes is controlled and configured in the hub.
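One common way to build this pattern is with a transit gateway. The following is a minimal sketch, not a complete hub-and-spoke setup, that attaches the VPC from the walkthrough to a transit gateway as a spoke. It assumes the aws_vpc.default and aws_subnet.private resources defined earlier; the resource names hub and spoke are chosen for illustration.

```hcl
resource "aws_ec2_transit_gateway" "hub" {
  description = "Example hub for a hub-and-spoke network"
}

# Attach the walkthrough VPC as a spoke, using one private subnet per availability zone.
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.default.id
  subnet_ids         = [for subnet in aws_subnet.private : subnet.id]
}
```

The spoke's route tables would additionally need routes that send traffic for the other VPCs to the transit gateway.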

A common and costly mistake is to choose a VPC CIDR block that is too small for your network. However, another common mistake is to use a very large VPC CIDR block (commonly 10.0.0.0/16) and then run into problems when connecting it to a different VPC through peering or other means, because the two VPCs end up with overlapping CIDR blocks.

If you only have time to sit down and plan one thing thoroughly, let it be the design of your network architecture. Note that it makes sense to give this a thought even if you do not think you will connect your VPC with a different VPC anytime soon. One sensible approach is to use a small VPC CIDR block (but not too small) suitable for your application. This will simplify any potential connections to other VPCs in the future.
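One way to avoid overlapping CIDR blocks is to carve every VPC CIDR block out of a single address range that you plan up front. The sketch below illustrates this with the cidrsubnet function. The 10.0.0.0/8 range, the /20 size per VPC, and the VPC names are all assumptions for illustration.

```hcl
locals {
  # Hypothetical address range reserved for all VPCs in the organization.
  org_cidr = "10.0.0.0/8"

  # Hypothetical list of VPCs. Appending to the end keeps existing allocations stable.
  vpc_names = ["hub", "spoke-a", "spoke-b"]

  # Adding 12 bits to the /8 prefix gives each VPC its own /20 block:
  # hub = "10.0.0.0/20", spoke-a = "10.0.16.0/20", spoke-b = "10.0.32.0/20"
  vpc_cidrs = {
    for i, name in local.vpc_names : name => cidrsubnet(local.org_cidr, 12, i)
  }
}
```

Because every block comes from a different index in the same range, no two VPCs can overlap.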

Use security groups and NACLs

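A security group is a basic stateful firewall that controls incoming and outgoing traffic for individual resources. It contains only allow rules; traffic that no rule allows is denied. Stateful means that the response to allowed traffic is automatically allowed as well.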

A network access control list (NACL) is a stateless mechanism to allow or deny incoming and outgoing traffic at the subnet level. Stateless means that you need to explicitly allow both the traffic itself and the response to it.
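The following sketch shows what the stateless behavior means in practice. It assumes the aws_vpc.default and aws_subnet.public resources from the walkthrough, and the ports and rule numbers are chosen for illustration. Inbound HTTPS is allowed, and a separate outbound rule allows the responses, which go back to the client on an ephemeral port.

```hcl
resource "aws_network_acl" "public" {
  vpc_id     = aws_vpc.default.id
  subnet_ids = [for subnet in aws_subnet.public : subnet.id]

  # Inbound: allow HTTPS requests from anywhere.
  ingress {
    rule_no    = 100
    action     = "allow"
    protocol   = "tcp"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Outbound: allow the responses, which are sent to the client's ephemeral port.
  egress {
    rule_no    = 100
    action     = "allow"
    protocol   = "tcp"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }
}
```

Without the egress rule the requests would arrive but the responses would be dropped. A security group with only the inbound rule would not have this problem, since it tracks connections.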

Using security groups and NACLs allows you to set up sensible rules for what traffic is allowed to flow between your different network segments. However, at larger scales NACL and security group management can become difficult to get right. There will be hidden dependencies between the applications you run (that expose and use ports) and the infrastructure pieces that allow or deny traffic to and from these applications.

To get better insights into how changes to your security group rules and NACLs will affect your infrastructure, use a tool like Anyshift.

Connect your AWS environment to your on-premises environment

If you run a hybrid cloud solution where you have important resources on-premises that need to communicate with your cloud resources, then it makes sense to set up a connection between the two. There are a few options for how to do this (AWS Site-to-Site VPN, AWS Direct Connect, AWS Transit Gateway, etc.). Evaluate what method works best in your context and set it up.

Remember that your network architecture is only as strong as the weakest link, so don't let that weakest link be an insecure connection between AWS and your on-premises environment.

Use PrivateLink and VPC endpoints

To securely access AWS services inside your VPC, use VPC endpoints. This allows you to avoid crossing the public internet when you want to communicate with AWS services like CloudWatch or S3. This will likely save you some money on data transfer costs, but it will also make your environment more secure.
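As a minimal sketch, the following adds a gateway endpoint for S3 to the VPC from the walkthrough. It assumes the aws_vpc.default and aws_route_table.private resources and the aws_region variable defined earlier; the resource name s3 is chosen for illustration.

```hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.default.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"

  # Adds a route to S3 through the endpoint in every private route table.
  route_table_ids = [for rt in aws_route_table.private : rt.id]
}
```

Most other AWS services, CloudWatch among them, use interface endpoints instead, which are placed in subnets and protected by security groups.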

Use VPC flow logs to monitor network traffic

VPC flow logs give you insight into the network traffic taking place inside of your VPCs. You can use this data to troubleshoot connection issues or to discover traffic that should not be allowed. Treat monitoring of your network architecture as just as important as, or even more important than, monitoring of your user-facing applications.
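A minimal sketch of enabling flow logs for the VPC from the walkthrough is shown below. It assumes the aws_vpc.default resource defined earlier. The S3 destination and the bucket name prefix are assumptions made to keep the example short; CloudWatch Logs is another common destination.

```hcl
# Bucket that receives the flow log files.
resource "aws_s3_bucket" "flow_logs" {
  bucket_prefix = "vpc-flow-logs-"
}

# Log both accepted and rejected traffic for the whole VPC.
resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.default.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn
}
```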

Automate networking with infrastructure as code

As with your other cloud infrastructure you should automate the provisioning of your network architectures and resources using infrastructure as code. Terraform has great support for all AWS resources related to networking.

Infrastructure as code comes with a wealth of benefits that are just as important for your network architecture as for the rest of your cloud infrastructure. The most important one is that the code documents the current state of your cloud network.

Protect publicly accessible applications

Inevitably you will need to expose applications to the public internet. Use the available security measures to safely expose user-facing applications.

Use AWS WAF to protect your applications from web-based vulnerabilities like SQL injection and cross-site scripting. You can also set rate-limit rules to protect against DDoS attacks.

Use load balancers (ALB or NLB) to distribute load to different instances of your applications. You can combine the ALB with AWS WAF for strengthened protection. The ALB is a layer 7 load balancer and can take decisions based on the content of the incoming request. The NLB is a layer 4 load balancer. It is more performant but lacks some additional security features of the ALB (e.g. integration with AWS WAF).
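To illustrate the combination of an ALB and AWS WAF, here is a sketch of a web ACL with a single rate-based rule, associated with an existing ALB. The variable alb_arn is a hypothetical input, and the names and the limit of 1000 requests per IP address are assumptions for illustration.

```hcl
# Hypothetical input: the ARN of an existing Application Load Balancer.
variable "alb_arn" {
  type = string
}

resource "aws_wafv2_web_acl" "example" {
  name  = "rate-limit"
  scope = "REGIONAL" # REGIONAL is the scope used for ALBs

  # Requests that match no rule are allowed.
  default_action {
    allow {}
  }

  # Block source IP addresses that exceed the request limit.
  rule {
    name     = "rate-limit-per-ip"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 1000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "rate-limit-per-ip"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "rate-limit"
    sampled_requests_enabled   = true
  }
}

# Attach the web ACL to the load balancer.
resource "aws_wafv2_web_acl_association" "alb" {
  resource_arn = var.alb_arn
  web_acl_arn  = aws_wafv2_web_acl.example.arn
}
```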

There is also an option to use Amazon API Gateway to protect one or many user-facing applications. You can add authentication and authorization functions for apps protected by the API gateway.

Protect internal and external traffic using AWS Network Firewall

AWS Network Firewall is a relatively expensive service, but has a lot of features for protecting your workloads both internally within your VPC and from external traffic.

A common use-case is to place the AWS Network Firewall in a hub of a hub-and-spoke architecture, and use it to inspect internal traffic between your spokes. This allows you to set up rules for what traffic should be allowed or denied.

Even if you use an AWS Network Firewall, remember that all the other protection mechanisms are still important. That is, your subnets should still be protected by NACLs and your compute services should be protected by security groups.

Terraform and Anyshift for AWS VPC networking

Network architectures that span multiple VPCs, AWS accounts and AWS regions will naturally be split into multiple Terraform configurations and state files. In this situation it is imperative that you have a workflow for how you introduce changes into your environment.

In a sense, Anyshift creates a digital twin of your AWS environment and together with your Terraform state files and Terraform configurations has a complete AWS knowledge graph to base insights on.

In the context of AWS VPC networking this means Anyshift can know about hidden dependencies that exist between Terraform configurations. One example is a subnet CIDR block that is defined in one Terraform configuration and then used in a security group rule in a different Terraform configuration.

Terraform drift detection is available on HCP Terraform and Terraform Enterprise, but drift detection is an after-the-fact notification. This is why you need an SRE AI-copilot like Anyshift that can inform you before-the-fact that a change in your Terraform configuration could lead to potential issues in a different Terraform configuration.

Visit the documentation for more on Anyshift.

Key points

Your organization's cloud network infrastructure is often large and complex. It is the foundation of the rest of your cloud infrastructure.

In this blog post we learned about the fundamental pieces of networking on AWS: how to design and build a VPC using Terraform. VPCs can be connected together to segment your network architecture and still allow different parts to communicate. If you intend on working with AWS it is a good idea to understand the basic pieces that go into a VPC and how they are connected. This knowledge will help you troubleshoot network issues.

We also learned about best practices for networking on AWS, with a focus on network security around your VPC and the resources inside of it. Among these best practices are using security groups and NACLs to control what traffic can flow between segments of your network, and using AWS WAF, load balancers and AWS Network Firewall to protect publicly exposed resources and internal resources.

Articles by

Mattias Fjellström

Accelerate at Iver Sverige

Cloud Architect | Author | HashiCorp Ambassador | HashiCorp User Group Leader

Mattias is a cloud architect consultant working to help customers improve their cloud environments. He has extensive experience with both the AWS and Microsoft Azure platforms and holds professional-level certifications in both.

He is also a HashiCorp Ambassador and an author of a book covering the Terraform Authoring and Operations Professional certification.

Blog: https://mattias.engineer
Linkedin: https://www.linkedin.com/in/mattiasfjellstrom/
Bluesky: https://bsky.app/profile/mattias.engineer
