Autoscaling is the ability of a system to scale automatically, typically, in terms of computing resources. With an autoscaling system, resources are automatically added when needed and can scale to meet fluctuating user demands. The autoscaling process varies and is configurable to scale based on different metrics, such as memory or process time. Managed cloud services are typically associated with autoscaling functionality as there are more options and implementations available than most on-premise deployments.

Previously, infrastructure and applications were architected to consider peak system usage. This architecture meant that more resources were underutilized and inelastic to changing consumer demand. The inelasticity meant higher costs to the business and lost business from outages due to overdemand.

By leveraging the cloud, virtualizing, and containerizing applications and their dependencies, organizations can build applications that scale according to user demands. They can monitor application demand and automatically scale them, providing an optimal user experience. Take the increase in viewership Netflix experiences every Friday evening. Autoscaling out means dynamically adding more resources: for example, increasing the number of servers allowing for more video streaming and scaling back once consumption has normalized.

Last modified September 29, 2021: fixed internal links (ff51ee5)