With more extreme weather events and dangerous heat episodes, concerns about global warming and carbon emissions are on the rise. Corporate and cloud data centers are, unfortunately, part of the problem. Today, global data centers account for almost as much CO2 emissions as the entire airline industry and consume more electricity than the country of Iran.
What’s most concerning is the rate of growth. Fueled by growing internet traffic, AI and IoT, some worrisome models predict that information and communications technology may grow to 20% of global electricity consumption by 2030 from just 2% today. Getting to carbon-neutral data centers will be essential to combat global climate change.
Waste in the cloud adds to the problem
Gartner projects that spending on cloud IaaS services will grow 27.6% to reach $39.5 billion in 2019, and according to InfoWorld, as much as 35% of cloud spending will be wasted. The ease with which cloud services can be consumed is part of the problem.
Users are frequently tempted to use cloud resources even when capacity exists on-premises. It’s not uncommon for users to over-provision compute instances and storage or start cloud services and forget them.
Toward a carbon-neutral data center
An auto manufacturer might build a more fuel-efficient engine, but if a motorist leaves the vehicle idling all day, these hard-won efficiency gains are undermined — the same holds in corporate and cloud data centers. How data center resources are used and managed is a critical part of the problem.
Reducing data center carbon footprint is a multi-dimensional challenge. Increased use of renewable energy, more efficient designs, more power-efficient components, and better management are all parts of the puzzle. Fortunately, technologies available today ensure that HPC performance and environmental responsibility don’t have to be mutually exclusive.
For instance, some large data center operators are embracing wind and solar aggressively and even relocating cloud data centers to northern climates where they are more efficient to operate and cool. Summit, the world's fastest supercomputer, is also among the "greenest," delivering 14.7 GFlops/watt for a #2 ranking on the global Green 500 list.
Here are five steps we can take to use HPC resources more efficiently:
Workload and energy-aware scheduling
One of the best ways to improve efficiency is through more effective workload management. In addition to optimizing performance and resource usage, state-of-the-art workload managers are also energy aware. For example, IBM Spectrum LSF provides workload-driven power management policies. Servers can be powered down when not in use, and powered-up as required.
Users can auto-adjust CPU frequencies at the level of queues, applications, or jobs – reducing the frequency of unused cores to save power or placing cores in turbo mode for brief periods when required. The scheduler can also query NVIDIA's Data Center GPU Manager (DCGM) to maximize the utilization of costly and power-hungry GPUs.
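To make the idea concrete, here is a minimal sketch of a workload-driven power policy: power down idle servers beyond a small standby pool, and wake sleeping servers when pending work returns. The `Server` class, function name, and policy details are illustrative assumptions, not the actual IBM Spectrum LSF API, which implements far richer policies.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    powered_on: bool = True
    jobs: list = field(default_factory=list)  # jobs currently running here

def apply_power_policy(servers, pending_jobs, min_standby=1):
    """Hypothetical workload-driven power policy (not LSF's API).

    - With no pending work, power down idle servers beyond a small
      standby pool, so new jobs can still start quickly.
    - With pending work, wake one sleeping server per pending job.
    """
    idle = [s for s in servers if s.powered_on and not s.jobs]
    if pending_jobs:
        asleep = [s for s in servers if not s.powered_on]
        for server in asleep[:len(pending_jobs)]:
            server.powered_on = True
    else:
        for server in idle[min_standby:]:
            server.powered_on = False
    return servers
```

The standby pool is the key trade-off: too small and jobs wait on server boot times; too large and the energy savings evaporate.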
Put the brakes on data center expansion with efficient cloud bursting
For sites that have periodic or “bursty” workloads, cloud bursting can be a good way to increase efficiency and curb power use. During periods of peak demand, workloads can be automatically shifted to the cloud.
While cloud resources are generally more expensive, if capacity can be brought online quickly and in a fashion that is transparent to users, HPC managers can boost average server utilization, reduce cost, and improve productivity. The key to gaining efficiency is policy-based automation so that cloud bursting is used only when necessary and when local resources are fully subscribed.
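A bursting policy like the one described above can be sketched in a few lines. This placement function and its threshold are assumptions for illustration, not a real scheduler interface: the cloud is used only as overflow capacity once on-premises resources are effectively fully subscribed.

```python
def place_job(job_cores, free_local_cores, local_busy_fraction,
              burst_threshold=0.9):
    """Illustrative cloud-bursting placement decision.

    Run the job on-premises whenever it fits and the cluster is not yet
    fully subscribed; otherwise burst it to the cloud. The 0.9 threshold
    is a made-up example value a site would tune to its own workload.
    """
    fits_locally = job_cores <= free_local_cores
    cluster_saturated = local_busy_fraction >= burst_threshold
    if fits_locally and not cluster_saturated:
        return "on-prem"
    return "cloud"
```

In practice the decision would also weigh data locality and queue wait times, but even this simple gate keeps the more expensive cloud capacity from becoming the default.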
Avoid idle instances in the cloud
Much like idling your vehicle, idling cloud instances is wasteful – not only does it waste money, it ties up resources and consumes power. Most cloud providers offer monitoring facilities to track resource usage. These tools can be configured to detect idling instances, raise alerts, or shut down idling instances.
Workload managers can do the same, detecting idle hosts and scaling down the number of cloud instances during periods of low demand subject to policy.
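The detection logic itself is simple. Here is a hedged sketch, assuming utilization samples have already been pulled from a provider's monitoring service; the function name, thresholds, and data shape are all hypothetical.

```python
def find_idle_instances(metrics, cpu_threshold=5.0, min_samples=6):
    """Flag instances whose CPU utilization stayed below cpu_threshold
    percent for the last min_samples polling intervals.

    `metrics` maps instance id -> list of recent CPU % samples (most
    recent last). In a real deployment these samples would come from
    the cloud provider's monitoring API; this sketch just applies the
    "sustained low utilization" rule to them.
    """
    idle = []
    for instance_id, samples in metrics.items():
        recent = samples[-min_samples:]
        if len(recent) >= min_samples and all(s < cpu_threshold
                                              for s in recent):
            idle.append(instance_id)
    return idle
```

Flagged instances can then be alerted on, or handed to a policy that stops them automatically.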
Right-size cloud instance selection
One of the more insidious forms of waste is under-utilized cloud instances. HPC users are concerned about performance, and often request instances with more memory or cores than an application can use. The result is wasted resources and higher costs for no discernible performance gain.
These problems are difficult to diagnose with native cloud monitoring tools because they are not application-aware. Using tools such as IBM Spectrum LSF Data Explorer, users can monitor resource consumption accurately by application and adjust resource requirements and instance types to maximize efficiency. Features such as application profiles can be used to take decisions about resource requirements out of the hands of end-users, further reducing costs.
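Once per-application peak usage has been measured, right-sizing reduces to picking the cheapest instance type that covers that peak plus some headroom. The instance catalog, names, and prices below are made up for illustration; only the selection logic is the point.

```python
def right_size(peak_cores, peak_mem_gb, catalog, headroom=1.2):
    """Pick the cheapest catalog entry whose capacity covers observed
    peak usage plus a safety headroom (20% by default).

    `catalog` is a list of dicts with hypothetical instance names and
    prices, e.g. {"name": ..., "cores": ..., "mem_gb": ...,
    "hourly_usd": ...}. Returns the chosen name, or None if nothing fits.
    """
    need_cores = peak_cores * headroom
    need_mem = peak_mem_gb * headroom
    candidates = [t for t in catalog
                  if t["cores"] >= need_cores and t["mem_gb"] >= need_mem]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t["hourly_usd"])["name"]
```

Driving this from measured application profiles, rather than user guesses, is what removes the incentive to over-provision "just in case."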
Get smart about data movement
In hybrid cloud environments, data locality is a key challenge. Moving or staging data is costly in terms of time, resource usage, and cost. Also, persisting data in the cloud can be expensive – particularly in elastic file systems or block storage.
Smart workload management policies can play an important role here, as well. For example, IBM Spectrum LSF Data Manager can solve data locality problems by intelligently staging data in advance of workload execution and storing results asynchronously in lower-cost storage tiers.
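The shape of such a policy can be sketched as a simple transfer plan: stage inputs to the execution site before the job starts, and push results to a cheaper storage tier asynchronously after it finishes. The job structure, site names, and "cold-tier" target below are hypothetical, not the IBM Spectrum LSF Data Manager interface.

```python
def plan_transfers(job):
    """Build an illustrative transfer plan for a hybrid-cloud job.

    `job` is a hypothetical dict: {"exec_site": ..., "inputs": [paths],
    "outputs": [filenames]}. Inputs are staged ahead of execution so the
    job never waits on data; outputs drain to a lower-cost tier
    asynchronously so expensive cloud storage is not used to persist
    results.
    """
    pre = [{"src": path,
            "dst": f"{job['exec_site']}:/stage/{path.split('/')[-1]}",
            "when": "before_start"}
           for path in job["inputs"]]
    post = [{"src": f"{job['exec_site']}:/stage/{name}",
             "dst": f"cold-tier:/archive/{name}",
             "when": "after_finish_async"}
            for name in job["outputs"]]
    return pre + post
```

Decoupling the transfers from job execution is the efficiency win: compute instances never sit powered on, billing by the hour, just to wait for data.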
IBM has made power efficiency a key priority for HPC-class servers and cloud services. Summit is built on IBM Power System AC922 servers, which contribute to its high Green 500 ranking.
By combining power-efficient servers with sophisticated workload management tools, HPC users can easily take steps to improve efficiency both on-premises and in the cloud, reducing cost as well as the data center’s carbon footprint.