With cloud offerings becoming more abundant and diverse, cloud infrastructure seems to offer a much cheaper and simpler alternative to an on-premises data center. Many organizations that need artificial intelligence to help with decision-making, problem-solving, and similar tasks face a complicated decision: what is the best infrastructure deployment for AI workloads?
Generally speaking, there are three deployment options: you can run your AI on-premises in your own data center, rent space at a colocation facility, or use an infrastructure cloud service. For AI workloads that require extremely high-performance compute and high-speed networking to access huge amounts of data, the benefits of the cloud are not so obvious.
The infrastructure-as-a-service motto has been "you can now treat hardware as a commodity": quickly set up a server, turn it on only when you need it, and delete it as soon as you are done. Many data scientists and researchers see this as a benefit, since they want to run their workloads without having to worry about managing hardware.
Another challenge is sharing hardware resources among data scientists in the same lab. Cloud services usually offer a relatively simple interface for configuring and spinning up virtual machines. You do have to remember to shut them down when you are not using them; otherwise the idle VMs can run up a brutal bill. Implementing virtualization on-premises, by contrast, requires additional time and expertise.
Cloud infrastructure could be the best option if you are just getting your feet wet with AI workloads. If you are experimenting with deep learning, you want to set up a cloud environment quickly without making a huge up-front hardware investment. Pay-as-you-go has been a selling point of infrastructure cloud services. However, the implied cost-efficiency claim is not always true. While compute services let you pay only for the hours you actually use, storage cannot offer the same benefit: your data has to remain available around the clock, so you pay for it around the clock. And since AI models need to work with colossal amounts of data, storage is going to make up a sizable portion of your cost.
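A quick back-of-the-envelope sketch illustrates the point. All prices below are made-up, illustrative assumptions rather than any provider's actual rates, but the asymmetry they show is real: compute bills scale with hours used, while storage bills accrue whether you are training or not.

```python
# Rough monthly cost sketch: compute is pay-per-use, storage is
# billed continuously. Both rates below are illustrative assumptions.
GPU_HOURLY_RATE = 3.00             # assumed $/hour for one cloud GPU instance
STORAGE_RATE_PER_TB_MONTH = 25.00  # assumed $/TB-month for cloud storage

def monthly_cost(gpu_hours_used: float, dataset_tb: float) -> tuple[float, float]:
    """Return (compute_cost, storage_cost) for one month.

    Compute is billed only for hours actually used (pay-as-you-go);
    storage is billed for the full month regardless of activity.
    """
    compute = gpu_hours_used * GPU_HOURLY_RATE
    storage = dataset_tb * STORAGE_RATE_PER_TB_MONTH
    return compute, storage

# A team that trains ~40 hours a month against a 100 TB dataset:
compute, storage = monthly_cost(gpu_hours_used=40, dataset_tb=100)
print(f"compute: ${compute:,.2f}, storage: ${storage:,.2f}")
```

Under these assumed rates, the occasional training runs cost $120 while the always-on storage costs $2,500, so the dataset, not the GPUs, dominates the bill.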
In fact, data location is the deciding factor for many organizations when choosing a deployment model. If you have lots of historical or other data stored on-premises and want to use it to train your AI models, you might want to set up your AI infrastructure nearby. Security concerns also come into play here. Protecting sensitive data has always been critical, and it is becoming a serious legal responsibility as governments pass data protection regulations. In many cases, making sure your data never leaves your data center is still the most reliable way to safeguard it.
When you choose to buy your own hardware, you have the freedom to use any of the most powerful hardware currently on the market. The challenge, however, is that technology moves forward rapidly: hardware becomes obsolete within a few years, and updating it takes considerable effort. Cloud providers promise to rid you of this headache by constantly updating their infrastructure.
An important consideration for any business is cost. Colocation facilities will charge you rent, and when deploying on-premises you need to account for electric power, fire prevention, and cooling. Cloud services seem attractive because they require little up-front investment, which makes sense for short-term projects. In the long run, however, on-premises deployment can be more cost efficient. This is especially true for high-performance servers and high-utilization scenarios. Machine learning with large models solving complex problems needs high performance that only GPUs, high-speed networks, and distributed, performant storage can provide. With all this in mind, when it comes to AI infrastructure deployment, hardware really shouldn't be treated as a commodity.
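The utilization argument can also be sketched with simple arithmetic. The figures below (server price, lifetime, operating expenses, cloud rate) are illustrative assumptions, not quotes, but they show why the amortized cost of an owned server drops below the cloud rate once the machine is kept busy.

```python
# Break-even sketch: cloud GPU rental vs. buying a comparable server.
# All figures are illustrative assumptions, not real prices.
CLOUD_RATE = 3.00         # assumed $/GPU-hour in the cloud
SERVER_PRICE = 40_000.00  # assumed purchase price of a comparable server
LIFETIME_YEARS = 3        # assumed useful life before obsolescence
ANNUAL_OPEX = 5_000.00    # assumed power, cooling, and space per year

def on_prem_hourly_cost(utilization: float) -> float:
    """Effective $/hour on-premises at a given utilization in (0, 1].

    The purchase price plus operating expenses are amortized over the
    server's lifetime; the busier the machine, the cheaper each
    useful hour becomes.
    """
    total_cost = SERVER_PRICE + ANNUAL_OPEX * LIFETIME_YEARS
    used_hours = 24 * 365 * LIFETIME_YEARS * utilization
    return total_cost / used_hours

for u in (0.10, 0.50, 0.90):
    print(f"{u:.0%} utilization: ${on_prem_hourly_cost(u):.2f}/hour "
          f"vs cloud ${CLOUD_RATE:.2f}/hour")
```

At 10% utilization the owned server costs over $20 per useful hour and the cloud wins easily; at 90% utilization it drops to roughly $2.33 per hour, undercutting the assumed $3.00 cloud rate. This is the high-utilization scenario where on-premises deployment pays off.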