Using Cloud with Oracle Commerce: Pluses and Minuses

With the rise of “Cloud Computing” over the last several years, more and more companies are looking to leverage virtual machines or cloud services in their IT infrastructure. Aside from the hype, cloud services and VMs can provide some real-world benefits that are hard to ignore. Lately, companies have started asking if VMs or “The Cloud” can be leveraged as part of their ATG/Oracle Commerce infrastructure.

“The Cloud” or Cloud Computing is a broad term that covers a wide range of technologies and models. Wikipedia defines Cloud Computing as:

“Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).”

That’s pretty vague, and covers everything from normal websites to specialized models of service delivery. When it comes to hosting an ATG/Oracle Commerce application we’re really talking about Infrastructure as a Service (IaaS), that is: providing Virtual Machines, storage, load balancers, and networking.

Throughout this document I will be using the terms Cloud, VMs, The Cloud, Cloud Infrastructure, etc… to refer to IaaS and VMs delivered as part of an IaaS service.

Types of The Cloud

Most cloud providers use a shared cloud. This means that multiple clients’ VMs and resources are provided from shared physical cloud infrastructure(s).

A private cloud is a cloud infrastructure dedicated to a single client or organization. All the hardware and VMs are built and managed for the single client. Some hosting companies will provide private cloud options, but many private clouds are built and managed internally within the company using them.

Benefits of The Cloud

The benefits of a cloud based infrastructure usually fall into two basic categories: Stability and Scalability.

Stability

Large cloud service providers, such as Amazon, Rackspace, SoftLayer, Linode, and Slicehost, use a large physical infrastructure and an intelligent hypervisor layer, which protects reasonably well against hardware failure on a physical server. RAM, CPUs, Power Supplies, NICs, and Motherboards will all fail eventually, given enough time and enough servers. A well-built cloud infrastructure can provide protection against those types of hardware failure. However, you can also introduce other risks (shared service overload, hypervisor failure/bugs), so it’s not a silver bullet.

Scalability

The key with cloud infrastructures is Elastic Scalability, which is the ability to quickly add additional resources to your cluster by making new VMs (or depending on your situation, larger VMs) available within 15 minutes or less. On-demand elastic scalability can help you weather a huge surge in traffic, or just holiday season visitor increases, without needing to deploy massive infrastructure you won’t need 365 days a year.

Drawbacks of The Cloud

Cloud computing and VM based infrastructures have several drawbacks which are important to consider. While the relative importance of each of these is dependent on your application and requirements, it’s critical to keep the following points in mind.

Performance Overhead

Because VMs rely on a hypervisor and VM management software, you will see a negative performance impact when compared to running on the same hardware as dedicated servers. The exact performance impact depends on the VM software, configuration, and other factors; however, here are some basic guidelines:

CPU Impact: Expect to lose at least 5-15% of your CPU performance. Some cases can be much worse, with slowdowns of 41% or more; JVM server performance seems to be in the 10-15% range in most cases.
Network: Up to 95% slower than native (VMware is in the 0-5% penalty range, while Xen may be as bad as 95+%)
Memory: 0-10% slower than native
Disk: 30-60% slower than native

In my personal experience, Disk and Network are the two worst areas. CPU impact is also measurable and shouldn’t be ignored, especially for ATG/Oracle Commerce applications (more on that later), but Disk and Network slowness can easily bring a “lightweight” application to its knees. The more “shared” the Cloud infrastructure is, the worse these areas are likely to be.

Security and PCI Compliance

With Shared Cloud infrastructures there can be security and PCI compliance concerns. There have been many hypervisor bugs which allow one client of the Cloud to access another client’s VMs, compromising security. Many Shared Cloud offerings are not PCI Level 1 certified. A few of the larger players have PCI compliant offerings now, but they are typically separate from their main Cloud service, have higher costs, and more limitations. This is less of an issue with Private Clouds, although they have their own drawbacks.

Resource Over-Utilization

With Cloud infrastructure, especially Shared Clouds, it’s possible to have the infrastructure over-provisioned. While good Cloud providers will probably avoid this, it can be hard to know for sure. You know how airlines overbook flights, assuming that some people won’t show? Some Cloud providers will oversell their hardware, selling more VM resources than the hardware can even provide at full capacity. This is part of the magic of the Cloud, both good and bad.

By selling 20 VMs on a 16 core box, you can make the price affordable, and most of the time, most of the VMs won’t be using anywhere close to 100% of their CPU/Network/RAM/Disk, etc. However, you don’t want to need all your CPU resources, or all of your network resources, and not have them available to you. Again, this is an area where network and disk are often the most oversold and under-delivered.
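To put a number on the 20-VMs-on-16-cores example, here is a minimal sketch of the arithmetic. It assumes each VM is sold with a single virtual CPU and that a fully busy host splits its cores evenly; both are simplifications of mine, and the function names are hypothetical.

```python
def oversubscription_ratio(vcpus_sold: int, physical_cores: int) -> float:
    """Virtual CPUs sold per physical core; anything above 1.0 is oversold."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return vcpus_sold / physical_cores


def worst_case_cores_per_vm(vm_count: int, physical_cores: int) -> float:
    """Cores each single-vCPU VM gets if every VM is busy at the same time.

    Assumes an even split and caps the share at one core per VM.
    """
    if vm_count <= 0 or physical_cores <= 0:
        raise ValueError("vm_count and physical_cores must be positive")
    return min(1.0, physical_cores / vm_count)


ratio = oversubscription_ratio(vcpus_sold=20, physical_cores=16)
share = worst_case_cores_per_vm(vm_count=20, physical_cores=16)
print(f"oversubscription ratio: {ratio:.2f}")   # 1.25
print(f"worst-case cores per VM: {share:.2f}")  # 0.80
```

In other words, if all 20 VMs get busy at once, each one can count on only about 80% of the core it is paying for.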

Research your Cloud provider to ensure they are under-provisioning, not over-provisioning.

Cost

In order to provide scaling, most Cloud infrastructures should be under-provisioned. This, combined with the performance overhead inherent in VMs, means that you will always be paying more for a given amount of performance than if you were buying dedicated hardware. The only way you save money is if you’re using less than a server’s worth of resources.

Licensing

Some products, including ATG/Oracle Commerce and many other Oracle products, are licensed based on cores/processors or sockets. While Oracle has introduced a new requests-based licensing model, the majority of ATG/Oracle Commerce customers are still on the old cores-based one. As such, there are typically significant constraints in the licensing terms around virtualization, VMs, and Cloud infrastructures.

For instance, for ATG/Oracle Commerce, unless you’re using Oracle VM, you have to be licensed for (or buy licensing for) all of the physical cores/processors that provide your VMs. If you use Oracle VM you only need to be licensed for your provisioned cores/processors. However, Oracle VM is rarely used by Cloud providers, and usually isn’t the first choice of IT departments building private clouds.

Where Cloud Hosting Shines

So after all the detail about Cons, after a very lightweight look at the Pros, you might think I’m against Cloud hosting. I’m not. The benefits are well known and easy to understand, so I’ve spent more time detailing some of the negative aspects, especially ones that are relevant for ATG/Oracle Commerce hosting.

Cloud Hosting is absolutely fantastic for two high level use cases:

1. When you need fewer resources than a full dedicated server.

In this case, you don’t need to have a full quad core server with 16 GB of RAM and 2 Gbit NICs for your application, much less multiple boxes for redundancy. So using a smaller, cheaper VM makes a lot of sense. It saves money and offers better redundancy than one dedicated server, and saves a lot more money than paying for two dedicated servers.

At Spark::red we utilize VMs in this way for things like IMAP, LDAP, DNS, and other services that aren’t resource intensive, but where uptime and the ability to spin up multiple instances around the world are very good things.

2. When you need to, and your architecture allows you to, scale your infrastructure resources up and down dramatically.

In the second situation, you have an application where the architecture, and licensing (or lack thereof), facilitates scaling your number of VM nodes up and down around demand (hourly, or seasonally, or whatever). Stateless and/or simple applications can do this very well, as can applications written to support this deployment topology concept. Having an application where you can add in caching front end servers, or stateless application servers, or database read-only nodes, or other similar pieces, can allow you to scale your capacity up and down based on need, minimizing costs when your traffic is lower.

Pinterest does this in a way that saves them significant hosting costs. Netflix is another great example of a very elastic and self managing scalable application. Many other web apps support similar scaling.

Why Cloud Isn’t a Good Fit for ATG/Oracle Commerce Hosting

As you can imagine, ATG/Oracle Commerce doesn’t fit into either of these two high level use cases.

ATG/Oracle Commerce environments almost always need a full server’s resources or multiple servers’ resources. Production obviously needs multiple servers’ worth of CPU, RAM, and I/O. And in production, where page response times have a direct impact on conversions and revenue, the performance overhead of a VM system has a significantly negative impact on your revenue. In Development or Staging environments, you will typically need 16-32 GB of RAM per environment, plus Database. For Development you want quick restart times (CPU heavy). In Stage you will probably want to be able to run load testing, so you need performance that maps well to production and isn’t artificially constrained by VM system disk I/O or sub-par virtual CPUs.

While the ATG/Oracle Commerce platform has some impressive architecture, it is not set up for easy cluster topology changes. The BCC/CA deployment configuration points to hostnames/IPs and ports for each agent, and it has to know what agents are live so it can ensure accurate deployments to all of them. Also if you add in new agents (app server instances), assuming you don’t want to do a full deploy (you don’t), you’ll need to pre-load the latest deployment data and VFS files, as well as the current snapshot ID, make sure the switching data sources are all lined up to match the current live config of the existing instances, and then add the new agent to the CA deployment topology. You will also need to add all new instances to your load balancer or Apache proxy configuration, and reload. There are also complications around instance names and ports in various JMS messages, cache invalidation events, lock managers, etc., that can add serious speed bumps to making frequent on the fly cluster topology changes.
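The order of operations matters when adding an agent, so it can help to write the steps down as an explicit checklist. The sketch below only paraphrases the steps described above as plain data; the step wording and the `next_step` helper are illustrative names of my own, not part of any Oracle Commerce API.

```python
from typing import Iterable, Optional

# Ordered checklist paraphrasing the steps described in the text.
# Illustrative only: nothing here calls into Oracle Commerce.
ADD_AGENT_STEPS = (
    "pre-load the latest deployment data and VFS files",
    "set the current snapshot ID",
    "align switching data sources with the live instances",
    "add the agent to the CA deployment topology",
    "add the instance to the load balancer or Apache proxy config and reload",
)


def next_step(completed: Iterable[str]) -> Optional[str]:
    """Return the first step that has not been completed, or None when done."""
    done = set(completed)
    for step in ADD_AGENT_STEPS:
        if step not in done:
            return step
    return None


# With only the pre-load finished, the snapshot ID is next.
print(next_step(ADD_AGENT_STEPS[:1]))
```

Even in this toy form, the list makes the point: every new instance means five manual, order-sensitive steps, before you get to the JMS, cache invalidation, and lock manager complications.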

In short, an ATG/Oracle Commerce cluster does not lend itself to automated or frequent scaling up or down.

Another important factor is the ATG/Oracle Commerce licensing. The licensing is done based on CPU cores. Licenses are sold by “processor” which has a basic multiplier relationship with CPU cores based on your processor architecture. For Intel chips, it’s a 2x multiplier. That means if you buy 4 “processors” of ATG/Oracle Commerce, you are actually buying 8 Intel cores worth. You can run on 1 octo-core proc, or two quad-core procs. This has a few important implications regarding virtualization.
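The processor-to-core relationship is simple arithmetic. Here is a minimal sketch that uses the 2x Intel multiplier quoted above; the function names are hypothetical, and rounding up to a whole license for an odd core count is my assumption.

```python
import math

# The 2x multiplier the text gives for Intel chips: one "processor" license
# covers two cores.
CORES_PER_PROCESSOR_LICENSE_INTEL = 2


def licensed_cores(processor_licenses: int,
                   cores_per_license: int = CORES_PER_PROCESSOR_LICENSE_INTEL) -> int:
    """Number of CPU cores covered by a given number of "processor" licenses."""
    return processor_licenses * cores_per_license


def processor_licenses_needed(cores: int,
                              cores_per_license: int = CORES_PER_PROCESSOR_LICENSE_INTEL) -> int:
    """Licenses needed to cover a core count (assumption: round up to a whole license)."""
    return math.ceil(cores / cores_per_license)


print(licensed_cores(4))              # 8 cores: one octo-core or two quad-core procs
print(processor_licenses_needed(32))  # 16 "processors"
```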

First, Oracle actually has specific licensing rules around virtualization. One set of rules applies if you’re using the Oracle VM product. In this case you have to have licenses for all virtual cores you are using in the VM(s). This is pretty standard. However, if you’re using any other VM product, you have to have licenses for all of the physical cores on the infrastructure that the VM system is running on, regardless of how many are provisioned in the VM(s) you are running ATG/Oracle Commerce on. That means you have to pay for a great many expensive licenses that you aren’t actually using to serve customer traffic. Since the ATG/Oracle Commerce licenses and support contracts are likely to be one of the largest costs for running your website, it makes no sense to waste those licenses, and that money, on anything other than serving up your website pages as quickly as possible.
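The difference between the two rule sets is easiest to see with numbers. This sketch encodes the rules as described in this post, not Oracle’s official policy text. The 64-core host pool and 16 provisioned cores are made-up figures, and the function names are hypothetical.

```python
import math

CORES_PER_PROCESSOR_LICENSE = 2  # Intel multiplier, as described in the post


def cores_to_license(physical_cores: int, provisioned_cores: int,
                     uses_oracle_vm: bool) -> int:
    """Cores that must be licensed under the rules as this post describes them.

    Oracle VM: only the cores provisioned to the VM(s).
    Any other VM product: every physical core under the VM system.
    """
    if provisioned_cores > physical_cores:
        raise ValueError("cannot provision more cores than physically exist")
    return provisioned_cores if uses_oracle_vm else physical_cores


def processor_licenses(cores: int) -> int:
    """Convert a core count into "processor" licenses, rounding up."""
    return math.ceil(cores / CORES_PER_PROCESSOR_LICENSE)


# Hypothetical private cloud: 64 physical cores, 16 of them given to ATG VMs.
oracle_vm = processor_licenses(cores_to_license(64, 16, uses_oracle_vm=True))
other_vm = processor_licenses(cores_to_license(64, 16, uses_oracle_vm=False))
print(oracle_vm)  # 8 "processors"
print(other_vm)   # 32 "processors"
```

Under these made-up numbers, the non-Oracle hypervisor needs four times the licenses to serve exactly the same traffic.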

What if we ignore the two primary benefits of Cloud Hosting I’ve described above? Maybe you have other goals, or you have corporate standards and you don’t really care about the additional complexity required to manage ATG clusters in changing topologies. Well, even then it comes down to issues of price and performance.

Server Costs

Since ATG/Oracle Commerce instances typically require large amounts of RAM and CPU resources, in addition to I/O, the servers needed to build out a useful VM infrastructure will have to be very large, with huge amounts of RAM, massive CPU density, multiple 10 Gbit NICs, very fast disk arrays, and more. It is almost always significantly more expensive to purchase a smaller number of very large servers than a larger number of smaller servers. So for that same amount of computing power, or web site capacity, you will end up paying significantly more for your hardware if you go with a large server VM based solution, rather than using a larger number of smaller commodity servers.

License Costs

You can only ever run as many CPU cores under your ATG/Oracle Commerce application as you have purchased licenses for. That’s true for dedicated hardware, it’s true for Oracle VM based VM solutions, and you’re actually worse off if you’re using any other VM solution. Assuming that you’re running on a private cloud (for security, PCI, and performance reasons), that means you’re paying for and running at least as many CPU cores as your maximum ATG/Oracle Commerce licenses.

Let’s use an example here to illustrate. Assume you have 32 cores (16 “processors”) worth of ATG/Oracle Commerce licenses for your production environment. Let’s pretend for the moment that you don’t need more CPUs for the VM system and management, etc. (although you really do, so the costs are even higher), so you have four 8-core servers (or two 16-core servers, or whatever). At peak, you can be running ATG/Oracle Commerce on all of these servers’ resources, using all of your licenses. In quiet times, you could scale back to running half that, say 16 cores, leaving 16 cores available to scale up into.

However, you still pay for those licenses, and in a private cloud, you’re still paying for the hardware 365 days a year (plus the VM costs and management overhead). In this case there is no reason to not run at full capacity all year, as it will provide better performance to your end users, increasing revenue, with no additional operational costs at all. So you would never want to scale down below your maximum licensed core count, which completely defeats the purpose.

Now on a shared cloud, things can be a bit different, but the PCI and security issues can often make this a non-starter.
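The 32-core private cloud example can be sketched as a tiny cost model. The per-core prices below are placeholders I made up purely for illustration. The point is that the number of cores you actually run never enters the cost formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateCloudCosts:
    licensed_cores: int
    license_cost_per_core: float   # placeholder figure, not a real price
    hardware_cost_per_core: float  # placeholder figure, not a real price

    def yearly_fixed_cost(self, running_cores: int) -> float:
        """Yearly cost of licenses plus owned hardware.

        running_cores is validated but deliberately unused in the formula:
        in a private cloud you pay for every licensed core whether it is
        serving traffic or not.
        """
        if not 0 <= running_cores <= self.licensed_cores:
            raise ValueError("running_cores must be between 0 and licensed_cores")
        return self.licensed_cores * (
            self.license_cost_per_core + self.hardware_cost_per_core
        )

    def idle_fraction(self, running_cores: int) -> float:
        """Share of paid-for cores sitting idle at a given scale."""
        return 1 - running_cores / self.licensed_cores


costs = PrivateCloudCosts(licensed_cores=32,
                          license_cost_per_core=1000.0,
                          hardware_cost_per_core=500.0)
print(costs.yearly_fixed_cost(32) == costs.yearly_fixed_cost(16))  # True
print(costs.idle_fraction(16))                                     # 0.5
```

Scaling down to 16 cores saves nothing and leaves half of what you paid for idle, which is the argument in a nutshell.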

Performance

As I mentioned above, you will never get the same performance from a VM system as you would from using the same hardware directly. You will likely see a 10+% performance penalty. This reduces capacity and end user performance, which in turn directly impacts revenue. If you’re paying for the hardware and the ATG/Oracle Commerce licenses, you should always maximize the performance you are getting for your dollar. VMs do just the opposite.
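As a rough sketch of what a 10% penalty means for capacity, the snippet below converts an overhead fraction into effective cores, and into the cores you would need to match a bare-metal target. It assumes the overhead applies uniformly to all cores, which is a simplification, and the function names are hypothetical.

```python
import math


def effective_cores(physical_cores: int, overhead: float) -> float:
    """Bare-metal-equivalent cores left after a fractional VM overhead."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be in [0, 1)")
    return physical_cores * (1 - overhead)


def cores_needed_for(target_native_cores: int, overhead: float) -> int:
    """Virtualized cores needed to match a bare-metal core count."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be in [0, 1)")
    return math.ceil(target_native_cores / (1 - overhead))


# 32 licensed cores with a 10% penalty behave like about 28.8 bare-metal cores.
print(f"{effective_cores(32, 0.10):.1f}")
# Matching 32 bare-metal cores would take 36 virtualized ones.
print(cores_needed_for(32, 0.10))
```

Since you can’t run more cores than you are licensed for, those extra cores are not an option without buying more licenses; the penalty comes straight out of your capacity.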

Summary

While Cloud and VM based solutions have positive aspects that can benefit many applications and infrastructure components in your corporate architecture, ATG/Oracle Commerce is not a platform that is well suited to The Cloud. At this point in time, I highly recommend you use high performance, late model, dedicated servers for your ATG/Oracle Commerce environment.

What if you have large seasonal traffic spikes (holiday or season or event driven), and while you own all the licenses you need to support these peak load times, you don’t want to be paying for infrastructure you don’t need the other 10 or 11 months out of the year? Easy! Spark::red allows you to add servers on a monthly basis (as long as you have the ATG/Oracle Commerce licenses for them) as needed. We can scale up your cluster in a couple of days or less, and you’ll only pay for the additional hardware for the time you need it, month by month. We have several clients who do just that. It’s higher performance and more cost effective than VM scaling, and best of all, you don’t need to manage it or worry about it. Leave it to us!

About the Author:

Devon has worked with Oracle Commerce for nearly two decades, and is also considered an expert in JBoss open source platform development. Since building one of the first well known Seam-based sites, 10minutemail.com, he has been repeatedly called upon to review manuscripts for Seam development books. Devon’s Oracle Commerce history traces back to 1998, when he worked at ATG (later rebranded as Oracle Commerce) as a senior ATG architect for both Professional Services and Sales Engineering. With all those years under his belt focusing on Oracle Commerce technologies, including work done for AT&T Wireless/Cingular, People’s Choice Awards, Scotts, and Ulta Cosmetics, Devon has proven his expertise in creating and maintaining high-volume Oracle Commerce implementations.

3 Comments

  1. Nitin Shingne July 6, 2014 at 11:40 pm

    Thank you Devon for an excellent blog and Thanks to Jonathan for putting forward his point of view.

    An area that may help further discussion is the use of the Commerce platform through the API layer. I agree with Devon that more integration with the Commerce Engine helps, rather than using it just through the API layer. I would want to qualify this, however, by saying this is more true of sites that are primarily meant for online sales (macys.com, walmart.com etc). For CPG and Manufacturing companies, for whom Content Marketing is more important than the Online Sales capabilities, we are discovering that API based use of the e-Commerce platform is a better architectural option.

    Personalization in this day and age probably requires a different discussion altogether, and I would be interested in reading about it. It could cover the personalization services provided by the likes of Experian, the missing Social Profile component in the Customer database of most of the customers, etc.

    Thank you both once again for the nice discussion.

  2. devon June 24, 2014 at 10:43 am

    Jonathan,

    some good questions:) I’ll try to answer as best as I can here, but also happy to discuss further via email.

    I know a few people are looking at fronting Oracle Commerce with another technology, often something like Node.js or similar platforms, and using Oracle Commerce like a commerce API platform. You gain some things around stateless request handling, some ease of scaling, etc… However, Oracle Commerce was never designed as an API platform, and in many ways isn’t very optimized for that type of usage. You lose out on all the droplet functionality, content targeters, promotions, BCC content handling and previewing, and more. Plus you’re adding a whole other layer and technology stack into your system, which increases operational, development, QA, and maintenance complexity and costs. Personally I’m not sure the tradeoffs are worth it. I think the future direction of the Oracle Commerce product involves MORE integration with the UI (via Endeca Experience Manager, WCS, more OOTB responsive design support, etc…), not less.

    Regarding security, any multi tenant system is almost always going to be less secure than a well designed private system. AWS, and other cloud providers, were very late to get PCI level 1 certifications, and the upcoming PCI DSS 3 seems like it will be adding a lot of things around Cloud. Also, theoretical and practical exploits are always coming out, like this: https://www.sparkred.com/blog/cross-vm-attacks-targeting-aes-keys/ Or on Docker: http://stealth.openwall.net/xSports/shocker.c They come out much more frequently than anything that would allow access into a system like those that Spark::red manages, which are strictly firewalled off and all dedicated hardware on dedicated client VLANs.

    While I agree that Oracle Commerce does lag behind in some core technology areas, the previous licensing scheme was the big barrier that prevented the other technological issues from getting addressed. I’m not sold on stateless web apps, especially for complex eCommerce. Having in-memory, already instantiated session objects/data performs and scales much better than having to load/deserialize state data for every request, etc… But that’s a much larger discussion for another day. And yes, I do understand the new licensing:> I’m planning a blog post about it. While it has some potential issues, it’s a big step forward versus the previous model, and I’m very excited that Oracle made that progress.

    Coherence has a lot of potential, dependent on OOTB integrations in the Commerce product making good use of it, and it having a price point that allows most/all implementations to use it.

    The performance issues, combined with what can often be a price penalty (overall) mean that VM solutions for Oracle Commerce need to be very carefully planned to make sense.

  3. Jonathan June 16, 2014 at 1:07 pm

    Thanks for posting the review. Is it time for an update?

    Interested to hear what your thoughts are on putting a stateless asynchronous server side technology in front of Oracle Commerce. This is what I understand Walmart to have architected.

    We all know that the funnel is large at the start of a customer journey but tails off rapidly to checkout. The advantage here would be that the frontend serving (maybe api or microservice) side would suit an elastic cloud whilst the Commerce engine is relegated to serving just the checkout requiring less resources and less licenses. Good use of a hybrid situation.

    The disadvantages could be no personalisation or targeting, but there are other ways to achieve this.

    I don’t entirely share your views on security. If I’m looking at someone like AWS then I’d say that their environment is going to be more secure than doing it as a customer of a Commerce site, however diligent one is.

    Such a shame to see Oracle lagging behind the curve in terms of elastic Commerce due to its dated architecture and blocking servlet rather than non-blocking async code base. Oracle have eventually adapted their licensing policy to include metered usage. Let me know if anyone understands how that works exactly.

    The introduction of a Coherence data grid into the mix does help speed stuff up.

    I agree with your conclusions regarding VM taxation on performance. On commodity hardware I would say just the networking overhead alone is in the range 10-30% degradation. Perhaps lxc/Docker will help, but bare metal will always perform better.
