Sunday, September 23, 2012

Keeping the Lights on Syndrome

Does your IT organization suffer from "Keeping the Lights on Syndrome"? For those of you asking "what the heck is Keeping the Lights on Syndrome?", here's a quick definition of the problem. "Keeping the Lights on Syndrome" is a situation that more and more IT organizations find themselves in, where they spend 70% of their IT budget on "keeping the lights on" and only 30% on innovating with the business and modernizing their technology.

So, what's the right number? Should it be 60% - 40%? 50% - 50%? Well, for a lot of years the number has been closer to 50-50, and that's probably a good number for most IT organizations to strive toward. The next question I'm often asked is, how can I address this problem? What can I do to get my Infrastructure and Operations costs down? I'm already virtualizing my server infrastructure, and I'm looking at more virtualization, including virtualizing my storage and my networks. What more can I do to get my I&O expenses down even further?

The answer is that virtualization has been a great help in keeping the number down to just 70-30. Without virtualization, a lot of organizations might be staring at 80-20, or even 90-10. OK, so what's the next step, you ask? Please don't say "cloud"; I've heard that enough already in the last year! As a matter of fact, every manufacturer of infrastructure and infrastructure software has been telling me that all I have to do is buy their solution and I have a "cloud solution" in place.

I tend to agree that the term "cloud" is overused. So let's not use it here; let's instead look at some practical things the I&O organization can do to address the "Keeping the Lights on Syndrome". Longer term, yes, something like IT as a Service, whether it's implemented using a private (internal) cloud, a public cloud, or a combination of the two called hybrid cloud, doesn't matter. But that's a longer-term solution. So what can the I&O organization do in the shorter term to address the problem, and maybe lay the groundwork for the longer-term cloud solution as well?

What can I&O do beyond completing the current drive toward virtualizing almost the entire infrastructure? They can start "commoditizing" their infrastructure. What is "commodity" infrastructure? It's the idea that you buy your infrastructure, including network, server, and storage, as a single unit. Some people call this converged infrastructure, but whatever you call it, the idea is to buy your infrastructure as a single SKU which defines a single unit of capacity for your infrastructure.

How does this help with the "Keeping the Lights on Syndrome"? It removes a major cost from your I&O organization: the cost of developing the "right" solution for each and every application, and then building a custom infrastructure to support that "right" solution. Instead, you buy your infrastructure capacity in "chunks", and then carve those "chunks" into standard-sized pieces. That doesn't mean that those standard pieces must all be identical, but rather that there should be a limited number of standard sizes. For example: Small, Medium, Large, and X-Large.
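To make the idea concrete, here's a minimal sketch of what such a standard size catalog might look like. The tier names echo the Small/Medium/Large/X-Large example above, but the specific vCPU, memory, and storage figures are purely illustrative assumptions, not numbers from any particular vendor or customer.

```python
# A minimal sketch of a standard "T-shirt size" catalog for infrastructure chunks.
# The vCPU/memory/storage figures are illustrative assumptions only.

STANDARD_SIZES = {
    "small":   {"vcpus": 1, "memory_gb": 4,  "storage_gb": 50},
    "medium":  {"vcpus": 2, "memory_gb": 8,  "storage_gb": 100},
    "large":   {"vcpus": 4, "memory_gb": 16, "storage_gb": 250},
    "x-large": {"vcpus": 8, "memory_gb": 32, "storage_gb": 500},
}

def describe(size_name):
    """Print the specification of one standard size."""
    spec = STANDARD_SIZES[size_name]
    print(f"{size_name}: {spec['vcpus']} vCPU, "
          f"{spec['memory_gb']} GB RAM, {spec['storage_gb']} GB storage")

if __name__ == "__main__":
    for name in STANDARD_SIZES:
        describe(name)
```

The point of a catalog like this isn't the exact numbers; it's that there are only a handful of entries, so every request maps to something already engineered.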

How does this help your I&O organization address the "Keeping the Lights on Syndrome"? It does so by making your purchasing more efficient. It also reduces the amount of engineering you have to do, by eliminating most of the custom engineering and custom building that is still happening in the I&O organization even though you have virtualized much of your infrastructure.

So, how would this work, you ask? My application teams need to have their requirements met! My response is that it would work just like buying a car. If you have a family of, say, 5 people, and you like to go on family driving trips, do you go to the Ford dealership, tell them what you want in a car, and have them build you a custom car that exactly meets your needs? No, of course not. You go to the dealership, choose among the several different offerings they have, and buy the one that comes closest to meeting your needs. Sure, you can "customize" that car by picking the color, the size of the engine, maybe some custom wheels, etc. But all of this is based on a limited set of standard platforms that Ford builds; it's not custom from the ground up.

Right now, I would argue that, even with virtualization, most I&O organizations are still building custom cars from the ground up. What I'm suggesting is that instead the I&O organization should buy a "standard" platform, provide some standard-sized "environments" that the application teams can pick from, and then only "customize" application environments based on those standard platforms/environments. Let's say, for example, that you have a converged infrastructure where a single unit of converged infrastructure can handle any combination of 2,000 small VMs, 1,000 medium VMs, 500 large VMs, and 250 X-large VMs. When the application teams need new VMs, they simply request one of the standard sizes. No custom engineering required. If, however, none of the standard sizes fits the needs of the application team, then a custom-engineered VM is built for them. The key here is to keep the number of custom VMs to a minimum. Mostly this can be done through the chargeback process, by making any custom VM cost significantly more than an X-large VM.
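Here's a hedged sketch of that request/chargeback idea in code. The monthly prices and the 3x "custom premium" multiplier are assumptions made up for illustration, not actual rates; the only point is that a custom VM is billed well above the largest standard tier so that custom requests stay rare.

```python
# Illustrative chargeback sketch: standard sizes are cheap and pre-engineered,
# custom VMs are deliberately expensive. All prices are assumed, not real rates.

MONTHLY_CHARGE = {
    "small": 50,
    "medium": 100,
    "large": 200,
    "x-large": 400,
}
CUSTOM_PREMIUM = 3  # custom VMs billed at a multiple of the x-large rate

def request_vm(size="medium", custom_spec=None):
    """Return the monthly chargeback for a requested VM.

    Standard sizes need no engineering work; a custom spec triggers a
    deliberately steep charge to keep custom builds to a minimum.
    """
    if custom_spec is None:
        return MONTHLY_CHARGE[size]
    # Custom engineering required: bill well above the largest standard tier.
    return MONTHLY_CHARGE["x-large"] * CUSTOM_PREMIUM

print(request_vm("large"))                    # 200
print(request_vm(custom_spec={"vcpus": 12}))  # 1200
```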

What does this buy the I&O organization? First, it addresses the "Keeping the Lights on Syndrome" by reducing the cost of deploying new infrastructure. It also makes the I&O organization more agile, since it saves all of the time needed to engineer custom solutions. Finally, this approach lays the groundwork for automating the deployment of the infrastructure, otherwise known as Infrastructure as a Service (IaaS), and IaaS is one of the first layers on the way to ITaaS and "cloud".

So, what are the barriers to implementing a converged infrastructure solution for your I&O organization? There are a number of them, actually, and they are all organizational in nature. First, you need to pick a partner that can provide you with the right converged infrastructure for your I&O organization. This can be an issue because typically your purchasing organization already has agreements in place with your storage, server, and network vendors, so if you want to continue to use that technology you are going to have to get your purchasing people on board, and they are going to have to talk with your storage, server, and network suppliers about working with a partner that can pull all three together and provide them as a single SKU. Once you have that worked out, you need to get buy-in from your engineering and architecture organizations. The architecture organization is going to feel threatened by this move to converged infrastructure, since they will perceive it as a move to reduce their control over the infrastructure in the organization. However, the engineering organization is the one that will likely feel the most threatened by the move to converged infrastructure. They will very likely view it as a direct attack against them and will throw up every argument for why "this won't work" you can imagine. Finally, your storage, server, and network administration organizations will need to be revamped. Managing a converged infrastructure with three separate teams defeats the purpose: unless you reorganize so that a single organization supports and manages the converged infrastructure, much of the advantage of pulling storage, server, and network together physically can be lost.

Finally, let me say that several of our customers are at various stages of implementing converged infrastructure, IaaS, and cloud infrastructure, and how successful these initiatives are is directly related to the organization's ability to change and embrace the new technology. It is also directly related to the partner's ability to deliver on the organization's needs. Without a good partner who understands the needs and goals of the organization, these initiatives are doomed to failure.

Friday, August 17, 2012

ILM/HSM part 2, Return of ILM/HSM

Folks, sorry it's been so long since my last posting! Time flies when you're having fun, and I've been having a lot of fun over the last year. What have I been doing, you ask? Well, a lot, and among other things I've been trying to help our customers sort through a changing storage environment, and I've learned a few things in the process. What's all this change I'm referring to? Well, among other things, Flash/SSD has really started to take off, and that has a lot of implications for the storage team. So I have spent a lot of time helping our customers sort through the different options, and discovered some things in the process that I would like to share with you. But first, a quick review of what's up with storage and Flash/SSD. As I indicated above, Flash/SSD is really beginning to make in-roads into the data center. Flash/SSD comes in basically three different flavors.

First, Flash/SSD can be used in something that looks like a traditional storage array. There are a couple of different variations of this type of storage array. Some use SSD drives in place of traditional disk drives, and some use Flash memory directly. Typically, the arrays that use SSDs provide many of the same features as other traditional storage arrays, such as snapshots, replication, etc. Arrays based on Flash memory, on the other hand, typically provide better performance than arrays that use SSD drives, mainly because they avoid all of the overhead involved with the SCSI protocol. However, these arrays also often don't have all of the features we need in the data center, such as snaps and replication. In both cases, from a storage management perspective you would manage it much like any other storage array in your data center.

Second, there are the traditional storage arrays with Flash/SSD added to them. Again, these arrays come in basically two flavors. In both cases, however, an effort is made to only utilize the Flash/SSD for data which is currently "in use" or "hot", in an effort to keep the costs down. With the first flavor, SSD drives are used to hold "hot" blocks of data, with "cool" blocks of data being stored on traditional disk drives. This requires sophisticated software that monitors how "hot" the data is and moves it appropriately. With the second type of array, Flash is added to the controller and used to extend the cache. This has the advantage that the software is a simple extension of the existing controller software, and, as I mentioned above, the overhead of the SCSI protocol is avoided. The downside is that this only provides a performance boost for the read half of the equation.

Finally, there is the ability to add Flash memory to the servers that run your applications. Once again, there are two flavors here. The first, and simplest, flavor is to utilize the Flash memory as an extended disk cache. The advantage is that it accelerates I/O to/from any disk arrays you may already own. The downside is that it is often limited in which OS's it works with. The second flavor makes the Flash memory appear to the OS on the server as a disk drive. This has the advantage of very high performance, but is limited in size. It is also limited in that you can't use features like server clustering, since this data can't be shared among a group of servers.

So what's the lesson learned from all of the above? I think that there are a couple.
One is that if we are going to utilize some or all of this technology in the data center, we are really looking at bringing back the old ILM/HSM days. For the Flash/SSD-only arrays, because of their cost, most data centers aren't going to bring them in to replace all of their traditional storage array capacity. So some way to move data from the expensive storage to the less expensive storage needs to be found if costs are going to be kept under control. With the second type of array, software to move the data is supplied, but there are questions about how effective this software is, particularly in keeping up with data whose "temperature" changes quickly. The third type of Flash/SSD certainly improves performance, but increases the "storage islands" in your data center unless some kind of ILM/HSM software can be applied. Where this leaves us is with many of the same issues that, ultimately, derailed ILM the last time around. The main issue at the time was the classification of the data. Getting the business to classify their data was very difficult, and in the end, we often threw up our hands and just moved data based on "last access date". While this works for file-based data, it doesn't work at all for database data, for example.
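For the curious, here's a minimal sketch of what that "last access date" policy amounts to in practice. The mount points and the 90-day threshold are hypothetical assumptions for illustration; real HSM products track access far more carefully, and, as noted above, this file-level approach doesn't help with database data at all.

```python
# Illustrative "last access date" demotion policy: files on the expensive
# Flash/SSD tier that haven't been read in N days are moved to a cheaper
# disk tier. Paths and threshold are assumptions, not a real product's config.

import os
import shutil
import time

FAST_TIER = "/mnt/flash"   # hypothetical expensive Flash/SSD mount
SLOW_TIER = "/mnt/sata"    # hypothetical cheaper disk mount
AGE_LIMIT_DAYS = 90

def demote_cold_files(fast_tier=FAST_TIER, slow_tier=SLOW_TIER,
                      age_limit_days=AGE_LIMIT_DAYS):
    """Move files not accessed within age_limit_days to the slower tier."""
    cutoff = time.time() - age_limit_days * 24 * 3600
    for dirpath, _dirnames, filenames in os.walk(fast_tier):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.getatime(src) < cutoff:
                dest_dir = dirpath.replace(fast_tier, slow_tier, 1)
                os.makedirs(dest_dir, exist_ok=True)
                shutil.move(src, os.path.join(dest_dir, name))

if __name__ == "__main__":
    demote_cold_files()
```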