Cloud managed services      


Frank W. Goettert, Sandeep Gopisetty, Nathaniel (Nat) Mills

Cloud managed services - overview


Selected projects:


Resiliency

The virtualization and cross-system management capabilities of cloud computing offer a unique opportunity to build highly resilient, highly available systems. Resilience techniques can provide recovery measures such as replicating unresponsive services. Virtualization packages a workload in a portable virtual machine (VM) image that can easily be transferred from one server to another. High-availability features can migrate a VM seamlessly from one physical server to another within the same data center when the original server fails, loses performance, or must be taken down for scheduled maintenance. This portability of workloads makes it possible to build highly available systems, in which the customer's most important workloads are continuously available. These changes at the system, software, and application levels have the potential to move computing from highly available systems to continuously available systems.
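The following minimal Python sketch shows the placement decision behind such a migration. It assumes a hypothetical `Server` model that tracks only memory capacity and a health flag. It only plans where each VM on an unhealthy server should go: the largest VMs are placed first, each onto the healthy server with the most free memory. The live migration itself would be carried out by the hypervisor's own tooling, which is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity_gb: int                 # memory available for VMs (hypothetical model)
    healthy: bool = True             # False = failed, degraded, or due for maintenance
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> memory (GB)

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def plan_evacuation(servers: list[Server]) -> list[tuple[str, str, str]]:
    """Move every VM off unhealthy servers; return (vm, source, target) moves."""
    targets = [s for s in servers if s.healthy]
    moves = []
    for src in servers:
        if src.healthy:
            continue
        # Place the largest VMs first so they are least likely to be stranded.
        for vm, mem in sorted(src.vms.items(), key=lambda kv: kv[1], reverse=True):
            fits = [t for t in targets if t.free_gb() >= mem]
            if not fits:
                raise RuntimeError(f"no healthy server has room for {vm}")
            dst = max(fits, key=Server.free_gb)   # most free memory wins
            dst.vms[vm] = mem
            moves.append((vm, src.name, dst.name))
        src.vms.clear()
    return moves


servers = [
    Server("host-a", 64, healthy=False, vms={"db": 32, "web": 8}),
    Server("host-b", 64, vms={"app": 16}),
    Server("host-c", 64, vms={"cache": 40}),
]
print(plan_evacuation(servers))  # [('db', 'host-a', 'host-b'), ('web', 'host-a', 'host-c')]
```

Placing the largest VMs first is a common bin-packing heuristic. A production scheduler would also weigh CPU, storage, network locality, and anti-affinity rules.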

Cloud computing is radically changing the way disaster recovery of a data center can be implemented. Traditional disaster recovery solutions require a dedicated disaster recovery site. Data is typically backed up to storage tapes, which are then transported to the recovery data center. The recovery process is lengthy and can take hours or days before the workloads are available again. Cloud computing and virtualization make it more cost-effective to recover VM images and restart them on servers in a different data center. They also enable significantly faster recovery. The ability to restart VMs from one data center in another within minutes is redefining how disaster recovery can be implemented. Cloud computing also significantly lowers the cost of disaster recovery solutions, making them more affordable.
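One rough way to reason about recovery time after a site failure is to simulate restarting VM images at the recovery data center. The sketch below is an illustration built on assumptions and does not describe any IBM tool. The `restart_schedule` function, the priority scheme, the number of parallel restart slots, and the minute figures in the example are all hypothetical.

```python
import heapq


def restart_schedule(vms: list[tuple[str, int, int]], slots: int) -> dict[str, int]:
    """Estimate when each VM is running again at the recovery site.

    vms:   (name, priority, restart_minutes); a lower priority value restarts first.
    slots: how many VMs the recovery site can restart in parallel (assumed).
    """
    free_at = [0] * slots                 # minute at which each restart slot is free
    ready = {}
    for name, _priority, minutes in sorted(vms, key=lambda v: v[1]):
        start = heapq.heappop(free_at)    # earliest available slot
        ready[name] = start + minutes
        heapq.heappush(free_at, start + minutes)
    return ready


plan = restart_schedule(
    [("payroll", 2, 4), ("orders", 1, 3), ("reports", 3, 5), ("auth", 1, 2)], slots=2)
print(plan)  # {'orders': 3, 'auth': 2, 'payroll': 6, 'reports': 8}
print(max(plan.values()), "minutes until every workload is back")
```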

Foundational Cloud Capabilities

Procuring a virtual machine in an enterprise-class managed cloud requires much more than just creating a VM. A large number of mandatory IT service management tasks must also be performed. These include creating a change request for auditing, applying security patches, registering IT assets, verifying software licenses, and scheduling periodic backups and health-check scans. Because of the high labor cost and long turnaround time, it is infeasible in a cloud to perform these lengthy tasks manually, or to gather auditable evidence by hand showing that every step was done properly. We’ve developed an SA&D Engine that fully automates the whole process of VM configuration and security compliance validation. The solution is in production use for IBM SmartCloud Enterprise+.
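The core idea of automating the mandatory tasks while collecting auditable evidence can be sketched as an ordered pipeline in which each step returns evidence or fails. This is not the SA&D Engine's actual design, which the text does not describe. Every step function below is a hypothetical stand-in for a call into a real ticketing, patching, asset, licensing, backup, or scanning system.

```python
from datetime import datetime, timezone
from typing import Callable


# Hypothetical stand-ins for external systems; each returns evidence text or raises.
def open_change_request(vm: dict) -> str:
    return f"change request opened for {vm['name']}"


def apply_security_patches(vm: dict) -> str:
    vm["patched"] = True
    return "security patches applied"


def register_it_asset(vm: dict) -> str:
    return f"asset record created for {vm['name']}"


def verify_software_license(vm: dict) -> str:
    if not vm.get("license"):
        raise ValueError("no software license on record")
    return f"license {vm['license']} verified"


def schedule_backup(vm: dict) -> str:
    return "periodic backup scheduled"


def schedule_health_check(vm: dict) -> str:
    return "periodic health-check scan scheduled"


PIPELINE: list[Callable[[dict], str]] = [
    open_change_request, apply_security_patches, register_it_asset,
    verify_software_license, schedule_backup, schedule_health_check,
]


def provision(vm: dict) -> list[dict]:
    """Run every mandatory step in order and return a timestamped audit trail.

    Stops at the first failing step and records the failure as the last entry.
    """
    trail = []
    for step in PIPELINE:
        entry = {"step": step.__name__,
                 "time": datetime.now(timezone.utc).isoformat()}
        try:
            entry.update(status="ok", evidence=step(vm))
        except Exception as exc:
            entry.update(status="failed", evidence=str(exc))
            trail.append(entry)
            break
        trail.append(entry)
    return trail


for entry in provision({"name": "vm01", "license": "L-123"}):
    print(entry["step"], entry["status"], entry["evidence"])
```

Keeping the evidence next to each step means the audit trail is produced as a side effect of doing the work, not gathered afterwards.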

Applying patches to operating systems, middleware, and applications is considered a major IT pain point for several reasons. Operating systems and software come in myriad types. There are interdependencies among the updates, the operating system, and the applications. Standardization is lacking across enterprise customers. Finally, testing applications and operating systems after an update is challenging. As a result, human operators are involved at different stages of the patching process, making it costly and cumbersome. Cloud can help standardize the offerings to customers. It can potentially remove human operators from the process, reduce or eliminate downtime, and enable complete end-to-end process automation.
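One part of the interdependency problem is that updates must be applied in an order that respects their dependencies. The minimal sketch below assumes the dependencies are already known and groups updates into waves using Python's standard `graphlib.TopologicalSorter`. The update names, and the idea of applying one wave per maintenance window, are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def patch_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group updates into waves that respect their interdependencies.

    deps maps each update to the updates that must be installed before it.
    Every update in a wave depends only on updates in earlier waves, so the
    updates in a wave can be applied together.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())   # sorted only for deterministic output
        waves.append(wave)
        sorter.done(*wave)
    return waves


print(patch_waves({
    "openssl-update": {"os-kernel-update"},
    "jdk-update": {"os-kernel-update"},
    "middleware-fixpack": {"jdk-update", "openssl-update"},
    "app-release": {"middleware-fixpack"},
}))
# [['os-kernel-update'], ['jdk-update', 'openssl-update'], ['middleware-fixpack'], ['app-release']]
```

Discovering the dependency graph in the first place, and testing after each wave, remain the harder parts that the text identifies.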

Problem determination and root cause analysis are critical to the stable operation of cloud services. The goal of this project is to develop a suite of techniques that help administrators solve difficult systems problems, especially in cloud environments. The BlueCoat toolset is built on a framework that inserts an interception layer between the application and the operating system to provide various smart application-management functions. A key challenge in managing distributed applications is tracing how a request travels through the various distributed components and understanding the causality of events across those components. We developed the BlueCoat/Detective tracing tool, which takes a black-box approach. It monitors application events at the system-call and library-call level, without requiring application source code or knowledge of the application. BlueCoat/Detective constructs the request-processing path across distributed components. By correlating system-level events with the contents of application log files, BlueCoat/Detective can pinpoint the root cause of errors that propagate through multiple servers.
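The simplified sketch below shows how a black-box tracer can stitch a request path together from intercepted events. It is not BlueCoat/Detective's algorithm and makes two assumptions. First, each thread handles one request at a time, so a thread's sends after a `recv` belong to that request. Second, a `send` and the peer's next `recv` on the same directed connection are the two ends of one message. The `Event` record and the connection naming (`"web->app"`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int       # timestamp recorded by the interception layer
    host: str
    thread: str
    op: str       # "send" or "recv"
    conn: str     # directed connection, e.g. "web->app" (hypothetical naming)


def reverse(conn: str) -> str:
    src, dst = conn.split("->")
    return f"{dst}->{src}"


def request_path(events: list[Event], entry: Event) -> list[tuple[str, str]]:
    """Return the (caller, callee) hops of the request that starts at `entry` (a recv)."""
    by_thread = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.ts):
        by_thread[(ev.host, ev.thread)].append(ev)

    def matching_recv(send: Event) -> Event:
        # The peer's first recv on the same directed connection after the send.
        return min((e for e in events
                    if e.op == "recv" and e.conn == send.conn and e.ts > send.ts),
                   key=lambda e: e.ts)

    hops = []

    def follow(recv: Event) -> None:
        reply_conn = reverse(recv.conn)
        timeline = by_thread[(recv.host, recv.thread)]
        for ev in timeline[timeline.index(recv) + 1:]:
            if ev.op != "send":
                continue
            if ev.conn == reply_conn:   # replying to our caller ends this segment
                return
            callee = matching_recv(ev)
            hops.append((recv.host, callee.host))
            follow(callee)

    follow(entry)
    return hops


events = [Event(*row) for row in [
    (1, "web", "t1", "recv", "client->web"), (2, "web", "t1", "send", "web->app"),
    (3, "app", "t7", "recv", "web->app"), (4, "app", "t7", "send", "app->db"),
    (5, "db", "t3", "recv", "app->db"), (6, "db", "t3", "send", "db->app"),
    (7, "app", "t7", "recv", "db->app"), (8, "app", "t7", "send", "app->web"),
    (9, "web", "t1", "recv", "app->web"), (10, "web", "t1", "send", "web->cache"),
    (11, "cache", "t2", "recv", "web->cache"), (12, "cache", "t2", "send", "cache->web"),
    (13, "web", "t1", "recv", "cache->web"), (14, "web", "t1", "send", "web->client"),
]]
print(request_path(events, events[0]))  # [('web', 'app'), ('app', 'db'), ('web', 'cache')]
```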

In data-center relocation, server consolidation, or cloud migration engagements, business applications are relocated or migrated from the source environment to the target environment. From a network perspective, whether these applications function properly depends critically on the firewall configuration in the target environment. Depending on the contractual obligations of the engagement, the target firewall configuration may simply reestablish the source firewall configuration. Alternatively, it may be more complex and require the implementation of new security and compliance policies. The BlueGates project aims to give migration architects an automated, intelligent capability for analyzing, designing, and configuring firewalls as part of a data-center relocation or server consolidation engagement.
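The sketch below covers a simple case: the target configuration reestablishes the source rules under new addresses while enforcing one new policy. The `Rule` record, the subnet-to-subnet re-addressing scheme, and the banned-port policy are hypothetical simplifications. Real firewall rule sets involve zones, ranges, protocols, and NAT, all of which this sketch ignores. Only the standard `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str     # source host address
    dst: str     # destination host address
    port: int    # destination TCP port


def translate(addr: str, subnet_map: dict[str, str]) -> str:
    """Re-address a host whose old subnet moves to a new subnet of the same size."""
    ip = ipaddress.ip_address(addr)
    for old, new in subnet_map.items():
        old_net, new_net = ipaddress.ip_network(old), ipaddress.ip_network(new)
        if ip in old_net:
            offset = int(ip) - int(old_net.network_address)   # keep the host part
            return str(new_net.network_address + offset)
    return addr                          # host is not being migrated


def migrate_rules(rules: list[Rule], subnet_map: dict[str, str],
                  banned_ports: set[int]) -> tuple[list[Rule], list[Rule]]:
    """Reestablish source allow rules in the target, rejecting policy violations."""
    target, rejected = [], []
    for rule in rules:
        if rule.port in banned_ports:    # the new compliance policy forbids this service
            rejected.append(rule)
            continue
        moved = Rule(translate(rule.src, subnet_map),
                     translate(rule.dst, subnet_map), rule.port)
        if moved not in target:          # skip duplicate rules
            target.append(moved)
    return target, rejected


target, rejected = migrate_rules(
    [Rule("10.1.0.25", "10.1.0.40", 1521),    # app server -> database
     Rule("192.168.9.7", "10.1.0.25", 443),   # users -> app server
     Rule("10.1.0.40", "10.1.0.99", 23)],     # legacy telnet
    subnet_map={"10.1.0.0/24": "172.16.5.0/24"},
    banned_ports={23},
)
print(target)
print(rejected)
```

The rejected list is what a migration architect would review by hand. It is the point where a simple re-establishment turns into a design decision.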

Highlighted papers:

  • Understanding Performance Implications of Nested File Systems in a Virtualized Environment
    Duy Le, Hai Huang, and Haining Wang
    Proceedings of the USENIX Conference on File and Storage Technologies, Feb 14-17, 2012, San Jose, CA
  • FVD: a High-Performance Virtual Machine Image Format for Cloud
    Chunqiang Tang
    Short paper, Proceedings of the 2011 USENIX Annual Technical Conference, June 2011, Portland, OR
  • Towards automated identification of security zone classification in enterprise networks
    H. V. Ramasamy, C. L. Tsao, B. Pfitzmann, N. Joukov, and J. W. Murray
    Proceedings of the USENIX Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, p. 9, 2011
  • vPath: Precise Discovery of Request Processing Paths from Black-Box Observations of Thread and Network Activities
    Byung Chul Tak, Chunqiang Tang, Chun Zhang, Sriram Govindan, Bhuvan Urgaonkar, and Rong N. Chang
    Proceedings of the USENIX Annual Technical Conference, June 14-19, 2009, San Diego, CA

Contact:

Alan Bivens: jbivens@us.ibm.com