2017
Crystal: Software-Defined Storage for Multi-Tenant Object Stores
Raul Gracia-Tinedo, Josep Sampe, Edgar Zamora, Marc Sanchez-Artigas, and Pedro Garcia-Lopez, Universitat Rovira i Virgili; Yosef Moatti and Eran Rom, IBM Research Haifa
File and Storage Technologies (FAST '17), pp. 243, 2017
Abstract
Object stores are becoming pervasive due to their scalability and simplicity. Their broad adoption, however, contrasts with their rigidity for handling heterogeneous workloads and applications with evolving requirements, which prevents the adaptation of the system to such varied needs. In this work, we present Crystal, the first Software-Defined Storage (SDS) architecture whose core objective is to efficiently support multi-tenancy in object stores. Crystal adds a filtering abstraction at the data plane and exposes it to the control plane to enable high-level policies at the tenant, container and object granularities. Crystal translates these policies into a set of distributed controllers that can orchestrate filters at the data plane based on real-time workload information.
We demonstrate Crystal through two use cases on top of OpenStack Swift: one that proves its storage automation capabilities, and another that differentiates IO bandwidth in a multi-tenant scenario. We show that Crystal is an extensible platform to deploy new SDS services for object stores with small overhead.
Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop
Yosef Moatti, Eran Rom, Raul Gracia-Tinedo, Dalit Naor, Doron Chen, Josep Sampe, Marc Sanchez-Artigas, Pedro Garcia-Lopez, Filip Gluszak, Eric Deschodt
IEEE International Conference on Data Engineering (ICDE), 2017
Abstract
Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3, can be problematic in common scenarios where analytics frameworks and object stores run in physically disaggregated clusters. One of the main problems is that analytics frameworks must ingest large amounts of data from the object store prior to the actual computation; this incurs significant resource and performance overhead. To overcome this problem, we present Scoop. Scoop enables analytics frameworks to benefit from the computational resources of object stores to optimize the execution of analytics jobs. Scoop achieves this by enabling the addition of ETL-type actions to the data upload path and by offloading querying functions to the object store through a rich and extensible active object storage layer. As a proof-of-concept, Scoop enables Apache Spark SQL selections and projections to be executed close to the data in OpenStack Swift for accelerating analytics workloads of a smart energy grid company (GridPocket). Our experiments in a 63-machine cluster with real IoT data and SQL queries from GridPocket show that Scoop exhibits query execution times up to 30x faster than the traditional ingest-then-compute approach.
2016
WatchIT: Who Watches Your IT Guy?
Noam Shalev, Idit Keidar, Yosef Moatti, Yaron Weinsberg
8th ACM CCS International Workshop on Managing Insider Security Threats (MIST '16), 2016
Abstract
System administrators have unlimited access to system resources. As the Snowden case shows, these permissions can be exploited to steal valuable personal, classified, or commercial data. In this work we propose a strategy that increases organizational information security by constraining IT personnel's view of the system and monitoring their actions. To this end, we introduce the abstraction of perforated containers: while regular Linux containers are too restrictive to be used by system administrators, by "punching holes" in them we strike a balance between information security and required administrative needs. Our system predicts which system resources should be accessible for handling each IT issue, creates a perforated container with the corresponding isolation, and deploys it in the corresponding machines as needed for fixing the problem.
Under this approach, the system administrator retains superuser privileges but can only operate within the container limits. We further provide means for the administrator to bypass the isolation and perform operations beyond these boundaries. However, such operations are monitored and logged for later analysis and anomaly detection.
We provide a proof-of-concept implementation of our strategy, along with a case study on the IT database of IBM Research in Israel.
IOStack: Software-Defined Object Storage
Raul Gracia-Tinedo, Pedro Garcia-Lopez, Marc Sanchez-Artigas, Josep Sampe, Yosef Moatti, Eran Rom, Dalit Naor, Ramon Nou, Toni Cortes, William Oppermann, Pietro Michiardi
Internet Computing, 2016
Abstract
As the complexity and scale of cloud storage systems grow, software-defined storage (SDS) has become a prime candidate to simplify cloud storage management. Here, the authors present IOStack, the first SDS architecture for object stores (such as OpenStack Swift). At the control plane, the provisioning of SDS services to tenants is made according to a set of policies managed via a high-level domain-specific language (DSL). Policies can target storage automation or specific service-level agreement (SLA) objectives. At the data plane, policies define the enforcement of SDS services, namely filters, on a tenant's requests. Moreover, IOStack is a framework to build a variety of filters, ranging from general-purpose computations close to the data to specialized data management mechanisms. Experiments illustrate that IOStack enables easy and effective policy-based provisioning, which can significantly improve the operation of a multitenant object store.
2012
VM placement strategies for cloud scenarios
Nicolo Maria Calcavecchia, Ofer Biran, Erez Hadad, Yosef Moatti
2012 IEEE Fifth International Conference on Cloud Computing
Abstract
The problem of Virtual Machine (VM) placement in a compute cloud infrastructure is well-studied in the literature. However, the majority of the existing works ignore the dynamic nature of the incoming stream of VM deployment requests that continuously arrive at the cloud provider infrastructure. In this paper we provide a practical model of cloud placement management under a stream of requests and present a novel technique called Backward Speculative Placement (BSP) that projects the past demand behavior of a VM to a candidate target host. We exploit the BSP technique in two algorithms: the first handles the stream of deployment requests, and the second runs as a periodic optimization to handle the dynamic aspects of the demands. We show the benefits of our BSP technique by comparing the results on a simulation period with a strategy of choosing an optimal placement at each time instant, produced by a generic MIP solver.
2011
Guaranteeing high availability goals for virtual machine placement
Eyal Bin, Ofer Biran, Odellia Boni, Erez Hadad, Eliot K. Kolodner, Yosef Moatti, Dean Lorenz
2011 31st International Conference on Distributed Computing Systems
2003
Easy: engineering high availability QoS in wServices
Eliezer Dekel, Oleg Frenkel, Gera Goft, Yosef Moatti
Reliable Distributed Systems, 2003. Proceedings. 22nd International Symposium on, pp. 157--166
Abstract
Developing and administrating distributed applications is complex. Frameworks that hide the distribution hurdles through encapsulation have been proposed, but their acceptance by the industry has been limited. The main reason is the difficulty of providing a simple interface that meets the needs of a wide range of application types. In this paper we address the functional services provided over the Web (henceforth, wServices). Examples of wServices are Grid Services. These typically small software components are often developed under tight budget and timeframe constraints. A wService may be deployed on different platforms and provide different QoS guarantees. With the advent of e-business, wServices have become an important type of distributed application. We claim that narrowing the view to this type of application allows providing a simple interface. Furthermore, we show that good performance can be achieved if wService developers provide simple tuning parameters as part of a wService package. In our Easy model, platform and QoS specifics are decoupled from wService development, thus reducing wService development costs. In addition, we aim at increasing automation of wService deployment on various platforms and for different QoS. The focus of this paper is on performance-aware high availability, achieved through wService cloning and replication of its state. In our philosophy, wService developers are aware of potential cloning and replication but not of the mechanisms that provide it. We demonstrate the feasibility of Easy through a prototype with an automatically deployed TomCat Web Container. Easy clones TomCat and replicates its state. We show that this automated process imposes only slight performance degradation compared to a manual one.
2002
An overview of the BlueGene/L supercomputer
N.R. Adiga, G. Almasi, G.S. Almasi, Y. Aridor, R. Barik, D. Beece, R. Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, and others
Supercomputing, ACM/IEEE 2002 Conference, pp. 60--60, IEEE Computer Society
Abstract
This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver a target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame, at price/performance and power consumption/performance targets unobtainable with conventional architectures.