Frontiers of Cloud Computing and Big Data Workshop 2014 - Program
|8:15am||Gathering with breakfast (Mezzanine)|
|Session 1 (9-10:30am)||Salman + Kaoutar (chair)|
|9:00-9:30am||IBM Talk 1: (Malgorzata Steinder - Research Challenges in Cloud Computing)|
|9:30-9:45am||Swapna Buccapatnam, Ohio State
Stochastic Bandits with Side Observations on Networks
|9:45-10am||Wolfgang Richter, CMU
Agentless Cloud-wide Monitoring of Virtual Disk State
|10:00-10:15am||Muhammad Naveed, UIUC
Controlled Functional Encryption
|10:15-10:30am||Puneet Jain, Duke University
Practical Mobile Augmented Reality
|Session 2 (10:45am-12:15pm)||Ramya + Xin (chair)|
|10:45-11:15am||ThinkLab tour (Sambit Sahu)|
|11:15-11:30am||Sara Alspaugh, UC Berkeley
Mining user interaction data to provide user assistance
|11:30-11:45am||Ankita Kejriwal, Stanford University
SLIK: Scalable Low-Latency Indexes for a Key-Value Store
|11:45-12:00pm||Ali Munir, Michigan State University
Friends, not Foes – Synthesizing Existing Transport Strategies for Data Center Networks
|12:00-12:15pm||Venkatanathan Varadarajan, U Wisconsin
How can VMM schedulers improve security in public clouds?
|Lunch 12:15-1:00pm||Cafeteria Annex|
|Session 3 (1:00-2:20pm) (poster session)||Cafeteria Annex + Tea/Coffee (setup at 2:00pm) - Ramya|
|Session 4 (2:30-4:15pm)||Oktie + Roman (chair)|
|2:30-2:45pm||Martin Jergler, Munich, Services Computing
Data-centric Workflow Management in the Cloud
|2:45-3:00pm||Pavan Kapanipathi, Wright State University
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Sourced Knowledge Bases
|3:00-3:15pm||Robert Escriva, Cornell University
HyperDex: A Consistent, Fault-tolerant, Transactional Key-Value Store
|3:15-3:30pm||Jiang Du, U Toronto
DeepSea: Revisiting materialized views in scalable data analytics
|3:30-3:45pm||Shumo Chu, U Washington
Evaluating Multiway Joins in a Parallel Database System
|3:45-4:15pm||IBM Talk 2: Watson Platform Next. Speaker: Marc-Thomas H. Schmidt, Distinguished Engineer and Chief Architect, Watson Platform Next|
IBM Talk 3: (Dimitrios Pendarakis - Cloud Security)
|4:45-5:15pm||Blue Gene Lab tour (Jose Moreira)|
Ankita Kejriwal (Stanford University)
Title: SLIK: Scalable Low-Latency Indexes for a Key-Value Store
Many large-scale storage systems sacrifice features and/or consistency in favor of scalability or performance. In contrast, I will talk about SLIK, which adds secondary indexes to an existing high-performance key-value store (RAMCloud) without sacrificing either latency or scalability. By keeping index data in DRAM, SLIK performs indexed reads in 10 μs and writes in 24 μs. At the same time, SLIK supports indexes spanning hundreds or thousands of nodes. It allows internal inconsistencies in its index implementation, which improves efficiency, yet presents strong consistency externally to clients. It stores the nodes of index B-trees in the existing RAMCloud tablet mechanism, which allows SLIK to reuse existing mechanisms for durability and crash recovery.
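The core idea, a secondary index layered on a fast key-value store whose results are re-checked against the primary object, can be sketched in a few lines. The toy below is my own simplification (class and field names invented), with a single sorted list standing in for SLIK's distributed B-tree tablets:

```python
import bisect

class SlikSketch:
    """Toy secondary index over an in-memory key-value store."""

    def __init__(self, indexed_field):
        self.objects = {}        # primary key -> record (the object table)
        self.indexed_field = indexed_field
        self.index = []          # sorted (secondary key, primary key) pairs

    def write(self, pkey, record):
        # Drop any stale index entry, then update object table and index.
        old = self.objects.get(pkey)
        if old is not None:
            self.index.remove((old[self.indexed_field], pkey))
        self.objects[pkey] = record
        bisect.insort(self.index, (record[self.indexed_field], pkey))

    def lookup(self, skey):
        # Range-scan the index, fetch each object by primary key, and
        # re-check the attribute: this final check is what lets a real
        # system hide internal index inconsistency from clients.
        lo = bisect.bisect_left(self.index, (skey,))
        out = []
        for sk, pk in self.index[lo:]:
            if sk != skey:
                break
            rec = self.objects[pk]
            if rec[self.indexed_field] == skey:
                out.append(rec)
        return out
```

In real SLIK both structures live in DRAM and are sharded across many servers, which is where the microsecond latencies and the reuse of tablet durability and crash recovery come from.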
Ankita Kejriwal is a 5th-year PhD candidate in the Computer Science Department at Stanford University. She works with John Ousterhout and his lab on RAMCloud. Her main research interests include data center storage systems and distributed systems. She did an internship at Microsoft Research - Silicon Valley (MSR-SVC) in Summer 2013 on distributed computing and enjoyed it thoroughly. Before Stanford, Ankita did her B.E. (Hons) in Computer Science at the Birla Institute of Technology and Science (BITS) - Pilani, Goa and interned at the Bhabha Atomic Research Center (BARC), the Indian Institute of Science (IISc) and Yahoo!. Webpage: http://web.stanford.edu/~ankitak/.
Ali Munir (Michigan State University)
Title: Friends, not Foes – Synthesizing Existing Transport Strategies for Data Center Networks
Abstract: Many data center transports have been proposed in recent times (e.g., DCTCP, PDQ, pFabric). Contrary to the common perception that they are competitors (i.e., protocol A vs. protocol B), we claim that the underlying strategies used in these protocols are, in fact, complementary. Based on this insight, we design PASE, a transport framework that synthesizes existing transport strategies, namely self-adjusting endpoints (used in TCP-style protocols), in-network prioritization (used in pFabric), and arbitration (used in PDQ). PASE is deployment friendly: it does not require any changes to the network fabric; yet its performance is comparable to, or better than, state-of-the-art protocols that require changes to network elements (e.g., pFabric). We evaluate PASE using simulations and testbed experiments. Our results show that PASE performs well for a wide range of application workloads and network settings.
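To make the synthesis concrete, here is a toy arbitration pass of my own devising (not the authors' code): an arbitrator ranks flows shortest-remaining-first, as PDQ-style arbitration would, and compresses the ranking onto the few priority queues a switch actually offers, which is what in-network prioritization consumes; endpoints would then self-adjust their rates within each level:

```python
def arbitrate(flows, num_queues):
    """Assign each flow a priority queue from a global shortest-first
    ranking. `flows` is a list of {"id": ..., "remaining_bytes": ...}
    dicts (a simplification; a real arbitrator aggregates per-link)."""
    ranked = sorted(flows, key=lambda f: f["remaining_bytes"])
    assignment = {}
    for rank, flow in enumerate(ranked):
        # Ranks beyond the number of queues share the lowest priority.
        assignment[flow["id"]] = min(rank, num_queues - 1)
    return assignment
```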
Bio: Ali Munir received his BSc in Electronics Engineering and his Master's in Electrical Engineering from the National University of Sciences & Technology (NUST), Pakistan, in 2009 and 2012, respectively. He has been a PhD student at Michigan State University, USA, since January 2013.
Wolfgang Richter (Carnegie Mellon University)
Title: Agentless Cloud-wide Monitoring of Virtual Disk State
Abstract: What if the cloud infrastructure assisted in monitoring the virtual servers it maintains? It already monitors them at a coarse-grained level to meter their consumed resources. From a research perspective, we have a huge opportunity to design clean-slate monitoring interfaces for virtualized clouds, which as of 2014 handle up to 70% of all x86 applications. We propose
Bio: Wolfgang Richter is a 5th-year PhD student at Carnegie Mellon University's School of Computer Science. He has won research and teaching awards at both Carnegie Mellon and the University of Virginia, where he received his BS in Computer Science with Highest Honors. During his tenure as a graduate student at Carnegie Mellon he has been supported by an NSF Fellowship and an IBM Research Fellowship. His research interests lie in distributed systems and cloud computing. He has collaborated predominantly
Sara Alspaugh (UC Berkeley)
Title: Mining user interaction data to provide user assistance
Abstract: In this era of big data, one type of data that has not yet realized its full potential is the record of user interactions with software systems. Complex software systems create execution logs primarily for debugging; these logs have rarely been used to understand how users interact with the system. By making an effort to collect high-volume, high-quality user interaction data, we can enable applications like interface optimization, recommendations, interaction analysis, and user assistance, much as has been done in the web domain over the past two decades. In this talk we will discuss our work on collecting and mining records of log analysis activity from Splunk, a major platform for data analytics. The results of our analysis have been used to improve the Splunk product and interface. We are currently working on harvesting exploratory visual analytics activity in Tableau from trained volunteers as part of an ongoing research seminar. We will use this activity data to construct exploratory analysis guidelines and to train an intelligent assistance application that helps beginner users more quickly solve a given task or extract value from a given data set. We will conclude this short talk with a discussion of how such approaches can be applied to the domain of large-scale, multi-component distributed systems debugging, given appropriate activity data from such domains.
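As a flavor of what mining such logs can yield, here is a hand-rolled sketch (the query strings below are invented, not Splunk data): it splits each logged query into its pipeline of commands and counts which command follows which, the raw material for a next-step recommender:

```python
from collections import Counter

def next_command_model(queries):
    """Count command-to-command transitions across logged queries."""
    transitions = Counter()
    for q in queries:
        # First token of each pipeline stage is the command name.
        cmds = [stage.strip().split()[0] for stage in q.split("|")]
        transitions.update(zip(cmds, cmds[1:]))
    return transitions

def recommend(transitions, cmd):
    """Suggest the most frequent successor of `cmd`, if any."""
    candidates = [(n, b) for (a, b), n in transitions.items() if a == cmd]
    return max(candidates)[1] if candidates else None
```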
Bio: Sara Alspaugh is a PhD candidate in the EECS Department at UC Berkeley. She works in the AMPLab. Her advisors are Randy Katz and Marti Hearst. She received her Master’s in computer science in 2012 from UC Berkeley and her BA in computer science in 2009 from the University of Virginia. Her research interests include data mining, visualization, systems, and user interaction with data analysis tools. In particular, she is interested in mining user interaction records from data analysis and visualization tools to improve system and interface design.
Muhammad Naveed (UIUC)
Title: Controlled Functional Encryption
Abstract: Motivated by privacy and usability requirements in various scenarios where existing cryptographic tools (like secure multi-party computation and functional encryption) are not adequate, we introduce a new cryptographic tool called Controlled Functional Encryption (C-FE). As in functional encryption, C-FE allows a user (client) to learn only certain functions of encrypted data, using keys obtained from an authority. However, we allow (and require) the client to send a fresh key request to the authority every time it wants to evaluate a function on a ciphertext. We obtain efficient solutions by carefully combining CCA2 secure public-key encryption (or rerandomizable RCCA secure public-key encryption, depending on the nature of security desired) with Yao’s garbled circuit. Our main contributions in this work include developing and formally defining the notion of C-FE; designing theoretical and practical constructions of C-FE schemes achieving these definitions for specific and general classes of functions; and evaluating the performance of our constructions on various application scenarios.
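The distinguishing round structure, in which the client must return to the authority for every single evaluation, can be mimicked by a toy model. Everything below is an invented stand-in (a hash-based toy stream cipher instead of the CCA2-secure encryption and Yao's garbled circuits used by the real constructions), meant only to show the control flow:

```python
import hashlib

def _keystream(key, n):
    # Toy keystream derived from SHA-256; for illustration only.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(master_key, data):
    return bytes(a ^ b for a, b in zip(data, _keystream(master_key, len(data))))

class Authority:
    """Holds the master key and a whitelist of allowed functions; the
    client learns f(x) per request, never x itself."""

    def __init__(self, master_key, policy):
        self.master_key = master_key
        self.policy = policy

    def answer(self, ciphertext, func_name):
        # Every evaluation requires a fresh request: the "controlled" part.
        if func_name not in self.policy:
            raise PermissionError("function not authorized")
        plain = bytes(a ^ b for a, b in
                      zip(ciphertext, _keystream(self.master_key, len(ciphertext))))
        return self.policy[func_name](plain)
```

In the actual constructions the authority assists without ever seeing the plaintext, a property the toy above deliberately glosses over.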
Robert Escriva (Cornell University)
Title: HyperDex: A Consistent, Fault-tolerant, Transactional Key-Value Store
Distributed key-value stores are now a standard component of high-performance web services and cloud computing applications. While key-value stores offer significant performance and scalability advantages, the first wave of NoSQL stores typically compromises on consistency and limits the system API to key-based operations.
This talk will present HyperDex, a novel distributed key-value store developed by my group that provides (1) strong consistency guarantees, (2) fault tolerance for failures and partitions affecting up to f nodes, and (3) a rich API which includes ACID transactions and a unique search primitive that enables queries on secondary attributes. HyperDex achieves these properties through the combination of three recent technical advances called hyperspace hashing, value-dependent chaining, and linear transactions. Performance measurements from the industry-standard YCSB benchmark show that these properties do not impose a high overhead: HyperDex is actually a factor of 2-13 faster than Cassandra and MongoDB.
Overall, HyperDex offers a rich API and combines strong consistency with fault-tolerance guarantees. We'll discuss how these techniques work behind the scenes, do a brief tour of the HyperDex API, and outline how these properties relate to the oft-repeated CAP credo.
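Of those three advances, hyperspace hashing is the easiest to sketch: each attribute hashes to a coordinate on its own axis, so an object occupies one cell of a multi-dimensional grid, and a query that fixes some attributes only needs to visit the hyperplane of cells matching those coordinates. The code below is my own toy with invented parameters, not HyperDex's implementation:

```python
import hashlib
import itertools

def coord(value, buckets):
    # Hash one attribute value to a coordinate along its axis.
    h = hashlib.sha256(repr(value).encode()).digest()
    return int.from_bytes(h[:4], "big") % buckets

def cell(obj, axes, buckets=4):
    # An object's cell: one coordinate per indexed attribute.
    return tuple(coord(obj[a], buckets) for a in axes)

def candidate_cells(query, axes, buckets=4):
    # Cells a partial query could match: fixed attributes pin their
    # axis; unspecified ones range over every bucket on that axis.
    choices = [[coord(query[a], buckets)] if a in query
               else list(range(buckets)) for a in axes]
    return set(itertools.product(*choices))
```

With 3 axes and 4 buckets per axis, a query fixing one attribute visits 16 of 64 cells rather than all of them, which is the intuition behind cheap searches on secondary attributes.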
Robert Escriva is a PhD candidate in Computer Science at Cornell University working with Prof. Emin Gun Sirer. He is interested in distributed systems that support large-scale applications. His current project is HyperDex, a new NoSQL system.
Venkatanathan Varadarajan (Wisconsin)
Title: How can VMM schedulers improve security in public clouds?
Abstract: Infrastructure-as-a-Service (IaaS) solutions often pack multiple customer virtual machines (VMs) onto the same physical server and multiplex resources for greater efficiency and lower service cost. Unfortunately, the systems managing these resources are often adopted directly from systems tailored for private datacenters and stand-alone operating systems. Such systems are fraught with security threats when used in the publicly accessible, open environment of public clouds, where security is as important as performance and
Recent work has shown how to mount cross-VM side-channel attacks to steal cryptographic secrets. Such attacks rely on vulnerabilities in the hypervisor's CPU scheduler. In this talk, I will first present how a simple change to the CPU scheduler can defend against a class of side-channel attacks with almost no performance overhead. I will also touch on related work done at Wisconsin, such as Resource-Freeing Attacks and VM placement gaming, that demonstrates similar vulnerabilities in resource schedulers and motivates the need for security research on resource schedulers.
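The talk does not spell out the mechanism here, but one published defense in this spirit is a minimum run-time (MRT) guarantee, sketched below with parameter names of my own choosing: a vCPU cannot be preempted until it has run for a minimum interval, which coarsens the timing signal a co-resident attacker can sample from the shared cache:

```python
class MrtScheduler:
    """Toy single-core scheduler enforcing a minimum run time."""

    def __init__(self, mrt):
        self.mrt = mrt           # minimum run time before preemption
        self.running = None
        self.started_at = None

    def try_preempt(self, now, challenger):
        # Grant the core only if it is free or the incumbent has
        # already run for at least `mrt` time units.
        if self.running is None or now - self.started_at >= self.mrt:
            self.running = challenger
            self.started_at = now
            return True
        return False   # incumbent keeps the core; attacker's probe rate drops
```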
Venkatanathan Varadarajan is a research assistant and fifth-year Ph.D. student at the University of Wisconsin-Madison, working under the excellent guidance of Prof. Thomas Ristenpart and Prof. Michael Swift. He has worked on various layers of interaction between hardware and system software. His dissertation studies security vulnerabilities in resource schedulers that are uniquely enabled by the multi-tenancy and openness of public clouds. He is a member of the security team in the Wisconsin Institute on Software-defined Datacenters Of Madison (WISDOM). In Summer 2013, he interned at VMware R&D, Palo Alto, where he worked on automating horizontal scaling of virtual machines in a multi-tier application. He completed his Bachelor's at the College of Engineering Guindy, Anna University, Chennai in June 2010 and went directly to the University of Wisconsin-Madison, where he received his Master's degree in December 2012.
Martin Jergler (Technical University of Munich)
Title: Data-centric Workflow Management in the Cloud
In recent years, data-centric workflows have become increasingly popular. The unification of activity flow and associated data enables operational management and concurrent analytics of business processes (BPs). Moreover, it significantly eases adapting BPs to meet changing policies or other constraints. A prominent approach to managing data-centric workflows is Business Artifacts with Guard-Stage-Milestone lifecycles (GSM). In GSM, a workflow is described by a data model and a lifecycle model. The data model represents application data and the workflow status. The lifecycle model is defined by a set of ECA-like rules that describe how the status evolves over time and when individual tasks are executed.

As today's BPs span administrative and geographic boundaries, manage many possibly long-running instances, consider vast amounts of data, and coordinate participants from all over the world, centralized workflow systems no longer scale. Varying load characteristics, e.g., due to user behavior or the nature of the workflow itself, impose additional challenges.

To this end, we present a distributed architecture for data-centric workflows that is based on a publish/subscribe messaging infrastructure. A workflow is mapped into a set of interacting workflow components (IWCs). IWCs, for instance, represent gateways for attribute updates, evaluate single ECA-like rules, and manage the interaction with the environment. In this sense, each IWC (1) subscribes to updates on a subset of the data model, (2) evaluates a policy over the data, and (3) potentially updates other attributes by publishing to the infrastructure. The implicit order in which IWCs communicate ensures the same execution semantics as a centralized execution. In our architecture, IWCs are the unit of distribution and inherit all properties of publish/subscribe systems.
This allows for very flexible deployment and incremental scalability on cloud infrastructures, as IWCs can easily be migrated from one VM to another. IWCs not only distribute the execution semantics but also partition the global data model into independent subsets. This makes it easier to keep data management compliant with geographical constraints.
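The IWC idea can be miniaturized into a few dozen lines. The broker and component below are my own toy (all names invented): one component subscribes to part of the data model, evaluates a single ECA-like guard, and publishes the attribute it derives:

```python
class Broker:
    """Minimal in-process publish/subscribe bus; topics are attributes."""

    def __init__(self):
        self.subs = {}

    def subscribe(self, topic, callback):
        self.subs.setdefault(topic, []).append(callback)

    def publish(self, topic, value):
        for callback in self.subs.get(topic, []):
            callback(topic, value)

class IWC:
    """One interacting workflow component: watches attributes,
    evaluates one guard, and publishes a derived attribute once."""

    def __init__(self, broker, watched, guard, out_topic, out_value):
        self.state, self.fired = {}, False
        self.broker = broker
        self.guard, self.out_topic, self.out_value = guard, out_topic, out_value
        for topic in watched:
            broker.subscribe(topic, self.on_update)

    def on_update(self, topic, value):
        self.state[topic] = value
        if not self.fired and self.guard(self.state):
            self.fired = True
            self.broker.publish(self.out_topic, self.out_value)
```

Because each IWC only touches its slice of the data model, moving one to another VM amounts to re-pointing its subscriptions, which is the flexibility the abstract claims.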
Martin Jergler is a doctoral candidate working with Professor Hans-Arno Jacobsen in Computer Science at the Technical University of Munich, Germany. He has been a member of the Chair for Application & Middleware Systems since 2012. Martin received his BSc (2009) in Internet Computing and MSc (2012) in Computer Science from the Department of Informatics and Mathematics at the University of Passau, Germany. In his undergraduate studies he focused on multimedia technologies and applications, and he received the :a:k:t: student scholarship in 2010. Martin's current research interests revolve around distributed data management and applications. These include publish/subscribe middleware, service-oriented architectures, data-centric workflows, and case management.
Puneet Jain (Duke University)
Title: Practical Mobile Augmented Reality
Mobile Augmented Reality has remained a fascinating concept in the research community since the early days of smartphone adoption. Despite several attempts by app designers, we see no successful real-world application on the leading app-hosting platforms such as Google Play and the Apple App Store. Realizing Mobile AR's true potential and developing new applications requires addressing many open challenges. In this talk, we will address one key question: "What does it take to enable real-time mobile augmented reality on current-generation smartphones?" Specifically, we will demonstrate how careful system design rooted in sensing, vision, cloud offloading, and semi-supervised learning leads to sufficient precision for useful mobile applications and tolerable latency for the end user.
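One small but central design point in such systems is deciding when to pay for cloud offloading. The gate below is a toy of my own (frames modeled as flat pixel lists, threshold invented): a frame goes to the cloud recognizer only when it has drifted enough from the last offloaded one, while cheap on-device tracking reuses the previous labels in between:

```python
def should_offload(frame, last_offloaded, threshold=10):
    """Return True if `frame` has drifted enough from the frame we
    last sent to the cloud. Frames are equal-length pixel lists."""
    if last_offloaded is None:
        return True              # nothing recognized yet: must offload
    drift = sum(abs(a - b) for a, b in zip(frame, last_offloaded))
    return drift > threshold
```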
Puneet Jain is a Ph.D. candidate in computer science at Duke University. He is part of the Systems Networking Research Group led by Professor Romit Roy Choudhury at UIUC. Jain holds an M.S. in CS (2013) from Duke and an M.Tech/B.Tech in CSE (2009) from IIT Kharagpur. His current research interests include real-time computer vision, mobile sensing, and big data for social analytics. More about him can be found at http://www.cs.duke.edu/~puneet
Jiang Du (University of Toronto)
Title: DeepSea: Revisiting materialized views in scalable data analytics
Swapna Buccapatnam (Ohio State University)
Title: Stochastic Bandits with Side Observations on Networks
I will present my work on stochastic multi-armed bandit problems with side-observations on networks. In our model, choosing an action provides additional side observations for its neighboring actions in the network. One example of this occurs in the problem of targeting users in online social networks where users respond to their friends’ activity. These side observations can be leveraged to improve scalability of bandit policies in the presence of a large set of actions.
Our contributions are as follows: 1) We derive an asymptotic lower bound (as a function of the network structure) on the regret (loss) of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies, both of which explore each action at a rate that is a function of its network position. We further show that these policies are optimal, i.e., they achieve the asymptotic lower bound on the regret up to a multiplicative factor independent of the network structure. Finally, we use numerical examples on a real-world social network to demonstrate the significant benefits our policies obtain over existing policies.
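The effect of side observations is easy to see in simulation. The sketch below is mine, not the speaker's: it runs plain UCB1 (a stand-in; the talk's policies additionally tune exploration by network position) where playing an arm also reveals samples of all its neighbors, so observation counts grow faster than play counts:

```python
import math
import random

def ucb_with_side_obs(means, neighbors, horizon, seed=0):
    """Average expected reward of UCB1 when each play of arm i also
    yields free Bernoulli samples of every arm in neighbors[i]."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # observations (plays + side observations)
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if min(counts) == 0:
            arm = counts.index(0)     # observe every arm at least once
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        total += means[arm]
        # The chosen arm and all its neighbors yield observations.
        for j in [arm] + list(neighbors[arm]):
            counts[j] += 1
            sums[j] += rng.random() < means[j]
    return total / horizon
```

Arms sitting next to frequently played arms need almost no direct exploration, which is the scalability benefit side observations buy when the action set is large.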
Swapna Buccapatnam is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the Ohio State University. She received her B.Tech and M.Tech. degrees in Electrical Engineering from the Indian Institute of Technology, Madras, India, in 2008. Her research interests lie in the analysis and control of complex networks such as wireless networks, cloud computing systems, online social networks, and crowd-sourcing markets.
Shumo Chu (University of Washington)
Shumo Chu is a PhD student at University of Washington Computer Science and Engineering, working with Prof. Dan Suciu and Prof. Magdalena Balazinska. Before UW, he was an intern in the Data Management Group of Microsoft Research Asia and worked with Prof. James Cheng at NTU, Singapore (now at CUHK). He obtained his bachelor's degree from Wuhan University, China. He is interested in various aspects of big data management, such as system building, theoretical foundations, and domain-specific applications.
Pavan Kapanipathi (Wright State University)
Title: Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Sourced Knowledge Bases
Due to the increased adoption of the social web, users, and Twitter users in particular, are facing information overload. Unless a user is willing to restrict the sources (e.g., the number of accounts followed), important information relevant to the user's interests often goes unnoticed. The reasons include: (1) the postings may arrive at a time when the user is not looking; (2) the user may be unaware of, and hence not following, the information source; and (3) the information may arrive at a rate at which the user cannot consume it. Furthermore, some temporally relevant information is of no use if discovered late.
My research addresses these challenges by
(1) Generating user profiles of interests from Twitter using Wikipedia. The interests gleaned from users' Twitter data can be leveraged by personalization and recommendation systems in order to reduce the information overload (volume) for users.
(2) Filtering Twitter data relevant to dynamically evolving entities. Together with volume, this addresses the velocity challenge of delivering relevant information in real time. The approach is deployed on Twitris to crawl dynamic, event-relevant tweets for analysis. A prominent aspect of both approaches is the use of crowd-sourced knowledge bases such as Wikipedia.
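As a minimal illustration of the first step, here is a toy profile generator (the entity-to-category mapping in the example is hand-written; the real system links entities into Wikipedia's category graph): it spots known entities in tweets and aggregates their categories into a weighted interest profile:

```python
from collections import Counter

def interest_profile(tweets, entity_to_category):
    """Aggregate Wikipedia-style categories of entities seen in tweets."""
    profile = Counter()
    for tweet in tweets:
        for token in tweet.lower().split():
            category = entity_to_category.get(token)
            if category:
                profile[category] += 1
    return profile
```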
Pavan Kapanipathi is a PhD student at the Knoesis Center, the Ohio Center of Excellence in Knowledge-Enabled Computing, under the guidance of Dr. Amit Sheth. He is a web enthusiast and enjoys working with big data, especially in the Social Web and Semantic Web areas. His work focuses on addressing information overload on the Social Web by leveraging Wikipedia.