2013 Services Research Symposium
Building Bridges between Research and Practice
The Service Science Professional Interest Community (PIC) is hosting a Services Research Symposium on Tuesday, October 22, 2013, at the IBM T. J. Watson Research Center in Yorktown Heights, New York.
This symposium will bring together faculty, students and researchers focusing on topics such as big data, mobile computing, social network analysis, service optimization, human-computer interaction, cloud computing, scalable services ecosystems, and other areas related to the Service Science discipline.
The Symposium theme, Building Bridges between Research and Practice, highlights our expanding focus on research that has real value to and impact on people, businesses and the world. We are especially interested in research on improving how businesses interact with and provide services to their customers.
Invited Faculty Talks
A Methodology for Addressing Complex Service Systems by Professor Bill Rouse, Stevens Institute of Technology
Abstract: Complex service systems are typically laced with human behavioral and social phenomena embedded in physical and organizational contexts, both natural and designed. An overall methodology for addressing such systems will be presented. This approach emphasizes understanding the physical, organizational, economic, and political phenomena associated with the questions of interest. These phenomena, and relationships among them, are portrayed using immersive, interactive visualizations. This environment provides the foundation for identifying key tradeoffs underlying the questions of interest. The result is an understanding of places where deeper computational modeling is needed to resolve these tradeoffs. Examples from several domains will be discussed, including healthcare delivery and urban resilience.
Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety, and Velocity using semantics and Semantic Web by Professor Amit Sheth, Wright State University
Abstract: Big Data has captured a lot of interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of Big Data, its raison d'etre, is none of these four Vs, but value. In this talk, I will define the concept of Smart Data and discuss how it can be realized by extracting value from a variety of data types (e.g., social data, sensor data, health care data) that make up today’s Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to dynamically create models of new objects, concepts, and relationships and use them to better understand new cues in the data that capture rapidly evolving events and situations.
Choosing the Next Experiment: Tradition, Innovation, and Efficiency in the Selection of Scientific Ideas by Professor James Evans, University of Chicago
Abstract: What factors affect a scientist's choice of research problem? Qualitative research in the history, philosophy, and sociology of science suggests that the choice is shaped by an "essential tension" between a professional demand for productivity and a conflicting drive toward risky innovation. We examine this tension empirically in the context of biomedical chemistry. We use complex networks to represent the evolving state of scientific knowledge, as expressed in digital publications. We then build measurements and a model of scientific discovery informed by key properties of this network. Measuring such choices in aggregate, we find that the distribution of strategies remains remarkably stable, even as chemical knowledge grows dramatically. High-risk strategies, which explore new chemical relationships, are less prevalent in the literature, reflecting a growing focus on established knowledge at the expense of new opportunities. Research following a risky strategy is more likely to be ignored but also more likely to achieve high impact and recognition. While the outcome of a risky strategy has a higher expected reward than the outcome of a conservative strategy, the additional reward is insufficient to compensate for the additional risk. By studying the winners of major prizes, we show that the occasional "gamble" for extraordinary impact is the most plausible explanation for observed levels of risk-taking. To examine efficiency in scientific search, we build a model of scientific discovery informed by key properties of this network, namely node degree and inter-node distance. We infer the typical research strategy in biomedical chemistry from 30 years of publications and patents and compare its efficiency with thousands of alternatives.
Strategies of chemical discovery are similar in articles and patents, conservative in their neglect of low-degree, distant or disconnected chemicals, and efficient only for initial exploration of the network of chemical relationships. We identify much more efficient strategies for maturing fields.
On Using Simulations to Evaluate Hadoop Cluster Design by Professor Ali Butt, Department of Computer Science, Virginia Tech
Abstract: MapReduce/Hadoop has emerged as a model of choice for supporting modern data-intensive applications, and is a key enabler for cloud computing. Setting up and operating a large Hadoop cluster entails careful evaluation of various design choices and run-time parameters to achieve high efficiency. However, this design space has not been explored in detail. In this talk, I will discuss a simulation approach to systematically understanding the performance of Hadoop setups. I will present MRPerf, a toolkit that captures such aspects of Hadoop clusters as node, rack and network configurations, disk parameters and performance, data layout and application I/O characteristics, among others, and uses this information to predict expected application performance. The overall goal is to realize a tool for optimizing existing Hadoop clusters as well as explore new Hadoop resource management designs.
A Two-Stage Model of Consideration Set and Choice: Learning, Revenue Prediction, and Applications by Professor Srikanth Jagabathula, Stern School of Business, New York University
Abstract: A common operational problem in many business applications is the accurate prediction of demand shares of differentiated products in response to variations in the offer set and prices. The main challenge in predicting demand shares accurately is to separate, in the observed demand, substitution due to price variation from substitution due to stock-outs (the absence of a product from the offer set). Existing approaches offer either flexibility in the price-elasticity patterns they can capture or the ability to simultaneously account for substitution due to price variation and stock-outs, but not both. Motivated by this limitation, this paper proposes a general two-stage model class of consideration set and choice. Under this model, each consumer first evaluates the products to form a smaller subset for consideration, and then makes a purchase decision from the consideration set. The model simultaneously captures the effect of substitution due to stock-outs and price variation. We propose techniques to estimate our models from historical transaction data, which comprise observed sales for each of the products when offered at different price vectors. We study estimation procedures for two model instances in detail: a parametric instance and a semi-parametric instance. For both model instances, we derive sample complexity bounds and prove that our estimation technique is computationally efficient. In addition to providing an appealing combination of model flexibility and analytical tractability, we demonstrate through numerical experiments, based on real-world transaction data on sales of television sets from a retailer and synthetic transaction data, that our semi-parametric method obtains a close to 30% improvement in prediction accuracy over the existing methods.
Story Matching Technologies for Cyberbullying Prevention by Jamie Macbeth, Postdoctoral Research Associate, Massachusetts Institute of Technology
Abstract: While the Internet and social media help keep today’s youth better connected to their friends, family, and community, the same media are also the form of expression for an array of harmful social behaviors, such as cyberbullying and cyber-harassment. The main topic of this talk will be our work to develop intelligent interfaces to social media that use commonsense knowledge bases and automated narrative analyses of text communications between users to trigger selective interventions and prevent negative outcomes. While other approaches seek merely to classify the overall topic of the text, we match stories to finer-grained “scripts” that represent stereotypical events and actions. For example, many bullying stories can be matched to a “revenge” script that describes trying to harm someone who has harmed you. These tools have been implemented in an initial prototype system and tested on a database of real stories of cyberbullying collected on MTV’s “A Thin Line” Web site.
Invited Student Talks
Modeling Supply Chain System Structure to Trace Sources of Food Contamination by Abby Horn, Massachusetts Institute of Technology
Abstract: Abby’s research focuses on quickly identifying the source of large-scale, multi-state outbreaks of foodborne illness. Current investigative methods are slow, resource intensive, and constrained by incomplete, manually collected data. Abby is analytically modeling stylized versions of the problem of traceback on food supply chain network structures to derive exact theoretical results that lead to new, general insights. Concurrently, she is building probabilistic network models of the production, transportation and distribution of selected food products, using data from the USDA and industry sources, to derive algorithms to trace back to the sites where contamination is likely to have taken place. Outcomes of past outbreaks of foodborne illness, adjusted by expert guidance, will help calibrate the models, which will then be used to project forward to plausible future outbreaks. The practical objective of this work will be a planning tool enabling public health and emergency preparedness officials to make informed traceback policy decisions.
Modeling the Dynamics of Product Adoption by Qing Jin, Northeastern University
Abstract: While the ability to accurately predict product adoption patterns at a societal scale has important implications in a wide range of areas, finding the mathematical laws that describe the 'success' of a given product often remains an elusive task. Here we compiled a comprehensive longitudinal mobile phone dataset by combining two related but distinct sources, allowing us to reconstruct decade-long adoption histories of more than 10,000 different handsets for 6 million individuals. We derive a mechanistic model to capture the adoption dynamics of individual handsets, indicating that the dynamics of product adoption follow highly reproducible patterns. Our results provide significant insights into early signatures of a hit product as well as the role social influence plays in this process.
Preferences, Homophily and Social Learning by Evan Sadler, New York University
Abstract: We study a model of social learning in networks where agents have heterogeneous preferences, and neighbors tend to have similar preferences, a phenomenon known as homophily. Using this model, we resolve a puzzle in the literature: theoretical models predict that preference diversity helps learning, and homophily slows learning, while empirical work suggests the opposite. We find that the density of network connections determines the impact of preference diversity and homophily on learning. When connections are sparse, diverse preferences are harmful to learning, and homophily may lead to substantial improvements. In a dense network, preference diversity is beneficial. The conflicting findings in prior work result from a focus on networks with different densities; theory has focused on dense networks, while empirical papers have studied sparse networks. Our results suggest that in complex networks containing both sparse and dense components, diverse preferences and homophily play complementary, beneficial roles.
Evaluating Cognitive Workload for 3D Volumetric Scientific Visualization by Jamal Thorne, Morehouse College
Abstract: Studies have shown that the human brain possesses a narrow capacity for processing raw numbers, but an astonishingly wide capability for processing visual data. In order to capitalize on this phenomenon, scientists create visualizations out of the data they collect. These visualizations give scientists a new perspective on the data and allow them to quickly grasp the "big picture", interpolate any missing data, quickly create or modify data sets by manipulating graphic objects on screen, and spot errors and inconsistencies in massive data sets more easily than when working with raw data. Many visualization techniques are designed in such a way that they demand more of the viewer’s cognitive resources. These visualizations require viewers to divide their attention concurrently between multiple sources of information, resulting in the Split-Attention Effect. This phenomenon increases cognitive load, which in turn decreases performance, impairs judgment, and negatively affects information processing abilities. If we can gauge, evaluate, and measure user cognitive load using Brain Computer Interfaces while users interact with these visualizations, then we can analyze the effects that various aspects of these visualizations have on users and their cognitive load. This work is the precursor to the development of adaptive visualizations that adapt based on user cognitive load.
Recommending Resolutions for Problems Identified by Monitoring in IT Service by Liang Tang, Florida International University
Abstract: Service providers are facing an increasingly intense competitive landscape and growing industry requirements. Modern service infrastructure management focuses on the development of methodologies and tools for improving the efficiency and quality of service. It is desirable to run a service in a fully automated operation environment. Automated problem resolution, however, is difficult. It is particularly difficult for weakly-coupled service composition, since the coupling is not defined at design time. Monitoring software systems are designed to actively capture events and automatically generate incident tickets or event tickets. Repeating events generate similar event tickets, which in turn have a vast number of repeated problem resolutions likely to be found in earlier tickets. We apply a recommender-system approach to the resolution of event tickets. This approach is based on the k-nearest neighbor algorithm, which recommends historical resolutions based on the similarity of their problem descriptions. If a historical ticket has a problem description similar to that of the incoming ticket, its resolution is likely to be recommended. In addition, we extend this methodology to take into account the possible falsity of some tickets and the quality of the ticket resolutions. In service management, recommending the resolutions of false tickets for real incoming tickets would mislead the administrators. An additional penalty is introduced into the algorithm to minimize such misleading recommendations in the results. Also, not every historical ticket has been effectively solved. Different ticket resolutions have different quality, which is not directly indicated in the data. Our algorithm also takes a quality estimation into account to maximize the usefulness of the recommended results. An extensive empirical evaluation on real ticket data sets demonstrates that the proposed algorithm achieves high accuracy and quality with little misleading information.
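The recommendation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the form of the penalty, or how quality is estimated, so the token-set Jaccard similarity, the multiplicative `false_penalty`, the `quality` field, and all names and sample tickets below are assumptions made for illustration only.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    description: str
    resolution: str
    is_false: bool = False  # hypothetical label: ticket was a false alarm
    quality: float = 1.0    # hypothetical resolution-quality estimate in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a stand-in for any text similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def recommend(incoming: str, history: list[Ticket], k: int = 3,
              false_penalty: float = 0.5) -> list[tuple[str, float]]:
    """Rank the resolutions of the k historical tickets most similar to `incoming`."""
    neighbors = sorted(history,
                       key=lambda t: jaccard(incoming, t.description),
                       reverse=True)[:k]
    scores: dict[str, float] = defaultdict(float)
    for t in neighbors:
        # Weight similarity by the (assumed) quality estimate of the resolution.
        score = jaccard(incoming, t.description) * t.quality
        if t.is_false:
            # Down-weight resolutions that come from false tickets.
            score *= false_penalty
        scores[t.resolution] += score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data, for illustration only.
history = [
    Ticket("disk usage above 90 percent on /var", "clean up old log files"),
    Ticket("disk usage above 95 percent on /tmp", "clean up old log files", quality=0.8),
    Ticket("cpu utilization high on db server", "no action needed", is_false=True),
    Ticket("database connection timeout", "restart database listener"),
]
ranked = recommend("disk usage above 92 percent on /var", history, k=3)
for resolution, score in ranked:
    print(f"{score:.3f}  {resolution}")
```

In this sketch, resolutions shared by several near neighbors accumulate score, while a resolution taken from a false ticket is kept in the ranking but pushed down rather than discarded.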
A Web Application for the Design and Deployment of Conversational Agents by Jerome McClendon, Clemson University
Abstract: A conversational agent is a computer system capable of interacting with a human through natural language conversation. The interaction is organized on a turn-by-turn basis where contributions to a conversation are responses to what has previously been said. Previous studies of conversational agents found that if the agent is unable to produce a relevant response to the user's last statement, the user may become frustrated and be less likely to use the system. The difficulty in retrieving a relevant response for a working conversational agent is tied to the system's ability to understand natural language. This is a difficult problem because language is imprecise and full of innuendos, idiosyncrasies, idioms and ambiguity, and there is no simple set of rules that can be created and given to a machine so that it can understand all the possible inputs. We have developed a response retrieval technique that takes natural language input and matches it to the most appropriate response stored in a knowledge base using statistical and natural language processing methods. The response for that input is then returned to the user as the most appropriate conversational response. We have also developed a web application with a graphical interface that allows developers to design their own conversational agent. The design process consists of creating a knowledge base that contains the set of potential sentences that the user might say to the agent. These potential sentences are referred to as queries. Every query is paired with a conversational response that is also stored in the knowledge base. Once the knowledge base has been created, users can interact with the conversational agent using our web service to send the user's natural language input to the response retrieval technique running in the cloud.
By offering our response retrieval technique as a web service we have allowed developers to focus on the design of the agent and not the creation of natural language understanding techniques.
Detecting & Analyzing Geographically Correlated Friends and Communities in Gowalla by Tommy Nguyen, Rensselaer Polytechnic Institute
Abstract: We collected friendship information and location data from a social media website called Gowalla to analyze the relationship between geographical space and friendship. First, we analyzed how geographic proximity shapes the structure of the social network by limiting joint activities among distant users. Second, we incorporated information about geographic locations that users visited into three selected community detection algorithms (Clique Percolation Method, Inference Algorithm, and GANXiS) to detect friendship communities where members are on average separated by one friendship link and also likely to be close to each other geographically. Third, we proposed a technique to generate covers of fixed sizes by using a combination of social and geographic information for the purpose of comparing them to communities detected by the selected algorithms. Finally, we used community quality measurements based on friendship link connectivity and geographic locations visited by users to examine detected communities for the purpose of understanding how the geographic proximity of friends affected the results.