ADMS 2013
Fourth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures
Monday, August 26, 2013
In conjunction with VLDB 2013, Riva del Garda, Trento, Italy
The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (including traditional OLTP, data warehousing/OLAP, ETL, streaming/real-time, business analytics, and XML/RDF processing) using processors (e.g., commodity and specialized multi-core CPUs, GPUs, and FPGAs), storage systems (e.g., storage-class memories such as SSDs and phase-change memory), and hybrid programming models such as CUDA, OpenCL, and OpenACC.
The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data, complex queries under real-time constraints); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bio-informatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.
At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities and are optimized for multiple application domains. Commodity systems increasingly use multi-core processors with more than 4 cores per chip, and enterprise-class systems use processors with 8 cores per chip, where each core can execute up to 4 simultaneous threads. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid-state devices (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies such as phase-change memory are on the near-term horizon and could be game-changers in the way data is stored and processed.
In spite of these trends, these technologies currently see limited use in the data management domain. Naive use of multi-core processors or SSDs often leads to an unbalanced system. It is therefore important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, performance modeling and evaluation, etc., from the perspective of data management applications.
The suggested topics of interest include, but are not restricted to:
- Hardware and System Issues in Domain-specific Accelerators
- New Programming Methodologies for Data Management Problems on Modern Hardware
- Query Processing for Hybrid Architectures
- Large-scale I/O-intensive (Big Data) Applications
- Parallelizing/Accelerating Analytical (e.g., Data Mining) Workloads
- Autonomic Tuning for Data Management Workloads on Hybrid Architectures
- Algorithms for Accelerating Multi-modal Multi-tiered Systems
- Energy Efficient Software-Hardware Co-design for Data Management Workloads
- Parallelizing non-traditional (e.g., graph mining) workloads
- Algorithms and Performance Models for modern Storage Sub-systems
- Data Layout Issues for Modern Memory and Storage Hierarchies
- Novel Applications of Low-Power Processors (e.g., ARM Processor-based Systems)
- New Benchmarking Methodologies for Storage-class Memories
Hadoop: Past, Present, and (possibly) Future
Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Pivotal Inc. (Slides)
Bio: Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years, and his area of specialization for his PhD (Computer Science) from the University of Illinois at Urbana-Champaign. He has worked at the Center for Development of Advanced Computing (C-DAC), the National Center for Supercomputing Applications (NCSA), the Center for Simulation of Advanced Rockets, Siebel Systems, PathScale Inc. (acquired by QLogic), Yahoo!, and LinkedIn. Currently, he is the Chief Scientist, Machine Learning Platforms, at Pivotal Inc.
Active Storage: Exploring a Scalable, Compute-In-Storage Model by Extending the Blue Gene/Q Architecture with Integrated Non-volatile Memory
Blake Fitch, Senior Technical Staff Member, IBM T. J. Watson Research Center (Slides)
Bio: Blake G. Fitch first joined the IBM T.J. Watson Research Center in 1985 as a student. He received his bachelor's degree in Computer Science from Antioch College in 1987. Since 1987, he has remained with IBM to pursue interests in distributed
and parallel systems. In 1990 he joined the Scalable Parallel Systems
group, contributing to the research and development that culminated in
the IBM scalable parallel system (SP) product. Since then, his
research interests have focused on application frameworks and
programming models suitable for production parallel computing
environments. Practical application of this work includes
contributions to the transputer based control system for IBM's CMOS
S/390 mainframes (IBM Boeblingen, Germany 1994) and the architecture
of IBM's Automatic Fingerprint Identification System parallel
application (IBM Hursley, UK, 1996). In 1999, he joined the Blue Gene
Project as the application architect for BlueMatter, a scalable
classical molecular dynamics package. Mr. Fitch is currently a Senior
Technical Staff Member at IBM Research and is the architect and
technical lead for the Active Storage project. The Active Storage
project aims to integrate non-volatile memory into highly scalable
parallel system architectures (currently IBM Blue Gene/Q) and to
explore system software and applications that leverage the new
capabilities of such systems.
9 am-5.30 pm, Meeting Room 300
8.50 am: Welcome Comments
9-10.30 am: Keynote by Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Pivotal Inc.
Hadoop: Past, Present, and (possibly) Future (Slides)
Apache Hadoop has rapidly become the de facto data processing platform, and is often mentioned synonymously with "Big Data". Hadoop started as a project within Apache Lucene and Nutch to scale the content backend for a web search engine. However, it is currently used in a majority of Fortune 500 companies, and in many other application domains, such as fraud detection at credit card companies, healthcare analytics, and churn detection and prevention at telecom companies. In this talk, we will reminisce about the early days of Hadoop at Yahoo, and the lessons learned in scaling this platform from a 20-node prototype to a datacenter-wide production deployment. We will give an overview of the current state of the Hadoop ecosystem, and present some prominent patterns and use cases of this platform. We will also discuss how Hadoop is evolving, and its future as a platform for "Big Data" processing.
10.30-11 am: Coffee Break
11 am-12.30 pm: Session 1: Compute Optimizations
12.30-2.00 pm: Lunch
2.00-3.30 pm: Session 2: Memory/Storage Optimizations
3.30-4.00 pm: Coffee Break
4.00-5.30 pm: Keynote by Blake Fitch, Senior Technical Staff Member, IBM T. J. Watson Research Center
Active Storage: Exploring a Scalable, Compute-In-Storage Model by Extending the Blue Gene/Q Architecture with Integrated Non-volatile Memory (Slides)
Emerging storage class memories offer a set of challenges and
opportunities in system architecture, programming models, and
application design. We are exploring the close integration of emerging
solid-state storage technologies in conjunction with high performance
networks and integrated processing capability. Specifically, we
consider the extension of the Blue Gene/Q architecture by integrating
Flash into the node to enable a scalable, data-centric computing
platform. We are using BG/Q as a rapid prototyping platform allowing
us to build a research system based on an infrastructure with proven
scalability to thousands of nodes. Our work also involves enabling a
Linux environment with standard network interfaces on the BG/Q
hardware. We plan to explore applications of this system architecture
including existing file systems and middleware as well as more
aggressive compute-in-storage approaches. Compute-in-storage is
intended to enable the use of high-performance computing (HPC)
programming techniques (e.g., MPI) to implement data-centric
algorithms (e.g., sort, join, graph) that execute on processing
elements embedded within a
storage system. This presentation will review the architectural
extension to BG/Q, present a progress report on the project, and
describe some early results.
- Paper Submission: Friday, June 28, 2013 (Updated)
- Notification of Acceptance: Friday, July 19, 2013
- Camera-ready Submission: Monday, July 29, 2013
- Workshop Date: Monday, August 26, 2013
The workshop proceedings will be published by VLDB.
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the VLDB 2013 conference. You can find them here.
It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.
The paper length is limited to 12 pages. Submissions of lesser length are acceptable as long as they adhere to the VLDB format.
Workshop Co-Chairs
For questions regarding the workshop please send email to contact@adms-conf.org.
Program Committee
- Peter Baumann, Jacobs University
- John Davis, Microsoft Research
- Gregory Diamos, Nvidia
- Christophe Dubach, University of Edinburgh
- Frank Dehne, Carleton University
- Maya Gokhale, Lawrence Livermore National Laboratory
- Francesco Fusco, ETH Zurich
- Tirthankar Lahiri, Oracle
- Alfons Kemper, TU Munich
- Rajaram Krishnamurthy, IBM
- Stefan Manegold, CWI
- C. Mohan, IBM Almaden Research
- Nadathur Satish, Intel
- Ji-Yong Shin, Cornell University
- Sayantan Sur, Intel