The objective of this one-day workshop is to investigate opportunities in accelerating data management
systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Real-time,
Business Analytics, and XML/RDF Processing) using processors (e.g., commodity and specialized Multi-core,
GPUs, FPGAs, and ASICs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory),
and programming models like MapReduce, GraphLab, CUDA, OpenCL, and OpenACC.
The current data management scenario is characterized by the following trends: traditional OLTP and
OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data,
complex queries under real-time constraints, etc.); applications are becoming far more distributed, often
consisting of different data processing components; non-traditional domains such as bio-informatics, social
networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of
different types; economic and energy constraints are leading to greater consolidation and virtualization
of resources; and analyzing vast quantities of complex data is becoming more important than traditional
transactional processing.
At the same time, there have been tremendous improvements in CPU and memory technologies.
Newer processors offer greater compute and memory capabilities and are optimized for multiple
application domains. Commodity systems are increasingly using multi-core processors with more than 6
cores per chip, and enterprise-class systems are using processors with 8 cores per chip,
where each core can execute up to 4 simultaneous threads. Specialized
multi-core processors such as GPUs have brought the computational capabilities of supercomputers
to cheaper commodity machines. On the storage front, flash-based solid state devices (SSDs) are
becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like Phase-change
memory are on the near-term horizon and can be game-changers in the way data is stored and processed.
In spite of these trends, there is currently limited usage of these technologies in the data management domain.
Naive usage of multi-core processors or SSDs often leads to an unbalanced system. It is therefore important to
evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This
workshop aims to understand the impact of modern hardware technologies on accelerating core components
of data management workloads. Specifically, the workshop hopes to explore the interplay between overall
system design, core algorithms, query optimization strategies, programming approaches, performance modelling
and evaluation, etc., from the perspective of data management applications.
The suggested topics of interest include, but are not restricted to:
- Hardware and System Issues in Domain-specific Accelerators
- New Programming Methodologies for Data Management Problems on Modern Hardware
- Query Processing for Hybrid Architectures
- Large-scale I/O-intensive (Big Data) Applications
- Parallelizing/Accelerating Analytical (e.g., Data Mining) Workloads
- Autonomic Tuning for Data Management Workloads on Hybrid Architectures
- Algorithms for Accelerating Multi-modal Multi-tiered Systems
- Energy Efficient Software-Hardware Co-design for Data Management Workloads
- Parallelizing Non-traditional (e.g., Graph Mining) Workloads
- Algorithms and Performance Models for Modern Storage Sub-systems
- Exploitation of specialized ASICs
- Novel Applications of Low-Power Processors and FPGAs
- Exploitation of Transactional Memory for Database Workloads
- Exploitation of Active Technologies (e.g., Active Memory, Active Storage, and Networking)
8.45 am - 5 pm, Dragon Hotel, Diamond 4
8.45 am: Welcome Comments
8.50-10.20 am: Keynote by Prof. Dhabaleswar K. (DK) Panda, The Ohio State University
Accelerating Data Management and Processing on Modern Clusters with RDMA-Enabled Interconnects (slides)
Bio: Dhabaleswar K. (DK) Panda is a Professor of Computer Science and
Engineering at the Ohio State University. He has published over 300
papers in major journals and international conferences. Dr. Panda and
his research group members have been doing extensive research on
modern networking technologies including InfiniBand, High-Speed
Ethernet and RDMA over Converged Enhanced Ethernet (RoCE). The
MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and
MVAPICH2-X software libraries, developed by his research group
(http://mvapich.cse.ohio-state.edu), are currently being used by more
than 2,150 organizations worldwide (in 72 countries). This software
has enabled several InfiniBand clusters to get into the latest TOP500
ranking during the last decade. More than 210,000 downloads of this
software have taken place from the project's website alone. This
software package is also available with the software stacks of many
network and server vendors, and Linux distributors. The new
RDMA-enabled Apache Hadoop package, consisting of acceleration for
HDFS, MapReduce and RPC, is publicly available from
http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda's research has been
supported by funding from the US National Science Foundation, the US
Department of Energy, and several industry partners, including Intel,
Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE
Fellow and a member of ACM.
Abstract: Managing and processing large volumes of data is a significant
challenge being faced by the Big Data community. This has a substantial
impact on designing and utilizing modern data management and
processing systems in multiple tiers, from the front-end data
accessing and serving to the back-end data analytics. This scenario
has led to the emergence of many Big Data middleware systems, such
as Memcached, HBase, Hadoop, and Spark. The design and deployment of
modern clusters during the last decade has largely been fueled by the
following three factors: 1) advances in multi-core/many-core
technologies and accelerators, 2) Remote Direct Memory Access
(RDMA)-enabled networking (InfiniBand and RoCE), and 3) Solid State
Drives (SSDs). However, current Big Data middleware and the
associated applications are not able to fully take advantage of these
advanced features on modern clusters. This talk will examine
opportunities and challenges in accelerating performance of Big Data
middleware (including Memcached, HBase, Hadoop, and Spark) in
different data management and processing tiers with the latest
technologies available on modern clusters. An overview of the
High-Performance Big Data project (http://hibd.cse.ohio-state.edu)
will be presented. High-performance designs using RDMA to accelerate
Memcached, HBase, Hadoop, and Spark frameworks on InfiniBand and RoCE
clusters will be demonstrated. The presentation will also include
initial results on optimizing performance of Memcached with SSD
support. An overview of a set of benchmarks (OSU HiBD Benchmarks,
OHB) to evaluate performance of different components in an isolated
manner will be presented.
10.20-10.30 am: Break
Session 1: Compute Acceleration (10.30 am - 12.15 pm)
- Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs,
Haicheng Wu (Georgia Institute of Technology), Daniel Zinn, Molham Aref (LogicBlox Inc.),
and Sudhakar Yalamanchili (Georgia Institute of Technology) (slides)
- Data Parallel Quadtree Indexing and Spatial Query Processing of Complex Polygon Data on GPUs,
Jianting Zhang, Simin You (City University of New York), and Le Gruenwald (The University
of Oklahoma) (slides)
- HASHI: An Application Specific Instruction Set Extension for Hashing,
Oliver Arnold, Sebastian Haas, Gerhard Fettweis, Benjamin Schlegel, Thomas Kissinger,
Tomas Karnagel, and Wolfgang Lehner (Technische Universität Dresden) (slides)
- QTM: Modelling Query Execution with Tasks,
Steffen Zeuch and Johann-Christoph Freytag (Humboldt Universität zu Berlin) (slides)
12.15-1.30 pm: Lunch
Session 2: Memory/Storage Acceleration (1.30-3.15 pm)
- Flash-Conscious Cache Population for Enterprise Database Workloads,
Hyojun Kim (IBM Research, Almaden), Ioannis Koltsidas, Nikolas Ioannou (IBM Research, Zurich),
Sangeetha Seshadri, Paul Muech, Clement Dickey, and Lawrence Chiu (IBM Research, Almaden) (slides)
- A Prolegomenon on OLTP Database Systems for Non-Volatile Memory,
Justin Debrabant (Brown University), Joy Arulraj, Andrew Pavlo (Carnegie Mellon University),
Michael Stonebraker (MIT CSAIL), Stan Zdonik (Brown University), and
Subramanya Dulloor (Intel Labs) (slides)
- An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases,
Bernhard Höppner (SAP AG), Ahmadshah Waizy (Fujitsu Technology Solutions GmbH),
and Hannes Rauhe (SAP AG) (slides)
- ERIS: A NUMA-Aware In-Memory Storage Engine for Analytical Workload,
Thomas Kissinger, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka,
and Wolfgang Lehner (Technische Universität Dresden) (slides)
3.15-3.30 pm: Break
3.30-5 pm: Keynote by Tirthankar Lahiri, Oracle Corp.
Oracle's In-Memory Data Management Strategy: In-Memory in all Tiers, and for all Workloads (slides)
Bio: Tirthankar Lahiri is the Vice President of Development at Oracle, and is
responsible for the Data Technologies area for the Oracle Database
(this area covers Data, Space, and Transaction management) as well as
the Oracle TimesTen In-Memory Database. Tirthankar has 18 years of
experience in the Database industry. He has worked extensively in a
variety of Database Systems areas, for which he holds multiple
patents: Manageability, Performance, Scalability, High Availability,
Caching, Distributed Concurrency Control, In-Memory Data Management,
etc. Tirthankar has a B.Tech in Computer Science from IIT, Kharagpur,
and an MS in Electrical Engineering from Stanford University. He was
in the PhD program at Stanford and his research areas included
Multiprocessor Operating Systems and Semi-Structured Data.
Abstract: We describe Oracle's pragmatic two-pronged approach for
delivering In-Memory Technology to both OLTP and Analytics
use cases. On the one hand, Oracle TimesTen is a specialized,
memory-resident relational database, designed for ultra-low response
time. TimesTen typically runs within the application tier as an
embeddable database and is deployed by thousands of customers
requiring low-latency database access. On the other hand, the new
Oracle 12c Database In-Memory option delivers general-purpose
in-memory capabilities for the vast range of enterprise applications
and massive database sizes supported by Oracle Database. Since the
In-Memory option is built seamlessly into the Oracle Database engine,
it is fully compatible with all of the functionality and high
availability mechanisms of the Oracle Database, and can be used by
applications without any changes whatsoever. The In-Memory option
provides a unique dual-format row/column in-memory representation, thus
avoiding the tradeoffs inherent in single-format in-memory
databases. Unlike traditional in-memory databases, the new In-Memory
option does not limit the size of the database to the size of
available DRAM: numerous optimizations spanning DRAM, Flash, Disk,
as well as machines in a RAC cluster, allow databases of virtually
unlimited size. We describe the new In-Memory option in the context of
the full spectrum of Oracle's numerous storage optimizations, showing
that in-memory data management is an important and natural
evolutionary enhancement to the existing deep technology stack of the
Oracle Database.
- Paper Submission: Friday, June 27, 2014, 11.59 pm PST. (Submission is closed.)
- Notification of Acceptance: Friday, July 18, 2014
- Camera-ready Submission: Monday, July 28, 2014
- Workshop Date: Monday, September 1, 2014
The workshop proceedings will be published by VLDB and indexed via DBLP.
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the VLDB14 conference. You can find them here.
It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, authors may not modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.
The paper length is limited to 12 pages. Shorter submissions are acceptable as long as they adhere to the VLDB format.
Workshop Co-Chairs
For questions regarding the workshop, please send email to contact@adms-conf.org.
Program Committee
- Gustavo Alonso, ETH Zurich
- Nipun Agarwal, Oracle Labs
- T. Araki, NEC
- Sean Baxter, Nvidia
- David Cunningham, Google
- Christophe Dubach, University of Edinburgh
- Blake Fitch, IBM Watson Research
- Franz Faerber, SAP
- Arun Jagatheesan, Samsung
- Kajan Kanagaratnam, IBM Toronto
- Alfons Kemper, TU Munich
- Rajaram Krishnamurthy, IBM
- Qiong Luo, HKUST
- Stefan Manegold, CWI
- C. Mohan, IBM Almaden Research
- Nadathur Satish, Intel
- Sayantan Sur, Intel
- Xiaodong Zhang, Ohio State University