The objective of this one-day workshop is to investigate opportunities in accelerating data management systems and workloads (which include traditional OLTP, data warehousing/OLAP, ETL, Streaming/Real-time, Business Analytics, and XML/RDF Processing) using processors (e.g., commodity and specialized Multi-core, GPUs, FPGAs, and ASICs), storage systems (e.g., Storage-class Memories like SSDs and Phase-change Memory), and programming models like MapReduce, Spark, CUDA, OpenCL, and OpenACC.
The current data management scenario is characterized by the following trends: traditional OLTP and OLAP/data warehousing systems are being used for increasingly complex workloads (e.g., petabytes of data, complex queries under real-time constraints, etc.); applications are becoming far more distributed, often consisting of different data processing components; non-traditional domains such as bio-informatics, social networking, mobile computing, sensor applications, and gaming are generating growing quantities of data of different types; economic and energy constraints are leading to greater consolidation and virtualization of resources; and analyzing vast quantities of complex data is becoming more important than traditional transactional processing.
At the same time, there have been tremendous improvements in CPU and memory technologies. Newer processors offer greater compute and memory capabilities and are optimized for multiple application domains. Commodity systems increasingly use multi-core processors with more than 6 cores per chip, and enterprise-class systems use processors with 8 cores per chip, where each core can execute up to 4 simultaneous threads. Specialized multi-core processors such as GPUs have brought the computational capabilities of supercomputers to cheaper commodity machines. On the storage front, flash-based solid-state drives (SSDs) are becoming smaller in size, cheaper in price, and larger in capacity. Exotic technologies like phase-change memory are on the near-term horizon and can be game-changers in the way data is stored and processed.
In spite of these trends, there is currently limited use of these technologies in the data management domain. Naive use of multi-core processors or SSDs often leads to unbalanced systems. It is therefore important to evaluate applications in a holistic manner to ensure effective utilization of CPU and memory resources. This workshop aims to understand the impact of modern hardware technologies on accelerating core components of data management workloads. Specifically, the workshop hopes to explore the interplay between overall system design, core algorithms, query optimization strategies, programming approaches, performance modelling and evaluation, etc., from the perspective of data management applications.
The suggested topics of
interest include, but are not restricted to:
- Hardware and System Issues in Domain-specific Accelerators
- New Programming Methodologies for Data Management Problems on Modern Hardware
- Query Processing for Hybrid Architectures
- Large-scale I/O-intensive (Big Data) Applications
- Parallelizing/Accelerating Analytical (e.g., Data Mining) Workloads
- Autonomic Tuning for Data Management Workloads on Hybrid Architectures
- Algorithms for Accelerating Multi-modal Multi-tiered Systems
- Energy Efficient Software-Hardware Co-design for Data Management Workloads
- Parallelizing non-traditional (e.g., graph mining) workloads
- Algorithms and Performance Models for modern Storage Sub-systems
- Exploitation of specialized ASICs
- Novel Applications of Low-Power Processors and FPGAs
- Exploitation of Transactional Memory for Database Workloads
- Exploitation of Active Technologies (e.g., Active Memory, Active Storage, and Networking)
Every year, we choose a theme around which the keynote or panel sessions are organized. This year, the workshop theme is "Interactions of Processor Architecture with Data Management and Analytics".
There will be three keynote presentations at the ADMS workshop. Tim Mattson, Principal Engineer, Intel Parallel Computing Lab, will talk about "Graph analytics with lots of CPU cores". Rick Hetherington, Vice President, Oracle Hardware Development, will talk about "SPARC at Oracle: Vectoring Processor Architecture at the Database". Finally, Haider Rizvi, Distinguished Engineer, IBM Systems and Technology Group, will talk about "IBM Power8: a processor built for big data".
Graph analytics with lots of CPU cores, Tim Mattson, Intel Parallel Computing Lab
Data analytics stresses every facet of a microprocessor: from the details of the cache hierarchy to the instruction set of the processor cores and the network between cores. In this talk, we will explore modern CPU designs through the lens of sparse matrix multiplication, one of the most important primitives for graph analytics. We'll discuss vector instruction sets (such as the Intel® AVX-512 instructions) and how to write portable code to exploit them. Then we'll follow the behavior of sparse matrix multiplication as we move from a single core to multiple cores on a processor, across the memory controller to the memory subsystem, and finally across the network between nodes in a cluster. The result is an end-to-end understanding of how graph analytics (as represented through sparse matrix multiplication) interacts with the features of a modern CPU-based system for data analytics.
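As background for the sparse-matrix view of graph analytics mentioned in the abstract, the sketch below shows a minimal compressed sparse row (CSR) matrix-vector multiply in C. It is an illustrative example of our own, not code from the talk; the irregular, gather-heavy inner loop is the kind of kernel that vector instruction sets such as AVX-512 aim to accelerate.

/*
 * Minimal CSR sparse matrix-vector multiply (SpMV) sketch.
 * Illustrative only: a simplified stand-in for the graph-analytics
 * primitive discussed in the talk, not the speaker's code.
 */
#include <stddef.h>

typedef struct {
    size_t        nrows;
    const size_t *row_ptr;  /* nrows+1 offsets into col_idx/values */
    const size_t *col_idx;  /* column index of each nonzero        */
    const double *values;   /* value of each nonzero               */
} csr_matrix;

/* y = A * x; the inner loop gathers x[col_idx[k]], which is where
 * vector gather instructions (e.g., in AVX-512) can help. */
void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->nrows; ++i) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->values[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}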
SPARC at Oracle: Vectoring Processor Architecture at the Database, Rick Hetherington, Oracle
SPARC has a long history and has gone through numerous transformations over the past 26 years. During the initial debate of RISC vs. CISC, SPARC was primarily deployed in workstations and technical applications. The name SPARC is an acronym for Scalable Processor ARChitecture, and scalability soon proved to be a key asset, lending itself perfectly to commercial applications, primarily database. Contemporary SPARC processors have their roots in the multicore/multithread (CMT) approach taken by the Niagara processors released in 2004. The target of CMT was simply the database. SPARC has flourished at Oracle, with a 20X performance rise culminating in the set of features known as Software-in-Silicon. This talk will focus on SPARC since the Oracle acquisition of Sun Microsystems in 2010. It will cover the process of developing world-class silicon targeted at the database, as well as a 'clouded' glimpse at what lies ahead.
IBM Power8: a Processor Built for Big Data, Haider Rizvi, IBM
IBM's POWER processors are the workhorses for all of IBM's Unix servers. IBM Unix servers lead the industry in capabilities and performance, especially for data-intensive workloads such as databases, app servers, etc. The current generation of POWER8-based systems has been designed especially for big data needs, with large caches, high memory bandwidth, and the ability to extend the processor's capabilities with CAPI (Coherent Accelerator Processor Interface) accelerators. In this presentation, I'll talk about the design decisions that enable big data analytics on these servers, and the optimization efforts that deliver industry-leading performance on data-intensive workloads such as data warehousing with DB2 BLU, Watson, Spark, etc. I'll also talk about the use of SIMD (single-instruction multiple-data) capabilities, exploiting the larger number of cores and hardware threads available on the POWER processors.
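To make the SIMD point concrete, here is a minimal, hypothetical sketch in C of a data-parallel column scan, the kind of kernel that SIMD units accelerate in analytic database engines. It is our illustrative example under that assumption, not code from DB2 BLU or the POWER8 toolchain.

/*
 * Branch-free range predicate over a column of 32-bit integers.
 * Illustrative only: a simple loop that compilers can auto-vectorize
 * with SIMD compare instructions on many cores/threads in parallel.
 */
#include <stddef.h>
#include <stdint.h>

/* Count values in [lo, hi]; each iteration is independent, so the loop
 * maps naturally to SIMD lanes and to partitioning across threads. */
size_t count_in_range(const int32_t *col, size_t n, int32_t lo, int32_t hi)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (col[i] >= lo) & (col[i] <= hi);
    return count;
}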
Session 1: 9.00-10.30 am
- 9.00 am: Welcome Comments
- 9.00-10.00 am: "Graph analytics with lots of CPU cores", Tim Mattson, Intel Parallel Computing Lab (Keynote Presentation) (slides)
- 10.00-10.25 am: "The Operator Variant Selection Problem on Heterogeneous Hardware", Viktor Rosenfeld, Max Heimel, Christoph Viebig and Volker Markl, TU Berlin (slides)
10.30-11.00 am: Break
Session 2: 11.00 am-12.30 pm
- 11.00-11.25 am: "Towards Dynamic Green-Sizing for Database Servers", Mustafa Korkmaz, Alexey Karyakin, Martin Karsten and Kenneth Salem, University of Waterloo (slides)
- 11.30-11.55 am: "Dynamic and Transparent Data Tiering for In-Memory Databases in Mixed Workload Environments", Carsten Meyer, Martin Boissier, Adrian Michaud, Jan Ole Vollmer, Ken Taylor, David Schwalb, Matthias Uflacker and Kurt Roedszus, HPI and EMC Corp. (slides)
- 12.00-12.25 pm: "Towards Adaptive Resource Allocation for Database Workloads", Cong Guo and Martin Karsten, University of Waterloo (slides)
12.30-2.00 pm: Lunch
Session 3: 2.00-3.30 pm
- 2.00-2.25 pm: "nvm malloc: Memory Allocation for NVRAM", David Schwalb, Tim Berning, Martin Faust, Markus Dreseler and Hasso Plattner, HPI and SAP (slides)
- 2.30-3.30 pm: "SPARC at Oracle: Vectoring Processor Architecture at the Database", Rick Hetherington, Oracle (Keynote Presentation)
3.30-4.00 pm: Break
Session 4: 4.00-5.30 pm
- 4.00-4.25 pm: "Optimizing GPU-accelerated Group-By and Aggregation", Tomas Karnagel, Rene Mueller and Guy Lohman, IBM Almaden Research Center (slides)
- 4.30-5.30 pm: "IBM Power8: a Processor Built for Big Data", Haider Rizvi, IBM Systems and Technology Group (Keynote Presentation)
Workshop Co-Chairs
For questions regarding the workshop, please send email to contact@adms-conf.org.
Program Committee
- Reza Azimi, Huawei
- Nipun Agarwal, Oracle Labs
- Robert Halstead, University of California, Riverside
- Rashed Bhatti, IBM Almaden Research
- Christoph Dubach, University of Edinburgh
- Franz Faerber, SAP
- Arno Jacobsen, University of Toronto
- Hyojun Kim, Datos IO, Inc
- Thomas Kissinger, TU Dresden
- Qiong Luo, HKUST
- Stefan Manegold, CWI
- Sina Merji, IBM Toronto
- Duane Merrill, Nvidia
- Rupesh Nasre, IIT Madras
- Mohammad Sadoghi, IBM Watson Research
- Nadathur Satish, Intel
- Sayantan Sur, Intel
- Sudhakar Yalamanchili, Georgia Tech
- Pinar Tozun, IBM Almaden Research
- Jianting Zhang, CUNY
Important Dates
- Paper Submission: Monday, June 15, 2015, 11.59 pm PST
- Notification of Acceptance: Wednesday, July 1, 2015
- Camera-ready Submission: Friday, July 17, 2015
- Workshop Date: Monday, August 31, 2015
The workshop proceedings will be published by VLDB and indexed via DBLP.
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the VLDB15 conference. You can find them here.
It is the authors' responsibility to ensure that their submissions adhere strictly to the VLDB format detailed here. In particular, modifying the format with the objective of squeezing in more material is not allowed. Submissions that do not comply with these formatting guidelines will be rejected without review.
The paper length for a full paper is limited to 12 pages.