Paper Digest: SIGMOD 2024 Papers & Highlights
Interested users can choose to read all SIGMOD-2024 papers in our digest console, which supports more features.
To search for papers presented at SIGMOD-2024 on a specific topic, use the search by venue (SIGMOD-2024) service. To summarize the latest research published at SIGMOD-2024 on a specific topic, use the review by venue (SIGMOD-2024) service. To synthesize the findings from SIGMOD 2024 into comprehensive reports, try SIGMOD-2024 Research. To browse papers by author, see the comprehensive list of all SIGMOD-2024 authors & their papers.
This curated list is created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized and comprehensive updates on the latest research in your field. It also lets you read articles, write articles, get answers, conduct literature reviews and generate research reports.
TABLE 1: Paper Digest: SIGMOD 2024 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | AirIndex: Versatile Index Tuning Through Data and Storage. Highlight: Can we develop a systematic method for finding those optimal design parameters? Ideally, the method must have the potential to generate almost any existing index or a novel combination of them for the fastest possible lookup. In this work, we present new data and an I/O-aware index builder (called AirIndex) that can find high-speed hierarchical index designs in a principled way. | Supawit Chockchowwat; Wenjie Liu; Yongjoo Park |
| 2 | Closest Pairs Search Over Data Stream. Highlight: While existing works have studied continuous 1-closest pair query (i.e., k=1) over dynamic data environments, which allow for object insertions/deletions, they require high computational costs and cannot easily support KCP search with k>1. This paper investigates the problem of KCP search over data stream, aiming to incrementally maintain as few pairs as possible to support KCP search with arbitrarily k. To achieve this, we introduce the concept of NNS (short for Nearest Neighbour pair-Set), which consists of all the nearest neighbour pairs and allows us to support KCP search via only accessing O(k) objects. | Rui Zhu; Bin Wang; Xiaochun Yang; Baihua Zheng |
| 3 | BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads Via Compiler Approach. Highlight: With the insight that what fusion optimization relies upon is tensor shape relationships between adjacent operators rather than exact shape values, it proposes the dynamic shape fusion approach based on shape information propagation. | Zhen Zheng; Zaifeng Pan; Dalin Wang; Kai Zhu; Wenyi Zhao; Tianyou Guo; Xiafei Qiu; Minmin Sun; Junjie Bai; Feng Zhang; Xiaoyong Du; Jidong Zhai; Wei Lin |
| 4 | Efficient Algorithm for Budgeted Adaptive Influence Maximization: An Incremental RR-set Update Approach. Highlight: However, this incurs prohibitive computation, imposing limitations on real applications. To solve this dilemma, we propose an incremental update approach. | Qintian Guo; Chen Feng; Fangyuan Zhang; Sibo Wang |
| 5 | Efficient Core Maintenance in Large Bipartite Graphs. Highlight: Recently, a few works have attempted to study how to maintain (α, β)-cores in the dynamic bipartite graph, but their performance is still far from perfect, due to the huge size of graphs and their frequent changes. To alleviate this issue, in this paper we present efficient (α, β)-core maintenance algorithms over bipartite graphs. | Wensheng Luo; Qiaoyuan Yang; Yixiang Fang; Xu Zhou |
| 6 | Efficient Maximum K-Defective Clique Computation with Improved Time Complexity. Highlight: In this paper, we advance the state of the art for exact maximum k-defective clique computation, in terms of both time complexity and practical performance. | Lijun Chang |
| 7 | Enriching Recommendation Models with Logic Conditions. Highlight: This paper proposes RecLogic, a framework for improving the accuracy of machine learning (ML) models for recommendation. | Lihang Fan; Wenfei Fan; Ping Lu; Chao Tian; Qiang Yin |
| 8 | Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach. Highlight: In this paper, we focus on the problem of finding a set of QCs containing all MQCs but deviate from further sharpening the pruning techniques as existing methods do. | Kaiqiang Yu; Cheng Long |
| 9 | FedCSS: Joint Client-and-Sample Selection for Hard Sample-Aware Noise-Robust Federated Learning. Highlight: Currently, there lacks an FL approach that can effectively distinguish hard samples (which are beneficial) from noisy samples (which are harmful). To bridge this gap, we propose the Federated Client and Sample Selection (FedCSS) approach. | Anran Li; Yue Cao; Jiabao Guo; Hongyi Peng; Qing Guo; Han Yu |
| 10 | Learning to Optimize LSM-trees: Towards A Reinforcement Learning Based Key-Value Store for Dynamic Workloads. Highlight: To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies, the bottleneck of dynamic key-value stores. | Dingheng Mo; Fanchao Chen; Siqiang Luo; Caihua Shan |
| 11 | Memory-Efficient and Flexible Detection of Heavy Hitters in High-Speed Networks. Highlight: In this paper, we propose a flexible sketch called SwitchSketch that embraces dynamic and skewed traffic for efficient and accurate heavy-hitter detection. | He Huang; Jiakun Yu; Yang Du; Jia Liu; Haipeng Dai; Yu-E Sun |
| 12 | Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation. Highlight: This paper proposes a new random hypergraph model called Hyperedge Expansion Model (HEM), a non-AON hypergraph modularity function called Partial Innerclusteredge modularity (PI) based on HEM, a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI, and novel computation optimizations. | Zijin Feng; Miao Qiao; Hong Cheng |
| 13 | OptiQL: Robust Optimistic Locking for Memory-Optimized Indexes. Abstract: Modern memory-optimized indexes often use optimistic locks for concurrent accesses. Read operations can proceed optimistically without taking the lock, greatly improving … | Ge Shi; Ziyi Yan; Tianzheng Wang |
| 14 | Origin-Destination Travel Time Oracle for Map-based Services. Highlight: To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. | Yan Lin; Huaiyu Wan; Jilin Hu; Shengnan Guo; Bin Yang; Youfang Lin; Christian S. Jensen |
| 15 | SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications. Highlight: In this paper, we introduce SAGA, a framework for automatically generating the top-K most effective data cleaning pipelines. | Shafaq Siddiqi; Roman Kern; Matthias Boehm |
| 16 | Secure Sampling for Approximate Multi-party Query Processing. Highlight: Thus, the goal of approximate query processing (AQP) with sublinear costs seems unachievable under MPC. To get around this inherent barrier, in this paper we take a two-stage approach: In the offline stage, we generate a batch of n/s samples with (n) total cost, which can then be consumed to answer queries as they arrive online. | Qiyao Luo; Yilei Wang; Ke Yi; Sheng Wang; Feifei Li |
| 17 | SH2O: Efficient Data Access for Work-Sharing Databases. Highlight: SH2O is based on the idea that an access pattern based on judiciously selected multidimensional ranges can replace a set of shared filters. To exploit the idea in an efficient and scalable manner, SH2O uses a three-tier approach: i) it uses spatial indices to efficiently access the ranges without overfetching, ii) it uses an optimizer to choose which filters to replace such that it maximizes cost-benefit for index accesses, and iii) it exploits partitioning schemes and independently accesses each data partition to reduce the number of filters in the access pattern. | Panagiotis Sioulas; Ioannis Mytilinis; Anastasia Ailamaki |
| 18 | TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs. Highlight: We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. | Laxman Dhulipala; Jakub Łącki; Jason Lee; Vahab Mirrokni |
| 19 | GEqO: ML-Accelerated Semantic Equivalence Detection. Highlight: Unfortunately, existing solutions fall short of satisfying these requirements. In this paper, we take a major step towards filling this gap by proposing GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale. GEqO introduces two machine-learning-based filters that quickly prune out nonequivalent subexpressions and employs a semi-supervised learning feedback loop to iteratively improve its model with an intelligent sampling mechanism. | Brandon Haynes; Rana Alotaibi; Anna Pavlenko; Jyoti Leeka; Alekh Jindal; Yuanyuan Tian |
| 20 | The Battleship Approach to The Low Resource Entity Matching Problem. Highlight: To overcome the challenge of obtaining sufficient labeled data we offer a new active learning approach, focusing on a selection mechanism that exploits unique properties of entity matching. | Bar Genossar; Avigdor Gal; Roee Shraga |
| 21 | Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control. Highlight: In this paper, we propose Udon, a novel debugger to support fine-grained debugging of UDFs. | Yicong Huang; Zuozhi Wang; Chen Li |
| 22 | ChainKV: A Semantics-Aware Key-Value Store for Ethereum System. Abstract: The Log-Structure Merged tree (LSM-tree) based key-value (KV) store has been widely adopted as the storage engine for blockchain systems, such as Ethereum, in which blockchain … | Zehao Chen; Bingzhe Li; Xiaojun Cai; Zhiping Jia; Lei Ju; Zili Shao; Zhaoyan Shen |
| 23 | Proving Query Equivalence Using Linear Integer Arithmetic. Highlight: Our key insight is to use the theory of LIA^*, which extends linear integer arithmetic formulas with unbounded sums and provides algorithms to translate a LIA^* formula to a LIA formula that can be decided using existing SMT solvers. | Haoran Ding; Zhaoguo Wang; Yicun Yang; Dexin Zhang; Zhenglin Xu; Haibo Chen; Ruzica Piskac; Jinyang Li |
| 24 | A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations. Highlight: 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. | Neha Makhija; Wolfgang Gatterbauer |
| 25 | ADGNN: Towards Scalable GNN Training with Aggregation-Difference Aware Sampling. Highlight: To alleviate accuracy degradation, we introduce a new metric, Aggregation Difference (AD), that quantifies the gap between sampled and full neighbor set aggregation. | Zhen Song; Yu Gu; Tianyi Li; Qing Sun; Yanfeng Zhang; Christian S. Jensen; Ge Yu |
| 26 | ALP: Adaptive Lossless Floating-Point Compression. Highlight: We created ALP after carefully studying the datasets used to evaluate the previous schemes. | Azim Afroozeh; Leonardo X. Kuffo; Peter Boncz |
| 27 | Anchor: A Library for Building Secure Persistent Memory Systems. Abstract: Cloud infrastructure is experiencing a shift towards disaggregated setups, especially with the introduction of the Compute Express Link (CXL) technology, where byte-addressable … | Dimitrios Stavrakakis; Dimitra Giantsidi; Maurice Bailleu; Philip Sändig; Shady Issa; Pramod Bhatotia |
| 28 | AS-Parser: Log Parsing Based on Adaptive Segmentation. Highlight: However, all existing approaches share the common drawback: the flat segmentation with fixed delimiters fails to understand the structural information of logs, which causes low parsing accuracy. To address this problem, we propose a novel log parsing approach, AS-Parser. | Xiaolei Chen; Peng Wang; Jia Chen; Wei Wang |
| 29 | Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools. Highlight: In this work, we propose a novel strategy combining rapidly scalable but expensive resources with slow to start but inexpensive virtual machines to gain the benefit of elasticity without losing out on the cost savings of provisioned resources. | Matthew Perron; Raul Castro Fernandez; David DeWitt; Michael Cafarella; Samuel Madden |
| 30 | ChainedFilter: Combining Membership Filters By Chain Rule. Highlight: In this paper, we propose a unified and complete theory, namely chain rule, for general membership problems, which encompasses both approximate and exact membership as extreme cases. | Haoyu Li; Liuhui Wang; Qizhi Chen; Jianan Ji; Yuhan Wu; Yikai Zhao; Tong Yang; Aditya Akella |
| 31 | Correlation Joins Over Time Series Data Streams Utilizing Complementary Dimension Reduction and Transformation. Highlight: We propose CorrJoin, short for Correlation Join, that combines a complementary dimension reduction and transformation step with a subsequent double-filtering step. | AmirReza Alizade Nikoo; Michael H. Böhlen; Sven Helmer |
| 32 | Demystifying The QoS and QoE of Edge-hosted Video Streaming Applications in The Wild with SNESet. Highlight: To close the knowledge gap, we collect SNESet, an active measurement dataset comprising QoS and QoE telemetry metrics of 8 VSAs over four months, covering end-users from 798 edge sites, 30 cities, and 3 ISPs in one country. We characterize and compare the QoS and QoE metrics in SNESet with existing publicly available datasets, highlighting that SNESet includes a significantly greater number of metrics (horizontal diversity and vertical hierarchy) and provides more comprehensive coverage of specific metrics. Moreover, we qualitatively and quantitatively analyze the impact of QoS on QoE in both domain-general and domain-specific scenarios. | Yanan Li; Guangqing Deng; Changming Bai; Jingyu Yang; Gang Wang; Hao Zhang; Jin Bai; Haitao Yuan; Mengwei Xu; Shangguang Wang |
| 33 | DGC: Training Dynamic Graphs with Spatio-Temporal Non-Uniformity Using Graph Partitioning By Chunks. Highlight: However, dynamic graphs in practice are not uniformly structured, with some snapshots being very dense while others are sparse. To address this issue, we propose DGC, a distributed DGNN training system that achieves a 1.25× – 7.52× speedup over the state-of-the-art in our testbed. | Fahao Chen; Peng Li; Celimuge Wu |
| 34 | DP-starJ: A Differential Private Scheme Towards Analytical Star-Join Queries. Highlight: In this paper, we are thus motivated to propose DP-starJ, a novel Differentially Private framework for star-Join queries. | Congcong Fu; Hui Li; Jian Lou; Huizhen Li; Jiangtao Cui |
| 35 | Efficient Approximation Framework for Attribute Recommendation. Highlight: Firstly, their solution is tailored only for two limited metric functions: the Earth Mover distance and Euclidean distance, and cannot be generalized to more complicated metric functions. Besides, their solution still aims to return the exact top-k answers via the sampling method, which still causes high running costs as shown in our experiment. Motivated by these limitations, we propose a general approximation framework for attribute recommendation that efficiently returns the top-k attributes with theoretical guarantees while supporting an extensive range of metric functions, such as the Kolmogorov-Smirnov test (KS-test), Chebyshev distance, the Earth Mover distance, Euclidean distance, and with the potential to more metrics. | Xingguang Chen; Fangyuan Zhang; Jinchao Huang; Sibo Wang |
| 36 | Equitable Top-k Results for Long Tail Data. Highlight: This causes a handful of popular records (products, items, etc) getting overexposed and always be returned to the user query, whereas, there exists a long tail of niche records that may be equally desirable (have similar utility). To alleviate this, we propose θ-Equiv-top-k-MMSP inside existing top-k algorithms – instead of returning a fixed top-k set, it generates all (or many) top-k sets that are equivalent in utility and creates a probability distribution over those sets. | Md Mouinul Islam; Mahsa Asadi; Senjuti Basu Roy |
| 37 | F3KM: Federated, Fair, and Fast K-means. Highlight: This paper proposes a federated, fair, and fast k-means algorithm (F3KM) to solve the fair clustering problem efficiently in scenarios where data cannot be shared among different parties. | Shengkun Zhu; Quanqing Xu; Jinshan Zeng; Sheng Wang; Yuan Sun; Zhifeng Yang; Chuanhui Yang; Zhiyong Peng |
| 38 | FACET: Robust Counterfactual Explanation Analytics. Highlight: Thus, there is a rising demand for explainable AI systems which provide actionable steps for lay users to obtain their desired outcome. To meet this need, we propose FACET, the first explanation analytics system which supports a user in interactively refining counterfactual explanations for decisions made by tree ensembles. | Peter M. VanNostrand; Huayi Zhang; Dennis M. Hofmann; Elke A. Rundensteiner |
| 39 | Generation of Training Examples for Tabular Natural Language Inference. Highlight: For all steps, we introduce generic generation algorithms that take as input only the tables. | Jean-Flavien Bussotti; Enzo Veltri; Donatello Santoro; Paolo Papotti |
| 40 | Hierarchical Cut Labelling – Scaling Up Distance Queries on Road Networks. Highlight: In this paper, we propose a novel solution hierarchical cut 2-hop labelling (HC2L) to address the drawbacks of the existing works. | Muhammad Farhan; Henning Koehler; Robert Ohms; Qing Wang |
| 41 | High-Ratio Compression for Machine-Generated Data. Highlight: However, existing compression techniques tend to focus on general-purpose and data block approaches, but overlook the inherent structure of machine-generated data and hence result in low compression ratios or limited lookup efficiency. To address these limitations, we introduce the Pattern-Based Compression (PBC) algorithm, which specifically targets patterns in machine-generated data to achieve Pareto-optimality in most cases. | Jiujing Zhang; Zhitao Shen; Shiyu Yang; Lingkai Meng; Chuan Xiao; Wei Jia; Yue Li; Qinhui Sun; Wenjie Zhang; Xuemin Lin |
| 42 | HongTu: Scalable Full-Graph GNN Training on Multiple GPUs. Highlight: To efficiently train on large graphs, we present HongTu, a scalable full-graph GNN training system running on GPU-accelerated platforms. | Qiange Wang; Yao Chen; Weng-Fai Wong; Bingsheng He |
| 43 | Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries. Highlight: To this end, we propose a novel method named Lemo for the multi-query optimization problem. | Songsong Mo; Yile Chen; Hao Wang; Gao Cong; Zhifeng Bao |
| 44 | Lightweight Materialization for Fast Dashboards Over Joins. Highlight: Nevertheless, joins are among the most expensive operations in DBMSes, making the support of interactive dashboards over joins challenging. In this paper, we present Treant, a dashboard accelerator for queries over large joins. | Zezhou Huang; Eugene Wu |
| 45 | MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying. Highlight: Existing LSM-tree key-value stores mainly focus on the optimizations of local storage, which incurs sub-optimal performance when directly applied to hybrid storage. In this paper, we present MirrorKV for efficient compaction and querying on hybrid cloud storage. | Zhiqi Wang; Zili Shao |
| 46 | MOST: Model-Based Compression with Outlier Storage for Time Series Data. Highlight: Unfortunately, existing compression techniques either show only low to medium compression ratio on time series data, or incur significant decompression overhead during query processing. We propose a novel compression technique, MOST (Model-based compression with Outlier STorage) for time series data. | Zehai Yang; Shimin Chen |
| 47 | Neural Attributed Community Search at Billion Scale. Highlight: Motivated by these, in this paper, we propose a new neurAL attrIbuted Community sEarch model for large-scale graphs, termed ALICE. | Jianwei Wang; Kai Wang; Xuemin Lin; Wenjie Zhang; Ying Zhang |
| 48 | NOCAP: Near-Optimal Correlation-Aware Partitioning Joins. Highlight: To do that, we derive the optimal partitioning using a new cost-based analysis of partitioning-based joins that is tailored for primary key – foreign key (PK-FK) joins, one of the most common join types. | Zichen Zhu; Xiao Hu; Manos Athanassoulis |
| 49 | PLATON: Top-down R-tree Packing with Learned Partition Policy. Highlight: We propose a divide and conquer strategy and two optimization techniques, early termination and level-wise sampling, to drastically reduce the MCTS algorithm’s time complexity and make it a linear-time algorithm. | Jingyi Yang; Gao Cong |
| 50 | Practical Dynamic Extension for Sampling Indexes. Highlight: This is difficult, because existing sampling indexes present a dichotomy; efficient sampling indexes are difficult to update, while easily updatable indexes have poor sampling performance. This paper seeks to address this gap by proposing a general and practical framework for extending most sampling indexes with efficient update support, based on splitting indexes into smaller shards, combined with a systematic approach to the periodic reconstruction. | Douglas B. Rumbaugh; Dong Xie |
| 51 | Rethinking Learned Cost Models: Why Start from Scratch? Highlight: In this paper, we propose a new approach to tuning the conventional formula-based cost model for DBMS. | Jiani Yang; Sai Wu; Dongxiang Zhang; Jian Dai; Feifei Li; Gang Chen |
| 52 | Rethink Query Optimization in HTAP Databases. Highlight: In this paper, we demonstrate that hybrid plans can largely benefit query execution (e.g., up to 11x speedups in our evaluation). | Haoze Song; Wenchao Zhou; Feifei Li; Xiang Peng; Heming Cui |
| 53 | Rethinking The Encoding of Integers for Scans on Skewed Data. Highlight: We propose the concept of forward encodings: a family of encodings that shift pruning-relevant bits closer to the most significant bit. | Martin Prammer; Jignesh M. Patel |
| 54 | SALI: A Scalable Adaptive Learned Index Framework Based on Probability Models. Highlight: One promising solution is the learned index, which uses a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. | Jiake Ge; Huanchen Zhang; Boyu Shi; Yuanhui Luo; Yunda Guo; Yunpeng Chai; Yuxing Chen; Anqun Pan |
| 55 | Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite Networks. Highlight: In this paper, we study the counting problems of two common types of motifs in bipartite graphs: (i) butterflies (2×2 bicliques) and (ii) bi-triangles (length-6 cycles). | Fangyuan Zhang; Dechuang Chen; Sibo Wang; Yin Yang; Junhao Gan |
| 56 | SeeSaw: Interactive Ad-hoc Search Over Image Databases. Highlight: Many high-level data tasks in machine learning, such as constructing datasets for training and testing object detectors, imply finding ad-hoc objects or scenes within large image datasets as a key sub-problem. New foundational visual-semantic embeddings trained on massive web datasets such as Contrastive Language-Image Pre-Training (CLIP) can help users start searches on their own data, but we find there is a long tail of queries where these models fall short in practice. | Oscar Moll; Manuel Favela; Samuel Madden; Vijay Gadepally; Michael Cafarella |
| 57 | Selectivity Estimation for Queries Containing Predicates Over Set-Valued Attributes. Highlight: In this work, we present novel techniques for selectivity estimation on queries involving predicates over set-valued attributes. | Zizhong Meng; Xin Cao; Gao Cong |
| 58 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach. Highlight: We propose a new system that lets users write natural language questions directly. A major barrier to using this learned data discovery system is it needs expensive-to-collect training data, thus limiting its utility. In this paper, we introduce a self-supervised approach to assemble training datasets and train learned discovery systems without human intervention. | Qiming Wang; Raul Castro Fernandez |
| 59 | TEE-based General-purpose Computational Backend for Secure Delegated Data Processing. Highlight: In this paper, we consider secure delegated data processing (SDDP), an expanded data processing scenario wherein data owners simply delegate their data to SDDP providers for subsequent value mining or other downstream applications, eliminating the necessary involvement of data owners or trusted entities to dive into data processing deeply. | Mo Sha; Jialin Li; Sheng Wang; Feifei Li; Kian-Lee Tan |
| 60 | A Learned Cuckoo Filter for Approximate Membership Queries Over Variable-sized Sliding Windows on Data Streams. Highlight: In this paper, we introduce a novel data structure called Learned Cuckoo Filter (LCF). | Yao Tian; Tingyun Yan; Ruiyuan Zhang; Kai Huang; Bolong Zheng; Xiaofang Zhou |
| 61 | Veil: A Storage and Communication Efficient Volume-Hiding Algorithm. Highlight: We develop a solution to prevent volume leakage, entitled Veil, that partitions the dataset by randomly mapping keys to a set of equi-sized buckets. | Shanshan Han; Vishal Chakraborty; Michael T. Goodrich; Sharad Mehrotra; Shantanu Sharma |
| 62 | Waffle: An Online Oblivious Datastore for Protecting Data Access Patterns. Highlight: We present Waffle, a datastore that protects an application’s data access patterns from a passive persistent adversary. | Sujaya Maiyya; Sharath Chandra Vemula; Divyakant Agrawal; Amr El Abbadi; Florian Kerschbaum |
| 63 | DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance. Highlight: In this paper, we propose DProvDB, a fine-grained privacy provenance framework for the multi-analyst scenario that tracks the privacy loss to each single data analyst. | Shufan Zhang; Xi He |
| 64 | R2D2: Reducing Redundancy and Duplication in Data Lakes. Highlight: In this work, we focus on identifying and reducing redundancy in enterprise data lakes by addressing the problem of dataset containment. | Raunak Shah; Koyel Mukherjee; Atharv Tyagi; Sai Keerthana Karnam; Dhruv Joshi; Shivam Pravin Bhosale; Subrata Mitra |
| 65 | Splitting Tuples of Mismatched Entities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a scheme to decide what tuples to split and what tuples to correct without splitting, fix errors/assign attribute values to the split tuples, and impute missing values. |
Wenfei Fan; Ziyan Han; Weilong Ren; Ding Wang; Yaoshu Wang; Min Xie; Mengyi Yan; |
| 66 | VeriTxn: Verifiable Transactions for Cloud-Native Databases with Storage Disaggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present VeriTxn, a novel cloud-native database that efficiently provides verifiability of transaction correctness. |
Zhanhao Zhao; Hexiang Pan; Gang Chen; Xiaoyong Du; Wei Lu; Beng Chin Ooi; |
| 67 | Homomorphic Compression: Making Text Processing on Compression Unlimited Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The concept of operating on compressed data infuses new blood into efficient text management by enabling mainly access-oriented text processing tasks to be done directly on compressed data without decompression. Facing limitations of the existing compressed text processing schemes such as limited types of operations supported, low efficiency, and high space occupation, we address these problems by proposing a homomorphic compression theory. |
Jiawei Guan; Feng Zhang; Siqi Ma; Kuangyu Chen; Yihua Hu; Yuxing Chen; Anqun Pan; Xiaoyong Du; |
| 68 | Watchog: A Light-weight Contrastive Learning Based Framework for Column Annotation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Watchog framework, which employs contrastive learning techniques to learn robust representations for tables by leveraging a large-scale unlabeled table corpus with minimal overhead. |
Zhengjie Miao; Jin Wang; |
| 69 | Optimizing Distributed Protocols with Query Rewrites Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. |
David C.Y. Chu; Rithvik Panchapakesan; Shadaj Laddad; Lucky E. Katahanas; Chris Liu; Kaushik Shivakumar; Natacha Crooks; Joseph M. Hellerstein; Heidi Howard; |
| 70 | Grafite: Taming Adversarial Queries with Optimal Range Filters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our experimental evaluation shows that Grafite is the only range filter to date to achieve robust and predictable false positive rates across all combinations of datasets, query workloads, and range sizes, while providing faster queries and construction times, and dominating all competitors in the case of correlated queries. As a further contribution, we introduce a very simple heuristic range filter whose performance on uncorrelated queries is very close to or better than the one achieved by the best heuristic range filters proposed in the literature so far. |
Marco Costa; Paolo Ferragina; Giorgio Vinciguerra; |
| 71 | High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose HPEZ with newly-designed interpolations and quality-metric-driven auto-tuning, which features significantly improved compression quality upon the existing high-performance compressors, meanwhile being exceedingly faster than high-ratio compressors. |
Jinyang Liu; Sheng Di; Kai Zhao; Xin Liang; Sian Jin; Zizhe Jian; Jiajun Huang; Shixun Wu; Zizhong Chen; Franck Cappello; |
| 72 | MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach allows parallel processing of the graph edges within a sliding window but requires a blocking expiration phase between sliding windows to remove the old edges. This blocking phase can significantly degrade the query performance, especially when the edges arrive quickly and the sliding windows overlap significantly. This paper presents a new RPQ evaluation strategy called Multi-Window Parallel (MWP) method leveraging a new data structure called Timestamped Rooted Digraph (TRD). |
Siyuan Zhang; Zhenying He; Yinan Jing; Kai Zhang; X. Sean Wang; |
| 73 | Proximity Queries on Point Clouds Using Rapid Construction Path Oracle Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since (1) all existing on-the-fly and oracle-based shortest path query algorithms on a TIN are very expensive, (2) all existing on-the-fly shortest path query algorithms on a point cloud are still not efficient, and (3) there are no oracle-based shortest path query algorithms on a point cloud, we propose an efficient (1+ε)-approximate shortest path oracle that answers the shortest path query for a set of Points-Of-Interests (POIs) on the point cloud, which has a good performance (in terms of the oracle construction time, oracle size and shortest path query time) due to the concise information about the pairwise shortest paths between any pair of POIs stored in the oracle. |
Yinzhao Yan; Raymond Chi-Wing Wong; |
| 74 | Efficient K-Clique Listing: An Edge-Oriented Branching Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a BB framework with a new edge-oriented branching (called EBBkC), which forms a sub-branch by expanding a partial k-clique with two vertices that connect each other (which corresponds to an edge). |
Kaixin Wang; Kaiqiang Yu; Cheng Long; |
| 75 | Relative Keys: Putting Feature Explanation Into Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose relative keys that have the best of both worlds. |
Shuai An; Yang Cao; |
| 76 | NOC-NOC: Towards Performance-optimal Distributed Transactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new design objective and establish impossibility results with respect to the achievable isolation levels. |
Si Liu; Luca Multazzu; Hengfeng Wei; David A. Basin; |
| 77 | Robustness of Updatable Learning-based Index Advisors Against Poisoning Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the first attempt to study the robustness of updatable learning-based IAs against poisoning attack, i.e., whether the IAs can maintain robust performance if their training/updating is disturbed by injecting an extraneous toxic workload. |
Yihang Zheng; Chen Lin; Xian Lyu; Xuanhe Zhou; Guoliang Li; Tianqing Wang; |
| 78 | FedKNN: Secure Federated K-Nearest Neighbor Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is costly to directly apply existing approaches to federated k-nearest neighbor (kNN) search with difficult-to-compute distance functions, like graph or sequence similarity. To address this challenge, we propose FedKNN, a system that supports secure federated kNN search queries with a wide range of similarity measurements. |
Xinyi Zhang; Qichen Wang; Cheng Xu; Yun Peng; Jianliang Xu; |
| 79 | FineMon: An Innovative Adaptive Network Telemetry Scheme for Fine-Grained, Multi-Metric Data Monitoring with Dynamic Frequency Adjustment and Enhanced Data Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FineMon, an innovative adaptive network telemetry scheme for precise, fine-grained, multi-metric data monitoring. |
Haojie Ji; Kun Xie; Jigang Wen; Qingyi Zhang; Gaogang Xie; Wei Liang; |
| 80 | PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PECJ, a solution that proactively incorporates unobserved data to enhance accuracy while reducing latency, thus requiring robust predictive modeling of stream oscillation. |
Xianzhi Zeng; Shuhao Zhang; Hongbin Zhong; Hao Zhang; Mian Lu; Zhao Zheng; Yuqiang Chen; |
| 81 | Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. |
Mengzhao Wang; Weizhi Xu; Xiaomeng Yi; Songlin Wu; Zhangyang Peng; Xiangyu Ke; Yunjun Gao; Xiaoliang Xu; Rentong Guo; Charles Xie; |
| 82 | One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The modern database has many precise and approximate counting requirements. Nevertheless, a solitary multidimensional index or cardinality estimator is insufficient to cater to … |
Yingze Li; Hongzhi Wang; Xianglong Liu; |
| 83 | Optimizing Nested Recursive Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Temporel, a system that allows recursion to be freely combined with non-monotone operators. |
Amir Shaikhha; Dan Suciu; Maximilian Schleich; Hung Ngo; |
| 84 | Sub-optimal Join Order Identification with L1-error Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the only result connecting Q-error with plan optimality is an upper-bound on the cost of the worst possible query plan computed from a set of cardinality estimates—there is no connection between Q-error and the real plans generated by standard query optimizers. Therefore, in order to identify sub-optimal query plans, we propose a learning-based method having as its main feature a novel measure called L1-error. |
Yesdaulet Izenov; Asoke Datta; Brian Tsan; Florin Rusu; |
| 85 | Efficient Algorithm for K-Multiple-Means Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To compute the similarity matrix of the bipartite graph efficiently, we skip unnecessary distance computations and estimate lower bounding distances between the data points and the multiple means. |
Yasuhiro Fujiwara; Atsutoshi Kumagai; Yasutoshi Ida; Masahiro Nakano; Makoto Nakatsuji; Akisato Kimura; |
| 86 | Predictive and Near-Optimal Sampling for View Materialization in Video Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to minimize the number of processed video frames, we propose a novel predictive sampling framework, namely LEAP, that exhibits near-optimal sampling performance. |
Yanchao Xu; Dongxiang Zhang; Shuhao Zhang; Sai Wu; Zexu Feng; Gang Chen; |
| 87 | LIT: Lightning-fast In-memory Temporal Indexing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of temporal database indexing, i.e., indexing versions of a database table in an evolving database. |
George Christodoulou; Panagiotis Bouros; Nikos Mamoulis; |
| 88 | Optimizing Dataflow Systems for Scalable Interactive Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable analysts to focus on visualization design, we contribute VegaPlus, a system that automatically optimizes interactive dashboards to support large datasets. |
Junran Yang; Hyekang Kevin Joo; Sai Yerramreddy; Dominik Moritz; Leilani Battle; |
| 89 | Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it makes massive exploration for the redundant vertices not located in any simple path, thereby resulting in poor query performance. To alleviate this problem, we design a distributed approach DistriEnum to optimize query performance and scalability with well-bound memory consumption. |
Yuanyuan Zeng; Yixiang Fang; Chenhao Ma; Xu Zhou; Kenli Li; |
| 90 | Efficient High-Quality Clustering for Large Bipartite Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics, to name a few. Existing approaches to k-BGC either output clustering results with compromised quality due to inadequate exploitation of high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs. |
Renchi Yang; Jieming Shi; |
| 91 | DTT: An Example-Driven Tabular Transformer for Joinability By Leveraging Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we cast the problem as a prediction task and develop a framework that leverages large deep-learning language models to transform tabular data from a source formatting to a desired target representation. |
Arash Dargahi Nobari; Davood Rafiei; |
| 92 | Determining Exact Quantiles with Randomized Summaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to shrink the ranges more aggressively, using randomized summaries such as KLL sketch. |
Ziling Chen; Haoquan Guan; Shaoxu Song; Xiangdong Huang; Chen Wang; Jianmin Wang; |
| 93 | An LDP Compatible Sketch for Securely Approximating Set Intersection Cardinalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Directly applying Local Differential Privacy (LDP) techniques to safeguard the sketch collection results in extremely large estimation errors of set intersection cardinalities. To address this issue, we propose a novel sketch method that makes it easier to incorporate noise into the constructed sketch to achieve differential privacy. |
Pinghui Wang; Yitong Liu; Zhicheng Li; Rundong Li; |
| 94 | Spruce: A Fast Yet Space-saving Structure for Dynamic Graph Storage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from the basic operations of the van Emde Boas (vEB) tree in double-logarithmic time, we designed Spruce, a high-performance yet space-saving in-memory structure to store dynamic graphs. |
Jifan Shi; Biao Wang; Yun Xu; |
| 95 | Controllable Tabular Data Synthesis Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Subsequently, we introduce lightweight controllers to guide the unconditional generative model in generating synthetic data that satisfies different conditions. |
Tongyu Liu; Ju Fan; Nan Tang; Guoliang Li; Xiaoyong Du; |
| 96 | HERO: A Hierarchical Set Partitioning and Join Framework for Speeding Up The Set Intersection Over Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-level set intersection framework, namely hierarchical set partitioning and join (HERO), by using our well-designed set intersection bitmap tree (SIB-tree) index, which is independent of SIMD instructions and completely orthogonal to the merge intersection framework. |
Boyu Yang; Weiguo Zheng; Xiang Lian; Yuzheng Cai; X. Sean Wang; |
| 97 | Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we identify two key challenges naturally arising in the task, which reveal that directly applying existing LDP techniques will lead to an inferior accuracy-privacy-memory efficiency tradeoff. |
Xiaochen Li; Weiran Liu; Jian Lou; Yuan Hong; Lei Zhang; Zhan Qin; Kui Ren; |
| 98 | Towards Buffer Management with Tiered Main Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study provides a systematic study for DBMS to build tiered memory buffer management with respect to a wide range of hardware performance characteristics. |
Xiangpeng Hao; Xinjing Zhou; Xiangyao Yu; Michael Stonebraker; |
| 99 | Parallel Algorithms for Hierarchical Nucleus Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a scalable parallel algorithm for hierarchy construction, with practical optimizations, such as interleaving the coreness computation with hierarchy construction and using a concurrent union-find data structure in an innovative way to generate the hierarchy. |
Jessica Shi; Laxman Dhulipala; Julian Shun; |
| 100 | GSWORD: GPU-accelerated Sampling for Subgraph Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the embarrassingly parallel nature of the samples, there are unique challenges in accelerating subgraph counting due to its irregular computation logic. To address these challenges, we introduce two GPU-centric optimizations: (1) sample inheritance, enabling threads to inherit samples from neighboring threads to avoid idling, and (2) warp streaming, effectively distributing workloads among threads through a streaming process. |
Chang Ye; Yuchen Li; Shixuan Sun; Wentian Guo; |
| 101 | Privacy Amplification By Sampling Under User-level Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take the first step towards the study of privacy amplification by sampling under user-DP, and give the amplification results for two common user-DP sampling strategies: simple sampling and sample-and-explore. |
Juanru Fang; Ke Yi; |
| 102 | Time Series Representation for Visualization in Apache IoTDB Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel chunk merge free approach called M4-LSM to accelerate M4 representation and visualization. |
Lei Rui; Xiangdong Huang; Shaoxu Song; Yuyuan Kang; Chen Wang; Jianmin Wang; |
| 103 | Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel communication scheme called zero-sided RDMA, enabling data exchange as a native network service using a programmable switch. |
Matthias Jasny; Lasse Thostrup; Sajjad Tamimi; Andreas Koch; Zsolt István; Carsten Binnig; |
| 104 | PACE: Poisoning Attacks on Learned Cardinality Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the first challenge, we propose a method of speculating and training a surrogate model, which transforms the black-box attack into a near-white-box attack. |
Jintao Zhang; Chao Zhang; Guoliang Li; Chengliang Chai; |
| 105 | Modeling Shifting Workloads for Learned Database Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how the notion of a replay buffer can be managed through online algorithms to build a concise yet representative model of the workload distribution — allowing for rapid adaptation and effective prediction of cardinalities and costs. |
Peizhi Wu; Zachary G. Ives; |
| 106 | Worst-Case-Optimal Similarity Joins on Graph Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We model the problem by superimposing the database graph with the kNN graph and show that a variant of Leapfrog TrieJoin (LTJ) implemented over a compact data structure called the Ring can be seamlessly extended to integrate similarity clauses with the equijoins in the LTJ query process, retaining worst-case optimality in many relevant cases. |
Diego Arroyuelo; Benjamin Bustos; Adrián Gómez-Brandón; Aidan Hogan; Gonzalo Navarro; Juan Reutter; |
| 107 | View-based Explanations for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose GVEX, a novel paradigm that generates Graph Views for GNN EXplanation. |
Tingyang Chen; Dazhuo Qiu; Yinghui Wu; Arijit Khan; Xiangyu Ke; Yunjun Gao; |
| 108 | SWIX: A Memory-efficient Sliding Window Learned Index Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes SWIX, a novel memory-efficient learned index for sliding windows. |
Liang Liang; Guang Yang; Ali Hadian; Luis Alberto Croquevielle; Thomas Heinis; |
| 109 | PreVision: An Out-of-Core Matrix Computation System with Optimal Buffer Replacement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, however, most existing systems do not focus on disk I/O aspects and are vulnerable to performance degradation when the scale of input matrices and intermediate data grows large. To address this problem, we present a new out-of-core matrix computation system called PreVision. |
Kyoseung Koo; Sohyun Kim; Wonhyeon Kim; Yoojin Choi; Juhee Han; Bogyeong Kim; Bongki Moon; |
| 110 | Discovering Functional Dependencies Through Hitting Set Enumeration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: FDHits is based on several discovery optimizations that include a hybrid validation approach, effective hitting set enumeration techniques, one-pass candidate validations, and parallelization. |
Tobias Bleifuß; Thorsten Papenbrock; Thomas Bläsius; Martin Schirneck; Felix Naumann; |
| 111 | Cardinality Estimation Over Knowledge Graphs with Embeddings and Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose GNCE, a novel approach that leverages knowledge graph embeddings and Graph Neural Networks (GNN) to accurately predict the cardinality of conjunctive queries over KGs. |
Tim Schwabe; Maribel Acosta; |
| 112 | ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Rather than falling back to traditional approaches to trade off one criterion with another, we present a new learned approach that achieves all these. |
Kyoungmin Kim; Sangoh Lee; Injung Kim; Wook-Shin Han; |
| 113 | PLAQUE: Automated Predicate Learning at Query Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the challenge of learning predicates during query execution which are then exploited to accelerate execution. |
Yiming Lin; Sharad Mehrotra; |
| 114 | Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Limousine, a self-designing key-value storage engine, that can automatically morph to the near-optimal storage engine architecture shape given a workload, a cloud budget, and target performance. |
Subarna Chatterjee; Mark F. Pekala; Lev Kruglyak; Stratos Idreos; |
| 115 | Determining The Largest Overlap Between Tables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatically detecting these highly similar, matching tables would allow us to guarantee their consistency through data cleaning or change propagation, but also to eliminate redundancy to free up storage space or to save additional work for the editors. We present the first formal definition of this problem, and with it Sloth, our solution to efficiently detect the largest overlap between two tables. We experimentally demonstrate on real-world datasets its efficacy in solving this task, analyzing its performance and showing its impact on multiple use cases. |
Luca Zecchini; Tobias Bleifuß; Giovanni Simonini; Sonia Bergamaschi; Felix Naumann; |
| 116 | Machine Unlearning in Learned Databases: An Experimental Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With this work, for the first time to our knowledge, we pose and answer the following key questions: What is the effect of unlearning algorithms on NN-based DB models? |
Meghdad Kurmanji; Eleni Triantafillou; Peter Triantafillou; |
| 117 | Wred: Workload Reduction for Scalable Index Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces workload reduction, a new complementary technique aimed at expediting index tuning by decreasing individual what-if call time without significantly affecting the quality of index tuning. |
Matteo Brucato; Tarique Siddiqui; Wentao Wu; Vivek Narasayya; Surajit Chaudhuri; |
| 118 | CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses the above requirements. |
Hailin Zhang; Zirui Liu; Boxuan Chen; Yikai Zhao; Tong Zhao; Tong Yang; Bin Cui; |
| 119 | The Image Calculator: 10x Faster Image-AI Inference By Replacing JPEG with Self-designing Storage Format Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the Image Calculator across a diverse set of data, image analysis tasks, AI models, and hardware. |
Utku Sirin; Stratos Idreos; |
| 120 | Sibyl: Forecasting Time-Evolving Query Workloads Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. |
Hanxian Huang; Tarique Siddiqui; Rana Alotaibi; Carlo Curino; Jyoti Leeka; Alekh Jindal; Jishen Zhao; Jesús Camacho-Rodríguez; Yuanyuan Tian; |
| 121 | LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study cardinality estimation of LIKE-queries, i.e., queries that use the LIKE-operator to match a pattern with wildcards against string-valued attributes. |
Mehmet Aytimur; Silvan Reiner; Leonard Wörteler; Theodoros Chondrogiannis; Michael Grossniklaus; |
| 122 | SkyPIE: A Fast & Accurate Oracle for Object Placement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the static nature and pay-per-use pricing model of cloud environments to explore a different approach. |
Tiemo Bang; Chris Douglas; Natacha Crooks; Joseph M. Hellerstein; |
| 123 | Fast Shapley Value Computation in Data Assemblage Tasks As Cooperative Simple Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the challenging problem of Shapley value computation in data markets in a novel setting of data assemblage tasks with binary utility functions among data owners. |
Xuan Luo; Jian Pei; Cheng Xu; Wenjie Zhang; Jianliang Xu; |
| 124 | Cabin: A Compressed Adaptive Binned Scan Index Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The structures can be much larger than the base data column size. In this paper, we propose a novel scan index, Cabin, that exploits the following three techniques for better time-space tradeoff. |
Yiyuan Chen; Shimin Chen; |
| 125 | Dias: Dynamic Rewriting of Pandas Code Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop techniques for efficient rewrites in Dias, including checking the preconditions under which rewrites are correct, dynamically, at fine-grained program points. |
Stefanos Baziotis; Daniel Kang; Charith Mendis; |
| 126 | LST-Bench: Benchmarking Log-Structured Tables in The Cloud Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, engines like Apache Spark and Trino can be configured to leverage the optimizations and controls offered by LSTs to meet specific business needs. Conventional benchmarks and tools are inadequate for evaluating the transformative changes in the storage layer resulting from these advancements, as they do not allow us to measure the impact of design and optimization choices in this new setting. In this paper, we propose a novel benchmarking approach and metrics that build upon existing benchmarks, aiming to systematically assess LSTs. |
Jesús Camacho-Rodríguez; Ashvin Agrawal; Anja Gruenheid; Ashit Gosalia; Cristian Petculescu; Josep Aguilar-Saborit; Avrilia Floratou; Carlo Curino; Raghu Ramakrishnan; |
| 127 | A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we comprehensively review the methods in the current trend and experimentally confirm their advantage over prior approaches. |
Zhijie Zhang; Yujie Lu; Weiguo Zheng; Xuemin Lin; |
| 128 | On The Reasonable Effectiveness of Relational Diagrams: Explaining Relational Query Patterns and The Pattern Expressiveness of Relational Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a concrete example, we propose Relational Diagrams, a complete and sound diagrammatic representation of safe relational calculus that is provably (i) unambiguous, (ii) relationally complete, and (iii) able to represent all query patterns for unions of non-disjunctive queries. |
Wolfgang Gatterbauer; Cody Dunne; |
| 129 | RITA: Group Attention Is All You Need for Timeseries Analytics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, quadratic time and space complexities limit Transformers’ scalability, especially for long timeseries. To address these issues, we develop a timeseries analytics tool, RITA, which uses a novel attention mechanism, named group attention, to address this scalability issue. |
Jiaming Liang; Lei Cao; Samuel Madden; Zack Ives; Guoliang Li; |
| 130 | Maximum K-Plex Computation: Theory and Practice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the maximum k-plex computation problem from both theory and practice. |
Lijun Chang; Kai Yao; |
| 131 | WeBridge: Synthesizing Stored Procedures for Large-Scale Real-World Web Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Modern web applications use databases to store their data. When processing user requests, these applications retrieve and store data in the database server, which incurs network … |
Gansen Hu; Zhaoguo Wang; Chuzhe Tang; Jiahuan Shen; Zhiyuan Dong; Sheng Yao; Haibo Chen; |
| 132 | LeCo: Lightweight Compression Via Learning Serial Correlations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance. |
Yihao Liu; Xinyu Zeng; Huanchen Zhang; |
| 133 | Approximate Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following recent work that applies transformers to cardinality estimation, we design a novel learning-based method to approximate the sketch of any arbitrary selection, enabling sketches for join queries with filter conditions. |
Brian Tsan; Asoke Datta; Yesdaulet Izenov; Florin Rusu; |
| 134 | DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we notice that this generates a large dependency graph that contains many redundant edges and its worst-case time complexity is quadratic to the number of requests in a workload. In order to solve these challenging problems, we formally propose four classes of dependency graphs for DRSs. |
Wonseok Lee; Jaehyun Ha; Wook-Shin Han; Changgyoo Park; Myunggon Park; Juhyeng Han; Juchang Lee; |
| 135 | STile: Searching Hybrid Sparse Formats for Sparse Deep Learning Operators Automatically Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often make a trade-off between search space and search time: their search spaces are limited in some cases, resulting in limited operator running efficiency they can achieve. In this paper, we try to extend the search space in its breadth (by doing flexible sparse tensor transformations) and depth (by enabling multi-level decomposition). |
Jingzhi Fang; Yanyan Shen; Yue Wang; Lei Chen; |
| 136 | SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To handle general range queries, we propose a 2D segment graph with average-case index size O(n log n) to compress n segment graphs, breaking the quadratic barrier. |
Chaoji Zuo; Miao Qiao; Wenchao Zhou; Feifei Li; Dong Deng; |
| 137 | In-Database Data Imputation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We adapt this method to exploit computation sharing and a ring abstraction for faster model training. |
Massimo Perini; Milos Nikolic; |
| 138 | Summarized Causal Explanations For Aggregate Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CauSumX, a framework for generating summarized causal explanations for the entire aggregate view. |
Brit Youngmann; Michael Cafarella; Amir Gilad; Sudeepa Roy; |
| 139 | TabEE: Tabular Embeddings Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most current methods treat these embedding models as black boxes making it difficult to understand the insights captured by the models. Our research proposes a novel approach to interpret these models, aiming to provide local and global explanations for the original data and detect potential flaws in the embedding models. |
Roni Copul; Nave Frost; Tova Milo; Kathy Razmadze; |
| 140 | Automated Data Visualization from Natural Language Via Large Language Models: An Exploratory Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the considerable efforts made by these approaches, challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Taking inspiration from the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential in generating visualizations, and explore the effectiveness of in-context learning prompts for enhancing this task. |
Yang Wu; Yao Wan; Hongyu Zhang; Yulei Sui; Wucai Wei; Wei Zhao; Guandong Xu; Hai Jin; |
| 141 | Can Learned Indexes Be Built Efficiently? A Deep Dive Into Sampling Trade-offs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two error-bounded sampling schemes: Sample EB-PLA, and Sample EB-Histogram. |
Minguk Choi; Seehwan Yoo; Jongmoo Choi; |
| 142 | Missing Data Imputation with Uncertainty-Driven Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of missing data imputation, which is a fundamental task in the area of data quality that aims to impute the missing data to achieve the completeness of datasets. Though the recent distribution-modeling-based techniques (e.g., distribution generation and distribution matching) can achieve state-of-the-art performance in terms of imputation accuracy, we notice that (1) they deploy a sophisticated deep learning model that tends to be overfitting for missing data imputation; (2) they directly rely on a global data distribution while overlooking the local information. Driven by the inherent variability in both missing data and missing mechanisms, in this paper, we explore the uncertain nature of this task and aim to address the limitations of existing works by proposing an uNcertainty-driven netwOrk for Missing data Imputation, termed NOMI. |
Jianwei Wang; Ying Zhang; Kai Wang; Xuemin Lin; Wenjie Zhang; |
| 143 | Reservoir Sampling Over Joins Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of how to maintain a random sample over joins while the tuples are streaming in. |
Binyang Dai; Xiao Hu; Ke Yi; |
| 144 | A Counting-based Approach for Efficient K-Clique Densest Subgraph Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the efficiency, in this paper, we propose a novel framework based on the Frank-Wolfe algorithm, which only needs k-clique counting, rather than k-clique enumeration, where the former one is often much faster than the latter one. |
Yingli Zhou; Qingshuo Guo; Yixiang Fang; Chenhao Ma; |
| 145 | ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. |
Liana Patel; Peter Kraft; Carlos Guestrin; Matei Zaharia; |
| 146 | Akane: Perplexity-Guided Time Series Data Cleaning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To handle these drawbacks, we leverage inherent recurrent patterns in time series, analogize them as fixed combinations in textual data, and incorporate the concept of perplexity. The cleaning problem is thus transformed to minimize the perplexity of the time series under a given cleaning cost, and we design a four-phase algorithmic framework to tackle this problem. |
Xiaoyu Han; Haoran Xiong; Zhenying He; Peng Wang; Chen Wang; X. Sean Wang; |
| 147 | Auto-Formula: Recommend Formulas in Spreadsheets Using Contrastive Learning for Table Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challenging, as non-technical users need to look up and understand non-trivial formula syntax. To address this pain point, we leverage the observation that there is often an abundance of similar-looking spreadsheets in the same organization, which not only have similar data, but also share similar computation logic encoded as formulas. |
Sibei Chen; Yeye He; Weiwei Cui; Ju Fan; Song Ge; Haidong Zhang; Dongmei Zhang; Surajit Chaudhuri; |
| 148 | Banzhaf Values for Facts in Query Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm, an anytime deterministic approximation algorithm with relative error guarantees, and an algorithm for ranking and top-k. |
Omer Abramovich; Daniel Deutch; Nave Frost; Ahmet Kara; Dan Olteanu; |
| 149 | CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We implement the prototype of CaaS-LSM based on RocksDB and evaluate it with different LSM-based distributed databases (Kvrocks and Nebula). |
Qiaolin Yu; Chang Guo; Jay Zhuang; Viraj Thakkar; Jianguo Wang; Zhichao Cao; |
| 150 | CAVE: Concurrency-Aware Graph Processing on SSDs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two key ways to parallelize graph traversal algorithms based on the graph structure and algorithm: intra-subgraph and inter-subgraph parallelization. |
Tarikul Islam Papon; Taishan Chen; Shuo Zhang; Manos Athanassoulis; |
| 151 | Certain and Approximately Certain Models for Statistical Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that it is possible to learn accurate models directly from data with missing values for certain training data and target models. |
Cheng Zhen; Nischal Aryal; Arash Termehchy; Amandeep Singh Chabada; |
| 152 | CodeS: Towards Building Open-source Language Models for Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task. |
Haoyang Li; Jing Zhang; Hanbing Liu; Ju Fan; Xiaokang Zhang; Jun Zhu; Renjie Wei; Hongyan Pan; Cuiping Li; Hong Chen; |
| 153 | Continual Observation of Joins Under Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, all existing works, with the exception of [28,51], have only studied the simple counting query and its derivatives. Join queries, which are arguably the most important class of queries in relational databases, have only been considered in [28,51], but the solutions offered there have two limitations: First, they only support a few specific graph pattern queries, which are special cases of joins. |
Wei Dong; Zijun Chen; Qiyao Luo; Elaine Shi; Ke Yi; |
| 154 | Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, combining the strengths of these methods to maintain sketches for multi-join queries while ensuring fast update times is a non-trivial task, and has remained an open problem for decades as highlighted in the existing literature. In this work, we successfully address this problem by introducing a novel sketching method which has fast updates, even for sketches capable of accurately estimating the cardinality of complex multi-join queries. |
Mike Heddes; Igor Nunes; Tony Givargis; Alex Nicolau; |
| 155 | Counterfactual Explanation at Will, with Zero Privacy Leakage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges, we propose CPC, a data-driven approach to counterfactual. |
Shuai An; Yang Cao; |
| 156 | Data Acquisition for Improving Model Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing the complexity of BA and SA, we introduce two efficient approximate methods, namely kNN-BA and kNN-SA, restricting data acquisition to promising subsets within the data pool. |
Yifan Li; Xiaohui Yu; Nick Koudas; |
| 157 | DecoPa: Query Decomposition for Parallel Complex Event Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They largely neglect the rates with which processing units may ingest and compare events for query evaluation. In this paper, we present an approach for parallel CEP that is based on a flexible decomposition of CEP queries. |
Samira Akili; Steven Purtzel; Matthias Weidlich; |
| 158 | Efficient and Provable Effective Resistance Computation on Large Graphs: An Index-based Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in many real-life graphs, it is not always possible to find an easily reachable landmark node, which can significantly hinder the algorithm’s efficiency. To overcome this problem, we propose a novel multiple landmarks technique which involves selecting a set of landmark nodes V_l such that the other nodes in the graph can easily reach any one of the landmark nodes in V_l. |
Meihao Liao; Junjie Zhou; Rong-Hua Li; Qiangqiang Dai; Hongyang Chen; Guoren Wang; |
| 159 | Efficient Approximation of Kemeny’s Constant for Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose two scalable Monte Carlo algorithms RefinedMC and ForestMC to approximate Kemeny’s constant. |
Haisong Xia; Zhongzhi Zhang; |
| 160 | Efficient Maximal Biplex Enumerations with Improved Worst-Case Time Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The state-of-the-art solutions for maximal k-biplex enumeration suffer from efficiency issues as k increases (k ≥ 2), with the time complexity of O(m·2^n), where n (m) denotes the number of vertices (edges) in the bipartite graph. To address this issue, we propose two theoretically and practically efficient enumeration algorithms based on novel branching techniques. |
Qiangqiang Dai; Rong-Hua Li; Donghang Cui; Meihao Liao; Yu-Xuan Qiu; Guoren Wang; |
| 161 | FairHash: A Fair and Memory/Time-efficient Hashmap Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose three families of algorithms to design fair hashmaps, suitable for different settings. |
Nima Shahbazi; Stavros Sintos; Abolfazl Asudeh; |
| 162 | Faster Algorithms for Fair Max-Min Diversification in R^d Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we delve into the realm of fairness-aware data subset selection, specifically focusing on the problem of selecting a diverse set of size k from a large collection of n data points (FairDiv). |
Yash Kurkure; Miles Shamo; Joseph Wiseman; Sainyam Galhotra; Stavros Sintos; |
| 163 | Fault Tolerance Placement in The Internet of Things Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a resource-aware fault-tolerance approach that takes the unique characteristics of the Edge into account to provide reliable stream processing. |
Anastasiia Kozar; Bonaventura Del Monte; Steffen Zeuch; Volker Markl; |
| 164 | FeatureLTE: Learning to Estimate Feature Importance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present FeatureLTE, a novel learning-based approach to FIS estimation. |
Tianping Zhang; Zhaoyang Wang; Chen Qian; Jian Li; Yin Lou; |
| 165 | Graph Summarization: Compactness Meets Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For efficiency, Mags is on average 11.1x and 4.2x faster than the two state-of-the-art algorithms with practical overheads, while Mags-DM can further reduce the running time by 13.4x compared with Mags. This shows that graph summarization algorithms can be made practical while still offering a compact summary. |
Deming Chu; Fan Zhang; Wenjie Zhang; Ying Zhang; Xuemin Lin; |
| 166 | GRF: A Global Range Filter for LSM-Trees with Shape Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we developed the Global Range Filter (GRF) for RocksDB that reduces the number of filter probes per query to one. |
Hengrui Wang; Te Guo; Junzhao Yang; Huanchen Zhang; |
| 167 | GTS: GPU-based Tree Index for Fast Similarity Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, these methods struggle to meet the demand for high throughput data management. To address these challenges, we propose GTS, a GPU-based tree index designed for the parallel processing of similarity search in general metric spaces, where only the distance metric for measuring object similarity is known. |
Yifan Zhu; Ruiyao Ma; Baihua Zheng; Xiangyu Ke; Lu Chen; Yunjun Gao; |
| 168 | High Precision ≠ High Cost: Temporal Data Fusion for Multiple Low-Precision Sensors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our major contributions include (1) the problem formalization and NP-hardness analysis on finding the fusion result with the maximum likelihood w.r.t. local fusion models, (2) exact algorithms based on dynamic programming for tackling the problem, (3) efficient approximation methods with performance guarantees. |
Jingyu Zhu; Yu Sun; Shaoxu Song; Xiaojie Yuan; |
| 169 | Historical Embedding-Guided Efficient Large-Scale Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing FedGCN training techniques with neighbor sampling often produce extremely large communication and computation overhead and inaccurate node embeddings, leading to poor model performance. To bridge this gap, we propose the Federated Adaptive Attention-based Sampling (FedAAS) approach. |
Anran Li; Yuanyuan Chen; Jian Zhang; Mingfei Cheng; Yihao Huang; Yueming Wu; Anh Tuan Luu; Han Yu; |
| 170 | Hyper: A High-Performance and Memory-Efficient Learned Index Via Hybrid Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often face a fundamental trade-off between performance and memory consumption, especially in dynamic environments with frequent insert and delete operations. This trade-off stems from the construction approaches used in learned indexes: The top-down approach increases performance at the cost of significant memory overhead, while the bottom-up approach focuses on memory efficiency but introduces performance issues due to prediction errors. |
Shunkang Zhang; Ji Qi; Xin Yao; André Brinkmann; |
| 171 | Implementation Strategies for Views Over Property Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider both virtual and materialized views, ways of rewriting queries, and structures for indexing data. |
Soonbo Han; Zachary G. Ives; |
| 172 | In-depth Analysis of Continuous Subgraph Matching in A Common Delta Query Compilation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework that generates CSM code from the logical and physical plans of delta queries with stacked views. |
Yukyoung Lee; Kyoungmin Kim; Wonseok Lee; Wook-Shin Han; |
| 173 | Learning-based Property Estimation with Polynomials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These studies also prove estimation errors with respect to the sample size. Motivated by the above polynomial estimation framework, we propose a learning-based estimation framework with polynomial approximation, which aims to learn the coefficients of the polynomial, providing theoretical guarantees to the learning framework. |
Jiajun Li; Runlin Lei; Sibo Wang; Zhewei Wei; Bolin Ding; |
| 174 | Lorentz: Learned SKU Recommendation Using Profile Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Lorentz, an intelligent SKU recommender for provisioning new compute resources that circumvents the need for workload traces. |
Nick Glaze; Tria McNeely; Yiwen Zhu; Matthew Gleeson; Helen Serr; Rajeev Bhopi; Subru Krishnan; |
| 175 | Low-Latency Adaptive Distributed Stream Join System Based on A Flexible Join Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new stream join model for processing arbitrary join predicates, called CoModel, which offers a flexible trade-off between memory and computing resource consumption. |
Qihang Wang; Decheng Zuo; Zhan Zhang; Yanjun Shu; Xin Liu; Mingxuan He; |
| 176 | Making In-Memory Learned Indexes Efficient on Disk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that it is feasible to create efficient disk-based learned indexes by applying a set of general transformations and optimizations to existing in-memory ones. |
Jiaoyi Zhang; Kai Su; Huanchen Zhang; |
| 177 | Materialized View Selection & View-Based Query Planning for Regular Path Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the available memory is limited, we define the materialized view selection (MVS) problem for RPQs as minimizing the total workload query cost within a memory budget. |
Yue Pang; Lei Zou; Jeffrey Xu Yu; Linglin Yang; |
| 178 | MCR-Tree: An Efficient Index for Multi-dimensional Core Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing indexes suffer from several limitations, such as significant redundancy, lack of scalability with respect to the number of parameters, limited generality, and inadequate consideration of index maintenance. To address these limitations, in this paper, we thoroughly investigate the problem of multi-dimensional core search. |
Chengyang Luo; Yifan Zhu; Qing Liu; Yunjun Gao; Lu Chen; Jianliang Xu; |
| 179 | Nexus: Correlation Discovery Over Collections of Spatio-Temporal Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, performing accurate causal analysis on observational data is generally infeasible, and therefore, domain experts start exploration with the identification of correlations. The increased availability of data from open government websites, organizations, and scientific studies presents an opportunity to harness observational datasets in assisting domain experts during this exploratory phase. In this work, we introduce Nexus, a system designed to align large repositories of spatio-temporal datasets and identify correlations, facilitating the exploration of causal relationships. |
Yue Gong; Sainyam Galhotra; Raul Castro Fernandez; |
| 180 | Object-oriented Unified Encrypted Memory Management for Heterogeneous Memory Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the concept of Unified Encrypted Memory (UEM) management, a novel approach that provides unified object references essential for data management platforms, while simultaneously concealing the complexities of physical scheduling from developers. |
Mo Sha; Yifan Cai; Sheng Wang; Linh Thi Xuan Phan; Feifei Li; Kian-Lee Tan; |
| 181 | On Efficient Large Sparse Matrix Chain Multiplication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the RS-estimator, we propose a novel ordering algorithm for determining a good order of efficient SMCM. |
Chunxu Lin; Wensheng Luo; Yixiang Fang; Chenhao Ma; Xilin Liu; Yuchi Ma; |
| 182 | On Querying Historical Connectivity in Temporal Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework and design a novel forest-based index for historical connectivity queries. |
Jingyi Song; Dong Wen; Lantian Xu; Lu Qin; Wenjie Zhang; Xuemin Lin; |
| 183 | Optimizing Disjunctive Queries with Tagged Execution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Careless creation of tags can lead to an exponential blowup in the tag space, with the overhead outweighing the benefits. To address this issue, we present a technique called tag generalization to minimize the space of tags. |
Albert Kim; Samuel Madden; |
| 184 | Optimizing Time Series Queries with Versions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an algebra consisting of version operators addresses the semantics for time-series applications to evaluate and optimize physical query plans. |
Rui Kang; Shaoxu Song; |
| 185 | OTClean: Data Cleaning for Conditional Independence Violations Using Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce OTClean, a framework that harnesses optimal transport theory for data repair under CI constraints. |
Alireza Pirhadi; Mohammad Hossein Moslemi; Alexander Cloninger; Mostafa Milani; Babak Salimi; |
| 186 | PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real PIM hardware has recently become commercially accessible to the public. In this work, we leverage the real PIM hardware platform to build a graph pattern matching framework, PimPam, to benefit from its abundant computation and memory bandwidth resources. |
Shuangyu Cai; Boyu Tian; Huanchen Zhang; Mingyu Gao; |
| 187 | Play Like A Vertex: A Stackelberg Game Approach for Streaming Graph Partitioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel streaming partitioning algorithm, the Skewness-aware Vertex-cut Partitioner (S5P), designed to leverage the skewness characteristics of real graphs for achieving high-quality partitioning. |
Zezhong Ding; Yongan Xiang; Shangyou Wang; Xike Xie; S. Kevin Zhou; |
| 188 | PreLog: A Pre-trained Model for Log Analytics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PreLog, a novel pre-trained model for log analytics. |
Van-Hoang Le; Hongyu Zhang; |
| 189 | Qr-Hint: Actionable Hints Towards Correcting Wrong SQL Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a system called Qr-Hint that, given a (correct) target query Q* and a (wrong) working query Q, both expressed in SQL, provides actionable hints for the user to fix the working query so that it becomes semantically equivalent to the target. |
Yihao Hu; Amir Gilad; Kristin Stephens-Martinez; Sudeepa Roy; Jun Yang; |
| 190 | Query Compilation Without Regrets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Query compilation provides excellent performance, but at the same time introduces significant system complexity, as it makes the engine hard to build, debug, and maintain. To overcome this complexity, we propose Nautilus, a framework that combines the ease of use of query interpretation and the performance of query compilation. |
Philipp M. Grulich; Aljoscha P. Lepping; Dwi P. A. Nugroho; Varun Pandey; Bonaventura Del Monte; Steffen Zeuch; Volker Markl; |
| 191 | Query Refinement for Diverse Top-k Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define and study the problem of modifying the selection conditions of an ORDER BY query so that the result of the modified query closely fits some user-defined notion of diversity while simultaneously maintaining the intent of the original query. |
Felix S. Campbell; Alon Silberstein; Julia Stoyanovich; Yuval Moskovitch; |
| 192 | RaBitQ: Quantizing High-Dimensional Vectors with A Theoretical Error Bound for Approximate Nearest Neighbor Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their empirical success, we note that these methods do not have a theoretical error bound and are observed to fail disastrously on some real-world datasets. Motivated by this, we propose a new randomized quantization method named RaBitQ, which quantizes D-dimensional vectors into D-bit strings. |
Jianyang Gao; Cheng Long; |
| 193 | Relational Algorithms for Top-k Query Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce relational algorithms, a paradigm where each algorithmic step is expressed by a relational operator. |
Qichen Wang; Qiyao Luo; Yilei Wang; |
| 194 | Revisiting B-tree Compression: An Experimental Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct the first experimental evaluation of seven widely used B-tree compression techniques using both synthetic and real datasets. |
Chuqing Gao; Shreya Ballijepalli; Jianguo Wang; |
| 195 | ROME: Robust Query Optimization Via Parallel Multi-Plan Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a non-intrusive approach to robust query processing that can be used on top of any SQL execution engine. |
Ziyun Wei; Immanuel Trummer; |
| 196 | Scalable Distributed Inverted List Indexes in Disaggregated Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Connected via high-speed RDMA-enabled networks, compute nodes can directly access remote memory. This setting often requires complex protocols with many network roundtrips as memory nodes have near-zero compute power. In this paper, we design a scalable distributed inverted list index for disaggregated memory architectures. |
Manuel Widmoser; Daniel Kocher; Nikolaus Augsten; |
| 197 | SchemaPile: A Large Collection of Relational Database Schemas Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It contains 1.7 million tables with 10 million column definitions, 700 thousand foreign key relationships, seven million integrity constraints, and data content for more than 340 thousand tables. We conduct an in-depth analysis on the millions of schema metadata properties in our corpus, as well as its highly diverse language and topic distribution. |
Till Döhmen; Radu Geacu; Madelon Hulsebos; Sebastian Schelter; |
| 198 | Settling Time Vs. Accuracy Tradeoffs for Clustering Big Data Highlight: We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. |
Andrew Draganov; David Saulpic; Chris Schwiegelshohn; |
| 199 | SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement Highlight: In this work, we present SIMPLE, a versatile system designed to address the major efficiency bottleneck in training existing T-GNNs on a large scale. |
Shihong Gao; Yiming Li; Xin Zhang; Yanyan Shen; Yingxia Shao; Lei Chen; |
| 200 | Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in A Colossal Configuration Space Highlight: They typically follow fixed patterns to specify the level capacity and the number of sorted runs per level. This confines their designs to a restricted space, limiting opportunities for broader optimizations. To address this challenge, we consider a more flexible configuration that enables independent adjustments of the number of runs per level, size ratio, and Bloom filter settings at each LSM-tree level. |
Junfeng Liu; Fan Wang; Dingheng Mo; Siqiang Luo; |
| 201 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks Highlight: In this work, we propose a new “table fine-tuning” paradigm, where we continue to train/fine-tune language models like GPT-3.5 and ChatGPT, using diverse table-tasks synthesized from real tables as training data, which is analogous to “instruction fine-tuning”, but with the goal of enhancing language models’ ability to understand tables and perform table tasks. |
Peng Li; Yeye He; Dror Yashar; Weiwei Cui; Song Ge; Haidong Zhang; Danielle Rifinski Fainman; Dongmei Zhang; Surajit Chaudhuri; |
| 202 | Temporal JSON Keyword Search Highlight: This paper introduces a system called Temporal JSON Keyword Search (TJKS) for search in a collection of JSON documents that vary over time. |
Curtis Dyreson; Amani Shatnawi; Sourav S. Bhowmick; Vishal Sharma; |
| 203 | Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms Highlight: In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (which is a realistic assumption for many high-dimensional applications), while the outliers can be located anywhere in the space without any assumption. |
Guanlin Mo; Shihong Song; Hu Ding; |
| 204 | uBlade: Efficient Batch Processing for Uncertainty Graph Queries Highlight: In this paper, we introduce uBlade, an efficient batch-processing framework for uncertain graph queries on multi-core CPUs. |
Siyuan Yao; Yuchen Li; Shixuan Sun; Jiaxin Jiang; Bingsheng He; |
| 205 | Understanding The Performance Implications of The Design Principles in Storage-Disaggregated Databases Highlight: As a result, many critical research questions remain unclear, such as the performance impact of storage-disaggregation, the log-as-the-database design, shared-storage, and various log-replay methods. In this paper, we investigate the performance implications of the design principles that are widely adopted in storage-disaggregated databases for the first time. |
Xi Pang; Jianguo Wang; |
| 206 | Unstructured Data Fusion for Schema and Data Extraction Highlight: In this paper, we introduce a novel problem, which we term Semistructured Schema and Data Extraction (SDE). |
Kaiwen Chen; Nick Koudas; |
| 207 | Wii: Dynamic Budget Reallocation In Index Tuning Highlight: This results in considerable waste of the budget, as these what-if calls are unnecessary. In this paper, we propose Wii, a lightweight mechanism that aims to avoid such spurious what-if calls. |
Xiaoying Wang; Wentao Wu; Chi Wang; Vivek Narasayya; Surajit Chaudhuri; |
| 208 | GE2: A General and Efficient Knowledge Graph Embedding Learning System Highlight: In particular, we propose a general execution model that encompasses various negative sampling algorithms. |
Chenguang Zheng; Guanxian Jiang; Xiao Yan; Peiqi Yin; Qihui Zhou; James Cheng; |
| 209 | Language-Model Based Informed Partition of Databases to Speed Up Pattern Mining Highlight: While there have been a lot of advances in the field, due to the NP-hard nature of the problem, the main approaches still struggle when they are faced with large databases with large and sparse vocabularies, such as the ones obtained from graph propositionalizations. There have been efforts to propose parallel algorithms, but, so far, the goal has not been to tackle this source of complexity (i.e., vocabulary size). Thus, in this paper, we propose to parallelize frequent itemset mining algorithms by partitioning the database horizontally (i.e., transaction-wise) while not neglecting all the possible vertical information (i.e., item-wise). |
Carlos Bobed Lisbona; Jordi Bernad; Pierre Maillot; |
| 210 | StarfishDB: A Query Execution Engine for Relational Probabilistic Programming Highlight: We introduce StarfishDB, a query execution engine optimized for relational probabilistic programming. |
Ouael Ben Amara; Sami Hadouaj; Niccolò Meneghetti; |
| 211 | ThalamusDB: Approximate Query Processing on Multi-Modal Data Highlight: We introduce ThalamusDB, a novel approximate query processing system that processes complex SQL queries on multi-modal data. |
Saehan Jo; Immanuel Trummer; |
| 212 | Vexless: A Serverless Vector Data Management System Using Cloud Functions Highlight: In this paper, we focus on vector databases, which have recently gained significant attention partly due to large language models. |
Yongye Su; Yinqi Sun; Minjia Zhang; Jianguo Wang; |
| 213 | Keep It Simple: Testing Databases Via Differential Query Plans Highlight: We studied TQS’s bug reports, and found that 14 of 15 unique bugs were reported by showing discrepancies in executing the same query with different query plans. Therefore, in this work, we propose a simple alternative approach to TQS. |
Jinsheng Ba; Manuel Rigger; |