Paper Digest: SIGMOD 2023 Papers & Highlights
This curated list is created by the Paper Digest Team.
TABLE 1: Paper Digest: SIGMOD 2023 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | AWARE: Workload-aware, Redundancy-exploiting Linear Algebra. Highlight: Existing work on lossless compression and compressed linear algebra (CLA) enable such exploitation to a degree, but face challenges for general applicability. In this paper, we address these limitations with a workload-aware compression framework, comprising a broad spectrum of new compression schemes and kernels. | Sebastian Baunsgaard; Matthias Boehm |
| 2 | Unsupervised Hashing with Semantic Concept Mining. Highlight: In real-world scenarios, each image is associated with some concepts, and the similarity between two images will be larger if they share more identical concepts. Inspired by the above intuition, in this work, we propose a novel Unsupervised Hashing with Semantic Concept Mining, called UHSCM, which leverages a VLP model to construct a high-quality similarity matrix. | Rong-Cheng Tu; Xian-Ling Mao; Kevin Qinghong Lin; Chengfei Cai; Weize Qin; Wei Wei; Hongfa Wang; Heyan Huang |
| 3 | CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression. Highlight: We develop CompressGraph, an efficient rule-based graph analytics engine that leverages data redundancy in graphs to achieve both performance boost and space reduction for common graph applications. | Zheng Chen; Feng Zhang; JiaWei Guan; Jidong Zhai; Xipeng Shen; Huanchen Zhang; Wentong Shu; Xiaoyong Du |
| 4 | A New Sparse Data Clustering Method Based On Frequent Items. Highlight: Unlike most existing clustering algorithms, which adopt Euclidean distance as the similarity measure, k-FreqItems uses the popular Jaccard distance for comparing sets. Since the efficiency and effectiveness of k-FreqItems are highly dependent on an initial set of representative seeds, we introduce a new randomized initialization method, SILK, to deal with the seeding problem of k-FreqItems. | Qiang Huang; Pingyi Luo; Anthony K. H. Tung |
| 5 | Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump. Highlight: In this paper, we focus on predicting the pump probability of all coins listed in the target exchange before a scheduled pump time, which we refer to as the target coin prediction task. | Sihao Hu; Zhen Zhang; Shengliang Lu; Bingsheng He; Zhao Li |
| 6 | Virtual-Memory Assisted Buffer Management. Highlight: In this work, we propose vmcache, a buffer manager design that instead uses hardware-supported virtual memory to translate page identifiers to virtual memory addresses. | Viktor Leis; Adnan Alhomssi; Tobias Ziegler; Yannick Loeck; Christian Dietrich |
| 7 | iFlipper: Label Flipping for Individual Fairness. Highlight: In particular, we show that label flipping is an effective pre-processing technique for improving individual fairness. Our system iFlipper solves the optimization problem of minimally flipping labels given a limit to the individual fairness violations, where a violation occurs when two similar examples in the training data have different labels. | Hantian Zhang; Ki Hyun Tae; Jaeyoung Park; Xu Chu; Steven Euijong Whang |
| 8 | SANTOS: Relationship-based Semantic Table Union Search. Highlight: In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of the union search. | Aamod Khatiwada; Grace Fan; Roee Shraga; Zixuan Chen; Wolfgang Gatterbauer; Renée J. Miller; Mirek Riedewald |
| 9 | LadderFilter: Filtering Infrequent Items with Small Memory and Time Overhead. Highlight: This increases memory and time overhead. To reduce this overhead, we propose LadderFilter, which can discard infrequent items efficiently in terms of both memory and time. | Yuanpeng Li; Feiyu Wang; Xiang Yu; Yilong Yang; Kaicheng Yang; Tong Yang; Zhuo Ma; Bin Cui; Steve Uhlig |
| 10 | Effectiveness Perspectives and A Deep Relevance Model for Spatial Keyword Queries. Highlight: Motivated by the finding, we propose a Deep relevance with Weight learning (DrW) model to further improve the effectiveness of the retrieval ranking. | Shang Liu; Gao Cong; Kaiyu Feng; Wanli Gu; Fuzheng Zhang |
| 11 | Circinus: Fast Redundancy-Reduced Subgraph Matching. Highlight: This paper proposes a subgraph matching system, Circinus, which enables effective computation sharing by a new compression-based backtracking method. | Tatiana Jin; Boyang Li; Yichao Li; Qihui Zhou; Qianli Ma; Yunjian Zhao; Hongzhi Chen; James Cheng |
| 12 | Composite Object Normal Forms: Parameterizing Boyce-Codd Normal Form By The Number of Minimal Keys. Highlight: We show that n quantifies a trade-off between access variety and update complexity. | Zhuoxing Zhang; Wu Chen; Sebastian Link |
| 13 | EAR-Oracle: On Efficient Indexing for Distance Queries Between Arbitrary Points on Terrain Surface. Highlight: We propose an indexing structure, namely Efficient Arbitrary Point-to-Arbitrary Point Distance Oracle (EAR-Oracle), with theoretical guarantee on the accuracy, oracle building time, oracle size and query time. | Bo Huang; Victor Junqiu Wei; Raymond Chi-Wing Wong; Bo Tang |
| 14 | Fast Continuous Subgraph Matching Over Streaming Graphs Via Backtracking Reduction. Highlight: Aiming to minimize the overall cost, we propose two techniques to reduce backtrackings in this paper. | Rongjian Yang; Zhijie Zhang; Weiguo Zheng; Jeffrey Xu Yu |
| 15 | Efficient Estimation of Pairwise Effective Resistance. Abstract: Given an undirected graph G, the effective resistance r(s,t) measures the dissimilarity of node pair s,t in G, which finds numerous applications in real-world problems, such as … | Renchi Yang; Jing Tang |
| 16 | Time2State: An Unsupervised Framework for Inferring The Latent States in Time Series Data. Highlight: To reduce the computational cost, we present Time2State, a scalable framework that utilizes a sliding window and an encoder to greatly reduce the length of raw time series. | Chengyu Wang; Kui Wu; Tongqing Zhou; Zhiping Cai |
| 17 | Incremental Tabular Learning on Heterogeneous Feature Space. Highlight: Thus, classic incremental learning models may hinder their effectiveness. In this paper, we propose a new method, incremental tabular learning on heterogeneous feature space (ILEAHE) to solve this issue. | Hanmo Liu; Shimin Di; Lei Chen |
| 18 | One-shot Garbage Collection for In-memory OLTP Through Temporality-aware Version Storage. Highlight: OneShotGC leverages the temporal correlations across versions to opportunistically cluster them into contiguous memory blocks that can be released in one shot. We implement OneShotGC in Proteus and use YCSB and TPC-C to experimentally evaluate its performance with respect to the state-of-the-art, where we observe an improvement of up to 2x in transactional throughput. | Aunn Raza; Periklis Chrysogelos; Angelos Christos Anadiotis; Anastasia Ailamaki |
| 19 | AutoOD: Automatic Outlier Detection. Highlight: In this work, we propose AutoOD which uses the existing unsupervised detection techniques to automatically produce high quality outliers without any human tuning. | Lei Cao; Yizhou Yan; Yu Wang; Samuel Madden; Elke A. Rundensteiner |
| 20 | A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Abstract: Several data-for-good projects [1, 5, 12] initiated by major companies (e.g., Meta, Google) release to the public spatio-temporal datasets to benefit COVID-19 spread modeling … | Ritesh Ahuja; Sepanta Zeighami; Gabriel Ghinita; Cyrus Shahabi |
| 21 | Most Expected Winner: An Interpretation of Winners Over Uncertain Voter Preferences. Highlight: In this paper, we propose an alternative winner interpretation, selecting the Most Expected Winner (MEW) according to the expected performance of the candidates. We separate the uncertainty in voter preferences into the generation step and the observation step, which gives rise to a unified voting profile combining both incomplete and probabilistic voting profiles. | Haoyue Ping; Julia Stoyanovich |
| 22 | Grouping Time Series for Efficient Columnar Storage. Highlight: Thereby, we propose a heuristic algorithm for automatically grouping time series for efficient columnar storage. | Chenguang Fang; Shaoxu Song; Haoquan Guan; Xiangdong Huang; Chen Wang; Jianmin Wang |
| 23 | How To Optimize My Blockchain? A Multi-Level Recommendation Approach. Highlight: However, since blockchains handle multiple aspects ranging from organizational governance to smart contract design, a holistic approach that encompasses all the different layers of a given blockchain system is required to ensure that all optimization opportunities are taken into consideration. In this vein, we define a multi-level optimization recommendation approach that identifies optimization opportunities within a blockchain at the system, data, and user level. | Jeeta Ann Chacko; Ruben Mayer; Hans-Arno Jacobsen |
| 24 | Personalized PageRank on Evolving Graphs with An Incremental Index-Update Scheme. Highlight: However, existing index-update schemes cannot achieve a sub-linear update time. Motivated by this, we present an efficient indexing scheme for single-source PPR queries on evolving graphs. | Guanhao Hou; Qintian Guo; Fangyuan Zhang; Sibo Wang; Zhewei Wei |
| 25 | Transaction Scheduling: From Conflicts to Runtime Conflicts. Highlight: We show that if transactions in each cluster are properly scheduled, transactions that are traditionally considered conflicting can be executed without conflicts at runtime. In light of this, we propose to schedule transactions and reduce runtime conflicts, instead of partitioning based on the conventional notion of conflicts. | Yang Cao; Wenfei Fan; Weijie Ou; Rui Xie; Wenyue Zhao |
| 26 | ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee. Highlight: We design a novel data structure and GPU-friendly parallel algorithms for efficient computation of all the operations of SimRank on GPU. | Tianhao Wu; Ji Cheng; Chaorui Zhang; Jianfeng Hou; Gengjian Chen; Zhongyi Huang; Weixi Zhang; Wei Han; Bo Bai |
| 27 | Speeding Up End-to-end Query Execution Via Learning-based Progressive Cardinality Estimation. Highlight: We propose a novel Learning-based Progressive Cardinality Estimator (LPCE), which adopts a query re-optimization methodology. | Fang Wang; Xiao Yan; Man Lung Yiu; Shuai Li; Zunyao Mao; Bo Tang |
| 28 | Distributed GPU Joins on Fast RDMA-capable Networks. Highlight: In this paper, we present a novel pipelined GPU join that accelerates the performance of distributed DBMSs by leveraging GPU resources on fast networks. | Lasse Thostrup; Gloria Doci; Nils Boeschen; Manisha Luthra; Carsten Binnig |
| 29 | GitTables: A Large-Scale Corpus of Relational Tables. Highlight: The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. | Madelon Hulsebos; Çagatay Demiralp; Paul Groth |
| 30 | DbET: Execution Time Distribution-based Plan Selection. Highlight: In this work, we complement existing plan selection methods by proposing a new approach named DbET, which produces execution time distributions for query plans utilizing conformal predictions. | Yifan Li; Xiaohui Yu; Nick Koudas; Shu Lin; Calvin Sun; Chong Chen |
| 31 | Ground Truth Inference for Weakly Supervised Entity Matching. Highlight: To tailor the labeling model for EM, we formulate an approach to ensure that the final predictions of the labeling model satisfy the transitivity property required in EM, utilizing an exact solution where possible and an ML-based approximation in remaining cases. | Renzhi Wu; Alexander Bendeck; Xu Chu; Yeye He |
| 32 | Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data. Highlight: This paper proposes a novel updatability framework (DDUp). | Meghdad Kurmanji; Peter Triantafillou |
| 33 | I/O-Efficient Butterfly Counting at Scale. Highlight: In this paper, we study I/O-efficient algorithms for doing butterfly counting on hierarchical memory. | Zhibin Wang; Longbin Lai; Yixue Liu; Bing Shui; Chen Tian; Sheng Zhong |
| 34 | LiteHST: A Tree Embedding Based Method for Similarity Search. Highlight: We propose a new construction algorithm with lower time complexity than existing methods and prove the optimality of LiteHST in the distance bound. | Yuxiang Zeng; Yongxin Tong; Lei Chen |
| 35 | Raster Intervals: An Approximation Technique for Polygon Intersection Joins. Highlight: The refinement step has been shown notoriously expensive, especially for polygon-polygon joins, constituting the bottleneck of the entire process. We propose a novel approximation technique for polygons, which (i) rasterizes them using a fine grid, (ii) models groups of nearby cells that intersect a polygon as an interval, and (iii) encodes each interval by a bitstring that captures the overlap of each cell in it with the polygon. | Thanasis Georgiadis; Nikos Mamoulis |
| 36 | Optimizing Tensor Programs on Flexible Storage. Highlight: In this paper we describe a system that allows users to define flexible storage formats in a declarative tensor query language, similar to the language used by the tensor program. | Maximilian Schleich; Amir Shaikhha; Dan Suciu |
| 37 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees. Highlight: In this paper, we identify a class of acyclic select-project-join (SPJ) queries for which CQA can be solved via SQL rewriting with a linear time guarantee. | Zhiwei Fan; Paraschos Koutris; Xiating Ouyang; Jef Wijsen |
| 38 | Probabilistic Reasoning at Scale: Trigger Graphs to The Rescue. Highlight: However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs), a notion recently introduced for the non-probabilistic setting. | Efthymia Tsamoura; Jaehun Lee; Jacopo Urbani |
| 39 | Foreign Keys Open The Door for Faster Incremental View Maintenance. Highlight: Knowing that one of the join keys is a foreign-key would allow us to prune all but one of the UNION ALL branches and obtain a more efficient IVM script. In this work, we explore ways of incorporating knowledge about foreign key into IVM in order to speed up its performance. | Christoforos Svingos; Andre Hernich; Hinnerk Gildhoff; Yannis Papakonstantinou; Yannis Ioannidis |
| 40 | FactorJoin: A New Cardinality Estimation Framework for Join Queries. Highlight: They either rely on simplified assumptions leading to ineffective cardinality estimates or build large models to understand the complicated data distributions, leading to long planning times and a lack of generalizability across queries. In this paper, we propose a new framework FactorJoin for estimating join queries. | Ziniu Wu; Parimarjan Negi; Mohammad Alizadeh; Tim Kraska; Samuel Madden |
| 41 | FlexER: Flexible Entity Resolution for Multiple Intents. Highlight: As a solution, we propose FlexER, utilizing contemporary solutions to universal entity resolution tasks to solve MIER. | Bar Genossar; Roee Shraga; Avigdor Gal |
| 42 | MRVs: Enforcing Numeric Invariants in Parallel Updates to Hotspots with Randomized Splitting. Highlight: This is particularly challenging in emerging large-scale database systems, as latency increases the probability of conflicts, state-of-the-art lock-based mitigations are not available, and most alternatives provide only weak consistency and cannot enforce lower bound invariants. We address this challenge with Multi-Record Values (MRVs), a technique that can be layered on existing database systems and that uses randomization to split and access numeric values in multiple records such that the probability of conflict can be made arbitrarily small. | Nuno Faria; José Pereira |
| 43 | Polaris: Enabling Transaction Priority in Optimistic Concurrency Control. Abstract: Transaction priority is a critical feature for real-world database systems. Under high contention, certain classes of transactions should be given a higher chance to commit than … | Chenhao Ye; Wuh-Chwen Hwang; Keren Chen; Xiangyao Yu |
| 44 | An Efficient Algorithm for Distance-based Structural Graph Clustering. Highlight: However, the new definition brings challenges in the computation of final clustering results. To tackle this efficiency issue, we propose DistanceSCAN, an efficient approximate algorithm for solving the distance-based SCAN problem. | Kaixin Liu; Sibo Wang; Yong Zhang; Chunxiao Xing |
| 45 | SplinterDB and Maplets: Improving The Tradeoffs in Key-Value Store Compaction Policy. Highlight: And, with fast storage devices, the CPU costs of querying filters in a lazy compacting system can be significant. In this work, we present Mapped SplinterDB, a key-value store that achieves excellent insertion performance, query performance, space efficiency, and scalability by replacing filters with maplets, space-efficient data structures that act as lossy maps with false positives. | Alex Conway; Martín Farach-Colton; Rob Johnson |
| 46 | IcebergHT: High Performance Hash Tables Through Stability and Low Associativity. Highlight: This paper proposes two design objectives, stability and low-associativity, that enable us to build hash tables that minimize cache-line accesses for all operations. | Prashant Pandey; Michael A. Bender; Alex Conway; Martin Farach-Colton; William Kuszmaul; Guido Tagliavini; Rob Johnson |
| 47 | Efficient Sampling Approaches to Shapley Value Approximation. Highlight: In this paper, we treat the sampling approach to Shapley value approximation as a stratified sampling problem. | Jiayao Zhang; Qiheng Sun; Jinfei Liu; Li Xiong; Jian Pei; Kui Ren |
| 48 | Maximum K-Biplex Search on Bipartite Graphs: A Symmetric-BK Branching Approach. Highlight: Existing proposals of tackling this problem impose constraints on the number of vertices of each MBP to be enumerated, yet they are still not sufficient (e.g., they require to specify the constraints, which is often not user-friendly, and cannot control the number of MBPs to be enumerated directly). Therefore, in this paper, we study the problem of finding K MBPs with the most edges called MaxBPs, where K is a positive integral user parameter. | Kaiqiang Yu; Cheng Long |
| 49 | Spatio-Temporal Denoising Graph Autoencoders with Data Augmentation for Photovoltaic Data Imputation. Highlight: We propose a novel Spatio-Temporal Denoising Graph Autoencoder STD-GAE framework to impute missing PV Power Data. | Yangxin Fan; Xuanji Yu; Raymond Wieser; David Meakin; Avishai Shaton; Jean-Nicolas Jaubert; Robert Flottemesch; Michael Howell; Jennifer Braid; Laura Bruckman; Roger French; Yinghui Wu |
| 50 | TED: Towards Discovering Top-k Edge-Diversified Patterns in A Graph Database. Highlight: To address this limitation, we propose the Top-k Edge-Diversified Patterns Discovery problem to retrieve a set of subgraphs that cover the maximum number of edges in a database. To efficiently process such query, we present a generic and extensible framework called Ted which achieves a guaranteed approximation ratio to the optimal result. | Kai Huang; Haibo Hu; Qingqing Ye; Kai Tian; Bolong Zheng; Xiaofang Zhou |
| 51 | Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees. Highlight: To address the limitations, we propose Orca, a novel framework that accelerates T-GNN training by non-trivially caching and reusing intermediate embeddings. | Yiming Li; Yanyan Shen; Lei Chen; Mingxuan Yuan |
| 52 | SafeBound: A Practical System for Generating Cardinality Bounds. Highlight: In this paper, we introduce SafeBound, the first practical system for generating cardinality bounds. | Kyle B. Deeds; Dan Suciu; Magdalena Balazinska |
| 53 | Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases. Highlight: In this paper, we propose a novel τ-monotonic graph (τ-MG) to address the limitations. | Yun Peng; Byron Choi; Tsz Nam Chan; Jianye Yang; Jianliang Xu |
| 54 | Detecting Logic Bugs of Join Optimizations in DBMS. Highlight: In this paper, we propose TQS, a novel testing framework targeted at detecting logic bugs derived by queries involving multi-table joins. | Xiu Tang; Sai Wu; Dongxiang Zhang; Feifei Li; Gang Chen |
| 55 | TreeSensing: Linearly Compressing Sketches with Flexibility. Highlight: We propose TreeSensing, an accurate, efficient, and flexible framework to linearly compress sketches. | Zirui Liu; Yixin Zhang; Yifan Zhu; Ruwen Zhang; Tong Yang; Kun Xie; Sha Wang; Tao Li; Bin Cui |
| 56 | A Universal Question-Answering Platform for Knowledge Graphs. Highlight: In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. | Reham Omar; Ishika Dhall; Panos Kalnis; Essam Mansour |
| 57 | Fast Density-Based Clustering: Geometric Approach. Highlight: However, a bottleneck of DBSCAN is its O(n^2) worst-case time complexity. In this paper, we propose an algorithm called GAP-DBC, which exploits the geometric relationships between points to solve this problem. | Xiaogang Huang; Tiefeng Ma |
| 58 | MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores. Highlight: This paper introduces MorphStream, which adopts a novel approach by decomposing scheduling strategies into three dimensions and then strives to make the right decision along each dimension, based on analyzing the decision trade-offs under varying workload characteristics. | Yancan Mao; Jianjun Zhao; Shuhao Zhang; Haikun Liu; Volker Markl |
| 59 | An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation. Highlight: In addition, the MPC-FM protocol is computationally expensive, which limits its applications to data holders with limited computation resources. To address the above issues, in this paper we propose a novel protocol DP-DICE, which is computationally efficient and differentially private for solving the problem of PDCE. | Pinghui Wang; Chengjin Yang; Dongdong Xie; Junzhou Zhao; Hui Li; Jing Tao; Xiaohong Guan |
| 60 | Towards Generating Hop-constrained S-t Simple Path Graphs. Highlight: Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To our best knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. | Yuzheng Cai; Siyuan Liu; Weiguo Zheng; Xuemin Lin |
| 61 | GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database. Highlight: In this paper, we propose a strongly consistent OLTP database GeoGauss with full replica multi-master architecture. | Weixing Zhou; Qi Peng; Zijie Zhang; Yanfeng Zhang; Yang Ren; Sihao Li; Guo Fu; Yulong Cui; Qiang Li; Caiyi Wu; Shangjun Han; Shengyi Wang; Guoliang Li; Ge Yu |
| 62 | The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data. Highlight: In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without the need to change the structure or query processing algorithms of traditional R-Tree. | Tu Gu; Kaiyu Feng; Gao Cong; Cheng Long; Zheng Wang; Sheng Wang |
| 63 | Robust and Transferable Log-based Anomaly Detection. Highlight: To perform a comprehensive analysis of log messages, we introduce an adaptive relation modeling technique, which captures feature interactions among log information fields selectively and dynamically for effective and interpretable log representations. | Peng Jia; Shaofeng Cai; Beng Chin Ooi; Pinghui Wang; Yiyuan Xiong |
| 64 | Matching Roles from Temporal Data: Why Joe Biden Is Not Only President, But Also Commander-in-Chief. Highlight: We present role matching, a novel, fine-grained integrity constraint on temporal fact data, i.e., (subject, predicate, object, timestamp)-quadruples. | Leon Bornemann; Tobias Bleifuß; Dmitri V. Kalashnikov; Fatemeh Nargesian; Felix Naumann; Divesh Srivastava |
| 65 | Toward Efficient Homomorphic Encryption for Outsourced Databases Through Parallel Caching. Highlight: The objective of this work is to mitigate the performance overhead incurred by the HE module in outsourced databases. | Olamide Timothy Tawose; Jun Dai; Lei Yang; Dongfang Zhao |
| 66 | Runtime Variation in Big Data Analytics. Highlight: We propose an innovative 2-step approach to predict job runtime distribution by characterizing typical distribution shapes combined with a classification model with an average accuracy of >96%, using an innovative interpretable machine-learning algorithm out-performing traditional regression models and better capturing long tails. | Yiwen Zhu; Rathijit Sen; Robert Horton; John Mark Agosta |
| 67 | Efficient Resistance Distance Computation: The Power of Landmark-based Approaches. Highlight: To efficiently solve these problems, we first establish several interesting connections among resistance distance, a new concept called v-absorbed random walk, random spanning forests, and a newly-developed v-absorbed push procedure. Based on such new connections, we propose three novel and efficient sampling-based algorithms as well as a deterministic algorithm for single-pair query; and we develop an online and two index-based approximation algorithms for single-source query. | Meihao Liao; Rong-Hua Li; Qiangqiang Dai; Hongyang Chen; Hongchao Qin; Guoren Wang |
| 68 | Scaling Up K-Clique Densest Subgraph Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the k-clique densest subgraph problem, which detects the subgraph that maximizes the ratio between the number of k-cliques and the number of vertices in it. |
Yizhang He; Kai Wang; Wenjie Zhang; Xuemin Lin; Ying Zhang; |
| 69 | Discovering Top-k Rules Using Subjective and Objective Criteria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the bi-criteria model, we develop a top-k algorithm to discover top-ranked REEs, and an any-time algorithm for successive discovery via lazy evaluation. |
Wenfei Fan; Ziyan Han; Yaoshu Wang; Min Xie; |
| 70 | FINEX: A Fast Index for Exact & Flexible Density-Based Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (d) Inflexibility in terms of applicable data types and distance functions. We propose FINEX, a linear-space index that overcomes these limitations. |
Konstantin Emil Thiel; Daniel Kocher; Nikolaus Augsten; Thomas Hütter; Willi Mann; Daniel Schmitt; |
| 71 | DBPA: A Benchmark for Transactional Database Performance Anomalies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem raises the demand of a benchmark for anomaly reproduction and data collection. In this paper, we propose DBPA, a benchmark for transactional database performance anomalies. |
Shiyue Huang; Ziwei Wang; Xinyi Zhang; Yaofeng Tu; Zhongliang Li; Bin Cui; |
| 72 | Efficiently Computing Join Orders with Heuristic Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we provide a strong theoretical framework, in which we reduce join order optimization to the shortest path problem. |
Immanuel Haffner; Jens Dittrich; |
| 73 | T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from A Big Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. |
Lyuheng Yuan; Da Yan; Wenwen Qu; Saugat Adhikari; Jalal Khalil; Cheng Long; Xiaoling Wang; |
| 74 | Discovering Similarity Inclusion Dependencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Sawfish, the first algorithm to discover all similarity inclusion dependencies in a given dataset efficiently. |
Youri Kaminsky; Eduardo H. M. Pena; Felix Naumann; |
| 75 | Effective and Efficient PageRank-based Positioning for Graph Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, most existing measures are computationally inefficient, incurring a long response time when visualizing large graphs. To overcome such deficiencies, we propose a new node distance measure, PDist, geared towards graph visualization by exploiting a well-known node proximity measure, personalized PageRank. |
Shiqi Zhang; Renchi Yang; Xiaokui Xiao; Xiao Yan; Bo Tang; |
| 76 | Maximal Defective Clique Enumeration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve better practical efficiency, we propose a branch-and-bound algorithm with a novel pivoting technique. |
Qiangqiang Dai; Rong-Hua Li; Meihao Liao; Guoren Wang; |
| 77 | Efficient Biclique Counting in Large Bipartite Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The striking feature of EPivoter is that it can count (p,q)-bicliques for all pairs of (p,q) using a combinatorial technique, instead of exhaustively enumerating all (p,q)-bicliques. |
Xiaowei Ye; Rong-Hua Li; Qiangqiang Dai; Hongchao Qin; Guoren Wang; |
| 78 | Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we work on finding global top-K in multiple disjoint data streams. |
Yikai Zhao; Wenchen Han; Zheng Zhong; Yinda Zhang; Tong Yang; Bin Cui; |
| 79 | Managing Conflicting Interests of Stakeholders in Influencer Marketing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the challenge of the extremely large searching space of the hiring prices of the influencers, we solve this problem by firstly considering a restrictive searching sub-space and then gradually expanding the searching sub-space to the whole space in the end (specifically, from binary price choices to a set of integer prices and then to any price in the feasible price range). We propose effective yet efficient approximate algorithms for solving the problem in each of these settings. |
Shixun Huang; Junhao Gan; Zhifeng Bao; Wenqing Lin; |
| 80 | JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design a new sketch algorithm for accurate and unbiased inner-product estimation, namely JoinSketch. |
Feiyu Wang; Qizhi Chen; Yuanpeng Li; Tong Yang; Yaofeng Tu; Lian Yu; Bin Cui; |
| 81 | Regularized Pairwise Relationship Based Analytics for Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we seek to explicitly model and regularize the pairwise relationships between attribute fields of structured data, in a field-adaptive manner, via a proposed attentive and interpretable framework called ATT-Reg. |
Zhaojing Luo; Shaofeng Cai; Yatong Wang; Beng Chin Ooi; |
| 82 | Together Is Better: Heavy Hitters Quantile Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The algorithms are rigorously analyzed, and we demonstrate SQUAD’s superiority using extensive simulations on real-world traces. |
Rana Shahout; Roy Friedman; Ran Ben Basat; |
| 83 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Unicorn, a unified model for generally supporting common data matching tasks. |
Jianhong Tu; Ju Fan; Nan Tang; Peng Wang; Guoliang Li; Xiaoyong Du; Xiaofeng Jia; Song Gao; |
| 84 | Time Series Data Validity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the minimum change criteria in data repairing, we propose to study the minimum number of data points that need to be changed in order to satisfy the constraints, or equivalently, the maximum rate of data that can be reserved without change, as the validity measure. |
Yunxiang Su; Yikun Gong; Shaoxu Song; |
| 85 | Making It Tractable to Catch Duplicates and Conflicts in Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an approach for entity resolution (ER) and conflict resolution (CR) in large-scale graphs. |
Wenfei Fan; Wenzhi Fu; Ruochun Jin; Muyang Liu; Ping Lu; Chao Tian; |
| 86 | ST4ML: Machine Learning Oriented Spatio-Temporal Data Processing at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a three-stage pipelining computing framework, namely selection-conversion-extraction to abstract the distributed computing flow and implement it based on Apache Spark. |
Kaiqi Liu; Panrong Tong; Mo Li; Yue Wu; Jianqiang Huang; |
| 87 | Learned Data-aware Image Representations of Line Charts for Similarity Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the scenario that during query time, only line-chart images are available. |
Yuyu Luo; Yihui Zhou; Nan Tang; Guoliang Li; Chengliang Chai; Leixian Shen; |
| 88 | RLS Side Channels: Investigating Leakage of Row-Level Security Protected Data Through Query Execution Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We implement our solution in PostgreSQL and show that it achieves security with minimal performance impact. |
Chen Dar; Moshik Hershcovitch; Adam Morrison; |
| 89 | LightRW: FPGA Accelerated Graph Dynamic Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the random memory access issues, we propose a degree-aware configurable caching method that buffers hot vertices on-chip to alleviate random memory accesses and a dynamic burst access engine that efficiently retrieves neighbors. |
Hongshi Tan; Xinyu Chen; Yao Chen; Bingsheng He; Weng-Fai Wong; |
| 90 | HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These two common practices are mutually complementary. In this paper, we study a new problem that, given an HI-pipeline and an AI-pipeline for the same ML task, can we combine them to get a new pipeline (HAI-pipeline) that is better than the provided HI-pipeline and AI-pipeline? |
Sibei Chen; Nan Tang; Ju Fan; Xuemi Yan; Chengliang Chai; Guoliang Li; Xiaoyong Du; |
| 91 | Ready to Leap (by Co-Design)? Join Order Optimisation on Quantum Hardware Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first quantum implementation of join ordering, one of the most investigated and fundamental query optimisation problems, based on a reformulation to quadratic binary unconstrained optimisation problems. |
Manuel Schönberger; Stefanie Scherzinger; Wolfgang Mauerer; |
| 92 | Mining Geospatial Relationships from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present GTMiner, a novel framework capable of jointly modeling Geospatial and Textual information to construct a knowledge graph, by mining three useful spatial relationships from a geospatial database, in an end-to-end fashion. |
Pasquale Balsebre; Dezhong Yao; Gao Cong; Weiming Huang; Zhen Hai; |
| 93 | Grep: A Graph Learning Based Database Partitioning System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, they involve an expensive step to repetitively partition the data into different compute nodes in order to train a learned key-selection model, which is a waste of time and resources. To address these limitations, we propose a practical learned database partitioning system Grep. |
Xuanhe Zhou; Guoliang Li; Jianhua Feng; Luyang Liu; Wei Guo; |
| 94 | BALANCE: Bayesian Linear Attribution for Root Cause Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we propose BALANCE (BAyesian Linear AttributioN for root CausE localization), which formulates the problem of RCA through the lens of attribution in XAI and seeks to explain the anomalies in the target KPIs by the behavior of the candidate root causes. |
Chaoyu Chen; Hang Yu; Zhichao Lei; Jianguo Li; Shaokang Ren; Tingkai Zhang; Silin Hu; Jianchao Wang; Wenhui Shi; |
| 95 | Efficient Tree-SVD for Subset Node Embedding Over Large Dynamic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The state-of-the-art methods, e.g., DynPPE, still adopt a hashing-based method, while hashing-based solutions are shown to be less effective than matrix factorization (MF)-based methods in existing studies. At the same time, MF-based methods in the literature are too expensive to update the embedding when the graph changes, making them inapplicable on dynamic graphs. Motivated by this, we present Tree-SVD, an efficient and effective MF-based method for dynamic subset embedding. |
Xinyu Du; Xingyi Zhang; Sibo Wang; Zengfeng Huang; |
| 96 | AutoCTS+: Joint Neural Architecture and Hyperparameter Search for Correlated Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, automated CTS solutions remain in their infancy and are only able to find optimal architectures for predefined hyperparameters and scale poorly to large-scale CTS. To overcome these limitations, we propose AutoCTS+, a joint, scalable framework, to automatically devise effective CTS forecasting models. |
Xinle Wu; Dalin Zhang; Miao Zhang; Chenjuan Guo; Bin Yang; Christian S. Jensen; |
| 97 | When Private Blockchain Meets Deterministic Database Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a comprehensive analysis to uncover the connections between private blockchain and deterministic database. |
Ziliang Lai; Chris Liu; Eric Lo; |
| 98 | Hierarchical Residual Encoding for Multiresolution Time Series Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a simple, but effective multiresolution compression algorithm for time series data, where a single encoding can effectively be decompressed at multiple output resolutions. |
Bruno Barbarioli; Gabriel Mersy; Stavros Sintos; Sanjay Krishnan; |
| 99 | NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on modeling queries rather than data and train neural networks to learn the query answers. |
Sepanta Zeighami; Cyrus Shahabi; Vatsal Sharan; |
| 100 | INEv: In-Network Evaluation for Event Stream Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of collecting all events at one location for query evaluation, sub-queries are placed at network nodes to reduce the data transmission overhead. Yet, existing techniques either place such sub-queries at exactly one node in the network, which neglects the benefits of truly distributed evaluation, or are agnostic to the network structure, which ignores transmission costs due to the absence of direct network links. To overcome the above limitations, we propose INEV graphs for in-network evaluation of CEP queries with rich semantics, including Kleene closure and negation. |
Samira Akili; Steven Purtzel; Matthias Weidlich; |
| 101 | Graph Learning for Interactive Threat Detection in Heterogeneous Smart Home Rule Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Glint, the first graph learning-based system for interactive threat detection in smart homes. |
Guangjing Wang; Nikolay Ivanov; Bocheng Chen; Qi Wang; ThanhVu Nguyen; Qiben Yan; |
| 102 | DsJSON: A Distributed SQL JSON Processor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The complexity of JSON schema makes it challenging to parse arbitrary files in a modern distributed system while producing records with unified schema that can be processed with SQL. To address these challenges, this paper introduces dsJSON, a state-of-the-art distributed JSON processor that overcomes limitations in existing systems and scales to big and complex data. |
Majid Saeedan; Ahmed Eldawy; Zhijia Zhao; |
| 103 | Efficient and Effective Cardinality Estimation for Skyline Family Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a unified skyline family w.r.t. various skyline variants. |
Xiaoye Miao; Yangyang Wu; Jiazhen Peng; Yunjun Gao; Jianwei Yin; |
| 104 | When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that in tree-based PM indexes, because of the smaller performance gap between writes and random reads on real PM devices, the read-intensive tree traversal phase dominates the overall latency. This observation calls for further optimizations on existing indexing structures for PM. In this paper, we propose Extendible Radix Tree (ERT), an efficient indexing structure for PM that significantly reduces tree heights to minimize random reads, while still maintaining fast in-node search speed. |
Ke Wang; Guanqun Yang; Yiwei Li; Huanchen Zhang; Mingyu Gao; |
| 105 | Pontus: Finding Waves in Data Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define the wave, a data stream pattern with a serious deviation from the stable arrival rate for a period of time. |
Zhengxin Zhang; Qing Li; Guanglin Duan; Dan Zhao; Jingyu Xiao; Guorui Xie; Yong Jiang; |
| 106 | FEAST: A Communication-efficient Federated Feature Selection Framework for Relational Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a federated feature selection framework, called FEAST, which leverages conditional mutual information (CMI) to select more informative features while having low redundancy. |
Rui Fu; Yuncheng Wu; Quanqing Xu; Meihui Zhang; |
| 107 | Pea Hash: A Performant Extendible Adaptive Hashing Index Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that there is a conflict between performance and memory utilization goals. |
Zhuoxuan Liu; Shimin Chen; |
| 108 | Kepler: Robust Learning for Parametric Query Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. |
Lyric Doshi; Vincent Zhuang; Gaurav Jain; Ryan Marcus; Haoyu Huang; Deniz Altinbüken; Eugene Brevdo; Campbell Fraser; |
| 109 | FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training Via Dynamic Device Placement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generally, MoEs are becoming a new data analytics paradigm in the data life cycle and suffering from unique challenges at scales, complexities, and granularities never before possible. In this paper, we propose a novel DNN training framework, FlexMoE, which systematically and transparently addresses the inefficiency caused by dynamic dataflow. |
Xiaonan Nie; Xupeng Miao; Zilong Wang; Zichao Yang; Jilong Xue; Lingxiao Ma; Gang Cao; Bin Cui; |
| 110 | Dumpy: A Compact and Adaptive Index for Large Data Series Collections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we identify two problems of the iSAX index family that adversely affect the overall performance. |
Zeyu Wang; Qitong Wang; Peng Wang; Themis Palpanas; Wei Wang; |
| 111 | Design and Analysis of A Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, to exploit the high potential of PIM on commodity PIM-enabled DIMMs, we need a new join algorithm designed and optimized for the DIMMs and their architectural characteristics. In this paper, we design and analyze Processing-In-DIMM Join (PID-Join), a fast in-memory join algorithm which exploits UPMEM DIMMs, currently the only publicly-available PIM-enabled DIMMs. |
Chaemin Lim; Suhyun Lee; Jinwoo Choi; Jounghoo Lee; Seongyeon Park; Hanjun Kim; Jinho Lee; Youngsok Kim; |
| 112 | Parallel Strong Connectivity Based on Faster Reachability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: SCC is challenging in the parallel setting and is particularly hard on large-diameter graphs. Many existing parallel SCC implementations can be even slower than Tarjan’s sequential algorithm on large-diameter graphs. To tackle this challenge, we propose an efficient parallel SCC implementation using a new parallel reachability approach. |
Letong Wang; Xiaojun Dong; Yan Gu; Yihan Sun; |
| 113 | ForestTI: A Scalable Inverted-Index-Oriented Timeseries Management System with Flexible Memory Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Timeseries management systems play an important role in IoT and performance monitoring. As the data volume scales up, absorbing data memory efficiently with high throughput … |
Zhiqi Wang; Zili Shao; |
| 114 | Efficient and Effective Attributed Hypergraph Clustering Via K-Nearest Neighbor Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present AHCKA, an efficient approach to AHC, which achieves state-of-the-art result quality via several algorithmic designs. |
Yiran Li; Renchi Yang; Jieming Shi; |
| 115 | DARQ Matter Binds Everything: Performant and Composable Cloud Programming Via Resilient Steps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Composable Resilient Steps (CReSt), a new abstraction for resilient cloud applications. |
Tianyu Li; Badrish Chandramouli; Sebastian Burckhardt; Samuel Madden; |
| 116 | BtrBlocks: Efficient Columnar Compression for Data Lakes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With this work we present BtrBlocks, an open columnar storage format designed for data lakes. |
Maximilian Kuschewski; David Sauerwein; Adnan Alhomssi; Viktor Leis; |
| 117 | Practical Differentially Private and Byzantine-resilient Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there have been extensive studies on privacy and Byzantine security in their own track, solutions that consider both remain sparse. This is due to difficulties in reconciling privacy-preserving and Byzantine-resilient algorithms. In this work, we propose a solution to such a two-fold issue. |
Zihang Xiang; Tianhao Wang; Wanyu Lin; Di Wang; |
| 118 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats By Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we initiate work on automatic matrix and frame reader generation by example. |
Saeed Fathollahzadeh; Matthias Boehm; |
| 119 | Efficient and Portable Einstein Summation in SQL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. |
Mark Blacher; Julien Klaus; Christoph Staudt; Sören Laue; Viktor Leis; Joachim Giesen; |
| 120 | Prerequisite-driven Fair Clustering on Heterogeneous Information Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the problem of fair clustering on heterogeneous information networks (HINs) by considering constraints on structural and sensitive attributes. We propose a Prerequisite-driven Fair Clustering (PDFC) algorithm to solve this problem. |
Juntao Zhang; Sheng Wang; Yuan Sun; Zhiyong Peng; |
| 121 | Better Than Composition: How to Answer Multiple Relational Queries Under Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe that this may yield an error bound that could be a d^0.5-factor worse than the optimal, where d is the number of queries. In this paper, we present a different, more holistic approach that closes this gap. |
Wei Dong; Dajun Sun; Ke Yi; |
| 122 | A Step Toward Deep Online Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the existing techniques, called Online Aggregation (OLA), are limited to a single operation; that is, we cannot obtain the estimates for op(op(data)) or op(…(op(data))). If this Deep OLA becomes possible, data analysts will be able to explore data more interactively using complex cascade operations. In this work, we take a step toward Deep OLA with evolving data frames (edf), a novel data model to offer OLA for nested ops, op(…(op(data))), by representing an evolving structured data (with converging estimates) that is closed under set operations. |
Nikhil Sheoran; Supawit Chockchowwat; Arav Chheda; Suwen Wang; Riya Verma; Yongjoo Park; |
| 123 | LightCTS: A Lightweight Framework for Correlated Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve this goal, we characterize popular CTS forecasting models and yield two observations that indicate directions for lightweight CTS forecasting. On this basis, we propose the LightCTS framework that adopts plain stacking of temporal and spatial operators instead of alternate stacking that is much more computationally expensive. |
Zhichen Lai; Dalin Zhang; Huan Li; Christian S. Jensen; Hua Lu; Yan Zhao; |
| 124 | RkHit: Representative Query with Uncertain Preference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In 3D space, assuming a uniform preference distribution, we propose a (1-1/e)-approximation algorithm 3DH based on space partitioning. |
Xingxing Xiao; Jianzhong Li; |
| 125 | HR-Index: An Effective Index Method for Historical Reachability Queries Over Evolving Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of historical reachability query on evolving graphs. |
Yajun Yang; Hanxiao Li; Xiangju Zhu; Junhu Wang; Xin Wang; Hong Gao; |
| 126 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our approach employs pipeline patches to specify changes to the data, operators and models of a pipeline. |
Stefan Grafberger; Paul Groth; Sebastian Schelter; |
| 127 | A Framework for Privacy Preserving Localized Graph Pattern Query Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how to outsource the localized graph pattern queries (LGPQs) on the SP side with privacy preservation. |
Lyu Xu; Byron Choi; Yun Peng; Jianliang Xu; Sourav S Bhowmick; |
| 128 | T-Rex: Optimizing Pattern Search on Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conducted experiments using 5 real-world datasets and 11 query templates, including those from existing works. |
Silu Huang; Erkang Zhu; Surajit Chaudhuri; Leonhard Spiegelberg; |
| 129 | Design Guidelines for Correct, Efficient, and Scalable Synchronization Using One-Sided RDMA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even worse, some schemes do not correctly synchronize, resulting in rare and hard-to-detect data corruption. Motivated by these observations, we conduct the first comprehensive analysis of one-sided synchronization techniques and provide general principles for correct synchronization using one-sided RDMA. |
Tobias Ziegler; Jacob Nelson-Slivon; Viktor Leis; Carsten Binnig; |
| 130 | Theories and Principles Matter: Towards Visually Appealing and Effective Abstraction of Property Graph Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing LAG-based query interfaces do not embrace HCI principles and psychology theories to inform their design and as a result may have adverse impact on their usability and aesthetics. In this paper, we depart from the classical theory- and principles-oblivious LAG abstraction to present a novel theory-informed visual abstraction called labeled composite graph (LCG) to address this limitation. |
Jiebing Ma; Sourav S Bhowmick; Byron Choi; Lester Tay; |
| 131 | Efficient Star-based Truss Maintenance on Dynamic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on dynamic graphs with star insertions/deletions, where a star insertion can represent a newly joined user with friend connections in social networks or a recently published paper with cited references in citation networks. |
Zitan Sun; Xin Huang; Qing Liu; Jianliang Xu; |
| 132 | QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the current Spark architecture is unfit to process workloads made of a large number of small queries optimally due to excessive I/Os with small computations. We present a technique, called QaaD, that addresses this problem fundamentally by applying i) transparent conversion of workloads made of small queries into one with large queries and ii) dynamic partition size adjustment for runtime overhead minimization. |
Yeonsu Park; Byungchul Tak; Wook-Shin Han; |
| 133 | Exploratory Training: When Annotators Learn About Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build theoretical underpinnings and design algorithms to develop systems that collaborate with users to learn the target model accurately and efficiently. |
Rajesh Shrestha; Omeed Habibelahian; Arash Termehchy; Paolo Papotti; |
| 134 | Predicate Pushdown for Data Science Pipelines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present MagicPush, which decides predicate pushdown using a search-verification approach. MagicPush searches for candidate predicates on pipeline input, which is often not the same as the predicate to be pushed down, and verifies that the pushdown does not change pipeline output with full correctness guarantees. |
Cong Yan; Yin Lin; Yeye He; |
| 135 | High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To speed it up, we propose a randomized algorithm named ADSampling which runs in logarithmic time wrt the dimensionality for the majority of DCOs and succeeds with high probability. |
Jianyang Gao; Cheng Long; |
| 136 | Hereditary Cohesive Subgraphs Enumeration on Bipartite Graphs: The Power of Pivot-based Approaches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the problem of mining cohesive subgraphs from a bipartite graph that satisfy a hereditary property. |
Qiangqiang Dai; Rong-Hua Li; Xiaowei Ye; Meihao Liao; Weipeng Zhang; Guoren Wang; |
| 137 | Updatable Learned Indexes Meet Disk-Resident DBMS – From Evaluations to Design Choices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although many updatable learned indexes have been proposed in recent years, whether they can outperform traditional approaches on disk remains unknown. In this study, we revisit and implement four state-of-the-art updatable learned indexes on disk, and compare them against the B+-tree under a wide range of settings. |
Hai Lan; Zhifeng Bao; J. Shane Culpepper; Renata Borovica-Gajic; |
| 138 | InfiniFilter: Expanding Filters to Infinity and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In many applications, however, the data size is not known in advance, requiring filters to dynamically expand. This paper shows that existing methods for expanding filters exhibit at least one of the following flaws: (1) they entail an expensive scan over the whole data set, (2) they require a lavish memory footprint, (3) their query, delete and/or insertion performance plummets, (4) their false positive rate skyrockets, and/or (5) they cannot expand indefinitely. We introduce InfiniFilter, a new method for expanding filters that addresses these shortcomings. |
Niv Dayan; Ioana Bercea; Pedro Reviriego; Rasmus Pagh; |
| 139 | Shortest Paths Discovery in Uncertain Networks Via Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study finding the most probable shortest path which has the highest probability of being the shortest path between a given pair of nodes in an uncertain network. |
Shixun Huang; Zhifeng Bao; |
| 140 | PrivLava: Synthesizing Relational Data with Foreign Keys Under Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, it is challenging to extend the existing single-relation solutions to the case of multiple relations, because they are unable to model the complex correlations induced by the foreign keys. Therefore, multi-relational data synthesis with strong privacy guarantees is an open problem. In this paper, we address the above open problem by proposing PrivLava, the first solution for synthesizing relational data with foreign keys under differential privacy, a rigorous privacy framework widely adopted in both academia and industry. |
Kuntai Cai; Xiaokui Xiao; Graham Cormode; |
| 141 | Scalable and Efficient Full-Graph GNN Training for Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. |
Xinchen Wan; Kaiqiang Xu; Xudong Liao; Yilun Jin; Kai Chen; Xin Jin; |
| 142 | ML2DAC: Meta-Learning to Democratize AutoML for Clustering Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While experienced analysts might address these challenges using their domain knowledge and experience, especially novice analysts struggle with them. In this paper, we propose a new meta-learning approach to address these challenges. |
Dennis Treder-Tschechlov; Manuel Fritz; Holger Schwarz; Bernhard Mitschang; |
| 143 | OM3: An Ordered Multi-level Min-Max Representation for Interactive Progressive Visualization of Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel multi-level representation of time series called OM3 that facilitates efficient interactive progressive visualization of large data stored in a database and supports various interactions such as resizing, panning, zooming, and visual query. |
Yunhai Wang; Yuchun Wang; Xin Chen; Yue Zhao; Fan Zhang; Eugene Wu; Chi-Wing Fu; Xiaohui Yu; |
| 144 | Scapin: Scalable Graph Structure Perturbation By Augmented Influence Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Scapin, a data-driven methodology that opens up a new perspective by connecting graph structure perturbation for GNNs with augmented influence maximization, to either facilitate desirable spreads or curtail undesirable ones by adding or deleting a small set of edges. |
Yexin Wang; Zhi Yang; Junqi Liu; Wentao Zhang; Bin Cui; |
| 145 | Few-shot Text-to-SQL Translation Using Structure and Content Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, when there is limited training data on new datasets, existing few-shot Text-to-SQL techniques, even with carefully designed textual prompts on pre-trained language models (PLMs), tend to be ineffective. In this paper, we present a divide-and-conquer framework to better support few-shot Text-to-SQL translation, which divides Text-to-SQL translation into two stages (or sub-tasks), such that each sub-task is simpler to be tackled. |
Zihui Gu; Ju Fan; Nan Tang; Lei Cao; Bowen Jia; Sam Madden; Xiaoyong Du; |
| 146 | Detock: High Performance Multi-region Transactions at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the design of concurrency control and deadlock resolution protocols, built within a practical, complete implementation of a geographically replicated database system called Detock, that enables processing strictly-serializable multi-region transactions with near-zero performance degradation at extremely high conflict and an order of magnitude higher throughput relative to state-of-the-art geo-replication approaches, while improving latency by up to a factor of 5. |
Cuong D. T. Nguyen; Johann K. Miller; Daniel J. Abadi; |
| 147 | Measuring Re-identification Risk Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. |
CJ Carey; Travis Dick; Alessandro Epasto; Adel Javanmard; Josh Karlin; Shankar Kumar; Andres Muñoz Medina; Vahab Mirrokni; Gabriel Henrique Nunes; Sergei Vassilvitskii; Peilin Zhong; |
| 148 | Free Join: Unifying Worst-Case Optimal and Traditional Joins Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new framework, called Free Join, that unifies the two paradigms. |
Yisu Remy Wang; Max Willsey; Dan Suciu; |
| 149 | Generalizing Bulk-Synchronous Parallel Processing for Data Science: From Data to Threads and Agent-Based Simulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We generalize the bulk-synchronous parallel (BSP) processing model to make it better support agent-based simulations. |
Zilu Tian; Peter Lindner; Markus Nissl; Christoph Koch; Val Tannen; |
| 150 | Exploiting Structure in Regular Expression Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents BLARE, Blazingly Fast Regular Expression, a regular expression matching framework that is inspired by the mechanisms that are used in database engines, which use a declarative framework to explore multiple equivalent execution plans, all of which produce the correct final result. |
Ling Zhang; Shaleen Deep; Avrilia Floratou; Anja Gruenheid; Jignesh M. Patel; Yiwen Zhu; |
| 151 | Computing The Difference of Conjunctive Queries Efficiently Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new approach by exploiting the structural property of input queries and rewriting the original query by pushing the difference operator down as much as possible. |
Xiao Hu; Qichen Wang; |
| 152 | Global and Local Differentially Private Release of Count-Weighted Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge the gap between DP and count-weighted graph data release, considering both graph structure and edge weights as private information. |
Felipe T. Brito; Victor A. E. Farias; Cheryl Flynn; Subhabrata Majumdar; Javam C. Machado; Divesh Srivastava; |
| 153 | QHL: A Fast Algorithm for Exact Constrained Shortest Path Search on Road Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose by far the fastest algorithm called QHL, which fully utilizes the pruning power of the CSP query information. |
Libin Wang; Raymond Chi-Wing Wong; |
| 154 | XInsight: EXplainable Data Analysis Through The Lens of Causality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study promotes a transparent and explicable perspective on data analysis, called eXplainable Data Analysis (XDA). For this reason, we present XInsight, a general framework for XDA. |
Pingchuan Ma; Rui Ding; Shuai Wang; Shi Han; Dongmei Zhang; |
| 155 | GoodCore: Data-effective and Data-efficient Machine Learning Through Coreset Selection Over Incomplete Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose the GoodCore framework towards selecting a good coreset over incomplete data with low cost. |
Chengliang Chai; Jiabin Liu; Nan Tang; Ju Fan; Dongjing Miao; Jiayi Wang; Yuyu Luo; Guoliang Li; |
| 156 | Incentive-Aware Decentralized Data Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The absence of centralized parameter servers further exacerbates the problem of evaluating the contribution of each individual party. Therefore, an effective incentive mechanism is essential to promote data collaboration. In this paper, we propose a novel Incentive-aware Decentralized fEderated leArning (IDEA) framework for facilitating data collaboration. |
Yatong Wang; Yuncheng Wu; Xincheng Chen; Gang Feng; Beng Chin Ooi; |
| 157 | Deep Active Alignment of Knowledge Graph Entities and Schemata Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new KG alignment approach, called DAAKG, based on deep learning and active learning. |
Jiacheng Huang; Zequn Sun; Qijin Chen; Xiaozhou Xu; Weijun Ren; Wei Hu; |
| 158 | Efficient Personalized PageRank Computation: The Power of Variance-Reduced Monte Carlo Approaches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Monte Carlo sampling procedure, however, often has a relatively-large variance, thus reducing the performance of the PPR computation algorithms. To overcome this issue, we develop two novel variance-reduced Monte Carlo techniques for PPR computation. |
Meihao Liao; Rong-Hua Li; Qiangqiang Dai; Hongyang Chen; Hongchao Qin; Guoren Wang; |
| 159 | Using Cloud Functions As Accelerator for Elastic Data Analytics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the query engine, we propose several optimizations to improve the performance and scalability of the CF-based operators and a cost-based optimizer to select the appropriate algorithm and parallelism for the physical query plan. |
Haoqiong Bian; Tiannan Sha; Anastasia Ailamaki; |
| 160 | Data Stream Clustering: An In-depth Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, it is difficult for researchers to improve upon the state-of-the-art. In this paper, we conduct such a study of DSC on its four key design aspects. |
Xin Wang; Zhengru Wang; Zhenyu Wu; Shuhao Zhang; Xuanhua Shi; Li Lu; |
| 161 | EARLY: Efficient and Reliable Graph Neural Network for Dynamic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient and reliable graph neural network, namely EARLY, to update node representations for dynamic graphs. |
Haoyang Li; Lei Chen; |
| 162 | Popularity Ratio Maximization: Surpassing Competitors Through Influence Propagation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an algorithmic study on how to surpass competitors in popularity by strategic promotions in social networks. |
Hao Liao; Sheng Bi; Jiao Wu; Wei Zhang; Mingyang Zhou; Rui Mao; Wei Chen; |
| 163 | FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Embedding-based deep recommendation models (EDRMs), which contain small dense models and large embedding tables, are widely used in industry. Embedding communication constitutes the main cost for the distributed training of EDRMs, and thus we propose two strategies to improve its efficiency, i.e., embedding tiering and pre-fetching. |
Kaihao Ma; Xiao Yan; Zhenkun Cai; Yuzhen Huang; Yidi Wu; James Cheng; |
| 164 | DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with The GPU Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Researchers have proposed several solutions to accelerate the mini-batch generation; however, they (1) fail to exploit the locality of the adjacency matrix, (2) cannot fully utilize the GPU memory, and (3) suffer from the poor adaptability to diverse workloads. In this work, we propose DUCATI, a Dual-Cache system to overcome these drawbacks. |
Xin Zhang; Yanyan Shen; Yingxia Shao; Lei Chen; |
| 165 | GuP: Fast Subgraph Matching By Guard-based Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose GuP, a subgraph matching algorithm with pruning based on guards. |
Junya Arai; Yasuhiro Fujiwara; Makoto Onizuka; |
| 166 | DeltaBoost: Gradient Boosting Decision Trees with Efficient Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As machine learning (ML) has been widely developed in real-world applications, the privacy of ML models draws an increasing concern. In this paper, we study how to forget specific data records from ML models to preserve the privacy of these data. |
Zhaomin Wu; Junhui Zhu; Qinbin Li; Bingsheng He; |
| 167 | Efficient and Effective Algorithms for Generalized Densest Subgraph Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm based on graph decomposition, and it is likely to give a solution that is at least 0.8 of the optimal density in our experiments, while the state-of-the-art method can only ensure a solution with density at least 0.5 of the optimal density. |
Yichen Xu; Chenhao Ma; Yixiang Fang; Zhifeng Bao; |
| 168 | On Querying Connected Components in Large Temporal Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we introduce the concepts of window-CCs and window-SCCs on undirected and directed temporal graphs, respectively. |
Haoxuan Xie; Yixiang Fang; Yuyang Xia; Wensheng Luo; Chenhao Ma; |
| 169 | LightTS: Lightweight Time Series Classification with Adaptive Ensemble Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To extend the applicability of ensemble learning, we propose the LightTS framework that compresses large ensembles into lightweight models while ensuring competitive accuracy. |
David Campos; Miao Zhang; Bin Yang; Tung Kieu; Chenjuan Guo; Christian S. Jensen; |
| 170 | Data-Sharing Markets: Model, Protocol, and Algorithms to Incentivize The Formation of Data-Sharing Consortia Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a consequence, many opportunities to create valuable data-sharing consortia never materialize, and the value of data remains locked. We introduce a new sharing model, market protocol, and algorithms to incentivize the creation of data-sharing markets. The combined contributions of this paper, which we call DSC, incentivize the creation of data-sharing markets that unleash the value of data for its participants. |
Raul Castro Fernandez; |
| 171 | Ghost: A General Framework for High-Performance Online Similarity Queries Over Distributed Trajectory Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we describe Ghost, a distributed stream processing framework that enables generic, efficient, and scalable online trajectory similarity search and join. We propose a novel incremental online similarity computation (IOSC) mechanism to accelerate pair-wise streaming trajectory distance calculation, which supports a broad range of trajectory distance metrics. |
Ziquan Fang; Shenghao Gong; Lu Chen; Jiachen Xu; Yunjun Gao; Christian S. Jensen; |
| 172 | LAQy: Efficient and Reusable Query Approximations Via Lazy Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show the main parameters that affect the sample creation time and propose lazy sampling to overcome the unpredictability issues that cause fast-but-specialized samples to be query-specific. |
Viktor Sanca; Periklis Chrysogelos; Anastasia Ailamaki; |
| 173 | Mitigating Filter Bubbles Under A Competitive Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate an optimization problem for mitigating filter bubbles under our model. |
Prithu Banerjee; Wei Chen; Laks V.S. Lakshmanan; |
| 174 | SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the existing works rely on some unrealistic pre-settings to capture spatial correlations, which limits their performance in real scenarios. To tackle this issue, we propose the SSIN, which is a novel data-driven self-supervised learning framework for rainfall spatial interpolation by mining latent spatial patterns from historical observation data. |
Jia Li; Yanyan Shen; Lei Chen; Charles Wang Wai Ng; |
| 175 | Maestro: Automatic Generation of Comprehensive Benchmarks for Question Answering Over Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Maestro, a benchmark generation system for question answering over knowledge graphs. |
Abdelghny Orogat; Ahmed El-Roby; |
| 176 | Selection Pushdown in Column Stores Using Bit Manipulation Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generic predicate pushdown approach that supports arbitrary predicates by leveraging selection pushdown to reduce decoding costs. |
Yinan Li; Jianan Lu; Badrish Chandramouli; |
| 177 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to evaluate how many generated texts have near-duplicates (e.g., only differ by a couple of tokens out of 100) in the training corpus. |
Zhencan Peng; Zhizhi Wang; Dong Deng; |
| 178 | Query-Guided Resolution in Uncertain Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework for uncertain data management. |
Osnat Drien; Matanya Freiman; Antoine Amarilli; Yael Amsterdamer; |
| 179 | Efficient GPU-Accelerated Subgraph Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the effectiveness of current GPU-based filtering and ordering methods is limited, and the result enumeration often runs out of memory quickly. To address these problems, we propose EGSM, an efficient approach to GPU-based subgraph matching. |
Xibo Sun; Qiong Luo; |
| 180 | Hamming Tree: The Case for Energy-Aware Indexing for NVMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In Hamming Tree, we propose a software-level memory-aware solution that picks the memory segment of where a write operation is applied judiciously to minimize bit flipping. |
Saeed Kargar; Faisal Nawab; |
| 181 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning Over Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they often use a restricted search space of data preprocessing pipelines which limits the potential performance gains, and they are often too slow as they require training the ML model multiple times. In this paper, we propose DiffPrep, a method that can automatically and efficiently search for a data preprocessing pipeline for a given tabular dataset and a differentiable ML model such that the performance of the ML model is maximized. |
Peng Li; Zhiyi Chen; Xu Chu; Kexin Rong; |
| 182 | GraphINC: Graph Pattern Mining at Network Speed Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper takes a diametrically opposite approach: we suggest a framework that concentrates rather than divides the skewed areas. Our framework, called GraphINC, relies on two key innovations. |
Rana Hussein; Alberto Lerner; Andre Ryser; Lucas David Bürgi; Albert Blarer; Philippe Cudre-Mauroux; |
| 183 | Efficient Query Re-optimization with Judicious Subquery Selections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify a key weakness in existing re-optimization algorithms: their subquery division and re-optimization trigger strategies rely heavily on the optimizer’s initial plan, which can be far away from optimal. We, therefore, propose QuerySplit, a novel re-optimization algorithm that skips the potentially misleading global plan and instead generates subqueries directly from the logical plan as the basic re-optimization units. |
Junyi Zhao; Huanchen Zhang; Yihan Gao; |
| 184 | A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a decision is difficult to make since the distribution of the reward (i.e., performance improvement) corresponding to each agent is unknown and non-stationary. In this paper, we study the above question and present a unified coordinating framework to efficiently utilize existing ML-based agents. |
Xinyi Zhang; Zhuo Chang; Hong Wu; Yang Li; Jia Chen; Jian Tan; Feifei Li; Bin Cui; |
| 185 | WISK: A Workload-aware Learned Index for Spatial Keyword Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts for optimizing querying costs given a query workload. |
Yufan Sheng; Xin Cao; Yixiang Fang; Kaiqi Zhao; Jianzhong Qi; Gao Cong; Wenjie Zhang; |
| 186 | DAMR: Dynamic Adjacency Matrix Representation Learning for Multivariate Time Series Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel approach to capturing the dynamics of spatial correlations between geographical locations as a composition of the constant, long-term trends and periodic patterns. |
Xiaobin Ren; Kaiqi Zhao; Patricia J. Riddle; Katerina Taskova; Qingyi Pan; Lianyan Li; |
| 187 | Presto: A Decade of SQL Analytics at Meta Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss several successful evolutions in recent years that have improved Presto latency as well as scalability by several orders of magnitude in production at Meta. |
Yutian Sun; Tim Meehan; Rebecca Schlussel; Wenlei Xie; Masha Basmanova; Orri Erling; Andrii Rosa; Shixuan Fan; Rongrong Zhong; Arun Thirupathi; Nikhil Collooru; Ke Wang; Sameer Agarwal; Arjun Gupta; Dionysios Logothetis; Kostas Xirogiannopoulos; Amit Dutta; Varun Gajjala; Rohit Jain; Ajay Palakuzhy; Prithvi Pandian; Sergey Pershin; Abhisek Saikia; Pranjal Shankhdhar; Neerad Somanchi; Swapnil Tailor; Jialiang Tan; Sreeni Viswanadha; Zac Wen; Biswapesh Chattopadhyay; Bin Fan; Deepak Majeti; Aditi Pandit; |
| 188 | Keep Your Distributed Data Warehouse Consistent at A Minimal Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large data warehouses store interdependent tables that are updated independently in response to business logic changes or late arrival of critical data. To keep the warehouse … |
Zhichen Xu; Ying Gao; Andrew Davidson; |
| 189 | GeaFlow: A Graph Extended and Accelerated Dataflow System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose new state backends and streaming operators that facilitate processing on dynamic graph-structured datasets, reducing space consumed by states. |
Zhenxuan Pan; Tao Wu; Qingwen Zhao; Qiang Zhou; Zhiwei Peng; Jiefeng Li; Qi Zhang; Guanyu Feng; Xiaowei Zhu; |
| 190 | Disaggregating RocksDB: A Production Experience Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We extended RocksDB [26], a widely used open-source storage engine designed and built for local SSDs, to leverage disaggregated storage. |
Siying Dong; Shiva Shankar P; Satadru Pan; Anand Ananthabhotla; Dhanabal Ekambaram; Abhinav Sharma; Shobhit Dayal; Nishant Vinaybhai Parikh; Yanqin Jin; Albert Kim; Sushil Patil; Jay Zhuang; Sam Dunster; Akanksha Mahajan; Anirudh Chelluri; Chaitanya Datye; Lucas Vasconcelos Santana; Nitin Garg; Omkar Gawde; |
| 191 | GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have applied GoldMiner to industrial workloads, and our evaluation shows that GoldMiner can transform unmodified training programs to use data workers, accelerating individual training jobs by up to 12.1x. GoldMiner also improves average job completion time and aggregate GPU utilization by up to 2.5x and 2.1x in a 64-GPU cluster, respectively, by scheduling data workers with elasticity. |
Hanyu Zhao; Zhi Yang; Yu Cheng; Chao Tian; Shiru Ren; Wencong Xiao; Man Yuan; Langshi Chen; Kaibo Liu; Yang Zhang; Yong Li; Wei Lin; |
| 192 | VeDB: A Software and Hardware Enabled Trusted Relational Database Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As opposed to conventional ledger DBMSes, we design VeDB – a high-performance verifiable software (Ve-S) and hardware (Ve-H) enabled DBMS with rigorous auditability for better user options and broad applications. |
Xinying Yang; Ruide Zhang; Cong Yue; Yang Liu; Beng Chin Ooi; Qun Gao; Yuan Zhang; Hao Yang; |
| 193 | Apache IoTDB: A Time Series Database for IoT Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a time series database management system, Apache IoTDB. |
Chen Wang; Jialin Qiao; Xiangdong Huang; Shaoxu Song; Haonan Hou; Tian Jiang; Lei Rui; Jianmin Wang; Jiaguang Sun; |
| 194 | What’s The Difference? Incremental Processing with Change Queries in Snowflake Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: CHANGES queries and STREAMs have been in use within Snowflake for three years, and see broad adoption across our customers. We describe the semantics of these primitives, discuss the implementation challenges, present an analysis of their usage at Snowflake, and contrast with other offerings. |
Tyler Akidau; Paul Barbier; Istvan Cseri; Fabian Hueske; Tyler Jones; Sasha Lionheart; Daniel Mills; Dzmitry Pauliukevich; Lukas Probst; Niklas Semmler; Dan Sotolongo; Boyuan Zhang; |
| 195 | High-Throughput Vector Similarity Search in Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). |
Jason Mohoney; Anil Pacaci; Shihabur Rahman Chowdhury; Ali Mousavi; Ihab F. Ilyas; Umar Farooq Minhas; Jeffrey Pound; Theodoros Rekatsinas; |
| 196 | PG-Schema: Schemas for Property Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. |
Renzo Angles; Angela Bonifati; Stefania Dumbrava; George Fletcher; Alastair Green; Jan Hidders; Bei Li; Leonid Libkin; Victor Marsault; Wim Martens; Filip Murlak; Stefan Plantikow; Ognjen Savkovic; Michael Schmidt; Juan Sequeda; Slawek Staworko; Dominik Tomaszuk; Hannes Voigt; Domagoj Vrgoc; Mingxi Wu; Dusan Zivkovic; |
| 197 | PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, driven by the increasing connectivity between data generation and analysis, users prefer a single database to efficiently process both OLTP and OLAP workloads, which enhances data freshness and reduces the complexity of data synchronization and the overall business cost. In this paper, we summarize five crucial design goals for a cloud-native HTAP database based on our experience and customers’ feedback, i.e., transparency, competitive OLAP performance, minimal perturbation on OLTP workloads, high data freshness, and excellent resource elasticity. As our solution to realize these goals, we present PolarDB-IMCI, a cloud-native HTAP database system designed and deployed at Alibaba Cloud. |
Jianying Wang; Tongliang Li; Haoze Song; Xinjun Yang; Wenchao Zhou; Feifei Li; Baoyue Yan; Qianqian Wu; Yukun Liang; ChengJun Ying; Yujie Wang; Baokai Chen; Chang Cai; Yubin Ruan; Xiaoyi Weng; Shibin Chen; Liang Yin; Chengzhong Yang; Xin Cai; Hongyan Xing; Nanlong Yu; Xiaofei Chen; Dapeng Huang; Jianling Sun; |
| 198 | Vineyard: Optimizing Data Sharing in Data-Intensive Analytics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When the intermediate data is large, it is mostly exchanged through files in standard formats (e.g., CSV and ORC), causing high I/O and (de)serialization overheads. To solve these problems, we develop Vineyard, a high-performance, extensible, and cloud-native object store, trying to provide an intuitive experience for users to share data across systems in complex real-life workflows. |
Wenyuan Yu; Tao He; Lei Wang; Ke Meng; Ye Cao; Diwen Zhu; Sanhong Li; Jingren Zhou; |
| 199 | Steered Training Data Generation for Learned Semantic Type Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce STEER to adapt learned semantic type extraction approaches to a new, unseen data lake. |
Sven Langenecker; Christoph Sturm; Christian Schalles; Carsten Binnig; |
| 200 | When Automatic Filtering Comes to The Rescue: Pre-Computing Company Competitor Pairs in Owler Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents CPFilter, a system used in the filtering process of MW_CPFilter. |
Jinsong Guo; Aditya Jami; Markus Kröll; Lukas Schweizer; Sergey Paramonov; Eric Aichinger; Stefano Sferrazza; Mattia Scaccia; Stéphane Reissfelder; Eda Cicek; Giovanni Grasso; Georg Gottlob; |