add one paper about Query Optimization Overview for May
paul356 committed May 31, 2024
1 parent b1e5427 commit fdca448
Showing 4 changed files with 19 additions and 19 deletions.
18 changes: 9 additions & 9 deletions _org/2024-05-17-may-papers.org
@@ -8,12 +8,12 @@ nav_order: {{ page.date }}
 ---
 #+END_EXPORT

-|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------|
-| Title | Authors | Synthesis | Publisher | Keywords |
-|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------|
-| The R*-tree: An Efficient and Robust Access Method for Points and Rectangles | Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger | The R-tree is a popular tree structure for indexing spatial objects. The original algorithms in [[http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf][Gut84]] use minimal area (enlargement) as the only criterion, and in some cases they produce poorly structured trees. Reconsidering which criteria an R-tree needs for good retrieval performance, this paper introduces new algorithms for the ChooseSubtree and split steps (the latter replacing Guttman's QuadraticSplit). The results show improved retrieval performance and robustness at the cost of a slightly higher insertion cost. | SIGMOD 1990 | R-Tree |
-| ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | Pengfei Li, Wenqing Wei, Rong Zhu, Bolin Ding, Jingren Zhou, Hua Lu | ALECE is another learned cardinality estimator trained on true cardinalities. It takes featurized data distributions and queries as input. Using two attention structures, self-attention over the data features and cross-attention between the data features and the query features, it achieves much better estimates than its competitors. | VLDB 2023 | Cardinality Estimation, Attention |
-| The Transaction Concept: Virtues and Limitations | Jim Gray | This technical report describes where the transaction concept evolved from. It introduces the general transaction model and two ways to implement transactions: time-domain addressing, and logging and locking. It also discusses open issues with current transaction implementations, for example nested transactions and long-lived transactions. | Tandem TR 81.3 | Transaction, Time-Domain Addressing, Logging and Locking |
-| A Critique of ANSI SQL Isolation Levels | Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, Patrick O'Neil | This technical report redefines the isolation levels of the ANSI SQL standard. It adds the missing phenomenon *Dirty Write* and tightens the loosely worded phenomenon definitions. The result is a new table of isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. It also introduces other isolation levels, such as Cursor Stability and Snapshot Isolation; Snapshot Isolation uses a First-Committer-Wins rule to prevent lost updates but still allows write skew, so it is not Serializable. | SIGMOD 1995 | Isolation Level, Snapshot Isolation, First Committer Wins |
-| | | | | |
-|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------|
+|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------|
+| Title | Authors | Synthesis | Publisher | Keywords |
+|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------|
+| The R*-tree: An Efficient and Robust Access Method for Points and Rectangles | Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger | The R-tree is a popular tree structure for indexing spatial objects. The original algorithms in [[http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf][Gut84]] use minimal area (enlargement) as the only criterion, and in some cases they produce poorly structured trees. Reconsidering which criteria an R-tree needs for good retrieval performance, this paper introduces new algorithms for the ChooseSubtree and split steps (the latter replacing Guttman's QuadraticSplit). The results show improved retrieval performance and robustness at the cost of a slightly higher insertion cost. | SIGMOD 1990 | R-Tree |
+| ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | Pengfei Li, Wenqing Wei, Rong Zhu, Bolin Ding, Jingren Zhou, Hua Lu | ALECE is another learned cardinality estimator trained on true cardinalities. It takes featurized data distributions and queries as input. Using two attention structures, self-attention over the data features and cross-attention between the data features and the query features, it achieves much better estimates than its competitors. | VLDB 2023 | Cardinality Estimation, Attention |
+| The Transaction Concept: Virtues and Limitations | Jim Gray | This technical report describes where the transaction concept evolved from. It introduces the general transaction model and two ways to implement transactions: time-domain addressing, and logging and locking. It also discusses open issues with current transaction implementations, for example nested transactions and long-lived transactions. | Tandem TR 81.3 | Transaction, Time-Domain Addressing, Logging and Locking |
+| A Critique of ANSI SQL Isolation Levels | Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, Patrick O'Neil | This technical report redefines the isolation levels of the ANSI SQL standard. It adds the missing phenomenon *Dirty Write* and tightens the loosely worded phenomenon definitions. The result is a new table of isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. It also introduces other isolation levels, such as Cursor Stability and Snapshot Isolation; Snapshot Isolation uses a First-Committer-Wins rule to prevent lost updates but still allows write skew, so it is not Serializable. | SIGMOD 1995 | Isolation Level, Snapshot Isolation, First Committer Wins |
+| An Overview of Query Optimization in Relational Systems | Surajit Chaudhuri | This paper surveys the query optimization problems studied in database research. It starts with the System R optimizer as an example. It then covers the search space, including join reordering, pushing down and pulling up Group By, and merging views and nested subqueries. Next it discusses statistics collection and cost estimation. It then presents two example enumeration architectures, Starburst and Volcano. Lastly, it briefly mentions the challenges posed by distributed systems, user-defined functions (UDFs), and materialized views. | PODS 1998 | Query Optimization, Query Plan Search, Cost Estimation, Select-Project-Join |
+|----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------------------------------------------------------------------------|
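The System R optimizer that Chaudhuri's overview (the paper added in this commit) opens with enumerates join orders by dynamic programming over subsets of relations, keeping only the cheapest plan for each subset. Below is a minimal Python sketch of that idea under deliberately simplified assumptions: only left-deep trees are considered, the cost model is a hypothetical "sum of intermediate result sizes", predicate selectivities are assumed independent, and interesting orders and access-path choices are ignored. The function and parameter names (`best_left_deep_plan`, `cards`, `sels`) are illustrative, not taken from the paper.

```python
from itertools import combinations


def best_left_deep_plan(cards, sels):
    """Cheapest left-deep join order via System R-style dynamic programming.

    cards: dict mapping relation name -> base cardinality.
    sels:  dict mapping frozenset({a, b}) -> selectivity of the join predicate
           between a and b; a missing pair means no predicate (cross product).
    Returns (cost, order), where cost is the sum of intermediate result sizes,
    a simplified cost model assumed here for illustration only.
    """
    rels = sorted(cards)

    def size(subset):
        # Estimated output size of joining `subset`, assuming independent
        # predicates: product of cardinalities times every applicable selectivity.
        n = 1.0
        for r in subset:
            n *= cards[r]
        for a, b in combinations(sorted(subset), 2):
            n *= sels.get(frozenset((a, b)), 1.0)
        return n

    # best[S] holds the cheapest (cost, order) found for the relation set S.
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            candidates = []
            for last in combo:
                # Extend the best plan for the other k-1 relations by joining
                # `last` as the inner (right) input, keeping the tree left-deep.
                sub_cost, sub_order = best[subset - {last}]
                candidates.append((sub_cost + size(subset), sub_order + (last,)))
            best[subset] = min(candidates)
    return best[frozenset(rels)]


if __name__ == "__main__":
    cards = {"A": 1000, "B": 100, "C": 10}
    sels = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
    print(best_left_deep_plan(cards, sels))  # (1100.0, ('B', 'C', 'A'))
```

In this toy example the sketch starts with B ⋈ C (100 rows) and never builds the A × C cross product, which is the kind of pruning that makes subset-based dynamic programming far cheaper than trying every permutation independently.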
4 changes: 2 additions & 2 deletions _posts/2022-12-31-spark.md
@@ -61,9 +61,9 @@ SparkCatalog also relies on several other classes to implement its functionality
 ## Spark write flow

 - AppendData –(ExtendedV2Writes strategy)–> AppendData(write = SparkWriteBuilder.build())
-- Spark –> WriteToDataSourceV2Exec.run() –> V2TableWriteExec.writeWithV2
+- Spark –> AppendDataExec.run() –> V2TableWriteExec.writeWithV2
 - SparkWrite.asBatchAppend()
-- BaseWrite.createBatchWriterFactory()
+- BatchWrite.createBatchWriterFactory()
 - WriterFactory.createWriter()
 - DataWriter<InternalRow>.write(row)
 - UnpartitionedDataWriter.write(row)
4 changes: 2 additions & 2 deletions _posts/2024-04-22-april-papers.md
@@ -5,5 +5,5 @@ tags: [Hybrid Logical Clock]
 nav_order: {{ page.date }}
 ---

-| Title | Authors | Synthesis | Publisher | Keywords |
-| Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases | Sandeep Kulkarni, Murat Demirbas, Deepak Madeppa, Bharadwaj Avva, and Marcelo Leone | Logical clocks were proposed by Lamport to order events in a distributed system, but a logical clock cannot be related to physical time. HLC (Hybrid Logical Clock) is proposed to combine the advantages of logical clocks and physical time, and it don't need the special hardware used by TrueTime and has low space overhead. It builds on NTP and logical clocks, so it can be deployed in cloud environments or in private data centers that run NTP servers. It can be used to take consistent snapshots in distributed systems. | OPODIS 2014 | Logical Physical Clock, Logical Clock |
+| Title | Authors | Synthesis | Publisher | Keywords |
+| Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases | Sandeep Kulkarni, Murat Demirbas, Deepak Madeppa, Bharadwaj Avva, and Marcelo Leone | Logical clocks were proposed by Lamport to order events in a distributed system, but a logical clock cannot be related to physical time. HLC (Hybrid Logical Clock) is proposed to combine the advantages of logical clocks and physical time, and it does not need the special hardware used by TrueTime and has low space overhead. It builds on NTP and logical clocks, so it can be deployed in cloud environments or in private data centers that run NTP servers. It can be used to take consistent snapshots in distributed systems. | OPODIS 2014 | Logical Physical Clock, Logical Clock |
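To make the HLC idea summarized above concrete, here is a minimal Python sketch of its update rules as described in the paper. Each timestamp is a pair (l, c): l is the largest physical time the node has heard of, and c is a counter that orders events when l does not advance. Assumptions: physical time is an injected integer clock standing in for an NTP-synchronized wall clock, the class and method names (`HybridLogicalClock`, `now`, `receive`) are hypothetical, and the sketch omits message transport and any handling of excessive clock drift.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# An HLC timestamp (l, c): l is the largest physical time heard of so far,
# c is a counter that orders events sharing the same l.
Timestamp = Tuple[int, int]


@dataclass
class HybridLogicalClock:
    physical_time: Callable[[], int]  # stand-in for an NTP-synced wall clock (assumption)
    l: int = 0
    c: int = 0

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev_l = self.l
        self.l = max(prev_l, self.physical_time())
        # Bump the counter only if physical time did not move l forward.
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def receive(self, msg: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        msg_l, msg_c = msg
        prev_l = self.l
        self.l = max(prev_l, msg_l, self.physical_time())
        if self.l == prev_l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0  # local physical time is ahead of everything seen so far
        return (self.l, self.c)


if __name__ == "__main__":
    a = HybridLogicalClock(lambda: 10)  # node A's physical clock reads 10
    b = HybridLogicalClock(lambda: 5)   # node B's physical clock lags at 5
    t1 = a.now()        # A sends a message stamped t1
    t2 = b.receive(t1)  # B receives it
    t3 = b.now()        # B's next local event
    print(t1, t2, t3)   # (10, 0) (10, 1) (10, 2)
```

Comparing (l, c) tuples lexicographically gives an order consistent with causality even though B's physical clock lags, and l never falls behind the node's own physical clock, which is what lets HLC timestamps stand in for physical time when taking consistent snapshots.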