
The curious case of Pagination for Gremlin queries

Why is pagination support hard for TinkerPop graph databases?

Let’s first look at the general requirements for “pagination” support, and then analyze the difficulties graph databases face in implementing such support.

Pagination Requirement 1. Given a query, a client application may request the results one page at a time. The most common definitions of a page are: (a) a fixed count of results, or (b) a fixed size (in bytes) of results. Also note that there is typically no time limit within which a subsequent page can be requested.
Pagination Requirement 2. The client application may request to skip multiple pages and fetch the following page.

Supporting Pagination Requirement 1: Fetch one page at a time

To return results one page at a time, a database needs to maintain some state, which can later be used to resume the computation for the next page.

In the relational world, such state management is relatively simple. For example, for “Select * from Table” queries, the state can be a pointer to the most recent row that has been returned to the client. With more complex SQL, the state can be a bit more elaborate, but it is still very small (think bytes). Because the state is so small, it is also much easier to reload the context and resume the computation for the next page.
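As a rough illustration, here is a minimal Groovy sketch over made-up in-memory data (not any database’s actual cursor implementation) showing just how small that resumable state is:

    // Illustrative sketch only: the "table" is a made-up in-memory list, but the point stands.
    // The entire resumable state is one value: the id of the last row already returned.
    def table = (1..100).collect { [id: it, name: "row-" + it] }   // stand-in for the table
    def lastId = 0                                                 // the whole pagination state
    def page = table.findAll { it.id > lastId }.take(5)            // fetch the next page of 5 rows
    lastId = page ? page.last().id : lastId                        // resume point for the next page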

However, for graph queries, the state that needs to be managed can be arbitrarily complex and large. This is primarily due to the unstructured and often iterative nature of the graph traversals. To understand this argument, let’s look at an example. Let’s say that the query we need to paginate is: g.V(A).out().out(), with a requested page size of 5.

Figures 1 and 2 show the parts of the graph that were visited during the computation of the 1st and 2nd pages respectively, and the state we needed to store in order to compute the following pages.

Figure 1: State of the graph when the 1st page was computed. Note that all of C’s neighbors were visited before all of B’s neighbors. This indicates that B’s and C’s neighbors were accessed in parallel.
Figure 2: State of the graph when page 2 was generated. Note that even though A, B, C, and D don’t have any more neighbors to visit, we still need to keep a pointer to their next edges for the computation of the 3rd page. We could have removed that state, but to do so we would need to re-traverse the graph and check that they indeed didn’t have any more neighbors. So it doesn’t matter when we pay that cost, whether at the end of page 2 or at the beginning of page 3. The beginning of page 3 is preferred, as page 3 may never be fetched at all.

Some of the key observations here are:

Observation 1: The state grows linearly w.r.t. the number of non-result vertices (or intermediate vertices) visited during a traversal. While this conclusion may seem specific to this example, I would argue that it holds in general as well. While I haven’t attempted a formal proof, the basic intuition lies in the following analogy.

In the relational world, “tables are first class citizens” and queries are written by directly referring to tables. We can’t really refer to a “row” or “column” directly; they have to be accessed by applying filtering conditions on tables. This limits the amount of state that needs to be maintained. For example, when we write a query joining two tables, it is enough to keep one pointer per table to compute the next page of the join. In other words, the state is roughly proportional to the number of first class citizens referred to in the query.

However, in the world of graph databases, “the vertices are first class citizens” (in TinkerPop, edges and properties are as well), and one can refer to them directly in graph queries. Now, analogous to the relational world, when we execute a graph traversal we are effectively conducting a complex join among the vertices that take part in the traversal. In the general case, to support pagination, we need to store some state for each of the participating first class citizens, as each of them may contribute to the final output during the computation of the next page.

When we execute a graph traversal we are effectively conducting a complex join among the vertices that take part in the traversal.
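To make Observation 1 a little more tangible, the state for the running example can be pictured roughly as follows. This is a purely illustrative Groovy data layout, not how TinkerPop or any particular graph database actually represents it:

    // Illustrative only: one "resume from this out-edge" cursor per vertex, grouped by the step
    // that visited it. The state grows with the number of intermediate vertices touched so far.
    def pageState = [
        'step 1, out()': ['A': 2],                     // A: resume from its 3rd out-edge
        'step 2, out()': ['B': 2, 'C': 3, 'D': 0]      // one cursor per intermediate vertex
    ]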

Observation 2: Generating the state for computing the next page is roughly equivalent to running the traversal again. During a graph traversal, when a page’s worth of results has been generated, we need to compute the state, which can then be used to compute the next page. However, generating that state is non-trivial. We would need to unwind the operator stack to see what state the participating vertices are in. Not only that, the state information needs to be mapped to the operators in the execution tree. This is because, depending on the structure of the graph, the same vertex can appear at the same step via multiple paths.

Observation 3: The example shown above is really a very basic graph query on a structurally well-behaved graph (in this case a tree). Gremlin offers 60-odd different steps, including advanced constructs like repeat..until, choose (effectively if-then-else), order, group, map, fold, etc. With all these steps, state management is going to get arbitrarily complex.
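For illustration, here is a Gremlin-Groovy traversal (console style) mixing a few of those steps. The edge labels and properties (knows, worksAt, studiesAt, name, age) are assumed for the sake of the example and are not part of this post’s running graph. Any resumable pagination state for it would have to capture repeat-loop positions, the branch each traverser took in choose, and the buffered contents of order and group:

    // Assumed schema, shown only to illustrate how quickly the resumable state would balloon.
    g.V('A').
      repeat(out('knows')).until(has('name', 'X')).
      choose(has('age', gt(30)), out('worksAt'), out('studiesAt')).
      order().by('name', asc).
      group().by(label())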

Observation 4: In this specific case, some optimization can be done to reduce the pagination state, but that won’t work in the general case. For the given query, we can perform a depth-first traversal and make sure that the edges of a vertex are fetched in a fixed order, which reduces the state. However, this approach limits parallelism and eliminates batching opportunities (i.e., executing the same step() for multiple vertices at the same time).
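As a rough sketch of that optimization, the plain Groovy snippet below evaluates out().out() depth-first over a made-up adjacency map (it is not TinkerPop internals). With a fixed visit order, the resumable state collapses to a small stack of (vertex, next-edge-index) pairs, at the price of touching one vertex at a time:

    // Illustrative sketch only: a hand-rolled depth-first evaluation of out().out() over a
    // made-up adjacency map. The resumable state between pages is just the 'stack' variable.
    def adj = [A: ['B', 'C'], B: ['D', 'E', 'F'], C: ['G', 'H', 'I', 'J']]

    def nextPage = { Map graph, List stack, int pageSize ->
        def results = []
        while (stack && results.size() < pageSize) {
            def (vertex, edgeIdx) = stack[-1]
            def neighbors = graph[vertex] ?: []
            if (edgeIdx >= neighbors.size()) { stack.remove(stack.size() - 1); continue }
            stack[-1] = [vertex, edgeIdx + 1]           // advance this vertex's edge cursor
            def next = neighbors[edgeIdx]
            if (stack.size() == 2) results << next      // depth 2 == a result of out().out()
            else stack << [next, 0]                     // otherwise descend one level deeper
        }
        [results, stack]
    }

    def (page1, state) = nextPage(adj, [['A', 0]], 5)   // page1 == [D, E, F, G, H]
    def (page2, state2) = nextPage(adj, state, 5)       // page2 == [I, J]; state stays tiny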

While this discussion could go on longer, I think the points discussed so far provide enough material for us to see why supporting pagination in a graph database is a complex beast. More importantly, even if graph databases supported pagination, it doesn’t seem to be the case that it would save the client application anything, in terms of either latency or cost.

Supporting Pagination Requirement 2: Skip pages

Needless to say, this requirement is even more demanding. In this case, keeping around a pointer to the last result is not sufficient. The database needs the ability to efficiently compute the starting pointer of a future page.

Again, for simple “Select * from Table” queries, the computation involves calculating the number of rows that need to be skipped and figuring out how to get to the first row that will be part of the result.

For graph queries, the starting point of a future page is impossible to compute without actually executing the query and discarding all the results corresponding to the skipped pages. So, supporting this requirement for a graph database won’t save the client application anything.
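This is visible in plain Gremlin today: range() and skip() are real steps (skip() has been available since TinkerPop 3.3), but in the general case the traversal still has to produce, and then discard, every result before the requested offset:

    // Real Gremlin steps, arbitrary offsets; in the general case everything before offset 50 is
    // still computed and then thrown away.
    g.V('A').out().out().range(50, 55)       // "page 11" with a page size of 5
    g.V('A').out().out().skip(50).limit(5)   // equivalent formulation using skip()/limit()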

This summarizes our discussion of the challenges of supporting pagination in graph databases.

PS: In this discussion, we are assuming that computing all the results of a query and storing them for an arbitrarily long period of time is not a viable solution. Resource governance for such an approach would be very cumbersome. Moreover, the customer would pay a lot of unnecessary cost, especially when only a few of the many pages are actually fetched.