Dynamic Max Count

This contains the ideas and notes for a Dynamic Max Count (Dynamic Max-in-time) aggregate operator

Concept

Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to $$10~unit^6$$ size in a $$10000~unit^6$$ space we will have $$1\times10^{18}$$ buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index.

Here is a description of the index:

  1. Index defines
    1. Spatial dimensions
    2. Bucket Dimensions
    3. Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
    4. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. $$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$
  2. A theory to update, delete or insert points and the distributions based on changes to points.

The above has been implemented in C# 8/15/2005.

Thus we maintain a database of 4-dimensional points that we index using 6-dimensional, probability buckets.

Dynamic Max-Count

The dynamic property of Max-Count is based on the dynamic index structure described above and in my paper (not currently online).

Given two query points that describe a lower and upper moving query hyper-rectangle, we first create a time partition ordering of indicating the time each hyper-bucket enters and leaves the query rectangle. This is done in two stages.

  1. Entry, exit and valid function time segments are determined for each bucket.
    1. Each bucket's corner verticies are checked for intesection time segments $$[entry,exit]$$ by solving a linear equation between the query points and the corner point.
    2. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment $$[Entry_{bucket},Exit_{bucket}]$$ by intersecting the time segments.
    3. Add $$Entry_{bucket}$$, $$Exit_{bucket}$$ and all times t such that $$Entry_{bucket} \leq t \leq Exit_{bucket}$$ to a list to form the valid function time segments for this bucket.

  2. Add the valid function time segments entry and exit values to the time partition order list with a tag indicating that at this time hyper-bucket ID is entering or leaving the query space.

  3. Create a valid hyper-bucket list and add or delete hyper-buckets to it as you traverse the time partition order list. For each partition add to the parition function the valid hyper-bucket functions.
  4. Traverse the list again maximizing each partition and keeping track of the maximum.
  5. return the maximum.

Density Functions

Essentially we are going to have a constant $c$ that is a multiplied by function f which is made up of $$x[j]$$ and $$a[j]$$. Thus we have the density function F as follows:

$$$F = c(a[0] x_0 + b[0])(a[1] x_1 + b[1])...(a[5] x_5 + b[5])$$$

In the Journal paper the Density Function is given by the following:

$$$F_i = \frac{b_i f_i}{n \mathop{\displaystyle\int}\limits_{B_i} f_i d \phi}$$$

$$f_i$$ is defined as follows:

$$$f_i = \mathop{\displaystyle\prod}\limits_j (f_{i,j} + p_i)$$$

$$f_{i,j}$$ is a trend function which we will implement as a linear function:

$$$f_{i,j} = a x_j + c_j$$$

where $$c_j$$ is a constant. Thus the form of the Density function is given by (2).

DynamicMaxCount (last edited 2020-01-23 22:27:01 by scot)