Sliding windows are similar to Fixed windows, the size of the windows is measured in time and is fixed. The important difference from the Fixed window is the fact that it allows an element to be present in more than one window. The additional window slide parameter controls how frequently a sliding window is started. Hence, sliding windows will be overlapping and the slide should be smaller than the window length.
vertices: - name: my-udf udf: groupBy: window: sliding: length: duration slide: duration
NOTE: A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
length is the window size of the fixed window.
slide is the slide parameter that controls the frequency at which the sliding window is created.
To create a sliding window of length 1 minute which slides every 10 seconds, we can use the following snippet.
vertices: - name: my-udf udf: groupBy: window: sliding: length: 60s slide: 10s
The yaml snippet above contains an example spec of a reduce vertex that uses sliding window aggregation. As we can see, the length of the window is 60s and sliding frequency is once every 10s. This means there will be multiple windows active at any point in time.
time.now() in the pipeline is
2031-09-29T18:46:30Z the active window boundaries will be as follows (there
are total of 6 windows
[2031-09-29T18:45:40Z, 2031-09-29T18:46:40Z) [2031-09-29T18:45:50Z, 2031-09-29T18:46:50Z) # notice the 10 sec shift from the above window [2031-09-29T18:46:00Z, 2031-09-29T18:47:00Z) [2031-09-29T18:46:10Z, 2031-09-29T18:47:10Z) [2031-09-29T18:46:20Z, 2031-09-29T18:47:20Z) [2031-09-29T18:46:30Z, 2031-09-29T18:47:30Z)
The window start time is always be left inclusive and right exclusive. That is why
window is not considered active (it fell on the previous window, right exclusive) but
is an active (left inclusive).
The first window always ends after the sliding seconds from the
time.Now(), the start time of the window will be the
nearest integer multiple of the slide which is less than the message's event time. So the first window starts in the
past and ends _sliding_duration (based on time progression in the pipeline and not the wall time) from present. It is
important to note that regardless of the window boundary (starting in the past or ending in the future) the target element
set totally depends on the matching time (in case of event time, all the elements with the time that falls with in the
boundaries of the window, and in case of system time, all the elements that arrive from the
present until the end of
present + sliding)
From the point above, it follows then that immediately upon startup, for the first window, fewer elements may get aggregated depending on the current lateness of the data stream.