Pipeline
The Pipeline represents a data processing job (a simpler version of this is called MonoVertex). It is the most important concept in Numaflow and defines:
- A list of vertices, which define the data processing tasks;
- A list of edges, which describe the relationships between the vertices. Note that an edge may go from a vertex to multiple vertices, and an edge may also go from multiple vertices to a vertex. This many-to-one relationship is possible via Join and Cycles.
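The one-to-many and many-to-one relationships can be pictured by treating each edge as a simple from/to pair. The following is a minimal Python sketch for illustration only: the vertex names and the `group_edges` helper are hypothetical and are not part of Numaflow.

```python
from collections import defaultdict

# Hypothetical vertex names, used only to illustrate the edge relationships.
edges = [
    ("in", "even-or-odd"),
    ("in", "audit"),         # "in" feeds two vertices (one-to-many)
    ("even-or-odd", "out"),
    ("audit", "out"),        # two vertices feed "out" (many-to-one)
]


def group_edges(edges):
    """Return (downstream, upstream) maps keyed by vertex name."""
    downstream = defaultdict(list)
    upstream = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return dict(downstream), dict(upstream)


downstream, upstream = group_edges(edges)
print(downstream["in"])  # vertices that receive data from "in"
print(upstream["out"])   # vertices that send data to "out"
```

Here `in` has two outgoing edges, while `out` has two incoming edges, which is the many-to-one case described above.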
The Pipeline is abstracted as a Kubernetes Custom Resource. A Pipeline spec looks like the following.
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: simple-pipeline
spec:
  vertices:
    - name: in
      source:
        generator:
          rpu: 5
          duration: 1s
    - name: cat
      udf:
        builtin:
          name: cat
    - name: out
      sink:
        log: {}
  edges:
    - from: in
      to: cat
    - from: cat
      to: out
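As a quick sanity check on a spec like the one above, it can help to confirm that every edge refers to a declared vertex. The sketch below writes the same spec as a plain Python dict and runs that check. The `dangling_edges` helper is hypothetical; it is an assumption for illustration and does not represent how Numaflow itself validates a Pipeline.

```python
# The example spec, written as a plain Python dict so no YAML parser is needed.
pipeline = {
    "apiVersion": "numaflow.numaproj.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {"name": "simple-pipeline"},
    "spec": {
        "vertices": [
            {"name": "in", "source": {"generator": {"rpu": 5, "duration": "1s"}}},
            {"name": "cat", "udf": {"builtin": {"name": "cat"}}},
            {"name": "out", "sink": {"log": {}}},
        ],
        "edges": [
            {"from": "in", "to": "cat"},
            {"from": "cat", "to": "out"},
        ],
    },
}


def dangling_edges(pipeline):
    """Hypothetical helper: return edges whose endpoints are not declared vertices."""
    spec = pipeline["spec"]
    names = {vertex["name"] for vertex in spec["vertices"]}
    return [
        edge
        for edge in spec["edges"]
        if edge["from"] not in names or edge["to"] not in names
    ]


problems = dangling_edges(pipeline)
print(problems)  # an empty list means every edge points at a declared vertex
```

An edge such as `{"from": "cat", "to": "missing"}` would be returned by the helper, since `missing` is not in the vertex list.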
To query Pipeline objects with kubectl:
kubectl get pipeline # or "pl" as a short name