Map UDF
Map in a Map vertex takes one input and returns zero, one, or more outputs (also known as a flat-map operation). Map is an element-wise operator.
After building a Docker image for your UDF, specify the image in the vertex spec as shown below.
spec:
  vertices:
    - name: my-vertex
      udf:
        container:
          image: my-python-udf-example:latest
Map supports three modes: Unary, Streaming, and Batch.
Unary Mode
Unary Map is the default mode where each input message is processed individually and returns 0, 1, or more outputs.
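The unary contract described above can be sketched in plain Python. This is an illustration only and does not use any Numaflow SDK: `split_words` and `run_unary` are hypothetical names, and messages are modeled as plain strings.

```python
from typing import Callable, Iterable, List


def split_words(value: str) -> List[str]:
    """Hypothetical unary handler: one input in, zero or more outputs out."""
    # An empty input yields no outputs; "a b" yields two outputs.
    return value.split()


def run_unary(handler: Callable[[str], List[str]], inputs: Iterable[str]) -> List[str]:
    """Stand-in for the platform: call the handler once per input message."""
    outputs: List[str] = []
    for message in inputs:
        # All outputs of one call are collected after the handler returns.
        outputs.extend(handler(message))
    return outputs
```

Each call sees exactly one message, so the handler needs no knowledge of how messages are grouped when they are read.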
Check the links below to see the UDF examples for different languages.
Streaming Mode
When the map function generates more than one output (e.g., flat map), the UDF can be configured to run in streaming mode. In this mode each message is pushed to the downstream vertices as soon as it is generated, instead of all responses being collected and sent together when the function returns.
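The difference from collecting all outputs first can be illustrated with a Python generator. This is a minimal sketch that does not use any Numaflow SDK: `split_streaming` and `forward_downstream` are hypothetical names, and the `log` list exists only to make the interleaving visible.

```python
from typing import Iterator, List


def split_streaming(value: str, log: List[str]) -> Iterator[str]:
    """Hypothetical streaming handler: yield each output as soon as it exists."""
    for word in value.split():
        log.append(f"produced {word}")
        yield word


def forward_downstream(outputs: Iterator[str], log: List[str]) -> List[str]:
    """Stand-in for the platform forwarding each output to the next vertex."""
    forwarded: List[str] = []
    for out in outputs:
        log.append(f"forwarded {out}")
        forwarded.append(out)
    return forwarded


def demo() -> List[str]:
    """Run the handler on one message and return the order of events."""
    log: List[str] = []
    forward_downstream(split_streaming("a b", log), log)
    return log
```

In `demo()`, the first output is forwarded before the second is produced, which is the behavior streaming mode is meant to give.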
Check the links below to see the UDF examples in streaming mode for different languages.
Batch Mode
BatchMap is an interface that allows developers to process multiple data items in a single UDF call, rather than processing each item in a separate call.
The BatchMap interface is helpful when operating on a group of items is more efficient than operating on each item individually.
Important Considerations
When using BatchMap, there are a few important considerations to keep in mind:
- Ensure that the BatchResponses object is tagged with the correct request ID. Each Datum has a unique ID tag, which will be used by Numaflow to ensure correctness.
- Ensure that the length of the BatchResponses list is equal to the number of requests received. This means that for every input data item, there should be a corresponding response in the BatchResponses list.
- The total batch size can be up to readBatchSize long.
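The first two considerations above can be sketched as follows. The `Datum` and `BatchResponse` classes here are simplified, hypothetical stand-ins written for this example, not the SDK's own types, and `check_batch` is an assumed helper showing the kind of validation the platform depends on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datum:
    """Hypothetical stand-in for one request in a batch."""
    id: str
    value: str


@dataclass
class BatchResponse:
    """Hypothetical stand-in for the response to one request."""
    id: str
    outputs: List[str] = field(default_factory=list)


def batch_handler(datums: List[Datum]) -> List[BatchResponse]:
    """Process the whole batch in one call, one response per request."""
    responses: List[BatchResponse] = []
    for datum in datums:
        # Tag the response with the ID of the request it answers.
        response = BatchResponse(id=datum.id)
        if datum.value:
            response.outputs.append(datum.value.upper())
        # A request with no outputs still gets a (possibly empty) response.
        responses.append(response)
    return responses


def check_batch(datums: List[Datum], responses: List[BatchResponse]) -> None:
    """Raise ValueError if the count or the IDs do not line up."""
    if len(responses) != len(datums):
        raise ValueError("response count does not match request count")
    if {d.id for d in datums} != {r.id for r in responses}:
        raise ValueError("response IDs do not match request IDs")
```

Note that dropping a message is expressed as a response with no outputs rather than as a missing response, so the list lengths always match.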
Check the links below to see the UDF examples in batch mode for different languages.
Available Environment Variables
Some environment variables are available in the user-defined function container; they might be useful in your own UDF implementation.
- NUMAFLOW_NAMESPACE - Namespace.
- NUMAFLOW_POD - Pod name.
- NUMAFLOW_REPLICA - Replica index.
- NUMAFLOW_PIPELINE_NAME - Name of the pipeline.
- NUMAFLOW_VERTEX_NAME - Name of the vertex.
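A UDF written in Python could read these variables with the standard library. The variable names are the ones listed above; the helper `read_numaflow_env` is a hypothetical name used for illustration, and values are returned as raw strings (or `None` when a variable is unset, for example when running outside a pipeline).

```python
import os
from typing import Dict, Mapping, Optional

# Names taken from the list of variables documented above.
NUMAFLOW_ENV_VARS = (
    "NUMAFLOW_NAMESPACE",
    "NUMAFLOW_POD",
    "NUMAFLOW_REPLICA",
    "NUMAFLOW_PIPELINE_NAME",
    "NUMAFLOW_VERTEX_NAME",
)


def read_numaflow_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return each documented variable, or None if it is not set."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in NUMAFLOW_ENV_VARS}
```

Passing a plain dictionary as `environ` makes the helper easy to exercise in unit tests without touching the real process environment.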
Configuration
To achieve ordering, please set readBatchSize to 1.
Configuration data can be provided to the UDF container at runtime in multiple ways.
- environment variables
- args
- command
- volumes
- init containers
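On the UDF side, the first two options might be consumed as in the sketch below. The setting name `prefix`, the flag `--prefix`, and the variable `UDF_PREFIX` are hypothetical and chosen only for this example; the precedence shown (args override environment variables) is a design choice of the sketch, not something Numaflow prescribes.

```python
import argparse
import os
from typing import Dict, Mapping, Optional, Sequence


def load_config(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Resolve one setting from container args, falling back to an env var."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Example UDF configuration")
    # Hypothetical setting: the env var supplies the default, the arg overrides it.
    parser.add_argument("--prefix", default=env.get("UDF_PREFIX", ""))
    args = parser.parse_args(argv)
    return {"prefix": args.prefix}
```

Calling `load_config()` with no arguments reads the real command line and process environment; passing explicit values keeps the function testable.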