Tuesday, April 17, 2018

AWS Step Functions Early Best Practice Learnings

Let's be honest. Anybody who has used AWS Simple Workflow knows there is nothing simple about it.  The "simple" part is that you don't have to maintain the state engine, but it is so feature-rich that actually using it can be tricky, which is why there are so many frameworks built on top of it.  (I know; I built my own.)

AWS Step Functions is a welcome addition from Amazon that can be thought of as a watered-down Simple Workflow with a flow language, which removes the need to write custom "deciders".

I've found it covers about 90% of what I need but I do miss some SWF features: N-branches, kicking off child-flows, etc.

One large feature I hope they add to Step Functions is a way to manipulate the output json before sending it to the next step in a process.

Best Practices:

  • Lambda functions should be written to be general, not specific to a single step function flow.
    •  Instead of writing an ExportS3ToCSVFile consider writing a more generic ExportS3ToFile lambda that is configurable for several output file types.
  • Lambda functions should, if possible, add result fields to the input json and output that json.
  • Lambda functions should take any json shape as input as long as its required fields are present.
    • Do not throw a validation error if extra fields are present in the input json.
Rationale:

It is much easier to chain lambdas within a step function flow if the above practices are followed.
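To make the practices above concrete, here is a minimal sketch of a generic lambda that follows them. The field names (bucket, key, format, outputKey) and the exportToFile helper are hypothetical, made up for illustration, and the actual export work is left as a placeholder.

```javascript
// Hypothetical field names: this lambda only cares that these are present.
const REQUIRED_FIELDS = ['bucket', 'key', 'format'];

function exportToFile(event) {
  // Validate only the required fields; extra fields are allowed and ignored.
  const missing = REQUIRED_FIELDS.filter((name) => !(name in event));
  if (missing.length > 0) {
    throw new Error('Missing required fields: ' + missing.join(', '));
  }

  // Placeholder for the real work (for example, writing the file to S3).
  const outputKey = event.key + '.' + event.format;

  // Copy the input and add the result field, so later steps still see
  // everything that earlier steps produced.
  return Object.assign({}, event, { outputKey: outputKey });
}

// Lambda entry point.
exports.handler = async (event) => exportToFile(event);
```

Because the input is passed through with the result added, a field like "trace" set by an earlier step survives untouched for the steps that follow.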

Missing Functionality: Json Transformation

One large piece of the puzzle that AWS Step Functions currently does not provide is a way to transform the output json of one lambda before passing it into the next lambda step.

This is unfortunate because it forces you to write your lambdas to be workflow-specific, which violates the "write generalized lambdas" best practice.

Workaround

Step Functions does provide an under-appreciated step type called "Pass".  Pass allows you to mock up an output and inject it anywhere in the input json doc.  This injection gave me the idea for a workaround: a json transform lambda.


The gist of the idea (sorry) is that you use a Pass step to inject a "transformScript" field into the incoming json document.  That field contains all the transform code in plain javascript.   Then your step function flow calls the JsonTransform Lambda to act on that script.

For example I may have a lambda that returns: 

{
  "trace": "abc123",
  "field1": "value1",
  "field2": "value2"
}

and I want to add a third field that combines field1 and field2, plus a dateUpdated field.  So I use a Pass step to inject a transformScript field, which produces:

{
  "trace": "abc123",
  "field1": "value1",
  "field2": "value2",
  "transformScript": "event.field3=event.field1 + \" \" + event.field2; event.dateUpdated=new Date()"
}

Forwarding this to the JsonTransform lambda produces the expected json:

{
  "trace": "abc123",
  "field1": "value1",
  "field2": "value2",
  "field3": "value1 value2",
  "dateUpdated": "2018-03-19T21:20:08.571Z"
}
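For reference, a transform lambda along these lines could be sketched as follows. This is not necessarily the exact JsonTransform code, just a minimal version that assumes Node's built-in vm module, a hypothetical applyTransform helper, and an arbitrary one second timeout. Note that the Node.js documentation says vm is not a security mechanism, so this should only ever run scripts you wrote into your own flow definition.

```javascript
const vm = require('vm');

// Run the script found in event.transformScript against a copy of the event.
function applyTransform(event) {
  const script = event.transformScript;
  if (typeof script !== 'string') {
    throw new Error('transformScript must be a string');
  }

  // Work on a deep copy and drop the script so it is not passed downstream.
  const doc = JSON.parse(JSON.stringify(event));
  delete doc.transformScript;

  // The script only sees "event"; the timeout (an assumed value) stops
  // runaway loops.
  vm.runInNewContext(script, { event: doc }, { timeout: 1000 });

  // Round-trip through JSON so values such as Date become plain strings.
  return JSON.parse(JSON.stringify(doc));
}

// Lambda entry point.
exports.handler = async (event) => applyTransform(event);
```

Feeding it the document with the injected transformScript from the example above yields field3 set to "value1 value2" and dateUpdated set to the current time as an ISO string.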

Downsides?

Some won't like the code smell of injecting JavaScript into their step-function flow.  My counterarguments are:
  • The script is run inside a Node.js sandbox so nothing too funky can happen.
  • The transforming of json from the output of one lambda to fit the expected shape of the next lambda is flow code, since that transform only matters to that particular step function flow.
Another downside is that it takes two step function steps to do a transformation:
  • A "Pass" step to inject the transform script
  • A "Task" step to run the JsonTransform lambda.
Currently there is no way around this.  One thing I do to keep things straight is give both steps the same base name:

"CreateFields" -> "CreateFields!"

I use the same name for both but add a "!" to the step that runs the transform.  This makes it easier to see where transforms are happening in the flow.
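Here is roughly what that pair of steps looks like in a state machine definition, written as a JavaScript object so it can be printed as json. The ARN is a placeholder and the script is the one from the earlier example; treat this as a sketch rather than a complete flow.

```javascript
// Placeholder ARN: substitute the ARN of your own transform lambda.
const JSON_TRANSFORM_ARN =
  'arn:aws:lambda:us-east-1:123456789012:function:JsonTransform';

const definition = {
  StartAt: 'CreateFields',
  States: {
    // Pass step: writes the script into $.transformScript and leaves the
    // rest of the input document as it was.
    CreateFields: {
      Type: 'Pass',
      Result:
        'event.field3=event.field1 + " " + event.field2; event.dateUpdated=new Date()',
      ResultPath: '$.transformScript',
      Next: 'CreateFields!',
    },
    // Task step: runs the transform lambda on the document.
    'CreateFields!': {
      Type: 'Task',
      Resource: JSON_TRANSFORM_ARN,
      End: true,
    },
  },
};

// Print the definition as json, ready to paste into a state machine.
console.log(JSON.stringify(definition, null, 2));
```

Setting ResultPath to "$.transformScript" is what makes the Pass step add a field to the incoming document instead of replacing it.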

Summary

I wrote this up quickly, but hopefully it will help inspire more discussion on Step Functions best practices and on missing functionality Amazon may introduce in the future.  Step Functions is great but very limited in this first release.