Throttling made easy: back pressure in Akka Streams.

A built-in protocol to handle overflow and underflow in fast data systems.

Big data has been the buzzword all over lately, but fast data is gaining traction nowadays too. If you are into data streaming, you know it can be tedious if not done right and may result in data leaks or OutOfMemory exceptions. And if you are building a service or product today, users are willing to pay top dollar for content delivered with millisecond latency.

Akka Streams back pressure, explained.

Akka Streams

Akka Streams is a streaming module that is part of the Akka toolkit. It is designed to work with data streams and achieve concurrency by leveraging the power of the Akka toolkit without your having to define actor behaviors and methods explicitly; the abstraction hides what is going on under the hood and lets you focus on the logic your business needs. Akka Streams can handle almost every data source and type out there through its open-source connector module, Alpakka.

When you think of a stream, you picture a starting point where the stream begins and an ending point where it ends. I want you to think about Akka Streams in the same way.

  • starting point of the stream is called — Publisher
  • ending point of the stream is called — Subscriber

Akka Streams uses the operators below to deal with ingestion, processing, transformation and storage of data.

  • Source — An operator with exactly one output, emitting data elements whenever downstream operators are ready to receive them.
  • Flow — An operator which has exactly one input and output, which connects its upstream and downstream by transforming the data elements flowing through it.
  • Sink — An operator with exactly one input, requesting and accepting data elements.

Let's write some code and break down what's happening.

import akka.stream.scaladsl.{Source, Flow, Sink}

val source = Source(1 to 1000)
val multiply = Flow[Int].map(x => x * 10)
val sink = Sink.foreach[Int](println)

The apply methods of Source and Flow are as follows; Sink doesn't have an apply method.

//Source
def apply[T](iterable: Iterable[T]): Source[T, NotUsed]
//Flow
def apply[T]: Flow[T, T, NotUsed]

Here, our source emits values from 1 to 1000 in order; they are transformed at the Flow operator by a map function and in turn printed to the console/terminal at the sink. This is quite a small computation and will not cause latency issues or back pressure.

We can tie the operators together using Akka's rich library:

source.via(multiply).to(sink).run()

You have created a graph here which is runnable.

A Flow that has both ends “attached” to a Source and Sink respectively is ready to be run() and is called a RunnableGraph.

How is this happening without you configuring and telling the program how to run the graph?

Even after constructing the RunnableGraph by connecting the source, sink and other operators, no data will flow through it.

This is where materialization comes into action!

Stream Materialization

When constructing flows and graphs in Akka Streams, think of them as preparing a blueprint or execution plan. Stream materialization is the process of taking a stream description and allocating all the resources it needs in order to run. This means starting up the actors which power the processing, and much more under the hood depending on what the stream needs.
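To make the blueprint idea concrete, here is a minimal sketch (reusing the source, multiply and sink values defined above, and assuming an implicit ActorSystem is in scope):

import akka.NotUsed
import akka.stream.scaladsl.RunnableGraph

// Wiring the operators together only describes the stream; nothing runs yet.
val blueprint: RunnableGraph[NotUsed] = source.via(multiply).to(sink)

// Materialization is what actually starts the actors and the flow of elements.
blueprint.run()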

After running (materializing) the RunnableGraph we get back a materialized value of the specified type. Every stream operator can produce a materialized value, and it is your responsibility to combine them into a new one. Akka provides .toMat to indicate that we want to transform the materialized values of the source and sink, along with the convenience functions Keep.right/left/both/none to say, for example, that we are only interested in the materialized value of the sink.

import akka.stream.scaladsl.Keep

//keep right
source.via(multiply).toMat(sink)(Keep.right).run()
//keep left
source.via(multiply).toMat(sink)(Keep.left).run()
//keep both
source.via(multiply).toMat(sink)(Keep.both).run()
//keep none
source.via(multiply).toMat(sink)(Keep.none).run()

You can use .viaMat to apply the same choice to a Flow.

//keep right
source.viaMat(multiply)(Keep.right).to(sink).run()
//keep left
source.viaMat(multiply)(Keep.left).to(sink).run()
//keep both
source.viaMat(multiply)(Keep.both).to(sink).run()
//keep none
source.viaMat(multiply)(Keep.none).to(sink).run()

By default, Akka Streams keeps the left (the Source's) materialized value.
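To see why the Keep.* choice matters, here is a hedged sketch (reusing the multiply flow from above and assuming an implicit ActorSystem is in scope) that keeps the sink's materialized value, a Future of the folded sum:

import scala.concurrent.Future
import akka.stream.scaladsl.{Keep, Sink, Source}

// Sink.fold materializes a Future[Int]; Keep.right hands that Future back
// to the caller when the graph is run.
val sumSink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

val sum: Future[Int] =
  Source(1 to 1000)
    .via(multiply)
    .toMat(sumSink)(Keep.right)
    .run()

With Keep.left you would get the source's NotUsed instead, which is why Keep.right is the usual choice when the interesting value lives at the sink.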

If you are using Akka Classic actors, you need to initialize a materializer manually and make it implicit.

implicit val actorMaterializer = ActorMaterializer()

With Akka Typed this is deprecated, as a system-wide materializer is brought into scope implicitly.

Note:
method apply in object ActorMaterializer is deprecated (since 2.6.0): Use the system wide materializer with stream attributes or configuration settings to change defaults .

Let's talk about the magic that happens under the hood now.

Code snippet:

import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}

object backPressure extends App {
  implicit val system: ActorSystem[Nothing] =
    ActorSystem(Behaviors.empty, "backPressure-example")

  val source = Source(1 to 10)
  val multiply = Flow[Int].map(x => x * 10)
  val sink = Sink.foreach[Int](println)

  source.via(multiply).to(sink).run()
}

Back Pressure

Let's say the source is producing data fast and the sink is consuming it slowly, with a one-second delay.

val source = Source(1 to 1000)
val multiply = Flow[Int].map(x => x * 10)
val sink = Sink.foreach[Int] { x =>
  Thread.sleep(1000) // simulate a slow consumer
  println(x)
}

Here, when the source sends data to the sink via the flow, the sink sends a signal back to slow down. The flow will attempt to slow the ingestion; if it cannot, it signals the source to slow down. All of this happens under the hood.
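This same mechanism is what the throttling in the title leans on: the built-in throttle operator simply back-pressures its upstream to cap the element rate. A minimal sketch (assuming an implicit ActorSystem is in scope):

import scala.concurrent.duration._
import akka.stream.scaladsl.{Sink, Source}

// Emit at most 10 elements per second; the upstream is back-pressured
// whenever the throttle (or anything downstream of it) is not ready.
Source(1 to 1000)
  .throttle(10, 1.second)
  .runWith(Sink.foreach(println))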

The default maximum input buffer size of a stream is 16. The sink will buffer incoming data until that buffer is exhausted and then signal the upstream to slow down. So far, all the computation is happening on a single actor/thread. We have a way to break this down and run the operators on different threads/actors…

async-

It's as simple as explicitly adding .async after an operator. Being run asynchronously means that an operator, after handing an element to its downstream consumer, is able to immediately process the next one.

source.via(multiply).async
.to(sink).async
.run()

The relative order within each operator is always guaranteed, but the outputs of different operators interleave due to the asynchronous boundaries.
For example:

Source(1 to 3)
  .map { i =>
    println(s"A: $i"); i
  }
  .async
  .map { i =>
    println(s"B: $i"); i
  }
  .async
  .map { i =>
    println(s"C: $i"); i
  }
  .async
  .runWith(Sink.ignore)

The output will look something like this:

A: 1
A: 2
B: 1
A: 3
B: 2
C: 1
B: 3
C: 2
C: 3

A: 1, A: 2, A: 3 will always be printed in that order, but lines like B: 1 may appear in between due to the asynchronous boundaries.

Note that the order is not A:1, B:1, C:1, A:2, B:2, C:2, which would correspond to the normal fused synchronous execution model of flows where an element completely passes through the processing pipeline before the next element enters the flow. The next element is processed by an asynchronous operator as soon as it has emitted the previous one.

Akka Streams uses a windowed, batching back-pressure strategy internally. It is windowed because, as opposed to a Stop-And-Wait protocol, multiple elements might be “in-flight” concurrently with requests for elements. It is also batching because a new element is not immediately requested once an element has been drained from the window-buffer, but multiple elements are requested after multiple elements have been drained. This batching strategy reduces the communication cost of propagating the back-pressure signal through the asynchronous boundary.
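That window-buffer is tunable. As a hedged sketch (the buffer sizes here are only illustrative), you can shrink the input buffer of an asynchronous section with stream attributes so that back pressure propagates to the source almost immediately:

import akka.stream.Attributes
import akka.stream.scaladsl.{Flow, Sink, Source}

val slowSink = Sink.foreach[Int] { x =>
  Thread.sleep(1000) // simulate a slow consumer
  println(x)
}

// The buffer only comes into play across an asynchronous boundary, so mark
// the flow as async and shrink its input buffer from the default 16 to 1.
val eagerFlow = Flow[Int].map(_ * 10).async
  .addAttributes(Attributes.inputBuffer(initial = 1, max = 1))

Source(1 to 1000)
  .via(eagerFlow)
  .to(slowSink)
  .run() // needs an implicit Materializer / ActorSystem in scope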

Conclusion-

Akka Streams is a proficient way to deal with data streams; it takes care of the complexity so you can focus on the business logic and hit your targets. Akka Streams comes with a rich built-in library of operators, built on top of Akka Actors, which gives you asynchrony and concurrency while handling most of the work under the hood.

Know more about Akka Streams at the official page: Akka Streams.

More about Streams and GraphDSL coming soon….
Thanks for reading and the support.🍻
Happy Holidays!
