Distributed systems have become essential for improving resiliency, adapting, and competing in today's ever-changing virtual world. An entire system can be spread across different nodes, or kept on a single node while still being split into smaller units. Building distributed systems is a craft, and it is complicated by nature. Preventing failures and unhandled exceptions should be one of the goals when implementing the logic. Handling them can be tedious, but neglecting them can be costly to the business.

Akka supervision using Scala.

Today you are going to learn how to oversee and supervise actors.
I assume you already know what Akka is and how to initialize a project. If not, I recommend checking the Introduction first.

When an actor throws an unexpected exception or crashes, either while processing a message or during initialization, the actor is stopped by default.
If you have used the Akka classic API, be aware that this differs: there, actors are restarted by default.


Supervision is all about declaring what should happen when certain types of exceptions/failures are encountered inside an actor.

The actor model organizes actors in a hierarchy, so that work can be delegated to lower-level actors.

An actor in Akka always belongs to a parent. You create an actor from within another actor by calling:

context.spawn(Job(), name = jobName)

The actor that calls spawn becomes the parent of the new actor.
Now the question arises: who is the parent of the actor that creates this child?
All actors have a common ancestor, the root guardian (/), which is created when you initialize the ActorSystem.

[Image: actor hierarchy diagram, taken from the official Akka documentation]

Actor hierarchy-

  • / the root guardian. This is the parent of all actors in the system, and the last one to stop when the system is terminated.
  • /system the system guardian. Akka or other libraries built on top of Akka may create actors in the system namespace.
  • /user the user guardian. This is the top-level actor that you provide to start all other actors in your application (this is the actor you interact and work with).

The guardian actor is created when the actor system is initialized. Later, user actors are created and stopped on request. Actors are not stopped automatically when they are no longer referenced, so they must be stopped explicitly. Whenever an actor is stopped, all of its children are recursively stopped too. This behavior greatly simplifies resource cleanup and helps avoid resource leaks such as those caused by open sockets and files.
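
The hierarchy described above can be sketched with a small user guardian that spawns children on request. This is only an illustration under assumptions: the names HierarchySketch, Guardian, StartJob, Shutdown and the body of Job are hypothetical and not part of the original article.

```scala
import akka.actor.typed.{ ActorSystem, Behavior }
import akka.actor.typed.scaladsl.Behaviors

object HierarchySketch {
  // Hypothetical child actor: logs each piece of work it receives.
  object Job {
    def apply(): Behavior[String] =
      Behaviors.receive { (context, work) =>
        context.log.info(s"${context.self.path} received: $work")
        Behaviors.same
      }
  }

  // Hypothetical user guardian: spawns one child per requested job name.
  object Guardian {
    sealed trait Command
    final case class StartJob(jobName: String) extends Command
    case object Shutdown extends Command

    def apply(): Behavior[Command] =
      Behaviors.receive { (context, command) =>
        command match {
          case StartJob(jobName) =>
            // The guardian becomes the parent of the new actor.
            val job = context.spawn(Job(), name = jobName)
            job ! "first task"
            Behaviors.same
          case Shutdown =>
            // Stopping the guardian stops its children recursively
            // and terminates the actor system.
            Behaviors.stopped
        }
      }
  }

  def main(args: Array[String]): Unit = {
    // The behavior passed to ActorSystem becomes the user guardian.
    val system = ActorSystem(Guardian(), "jobs")
    system ! Guardian.StartJob("job-1")
    system ! Guardian.Shutdown
  }
}
```

Child names must be unique within a parent, so in this sketch sending StartJob twice with the same name would fail.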

Method signature of Behaviors.supervise-

def supervise[T](wrapped: Behavior[T]): Supervise[T]

When something goes wrong inside an actor, it is important to distinguish between validation errors and failures. Supervision is meant for the latter.

  • validation error– means that the data of a message sent to an actor is not valid. This should be modelled as part of the actor protocol rather than making the actor throw exceptions. These are expected errors, which can be handled with a regular try-catch or standard library tools.
  • failure– is something unexpected or outside the control of the actor itself, for example a broken database connection, a network resource being unavailable, a disk write failing, or perhaps a bug in the application logic.
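
As a minimal sketch of the first point, a validation error can be answered with a reply message instead of an exception. The Account actor, its Withdraw command and the Accepted/Rejected replies are hypothetical names chosen for illustration.

```scala
import akka.actor.typed.{ ActorRef, Behavior }
import akka.actor.typed.scaladsl.Behaviors

object Account {
  sealed trait Command
  final case class Withdraw(amount: Long, replyTo: ActorRef[Reply]) extends Command

  sealed trait Reply
  final case class Accepted(newBalance: Long) extends Reply
  final case class Rejected(reason: String) extends Reply

  def apply(balance: Long): Behavior[Command] =
    Behaviors.receiveMessage {
      case Withdraw(amount, replyTo) if amount <= 0 || amount > balance =>
        // Validation error: it is expected, so it is answered, not thrown.
        replyTo ! Rejected(s"cannot withdraw $amount from balance $balance")
        Behaviors.same
      case Withdraw(amount, replyTo) =>
        val newBalance = balance - amount
        replyTo ! Accepted(newBalance)
        // Continue with the updated state.
        Account(newBalance)
    }
}
```

Because nothing is thrown for invalid input, no supervision strategy is involved here. Supervision stays reserved for genuine failures.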

Depending on the nature of the work to be supervised and the nature of the failure, Akka provides the following strategies:

  • Resume the actor- keeps the current state and continues with the next message. This is the best option when the error did not corrupt the state of the actor.
  • Restart the actor- clears out its accumulated internal state and starts again, optionally after a delay. A restart also stops all child actors, which avoids the resource leak of creating new child actors each time the parent is restarted, and guarantees a clean actor state.
  • Stop the actor permanently- the default strategy. It is appropriate when the actor state is no longer reliable and continuing would produce faulty outcomes.

Independently of supervision, an actor can stop itself by returning Behaviors.stopped as the next behavior.
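
The strategies listed above are available as values on SupervisorStrategy. The following sketch shows how they might be configured. The object name Strategies and the concrete limits and durations are arbitrary assumptions, not recommendations.

```scala
import scala.concurrent.duration._
import akka.actor.typed.SupervisorStrategy

object Strategies {
  // Keep the actor and its state, and continue with the next message.
  val resume: SupervisorStrategy = SupervisorStrategy.resume

  // Restart immediately, but give up (stop) after more than
  // 3 restarts within 10 seconds.
  val limitedRestart: SupervisorStrategy =
    SupervisorStrategy.restart.withLimit(maxNrOfRetries = 3, withinTimeRange = 10.seconds)

  // Restart with a growing delay between attempts, which is useful
  // when the failure is caused by a resource that needs time to recover.
  val backoffRestart: SupervisorStrategy =
    SupervisorStrategy.restartWithBackoff(
      minBackoff = 1.second,
      maxBackoff = 30.seconds,
      randomFactor = 0.2)

  // Stop the actor permanently (the default in Akka Typed).
  val stop: SupervisorStrategy = SupervisorStrategy.stop
}
```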

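Supervision is applied by wrapping a behavior with Behaviors.supervise and choosing a strategy with onFailure, for example a restart strategy or a resume strategy.

It is also possible to nest supervisors, so that different exception types are handled with different strategies.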
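
A minimal sketch of the wrapping, assuming a hypothetical Counter actor (the Counter object, its commands and the chosen exception types are illustrative and not taken from the original article):

```scala
import akka.actor.typed.{ ActorRef, Behavior, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors

object Counter {
  sealed trait Command
  case object Increment extends Command
  final case class Fail(cause: Throwable) extends Command
  final case class GetCount(replyTo: ActorRef[Int]) extends Command

  private def counter(count: Int): Behavior[Command] =
    Behaviors.receiveMessage {
      case Increment   => counter(count + 1)
      case Fail(cause) => throw cause // simulate a failure
      case GetCount(replyTo) =>
        replyTo ! count
        Behaviors.same
    }

  // restart strategy: the actor goes back to the wrapped behavior, counter(0)
  val restarting: Behavior[Command] =
    Behaviors.supervise(counter(0)).onFailure[IllegalStateException](SupervisorStrategy.restart)

  // resume strategy: the current count is kept
  val resuming: Behavior[Command] =
    Behaviors.supervise(counter(0)).onFailure[IllegalStateException](SupervisorStrategy.resume)

  // nested supervisors: restart on IllegalStateException,
  // stop on IllegalArgumentException
  val nested: Behavior[Command] =
    Behaviors
      .supervise(
        Behaviors.supervise(counter(0)).onFailure[IllegalStateException](SupervisorStrategy.restart))
      .onFailure[IllegalArgumentException](SupervisorStrategy.stop)
}
```

With restarting, a count asked for after a failure starts from 0 again. With resuming, the value accumulated before the failure is still there.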


Actor signals-

Before a supervised actor is restarted or stopped, it receives one of the following signals, which gives it a chance to clean up its resources.

  • PreRestart– Lifecycle signal that is fired upon restart of the Actor before replacing the behavior with the fresh one (i.e. this signal is received within the behavior that failed).
  • PostStop– Lifecycle signal that is fired after this actor and all its child actors have terminated. The Terminated signal is only sent to registered watchers after this signal has been processed.
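
As a sketch of how these two signals might be used for cleanup, the following actor closes a resource in both cases. ResourceHolder and its Resource class are hypothetical stand-ins for something like a socket, a file, or a database client.

```scala
import akka.actor.typed.{ Behavior, PostStop, PreRestart, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors

object ResourceHolder {
  // Hypothetical resource standing in for a socket, file or client.
  final class Resource {
    @volatile var open: Boolean = true
    def close(): Unit = open = false
  }

  sealed trait Command
  case object Crash extends Command
  case object Stop extends Command

  def apply(acquire: () => Resource): Behavior[Command] =
    Behaviors
      .supervise[Command](Behaviors.setup { _ =>
        // setup runs on start and again after every restart
        val resource = acquire()
        Behaviors
          .receiveMessage[Command] {
            case Crash => throw new IllegalStateException("simulated failure")
            case Stop  => Behaviors.stopped
          }
          .receiveSignal {
            case (_, PreRestart) =>
              // received by the failed behavior, before the fresh one takes over
              resource.close()
              Behaviors.same
            case (_, PostStop) =>
              // received when the actor is stopped for good
              resource.close()
              Behaviors.same
          }
      })
      .onFailure[IllegalStateException](SupervisorStrategy.restart)
}
```

Passing the resource factory as a parameter is a design choice of this sketch only. It keeps the example independent of any real client library.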

Good to know-

System signals are notifications that are generated by the system and delivered to the Actor behavior in a reliable fashion (i.e. they are guaranteed to arrive in contrast to the at-most-once semantics of normal Actor messages).

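import akka.actor.typed.{ Behavior, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors

// DbCommand is the connector's message protocol (not shown). The behavior itself would
// create its client, e.g. MongoClients.create("mongodb://localhost:27017"), in Behaviors.setup.
val dbConnector: Behavior[DbCommand] = ???

val dbRestarts =
  Behaviors.supervise(dbConnector).onFailure(SupervisorStrategy.restart) // handle all NonFatal exceptions

val dbSpecificResumes =
  Behaviors.supervise(dbConnector)
    .onFailure[IndexOutOfBoundsException](SupervisorStrategy.resume) // resume for IndexOutOfBoundsException exceptions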

Bubbling up the failure-

In some scenarios it is better to push the decision about what to do on a failure upwards in the actor hierarchy and let the parent actor handle it.
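
One way to do this, sketched here under assumptions, relies on the fact that an actor which watches a child and does not handle the resulting Terminated signal fails itself with a DeathPactException. The names BubblingUp, Boss, Worker, Fail and Hello are hypothetical.

```scala
import akka.actor.typed.{ ActorRef, Behavior, DeathPactException, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors

object BubblingUp {
  sealed trait Command
  final case class Fail(text: String) extends Command
  final case class Hello(text: String, replyTo: ActorRef[String]) extends Command

  // The worker is not supervised, so any failure stops it.
  object Worker {
    def apply(): Behavior[Command] =
      Behaviors.receiveMessage {
        case Fail(text) => throw new RuntimeException(text)
        case Hello(text, replyTo) =>
          replyTo ! text
          Behaviors.same
      }
  }

  object Boss {
    def apply(): Behavior[Command] =
      Behaviors
        .supervise(Behaviors.setup[Command] { context =>
          val worker = context.spawn(Worker(), "worker")
          // Watching without handling Terminated: when the worker stops,
          // the boss fails with DeathPactException.
          context.watch(worker)
          Behaviors.receiveMessage { message =>
            worker ! message
            Behaviors.same
          }
        })
        // The decision is taken here: restarting the boss runs setup
        // again and creates a fresh worker.
        .onFailure[DeathPactException](SupervisorStrategy.restart)
  }
}
```

Messages forwarded between the worker's failure and the boss's restart are lost in this sketch, which is consistent with the at-most-once delivery of normal actor messages.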

Watching Actors-

In order to be notified when another actor terminates (i.e. stops permanently, not a temporary failure followed by a restart), an actor can watch that actor. It will receive the Terminated signal upon termination of the watched actor.


For a parent to be notified when a child terminates, it has to watch the child. If the child was stopped because of a failure, the parent receives the ChildFailed signal, which contains the cause. ChildFailed extends Terminated, so if your use case does not need to distinguish between stopping and failing, you can handle both cases with the Terminated signal.
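
A minimal sketch of a watching parent follows. WatchSketch, Parent, Start and the report actor reference are hypothetical and exist only to make the outcome visible.

```scala
import akka.actor.typed.{ ActorRef, Behavior, ChildFailed, Terminated }
import akka.actor.typed.scaladsl.Behaviors

object WatchSketch {
  // Hypothetical child: either finishes normally or crashes.
  object Job {
    sealed trait Command
    case object Finish extends Command
    case object Crash extends Command

    def apply(): Behavior[Command] =
      Behaviors.receiveMessage {
        case Finish => Behaviors.stopped
        case Crash  => throw new IllegalStateException("job crashed")
      }
  }

  object Parent {
    final case class Start(jobName: String, instruction: Job.Command)

    // `report` is an assumed observer that receives a description of each outcome.
    def apply(report: ActorRef[String]): Behavior[Start] =
      Behaviors
        .receive[Start] { (context, start) =>
          val job = context.spawn(Job(), start.jobName)
          context.watch(job)
          job ! start.instruction
          Behaviors.same
        }
        .receiveSignal {
          // ChildFailed extends Terminated, so it has to be matched first.
          case (_, ChildFailed(ref, cause)) =>
            report ! s"${ref.path.name} failed: ${cause.getMessage}"
            Behaviors.same
          case (_, Terminated(ref)) =>
            report ! s"${ref.path.name} stopped"
            Behaviors.same
        }
  }
}
```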

An alternative to watch is watchWith, which allows specifying a custom message to be delivered instead of the Terminated signal. This is often preferred over using watch and the Terminated signal, because additional information can be included in the message and used later when it is received. The custom message must belong to the watching actor's own message type.

context.watchWith(job, "message")
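
In a typed protocol, the custom message would usually be a dedicated command rather than a plain string. The following is a sketch under assumptions. WatchWithSketch, Manager, Start and JobTerminated are hypothetical names.

```scala
import akka.actor.typed.{ ActorRef, Behavior }
import akka.actor.typed.scaladsl.Behaviors

object WatchWithSketch {
  // Hypothetical child that stops after its first message.
  object Job {
    def apply(): Behavior[String] =
      Behaviors.receiveMessage(_ => Behaviors.stopped)
  }

  object Manager {
    sealed trait Command
    final case class Start(jobName: String, replyTo: ActorRef[String]) extends Command
    // Internal message delivered instead of Terminated. It carries the
    // context needed to react: which job ended and whom to notify.
    private final case class JobTerminated(jobName: String, replyTo: ActorRef[String])
        extends Command

    def apply(): Behavior[Command] =
      Behaviors.receive { (context, command) =>
        command match {
          case Start(jobName, replyTo) =>
            val job = context.spawn(Job(), name = jobName)
            context.watchWith(job, JobTerminated(jobName, replyTo))
            job ! "run"
            Behaviors.same
          case JobTerminated(jobName, replyTo) =>
            replyTo ! s"$jobName finished"
            Behaviors.same
        }
      }
  }
}
```

Since the termination arrives as an ordinary command, it is handled in the normal message handler and no receiveSignal block is needed.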


The Akka toolkit provides solid supervision strategies for handling failures and exceptions, which helps to avoid downtime and resource leaks. This article is intended as an introduction. More details and fuller code examples will be published soon.

Thanks for the support. Please comment with any questions or if you spot mistakes in the article. Feedback is always welcome.
