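Actor state comes in endless varieties, so how can recovery be modelled uniformly? In some cases the state is a single integer; in others it is a complex type. Akka Persistence does not save the actor's state directly. It takes a different route: it stores the messages the actor received (more precisely, the events the actor chooses to persist in response to them), and when the actor restarts it sends all of them to the actor again, leaving it to the actor to decide how to rebuild its latest state. This is a textbook application of event sourcing, and it fits the Actor model remarkably well.
Before analysing the Akka Persistence source code, let's go over a few important concepts the framework relies on.
Snapshot store. If an actor has received a very large number of messages, replaying them from the beginning makes recovery take too long. A snapshot is a point-in-time copy of the actor's state. During recovery we can start from the latest snapshot and replay only the events that came after it, which cuts recovery time considerably.
Event sourcing. Akka provides an abstraction for building event-sourced applications. Event sourcing is a way of persisting domain-model data: instead of storing the final state, it stores the events that caused the state to change, and the final state is rebuilt from those events. For example, rather than recording an account's final balance, we record its credit and debit entries (the + and - records) and compute the balance from them.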
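The account example can be sketched in a few lines of plain Scala. This is only an illustration of the idea, not Akka code: `AccountEvent`, `Credited`, `Debited`, `replay` and `recoverFromSnapshot` are names I made up for the sketch, and the snapshot is assumed to be nothing more than the balance after the first `snapshotSeqNr` events.

```scala
object AccountSketch {
  // Hypothetical event types for illustration; not part of Akka.
  sealed trait AccountEvent
  final case class Credited(amount: Long) extends AccountEvent
  final case class Debited(amount: Long) extends AccountEvent

  // Applying one event to the current balance yields the next balance.
  def applyEvent(balance: Long, event: AccountEvent): Long = event match {
    case Credited(amount) => balance + amount
    case Debited(amount)  => balance - amount
  }

  // Full replay: fold every stored event over the initial state.
  def replay(events: Seq[AccountEvent], initial: Long = 0L): Long =
    events.foldLeft(initial)(applyEvent)

  // Snapshot-based recovery: start from the saved balance and replay only
  // the events recorded after the snapshot was taken.
  def recoverFromSnapshot(snapshotBalance: Long, snapshotSeqNr: Int, events: Seq[AccountEvent]): Long =
    replay(events.drop(snapshotSeqNr), snapshotBalance)
}
```

Both paths must arrive at the same balance; the snapshot only shortens the replay.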
import akka.persistence.{ PersistentActor, SnapshotOffer }

// Command and event types used by the example.
case class Cmd(data: String)
case class Evt(data: String)

case class ExampleState(events: List[String] = Nil) {
  def updated(evt: Evt): ExampleState = copy(evt.data :: events)
  def size: Int = events.length
  override def toString: String = events.reverse.toString
}

class ExamplePersistentActor extends PersistentActor {
  override def persistenceId = "sample-id-1"

  var state = ExampleState()

  def updateState(event: Evt): Unit =
    state = state.updated(event)

  def numEvents =
    state.size

  // Invoked during recovery with replayed events and, if present, a snapshot.
  val receiveRecover: Receive = {
    case evt: Evt                                 ⇒ updateState(evt)
    case SnapshotOffer(_, snapshot: ExampleState) ⇒ state = snapshot
  }

  val snapShotInterval = 1000

  // Invoked for normal commands once recovery has completed.
  val receiveCommand: Receive = {
    case Cmd(data) ⇒
      persist(Evt(s"${data}-${numEvents}")) { event ⇒
        updateState(event)
        context.system.eventStream.publish(event)
        if (lastSequenceNr % snapShotInterval == 0 && lastSequenceNr != 0)
          saveSnapshot(state)
      }
    case "print" ⇒ println(state)
  }
}
/**
 * Asynchronously persists `event`. On successful persistence, `handler` is called with the
 * persisted event. It is guaranteed that no new commands will be received by a persistent actor
 * between a call to `persist` and the execution of its `handler`. This also holds for
 * multiple `persist` calls per received command. Internally, this is achieved by stashing new
 * commands and unstashing them when the `event` has been persisted and handled. The stash used
 * for that is an internal stash which doesn't interfere with the inherited user stash.
 *
 * An event `handler` may close over persistent actor state and modify it. The `sender` of a persisted
 * event is the sender of the corresponding command. This means that one can reply to a command
 * sender within an event `handler`.
 *
 * Within an event handler, applications usually update persistent actor state using persisted event
 * data, notify listeners and reply to command senders.
 *
 * If persistence of an event fails, [[#onPersistFailure]] will be invoked and the actor will
 * unconditionally be stopped. The reason that it cannot resume when persist fails is that it
 * is unknown if the event was actually persisted or not, and therefore it is in an inconsistent
 * state. Restarting on persistent failures will most likely fail anyway, since the journal
 * is probably unavailable. It is better to stop the actor and after a back-off timeout start
 * it again.
 *
 * @param event event to be persisted
 * @param handler handler for each persisted `event`
 */
def persist[A](event: A)(handler: A ⇒ Unit): Unit = {
  internalPersist(event)(handler)
}

@InternalApi
final private[akka] def internalPersist[A](event: A)(handler: A ⇒ Unit): Unit = {
  if (recoveryRunning) throw new IllegalStateException("Cannot persist during replay. Events can be persisted when receiving RecoveryCompleted or later.")
  pendingStashingPersistInvocations += 1
  pendingInvocations addLast StashingHandlerInvocation(event, handler.asInstanceOf[Any ⇒ Unit])
  eventBatch ::= AtomicWrite(PersistentRepr(event, persistenceId = persistenceId,
    sequenceNr = nextSequenceNr(), writerUuid = writerUuid, sender = sender()))
}

@InternalApi
final private[akka] def internalPersistAll[A](events: immutable.Seq[A])(handler: A ⇒ Unit): Unit = {
  if (recoveryRunning) throw new IllegalStateException("Cannot persist during replay. Events can be persisted when receiving RecoveryCompleted or later.")
  if (events.nonEmpty) {
    events.foreach { event ⇒
      pendingStashingPersistInvocations += 1
      pendingInvocations addLast StashingHandlerInvocation(event, handler.asInstanceOf[Any ⇒ Unit])
    }
    eventBatch ::= AtomicWrite(events.map(PersistentRepr.apply(_, persistenceId = persistenceId,
      sequenceNr = nextSequenceNr(), writerUuid = writerUuid, sender = sender())))
  }
}
override protected[akka] def aroundReceive(receive: Receive, message: Any): Unit =
  currentState.stateReceive(receive, message)

/**
 * INTERNAL API.
 *
 * Can be overridden to intercept calls to this actor's current behavior.
 *
 * @param receive current behavior.
 * @param msg current message.
 */
@InternalApi
protected[akka] def aroundReceive(receive: Actor.Receive, msg: Any): Unit = {
  // optimization: avoid allocation of lambda
  if (receive.applyOrElse(msg, Actor.notHandledFun).asInstanceOf[AnyRef] eq Actor.NotHandled) {
    unhandled(msg)
  }
}

// safely null because we initialize it with a proper `waitingRecoveryPermit` state in aroundPreStart before any real action happens
private var currentState: State = null

override protected[akka] def aroundPreStart(): Unit = {
  require(persistenceId ne null, s"persistenceId is [null] for PersistentActor [${self.path}]")
  require(persistenceId.trim.nonEmpty, s"persistenceId cannot be empty for PersistentActor [${self.path}]")

  // Fail fast on missing plugins.
  val j = journal;
  val s = snapshotStore
  requestRecoveryPermit()
  super.aroundPreStart()
}
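The comment makes this fairly clear: `currentState` is first assigned in aroundPreStart, and aroundPreStart is itself one of the Actor lifecycle hooks. To be honest, I don't like the assignments to `j` and `s`. Using a plain assignment purely to force a lazy val to initialize, and thereby check whether the corresponding plugin exists, strikes me as rather ugly.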
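The fail-fast trick depends only on how Scala evaluates a lazy val: the initializer runs on first reference and never again. A minimal sketch of that behaviour, using a made-up `Component` class and `lookup` function rather than anything from Akka:

```scala
object LazyValSketch {
  // Hypothetical stand-in for an actor that looks up a plugin lazily.
  class Component(lookup: () => String) {
    var lookups = 0
    lazy val plugin: String = { lookups += 1; lookup() }

    // Referencing the lazy val forces its initializer to run right here,
    // so a missing plugin surfaces at start-up instead of on first use.
    def preStart(): Unit = { val p = plugin }
  }
}
```

If `lookup` throws, `preStart` throws too, which is the whole point of the otherwise unused `val j` and `val s`.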
private def requestRecoveryPermit(): Unit = {
  extension.recoveryPermitter.tell(RecoveryPermitter.RequestRecoveryPermit, self)
  changeState(waitingRecoveryPermit(recovery))
}

/**
 * Initial state. Before starting the actual recovery it must get a permit from the
 * `RecoveryPermitter`. When starting many persistent actors at the same time
 * the journal and its data store is protected from being overloaded by limiting number
 * of recoveries that can be in progress at the same time. When receiving
 * `RecoveryPermitGranted` it switches to `recoveryStarted` state
 * All incoming messages are stashed.
 */
private def waitingRecoveryPermit(recovery: Recovery) = new State {

  override def toString: String = s"waiting for recovery permit"
  override def recoveryRunning: Boolean = true

  override def stateReceive(receive: Receive, message: Any) = message match {
    case RecoveryPermitter.RecoveryPermitGranted ⇒
      startRecovery(recovery)

    case other ⇒
      stashInternally(other)
  }
}
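To make the permit idea concrete, here is a rough single-threaded model of limiting concurrent recoveries. It is my own simplification: the real `RecoveryPermitter` is an actor whose internals are not shown in this post, and `Permitter`, `request` and `giveBack` are hypothetical names.

```scala
import scala.collection.immutable.Queue

object PermitSketch {
  // Hypothetical, single-threaded model of a permit counter; the real
  // RecoveryPermitter is an actor and is not reproduced here.
  final class Permitter(maxPermits: Int) {
    private var used = 0
    private var waiting = Queue.empty[String]

    // Returns true if the requester may start recovery immediately,
    // otherwise remembers it until a permit is returned.
    def request(id: String): Boolean =
      if (used < maxPermits) { used += 1; true }
      else { waiting = waiting.enqueue(id); false }

    // Hands the freed permit to the longest-waiting requester, if any.
    def giveBack(): Option[String] =
      if (waiting.nonEmpty) {
        val (next, rest) = waiting.dequeue
        waiting = rest
        Some(next)
      } else { used -= 1; None }
  }
}
```

With a limit of two, a third requester waits until one of the first two returns its permit, which is what keeps the journal from being flooded when many persistent actors start at once.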
/**
 * Recovery mode configuration object to be returned in [[PersistentActor#recovery]].
 *
 * By default recovers from latest snapshot replays through to the last available event (last sequenceId).
 *
 * Recovery will start from a snapshot if the persistent actor has previously saved one or more snapshots
 * and at least one of these snapshots matches the specified `fromSnapshot` criteria.
 * Otherwise, recovery will start from scratch by replaying all stored events.
 *
 * If recovery starts from a snapshot, the persistent actor is offered that snapshot with a [[SnapshotOffer]]
 * message, followed by replayed messages, if any, that are younger than the snapshot, up to the
 * specified upper sequence number bound (`toSequenceNr`).
 *
 * @param fromSnapshot criteria for selecting a saved snapshot from which recovery should start. Default
 *                     is latest (= youngest) snapshot.
 * @param toSequenceNr upper sequence number bound (inclusive) for recovery. Default is no upper bound.
 * @param replayMax maximum number of messages to replay. Default is no limit.
 */
@SerialVersionUID(1L)
final case class Recovery(
  fromSnapshot: SnapshotSelectionCriteria = SnapshotSelectionCriteria.Latest,
  toSequenceNr: Long                      = Long.MaxValue,
  replayMax:    Long                      = Long.MaxValue)
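How the three `Recovery` parameters interact is easier to see on a toy model. The sketch below is an in-memory approximation I wrote for illustration: `Snapshot`, `Event` and `recover` are hypothetical, the state is assumed to be a list of strings, and snapshot selection is reduced to "newest snapshot not beyond `toSequenceNr`".

```scala
object RecoverySketch {
  // Hypothetical in-memory model; names are illustrative, not Akka's API.
  final case class Snapshot(seqNr: Long, state: List[String])
  final case class Event(seqNr: Long, data: String)

  // Picks the newest snapshot at or below `toSequenceNr`, then replays newer
  // events up to the same bound, applying at most `replayMax` of them.
  def recover(
      snapshots: Seq[Snapshot],
      events: Seq[Event],
      toSequenceNr: Long = Long.MaxValue,
      replayMax: Long = Long.MaxValue): List[String] = {
    val start = snapshots.filter(_.seqNr <= toSequenceNr).sortBy(_.seqNr).lastOption
    val from = start.map(_.seqNr).getOrElse(0L)
    val toReplay = events
      .filter(e => e.seqNr > from && e.seqNr <= toSequenceNr)
      .sortBy(_.seqNr)
      .take(math.min(replayMax, Int.MaxValue.toLong).toInt)
    toReplay.foldLeft(start.map(_.state).getOrElse(Nil))((state, e) => state :+ e.data)
  }
}
```

When no snapshot qualifies, the model falls back to replaying from the first event, mirroring the "start from scratch" case described in the Scaladoc.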
private def startRecovery(recovery: Recovery): Unit = {
  val timeout = {
    val journalPluginConfig = this match {
      case c: RuntimePluginConfig ⇒ c.journalPluginConfig
      case _                      ⇒ ConfigFactory.empty
    }
    extension.journalConfigFor(journalPluginId, journalPluginConfig).getMillisDuration("recovery-event-timeout")
  }
  changeState(recoveryStarted(recovery.replayMax, timeout))
  loadSnapshot(snapshotterId, recovery.fromSnapshot, recovery.toSequenceNr)
}

/**
 * Instructs the snapshot store to load the specified snapshot and send it via an [[SnapshotOffer]]
 * to the running [[PersistentActor]].
 */
def loadSnapshot(persistenceId: String, criteria: SnapshotSelectionCriteria, toSequenceNr: Long): Unit =
  snapshotStore ! LoadSnapshot(persistenceId, criteria, toSequenceNr)

/** Snapshot store plugin actor. */
private[persistence] def snapshotStore: ActorRef

private[persistence] lazy val snapshotStore = {
  val snapshotPluginConfig = this match {
    case c: RuntimePluginConfig ⇒ c.snapshotPluginConfig
    case _                      ⇒ ConfigFactory.empty
  }
  extension.snapshotStoreFor(snapshotPluginId, snapshotPluginConfig)
}
# Absolute path to the default journal plugin configuration entry.
akka.persistence.journal.plugin = "akka.persistence.journal.inmem"
# Absolute path to the default snapshot store plugin configuration entry.
akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot-store.local"

akka.persistence.snapshot-store.local {
  # Class name of the plugin.
  class = "akka.persistence.snapshot.local.LocalSnapshotStore"
  # Dispatcher for the plugin actor.
  plugin-dispatcher = "akka.persistence.dispatchers.default-plugin-dispatcher"
  # Dispatcher for streaming snapshot IO.
  stream-dispatcher = "akka.persistence.dispatchers.default-stream-dispatcher"
  # Storage location of snapshot files.
  dir = "snapshots"
  # Number load attempts when recovering from the latest snapshot fails
  # yet older snapshot files are available. Each recovery attempt will try
  # to recover using an older than previously failed-on snapshot file
  # (if any are present). If all attempts fail the recovery will fail and
  # the persistent actor will be stopped.
  max-load-attempts = 3
}
/**
 * Response message to a [[LoadSnapshot]] message.
 *
 * @param snapshot loaded snapshot, if any.
 */
final case class LoadSnapshotResult(snapshot: Option[SelectedSnapshot], toSequenceNr: Long)
  extends Response

private val recoveryBehavior: Receive = {
  val _receiveRecover = try receiveRecover catch {
    case NonFatal(e) ⇒
      try onRecoveryFailure(e, Some(e))
      finally context.stop(self)
      returnRecoveryPermit()
      Actor.emptyBehavior
  }

  {
    case PersistentRepr(payload, _) if recoveryRunning && _receiveRecover.isDefinedAt(payload) ⇒
      _receiveRecover(payload)
    case s: SnapshotOffer if _receiveRecover.isDefinedAt(s) ⇒
      _receiveRecover(s)
    case RecoveryCompleted if _receiveRecover.isDefinedAt(RecoveryCompleted) ⇒
      _receiveRecover(RecoveryCompleted)
  }
}
# In-memory journal plugin.
akka.persistence.journal.inmem {
  # Class name of the plugin.
  class = "akka.persistence.journal.inmem.InmemJournal"
  # Dispatcher for the plugin actor.
  plugin-dispatcher = "akka.actor.default-dispatcher"
}
private def returnRecoveryPermit(): Unit = extension.recoveryPermitter.tell(RecoveryPermitter.ReturnRecoveryPermit, self)
private def transitToProcessingState(): Unit = {
  if (eventBatch.nonEmpty) flushBatch()

  if (pendingStashingPersistInvocations > 0) changeState(persistingEvents)
  else {
    changeState(processingCommands)
    internalStash.unstashAll()
  }
}

/**
 * Command processing state. If event persistence is pending after processing a
 * command, event persistence is triggered and state changes to `persistingEvents`.
 */
private val processingCommands: State = new ProcessingState {
  override def toString: String = "processing commands"

  override def stateReceive(receive: Receive, message: Any) =
    if (common.isDefinedAt(message)) common(message)
    else try {
      Eventsourced.super.aroundReceive(receive, message)
      aroundReceiveComplete(err = false)
    } catch {
      case NonFatal(e) ⇒ aroundReceiveComplete(err = true); throw e
    }

  private def aroundReceiveComplete(err: Boolean): Unit = {
    if (eventBatch.nonEmpty) flushBatch()

    if (pendingStashingPersistInvocations > 0) changeState(persistingEvents)
    else unstashInternally(all = err)
  }

  override def onWriteMessageComplete(err: Boolean): Unit = {
    pendingInvocations.pop()
    unstashInternally(all = err)
  }
}

private def aroundReceiveComplete(err: Boolean): Unit = {
  if (eventBatch.nonEmpty) flushBatch()

  if (pendingStashingPersistInvocations > 0) changeState(persistingEvents)
  else unstashInternally(all = err)
}

private def flushBatch() {
  if (eventBatch.nonEmpty) {
    journalBatch ++= eventBatch.reverse
    eventBatch = Nil
  }

  flushJournalBatch()
}

private def flushJournalBatch(): Unit =
  if (!writeInProgress && journalBatch.nonEmpty) {
    journal ! WriteMessages(journalBatch, self, instanceId)
    journalBatch = Vector.empty
    writeInProgress = true
  }
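The two-level batching in `flushBatch` and `flushJournalBatch` can be modelled without any actors. In the sketch below `Batcher` and `sent` are hypothetical stand-ins (`sent` plays the role of the journal), and `writeCompleted` is my own addition to model the journal's acknowledgement, which is not part of the code quoted above.

```scala
object BatchSketch {
  // Hypothetical model of the two-level batching; `sent` stands in for the journal.
  final class Batcher {
    private var eventBatch: List[String] = Nil // newest first (prepend is O(1))
    private var journalBatch: Vector[String] = Vector.empty
    private var writeInProgress = false
    var sent: Vector[Vector[String]] = Vector.empty

    def persist(event: String): Unit = eventBatch ::= event

    def flushBatch(): Unit = {
      if (eventBatch.nonEmpty) {
        journalBatch ++= eventBatch.reverse // restore arrival order
        eventBatch = Nil
      }
      flushJournalBatch()
    }

    // Only one write is in flight at a time; later events wait in journalBatch.
    private def flushJournalBatch(): Unit =
      if (!writeInProgress && journalBatch.nonEmpty) {
        sent :+= journalBatch
        journalBatch = Vector.empty
        writeInProgress = true
      }

    // Assumption: called when the journal acknowledges the in-flight batch.
    def writeCompleted(): Unit = { writeInProgress = false; flushJournalBatch() }
  }
}
```

Events persisted while a write is outstanding accumulate in `journalBatch` and go out together in the next write, so a slow journal naturally leads to larger batches.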
case WriteMessageSuccess(p, id) ⇒
  // instanceId mismatch can happen for persistAsync and defer in case of actor restart
  // while message is in flight, in that case we ignore the call to the handler
  if (id == instanceId) {
    updateLastSequenceNr(p)
    try {
      peekApplyHandler(p.payload)
      onWriteMessageComplete(err = false)
    } catch {
      case NonFatal(e) ⇒ onWriteMessageComplete(err = true); throw e
    }
  }

private def peekApplyHandler(payload: Any): Unit =
  try pendingInvocations.peek().handler(payload)
  finally flushBatch()

override def onWriteMessageComplete(err: Boolean): Unit = {
  pendingInvocations.pop() match {
    case _: StashingHandlerInvocation ⇒
      // enables an early return to `processingCommands`, because if this counter hits `0`,
      // we know the remaining pendingInvocations are all `persistAsync` created, which
      // means we can go back to processing commands also - and these callbacks will be called as soon as possible
      pendingStashingPersistInvocations -= 1
    case _ ⇒ // do nothing
  }

  if (pendingStashingPersistInvocations == 0) {
    changeState(processingCommands)
    unstashInternally(all = err)
  }
}

override def stateReceive(receive: Receive, message: Any) =
  if (common.isDefinedAt(message)) common(message)
  else stashInternally(message)
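Putting the two states together, the guarantee from the `persist` Scaladoc (no new command is handled between `persist` and its handler) can be shown with a small single-threaded model. Everything here is hypothetical: `Model`, `receive` and `journalConfirmed` are my own names, every command is assumed to persist exactly one event, and the journal round trip is simulated by calling `journalConfirmed()` by hand.

```scala
import scala.collection.mutable

object StashSketch {
  // Hypothetical single-threaded model; the journal round trip is simulated
  // by calling `journalConfirmed()` explicitly.
  final class Model {
    private val stash = mutable.Queue.empty[String]
    private val pendingHandlers = mutable.Queue.empty[() => Unit]
    val log = mutable.ArrayBuffer.empty[String]

    private def persisting: Boolean = pendingHandlers.nonEmpty

    def receive(cmd: String): Unit =
      if (persisting) stash.enqueue(cmd) // persistingEvents: hold the command back
      else {
        log += s"command $cmd"
        pendingHandlers.enqueue(() => { log += s"handler $cmd"; () })
      }

    // The journal acknowledged the oldest pending event: run its handler,
    // then, if nothing is pending any more, go back to processing commands.
    def journalConfirmed(): Unit = {
      pendingHandlers.dequeue().apply()
      if (!persisting && stash.nonEmpty) receive(stash.dequeue())
    }
  }
}
```

A command that arrives while an event is still being written shows up in `log` only after the earlier handler has run, which is the ordering the internal stash exists to protect.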
/**
 * Enqueues `envelope` at the first position in the mailbox. If the message contained in
 * the envelope is a `Terminated` message, it will be ensured that it can be re-received
 * by the actor.
 */
private def enqueueFirst(envelope: Envelope): Unit = {
  mailbox.enqueueFirst(self, envelope)
  envelope.message match {
    case Terminated(ref) ⇒ actorCell.terminatedQueuedFor(ref)
    case _               ⇒
  }
}