Designing and implementing a scalable and flexible notification delivery solution for AnimalFarm.
Background
Our company, AnimalFarm, operates the largest cat management platform worldwide, with millions of users sharing awesome updates about their favorite pets day and night. Recently, management decided to focus more on driving user engagement, targeting an ambitious uplift in daily and weekly active users and in the number of interactions on the site.
One of our ideas was to lure inactive users back to the platform with attractive content, so we came up with the plan of introducing cat notifications.
The real-time push messages were intended to notify users about cute new cats added to the site by someone in their network. We were planning to send the notifications via each user's previously configured delivery channels (email, phone, etc.) with a personalized message and a link to the page of the newly added cat, so they could adore it instantly by visiting our site again.

Design requirements
Working together with the architects we identified the key non-functional requirements of the notification delivery system:
Loose coupling. The notification module must be independent of the core business functions and operate in isolation. It must be deployed and maintained as a separate entity without any entanglement with the main services of AnimalFarm. We want to avoid introducing any side effects, cross-dependencies or performance problems for our core use cases.
Scalability. We expect our users to generate millions of events per day globally once adoption ramps up, hence the solution must scale easily from the initial pilot program to the widely adopted production version without any major re-architecting.
Flexible delivery options. We plan to introduce more and more delivery channels as our technology evolves. In the first iteration we want to focus on email, Android and iOS push notifications. Very soon, however, we will add more options, like AI assistant integration for the bleeding-edge fans and fax for our more senior business clients. Users might have one or multiple notification delivery channels set up and can change their preferences at any time.
Reliable delivery. The cat notifications will be critical for our most engaged users, who always want to be up to date with the latest news from the cat community. We have to ensure that our notification system is robust and delivers reliably. It must handle issues like overloaded consumers and temporary outages of the target systems (e.g. the mail server is down).
Design solution
Based on the requirement discussions between business and the architects, we came up with an asynchronous, queue-based solution for the problem.
The solution introduces an event queue that receives all cat change events published by CatService, our main component responsible for cat management. The event-driven architecture ensures a high degree of decoupling and minimizes our impact on the business-critical CatService.

The events are processed by the CatReportService, which is the core part of our solution. Based on the incoming events and the user preferences of delivery channels and interests, it generates the personalized notifications to be sent out, then routes them to the appropriate output queues.
Messages are generated based on user preferences and tailored to the individual recipients, hence we need to optimize the performance of the condition checks and make sure that the user settings are cached on the service instances to maximize throughput and minimize the amount of additional I/O operations.
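The caching idea can be illustrated with a minimal read-through cache. This is only a sketch, assuming Java 17 or later; the names UserPreferences and PreferenceCache and the loader function are hypothetical and not taken from the prototype. A production cache would also need size limits and expiry.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-user settings: the delivery channels a user has enabled. */
record UserPreferences(String userId, Set<String> channels) {}

/** Read-through cache: loads a user's settings once, then serves them from memory. */
class PreferenceCache {
    private final Map<String, UserPreferences> entries = new ConcurrentHashMap<>();
    private final Function<String, UserPreferences> loader; // stands in for the I/O call
    private final AtomicInteger loads = new AtomicInteger();

    PreferenceCache(Function<String, UserPreferences> loader) {
        this.loader = loader;
    }

    UserPreferences get(String userId) {
        // computeIfAbsent invokes the loader only on a cache miss.
        return entries.computeIfAbsent(userId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
    }

    /** Drops a stale entry, for example after the user changes their preferences. */
    void invalidate(String userId) {
        entries.remove(userId);
    }

    /** Number of times the loader was actually called. */
    int loadCount() {
        return loads.get();
    }
}
```

Because users can change their preferences at any time, some form of invalidation (or a short time-to-live) is needed so that cached settings do not go stale.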
One event might result in many notifications being sent out via multiple channels. For the delivery we are introducing a fan-out design, where CatReportService puts the generated notifications on the queues associated with the individual delivery channels. This way we limit the service's scope to routing based on user preferences and do not overload it with channel-specific dispatch and error handling logic.
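The fan-out step could look roughly like the sketch below. It assumes Java 17 or later and uses in-memory lists as stand-ins for the per-channel queues; CatAddedEvent, ChannelNotification, FanOutRouter and the /cats/ link format are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical event emitted when a cat is added. */
record CatAddedEvent(String catId, String ownerId) {}

/** Hypothetical notification addressed to one user on one channel. */
record ChannelNotification(String userId, String channel, String message) {}

class FanOutRouter {
    // Channel name -> in-memory stand-in for that channel's output queue.
    private final Map<String, List<ChannelNotification>> queues = new HashMap<>();

    /** Creates one notification per follower per enabled channel and enqueues it. */
    int route(CatAddedEvent event, Map<String, Set<String>> channelsByFollower) {
        int routed = 0;
        for (Map.Entry<String, Set<String>> follower : channelsByFollower.entrySet()) {
            for (String channel : follower.getValue()) {
                String message = "A new cat was added by " + event.ownerId()
                        + ": /cats/" + event.catId();
                queues.computeIfAbsent(channel, c -> new ArrayList<>())
                      .add(new ChannelNotification(follower.getKey(), channel, message));
                routed++;
            }
        }
        return routed;
    }

    /** Returns the notifications waiting on a channel's queue (empty if none). */
    List<ChannelNotification> queueFor(String channel) {
        return queues.getOrDefault(channel, List.of());
    }
}
```

The router knows only channel names. Everything specific to a delivery method stays behind the queue, which is what keeps adding a new channel cheap.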
The final step of the flow is the actual delivery of the generated notifications, implemented by MailService and PushNotificationService. They encapsulate the implementation details of the concrete delivery method and take care of the additional concerns of reliable delivery including error handling, retries, throttling, backoff policies and dead letter queues.
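As an illustration of the retry, backoff and dead letter concerns, here is a minimal sketch assuming Java 17 or later. RetryingDispatcher, its parameters and the Predicate-based sender are hypothetical; with a managed queue such as Amazon SQS, redelivery and dead letter queues would typically be configured on the queue rather than hand-coded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RetryingDispatcher {
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final List<String> deadLetters = new ArrayList<>(); // stand-in for a dead letter queue

    RetryingDispatcher(int maxAttempts, long baseDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    /** Exponential backoff: delay after the given failed attempt (1-based) is base * 2^(attempt - 1). */
    long backoffMillis(int attempt) {
        return baseDelayMillis << (attempt - 1);
    }

    /** Returns true if the sender succeeded, false once the message was dead-lettered. */
    boolean dispatch(String message, Predicate<String> sender) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sender.test(message)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis(attempt));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                    break;
                }
            }
        }
        deadLetters.add(message);
        return false;
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

For example, `new RetryingDispatcher(3, 200)` would wait 200 ms and then 400 ms between its three attempts before giving up on a message.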
Further advantages of separating the delivery services are reusability for other types of notifications and scalability per delivery channel. High-demand delivery services can be scaled up and down independently and automatically, based on metrics such as the number of items waiting in the input queue.
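A queue-depth scaling rule can be as simple as the following sketch. The class name, the target backlog per instance and the instance bounds are assumptions made for illustration; in practice this decision is usually delegated to the cloud provider's autoscaling policies.

```java
class QueueDepthScaler {
    /**
     * Desired instance count: enough instances that each handles at most
     * targetBacklogPerInstance waiting items, clamped to [min, max].
     */
    static int desiredInstances(long queueDepth, int targetBacklogPerInstance, int min, int max) {
        // Ceiling division without floating point.
        long needed = (queueDepth + targetBacklogPerInstance - 1) / targetBacklogPerInstance;
        return (int) Math.max(min, Math.min(max, needed));
    }
}
```

With a target of 100 waiting items per instance and bounds of 1 to 10 instances, a backlog of 250 items yields 3 instances, while an empty queue falls back to the minimum of 1.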
Service prototype
As part of the design effort we also delivered a proof of concept of the cat notification solution using Amazon SQS and Spring Boot services. Our first target was to implement a prototype of the CatReportService and its service boundaries, so that we could validate the performance of the core of the system, including personalized notification generation and routing to the output delivery queues.

We decided to implement the service on the principles of clean architecture, separating the external frameworks and infrastructure from the core business logic and applying the dependency inversion rule to ensure that the business domain does not rely on the implementation details of the I/O interfaces.
Following this structure proved to be very convenient from the perspective of testability and portability. The core of the service could be easily run locally with mock or fake implementations of the infrastructure code like the SQS client library.
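The dependency inversion described above might be sketched as follows, assuming Java 17 or later. The port, the use case and the fake are hypothetical names and not the prototype's actual classes; a production adapter implementing the same port would wrap the SQS client and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Port owned by the core domain; it knows nothing about SQS or Spring. */
interface NotificationQueuePort {
    void publish(String channel, String payload);
}

/** Core use case that depends only on the port, never on infrastructure code. */
class NotifyFollowersUseCase {
    private final NotificationQueuePort queue;

    NotifyFollowersUseCase(NotificationQueuePort queue) {
        this.queue = queue;
    }

    void notifyByEmail(String catName, List<String> followerIds) {
        for (String followerId : followerIds) {
            queue.publish("email", followerId + ": meet " + catName);
        }
    }
}

/** In-memory fake that replaces the queue adapter in local runs and tests. */
class InMemoryQueueFake implements NotificationQueuePort {
    final List<String> published = new ArrayList<>();

    @Override
    public void publish(String channel, String payload) {
        published.add(channel + "|" + payload);
    }
}
```

Running the use case against the fake needs no cloud resources: `new NotifyFollowersUseCase(new InMemoryQueueFake()).notifyByEmail("Tom", List.of("alice"))` records a single entry in the fake's `published` list.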

Running the prototype in the cloud helped us to quickly validate our ideas about the individual service and the overall system design.
Having a clear separation of concerns between dispatch and actual delivery allowed us to test the performance and scalability of the CatReportService prototype and to fine-tune the input queues and cache parameters without implementing the notification delivery subsystem itself. End-to-end system tests will be next, once the MailService and PushNotificationService prototypes are also ready for validation.

Conclusion
In this short exercise we demonstrated how AnimalFarm designed its scalable notification delivery system that supports all cat enthusiasts worldwide. A sneak peek into the service prototype showed us an example of clean architecture design and its benefits. We are looking forward to sharing more after the first production rollout 😸.