Dec 1, 2013


An auto-syncing noBackend cloud storage app using only AWS infrastructure

Disclaimer: This is an experiment. This article assumes that you are familiar with Amazon S3, Amazon Simple Notification Service (SNS) and Amazon Simple Queue Service (SQS).

Goal

Develop a cloud storage app without a backend server!

Design

I did not want to write my own storage server, but rather use one of the several options out there, such as Amazon’s S3, Microsoft’s SkyDrive or Google’s Drive. After some digging, I found several applications for Amazon S3 that keep a client in sync with an S3 bucket by running a periodic job that checks whether files have changed and updates them accordingly. This is a typical synchronous, polling-based application, similar to rsync. This is no fun! I wanted to develop a system in which clients notify each other about file changes and update files in an async manner. This requires a server to mediate between them.

Looking deeper, I realized that I don’t need a server. All I need is a notification system. Amazon Simple Notification Service to the rescue! SNS offers a simple API to publish messages, and subscribed clients get notified when a message is published. One of the drawbacks of SNS is that it does not retain messages: once a message has been delivered, it is gone. This means a client that misses a notification goes out of sync. One way to solve this is to create a new topic for every client, so that each client can update at its own pace. However, this means each client has to know about every other client and publish a message to every client’s SNS topic on every change. This is not scalable, and it could still leave clients out of sync.

Then I came across Amazon Simple Queue Service. Amazon SQS is a simple message queue service. Using a simple API call, an SQS queue can be subscribed to an SNS topic, so that every message published to the topic is forwarded to the queue. This leads to a better design: we can have one common SNS topic that all clients publish to, and each client can have its own SQS queue. When a client publishes a message, the message is forwarded to all the queues, and each client can read from its queue as and when it wants.
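To make the fan-out idea concrete, here is a toy in-memory model of it. It does not touch AWS at all: `Topic`, `Client` and `bucket` are made-up stand-ins for the SNS topic, a sync client with its SQS queue, and the S3 bucket, and the event format is an assumption for illustration only.

```python
from collections import deque

bucket = {}  # toy stand-in for the S3 bucket: path -> content


class Topic:
    """Toy stand-in for an SNS topic: forwards each message to every subscribed queue."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        for queue in self.queues:
            queue.append(message)


class Client:
    """A sync client with its own queue (toy stand-in for an SQS queue)."""

    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.queue = deque()
        self.files = {}
        topic.subscribe(self.queue)

    def change_file(self, path, content):
        # Store the file, then tell everyone that the path changed.
        self.files[path] = content
        bucket[path] = content
        self.topic.publish({"source": self.name, "path": path})

    def sync(self):
        """Drain the queue at this client's own pace, skipping its own events."""
        while self.queue:
            event = self.queue.popleft()
            if event["source"] != self.name:
                self.files[event["path"]] = bucket[event["path"]]


topic = Topic()
laptop = Client("laptop", topic)
phone = Client("phone", topic)
laptop.change_file("notes.txt", "v1")
laptop.change_file("notes.txt", "v2")
phone.sync()  # the phone catches up whenever it likes
```

Note that the publisher only knows about the topic, never about the other clients, which is the property the per-client-topic approach lacked.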

At a high level, the design roughly looks like this:

When a new client comes on board, the following steps are performed:
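The exact steps are not spelled out in the text, so what follows is only a plausible sketch based on the design described earlier: create a queue for the client, allow the shared topic to send to it, and subscribe the queue to the topic. It assumes `sns` and `sqs` are boto3 clients (for example `boto3.client("sns")`), which is not necessarily the library the original code uses. The `sync-<client_id>` queue name and the `onboard_client` helper are made up for illustration.

```python
import json


def onboard_client(sns, sqs, topic_arn, client_id):
    """Create a queue for one client and subscribe it to the shared topic.

    `sns` and `sqs` are assumed to be boto3 clients; they are passed in so
    the function can also be exercised with stand-in objects.
    """
    # Hypothetical naming scheme: one queue per client.
    queue_url = sqs.create_queue(QueueName=f"sync-{client_id}")["QueueUrl"]
    attrs = sqs.get_queue_attributes(QueueUrl=queue_url, AttributeNames=["QueueArn"])
    queue_arn = attrs["Attributes"]["QueueArn"]

    # Allow the shared topic (and only that topic) to send messages to this queue.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

    # From now on, everything published to the topic is forwarded to the queue.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    return queue_url
```

A brand-new client would presumably also need an initial full download from the bucket, since its queue only sees events published after the subscription exists.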

When an event occurs (any CRUD operation), the sequence looks something like this:
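Again, the precise sequence is not given in the text, so this is a hedged sketch of one way it could go: the client that made the change publishes a small JSON event to the shared topic, and every client polls its own queue, applies the events that came from others, and deletes each message afterwards. `sns` and `sqs` are assumed to be boto3 clients. The event fields (`source`, `op`, `key`) and the `apply_event` callback are hypothetical. The sketch also assumes the default SNS-to-SQS delivery, where the payload arrives wrapped in a JSON envelope under `Message` (raw message delivery turned off).

```python
import json


def publish_event(sns, topic_arn, client_id, operation, key):
    """Announce one change (for example an upload or a delete) to the shared topic."""
    event = {"source": client_id, "op": operation, "key": key}
    sns.publish(TopicArn=topic_arn, Message=json.dumps(event))


def poll_events(sqs, queue_url, client_id, apply_event):
    """Read one batch from this client's queue and apply the changes made by others."""
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for message in response.get("Messages", []):
        envelope = json.loads(message["Body"])   # envelope added by SNS
        event = json.loads(envelope["Message"])  # the event we published
        if event["source"] != client_id:         # ignore our own changes
            apply_event(event)
        # Delete only after handling, so a crash mid-way leaves the message queued.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

Here `apply_event` would be whatever downloads or removes the local copy of `event["key"]`. A client would call `poll_events` in a loop, and the long-poll wait keeps that loop from hammering SQS.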

This is a super high-level article, and I left out many of the details. Some of the open issues and possible enhancements are:

  • SQS retains messages for 14 days at most, so a client that stays offline longer than that misses events
  • Too chatty! An event is sent for every CRUD operation
  • Events could be coalesced (time-based?) to minimize updates
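The time-based coalescing idea from the list above could look roughly like the sketch below. It is not part of the existing code. The `coalesce` function and the `(timestamp, key, op)` tuple format are made up for illustration, and the policy used (the latest operation on a key wins within a fixed window that starts at the first event for that key) is just one of several reasonable choices.

```python
def coalesce(events, window_seconds):
    """Collapse events on the same key that fall inside one time window.

    `events` is a list of (timestamp, key, op) tuples sorted by timestamp.
    A window opens at the first event for a key; later events on that key
    within `window_seconds` of the window start replace its operation.
    """
    coalesced = []
    open_batch = {}  # key -> index in `coalesced` of the key's most recent batch
    for timestamp, key, op in events:
        index = open_batch.get(key)
        if index is not None and timestamp - coalesced[index][0] <= window_seconds:
            # Same window: keep the window's start time, remember the latest op.
            coalesced[index] = (coalesced[index][0], key, op)
        else:
            # First event for this key, or the previous window has expired.
            open_batch[key] = len(coalesced)
            coalesced.append((timestamp, key, op))
    return coalesced


events = [
    (0.0, "notes.txt", "create"),
    (1.0, "notes.txt", "update"),
    (2.0, "photo.jpg", "create"),
    (3.5, "notes.txt", "update"),
    (30.0, "notes.txt", "delete"),
]
print(coalesce(events, window_seconds=5))
```

With a 5-second window, the three early events on notes.txt collapse into one, while the much later delete stays separate. A smarter version might drop a create that is followed by a delete within the same window, but that is left out here.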

Code available here. Let me know if you have questions.
