Take a moment to look back at what we’ve achieved so far. We started with a simple web application, with an infrastructure that was ready to fail. We took this app and used AWS services like Elastic Load Balancing and Auto Scaling to make it resilient, with certain components scaling out (and in) with traffic demand. We built an infrastructure (without any upfront investment) that is capable of handling huge traffic at relatively low operational cost. Quite a feat!
Scaling beyond this amount of traffic requires more drastic changes in the application. We need to decouple, and for that we’ll use Amazon SimpleDB, Amazon Simple Notification Service (SNS), and Amazon Simple Queue Service (SQS), together with S3, which we have already seen in action. But these services are more versatile than just allowing us to scale. We can use the decoupling principle in other scenarios, either because we already have distinct components or because we can easily add functionality. In this chapter, we’ll present many different use cases. Some are already in production; others are planned or dreamed of. The examples are meant to show what you can do with these services and to help you develop your own, using code samples in various languages. We have chosen real-world applications that we work with daily. The languages are Java, PHP, and Ruby, and the examples should be enough to get you going in other languages with available libraries.

In the SQS Developer Guide, you can read that “Amazon SQS is a distributed queue system that enables web service applications to quickly and reliably queue messages that one component in the application generates to be consumed by another component. A queue is a temporary repository for messages that are awaiting processing.” And that’s basically all it is! You can have many writers hitting a queue at the same time. SQS does its best to preserve order, but its distributed nature makes a guarantee impossible. If you really need to preserve order, you can add your own sequencing identifier to the queued messages, but approximate order is probably enough to work with in most cases. A trade-off like this is necessary in massively scalable services like SQS, and it is not very different from the eventual consistency seen in S3 and (as we will show soon) in SimpleDB. You can also have many readers, and SQS guarantees each message is delivered at least once.
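Because each message can be delivered more than once, a consumer should be written to tolerate duplicates. Here is a minimal sketch of one way to do that in Ruby; the class and method names are our own invention, not part of any SQS library, and a real application would persist the seen IDs somewhere durable:

```ruby
require 'set'

# Hypothetical consumer-side guard: because SQS delivers each message
# *at least* once, the same message can arrive twice. Remembering the
# IDs of messages we have already handled makes processing idempotent.
class IdempotentConsumer
  def initialize
    @seen = Set.new
  end

  # Returns true if the message was processed, false for a duplicate.
  def process(message_id)
    return false unless @seen.add?(message_id) # add? is nil if present
    yield if block_given?                      # do the real work here
    true
  end
end

consumer = IdempotentConsumer.new
consumer.process('msg-1') { puts 'resizing image' } # => true, work runs
consumer.process('msg-1') { puts 'resizing image' } # => false, skipped
```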
Reading a message is atomic: locks are used to keep multiple readers from processing the same message. But in such a distributed system you can’t assume a message that has been read will actually be processed, so SQS does not delete it right away; instead, it makes the message invisible. This invisibility has an expiration, called the visibility timeout, which defaults to 30 seconds. If this is not enough, you can change it on the queue or per message, although the recommended way is to use different queues for different visibility timeouts. After a message has been processed successfully, the reader must delete it explicitly.

You can have as many queues as you want, but leaving them inactive is a violation of intended use. We couldn’t figure out what the penalties are, but the principle of cloud computing is to minimize waste. Message size is variable, with a maximum of 64 KB. If you need to work with larger objects, the obvious place to store them is S3; in our examples, we use this combination as well. One last important thing to remember is that messages are not retained indefinitely: they are deleted after four days by default, but you can have your queue retain them for a maximum duration of two weeks.

We’ll show a number of interesting applications of SQS. For Kulitzer, we want more flexibility in image processing, so we’ve decided to decouple the web application from the image processing. For Marvia, we want to implement delayed PDF processing: users can choose to have their PDFs processed later, at a cheaper rate. And finally, we’ll use Decaf to have our phone monitor our queues and notify us when they are out of bounds.

Example 1: Offloading Image Processing for Kulitzer (Ruby)

Remember how we handle image processing with Kulitzer? We basically have the web server spawn a background job for asynchronous processing, so the web server (and the user) can continue their business. The idea is good, and it works quite well.
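To make the visibility mechanics concrete, here is a toy in-memory queue in Ruby. This is our own sketch of the semantics, not the SQS API: receiving a message hides it for the visibility timeout, and if it is not deleted in time it becomes visible to readers again.

```ruby
# Toy model of SQS visibility-timeout behavior (illustration only).
class ToyQueue
  Message = Struct.new(:id, :body, :invisible_until)

  def initialize(visibility_timeout: 30)
    @visibility_timeout = visibility_timeout
    @messages = []
  end

  def send_message(id, body)
    @messages << Message.new(id, body, Time.at(0)) # visible right away
  end

  # Return the first visible message and hide it for the timeout.
  def receive_message(now: Time.now)
    msg = @messages.find { |m| m.invisible_until <= now }
    msg.invisible_until = now + @visibility_timeout if msg
    msg
  end

  # Processing succeeded: remove the message explicitly.
  def delete_message(id)
    @messages.reject! { |m| m.id == id }
  end
end

q = ToyQueue.new(visibility_timeout: 30)
q.send_message('m1', 'resize photo 42')
t0 = Time.now
first  = q.receive_message(now: t0)       # m1, now invisible
hidden = q.receive_message(now: t0 + 10)  # nil: still within the timeout
again  = q.receive_message(now: t0 + 31)  # m1 again: it was not deleted
q.delete_message('m1')                    # done, so delete it
```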
But there are emerging conversations within the team about adding certain features to Kulitzer that are not easily implemented in the current infrastructure. Two ideas floating around are RAW images and video. For both these formats we face the same problem: there is no ready-made solution that caters to our needs, and we expect to have to build our service on top of several available free (and less free) solutions. Even though these features have not yet been requested and aren’t on any road map, we feel we need to offer the flexibility for this innovation.

For the postprocessing of images (and video) to be more flexible, we need to separate this component from the web server. The idea is to implement a postprocessing component that picks up jobs from an SQS queue as they become available. The web server will handle the user upload, move the file to S3, and add a message to the queue to be processed. The images that are not yet available (thumbnails, watermarked versions, etc.) will be replaced by a “being processed” image (served with an expiration header in the past). As soon as the images are available, the user will see them. Figure 4-1 shows how this could be implemented, using a separate EC2 instance for image processing in case scalability becomes a concern. The SQS image queue and the image processing EC2 instance are what this change introduces.

Figure 4-1. Offloading image processing

We already have the basics in our application; we just need to separate them. We will move the copying of the image to S3 out of the background jobs, because if something goes wrong we need to be able to notify the user immediately. The user waits until the image has been uploaded, so he can be notified on the spot if something went wrong (wrong file or image type, etc.). This simplifies the app, making it easier to maintain.
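The message the web server puts on the queue only needs to tell the worker where to find the upload and what to produce. A sketch of such a job description follows; the bucket name, key layout, and field names are invented for illustration, and the real application would define its own schema:

```ruby
require 'json'
require 'time'

# Build the job description the web server could enqueue once the
# upload lands on S3. All names here are hypothetical.
def image_job(photo_id, styles)
  {
    'source'    => "https://s3-eu-west-1.amazonaws.com/example-bucket/uploads/#{photo_id}",
    'styles'    => styles, # e.g. ["thumbnail", "watermarked"]
    'queued_at' => Time.now.utc.iso8601
  }.to_json
end

message = image_job(223, %w[thumbnail watermarked])
# The worker parses the message, reads job['source'], renders each
# requested style, and writes the results back to S3.
job = JSON.parse(message)
```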
If the upload to S3 was successful, we add an entry to the image queue.

In the Marvia PDF-processing example, the job description points to a template, its assets, and the location where the result should be written. Using the AWS SDK for PHP, adding such a job to the high-priority queue looks like this:

```php
<?php
require_once 'sdk.class.php';

$queue_name = 'high-priority-jobs';

# describe the job: where the template, assets, and result live on S3
$job_description = array(
    'template' => 'https://s3-eu-west-1.amazonaws.com/production/templ_1.xml',
    'assets'   => 'https://s3-eu-west-1.amazonaws.com/production/assets/223',
    'result'   => 'https://s3-eu-west-1.amazonaws.com/production/pdfs/223');
$body = json_encode($job_description);

$sqs = new AmazonSQS();
$sqs->set_region($sqs::REGION_EU_W1);

# create_queue is idempotent: it returns the queue if it already exists
$high_priority_jobs_queue = $sqs->create_queue($queue_name);
$high_priority_jobs_queue->isOK()
    or die('could not create queue high-priority-jobs');

# add the message to the queue
$response = $sqs->send_message(
    $high_priority_jobs_queue->body->QueueUrl(0), $body);

pr($response->body);

function pr($var) {
    print '<pre>';
    print_r($var);
    print '</pre>';
}
```

The consuming side is the mirror image: the worker receives a message, processes it, and then explicitly deletes it using the receipt handle:

```php
<?php
require_once 'sdk.class.php';

$queue_name = 'high-priority-jobs';

$sqs = new AmazonSQS();
$sqs->set_region($sqs::REGION_EU_W1);

$queue = $sqs->create_queue($queue_name);
$queue->isOK() or die('could not create queue ' . $queue_name);

$receive_response = $sqs->receive_message($queue->body->QueueUrl(0));

# process the message...

# delete the message, identified by its receipt handle
$delete_response = $sqs->delete_message(
    $queue->body->QueueUrl(0),
    (string) $receive_response->body->ReceiptHandle(0));

$body = json_decode($receive_response->body->Body(0));
pr($body);

function pr($var) {
    print '<pre>';
    print_r($var);
    print '</pre>';
}
```

We are going to add a simple SQS browser to Decaf. It shows the queues in a region, and you can see the state of a queue by inspecting its attributes. The attributes we are interested in are ApproximateNumberOfMessages and ApproximateNumberOfMessagesNotVisible. We already have all the mechanics in place to monitor certain aspects of your infrastructure automatically; we just need to add the appropriate calls to watch the queues.

You can change the visibility timeout, policy, maximum message size, and message retention period by invoking the SetQueueAttributes action.
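The attribute names SetQueueAttributes expects are plain strings. A small Ruby sketch that assembles such an attribute map, enforcing the two-week retention ceiling described above (the helper itself is ours, not part of any SDK):

```ruby
# Maximum retention SQS allows: two weeks, expressed in seconds.
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60

# Build the key/value map for a SetQueueAttributes call. The attribute
# names are the real SQS ones; the defaults match the ones described
# above (30-second visibility timeout, 4-day retention).
def queue_attributes(visibility_timeout: 30,
                     retention: 4 * 24 * 60 * 60)
  if retention > MAX_RETENTION_SECONDS
    raise ArgumentError, 'retention period beyond the two-week maximum'
  end
  {
    'VisibilityTimeout'      => visibility_timeout.to_s,
    'MessageRetentionPeriod' => retention.to_s
  }
end

queue_attributes(retention: 14 * 24 * 60 * 60) # fine: exactly two weeks
```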