Ingest Millions of Messages in the Queue using Serverless

While working on one of our serverless projects, we came across a problem: we needed to ingest millions of messages from an external API into an Azure Storage Queue. We changed our approach several times before reaching a solution that worked for us, and I believe it can help anyone who’s struggling with the same problem.

We started with an approach where a serverless function, which we call the Ingestion Function, was executed through an HTTP endpoint. The Ingestion Function would then call an external API to get the messages and try to ingest them into the Queue.

Here is what our initial architecture looked like:

As the architecture diagram shows, the single ingestion function is a bottleneck, as it can only scale up to specific limits. Also, we weren’t utilizing the real strength of serverless functions, which is many small invocations running in parallel. Therefore, we decided to spread the ingestion work across multiple invocations so that we could get as much scalability as needed.

The idea was to divide the total number of messages into multiple chunks (of 5000 in our case) and then pass those chunks to the ingestion function so that it could finally ingest the messages into the Queue.

We created another serverless function, which we call the Chunking Function, to divide the messages into chunks using this helper function:

def chunking_of_list(data):
    """Return the list split into chunks of 5000 items."""
    return [data[x:x+5000] for x in range(0, len(data), 5000)]
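
As a quick check of the helper above, here is a minimal sketch with the chunk size made a parameter. The `chunk_list` name, the `chunk_size` argument, and the sample IDs are assumptions for illustration, not part of our original function:

```python
def chunk_list(data, chunk_size=5000):
    """Split data into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# 12,000 hypothetical message IDs: the last chunk holds the remainder
message_ids = list(range(12000))
chunks = chunk_list(message_ids)
chunk_sizes = [len(chunk) for chunk in chunks]
```

The last chunk is simply shorter than the others, so no message is dropped when the total is not a multiple of the chunk size.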

We then uploaded each chunk as a separate file to Azure Blob Storage using this code:

import logging

# storage and constant are assumed to be project-local helper modules

def upload_chunks(idLists):
    st = storage.Storage(container_name=constant.CHUNK_CONTAINER)

    # Upload one blob per chunk, named file0, file1, ...
    for index, singleList in enumerate(idLists):
        logging.info('Uploading file{0} to the {1} blob storage'.format(index, constant.CHUNK_CONTAINER))
        st.upload_chunk('file' + str(index), singleList)

Finally, we set up the ingestion function to listen for Azure Blob Storage file upload events. As soon as a file is uploaded, the ingestion function downloads it, reads it, and ingests the messages into the Queue. As desired, we now have multiple invocations of the ingestion function working in parallel, and therefore we achieved scalability.
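
The core of such an ingestion function might look like the sketch below. It is not our production code: it assumes each chunk file holds a JSON array, and `ingest_chunk` and `send_message` are hypothetical names. In Azure, `send_message` could be the `send_message` method of a `QueueClient` from the `azure-storage-queue` package; here a plain list stands in for the queue so the sketch runs locally.

```python
import json


def ingest_chunk(blob_bytes, send_message):
    """Parse one uploaded chunk file and enqueue every message in it.

    blob_bytes   -- raw content of the chunk file (assumed: a JSON array)
    send_message -- callable that enqueues a single string
    Returns the number of messages enqueued.
    """
    messages = json.loads(blob_bytes)
    for message in messages:
        # Queue messages are text, so serialize anything that is not a string
        text = message if isinstance(message, str) else json.dumps(message)
        send_message(text)
    return len(messages)


# Local dry run: a plain list stands in for the queue
fake_queue = []
count = ingest_chunk(b'[101, 102, {"id": 103}]', fake_queue.append)
```

Because every invocation only touches its own chunk file, the invocations do not need to coordinate with each other.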

Here’s what the final architecture looked like:

We essentially followed a fan-out architecture, where we fan out our workload to multiple Serverless invocations instead of one.

Peace ✌

My Talk on Serverless Architecture

A few days back, I had a chance to give a talk on Serverless Architecture, a topic I’m passionate about and have been working on for some time now.

It was my second time speaking in public, but unlike before, most of the audience was from different software companies. The event was organized and managed by the Code Movement.

I love and promote OSS (open-source software) and have been open-sourcing serverless work, especially on AWS Lambda, for the last 2 years. That was the main motivation behind giving this talk.

The venue we chose was Datum Square Hall at Software Technology Park 3, Islamabad, and the talk took place on 18th January.

We covered the following topics in detail:

  • Journey to Server: Why there was a need for serverless in the first place.
  • What is Serverless: How serverless works under the hood.
  • Why Serverless: Major benefits of using serverless.
  • Serverless Frameworks: Development of serverless functions using different serverless frameworks.
  • Serverless in Production: Some production use cases for serverless.
  • Demo: A demo using the AWS console and API Gateway, and another demo using the Serverless Framework.

Check the slides here.

Moments to Remember

Looking back at my experience, I’m very thankful to have taken the opportunity to share my knowledge with the people around me. I felt nervous at first, but those feelings went away after some time and everything went smoothly afterward.

Peace

Persist Private IP in AWS Auto Scaling Group

Recently, we were moving our web application, which was on a single EC2 instance, to a highly available and fault-tolerant architecture. For that, we decided to pre-bake an AMI, launch it within an Auto Scaling group, and attach it to the target group behind an Elastic Load Balancer. We had another server which was in the same VPC but not in the target group behind the ELB. The requirement was to access our web application (the single EC2 instance we launched using the ASG and attached to the ELB) privately within the VPC due to HIPAA regulatory compliance.

In other words, we needed a way to persist a private IP on the EC2 instance across scale-in and scale-out events. We did some research and came up with this solution: create an instance-launch lifecycle hook, capture the event using an AWS CloudWatch Events rule, and use AWS Lambda as the target to attach a secondary ENI to the newly launched EC2 instance in the Auto Scaling group. Because the ENI keeps its private IP address, the application stays reachable at the same address even when the instance is replaced.

Thanks to the AWS Knowledge Center, we were able to modify their solution as per our needs.

But How Does It Work? 🤔

We needed a way to give the secondary ENI’s description to the Lambda function so that it could attach that specific ENI to the newly launched instance. Therefore, we decided to read the ENI description from an EC2 instance tag. For that, we created a launch configuration for the Auto Scaling group with a tag named Eth1 and gave it a value that is the description of the secondary ENI.
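
The tag lookup and attachment described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from our repository: the function names are hypothetical, and `ec2` is expected to be a boto3 EC2 client (for example `boto3.client("ec2")`) passed in by the caller.

```python
def find_eni_description(ec2, instance_id, tag_key="Eth1"):
    """Return the value of the instance's Eth1 tag, or None if it is absent."""
    response = ec2.describe_tags(
        Filters=[
            {"Name": "resource-id", "Values": [instance_id]},
            {"Name": "key", "Values": [tag_key]},
        ]
    )
    tags = response.get("Tags", [])
    return tags[0]["Value"] if tags else None


def attach_eni_by_description(ec2, instance_id, description):
    """Attach the first available ENI with this description as device index 1."""
    response = ec2.describe_network_interfaces(
        Filters=[
            {"Name": "description", "Values": [description]},
            {"Name": "status", "Values": ["available"]},
        ]
    )
    interfaces = response.get("NetworkInterfaces", [])
    if not interfaces:
        return None  # nothing to attach; the caller decides how to react
    eni_id = interfaces[0]["NetworkInterfaceId"]
    ec2.attach_network_interface(
        NetworkInterfaceId=eni_id, InstanceId=instance_id, DeviceIndex=1
    )
    return eni_id
```

Filtering on the `available` status avoids trying to attach an ENI that is still bound to an instance that is shutting down.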

1. Create a lifecycle hook on the EC2 instance launch event.

Lifecycle hook

2. Create an AWS CloudWatch rule with a Lambda function as the target.

Event pattern with Lambda as a Target
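
For reference, an event pattern for such a rule and the step that releases the lifecycle hook might look like the sketch below. The `complete_launch` helper is a hypothetical name, and `autoscaling` is assumed to be a boto3 Auto Scaling client (for example `boto3.client("autoscaling")`) supplied by the caller.

```python
import json

# Event pattern for the rule: match launch lifecycle actions from Auto Scaling.
# It can be narrowed with "detail": {"AutoScalingGroupName": [...]} if needed.
EVENT_PATTERN = {
    "source": ["aws.autoscaling"],
    "detail-type": ["EC2 Instance-launch Lifecycle Action"],
}


def complete_launch(autoscaling, event, result="CONTINUE"):
    """Tell Auto Scaling the hook is done so the instance can go into service."""
    detail = event["detail"]
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult=result,
    )
    return detail["EC2InstanceId"]


# The rule expects the pattern as a JSON string
pattern_json = json.dumps(EVENT_PATTERN)
```

Passing `ABANDON` instead of `CONTINUE` as the result asks Auto Scaling to terminate the instance, which can be useful when the ENI could not be attached.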

Here is the code for the Lambda function:

https://github.com/MrHassanMurtaza/aws-attach-secondary-eni-lambda

Feel free to contribute and change as per your needs.


My Talk on GitHub and Open Source

A couple of months ago, I was approached by a student from the Computer Science Society at COMSATS University, asking me to give a talk on GitHub. At first, I thought there was no way I was going to speak in front of people. I had never talked to a group of people for more than a couple of minutes. After much thought, I decided to take the opportunity. As an alumnus of COMSATS, it was the least I could do for the students.

The venue we chose was the Seminar Hall at the COMSATS Electrical Engineering Department, and the talk took place on 23rd April 2019.

Deep down inside, as an open source developer, I wanted to inspire students to start using GitHub and contribute to the open source community. I wanted to teach students what I didn’t know during my junior and sophomore years. I told them GitHub is the best portfolio they can have in 2019.

These were the topics that we covered in detail:

  • Why Git?
  • What is Git?
  • Git Clone
  • Git Add
  • Git Commit
  • Git Status
  • Git Push
  • Git Pull
  • Merge Conflicts
  • Git Log
  • Git Reset
  • Branching
  • Git Merge
  • Pull Requests

In the end, we did a small activity in which students contributed to my GitHub repository by creating pull requests, which I merged later on.

Moments To Remember

Looking back on my experience, I’m very thankful to have taken the opportunity to speak. I learned a lot while preparing the talk and am looking forward to more such opportunities. If you ask me whether I was nervous: hell, yes. I barely slept that night due to stress. But again, doing something for the very first time means getting out of your comfort zone. And this is where you do wonders.

Thanks to Wajahat Ali Abid for helping me out during the preparation and the talk itself. And, special thanks to Hadia Jalil, who herself is a very good iOS developer, for all the support and for helping me make this event a successful one.

Peace ✌


👋Hey There!

I’ve been trying to write this post for some time now but never got a chance. It’s always hard to get started on writing, especially if you’re doing it for the first time, but let’s do this.

👦INTRODUCTION

In case you haven’t noticed yet, I’m Hassan Murtaza. I graduated from COMSATS University, Islamabad last year (2018). Currently, I’m a DevOps Engineer at OneByteLLC. I credit myself for being hardworking, resilient, goal-oriented and focused. I love spending time with my family, and my friends mean a lot to me.

⚡AWS CLOUD ENGINEER

Nowadays, most of my time is spent learning and building automation workflows on AWS Cloud. Building CI/CD pipelines using AWS code services such as CodePipeline and automating infrastructure using CloudFormation are my favorite recipes. Doing automation in the cloud using serverless architecture (Lambda) fascinates me a lot. In short, right now I’m managing a cloud architecture with a large number of resources such as EC2, RDS, SNS, Lambda, etc., and I’m looking forward to growing from here.

🔥DEVOPS ENGINEER

Because of my interest in CI/CD pipelines, I’ve gone above and beyond and learned tools such as Terraform for infrastructure provisioning and Jenkins for automating the software delivery process. I have set up an Elastic Stack to monitor the logs from a fleet of EC2 instances. I work closely with a team of developers and promote DevOps culture.

🎩OPEN SORCERER (pun intended :D)

The whole concept behind open source intrigues me a lot: how you can help people around the world and make a difference through your knowledge. Special thanks to Ahmad Awais for being a huge motivation for me when it comes to open sourcing new and awesome stuff 💯. I regularly open source stunning stuff on serverless architecture and AWS services. Feel free to follow me here: MrHassanMurtaza

🏃‍♂️HIKER

I believe there are three things about your life that you should always prioritize: health, diet, and exercise.
I love hiking. I do it every once in a while to take some time out of my busy routine and refocus on my goals and progress. Other than that, I go running every morning to keep myself energetic for the whole day.

✉️GET IN TOUCH

Get in touch with me on LinkedIn, Twitter, Facebook, Instagram, or shoot me an email.