Designing with Simplicity in Mind

Before jumping into development, the first and most important step was to deeply understand the requirements. The goal wasn’t just to make it work — but to design the simplest possible solution that is reliable, scalable, and easy to maintain.

The initial requirements were clear:

Ingest a list of the top 250 movies from an S3 source
Filter the top 10 entries
Enrich each movie with metadata from an external API
Store the final, enriched data back in S3

Core Architecture: The Minimal Viable Flow

To meet these goals effectively, I designed a minimal yet complete serverless data pipeline, composed of two core components:

1. `GetMoviesAndSendToQueue` (Lambda Function)

This function is responsible for:

Reading the original dataset from a public S3 bucket
Filtering the top 10 movies
Sending each movie as a message to an Amazon SQS queue

This approach ensures asynchronous processing and decouples ingestion from enrichment.

2. `EnrichAndStoreMovie` (Lambda Function)

Triggered by each message in the queue, this function:

Calls the OMDb API using the IMDb ID
Enriches the movie data with additional details
Stores the final, enriched movie JSON object into a designated S3 bucket

Key Benefits of the Core Design

Simplicity: Just two Lambda functions and an SQS queue
Scalability: Fully event-driven and asynchronous
Extendable: Serves as the foundation for more advanced features (e.g., medallion layers, scheduled batches, BI)

Designing with Simplicity in Mind ​

Core Architecture: The Minimal Viable Flow ​

1. GetMoviesAndSendToQueue (Lambda Function) ​

2. EnrichAndStoreMovie (Lambda Function) ​

Key Benefits of the Core Design ​

Designing with Simplicity in Mind

Core Architecture: The Minimal Viable Flow

1. `GetMoviesAndSendToQueue` (Lambda Function)

2. `EnrichAndStoreMovie` (Lambda Function)

Key Benefits of the Core Design