Overview: IMDb Serveless ETL

Project Context

The goal is to build a serverless pipeline that fetches movie data, enriches it via an external API, and stores the final results. The initial data source is a list of the top 250 movies by IMDb rating, from which we focus on extracting and processing only the top 10 movies.

The project is implemented in Python, leveraging AWS Free Tier services to ensure cost-effectiveness and scalability.

Pipeline Architecture

Technology Stack

Language: Python
AWS Lambda: Serverless compute functions
Amazon S3: Storage for raw and enriched data
Amazon SQS: Message queue for orchestration
AWS SDK (Boto3): Python SDK for AWS integration
External API: OMDb API for data enrichment

Essential Requirements

Function 1 — Data Retrieval and Filtering (`GetMoviesAndSendToQueue`)

Read the JSON file containing the top 250 movies from the S3 bucket:
https://top-movies.s3.eu-central-1.amazonaws.com/Top250Movies.json
Filter and extract the top 10 movies.
Send a message containing these movies to an SQS queue to trigger the next step.

Function 2 — Enrichment and Storage (`EnrichAndStoreMovie`)

Triggered by messages from the SQS queue (one message per movie).
For each movie, call the OMDb API endpoint:
https://www.omdbapi.com/?apikey=[your_key]&i=[IMDb_ID]
Enrich the movie object with the returned API data.
Store the enriched movie JSON object in an S3 bucket (to be created).

Desired (Bonus) Requirements

Schedule the pipeline to run once daily (e.g., using AWS EventBridge).
Define the entire infrastructure as code (IaC), e.g., using AWS SAM.
Securely manage the OMDb API key (e.g., with AWS Secrets Manager).
Provide clear documentation for the project.

What's Next?

Now that we understand the purpose, context, and core capabilities of the project, it's time to dive deeper into how everything works under the hood.

In the next section, we'll explore the full architecture behind the pipeline — including the serverless components, data flow, and key design decisions that power this solution.

Continue to Architecture →

Overview: IMDb Serveless ETL ​

Project Context ​

Technology Stack ​

Essential Requirements ​

Function 1 — Data Retrieval and Filtering (GetMoviesAndSendToQueue) ​

Function 2 — Enrichment and Storage (EnrichAndStoreMovie) ​

Desired (Bonus) Requirements ​

What's Next? ​

References ​