Lambda Function: ProcessSilverToGold
The ProcessSilverToGold function is the final step in the ETL pipeline. It transforms the normalized Silver Layer dataset into analytical outputs in the Gold Layer, enabling rich insights and data exploration.
This function is responsible for:
- Reading structured CSV data from the Silver bucket.
- Performing a series of analytical transformations using pandas.
- Storing insights and aggregated results as separate CSV files in the Gold bucket.
How It Works
- Reads environment variables defined by the SAM template:
  - S3_BUCKET_SOURCE
  - S3_BUCKET_TARGET
  - MAX_RETRIES
  - BASE_DELAY_SECONDS
- Triggered when the normalized dataset is created under the Silver bucket.
- Loads the file silver/movies_normalized.csv using the S3Service.
- Executes the SilverToGoldProcessor, which performs the following transformations to generate analytical datasets:
  - Top 10 Movies by IMDb Rating
  - Movies by Genre
  - Movies by Country
  - Movies per Year
  - Total Box Office Revenue per Year
  - Top 5 Directors by Movie Count
- Saves each output as a CSV file in the Gold bucket under the gold/ prefix.
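The six transformations listed above could be expressed with pandas roughly as follows. This is a minimal sketch, not the actual SilverToGoldProcessor: the function name build_gold_tables, the output names, and every column name (title, imdb_rating, genre, country, year, box_office, director) are assumptions about the Silver schema, and it assumes one genre and one country per row.

```python
from typing import Dict

import pandas as pd


def build_gold_tables(df: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Return one aggregated DataFrame per Gold output (column names are assumed)."""

    def count_by(column: str) -> pd.DataFrame:
        # Count movies per distinct value, most frequent first; ties sorted by name.
        return (
            df.groupby(column)
            .size()
            .reset_index(name="movie_count")
            .sort_values(["movie_count", column], ascending=[False, True])
            .reset_index(drop=True)
        )

    return {
        # Ten highest-rated movies, best first.
        "top_10_by_imdb_rating": df.nlargest(10, "imdb_rating")[
            ["title", "imdb_rating"]
        ].reset_index(drop=True),
        "movies_by_genre": count_by("genre"),
        "movies_by_country": count_by("country"),
        # Kept in chronological order rather than sorted by count.
        "movies_per_year": df.groupby("year").size().reset_index(name="movie_count"),
        "box_office_per_year": df.groupby("year", as_index=False)["box_office"].sum(),
        "top_5_directors": count_by("director").head(5),
    }
```

Each entry in the returned dictionary corresponds to one CSV file that would be written under the gold/ prefix, for example with DataFrame.to_csv(index=False).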
Defined by AWS SAM
This Lambda function is defined using the AWS Serverless Application Model (SAM) in the template.yaml file.
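Because the template supplies the function's configuration through environment variables, the handler can read them once at start-up. The sketch below shows one way to do that. The PipelineConfig class and load_config function are hypothetical names, and the numeric types chosen for MAX_RETRIES and BASE_DELAY_SECONDS are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    source_bucket: str
    target_bucket: str
    max_retries: int
    base_delay_seconds: float


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build the configuration from environment variables.

    A missing variable raises KeyError, so a misconfigured deployment
    fails immediately instead of running with guessed defaults.
    """
    return PipelineConfig(
        source_bucket=env["S3_BUCKET_SOURCE"],
        target_bucket=env["S3_BUCKET_TARGET"],
        max_retries=int(env["MAX_RETRIES"]),  # assumed to be an integer
        base_delay_seconds=float(env["BASE_DELAY_SECONDS"]),  # assumed to be seconds
    )
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.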
AWS IAM Role
The function uses a scoped IAM role that follows the principle of least privilege, allowing:
- Read access to the Silver bucket (silver/*)
- Write access to the Gold bucket (gold/*)
Permissions granted include:
- Amazon S3
  - s3:ListBucket on the Silver bucket
  - s3:GetObject to silver/* on the Silver bucket
  - s3:PutObject to gold/* on the Gold bucket
For full provisioning details, see the Infrastructure as Code (IaC) page.
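For illustration only, the three permissions can be written out as an IAM policy document. The sketch below builds one as a Python dictionary. The build_policy function and the bucket names passed to it are hypothetical, and the real role is provisioned by the SAM template rather than by code like this.

```python
from typing import Any, Dict


def build_policy(silver_bucket: str, gold_bucket: str) -> Dict[str, Any]:
    """Return a least-privilege policy document for the Silver-to-Gold step."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself, not on its objects.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{silver_bucket}",
            },
            {
                # Read only the objects under the silver/ prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{silver_bucket}/silver/*",
            },
            {
                # Write only under the gold/ prefix.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{gold_bucket}/gold/*",
            },
        ],
    }
```

Note that s3:ListBucket targets the bucket ARN, while s3:GetObject and s3:PutObject target object ARNs under a prefix.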
Trigger
This function is triggered by Amazon S3 events whenever a file is created in the Silver bucket by the ProcessBronzeToSilver Lambda function.
This event-driven behavior chains the pipeline stages automatically: once the normalized data is written to the Silver layer, this function is invoked to generate the analytical Gold layer outputs, with no manual intervention required.
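An S3-triggered handler receives the bucket name and object key inside the event payload. The sketch below shows how they might be extracted. The helper name extract_s3_objects is hypothetical and is not taken from the project's source. Object keys arrive URL-encoded in S3 event notifications, so the sketch decodes them before use.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the notification (a space arrives as "+").
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects
```

A handler could call this helper first and then process only the keys it expects, such as silver/movies_normalized.csv.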
Source Code
You can view and explore the full implementation here: