Amazon Health Services
Project: Lifestyle Image Tool (LIT)
Roles: Lead Design Technologist
Year: 2025
Description: Concept and development of an AI application for the generation and modification of lifestyle images used in large-scale healthcare marketing experiments.
Problem Statement
Marketing managers for Amazon Health Services (AHS) needed to conduct high-volume, data-driven marketing experiments quickly. Their workflow, from creative brief through production and creative review, took two to four weeks. Optimizing it required a simple, low-friction way for marketing managers to produce and modify high-quality lifestyle images.
Ideation and Design
The ideation and design process started with talking to the end users and understanding their goals. We considered leveraging existing design tools from Figma, Adobe and others, but that approach required training, time and security approvals, and was ultimately less cost-effective. Once the stakeholders and I aligned on a prototype, I worked backwards from their goals and requirements to determine the technical stack. I began documenting the workflows, gathering use cases and drafting engineering designs.
Goal 1: Produce multiple lifestyle images that match the style and tone of our studio and stock photography
Idea: Use text-to-image and text-plus-image-to-image prompts to generate images from multiple instructed models in parallel
For creative image generation we look for variety in generative models, just like a brand team would look for variety in photographers. Some models respond to certain prompts better than others. By showing six or more model responses we enable the type of choice expected in the creative production process.
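A minimal sketch of that fan-out, assuming Bedrock-hosted models and illustrative model IDs (the prototype also ran open-source Qwen and Flux models behind their own endpoints, and request body shapes vary per model family):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative model IDs, not the production list.
MODEL_IDS = [
    "amazon.titan-image-generator-v2:0",
    "stability.stable-image-core-v1:0",
    # ...four or more additional models
]

def generate(model_id: str, prompt: str) -> dict:
    """Invoke one image model (Titan-style body shown; real code branched per family)."""
    body = {"taskType": "TEXT_IMAGE", "textToImageParams": {"text": prompt}}
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return {"model": model_id, "result": json.loads(response["body"].read())}

def fan_out(prompt: str) -> list[dict]:
    """Call every model in parallel so total latency tracks the slowest model."""
    with ThreadPoolExecutor(max_workers=len(MODEL_IDS)) as pool:
        return list(pool.map(lambda m: generate(m, prompt), MODEL_IDS))
```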
Goal 2: Enable modification of existing lifestyle images that are already pre-approved for marketing
Idea: Manual and automatic image inpainting from structured prompts with brand guidelines as context
Some existing images were well suited for background replacement. By leveraging the primary and secondary brand color palettes, we could multiply the number of new images available for experimentation.
High-quality image modification often requires detailed system instructions behind the scenes. Because end-users may not have the knowledge or ability to do this, our prototype could abstract this process and present a range of subtle variants for them to choose from.
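A hedged sketch of that abstraction; the palette values and instruction template below are illustrative stand-ins for the real brand guidelines:

```python
# Illustrative palette; the real values came from AHS brand guidelines.
SECONDARY_PALETTE = {"mist": "#E8F1F2", "sage": "#9CBFA7", "sand": "#E4D5C3"}

# Hidden system instruction the end-user never sees or edits.
INSTRUCTION = (
    "Replace only the background of this approved lifestyle photo with a "
    "smooth studio sweep in the exact color {hex} ({name}). Preserve the "
    "subjects, skin tones, lighting direction and soft natural shadows. "
    "Add no text, logos or props."
)

def build_variant_prompts(palette: dict[str, str]) -> list[dict]:
    """Expand one click into a per-color batch of inpainting instructions."""
    return [
        {"name": name, "hex": hex_value,
         "instruction": INSTRUCTION.format(hex=hex_value, name=name)}
        for name, hex_value in palette.items()
    ]
```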
Goal 3: Optimize the production and review process for AI-generated lifestyle images
Idea 1: Generate quality, brand and policy compliance scores based on real-time LLM evaluations to speed up reviews
Idea 2: Include basic editing tools for end-users who lacked training or access to design programs
Each generated image is evaluated by an LLM using a managed prompt, with quality, brand and policy guidelines from a knowledge base as context. The response includes a score for each criterion, an overall score and a short reasoning summary to help end users identify appropriate images. These scores also help accelerate the human review.
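A minimal sketch of that judging step, assuming the Bedrock Converse API; the criteria names and JSON contract here are my stand-ins for the managed prompt:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

RUBRIC = (
    "Score the attached marketing image from 1-5 on each criterion: quality, "
    "brand and policy compliance, using the supplied guidelines as ground "
    'truth. Reply with JSON only: {"quality": n, "brand": n, "policy": n, '
    '"overall": n, "reasoning": "<one sentence>"}.'
)

def judge_image(image_bytes: bytes, guidelines: str, model_id: str) -> dict:
    """One structured LLM-as-judge evaluation for a generated image."""
    response = bedrock.converse(
        modelId=model_id,
        system=[{"text": guidelines}],  # knowledge-base guidelines as context
        messages=[{
            "role": "user",
            "content": [
                {"text": RUBRIC},
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            ],
        }],
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)  # scores and reasoning surface next to the image
```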
Some ad formats run on Amazon channels require specific image crops, rounded corners and gradient overlays. During ideation we determined that offering these editing features within the application would reduce production time by up to four days.
While these edit features do not require AI, they resolved a pain point by making small design tasks easier for marketing managers without design training.
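A Pillow sketch along these lines would cover the rounded corners and gradient overlay (the radius and opacity defaults are illustrative):

```python
from PIL import Image, ImageDraw

def round_corners(im: Image.Image, radius: int = 24) -> Image.Image:
    """Apply the rounded-corner mask some Amazon ad placements require."""
    mask = Image.new("L", im.size, 0)
    ImageDraw.Draw(mask).rounded_rectangle([(0, 0), im.size], radius, fill=255)
    out = im.convert("RGBA")
    out.putalpha(mask)
    return out

def gradient_overlay(im: Image.Image, max_alpha: int = 160) -> Image.Image:
    """Darken toward the bottom edge so overlaid ad copy stays legible."""
    base = im.convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    for y in range(base.height):
        alpha = int(max_alpha * y / base.height)  # linear vertical ramp
        draw.line([(0, y), (base.width, y)], fill=(0, 0, 0, alpha))
    return Image.alpha_composite(base, overlay)
```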
Research and Development
Real-time image evaluation and scoring in the application was inspired by recent research. Specifically, I leveraged the "LLM as Judge" concept to develop the algorithms and workflows that derive the evaluation scores. I also took inspiration from a research presentation by Pierre Boyeau, where he discussed how to evaluate the performance of human-algorithm systems.
This work provides statistical evidence that a human-algorithm judgment system can be more accurate than human review alone. An additional part of the workflow, not shown, uses Amazon SageMaker Ground Truth to collect human evaluations of generated images.
The high-level architectural diagram above shows an engineering pattern with an AWS Lambda serverless function for each generative model. This pattern enables scalable, parallel processing with fire-and-forget calls and front-end polling. The Lambdas store images on S3 and write inference responses to a DynamoDB table, which triggers the evaluations. The tech stack is React.js for the front-end with Material 3 as the design system for the UI. APIs via Amazon API Gateway connect the front-end to the back-end, which uses open-source models from Qwen and Flux as well as models on Amazon Bedrock.
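A condensed sketch of one per-model Lambda in this pattern; the bucket, table, field names and the run_model helper are assumptions for illustration:

```python
import base64
import time

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("lit-inference-results")  # assumed name

def run_model(model_id: str, prompt: str) -> str:
    """Stand-in for the per-model inference call; returns base64 image data."""
    raise NotImplementedError

def handler(event, context):
    """Fire-and-forget worker: generate, store to S3, record to DynamoDB.

    The DynamoDB write (via a stream) kicks off the evaluation, and the
    front-end polls the table until every model's job completes.
    """
    job_id, model_id, prompt = event["jobId"], event["modelId"], event["prompt"]
    image_bytes = base64.b64decode(run_model(model_id, prompt))
    key = f"generated/{job_id}/{model_id}.png"
    s3.put_object(Bucket="lit-generated-images", Key=key, Body=image_bytes)
    table.put_item(Item={
        "jobId": job_id,
        "modelId": model_id,
        "s3Key": key,
        "status": "GENERATED",  # the evaluation step later flips this to SCORED
        "createdAt": int(time.time()),
    })
    return {"jobId": job_id, "modelId": model_id}
```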
The real power of AI applications is how one or more structured prompts can be triggered at the click of a button. In the example below, we trigger prompts that automatically generate a range of background scene variations. All the end-user has to do is upload an image, select drop-down menu items and generate, without writing a prompt.
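As a sketch of how those menu selections might expand into hidden prompts (the option values and scene descriptions are illustrative assumptions):

```python
# Illustrative menu options; the UI exposes drop-downs, never raw prompts.
SCENES = {
    "kitchen": "a bright modern kitchen with soft morning light",
    "park": "a sunlit neighborhood park with out-of-focus greenery",
    "home office": "a tidy home office with warm indirect light",
}
VARIATIONS = ["wide shot", "shallow depth of field",
              "golden hour tones", "cool overcast tones"]

def prompts_for_selection(scene_key: str, count: int = 4) -> list[str]:
    """One click expands into several ready-to-run background prompts."""
    scene = SCENES[scene_key]
    return [
        f"Replace the background with {scene}, {variation}; keep the "
        f"subject, pose and shadows unchanged."
        for variation in VARIATIONS[:count]
    ]
```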
Prototype and Results
The video below shows the application generating images from six different models based on one user prompt. As each image is generated, an evaluation process is triggered, represented by the scores; where scores are still pending, the evaluation hasn't completed yet.
The video below shows the application generating new images with different background colors from a source image. The prompts that change the background to the various hex values are abstracted from the user. All the user has to do is pick from the drop-down and click the generate-variants button.
Methodology
This project required rapid prototyping so I engineered the back-end and relied on an AI coding assistant for the front-end.
The cloud architecture was configured to be plug-and-play so that different GenAI models could be swapped in and out based on capability and availability, as sketched below. The engineering was solid enough to scale the prototype from five to fifty end-users with ease.
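The plug-and-play behavior can be sketched as a small adapter registry, where each adapter hides one model family's request and response quirks (the keys and function names are illustrative):

```python
from typing import Callable

ModelFn = Callable[[str], bytes]  # prompt in, image bytes out
REGISTRY: dict[str, ModelFn] = {}

def register(name: str) -> Callable[[ModelFn], ModelFn]:
    """Adding or swapping a model touches the registry, not the callers."""
    def wrap(fn: ModelFn) -> ModelFn:
        REGISTRY[name] = fn
        return fn
    return wrap

@register("titan-v2")  # illustrative key
def titan_adapter(prompt: str) -> bytes:
    raise NotImplementedError("wrap the Bedrock Titan call here")

@register("flux-dev")  # illustrative key
def flux_adapter(prompt: str) -> bytes:
    raise NotImplementedError("wrap the Flux endpoint call here")
```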
With React.js and Material Design 3, my Cline coding assistant, powered by Claude Sonnet 4.x, was able to quickly build and test the UI from my UX instructions.
My AI management methodology was trust but verify, with the assistant writing extensive tests and documentation.
Technical Challenges
Creating short videos from static images was high on the stakeholder wish list. The availability and scalability of high-quality, approved models was the one technical challenge that remained unsolved when the prototype launched. As a mitigation, we planned a small data collection to fine-tune an internal model.


Inpainting models that matched our exact hex color values were often poor at preserving shadows, and the models that preserved shadows well often missed our hex values. I ended up with an ensemble approach that combined the strengths of the respective models.
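One illustrative way to sketch such an ensemble (a stand-in for the actual blending logic, assuming one output with accurate color, one with good shadows, and matching dimensions): take luminance from the shadow-preserving result and chroma from the color-accurate result in YCbCr space.

```python
from PIL import Image

def ensemble_composite(color_accurate: Image.Image,
                       shadow_accurate: Image.Image) -> Image.Image:
    """Merge two same-size inpainting outputs: luminance from the
    shadow-preserving model, chroma from the hex-accurate model."""
    y, _, _ = shadow_accurate.convert("YCbCr").split()   # keeps soft shadows
    _, cb, cr = color_accurate.convert("YCbCr").split()  # keeps the exact hue
    return Image.merge("YCbCr", (y, cb, cr)).convert("RGB")
```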


Learnings
- One model isn't enough. It takes two or more for most creative tasks.
- Latency matters in AI applications. Parallelization and polling patterns help keep it low.
- Human end-users appreciate seeing the context and reasoning behind generative outputs, especially for creative tasks.
- Patience pays off. Model performance improves so fast that it has to be a factor in planning.