Amazon Health Services
Project: Lifestyle Image Tool (LIT)
Roles: Lead Design Technologist
Year: 2025
Description: Concept and development of an AI application for the generation and modification of lifestyle images used in large-scale healthcare marketing experiments.
Problem Statement
Marketing managers for Amazon Health Services (AHS) needed to conduct high-volume, data-driven marketing experiments quickly. Their workflow, from creative brief through production and creative review, took two to four weeks. Optimizing it required a simple, low-friction way for marketing managers to produce and modify high-quality lifestyle images.
Ideation and Design
The ideation and design process started with talking to the end users and understanding their goals. We considered leveraging existing design tools from Figma, Adobe and others, but that approach required training, time and security approvals, and was ultimately less cost-effective. Once the stakeholders and I aligned on a prototype, I worked backwards from their goals and requirements to determine the technical stack. I began documenting the workflows, gathering use cases and drafting engineering designs.
Goal 1: Produce multiple lifestyle images that match the style and tone of our studio and stock photography
Idea: Use text-to-image and text-plus-image-to-image prompts to generate images from multiple instructed models in parallel
For creative image generation we look for variety in generative models, just like a brand team would look for variety in photographers. Some models respond to certain prompts better than others. By showing six or more model responses we enable the type of choice expected in the creative production process.
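A minimal sketch of that fan-out, assuming Bedrock-hosted models and illustrative model IDs (the prototype also ran open-source Qwen and Flux models behind their own endpoints, and request body shapes vary per model family):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative model IDs, not the production list.
MODEL_IDS = [
    "amazon.titan-image-generator-v2:0",
    "stability.stable-image-core-v1:0",
    # ...four or more additional models
]

def generate(model_id: str, prompt: str) -> dict:
    """Invoke one image model (Titan-style body shown; real code branched per family)."""
    body = {"taskType": "TEXT_IMAGE", "textToImageParams": {"text": prompt}}
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return {"model": model_id, "result": json.loads(response["body"].read())}

def fan_out(prompt: str) -> list[dict]:
    """Call every model in parallel so total latency tracks the slowest model."""
    with ThreadPoolExecutor(max_workers=len(MODEL_IDS)) as pool:
        return list(pool.map(lambda m: generate(m, prompt), MODEL_IDS))
```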
Goal 2: Enable modification of existing lifestyle images that are already pre-approved for marketing
Idea: Manual and automatic image inpainting from structured prompts with brand guidelines as context
Some existing images were well suited for background replacement. By leveraging the primary and secondary brand color palettes, we could multiply the number of new images available for experimentation.
High-quality image modification often requires detailed system instructions behind the scenes. Because end-users may not have the knowledge or ability to do this, our prototype could abstract this process and present a range of subtle variants for them to choose from.
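A hedged sketch of that abstraction; the palette values and instruction template below are illustrative stand-ins for the real brand guidelines:

```python
# Illustrative palette; the real values came from AHS brand guidelines.
SECONDARY_PALETTE = {"mist": "#E8F1F2", "sage": "#9CBFA7", "sand": "#E4D5C3"}

# Hidden system instruction the end-user never sees or edits.
INSTRUCTION = (
    "Replace only the background of this approved lifestyle photo with a "
    "smooth studio sweep in the exact color {hex} ({name}). Preserve the "
    "subjects, skin tones, lighting direction and soft natural shadows. "
    "Add no text, logos or props."
)

def build_variant_prompts(palette: dict[str, str]) -> list[dict]:
    """Expand one click into a per-color batch of inpainting instructions."""
    return [
        {"name": name, "hex": hex_value,
         "instruction": INSTRUCTION.format(hex=hex_value, name=name)}
        for name, hex_value in palette.items()
    ]
```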
Goal 3: Optimize the production and review process for AI-generated lifestyle images
Idea 1: Generate quality, brand and policy compliance scores based on real-time LLM evaluations to speed up reviews
Idea 2: Include basic editing tools for end-users who lacked training or access to design programs
Each generated image is evaluated by an LLM using a managed prompt, with quality, brand and policy guidelines from a knowledge base as context. The response includes a score for each criterion, an overall score and a short reasoning summary to help end users identify appropriate images. These scores also help accelerate the human review.
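A minimal sketch of that judging step, assuming the Bedrock Converse API; the criteria names and JSON contract here are my stand-ins for the managed prompt:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

RUBRIC = (
    "Score the attached marketing image from 1-5 on each criterion: quality, "
    "brand and policy compliance, using the supplied guidelines as ground "
    'truth. Reply with JSON only: {"quality": n, "brand": n, "policy": n, '
    '"overall": n, "reasoning": "<one sentence>"}.'
)

def judge_image(image_bytes: bytes, guidelines: str, model_id: str) -> dict:
    """One structured LLM-as-judge evaluation for a generated image."""
    response = bedrock.converse(
        modelId=model_id,
        system=[{"text": guidelines}],  # knowledge-base guidelines as context
        messages=[{
            "role": "user",
            "content": [
                {"text": RUBRIC},
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            ],
        }],
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)  # scores and reasoning surface next to the image
```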
Some ad formats run on Amazon channels require specific image crops, rounded corners and gradient overlays. During ideation we determined that offering these editing features within the application would reduce production time by up to four days.
While these edit features do not require AI, they resolved a pain point by making small design tasks easier for marketing managers without design training.
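A Pillow sketch along these lines would cover the rounded corners and gradient overlay (the radius and opacity defaults are illustrative):

```python
from PIL import Image, ImageDraw

def round_corners(im: Image.Image, radius: int = 24) -> Image.Image:
    """Apply the rounded-corner mask some Amazon ad placements require."""
    mask = Image.new("L", im.size, 0)
    ImageDraw.Draw(mask).rounded_rectangle([(0, 0), im.size], radius, fill=255)
    out = im.convert("RGBA")
    out.putalpha(mask)
    return out

def gradient_overlay(im: Image.Image, max_alpha: int = 160) -> Image.Image:
    """Darken toward the bottom edge so overlaid ad copy stays legible."""
    base = im.convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    for y in range(base.height):
        alpha = int(max_alpha * y / base.height)  # linear vertical ramp
        draw.line([(0, y), (base.width, y)], fill=(0, 0, 0, alpha))
    return Image.alpha_composite(base, overlay)
```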
Research and Development
Real-time image evaluation and scoring in the application was inspired by recent research. Specifically, I leveraged the "LLM as Judge" concept to develop the algorithms and workflows that derive the evaluation scores. I also took inspiration from a research presentation by Pierre Boyeau, where he discussed how to evaluate the performance of human-algorithm systems.
This work provides statistical evidence that a human-algorithm judgment system can be more accurate than human review alone. An additional part of the workflow, not shown, uses Amazon SageMaker Ground Truth to collect human evaluations of generated images.
The high-level architectural diagram above shows an engineering pattern with an AWS Lambda serverless function for each generative model. This pattern enables scalable, parallel processing with fire-and-forget calls and front-end polling. The Lambdas store images on S3 and write inference responses to a DynamoDB table, which triggers the evaluations. The tech stack is React.js for the front-end with Material 3 as the design system for the UI. APIs via Amazon API Gateway connect the front-end to the back-end, which uses open-source models from Qwen and Flux as well as models on Amazon Bedrock.
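A condensed sketch of one per-model Lambda in this pattern; the bucket, table, field names and the run_model helper are assumptions for illustration:

```python
import base64
import time

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("lit-inference-results")  # assumed name

def run_model(model_id: str, prompt: str) -> str:
    """Stand-in for the per-model inference call; returns base64 image data."""
    raise NotImplementedError

def handler(event, context):
    """Fire-and-forget worker: generate, store to S3, record to DynamoDB.

    The DynamoDB write (via a stream) kicks off the evaluation, and the
    front-end polls the table until every model's job completes.
    """
    job_id, model_id, prompt = event["jobId"], event["modelId"], event["prompt"]
    image_bytes = base64.b64decode(run_model(model_id, prompt))
    key = f"generated/{job_id}/{model_id}.png"
    s3.put_object(Bucket="lit-generated-images", Key=key, Body=image_bytes)
    table.put_item(Item={
        "jobId": job_id,
        "modelId": model_id,
        "s3Key": key,
        "status": "GENERATED",  # the evaluation step later flips this to SCORED
        "createdAt": int(time.time()),
    })
    return {"jobId": job_id, "modelId": model_id}
```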
The real power of AI applications is how one or more structured prompts can be triggered at the click of a button. In the example below, we trigger prompts that automatically generate a range of background scene variations. All the end-user has to do is upload an image, select drop-down menu items and generate, without writing a prompt.
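As a sketch of how those menu selections might expand into hidden prompts (the option values and scene descriptions are illustrative assumptions):

```python
# Illustrative menu options; the UI exposes drop-downs, never raw prompts.
SCENES = {
    "kitchen": "a bright modern kitchen with soft morning light",
    "park": "a sunlit neighborhood park with out-of-focus greenery",
    "home office": "a tidy home office with warm indirect light",
}
VARIATIONS = ["wide shot", "shallow depth of field",
              "golden hour tones", "cool overcast tones"]

def prompts_for_selection(scene_key: str, count: int = 4) -> list[str]:
    """One click expands into several ready-to-run background prompts."""
    scene = SCENES[scene_key]
    return [
        f"Replace the background with {scene}, {variation}; keep the "
        f"subject, pose and shadows unchanged."
        for variation in VARIATIONS[:count]
    ]
```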
Prototype and Results
The video below shows the application generating images from six different models based on one user prompt. As each image is generated, an evaluation process is triggered, represented by the scores; where scores are still pending, the evaluation hasn't completed yet.
The video below shows the application generating new images with different background colors from a source image. The prompts that change the background to the various hex values are abstracted from the user. All the user has to do is pick from the drop-down and click the generate-variants button.
Methodology
This project required rapid prototyping so I engineered the back-end and relied on an AI coding assistant for the front-end.
The cloud architecture was configured to be plug-and-play so that different GenAI models could be swapped in and out based on capability and availability, as sketched below. The engineering was solid enough to scale the prototype from five to fifty end-users with ease.
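The plug-and-play behavior can be sketched as a small adapter registry, where each adapter hides one model family's request and response quirks (the keys and function names are illustrative):

```python
from typing import Callable

ModelFn = Callable[[str], bytes]  # prompt in, image bytes out
REGISTRY: dict[str, ModelFn] = {}

def register(name: str) -> Callable[[ModelFn], ModelFn]:
    """Adding or swapping a model touches the registry, not the callers."""
    def wrap(fn: ModelFn) -> ModelFn:
        REGISTRY[name] = fn
        return fn
    return wrap

@register("titan-v2")  # illustrative key
def titan_adapter(prompt: str) -> bytes:
    raise NotImplementedError("wrap the Bedrock Titan call here")

@register("flux-dev")  # illustrative key
def flux_adapter(prompt: str) -> bytes:
    raise NotImplementedError("wrap the Flux endpoint call here")
```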
With React.js and Material Design 3, my Cline coding assistant, powered by Claude Sonnet 4.x, was able to quickly build and test the UI from my UX instructions.
My AI management methodology was trust but verify, with the assistant writing extensive tests and documentation.
Technical Challenges
Creating short videos from static images was high on the stakeholder wish list. The availability and scalability of high-quality, approved models was the one technical challenge that remained unsolved when the prototype launched. As a mitigation, we planned a small data collection to fine-tune an internal model.


Inpainting models that matched our exact hex color values were often poor at preserving shadows, and the models that preserved shadows well often missed our hex values. I ended up with an ensemble approach that combined the strengths of the respective models.
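One illustrative way to sketch such an ensemble (a stand-in for the actual blending logic, assuming one output with accurate color, one with good shadows, and matching dimensions): take luminance from the shadow-preserving result and chroma from the color-accurate result in YCbCr space.

```python
from PIL import Image

def ensemble_composite(color_accurate: Image.Image,
                       shadow_accurate: Image.Image) -> Image.Image:
    """Merge two same-size inpainting outputs: luminance from the
    shadow-preserving model, chroma from the hex-accurate model."""
    y, _, _ = shadow_accurate.convert("YCbCr").split()   # keeps soft shadows
    _, cb, cr = color_accurate.convert("YCbCr").split()  # keeps the exact hue
    return Image.merge("YCbCr", (y, cb, cr)).convert("RGB")
```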


Learnings
- One model isn't enough. It takes two or more for most creative tasks.
- Latency matters in AI applications. Parallelization and polling patterns help keep it low.
- Human end-users appreciate seeing the context and reasoning behind generative outputs, especially for creative tasks.
- Patience pays off. Model performance improves so fast that it has to be a factor in planning.