Amazon Web Services

Product: Responsible AI Programs
Role: Technical Program Manager
Years: 2022-2025
Description: The Responsible AI team supports and evaluates all generative and classic AI services within AWS.

Goals

  • Help improve the performance of AI services with respect to responsible AI dimensions such as fairness, veracity and bias.
  • Build world-class datasets to support evaluations across all model modalities.
  • Earn customer trust with transparency documentation, such as AI Service Cards, produced in partnership with applied scientists.

Technical Programs for Responsible AI

AWS AI SERVICE CARDS

I managed the programs that produced the scientific evaluations and performance metrics that helped customers make informed decisions when choosing AWS AI services. Some of the programs I managed and cards I authored include Amazon Q Business, AWS HealthScribe, and Amazon Titan Premier.

Producing just one service card required coordinating the efforts of engineers and applied scientists over three to four months. The drafted cards were then reviewed with principal scientists, senior science managers, lawyers, and service general managers.


EVALUATION DATASETS

AI systems cannot be properly evaluated without the appropriate datasets. The scientists and I defined these datasets based on hypotheses about error disparities and an independent risk assessment. My role included the design and production of the datasets. Typically, several datasets were in production simultaneously with the help of contracted data vendors.
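To make the idea of an error-disparity hypothesis concrete, here is a minimal sketch of disaggregating a service's error rate across subgroups of an evaluation dataset. The subgroup labels and results below are hypothetical illustrations, not data from the actual evaluations.

```python
# Minimal sketch: disaggregate error rates across subgroups to test an
# error-disparity hypothesis. Subgroup labels and outcomes are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "dialect": ["scottish", "scottish", "welsh", "welsh", "rp", "rp"],
    "correct": [1, 0, 1, 1, 1, 1],
})

# Error rate per subgroup; a large gap between groups supports the
# hypothesis that the service performs unevenly across them.
error_by_group = 1.0 - results.groupby("dialect")["correct"].mean()
print(error_by_group.sort_values(ascending=False))
```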

Coordinating the efforts of scientists, engineers, and data vendors requires an efficient workflow. After two years and dozens of datasets produced, the team needed to move faster. I modeled our existing workflow and designed an optimization that doubled our production.
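The sketch below illustrates, in spirit, what modeling the workflow as a pipeline can reveal. The stage names and weekly capacities are hypothetical, not the team's actual figures.

```python
# Minimal sketch of a pipeline throughput model. Stage names and weekly
# capacities are hypothetical illustrations, not actual team figures.
stages = {
    "annotation_spec": 4,    # datasets the science team can specify per week
    "vendor_collection": 2,  # datasets a data vendor can collect per week
    "qa_review": 3,          # datasets the QA pool can review per week
}

# End-to-end throughput is capped by the slowest stage (the bottleneck).
bottleneck = min(stages, key=stages.get)
print(f"Bottleneck: {bottleneck} at {stages[bottleneck]} datasets/week")

# Adding capacity at the bottleneck (e.g., parallelizing vendors) raises
# overall throughput until the next-slowest stage becomes the limit.
stages[bottleneck] *= 2
print(f"New limit: {min(stages.values())} datasets/week")
```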


MANAGING RESOURCES

Resource management of the technical programs for responsible AI started with coordinating the various evaluation datasets in production. Gantt charts helped me manage those production workstreams against the available science and engineering resources.


In addition to the vendors who sourced data, there were vendors that performed quality analysis (QA) on the data. Selecting and managing these QA vendors was a high priority because it helped our data collections meet a high quality bar. If, for example, we were checking the quality of a British English speech audio collection, we would craft qualification tests across the relevant regional dialects and determine the best QA workers for each dialect. I used Amazon QuickSight dashboards to track these metrics and report them to leadership.
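As a simplified illustration of that qualification step, the best QA workers per dialect can be selected from their qualification-test scores. The worker IDs, dialect names, scores, and thresholds below are hypothetical.

```python
# Minimal sketch: pick the best QA workers per dialect from qualification
# test scores. Worker IDs, dialects, scores, and thresholds are hypothetical.
import pandas as pd

quals = pd.DataFrame({
    "worker":  ["w1", "w2", "w3", "w1", "w2", "w3"],
    "dialect": ["scouse", "scouse", "scouse", "geordie", "geordie", "geordie"],
    "score":   [0.92, 0.71, 0.88, 0.65, 0.90, 0.84],
})

# Keep only workers above a quality bar, then rank within each dialect
# and keep the top two per dialect.
qualified = quals[quals["score"] >= 0.80]
top_per_dialect = (
    qualified.sort_values("score", ascending=False)
             .groupby("dialect")
             .head(2)
)
print(top_per_dialect)
```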

The QA workstreams were actively tracked for worker completion time, performance, consensus, and overall cost. This allowed us to make real-time adjustments when needed and replace underperforming workers.
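A minimal sketch of the consensus tracking, with hypothetical judgments in place of real QA data, might look like this: workers whose labels rarely match the majority label are flagged for replacement or retraining.

```python
# Minimal sketch: flag QA workers whose judgments rarely match the
# majority (consensus) label. Items, workers, and labels are hypothetical.
import pandas as pd

judgments = pd.DataFrame({
    "item":   ["a", "a", "a", "b", "b", "b"],
    "worker": ["w1", "w2", "w3", "w1", "w2", "w3"],
    "label":  ["pass", "pass", "fail", "fail", "fail", "fail"],
})

# The majority label per item serves as the consensus answer.
consensus = judgments.groupby("item")["label"].agg(lambda s: s.mode().iloc[0])
judgments["agrees"] = judgments["label"] == judgments["item"].map(consensus)

# Agreement rate per worker; workers below the threshold get flagged.
agreement = judgments.groupby("worker")["agrees"].mean()
flagged = agreement[agreement < 0.75]
print(flagged)
```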


CUSTOMER IMPACT

Within AWS, a considerable amount of effort was dedicated to ensuring customers could trust our AI services. The AWS Responsible AI webpage experienced high traffic with the launch of each new AI Service Card.

During my time on the team, we received numerous requests for AI Service Cards from AI service product managers and customers. The first AWS AI Service Cards were released in the fall of 2022, before competitors had released any equivalent transparency documentation. To date, AWS has put its customers first by publishing the most comprehensive collection of cards featuring scientifically rigorous performance metrics for key AI services with respect to responsible AI dimensions.

[Image: an AWS community blogger's profile photo and a quote from his post]