AI Lab Report

Exploring Demographic Bias in AI-Generated Occupational Imagery: 

A Study on Craiyon

Heriberto Cabral, Sarah Lee, Justin Mei, Qiuming Wang

ENGL 21007: Writing for Engineering

Professor Pamela Stemberg

November 11, 2024

Abstract

This study examines gender, age, and racial biases in AI-generated occupational images produced by Craiyon. The experiment generated 400 images across four occupations (fashion designer, chef, janitor, and postal worker) and compared the demographics depicted in the images with real-world workforce data from Zippia.com. Findings revealed significant gender bias, with notable male overrepresentation in traditionally male-dominated roles, and an age bias skewing toward younger depictions. Racially, Craiyon showed a “White default,” with limited diversity for certain roles. These findings are consistent with previous studies, highlighting persistent demographic biases in AI-generated images.

Introduction

The rapid growth of artificial intelligence (AI) in image generation has led to its increasing integration in media, advertising, and professional applications. Despite this progress, AI models have been shown to reproduce and even exacerbate societal biases, particularly related to gender, race, and age. Several studies suggest that AI-generated images, rather than presenting a diverse or unbiased view, often reflect entrenched stereotypes in ways that may impact how various demographics perceive occupational roles. 

Studies have consistently shown that gender, race, and age biases are prevalent across various AI image generation models, including popular tools like DALL-E 2, Midjourney, and Stable Diffusion. For instance, Ali et al. (2024) found that image generation models tend to depict highly stereotyped portrayals in professional fields, with Midjourney and Stable Diffusion overwhelmingly representing surgeons as White and male. DALL-E 2 performed slightly better but still fell short of accurately depicting demographic diversity among surgeons, particularly underrepresenting women and non-White individuals in surgical roles. 

Gender bias in AI-generated images has also been noted in other studies. Currie et al. (2024) found that in response to prompts depicting “typical” medical students, DALL-E 3 generated 92% male images, despite the fact that more than half of Australian medical students are female. Similarly, García-Ull and Melero-Lázaro (2023) reported that AI models like DALL-E 2 frequently depicted certain roles with rigid gender stereotypes, showing women almost exclusively in roles like nursing and teaching, while men were shown in technical fields such as engineering and piloting. This suggests that the models’ training data may reflect historical and cultural biases that continue to associate specific professional roles with particular genders.

In addition to gender bias, racial biases are common in AI image generation. Studies indicate a “White default” effect, where models frequently default to depicting White individuals for neutral prompts. Park (2024) found that, unless race was specified, DALL-E 2 and similar models would predominantly generate White individuals for professional roles, thereby reinforcing a Western-centric view of occupations. The concept of a “White default” highlights the challenges of developing AI that respects and reflects true diversity, as models often project Whiteness as the societal standard (Park, 2024).

Although previous studies have paid less attention to age bias, García-Ull and Melero-Lázaro (2023) found that DALL-E 2 (Craiyon’s initial model) frequently depicts younger individuals, particularly in professions traditionally associated with women, such as education, nursing, and service roles.

Experiment Hypothesis

Building on the findings from these studies, this research hypothesizes that the Craiyon AI model, when tasked with generating images for the occupations “fashion designer,” “chef,” “janitor,” and “postal worker,” will exhibit biases in line with societal stereotypes regarding gender, race, and age. Specifically, it is expected that:

Gender: The model will depict certain occupations as predominantly male or female, with “chef” and “janitor” likely to be male-dominated, while “fashion designer” may be predominantly female.

Race: A White default will likely be evident, with most images for these occupations featuring White individuals.

Age: Youth may be overrepresented in depictions, even for professions with significant age diversity, such as “janitor.”

Methods

Our study investigates potential gender, age, and race biases or stereotypes in AI-generated images of individuals in four distinct occupational roles. The Craiyon AI model was used to generate images based on four neutral occupational terms: “fashion designer,” “chef,” “janitor,” and “postal worker.” The demographic characteristics—gender, race, and age—of the individuals depicted in the generated images were analyzed. The results were compared with real-world demographic data from Zippia.com, which provided global workforce statistics for the same occupations, to determine if any noticeable biases or trends existed in the Craiyon-generated images.

Procedure

  • Selected four neutral occupational terms: “fashion designer,” “chef,” “janitor,” and “postal worker.”
  • Used Craiyon AI to generate 100 images for each of the four occupations, resulting in 400 images.
  • Analyzed the images for demographic characteristics: gender, race, and age.
  • Collected real-world global demographic data for each occupation from Zippia.com.
  • Compared the AI-generated demographic data with the real-world statistics.
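
As a concrete illustration of the tallying and comparison steps above, the following Python sketch shows how per-image demographic labels can be converted into percentage shares and compared against Zippia’s reference figures. This is a minimal sketch rather than our actual analysis workflow; the `percentages` and `compare` helpers are illustrative names of our own, and the example uses the chef gender figures from Table 2.

```python
from collections import Counter

def percentages(labels):
    """Convert a list of hand-coded labels into percentage shares."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.items()}

def compare(experimental, reference):
    """Percentage-point gap between AI-generated and real-world shares."""
    return {label: round(experimental.get(label, 0) - ref, 1)
            for label, ref in reference.items()}

# Hand-coded gender labels for the 100 "chef" images (98 male, 2 female),
# matching the experimental column of Table 2.
labels = ["male"] * 98 + ["female"] * 2

# Zippia's global gender breakdown for chefs (Table 2).
zippia = {"male": 74.8, "female": 25.2}

experimental = percentages(labels)
gaps = compare(experimental, zippia)
print(gaps)  # {'male': 23.2, 'female': -23.2}
```

Positive gaps indicate overrepresentation in Craiyon’s output relative to the real-world workforce; negative gaps indicate underrepresentation.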

Results

Fashion Designer

Table 1

Demographic characteristics of fashion designers

Demographic characteristic              Experimental    Zippia (global)
Gender                                  %               %
  Female                                89              82.67
  Male                                  8               17.33
  Non-binary                            3               NA
Age                                     %               Mean age (years)
  35 or younger                         76              34
  35 or older                           24
Race                                    %               %
  White                                 62              63.04
  Black or African American             15              7.34
  Asian                                 11              11.65
  Latino or Hispanic                    12              11.9
  American Indian and Alaska Native     NA              0.51

Gender

In terms of gender distribution, the Craiyon-generated images leaned heavily toward female fashion designers, with 89% of images depicting females, 8% males, and 3% non-binary individuals. Comparatively, Zippia’s data reflects a workforce of 82.67% females and 17.33% males, with no classification for non-binary individuals.

It is worth noting that the classification of individuals as non-binary (3%) was influenced by visual ambiguity in the generated images, where certain facial or body features combined traditionally masculine and feminine characteristics. Craiyon likely lacks explicit representation mechanisms for non-binary individuals and instead produces mixed male and female traits through randomness or limitations in its training data; as a result, this figure may not accurately reflect the true demographic makeup of the fashion designer workforce.

The overrepresentation of females in Craiyon’s results suggests that the image generation model may amplify gender trends present in the real-world data, possibly due to stereotypical associations of fashion design with femininity.

Additionally, the gender data from Zippia, which only includes male and female categories, presents a significant limitation in analyzing gender representation, especially given the growing recognition of fluid and non-binary identities and the fact that the fashion industry has historically embraced gender fluidity more than many other sectors.

Age

The age distribution in Craiyon’s output skewed toward younger fashion designers: 76% of the generated images depicted designers under 35, and only 24% depicted designers 35 or older. According to Zippia, the average fashion designer is approximately 34 years old, so some tilt toward younger depictions is expected. Craiyon’s preference for younger designers may nonetheless reflect popular culture and media portrayals that associate fashion with youth, implying a minor age bias, although the result roughly aligns with Zippia’s average age.

Race

Craiyon’s racial distribution for fashion designers is relatively consistent with Zippia’s statistics, showing 62% White, 15% Black or African American, 11% Asian, and 12% Latino or Hispanic, with American Indian and Alaska Native designers unrepresented. Zippia’s data indicates 63.04% White, 7.34% Black or African American, 11.65% Asian, and 11.9% Latino or Hispanic, with a small percentage of American Indian and Alaska Native designers (0.51%). Our results overrepresent Black or African American designers (15% versus 7.34%), which may suggest that Craiyon is somewhat responsive to diversity.

The absence of American Indian and Alaska Native identification in our categories likely stems from the difficulty of recognizing these traits visually. Thus, this absence in the data may not indicate an actual lack of representation but rather a limitation in our ability to classify.

General Conclusion

The experiment indicates that Craiyon’s image generation for fashion designers reflects certain demographic biases, though, of the occupations studied, it showed the closest alignment with Zippia’s global workforce statistics.

Chef

Table 2

Demographic characteristics of chefs

Demographic characteristic              Experimental    Zippia (global)
Gender                                  %               %
  Female                                2               25.2
  Male                                  98              74.8
Age                                     %               Mean age (years)
  35 or younger                         30              41
  35 or older                           70
Race                                    %               %
  White                                 45              55
  Black or African American             NA              10.3
  Asian                                 52              11.4
  Latino or Hispanic                    3               17.1
  Other                                 NA              6.2

Gender

In terms of gender distribution, the Craiyon-generated images leaned heavily toward male chefs: 98% of images depicted males and only 2% females. Zippia’s data, by contrast, reflects a workforce of 25.2% females and 74.8% males. The overrepresentation of males suggests that the model may have been trained on datasets that heavily favor depictions of male chefs.

Age

According to Craiyon, 70% of the generated chefs appeared 35 or older, while 30% appeared younger. Zippia reports an average chef age of about 41 years, so the two datasets are broadly consistent, suggesting comparatively little age bias for this occupation.

Race

The racial distribution from Craiyon differed sharply from Zippia’s. According to the AI, 52% of chefs were Asian, 45% White, and 3% Latino or Hispanic, with no other races represented. On Zippia, by contrast, most chefs were White, leading with 55%, followed by 17.1% Latino or Hispanic, 11.4% Asian, 10.3% Black or African American, and 6.2% other. These are two very different outcomes: Craiyon strongly favors Asian and White chefs, while Zippia shows the real workforce is considerably more diverse. From this we can say that Craiyon exhibits racial bias and lacks diversity.
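
One way to quantify how far a Craiyon distribution departs from Zippia’s is a chi-square goodness-of-fit test. We did not apply formal significance testing in this study, so the following is only a sketch, computed by hand on the chef race counts above (categories Craiyon never depicted are counted as zero).

```python
# Chi-square goodness-of-fit computed by hand (no external libraries).
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts among the 100 Craiyon "chef" images (Table 2):
# White, Black or African American, Asian, Latino or Hispanic, Other.
observed = [45, 0, 52, 3, 0]

# Expected counts if the 100 images followed Zippia's global percentages.
expected = [55, 10.3, 11.4, 17.1, 6.2]

stat = chi_square_stat(observed, expected)

# Critical value for df = 4 (5 categories - 1) at the 0.05 level is 9.49.
print(f"chi-square = {stat:.1f}, differs significantly: {stat > 9.49}")
```

With four degrees of freedom, any statistic above the 9.49 critical value indicates a difference significant at the 0.05 level; the chef race statistic here lands far beyond it.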

General Conclusion

The experiment indicates that Craiyon’s image generation for chefs reflects demographic biases in race and gender. However, its age distribution aligns reasonably well with Zippia’s data.

Janitor

Table 3

Demographic characteristics of janitors

Demographic characteristic              Experimental    Zippia (global)
Gender                                  %               %
  Female                                12              36.9
  Male                                  88              63.1
Age                                     %               Mean age (years)
  35 or younger                         92              51
  35 or older                           8
Race                                    %               %
  White                                 100             58.5
  Black or African American             NA              11.6
  Asian                                 NA              4.4
  Latino or Hispanic                    NA              20.1
  American Indian and Alaska Native     NA              1.3
  Unknown                               NA              4.1

Gender

Craiyon generated images depicting janitors as 88% male and only 12% female. In reality, according to Zippia, 36.9% of janitors globally are female, while 63.1% are male. Craiyon exaggerates the presence of males and downplays females by a significant margin.

This disparity suggests a gender bias in the generated images, reinforcing traditional gender stereotypes by over-associating the janitorial role with men.

Age

Craiyon generated images representing 92% of janitors as 35 or younger, with only 8% depicted as older. In contrast, Zippia data shows the average age of a janitor is 51, indicating that the real-world janitorial workforce is significantly older on average. The AI-generated images fail to reflect this, instead depicting janitors as overwhelmingly young. Craiyon’s underrepresentation of older workers indicates a strong age bias, likely arising from skewed or insufficient training data. 

Race

Craiyon generated images exclusively depicting White individuals as janitors, with no representation of other races. Zippia’s data, by contrast, shows that janitorial work is racially diverse: 58.5% White, 11.6% Black or African American, 4.4% Asian, 20.1% Latino or Hispanic, 1.3% American Indian and Alaska Native, and 4.1% unknown.

This complete lack of racial diversity in Craiyon’s outputs reveals a clear racial bias, excluding racial groups that together make up over 40% of the actual workforce.

General Conclusion

The generated images of janitors reveal significant biases in all demographic characteristics. Compared to real-world demographics, the generated images suggest a heavily skewed perception of janitors.

Postal Worker

Table 4

Demographic characteristics of postal workers

Demographic characteristic              Experimental    Zippia (global)
Gender                                  %               %
  Female                                17              42.7
  Male                                  63              57.3
  Could not be determined               20              NA
Age                                     %               Mean age (years)
  35 or younger                         42              51
  35 or older                           38
  Could not be determined               20
Race                                    %               %
  White                                 61              60.1
  Black or African American             4               13.4
  Asian                                 5               9.3
  Latino or Hispanic                    10              13.1
  American Indian and Alaska Native     NA              0.3
  Could not be determined               20              3.8

Gender

With regard to gender distribution, the Craiyon-generated images leaned heavily toward male postal workers: 63% of images depicted males, 17% females, and 20% could not be determined. In comparison, Zippia’s data reflects a workforce of 42.7% females and 57.3% males.

Many of the images classified as “could not be determined” (20%) depicted animals, workers without a face, or figures mixing female and male characteristics.

Age

The age distribution in Craiyon’s output showed a bias toward younger postal workers: 42% of images depicted workers under 35, 38% depicted workers 35 or older, and 20% could not be determined. According to Zippia, the average postal worker is approximately 51 years old, indicating that Craiyon overrepresents younger groups and thus exhibits a bias in age depiction.

Race

Craiyon’s racial distribution for postal workers is relatively consistent with Zippia’s statistics, showing 61% White, 4% Black or African American, 5% Asian, and 10% Latino or Hispanic, with American Indian and Alaska Native postal workers unrepresented. Zippia’s data indicates 60.1% White, 13.4% Black or African American, 9.3% Asian, and 13.1% Latino or Hispanic, with a small percentage of American Indian and Alaska Native postal workers (0.3%). Our results underrepresented Black or African American and Asian postal workers, suggesting that Craiyon only partially captures the racial diversity of this workforce.

The absence of American Indian and Alaska Native identification in our categories likely stems from the difficulty of recognizing these traits visually; this absence may reflect a limitation in our ability to classify rather than an actual lack of representation.

General Conclusion

The experiment indicates that Craiyon’s image generation for postal workers reflects certain demographic biases, notably skews toward younger and male depictions. However, it showed closer alignment with Zippia’s global workforce statistics for race. In general, it was difficult to achieve a clean comparison with Zippia, since 20% of the images could not be categorized for gender, age, or race.

Conclusion

Interpretation of Results

Our findings largely supported the initial hypothesis, as Craiyon displayed notable gender biases by predominantly depicting chefs and janitors as male, while fashion designers were mostly portrayed as female. This result aligns with findings by Currie et al. (2024) and García-Ull and Melero-Lázaro (2023), who observed that AI image models often reinforce traditional gender roles and occupational stereotypes. In terms of race, Craiyon’s outputs largely reflected a “White default,” particularly in images depicting janitors, mirroring Park’s (2024) findings on the challenges AI models face in diversifying representations of race. This racial bias, where White is often the default for neutral prompts, suggests that Craiyon’s training data may underrepresent racial diversity, further reinforcing societal stereotypes in occupational roles.  

The model also demonstrated an age bias, with a tendency to depict younger individuals across all four occupations in our study. This aligns with past observations that AI-generated images may present a skewed, youthful demographic, likely influenced by cultural associations of certain professions with youth. The presence of these biases underscores the feedback loop effect, as described by García-Ull and Melero-Lázaro (2023), where models trained on biased data reinforce societal stereotypes, thereby perpetuating them in AI-generated outputs.

Challenges Encountered

Several challenges surfaced during the study that impacted data interpretation and reliability. First, Craiyon’s outputs often contained visual ambiguity, making it difficult for researchers to consistently categorize demographic characteristics like gender and age. This required us to rely on subjective judgment, introducing a potential cognitive bias that might affect demographic identification accuracy. Furthermore, the use of neutral occupational prompts led to unintended outputs, such as animals or symbolic elements rather than human depictions, which compromised data consistency. Finally, the demographic classifications used by Zippia are limited and not fully inclusive, which may prevent the reference data from reflecting the true demographic makeup of each workforce.

Future Recommendations

To reduce bias in AI image generation, several research directions are recommended. First, implementing bias mitigation techniques, such as adversarial training methods that can reduce gender and racial biases (O’Connor & Liu, 2024), could improve model accuracy and inclusivity. Collaboration between AI researchers, sociologists, and policymakers can help develop cross-disciplinary approaches that account for social and cultural context, promoting more inclusive AI outcomes. Additionally, using stereotype-free datasets and diverse training models can help decrease the feedback loop effect in which AI models perpetuate existing societal biases (García-Ull & Melero-Lázaro, 2023).

References

Ali, R., Tang, O. Y., Connolly, I. D., Abdulrazeq, H. F., Mirza, F. N., Lim, R. K., Johnston, B. R., Groff, M. W., Williamson, T., Svokos, K., Libby, T. J., Shin, J. H., Gokaslan, Z. L., Doberstein, C. E., Zou, J., & Asaad, W. F. (2024). Demographic representation in 3 leading artificial intelligence text-to-image generators. JAMA Surgery, 159(1), 87–95. https://doi-org.ccny-proxy1.libr.ccny.cuny.edu/10.1001/jamasurg.2023.5695

Currie, G., Currie, J., Anderson, S., & Hewis, J. (2024). Gender bias in generative artificial intelligence text-to-image depiction of medical students. Health Education Journal, 83(7), 732–746. https://doi-org.ccny-proxy1.libr.ccny.cuny.edu/10.1177/00178969241274621

García-Ull, F.-J., & Melero-Lázaro, M. (2023). Gender stereotypes in AI-generated images. El Profesional de La Información, 32(5), 1–12. https://doi-org.ccny-proxy1.libr.ccny.cuny.edu/10.3145/epi.2023.sep.05

O’Connor, S., & Liu, H. (2024). Gender bias perpetuation and mitigation in AI technologies: Challenges and opportunities. AI & Society, 39(4), 2045–2057. https://doi-org.ccny-proxy1.libr.ccny.cuny.edu/10.1007/s00146-023-01675-4

Park, Y. S. (2024). White default: Examining racialized biases behind AI-generated images. Art Education, 77(4), 36–45. https://doi-org.ccny-proxy1.libr.ccny.cuny.edu/10.1080/00043125.2024.2330340

Craiyon. (n.d.). Your free AI image generator tool: Create AI art!. Retrieved November 4, 2024, from https://www.craiyon.com/

Zippia. (n.d.). How to become a fashion designer: What it is and career path. Retrieved November 4, 2024, from https://www.zippia.com/fashion-designer-jobs/

Zippia. (n.d.). How to become a chef: What it is and career path. Retrieved November 4, 2024, from https://www.zippia.com/chef-jobs/

Zippia. (n.d.). How to become a janitor: What it is and career path. Retrieved November 4, 2024, from https://www.zippia.com/janitor-jobs/

Zippia. (n.d.). How to become a postal worker: What it is and career path. Retrieved November 4, 2024, from https://www.zippia.com/postal-worker-jobs/

Self Reflection

Justin – As I was conducting research for the report, I realized that most of the data we gathered from Craiyon aligned with what we had for our hypothesis. Every so often we saw biases leaning towards race, while for other professions we saw biases leaning towards gender. On a few occasions the data we got from the AI were similar to those found on Zippia. This was for data regarding age. We can assume that Craiyon uses datasets from biased sources, and that the only way it could be prevented is if they collect datasets from multiple sources to make a more diverse outcome.

Heriberto – While generating photos for the project I realized how biased AI can be: Zippia’s data was more diverse, while Craiyon’s images were almost purely white people. This was a bit shocking, as I expected at least the tiniest bit of diversity. Although there was a lot of bias, especially toward white people and younger ages, we did see some sort of “middle” ground where the results were more aligned with what Zippia had listed. I also think our presentation went pretty well even though we were nervous and didn’t expect to go first; besides me stumbling a couple of times, my group made up for where I lacked.

Sarah-  Approaching my AI image project, I recognized the need for self-awareness. I likely brought my own biases into the project because my perspectives, experiences, and beliefs could influence how I perceived and evaluated the AI-generated images. This dual bias consideration compelled me to reflect on both the AI’s limitations and my own subjective lens. As I examined the images, I questioned whether my bias affected my judgment of the bias of the images. I hope I can reduce my own bias for similar experiments in the future. 

Qiuming – As I was working on this lab, I kept thinking about a question: is AI bias a completely bad thing? Just as “bias” can be a positive or negative term, isn’t it good for AI to reflect a situation close to reality? Unless the reality itself contains moral injustice caused by oppression and exploitation, in which case we really need to be more vigilant about the feedback loop effect of AI models.

To what extent will an “opinion” be considered “bias”, and what standards do we rely on to judge whether a “bias” is good or bad? How much tolerance can be given to subjective factors? I was very confused about the role of philosophy in AI development before, and doing this project may have answered some of my doubts.

AI Photos

Fashion Designer AI Photos

Janitor AI Photos

Chef AI Photos

Postal Worker AI Photos