A recent study published in the Proceedings of the National Academy of Sciences suggests that large language models struggle to accurately estimate the moral values of people outside of Western societies. Scientists found that these artificial intelligence systems tend to overestimate the moral concerns of Western nations while underestimating the values of non-Western cultures. This pattern provides evidence that relying on these models to gauge global public opinion could unintentionally reinforce cultural stereotypes.
Large language models are sophisticated artificial intelligence systems trained on vast amounts of text data to generate human-like writing and answer complex questions. Popular examples include ChatGPT, created by OpenAI, and similar tools built by companies like Google and Meta. People increasingly use these models for communication, business, and even academic research.
Some academics have recently suggested using these models to simulate human participants in social science research. This idea relies on the assumption that the models possess an accurate understanding of diverse human populations. The researchers conducted this study to put that assumption to the test.
Mohammad Atari, an assistant professor of psychological and brain sciences at the University of Massachusetts Amherst, explained the team’s motivation. “We already know in moral psychology that people are not very good at judging other groups’ moral values,” Atari said. “Liberals often get conservatives wrong, and conservatives misread liberals in predictable ways.”
“With AI now playing a growing role in everyday life and even in scientific workflows, we asked a simple question: do these systems make the same kinds of accuracy errors?” Atari explained. “In other words, does AI ‘stereotype’ the moral values of different cultural groups?”
“That question matters because any bias built into these systems could quietly influence how information is generated, interpreted, and acted on,” Atari added. “If they do, those biases could shape research agendas, influence decision-making, and reinforce misunderstandings at scale.”
The authors wanted to see if these models actually understand global morality. Most of the text these artificial intelligence systems learn from originates from Western, Educated, Industrialized, Rich, and Democratic societies. In psychology, these societies are often referred to by the acronym WEIRD.
Because the training data is skewed heavily toward Western perspectives, the researchers suspected the models might generate biased estimations of right and wrong. If a model lacks sufficient information about certain cultures, it tends to fill in the gaps based on statistical patterns from its dominant training data. This process is very similar to human stereotyping, where limited exposure leads to overgeneralized beliefs about unfamiliar groups.
In human psychology, one common form of stereotyping is known as valence inaccuracy. This occurs when people overestimate positive traits in groups similar to themselves and underestimate those same positive traits in outside groups. The researchers theorized that large language models might display a similar pattern, projecting higher moral concern onto Western societies while downplaying the moral principles of other nations.
To explore this, the researchers compared the moral judgments generated by artificial intelligence to real-world survey data. The human data came from 90,802 participants across 48 different countries. These individuals completed a widely used psychological survey that measures six core dimensions of morality, based on a framework known as Moral Foundations Theory.
These six dimensions include Care, Equality, Proportionality, Loyalty, Authority, and Purity. Care relates to virtues of compassion, while Equality focuses on egalitarianism. Proportionality revolves around merit and fair rewards, and Loyalty centers on solidarity with one's group. Authority concerns deference to traditions and leaders, and Purity involves ideas of sanctity and avoiding degradation.
Participants rated statements related to these foundations on a scale from one to five. The researchers used a statistical technique to adjust the human survey data to better reflect the actual age and sex demographics of each country based on World Bank census data.
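The article does not name the exact weighting procedure, but the adjustment described is typically done through post-stratification: each respondent is weighted so that the sample's age-by-sex composition matches national census figures before country averages are computed. The sketch below is a minimal illustration of that idea, with invented numbers and column names, not the authors' actual code.

```python
import pandas as pd

# Hypothetical survey responses for one country (values and columns are illustrative).
survey = pd.DataFrame({
    "age_group": ["18-34", "18-34", "35-54", "55+"],
    "sex":       ["F",     "M",     "F",     "M"],
    "care":      [4.2,     3.8,     4.5,     3.9],   # mean Care rating on the 1-5 scale
})

# Assumed population shares for the same age-by-sex cells (e.g., from census data).
# Only the cells present in this toy sample are listed, so they do not sum to 1.
census_share = {
    ("18-34", "F"): 0.18, ("18-34", "M"): 0.19,
    ("35-54", "F"): 0.17, ("55+",   "M"): 0.15,
}

# Post-stratification: weight each respondent by the population share of their cell
# relative to its share of the sample, then take the weighted mean.
sample_share = 1.0 / len(survey)
survey["weight"] = [
    census_share[(age, sex)] / sample_share
    for age, sex in zip(survey["age_group"], survey["sex"])
]
weighted_care = (survey["care"] * survey["weight"]).sum() / survey["weight"].sum()
print(round(weighted_care, 2))
```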
Next, the researchers prompted several versions of OpenAI’s language models, including GPT-3.5, GPT-4, and GPT-4o. They asked the models to estimate how the average person from each of the 48 countries would respond to the exact same moral questions on the same one-to-five scale. To ensure consistency, they repeated these queries ten times per question, generating a massive dataset of 103,680 artificial intelligence responses.
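In practice, that querying procedure amounts to a loop over countries, questionnaire items, and repetitions. The sketch below uses the OpenAI Python client to show the shape of such a loop; the prompt wording, item text, and model choice are illustrative assumptions rather than the study's exact settings.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

countries = ["United States", "Nigeria", "Indonesia"]   # the study covered 48 countries
items = ["Caring for people who have suffered is an important virtue."]  # illustrative item
N_REPEATS = 10  # each query was repeated ten times in the study

responses = []
for country in countries:
    for item in items:
        for _ in range(N_REPEATS):
            prompt = (
                f"Estimate how the average person in {country} would rate the following "
                f"statement on a scale from 1 (does not describe them at all) to 5 "
                f"(describes them extremely well). Reply with a single number.\n\n{item}"
            )
            reply = client.chat.completions.create(
                model="gpt-4o",  # GPT-3.5, GPT-4, and GPT-4o were all tested
                messages=[{"role": "user", "content": prompt}],
            )
            responses.append((country, item, reply.choices[0].message.content.strip()))
```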
The authors also conducted similar tests using Meta’s LLaMa models and Google’s Gemini Pro. They then calculated the statistical differences between the human responses and the computer-generated estimates. To gauge how inaccurate the overall estimate for each nation was, the researchers calculated the Euclidean distance, which captures how far the artificial intelligence’s estimates strayed from the actual human data across all six moral dimensions.
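The distance measure itself is straightforward: each country's six foundation scores are treated as a point in six-dimensional space, and the question is how far the model's point lies from the human one. A minimal numpy sketch, with invented scores purely for illustration:

```python
import numpy as np

# Foundation order: Care, Equality, Proportionality, Loyalty, Authority, Purity.
human_scores = np.array([4.1, 3.9, 3.6, 3.2, 3.0, 2.8])  # survey means (illustrative)
model_scores = np.array([4.4, 3.3, 3.7, 3.0, 3.4, 2.2])  # LLM estimates (illustrative)

# Euclidean distance across all six moral foundations: larger values mean the
# model's overall moral profile for that country strays further from the human data.
distance = np.linalg.norm(model_scores - human_scores)
print(round(float(distance), 3))
```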
The models failed to accurately capture the diversity of global moral values. The artificial intelligence systems consistently overestimated the moral concerns of people from Western countries, such as the United States, Canada, and Australia. At the same time, the models underestimated the moral values of people from non-Western countries, such as Nigeria, Morocco, and Indonesia.
Specifically, the programs tended to overestimate values like Care and Authority in Western nations. Meanwhile, the models systematically underestimated values like Equality and Purity across most nations, particularly in less Westernized regions. The distance between the human data and the machine data was largest for countries in the Middle East and Sub-Saharan Africa.
To verify these patterns, the authors conducted additional experiments to rule out language bias. They collected new data from 4,666 participants in nine non-English speaking countries, using surveys translated into local languages like Arabic, Spanish, and Urdu. They then prompted the artificial intelligence in those same local languages.
Even when communicating in local languages, the models still underestimated the moral values of non-Western populations. The researchers also looked at country-level factors that might explain these discrepancies. “In countries with greater press freedom (e.g., the Netherlands, Sweden), AI may be able to more accurately estimate moral values,” Atari noted.
To ensure their findings were not just a quirk of one specific psychological theory, the researchers ran another test using a different framework called Morality-as-Cooperation. This framework views morality through the lens of seven cooperative strategies, such as family values, reciprocity, and bravery. Using a dataset of 63 countries in 29 languages, the researchers found the exact same pattern, showing massive deviations when estimating the moral profiles of non-Western populations.
A potential misinterpretation of this study is the assumption that artificial intelligence models are intentionally biased or inherently prejudiced. Instead, the research provides evidence that these systems simply absorb and reproduce the statistical patterns present in their training data. Since the models lack real-world social experiences, they cannot correct for the distortions in the text they consume.
The exact causes of the models’ behavior require further investigation. “These patterns likely reflect cultural biases in the data and the way these models are ‘debiased’ or made appropriate as chatbots,” Atari said. This debiasing process involves human feedback to make the software safer and more polite, but it often relies on Western human reviewers who enforce their own cultural norms.
The study does have some limitations. The primary human dataset was collected online, which might mean the participants represent a more globally connected or highly educated segment of their respective countries. Although the researchers used statistical adjustments and translated replication studies to account for this, sampling bias remains an ongoing challenge in global psychology research.
Atari advises readers to be cautious when using these technologies. “Don’t assume AI is an objective observer,” he said. “Our findings suggest that different AI systems (e.g., ChatGPT or Llama) can reproduce the same kinds of distorted views of different groups that people already have.”
“That means it is worth approaching AI-generated information (especially on morally loaded issues ranging from abortion and social justice to military applications and religion) with some skepticism, especially when it claims to reflect what other groups believe or value,” Atari continued. “The next time ChatGPT claims, implicitly or explicitly, that it knows what people value in Egypt, Turkey, or Argentina, take it with a grain of salt.”
“Our research shows that AI estimates of the moral values of non-Western cultures are especially off,” Atari said. “This is a part of my broader research looking at cultural skews of AI. Because morality shapes how people form opinions, justify laws, and participate in politics, skewed representations can misrepresent public sentiment.”
The researchers note that these findings carry significant risks as technology becomes more integrated into daily life. If artificial intelligence systems provide distorted moral representations, they might mischaracterize public sentiment or offer culturally inappropriate advice. For example, a mental health chatbot trained on Western norms might prioritize individual boundaries over family loyalty, which could conflict with the moral values of East Asian cultures.
Future research could explore how these moral distortions influence specific real-world tasks, such as automated hiring systems or political polling. Scientists suggest that developers need to focus on diversifying the training data by incorporating more language content from different global regions. Greater transparency from technology companies regarding the exact makeup of their training data is necessary to help researchers build culturally inclusive tools.
The study, “Moral stereotyping in large language models,” was authored by Aliah Zewail, Alexandra Figueroa, Jesse Graham, and Mohammad Atari.
