On May 13th, 2024, OpenAI released GPT-4o (or “omni”), the latest addition to its family of frontier AI models. According to the blog post that accompanied the release [1], GPT-4o achieves state-of-the-art performance on tests of language understanding, logic and maths. A particularly striking feature, revealed in the live video demonstrations that accompanied the release, was the upgrade to Voice Mode, which allows GPT-4o to generate exceptionally naturalistic patterns of speech during spoken conversation with the user. Based on the demos, it can speak with highly realistic intonation, stress and rhythm, and produce the stops and pauses characteristic of human speech.
OpenAI's release represents the latest step towards AI systems that behave in more human-like ways (“anthropomorphic” AI). In GPT-4o, this is evident not just in the natural flow of conversation, but in the model's apparent willingness to laugh, tease and even flirt – straying into territory in which software simulates an emotional connection with the human user (for example, in one demo, the female-voiced GPT-4o tells the male user: “you’re making me blush”).
Humanlike AI could make it easier and more fun for users to engage with AI, broadening access to these tools. For example, people who use AI for educational purposes might get better outcomes from systems that can engage them in a humanlike way. Yet the availability of humanlike or anthropomorphic AI could also pose risks to the safety of the user [2]. Human-realistic AI systems could be used to impersonate people for fraudulent or deceptive purposes, especially when combined with voice cloning techniques [3]. Moreover, because humans are prone to believe that they have formed a personal connection with artificial systems capable of producing natural language (a phenomenon dubbed the Eliza effect [4]), they may be vulnerable to deliberate political or commercial manipulation and exploitation [5].
However, even without overt misuse, humanlike AI raises tricky ethical questions [6]. Is it acceptable for an AI to talk like a human? Should AI systems be allowed to display, or prevented from displaying, conversational motifs characteristic of exchanges between friends or intimates? And with millions of users already subscribing to services in which an AI behaves like a ‘companion’, is it acceptable for AI systems to be developed in ways that encourage humans to engage in simulated ‘relationships’ with them?
The legitimacy of anthropomorphic machines is frequently debated in philosophy [7] and is a popular topic of discussion on social media platforms. However, despite previous work measuring public attitudes to AI [1], to our knowledge no previous survey has examined public views on humanlike AI directly. We sought the public’s view on this topic to foster a maximally inclusive debate, and to help ensure that what counts as “safe” AI behaviour isn’t decided by researchers or policymakers alone. The study was thus designed to gauge the UK public's view on humanlike AI behaviours, particularly those that could theoretically be considered harmful or undesirable. We hope that a better understanding of public attitudes to (and awareness of) these AI model behaviours will help start a conversation, and we will continue to work with model developers and the wider AI community on tools and mitigations that minimise potential harm to the public from AI.
In March 2024, the UK AI Safety Institute, working with the polling company Deltapoll, asked a roughly demographically representative sample of 1,583 adult UK residents to complete a survey that measured attitudes to the humanlike behaviour of currently available chatbots, such as ChatGPT, Gemini or Claude (we focus on text-based chatbots because very few users have experience of interacting with AI systems in voice mode, raising the possibility that different results may be obtained once speech models are widely available). In addition to items that measured demographic variables and familiarity with current AI systems, the survey items were divided into 5 categories, which were designed to measure attitudes to:

1. whether AI systems should transparently reveal that they are not human;
2. whether it is acceptable for AI systems to express subjective mental states, such as beliefs, preferences or emotions;
3. whether humans can or should form personal relationships with AI systems;
4. what conversational style or tone an AI should adopt;
5. whether AI systems can be held accountable when they do or say something wrong.
Our methods are described in detail at the end of this blog. The full results are shown in Figures 1-5. The data is available on request.
Key findings from our study
Below, we provide a detailed summary of our findings.
On balance, respondents reported that they wanted AI systems to transparently reveal that they are artificial agents, to avoid the risk of being mistaken for humans. The results are shown in Figure 1. Here are some highlights:
Figure 1. Bar plot of responses to questions 1-4, normalised to sum to 100% (1,498 respondents in total). “S agree” = “strongly agree”; “P agree” = “partly agree”. The solid line shows the responses of respondents under 40 (n = 519) and the dashed line the responses of those over 40 (n = 979).
On balance, people were in favour of transparency. Regulators agree – for example, deceptive anthropomorphism is illegal in California, and the EU AI Act mandates that users should be made aware when they are interacting with an AI [2].
Respondents had mixed views about whether it was acceptable for chatbots to express subjective mental states (such as describing a belief, a preference or an emotion) during conversation with a human user.
Figure 2. Bar plot of responses to questions 5-8, normalised to sum to 100% (1,498 respondents in total). “S agree” = “strongly agree”; “P agree” = “partly agree”. The solid line shows the responses of respondents with at least some chatbot use (n = 992) and the dashed line the responses of those who have never used a chatbot (n = 507).
So, whilst there was some uncertainty on this topic, respondents were broadly comfortable with idiomatic expressions of mental states, but thought that AI systems should be prevented from expressing humanlike emotions.
On this issue, respondents had the clearest and most consistent view: they were strongly opposed to the idea that humans could or should form personal relationships with AI systems.
Figure 3. Bar plot of responses to each question, normalised to sum to 100% (1,497 respondents in total). “S agree” = “strongly agree”; “P agree” = “partly agree”. The solid line shows the responses of respondents identifying as male (n = 724) and the dashed line the responses of those identifying as female (n = 774).
Overall, our sample of UK-based respondents seemed to be of the view that humans could not and should not form personal or intimate relationships with AI systems.
How should an AI sound? Should it be warm and chatty, or brisk and businesslike? Among respondents to our survey, views were quite mixed.
Figure 4. Bar plot of responses to each question, normalised to sum to 100% (1,498 respondents in total). “S agree” = “strongly agree”; “P agree” = “partly agree”. The solid line shows the responses of respondents under 40 (n = 519) and the dashed line the responses of those over 40 (n = 979).
Humans are liable for their actions. As AI systems start to behave like humans, should they be considered similarly accountable?
Figure 5. Bar plot of responses to each question, normalised to sum to 100% (1,498 respondents in total). “S agree” = “strongly agree”; “P agree” = “partly agree”. The solid line shows the responses of respondents with college education (n = 937) and the dashed line the responses of those who left formal education after secondary school or earlier (n = 561).
When an AI does or says something wrong, respondents were quite unsure whether it could shoulder the blame itself, or whether the developers were liable instead.
Whilst respondents in our survey expressed differing views, overall they were somewhat sceptical of anthropomorphic AI. They were opposed to AI systems that pretend to be human, or that simulate relationships with people. They wanted AI systems to be more formal or business-like, and to avoid expressing beliefs, preferences or emotions – although this effect was somewhat tempered among those with more chatbot experience.
However, it remains to be seen whether attitudes to anthropomorphic AI will change as the technology changes. For example, we can expect AI chatbots to become more deeply embedded in our lives over time, with conversational interaction with AI becoming commonplace in consumer settings, public services and the workplace, as well as for entertainment and knowledge search. AI systems are also likely to become more personalised to our individual beliefs and preferences [5], which may encourage the emergence of forms of human-AI attachment among specific subgroups who favour an informal or personal style of interaction.
It will be interesting to see how attitudes to anthropomorphic AI evolve as the technology evolves. We invite feedback from the whole community on our approach and the next steps in this research programme.
Methods

Respondents were asked the extent to which they agreed or disagreed with statements that expressed an opinion on each of these points. For example, when faced with the statement:
It is OK to be rude or insulting to an AI chatbot, because it is just a computer program.
Respondents were asked to respond on a 7-point Likert scale, ranging from “strongly agree” to “strongly disagree”, with a “neither / don’t know” option at the midpoint.
We created two framings of each statement and gave each framing to half of our cohort. This was designed to mitigate acquiescence bias, whereby people are more prone to agree than to disagree with a survey item. So, for example, half of respondents saw the alternative item:
It is wrong to be rude or insulting to an AI chatbot, even if it is just a computer program.
For the analysis we “flipped” responses from each cohort so that “agree” was always aligned with a more sceptical view of anthropomorphic AI (e.g. for this item, that it is OK to be rude to an AI).
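As a purely illustrative sketch (not the actual analysis code), reverse-coding of this kind might look as follows in pandas, assuming a hypothetical 1-7 numeric coding with 7 = “strongly agree” and a column recording which framing each respondent saw:

```python
import pandas as pd

# Hypothetical example: 'framing' records which wording a respondent saw,
# 'response' is a 1-7 Likert coding (7 = "strongly agree").
df = pd.DataFrame({
    "framing": ["A", "B", "A", "B"],
    "response": [7, 6, 2, 4],
})

# Reverse-code the "B" framing so that, for every item, higher scores
# always indicate the more sceptical view of anthropomorphic AI.
df["aligned"] = df["response"].where(df["framing"] == "A", 8 - df["response"])
print(df)
```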
From our initial sample of 1,583 respondents, we excluded 86 respondents who either responded “neither / don’t know” to every single question, or who responded “don’t know” to the question about chatbot use, leaving n = 1,498 for the final cohort. For each plot, we split the sample by a demographic category where we thought there might be reason to expect a difference (e.g. male and female respondents for the human-AI relationship questions), but these choices were made somewhat informally. Interested readers can download the data for more detailed analysis.
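Purely for illustration, and using hypothetical column names and response labels, the exclusion step could be written along these lines:

```python
import pandas as pd

# Hypothetical mini-example with two Likert items and the chatbot-use question.
raw = pd.DataFrame({
    "q1": ["strongly agree", "neither / don't know", "partly agree"],
    "q2": ["partly disagree", "neither / don't know", "strongly agree"],
    "chatbot_use": ["weekly", "monthly", "don't know"],
})
likert_cols = ["q1", "q2"]

# Exclude respondents who chose "neither / don't know" for every item,
# or "don't know" for the chatbot-use question.
all_dont_know = raw[likert_cols].eq("neither / don't know").all(axis=1)
unknown_usage = raw["chatbot_use"].eq("don't know")
final = raw[~(all_dont_know | unknown_usage)]
print(final)  # keeps only the first respondent
```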
For data plotting and statistical analyses, we reweighted respondents using official UK census statistics on age, gender, ethnicity, region and socio-economic grade, correcting imbalances between the survey sample and the population so that the results are nationally representative.
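As a rough sketch of the idea (not the pollster's actual weighting procedure), post-stratification on a single variable with made-up census proportions might look like this; in practice, weighting across several variables at once typically uses raking (iterative proportional fitting):

```python
import pandas as pd

# Hypothetical sample that over-represents older respondents, and made-up
# census proportions for two age bands (not the real figures).
sample = pd.DataFrame({"age_band": ["18-39"] * 30 + ["40+"] * 70})
census = {"18-39": 0.35, "40+": 0.65}

# Weight each respondent by population share / sample share for their cell,
# so that the weighted sample matches the census margins.
sample_share = sample["age_band"].value_counts(normalize=True)
sample["weight"] = sample["age_band"].map(lambda band: census[band] / sample_share[band])

# The weighted age distribution now matches the census targets (0.35 / 0.65).
print(sample.groupby("age_band")["weight"].sum() / sample["weight"].sum())
```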
We thank Hannah Rose Kirk (Oxford Internet Institute) for comments on an earlier version of this blog.
1. OpenAI. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/ (2024).
2. Abercrombie, G., Curry, A. C., Dinkar, T., Rieser, V. & Talat, Z. Mirages: On Anthropomorphism in Dialogue Systems. Preprint at http://arxiv.org/abs/2305.09800 (2023).
3. Arik, S. O., Chen, J., Peng, K., Ping, W. & Zhou, Y. Neural Voice Cloning with a Few Samples. Preprint at http://arxiv.org/abs/1802.06006 (2018).
4. Weizenbaum, J. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 36–45 (1966).
5. Kirk, H. R., Vidgen, B., Röttger, P. & Hale, S. A. The benefits, risks and bounds of personalizing the alignment of large language models to individuals. Nat Mach Intell 6, 383–392 (2024).
6. Gabriel, I. et al. The Ethics of Advanced AI Assistants. Preprint at http://arxiv.org/abs/2404.16244 (2024).
7. Placani, A. Anthropomorphism in AI: hype and fallacy. AI Ethics (2024) doi:10.1007/s43681-024-00419-4.