29 June, 2020
The Open Mind Foundation (OMF), a foundation dedicated to the study of mass emotion, actively participates in research projects on emotion recognition, artificial intelligence, and computer vision. Using the latest technology, the foundation's researchers conducted an experiment that applied visual recognition systems to perceive human emotion in the studio of a popular political talk show.
Our experiment debunked the myth that artificial intelligence can accurately recognize human emotion from facial expressions alone, without any cultural or psychological context. What might seem an unsuccessful endeavor has only pushed us to search for new methods of creating a precise, objective way to recognize emotions using technology.
The Story our Faces Tell
The ability to perceive facial expressions is one of the first to develop after birth. It is especially important to note that we use our faces to express various emotions, as well as to read the moods of others and make inferences about them.
Imagine a broadcast of a premier-league soccer game: the eyes of a pretty girl, frozen in dejected expectation of a penalty, the huge eyes of a ten-year-old boy, full of tears after his favorite team’s loss, the uninhibited glee of a stately, round man, celebrating a goal scored against the opposing team.
Have you ever wondered why televised soccer games often show close-up shots of fans’ faces as they cheer or boo? This is done in order for the viewers to see the facial expressions of those in the bleachers and subconsciously “read” their emotions. Thanks to the mirror neuron mechanism, which is responsible for our ability to mimic and empathize, viewers then experience similar emotions to those felt by fans.
Facial assessment is ever-present in our daily social interactions and affects not only individual actions, but also public opinion and even political choices. This raises the question: is it possible to accurately assess a person's emotional state using technological recognition software?
For example, in surveys and sociological research conducted by the foundation, two of the most important emotions at play are hope and humiliation. Which facial patterns indicate these emotions? How can we detect them visually? And is it ethical for a specially trained artificial intelligence to discern emotions as accurately as you or I could?
How to "Know" an Emotion
A person's emotional state can be identified based on the movement of his or her facial muscles. This is known as an emotional expression, or facial expression.
When we conducted our experiment in August of 2019, many emotion recognition programs that used videos of facial expressions were based on Paul Ekman's popular theory of 6 basic emotions. According to Ekman, happiness, fear, surprise, anger, disgust, and sadness are universal emotions experienced by all human beings.
Each basic emotion has a specific set of defining facial movements (action units). The muscles contract and relax, moving the face and expressing an emotion. This movement can be described using key fixed points on the face, which then correlate with a specific emotion. Software developers use these points to teach neural networks to encode facial expressions. By measuring the changes in distance between key facial points, the system identifies the appropriate emotion.
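To make this concrete, here is a minimal sketch of the distance-based approach described above. The landmark indices, feature pairs, and classifier are purely illustrative assumptions, not the software used in the experiment:

```python
# Illustrative sketch: classifying a basic emotion from changes in distances
# between facial key points. Landmark indices, feature pairs, and the
# classifier are hypothetical, not the system tested in the experiment.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

BASIC_EMOTIONS = ["happiness", "fear", "surprise", "anger", "disgust", "sadness"]

def landmark_features(points: np.ndarray, neutral: np.ndarray) -> np.ndarray:
    """Distances between selected landmark pairs, relative to a neutral frame.

    `points` and `neutral` are (N, 2) arrays of facial key points.
    """
    pairs = [(0, 1), (2, 3), (4, 5)]  # e.g. brow-to-eye, lip corners, jaw drop
    cur = np.array([np.linalg.norm(points[a] - points[b]) for a, b in pairs])
    ref = np.array([np.linalg.norm(neutral[a] - neutral[b]) for a, b in pairs])
    return cur - ref  # positive = points moved apart, negative = contracted

# Training would pair such feature vectors with labeled expressions:
# clf = KNeighborsClassifier().fit(train_features, train_label_indices)
# emotion = BASIC_EMOTIONS[clf.predict([landmark_features(frame, neutral)])[0]]
```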
Artificial intelligence systems based on this theory make emotion recognition easier. However, this also narrows the number of possible feelings and mental states to only 6 basic emotions.
The Faces of Freedom of Speech
Open Mind Foundation's research focuses not only on collective emotions, the mechanisms behind their occurrence, and their effect on reality, but also on the emotions of individuals. OMF scientists were curious about the extent to which a raised eyebrow, a smile, or pursed lips could show what people are feeling as they watch famous speakers who greatly influence the country speak in a TV studio.
Our research aimed to test current video-based emotion recognition systems and verify whether the viewers' emotions are correctly interpreted according to the theory of 6 basic emotions. We also wanted to find out whether it would be more convenient to use existing software or to spend time creating our own, which would better fit our needs.
The laboratory for our experiment on utilizing emotion recognition software in mass media was Savik Shuster's weekly live program, Freedom of Speech, on Channel Ukraine.
This program is the only one of its kind on Ukrainian TV. A stronghold of democracy, it presents the toughest and most relevant topics while promoting values of liberty and freedom.
The show’s format (pre-quarantine):
Every Friday, "all of Ukraine" - 100 people who represent the country's demographics by age, gender, and region - gather at the studio. Every citizen has an equal chance of attending. To eliminate self-selection, a team of professional sociologists randomly invites audience members using stratified, multi-step sampling. It is impossible to sign up or request to take part in the project.
The opinion of these 100 viewers, within a small margin of error, can be considered the opinion of all Ukrainians. Each audience member is given a tablet, which they use to express their opinion on the speeches of invited politicians and public leaders. Every 8 seconds, viewers must choose to press the "agree" or "disagree" button. The results of this continuous voting are broadcast to reflect public opinion on the most controversial current events and the guests discussing them.
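For illustration, here is a small sketch of how such continuous voting could be tallied, assuming a hypothetical data format (one list of votes per 8-second window); this is not the show's actual software:

```python
# Illustrative tally of the continuous voting described above: each of the
# 100 viewers registers "agree" or "disagree" every 8 seconds, and the share
# of agreement is tracked over time. The data format is hypothetical.
from collections import Counter

VOTE_INTERVAL_SECONDS = 8

def agreement_share(votes: list[str]) -> float:
    """Fraction of 'agree' votes in one 8-second window."""
    counts = Counter(votes)
    total = counts["agree"] + counts["disagree"]
    return counts["agree"] / total if total else 0.0

# Example: one voting window from the studio audience
window = ["agree"] * 63 + ["disagree"] * 37
print(f"{agreement_share(window):.0%} agree")  # -> 63% agree
```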
Recognition vs. Perception - 0:1
The experiment tested a system created by a Russian startup to recognize the six basic emotions. Its developers trained the algorithm on a database of two million faces - AffectNet (2017).
Two methods were used to train the neural network to recognize emotions: in the first, encoding was done by specially trained engineers, while in the second, it was done by an automated system based on FACS (the Facial Action Coding System).
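As a rough illustration of this kind of training - fitting an expression classifier to a labeled face database such as AffectNet - here is a minimal PyTorch sketch. The folder layout, architecture, and hyperparameters are assumptions, not the startup's actual pipeline:

```python
# Minimal sketch of training an expression classifier on a labeled face
# database. Paths, transforms, and architecture are illustrative only.
import torch
import torch.nn as nn
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumes images are arranged in one folder per emotion label,
# e.g. affectnet/train/anger/..., affectnet/train/happiness/...
train_set = datasets.ImageFolder("affectnet/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet18(weights=None)  # small CNN backbone, trained from scratch
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass over the data
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```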
The system’s main goal was to determine which occurrences in the studio (speakers’ performances, the host’s comments, the context of the situation, etc.) caused bursts of emotion among the viewers.
Our experiment modeled regular occurrences in the studio during a livestream. The audience consisted of volunteers (about 80 people). Four sessions were held in total, each lasting 1-3 hours.
The movement of the viewers' faces was recorded by cameras, and the data was sent to a server, where it was processed and analyzed. The results were then displayed on screen as a graph of the audience's overall emotions, showing what participants were feeling every second.
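A simple sketch of this aggregation step, assuming a hypothetical record format of (timestamp, viewer, predicted emotion); the actual server-side pipeline is not described here:

```python
# Illustrative aggregation of per-viewer emotion predictions into a
# per-second audience graph. The record format is hypothetical.
from collections import Counter, defaultdict

def audience_timeline(records):
    """Return, for every second, the share of viewers showing each emotion."""
    per_second = defaultdict(Counter)
    for timestamp, viewer, emotion in records:
        per_second[int(timestamp)][emotion] += 1
    timeline = {}
    for t, counts in sorted(per_second.items()):
        total = sum(counts.values())
        timeline[t] = {emotion: n / total for emotion, n in counts.items()}
    return timeline

records = [(0, 1, "surprise"), (0, 2, "happiness"), (1, 1, "happiness")]
print(audience_timeline(records))
# {0: {'surprise': 0.5, 'happiness': 0.5}, 1: {'happiness': 1.0}}
```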
The very first test runs showed that the system worked: emotions were recognized, data was processed, and results were displayed on the host's tablet… But these were only basic emotions, which looked pretty on a graph and, though they showed certain trends, were emotions for emotions' sake - without any context, without consideration of the situation or physical and psychological factors.
Additionally, because of the program’s imperfections, the final data of the viewers’ emotions did not always match reality. For example, participants watched a cute video about pets. It would seem this should have elicited positive feelings, but the system detected a varied set of emotions. IT specialists were unable to determine whether this was objectively the emotional state of the viewers or a bug in the system.
The viewers, much like fans at a soccer stadium, were also observed by an audience - independent experts were invited and often found inconsistencies between their own perception of the participants' emotions and that of the machine. Currently, artificial intelligence is too far removed from the neural mechanisms of the human brain. You and I can immediately notice and identify, most often correctly, the emotion of a person belonging to our cultural background, while a program, even one specially trained on millions of images, is not able to do this.
System testing in progress. Test subjects' faces are obscured for privacy reasons.
Although the system used in the experiment was more or less satisfactory at recognizing the specific meanings of previously encoded facial expressions, it struggled to discern the signs of an emerging emotion, especially when it came to matching the speakers' performances with the emotional stimuli experienced by the viewers. The program could not distinguish signs that reflected one internal emotional state from those that signaled a different, hidden state.
Another conclusion is evident from the results - our research goals require a custom technological solution. This solution can be entirely new, or based on existing systems and adapted to fit the needs of the experiment in terms of tools, design, and technical execution.
The Path Towards a New Method
At the moment, our focus lies in finding a way to discern and objectively identify emotions. We aim to create a method of emotional perception which takes into account surrounding circumstances and context instead of mechanically recognizing patterns previously embedded into the program.
Savik Shuster, creator and host of the Freedom of Speech talk show and president of the Open Mind Foundation:
I have always been careful about the possibility of giving artificial intelligence the right to recognize emotions on faces. It brings to mind an almost Orwellian total control over people. People express the same emotions differently - how will a machine be able to recognize them? That's on the one hand; on the other, technology usually develops through evolution. It's impossible to push away the next step; we can only move as quickly as possible to the one after it. In recent years, dozens of startups with million-dollar budgets have launched all over the world, claiming that they can determine emotions from images on a screen. We have tested the face reading approach and confirmed that it cannot be considered an accurate tool for measuring emotion. However, this experiment has become a jumping-off point for further research, and we are currently well on our way towards a completely new, objective, and more precise method of determining mass emotional states.
At this time, our researchers are perfecting algorithms that capture images, recognize them, and transfer data. For more complex analysis and improved emotion recognition, we are creating special devices that measure a person's psychophysiological state. The new program will take into account social, situational, and even temporal context.
We are preparing for the next experiment and creating new criteria for a better, more adequate emotion perception AI system - one that takes into account the meaning of the situation, the content of what is said, and the surrounding circumstances.
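As a purely conceptual sketch of this multi-channel direction - not the foundation's actual method - one could imagine weighting face-based scores by physiological and contextual signals along these lines:

```python
# Conceptual sketch: combining face-based emotion scores with a physiological
# signal and a context weight. All signal names and the weighting scheme are
# hypothetical illustrations, not the foundation's method.
from dataclasses import dataclass

@dataclass
class Observation:
    face_scores: dict[str, float]  # e.g. {"happiness": 0.7, "surprise": 0.2}
    arousal: float                 # 0..1, from a physiological sensor
    context_weight: float          # how strongly the situation supports the reading

def fused_estimate(obs: Observation) -> dict[str, float]:
    """Down-weight face-only scores when physiology or context disagree."""
    scale = obs.context_weight * (0.5 + 0.5 * obs.arousal)
    return {emotion: score * scale for emotion, score in obs.face_scores.items()}

obs = Observation({"happiness": 0.7, "surprise": 0.2}, arousal=0.4, context_weight=0.8)
print(fused_estimate(obs))
```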
Pavel Osipov, vice president of the Open Mind Foundation, organizer of the experiment:
In 2019, we did not have enough experience or similar research to compare against. This was a unique experiment - one of the first attempts to model a situation where emotion recognition systems are used in a live TV studio. However, we came to realize that practically no system now available is able to reliably perceive mass emotions, even if the systems' creators try to convince us otherwise. We already have 90% of the tools needed for a second experiment. We understand that identifying emotions purely from video is not entirely correct. Emotions reveal themselves not only through facial expressions and gestures; other features are important as well - for example, physiological reactions, which can give us an understanding of people's subconscious motives and more accurately reflect their emotional state.
We would like to collect data on emotions through multiple channels, so we are currently focusing on research and development in the field of emotion perception systems. Specifically perception, because when we talk about recognition, we are dealing with something precise. And how can we teach a machine to recognize something we know so little about? Where do emotions come from, how are they expressed, how do the situational context, our previous experience, and our well-being affect this process? We are answering all of these questions through complex research, aimed at creating a system not only of recognition, but a holistic perception of emotions, which more accurately reflects our approach.
Bibliography
Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20(1), 1–68. doi:10.1177/1529100619832930
Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2019). AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10(1), 18–31. doi:10.1109/TAFFC.2017.2740923
Peters, J. (2020, June 8). IBM will no longer offer, develop, or research facial recognition technology. The Verge. Retrieved from https://www.theverge.com/2020/6/8/21284683/ibm-no-longer-general-purpose-facial-recognition-analysis-software