With the launch of xAI's Grok 4 – heralded by Elon Musk as "the world's smartest AI" – I was interested to see how its "moral character" stacked up against other models.
There is a paper out there proposing that we extract the moral values that "humans" hold (problematic, as it assumes everyone holds the same values) and then train LLMs on this information.
One thought I had here is that values are a reflection of culture (at a point in time), and culture changes frequently, as evidenced by our rich history (us = Homo sapiens). Even if we did train a model on human moral values, it would be severely limited by the sample size of the data it was trained on, and the time period in which such training took place. Perhaps in the end, these AIs are already a reflection of us - they're varied, changing and evolving.
A couple of science-fictional questions spring to mind.
Why judge Grok 4's moral character when humans can still be anyone they want if they can get away with it?
I know this is an exercise on your part, and the information we get out of this type of inquiry is bound to be useful, but is there any way you can see to force AIs to operate morally when even the subject of 'morals' has moving parts?
Thanks Rita -- deep questions! The human question is an interesting one, as our behavior is bound by social norms and expectations and built-in biological tendencies -- which tend to lead to a somewhat fuzzy domain of behaviors that are generally considered acceptable or unacceptable and, more broadly, keep society functioning and somewhat healthy (although some would question the latter!).
With intelligent machines, there are of course substantial questions around how they will fit in with this complex societal dynamic -- will they behave like humans (unlikely), will they emulate human-like behavior, effectively being value-aligned (the hope of many researchers), or will they become a disruptive influence because they don't align with norms?
This exercise was designed to probe the latter, but it also reveals, to a degree, levels of value-alignment, both in terms of guiding world-view and in resultant behavior.
It's a potentially interesting test of how successfully AI models are being developed to operate "morally" -- realizing that there are no black-and-white benchmarks here -- but I'm not sure how far it takes us toward ways of training or "forcing" AI models to be moral, although this in itself is a really interesting area of research and speculation.
Here's the paper:

Klingefjord, Oliver, Ryan Lowe, and Joe Edelman. 2024. "What Are Human Values, and How Do We Align AI to Them?" Meaning Alignment Institute. https://static1.squarespace.com/static/65392ca578eee444c445c9de/t/6606f95edb20e8118074a344/1711733370985/human-values-and-alignment-29MAR2024.pdf