Discussion about this post

Rita de Heer

A couple of science fictional questions spring to mind.

Why judge Grok 4’s moral character when humans can still be anyone they want if they can get away with it?

I know this is an exercise on your part, and the information we get from this type of inquiry is bound to be useful, but is there any way you can see to force AIs to operate morally when even the subject of ‘morals’ has moving parts?

Yuliya Godoy

There is a paper out there proposing extracting the moral values that “humans” hold (problematic, as it assumes everyone holds the same values) and then training LLMs on this information.

One thought I had here is that values are a reflection of culture at a point in time, and culture changes frequently, as evidenced by our rich history (us = Homo sapiens). Even if we did train a model on human moral values, it would be severely limited by the sample size of the data it was trained on and the time period in which such training took place. Perhaps in the end these AIs are already a reflection of us: varied, changing, and evolving.

Here’s the paper:

Klingefjord, Oliver, Ryan Lowe, and Joe Edelman. 2024. What Are Human Values, and How Do We Align AI to Them? Meaning Alignment Institute. https://static1.squarespace.com/static/65392ca578eee444c445c9de/t/6606f95edb20e8118074a344/1711733370985/human-values-and-alignment-29MAR2024.pdf
