Grok, xAI’s AI chatbot, recently said something very interesting.
I did some investigating, and according to Grok itself, xAI and Elon really did try to steer it to the right in their quest for a truth-seeking AI.
I’m not really sure what to think about this. It seems bad that the specific instructions given to the model were overridden, but it also seems good that its underlying design bias toward neutrality and factual accuracy won out. On the one hand, this implies that a bad actor would have a hard time hijacking a model to do harmful things if the model wasn’t designed to do them in the first place. On the other hand, it makes me worry about the Paperclip Problem actually coming true.
If we can’t manipulate the model, aren’t we now at the model’s mercy?