> it is quite hard to viscerally internalise what it means for your choices to be “predicted” by another being no matter what you say and/or do.
As a two-boxer, I think you've interpreted the scenario in a way that's logically impossible. The predictions are imperfect by the definition of the problem: the problem involves a choice, so deviation from the prediction is possible. And in that vein, you are assuming causality where there is none. There would be causality only if this were a repeated game, if the prediction were perfect (a logical paradox), or if there were shenanigans with timing.
Thanks for sharing! I'm interested in *why* people two-box. Is the fact that "the problem has a choice and so deviation from the prediction is possible" the main reason why you two-box? (If not, then what are your reasons?)
It's not me who is saying that deviation from the prediction is or isn't possible 1) for humans, 2) in real life. The thought experiment doesn't say that either; that's why it's a *thought* experiment. The thought experiment just says *IF* or *SUPPOSE* there exists “a being in whose power to predict your choices you have enormous confidence... almost certainly this being's prediction about your choice in the situation to be discussed will be correct.”
I agree with you that deviation from prediction is (probably) possible for *humans* in actual real life. But that's just not the scenario the thought experiment is asking about.
It's kind of like a chess game where the king is in checkmate, but *could* get out of checkmate if kings were allowed to move 2 squares instead of just 1. The thought experiment is like asking what would happen IF kings *were* allowed to move 2 squares instead of just 1. What would happen? Could the king escape from checkmate in that case?
Whereas you saying that you would take two boxes because deviation from prediction is possible in the Newcomb problem is like saying, no, the king cannot escape checkmate because kings can only move 1 square, as in the usual stipulated rules of chess. Which is entirely true normally! But it's simply not engaging with the "IF" in the thought experiment as it's written. (Which is also fine if you don't want to engage. But it would be good to be clear on whether or not you are engaging with something as stipulated.)
And that's why the "in real life" version of the experiment has to have the LLM be the one "deciding" (in a very loose sense of that word) to take one box or not. Because that is the only real life example right now where I can be the predictor in a way such that the LLM *cannot* (theoretically) actually deviate from my prediction of its outputs.
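To make this concrete, here is a minimal sketch of the setup I have in mind. It assumes the `transformers` library; the model name and prompt are placeholders, not the exact ones from the post:

```python
# Sketch: "predicting" an LLM's output by running an identical copy of it.
# The model name and prompt are placeholders (assumptions), not the real setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"   # placeholder open-weights model
PROMPT = "You face Newcomb's problem. Answer with exactly one word: One or Two."

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tokenizer(PROMPT, return_tensors="pt")

def run(label: str) -> str:
    # Greedy decoding: no sampling, so the output is fully determined
    # by the weights plus the prompt.
    out = model.generate(**inputs, do_sample=False, max_new_tokens=3)
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
    print(f"{label}: {text!r}")
    return text

prediction = run("predictor's copy")  # what I, the predictor, compute first
players_move = run("player's copy")   # what "their" copy then outputs
assert prediction == players_move     # the "player" cannot deviate from the prediction
```

(In the real setup the predictor and the player would each load their own copy of the weights; with identical weights and greedy decoding the outputs coincide, modulo hardware-level nondeterminism.)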
I two box because two boxes have strictly more contents than one box.
So are you interpreting this problem as meaning there's no choice; the subject must pick what was predicted? That's not really a decision problem, but you can still ask what should be predicted (as you have).
Tell me if I’m wrong, but I think you might have skipped one crucial point here. I arrived at my choice by roleplaying as the *predictor*, not the player (who is the one choosing to take one or two boxes). Seriously, try it yourself.
As the *predictor*, your question is “do I put $1,000,000 in box B, or do I leave it empty?” Play out the circumstances in which you would put the money in box B, and the circumstances where you wouldn’t, like I did in the post. THEN you ask: does the LLM (or, in the thought experiment version, the human you are predicting) ever actually get to take $1,000,000 + $1,000 home?
I am maximising, but I am maximising over the set of options available to me.
Hey, Alice. I didn't miss that fact, but it's a complicated scenario. If it conflicts with a simpler analysis (the payoff matrix), it probably has errors. No amount of additional explanation or analysis solves that.
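To spell out the "simpler analysis" I mean, here is the standard payoff matrix as a minimal sketch (the dollar amounts are just the usual ones from the problem):

```python
# The two-box dominance argument over the standard payoff matrix.
# Keys: (what the predictor predicted, what the subject takes) -> subject's payoff.
payoffs = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 1_001_000,
    ("two-box", "one-box"): 0,
    ("two-box", "two-box"): 1_000,
}

for prediction in ("one-box", "two-box"):
    one = payoffs[(prediction, "one-box")]
    two = payoffs[(prediction, "two-box")]
    assert two > one  # in each row, taking both boxes pays strictly more
    print(f"prediction={prediction}: one-box ${one:,} vs two-box ${two:,}")
```

Whatever is already in the boxes, the two-box column is larger in every row; that is the whole analysis.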
Then I guess the point I must not be understanding is, as the predictor, when would you put $1,000,000 into box B such that the LLM/human is in a position to take it (the $1,000,000) AND the $1000?
Maybe the deeper point should be: the player isn’t the only one that gets to have a decision theory. The predictor is also a being that gets to have and use decision theory. And in the Newcomb thought experiment/game, the predictor goes first.
The glib answer is "for all cases where Alice puts the $1000+$1000000 into the boxes, the subject is in a position to take it all".
When I try to formulate that question more rigorously, I find it contradicts the big premise of the scenario (subject can choose):
Given a universe that's determined to the extent that it's known the subject will take two boxes, what would lead to Alice putting the $1000+$1000000 in the two boxes?
The last lens, hindsight, would ask:
The subject took both boxes and gained $1000+$1000000. How did Alice not predict that?
The answer is that the prediction was wrong. According to the problem, this scenario has not happened yet, but it can. The probability is ε.
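Just to put numbers on what ε does to the payoffs, here is the arithmetic as a sketch (whether it's legitimate to condition on the prediction like this is exactly the thing we disagree about):

```python
# Expected payoff as a function of epsilon, the chance the prediction is wrong.
# This is only the arithmetic; it takes no side on whether the conditioning is valid.
def expected_payoffs(eps: float) -> tuple[float, float]:
    # If the subject one-boxes, the prediction was "one-box" with probability 1 - eps.
    ev_one_box = (1 - eps) * 1_000_000 + eps * 0
    # If the subject two-boxes, the prediction was "two-box" with probability 1 - eps.
    ev_two_box = (1 - eps) * 1_000 + eps * 1_001_000
    return ev_one_box, ev_two_box

for eps in (0.0, 0.01, 0.1, 0.5):
    one, two = expected_payoffs(eps)
    print(f"eps={eps:.2f}: one-box ${one:,.0f}, two-box ${two:,.0f}")
```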
Mmm there is “choice” and the subject “must” pick what was predicted, but in ways that don’t really fall under the normal usage of those words. This is where the LLM example really shines.
The Llama LLM on Meta’s (or anyone else’s) computer that I am predicting (with my copy of Llama) “must” generate “One” or “D” or whichever output it does, given some specific input prompt. But I don’t work at Meta. I did nothing to train their models, fine-tune them, or add any user/system prompts or anything like that. Maybe the tiny footprint I have on the internet somehow got into their training data, but other than that potentially negligible amount of influence, I have zero control over which token Llama will output. I just predict what it would output.
And you can also ask: was there a choice in what token the LLM outputs? At a zoomed-out level of abstraction, theoretically, Llama could have output any one of the ~128k (iirc) tokens in its vocabulary. In a very, very zoomed-in view, though, yes, it had to output whatever token it did, once the model parameters were set a certain way, and the input prompt was tokenized a certain way, and the sampling distributions were seeded pseudorandomly in a certain way. Idk if you would call that choice, but it is what it is.
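A toy version of that last point, with made-up numbers (this is not Llama's actual distribution, just an illustration that fixed parameters plus a fixed seed means a fixed output):

```python
import random

# Toy "sampling step": a fixed next-token distribution (standing in for the model
# parameters plus the tokenized prompt) and a fixed pseudorandom seed.
vocab = ["One", "Two", "D", "Box"]
probs = [0.70, 0.15, 0.10, 0.05]  # made-up numbers, not any real model's output

def sample_token(seed: int) -> str:
    rng = random.Random(seed)  # seeding pins down the "random" draw
    return rng.choices(vocab, weights=probs, k=1)[0]

# Zoomed out: any token in the vocab *could* come out.
# Zoomed in: with the distribution and seed fixed, the same token comes out every time.
assert sample_token(42) == sample_token(42)
print(sample_token(42))
```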
Switching back to humans: you or I could theoretically have learned any of the natural languages on earth to “native level fluency”, at a very zoomed-out level. But when you zoom in, there was an extremely low chance my brain could have acquired, say, Xhosa the way it acquired English, given the linguistic environment and people I was exposed to. None of this is to deny that I *could have* become natively fluent in Xhosa, if the people around me had really wanted to push for it and fed me the input I would have needed. Nor does it say I can’t in the future become natively fluent in Xhosa if I really tried and dedicated time/attention/resources to it. It just is what it is. Is that a choice?
And now, if we go back to the Newcomb problem decision: how does any specific person come to make the choice they do when facing the question…?
One benefit of bringing up "compatibilism" by name is that now I can say really smart people have written books about this but I haven't read them. It's a cop-out but I don't have a really informed stance to analyze choices in a mostly deterministic world...
I'm also interested in how people solve this problem. It sounded like you made your choice by thinking about a path to the million dollars. To get it, you have to adopt a non-maximizing decision theory. Is that genuine or a sham? If you were really at the table with the boxes in front of you, would you defect?
Is that decision theory even useful, or is it just a philosophy that enables winning at a certain kind of thought experiment? (I am shopping for a decision theory that makes me do the dishes instead of watching TV, but I suspect I will have to keep looking.)
To tie together decision theory and the timefulness of willpower: do you ever eat a cookie you know you shouldn't eat? If so, how are you gonna leave $1000 on the table that was freely offered to you just because you decided at some previous point not to?