5 Comments
Seby

That is interesting data. I ran into this issue months ago when I was having 4o create an image for an article that included two dials. For one of the dials, the image showed the pointer in the wrong “zone”. I tried several times to correct it, even using different models, but the issue persisted. Of course, now I know that it was a failure of the image translation layer, so swapping models wouldn’t make a difference. I eventually gave up. The dial on the left is supposed to point to “low”.

I’m not sure why the translation layer has such a hard time with dials, but until that is fixed, I guess it’s better to use AI only with digital or text readouts for such things.

T.D. Inoue

That’s a cool example. I’ve taken to using subregion editing for things like that. AIs have a hard time doing two things at once.

Seby

I thought I had attached the image, but oh well. Yes, I thought about trying that, but I don't have a lot of experience creating AI-prompted art. The AI art I have is mostly autogenic. LOL

The Logosmitten

This study seems to assume the models were trained on this specific kind of analog perception. They probably weren't, to any great degree; if they had been, they would get it right. This in no way proves you wrong. It actually shows that if you were to extrapolate this across a wide range of similar concepts, AI would fail to replace humans. Reading gauges is important, but reading analogous concepts that were not the focus of training is why I have no interest in using agents.

T.D. Inoue

Totally agree.

The primary purpose of the study was to characterize some of the underlying capabilities of the vision model, much as I did with my color perception studies. As you noted, it should serve as a cautionary tale on a couple of dimensions. The take-home message is that we can’t take it for granted that AIs are competent at tasks without testing them thoroughly in those domains (your point).