• GamingChairModel@lemmy.world
    33 minutes ago

    Yeah, it’s counterintuitive because it’s a lot more work for a human to draw a picture (let alone a photorealistic one) than to write a few words. But human language grammar actually has a lot of strict rules that a stream of letters has to follow just to count as “valid” output, let alone “decent” output that kinda matches the prompt/description. Transpose a pair of letters or even substitute a single letter (or token) and you’ve got an output that just doesn’t work, in a way that generated images don’t have to worry about.