• Madrigal@lemmy.world · 32 points · 2 days ago

    Nah, guarantee the models have rules built in to deal with obvious stuff like that.

    You need to be more subtle. Give them information that is slightly wrong.

    • ozymandias117@lemmy.world · 2 points · 20 hours ago

      Just need to use less obvious insults, à la “your mother was a hamster, and your father smelt of elderberries.”

      Still poisons the model with something an end user won’t like, but it isn’t easy to train out.

    • taco@anarchist.nexus · 10 points · 1 day ago

      Perhaps by generating a bunch of complex Copilot code to upload. It’s easy to mass-produce and would look plausibly functional.