Banks are on an uncontrolled trajectory of KYC over-achievement. That is, they collect far more data on people than legally required, and that data eventually leaks to criminals in breaches. Banks’ privacy policies are rife with anti-consumer weasel words.

It’s such a shit-show that privacy proponents have no real choice but to quit banks and operate entirely in cash. Few people have that level of discipline.

Software can turn this situation around. For example, there are roughly 6,000 privacy-abusing banks and credit unions in the US. Suppose a robot harvests all their privacy policies, fetches their Android apps to check permission requests, records which banks have websites MitM’d by Cloudflare, and uses all of that to find the lesser of evils. Consumers could then participate in creating a competition for privacy, as opposed to a competition over meaningless soul-selling fractions of a percent of interest. The heart of the problem is that banks only get pressure from the side of oppressors and tyranny, and none from the people they purport to serve. Software and data can remedy this.
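As a rough illustration of the ranking idea, here is a minimal sketch. All field names, weights, and the two example banks are invented; real signals would come from the harvested policies, app manifests, and TLS observations.

```python
# Hypothetical sketch: combine per-bank privacy signals into a single
# "lesser of evils" ranking. Weights and field names are assumptions.

def privacy_score(bank):
    """Lower is better. Each signal is a count of observed abuses."""
    return (
        3 * bank["weasel_phrases"]      # vague policy language found
        + 2 * bank["app_permissions"]   # excessive app permission requests
        + 5 * bank["cloudflare_mitm"]   # 1 if website is MitM'd by Cloudflare
    )

banks = [
    {"name": "Bank A", "weasel_phrases": 12, "app_permissions": 7, "cloudflare_mitm": 1},
    {"name": "Bank B", "weasel_phrases": 4,  "app_permissions": 3, "cloudflare_mitm": 0},
]

ranked = sorted(banks, key=privacy_score)
print([b["name"] for b in ranked])  # least-abusive first
```

The exact weights are a policy decision; the point is only that once the signals are machine-readable, the ranking itself is trivial.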

Worth noting that long before the AI bubble started, a US university studied bank privacy policies in bulk using a scraper bot that read the standardised privacy disclosure forms, which all banks must publish in a uniform layout. That data has since rotted, so the research is of little use today.

  • evenwicht@lemmy.sdf.org (OP) · 12 days ago

    Thanks for the insight. Certainly having a human eyeball the raw data of 6,000 banks is a non-starter. I’ve not studied AI, so if I were to take this project on I would have to (for example) look at what banks charge for paper statements, because offline banking options are a refuge from copious privacy abuses. I would want to short-list banks that offer gratis paper statements.

    The phrase “free paper statements” can be worded in many different ways. I might expect an LLM to be good at that sort of thing. In my non-AI approach, I would have to look at a large sample to get an idea of all the ways the idea is expressed, then try to write a regular expression to cover them. Is that still the best way?
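For illustration, a regex along those lines might look like the sketch below. The phrase list and the 40-character tolerance window are assumptions, and the first sample shows the brittleness: reverse the word order and the pattern misses it, which is exactly where an LLM might do better.

```python
import re

# Crude sketch: match a few phrasings of "free paper statements".
# A real pattern would need a large sample of policies to build out.
pattern = re.compile(
    r"\b(free|no[- ]charge|complimentary|gratis)\b"
    r".{0,40}?"                      # allow a few words in between
    r"\bpaper\s+statements?\b",
    re.IGNORECASE,
)

samples = [
    "Paper statements are provided free of charge.",  # reversed order -> missed
    "We offer free monthly paper statements.",
    "Complimentary paper statements on request.",
    "Paper statements cost $3 per month.",
]
for s in samples:
    print(bool(pattern.search(s)), s)
```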

    Phrases like “we value your privacy” and “we only use your data as legally permitted” (which tries to deceive readers into thinking of data minimisation when it really means the opposite) can also be worded in many ways, all of which could elevate a /bullshit/ score, of sorts.
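A crude version of such a score could simply count known weasel phrases; the phrase list here is a tiny invented sample, and it has the same weakness as the regex approach — it only catches exact wordings.

```python
# Hypothetical "bullshit score": count weasel phrases in a policy.
WEASEL_PHRASES = [
    "we value your privacy",
    "as legally permitted",
    "trusted partners",
    "improve your experience",
]

def bullshit_score(policy_text: str) -> int:
    text = policy_text.lower()
    return sum(text.count(p) for p in WEASEL_PHRASES)

print(bullshit_score(
    "We value your privacy. We share data with trusted partners."
))  # 2
```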

    One tool I find quite useful for language translation comes from these two sites:

    It’s not just translation of a blob of text: you enter a short phrase in one language and it finds real instances of the same phrase in the other language, so you can see how one idea can be expressed in many ways within a language. I assumed an LLM was in play, but I don’t really know.

    Of course what we need is not translation from one language to another, but a tool that detects different ways to express the same idea within a single language; almost like synonyms, but for phrases.
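As a baseline, token-overlap similarity can be computed with the standard library alone, though it only catches shared vocabulary; genuine paraphrase detection (“free paper statements” vs. “no-cost mailed summaries”) would need semantic models such as sentence embeddings.

```python
# Crude stdlib baseline: Jaccard (token-overlap) similarity between
# phrases. It misses true paraphrases that share no words, which is
# why semantic models are the usual tool for this job.

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

print(jaccard("free paper statements",
              "paper statements free of charge"))   # high overlap
print(jaccard("free paper statements",
              "no-cost mailed account summaries"))  # paraphrase, zero overlap
```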

    • MotoAsh@piefed.social · 12 days ago

      LLMs could do that for you, and the results could be mostly accurate. The way to use AI and get reliable results is to have it do something verifiable, then actually verify it when you need reliable data. For example, you could have an AI find the links to docs and features listed on banks’ sites; just keep the links it finds around and at least verify they are what it says they are. You can use an AI for that step too, but LLMs do not actually reason and are fully capable of false negatives and false positives on any determination, no matter how small.

      IMO, as long as you make the data as easily hand-auditable as possible (a great idea for anything referencing external sources), it wouldn’t be bad to use “AI” to construct it. It’d even be a good idea if you want to just trust the AI’s output, since an auditable data set can also be updated by AI when policies change.
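One way to make each AI-extracted claim auditable is to store the source URL and a verbatim quote alongside it, so a human (or a second automated pass) can check it. This record shape is purely illustrative; the URL and bank name are invented.

```python
# Illustrative "auditable claim" record: every extracted fact carries
# the URL and quoted snippet it came from, so it can be verified later.
record = {
    "bank": "Bank A",                          # hypothetical
    "claim": "free paper statements",
    "source_url": "https://example.com/fees",  # placeholder URL
    "quote": "Paper statements are provided at no charge.",
}

def is_auditable(rec: dict) -> bool:
    """A claim is auditable only if it names its evidence."""
    return all(rec.get(k) for k in ("claim", "source_url", "quote"))

print(is_auditable(record))  # True
```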