Using AI for image transcripts, yay or nay?

Gonzako@lemmy.world · 2 months ago

Using AI for image transcripts, yay or nay?

Lumidaub@feddit.org · 2 months ago

If you can get an AI to produce an actually useful description, that would be extremely interesting. However, AIs don’t know what’s important about an image and will fill up the description with useless information, effectively spam for the person that needs a description.

Write just a sentence, describe the thing that is important, while keeping in mind why you’re even posting the image, and it’s going to take less time than asking the AI.

Frank Heijkamp@mastodontech.de · 2 months ago

@Lumidaub
Writing a short description will be faster and more accurate.

It will take less time than checking and correcting the output of #ai.
@Gonzako

Gonzako@lemmy.world · 2 months ago

So you posted this from mastodon? Is @Lumidaub your tag there?

Lumidaub@feddit.org · 2 months ago

“@Lumidaub” is a reference to me. The system added that because they were, technically, replying to my comment here.

Gonzako@lemmy.world · 2 months ago

Gotcha, these look so full of links on my client

Lumidaub@feddit.org · 2 months ago

Yep, same, it’s a bit of a weakness of the Fediverse imho.

HappyFrog@lemmy.blahaj.zone · 2 months ago

For those that need it, any description is better than none.

Lumidaub@feddit.org · 2 months ago

True and one sentence written by a human who understands the image is better than twenty sentences by a word prediction machine.

HappyFrog@lemmy.blahaj.zone · 2 months ago

No matter how good human-written descriptions are, people just won’t do them. So having an automated system is far preferable.

Lumidaub@feddit.org · 2 months ago

I know what you’re saying but I truly think for most people it’s simply that they’re overthinking it. They think every single thing needs to be in the description, with references explained and sourced and whatnot. That does sound exhausting. And I have written a handful of descriptions like that for pictures where I thought the details were interesting enough to justify the effort. But really, a simple “The thirteenth Doctor and Rose Tyler embracing and deeply kissing” is already very sufficient in most cases (add “standing on an asteroid in front of a field of glittering stars - digital colour painting” if you have the spoons). So imho it’s better to educate them and encourage short, concise descriptions than to give in to the slop.

lambisio@feddit.cl · 2 months ago

You do realize that would lead to people (humans) doing the descriptions even less?

HappyFrog@lemmy.blahaj.zone · 2 months ago

I would argue that it might even increase the number of descriptions if they’re auto generated. People would find it easier to fix small issues than write a whole one from scratch.

lambisio@feddit.cl · edit-2 2 months ago

It would increase the number of bad descriptions and disincentivize people from adding good descriptions or (depending on the interface/permissions; you are assuming I could just manually go and fix someone else’s alt-info in their image) submitting fixes and corrections. And it would kill the environment even more anyway.

HappyFrog@lemmy.blahaj.zone · 2 months ago

Sure. I don’t care enough to argue. I’m not going to be the one who implements this system anyway, lol.

Rain World: Slugcat Game@lemmy.world · 28 days ago

people do them lots and lots on parts of the fediverse that are not lemmy! in fact, i think the only reason no lemmers do it is because the ui sucks (the alt text box in a post pops up in a completely unrelated section when you add an image!! alt text must be attached to images)

x74sys@programming.dev · edit-2 2 months ago

Yeah, apart from the fact that I imagine that people who need alt text don’t appreciate LLM output. It’s very boring. It’s either extremely technical and ice-cold or so cringe that you have to stop reading. Just what I think.

At least for me, if I realize that I’m reading an AI blog article or AI generated text in some other form, I don’t read it.

originalucifer@moist.catsweat.com · 2 months ago

personally, this is the kind of laser-focused tooling it’s good for. LLMs are going to be critical to assisting the disabled in many contexts.

hendrik@palaver.p3x.de · 2 months ago

I’d ask someone who needs these transcriptions first. I tend more towards “Nay”. I mean if they want AI transcriptions, I guess they could just run their own AI. And that way they get to choose between human and AI ones. I’m kind of against flooding the internet with AI content as long as the recipients can do it themselves.

Lumidaub@feddit.org · 2 months ago

That’s a good point but wouldn’t it be preferable to have one AI run one time instead of several of them doing the work again and again?

(Assuming that we’re even okay with AI generated descriptions in the first place which I’m not for reasons I’ve laid out in my other comments but I’m talking hypothetically)

Meldrik@lemmy.wtf · edit-2 2 months ago

Alternatively, it’s built into the platform. So when someone uploads an image to Lemmy a local AI model does the description.

Edit: Then it could even be marked as AI generated and people could choose to be exposed to it or not.

Rain World: Slugcat Game@lemmy.world · 28 days ago

people think local ai is the panacea, when ai must have a shit-ton of content scraped from the internet, and ~countless hours churning in datacenters, for the model to be produced in the first place

hendrik@palaver.p3x.de · edit-2 2 months ago

Really hard to tell. I mean there are situations in which people think they’re doing someone a favour. But they’re really not. Upside of doing it individually is: affected people get to pick the model they like best. And they can prompt it however they like. Depends a bit on your expertise on the matter if your pre-generated stuff is on the same level or more a disservice. Upside of pre-generating it once is: maybe a bit less CO2 in the atmosphere and a few less trees killed. But that certainly depends on how many people read those descriptions. If there’s just 2 people with screenreaders out there, who don’t even click on all the images, you might very well be wasting compute. And have a negative balance on the environment.

x74sys@programming.dev · edit-2 2 months ago

In my opinion, no. It has to be heavily curated. You’re not saving yourself a lot of work if you have to read it word by word (and probably correct stuff) anyway.

I think just one very short sentence describing what’s on there (it doesn’t have to be detailed) is a lot better than whatever an LLM will give you.

Baŝto@discuss.tchncs.de · 2 months ago

It depends a lot on the image. Multi-panel comics have pretty long alt texts and AI can make it faster to reproduce the text in the image.

lambisio@feddit.cl · 2 months ago

and AI can make it faster to reproduce the text in the image

That was solved decades ago without AI. It’s called OCR.

Rain World: Slugcat Game@lemmy.world · 28 days ago

ackshyually ai is a broad term that applied to ocr when it was first made, but nobody uses it that way anymore, they refer to machine learning. but wait! machine learning is used for ocr!!

x74sys@programming.dev · 2 months ago

But then you’re primarily extracting text, which you don’t need LLMs for. OCR tools will do the job much more cheaply and effectively.

Kierunkowy74@kbin.social@piefed.zip · edit-2 2 months ago

Check your output as it may be less accurate than your effort.

AI is able to extensively describe a photo, like those published on !pics@lemmy.world, but fails at seeing what part of it is actually important, or recognising the point of a meme. It will save you many keystrokes, but will probably still need to be manually corrected.

placebo@lemmy.zip · 2 months ago

AI is great for this. We shouldn’t put people with disabilities at a disadvantage because of the anti-AI hysteria.

Petersson@feddit.org · 2 months ago

To me personally, “AI” is a slur for profit-driven generative bs. The concept it’s based on is great. I love pattern recognition and all the possible use cases for Machine Learning when it comes to science, material research, …

tl;dr: Go for it.

rako@tarte.nuage-libre.fr · 2 months ago

Using AI for

no

I find it tiring

The problem with disabled people isn’t the disability, it’s the behaviour of non-disabled people putting them under, willingly or not. You being tired of that is actively putting them under. Yes, it’s tiring to take care of people, it’s work. There’s no going around that. Treating people as equals requires taking care of them, and until you take that as normal (just like brushing your teeth or doing the laundry or sweeping the floor at your place is work, but you still do it) you will be belittling them.

The change needs to happen on your side, on your conception of humanity and society. AI is not going to help you

Gonzako@lemmy.world · 2 months ago

Yeah, but you can’t preemptively take care of everyone. For example, Satisfactory’s arachnophobia mode wouldn’t exist if it wasn’t for the fact that one of the devs couldn’t work on it otherwise.

Time and effort are a limited resource.

rako@tarte.nuage-libre.fr · 2 months ago

There is a huge difference between not taking care because it’s not important to you, and not taking care because you can’t. It’s a cop out to mix up both.

It’s completely ok to acknowledge that you can’t do it, and to ask around for others to take over from you. That’s society at work doing good things for all of us, and that’s how we get out of all this mess. It’s perfectly fine!

Deebster@infosec.pub · 2 months ago

I read this disgraceful comment yesterday, and I’ve dug through my history to reply to it today.

@rako, this is unacceptable. Let’s remove the mention of AI to see if you can get some perspective… Imagine this exchange:

P1: I’ve been cooking for the homeless but it’s taking up a lot of my time and energy. Is it ok to use shop-bought meals?

P2: You being weary of cooking is belittling the homeless! People like you are what’s wrong with society.

I hope you can agree that this is unfair, and unhinged. It’s also not mischaracterising what you wrote.

@Gonzako you don’t seem to have minded rako trying to shame you, but they were way out of line.

rako@tarte.nuage-libre.fr · 2 months ago

I’m sorry but you are just showing you don’t know what AI is about. AI isn’t a shop-bought meal. It’s paying a white supremacist who pinky-promises you he’ll feed people, you just have to give him money. The white supremacist chooses what he wants to do with the money: maybe he’ll feed people because they’re white, maybe he’ll beat people because they’re black, but one thing is for sure: he will always work for his own benefit. Not for others, not for you, and certainly not for disabled people.

The correct comparison with AI is not, as many people say, a neutral tool. AI is a political project aiming at domination of a large part of the population. The apt comparison is slavery. Yes, it can be very useful! It’s free workforce, you don’t need to argue with it, concede anything, and things just get done. Slavery is fine if you’re part of the dominating part of the population, just like AI is fine if you’re part of the dominating part of the population. If you’re on the other side you will always be exploited, dehumanized, tortured (yes, subjecting people to constant horrors in the name of “training” is torture)

Let’s redo your analogy now:

I’ve been cooking for the homeless but I’m getting tired. Is it ok to ask slaves to cook meals for me so I can give the meals to homeless ?

Slavery, and AI, isn’t going to help the homeless or the disabled. Destroying the earth, appropriating others’ art and work and knowledge for personal profit is not helping, it’s actively hurting.

What is at stake here, really, is your own appreciation of the goods vs the bads of AI. If the literal anti-democratic project is acceptable because it makes you feel like a good person (“I’m helping people!”) then there is a lot of work to be done to unravel that. When your personal opinion of yourself is more important than the actual good you might do, something is wrong.

Tamlyn@lemmy.zip · 2 months ago

A lot of artists don’t want their art to be used for AI. You can’t prevent that if you let AI summarize your images. So don’t use AI for that.

Lumidaub@feddit.org · 2 months ago

Those are different mechanisms. Object recognition doesn’t mean the AI is now trained on the image and can reproduce it (which is btw why AI can still “visually” recognise what’s in an image that has been nightshaded/glazed).

Sir. Haxalot@nord.pub · 2 months ago

This is true, but it’s also important to remember that if you use an AI model hosted by the same party that trains it, it’s likely that they will pass any data you input to the training stage. Unless you have an enterprise contract regulating training use.

OP mentioned he will use a self-hosted LLM though, and in that case there’s no risk of the data being used for training.

Lumidaub@feddit.org · 2 months ago

I mean, if you put any image online that hasn’t been protected/poisoned in some way, you have to (unfortunately) assume it’s in some AI’s training data anyway. If the tradeoff for a useful description (! See my other comments about the lack of usefulness) is that an image is also fed into one more training corpus, that would be worth a thought, imho. If the image is protected/poisoned, I’d indeed encourage this whole hypothetical process, just to further sabotage the data.

Gonzako@lemmy.world · 2 months ago

I was actually thinking of using a self-hosted LLM for these tasks. I wanna dig again into it and I got access to computers on the cheap

Rain World: Slugcat Game@lemmy.world · 28 days ago

LLMs can’t see. are you talking about one with a vision thingamabob bolted on? you could just use ocr

Auster@thebrainbin.org · 2 months ago

Imo it’s a good use. But do make sure you read the outputs thoroughly. Even hand-made OCR tools can go crazy sometimes. Also if the AI can be fully offline / self-hosted, that’s even better imo.

forestbeasts@pawb.social · 2 months ago

Do not.

Please just don’t.

People (hi I’m people) need what the image IS, what’s important about it, why you included it. Not just what some slop generator shat out about it.

Better to have nothing, which is at least honest, than to have something that PURPORTS to have meaning but then just, doesn’t.

– Frost

Doorknob@lemmy.world · 2 months ago

By transcribing, do you mean describing what is in a picture, or transcribing text in a picture?

For the former, I can’t really imagine an image you couldn’t describe for accessibility within a sentence, and for the latter, OCR could do the job equally well.

I’m not saying this to just push the view that neural networks are no good for anything btw. For translation, for example, or text to speech/speech to text, I genuinely think they’re a revelation, and they need very little compute to perform those functions.

Rain World: Slugcat Game@lemmy.world · 28 days ago

quediuspayu@lemmy.dbzer0.com · 2 months ago

If you can run it on your computer for a job that you would do anyway, I don’t see why not

qaz@lemmy.world · edit-2 2 months ago

I’d say go ahead but make sure it produces accurate enough results and make sure to add something like [AI Transcribed] in front so people can take the potential for additional errors into consideration when reading it.

Also, if you’re using an online service make sure you’re using something that doesn’t use it as training data. Many (probably almost all) artists / photographers won’t appreciate that.
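
For what it’s worth, the marker part is easy to automate. A minimal sketch in Python, assuming the transcript is just a plain string; `AI_MARKER` and `mark_ai_transcript` are made-up names for illustration, not part of Lemmy or any client:

```python
# Hypothetical helper: prefix an AI-generated transcript with a visible marker.
AI_MARKER = "[AI Transcribed]"


def mark_ai_transcript(text: str, marker: str = AI_MARKER) -> str:
    """Return the transcript with the marker in front, added at most once."""
    cleaned = text.strip()
    # Don't stack the marker if the text has already been tagged.
    if cleaned.startswith(marker):
        return cleaned
    # The final strip() avoids a trailing space when the transcript is empty.
    return f"{marker} {cleaned}".strip()


if __name__ == "__main__":
    print(mark_ai_transcript("Two cats asleep on a sofa."))
```

Running it on text that is already tagged leaves it unchanged, so it’s safe to apply more than once. You’d still have to check the description itself by hand.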

FaceDeer@fedia.io · 2 months ago

Give it a test and see how accurate it is, if it’s good enough then go ahead. People have been using AI-based OCR for literal decades already, nothing has fundamentally changed. There’s just a sudden moral panic about it lately.

Rain World: Slugcat Game@lemmy.world · 28 days ago

it’s not a sudden, unexplained moral panic, it’s a bubble with billions being invested in for no return, and ai chows down on those billions well, thus giant, unexplainable, strikingly accurate ai, that investors want put in every facet of your life. and no one! likes it!!