The DeepSeek team just published a paper on Manifold-Constrained Hyper-Connections (mHC). It addresses a pretty specific bottleneck we are seeing with recent attempts to scale the residual stream.

The core issue they are tackling is that while widening the residual stream (Hyper-Connections, or HC) buys you better performance by adding more information capacity, it usually breaks the identity-mapping property that makes ResNets and Transformers trainable in the first place. If you let those connection matrices learn freely, signal magnitudes drift out of control as the network gets deeper, which leads to exploding gradients.
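To make that concrete, here's a tiny toy demo (my own illustration in PyTorch, not anything from the paper): stack a few layers of free-form mixing matrices on the widened stream and watch the signal scale run away.

```python
import torch

torch.manual_seed(0)
n_streams, depth = 4, 32
x = torch.randn(n_streams, 512)  # 4 residual streams of toy 512-dim activations

# Free-form mixing: each "layer" multiplies the streams by an unconstrained
# matrix, so the signal norm drifts exponentially as depth grows.
for layer in range(1, depth + 1):
    x = torch.randn(n_streams, n_streams) @ x
    if layer % 8 == 0:
        print(f"layer {layer:2d}: signal norm ~ {x.norm():.2e}")
```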

Their solution is actually quite elegant. They force the learnable matrices to live on a particular manifold, the Birkhoff polytope. Practically, this means they use the Sinkhorn-Knopp algorithm to ensure the connection matrices are “doubly stochastic,” meaning all rows and columns sum to 1. This is clever because it turns signal propagation into a weighted average rather than an unbounded linear transformation, which preserves the signal mean and keeps gradient norms stable even in very deep networks.
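Here's a minimal sketch of that projection (again my own toy PyTorch version of Sinkhorn-Knopp, not the paper's kernel, and the names are made up). Mixing the streams through the resulting doubly stochastic matrices keeps the signal scale bounded at any depth:

```python
import torch

def sinkhorn(logits: torch.Tensor, n_iters: int = 20, eps: float = 1e-8) -> torch.Tensor:
    """Map an unconstrained n x n parameter matrix toward the Birkhoff polytope:
    alternately normalize rows and columns until both sum to (approximately) 1."""
    M = torch.exp(logits)  # keep all entries positive
    for _ in range(n_iters):
        M = M / (M.sum(dim=-1, keepdim=True) + eps)  # rows sum to 1
        M = M / (M.sum(dim=-2, keepdim=True) + eps)  # columns sum to 1
    return M

torch.manual_seed(0)
n_streams, depth = 4, 32
x = torch.randn(n_streams, 512)

# Doubly stochastic mixing is a weighted average of the streams (spectral norm <= 1),
# so unlike the unconstrained case the signal scale never blows up with depth.
for _ in range(depth):
    x = sinkhorn(torch.randn(n_streams, n_streams)) @ x
print(f"after {depth} layers: signal norm ~ {x.norm():.2e}")
```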

What I found most interesting, though, was the engineering side. These multi-stream ideas usually die because of memory bandwidth rather than FLOPs: widening the stream several-fold means every layer has to read and write that much more activation data, which creates a massive I/O bottleneck. They managed to get around this with some heavy kernel fusion and a modified pipeline schedule they call DualPipe to overlap communication with compute.

The results look solid. They trained a 27B model and showed that mHC matches the stability of standard baselines while keeping the performance gains of the wider connections. It only added about 6.7% time overhead compared to a standard baseline, which is a decent trade-off for the gains they are seeing on reasoning benchmarks like GSM8K and MATH. It basically makes the “wider residual stream” idea practical for actual large-scale pre-training.

Expanding the residual stream adds more pathways for information to flow, which helps with training on constrained hardware by decoupling the model’s capacity from its computational cost. Usually, if you want a model to be “smarter” or to carry more state from layer to layer, you have to increase the hidden dimension, which makes the attention and feed-forward layers roughly quadratically more expensive to run. The mHC approach lets you widen that information highway without touching the expensive compute layers: the extra connections are just small linear mixings, computationally negligible next to the heavy matrix multiplications in the rest of the network.
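Some rough back-of-the-envelope numbers to show just how lopsided that is (assumed sizes, not the paper's actual configuration):

```python
# Per-layer cost of the extra connections vs. the existing compute.
d_model, n_streams = 4096, 4

attn_params = 4 * d_model * d_model       # Q, K, V, O projections
ffn_params = 2 * d_model * (4 * d_model)  # up/down FFN projections
hc_params = 3 * n_streams * n_streams     # a few tiny n x n mixing matrices

print(f"attention + FFN per layer: {attn_params + ffn_params:,} parameters")
print(f"hyper-connection mixing:   {hc_params:,} parameters")

# For contrast: getting more capacity the old way, by doubling d_model,
# roughly quadruples the attention/FFN cost instead.
print(f"FFN if d_model doubled:    {2 * (2 * d_model) * (4 * 2 * d_model):,} parameters")
```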

They further combined this technique with a Mixture-of-Experts (MoE) architecture, which is the component that actually reduces the number of active parameters in any single forward pass. The mHC method ensures that even with that sparsity the signal remains stable and gradients have a mathematically sound path to flow through. To keep VRAM usage from exploding, the intermediate states of the extra streams are discarded during the forward pass and recomputed on the fly during the backward pass. Together this allows you to train a model that behaves like a much larger dense network while fitting into the memory constraints of cheaper hardware clusters.
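That recompute trick is essentially activation checkpointing applied to the widened streams. Here's a minimal sketch with PyTorch's built-in utility, where hc_block is a hypothetical stand-in for one mHC layer rather than anything from the paper:

```python
import torch
from torch.utils.checkpoint import checkpoint

def run_layer(hc_block: torch.nn.Module, streams: torch.Tensor) -> torch.Tensor:
    # Don't keep the widened-stream activations after the forward pass;
    # recompute them on the fly when the backward pass needs their gradients.
    # A bit of extra compute in exchange for a much smaller peak VRAM footprint.
    return checkpoint(hc_block, streams, use_reentrant=False)
```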

  • mub@lemmy.ml · 12 hours ago

    It is rare that I fail to get the gist of what is being said in these technical explanations, but this one has me actually wondering about the gist of the gist. Some of it made me feel like it was made up nonsense.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP) · 11 hours ago

      It seemed pretty clear to me. If you have any clue on the subject then you presumably know about the interconnect bottleneck in traditional large models. The data moving between layers often consumes more energy and time than the actual compute operations, and the surface area for data communication explodes as models grow to billions of parameters. The mHC paper introduces a new way to link neural pathways by constraining hyper-connections to a low-dimensional manifold.

      In a standard transformer architecture, every neuron in layer N potentially connects to every neuron in layer N+1. This is mathematically exhaustive, which makes it computationally inefficient. Manifold-constrained connections operate on the premise that most of this high-dimensional space is noise. DeepSeek basically found a way to significantly reduce networking bandwidth for a model by using manifolds to route communication.

      Not really sure what you think the made up nonsense is. 🤷

      • mub@lemmy.ml · 4 hours ago

        Thx. That is more helpful.

        I don’t actually think it was nonsense, it just sounded like it.

  • Paul Sutton (zleap)@techhub.social · 1 day ago

    @yogthos

    It looks like the West has been caught with its pants down. China is developing what seems to be far more efficient AI tech; perhaps while big tech is motivated by money and IP theft, China has just got on with developing better ideas.

    This also seems to be a major win for open source models, which is a good thing. Could also be a good thing for the EU, which wants to develop its own AI solutions to break away from US big tech.

        • just another dev@lemmy.my-box.dev · 21 hours ago

          Snark aside, thanks for clarifying which kind of IP theft was meant, because this is not the kind of IP theft that is normally associated with training models.

          It would have been incredibly impressive if they managed to train it without ~~stealing~~ acquiring tons of data.

            • ☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP) · 21 hours ago

            I’m personally against copyrights as a concept and absolutely don’t care about this aspect, especially when it comes to open models. The way I look at it is that the model is unlocking this content and making this knowledge available to humanity.

        • just another dev@lemmy.my-box.dev · 1 day ago

          I was thinking about the training data, of which you need massive amounts. And as far as I know, pretty much all companies have worked on a scraping basis, rather than paying for it (or even asking).

          What kind of ip theft were you thinking of?

            • hitmyspot@aussie.zone · 13 hours ago

            I was referring to both scraping to create the models and using the models to create infringing content.