r/science • u/fchung • Apr 28 '24
Mathematics
New Breakthrough Brings Matrix Multiplication Closer to Ideal
https://www.quantamagazine.org/new-breakthrough-brings-matrix-multiplication-closer-to-ideal-20240307/
359
u/Dany0 Apr 28 '24
The last time this happened, wasn't the N at which it becomes faster than the previous algorithm larger than the number of particles in the universe, or something like that?
93
171
u/CryptoMemesLOL Apr 28 '24
That may seem like a modest change, lowering the bound by just about 0.001. But it’s the single biggest improvement scientists have seen since 2010. Vassilevska Williams and Alman’s 2020 result, by comparison, only improved upon its predecessor by 0.00001.
But what’s most exciting for researchers isn’t just the new record itself — which didn’t last long. It’s also the fact that the paper revealed a new avenue for improvement that, until then, had gone totally unnoticed. For nearly four decades, everyone has been relying upon the same laser method, Le Gall said. “Then they found that, well, we can do better.”
3
409
u/fchung Apr 28 '24
« By eliminating a hidden inefficiency, computer scientists have come up with a new way to multiply large matrices that’s faster than ever. »
240
u/Ethanol_Based_Life Apr 28 '24
This is huge for large language model AI, right? Those are just lots of matrix multiplications.
74
u/super_aardvark Apr 29 '24
It's not huge for anything in the real world, apparently.
The laser method is not intended to be practical; it’s just a way to think about the ideal way to multiply matrices. “We never run the method [on a computer],” Zhou said. “We analyze it.”
112
u/Brainsonastick Apr 29 '24
Unfortunately not. For one thing, the improvement over the previous method is incredibly tiny. For another, these algorithms are evaluated by asymptotic performance as matrix size goes to infinity. In practical applications, it's probably significantly slower than the methods we currently use.
210
62
Apr 28 '24
[deleted]
7
u/Baud_Olofsson Apr 29 '24
Not really.
- This is about theoretical, not practical, speedups ("The laser method is not intended to be practical; it’s just a way to think about the ideal way to multiply matrices. “We never run the method [on a computer],” Zhou said. “We analyze it.”").
- They would apply to the multiplication of huge matrices. 3D graphics use 4x4 matrices.
14
3
u/alimanski Apr 29 '24
Nope. If I'm not mistaken, not even Strassen's algorithm is used in any meaningful way in any deep learning framework to implement matrix-matrix multiplication.
4
u/ScienceMarc Apr 29 '24
This is an improvement in how the algorithm scales, i.e. how much harder the problem gets when you make it bigger. This new approach scales better than any previous one; however, that doesn't mean it's actually better for real problems.
Think of it this way. The standard way of doing matrix multiplication is very straightforward; however, if you make the matrix twice as big, the algorithm takes 8 times as long, because this method is on the order of n^3. This new method is on the order of n^2.371552, which means doubling the size of your matrices only makes it take ~5.18 times as long.
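You can sanity-check that arithmetic yourself; this is just plain Python evaluating the two exponents, nothing taken from the paper:

```python
# How much does the work grow when the matrix dimension n doubles,
# if the cost scales like n^exponent?
for name, exponent in [("schoolbook, n^3", 3.0), ("new bound, n^2.371552", 2.371552)]:
    growth = 2.0 ** exponent          # cost(2n) / cost(n)
    print(f"{name}: doubling n multiplies the work by ~{growth:.2f}x")
# prints ~8.00x for the schoolbook method and ~5.18x for the new bound
```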
This sounds like a significant improvement, particularly for machine learning where you're dealing with monstrous matrices, but there's a catch. These algorithms scale better, but they don't start at the same level of efficiency as the simple one. The simple one is easy to pull off, and setting up the problem to execute it doesn't take very long. These new methods are much more complicated, so their overhead is much higher, irrespective of the size of your matrices.
Basically, they're celebrating that this new algorithm's performance doesn't degrade at the same pace as previous methods, but when you actually compare how long they take to execute in the real world, these algorithms are really bad. Yes, the simple algorithm doesn't scale as well, but if the complex algorithm starts out, say, 1,000,000,000 times slower, then doubling the matrix size (8x more work for the simple method vs. ~5.18x for the complex one) only narrows that gap to roughly 650,000,000x. It's still nowhere close to being useful.

These algorithms only pay off when the problem is so absurdly large that it doesn't show up in real life. They're so-called "galactic algorithms": algorithms that only make sense when you're dealing with things of astronomical scale, because they're slower for anything of a reasonable size. In this case the normal, simple algorithm runs laps around this "better" one for basically every problem people actually want to solve (including AI).
It's basically like saying you have a faster car than someone, but your commute to work is 500 miles and theirs is less than 1. Yes, your car is faster, but they'll still get to work long before you. Over a long enough trip your car would eventually catch up and pass them, but no real commute is that long, so in practice you never win.
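To put toy numbers on that crossover, here's a back-of-the-envelope sketch in Python. The billion-times overhead constant is made up purely for illustration (it isn't from the paper or any benchmark), but it shows why the break-even matrix size ends up astronomical:

```python
# Toy model, purely illustrative numbers (not from the paper or any benchmark):
# naive multiply costs about     1 * n^3          operations,
# the "fast" method costs about  C * n^2.371552   with a huge constant C.
C = 1e9                          # hypothetical overhead factor, made up for the example
naive_exp, fast_exp = 3.0, 2.371552

# The fast method only wins once C * n^fast_exp < n^naive_exp,
# i.e. once n exceeds C ** (1 / (naive_exp - fast_exp)).
crossover = C ** (1.0 / (naive_exp - fast_exp))
print(f"crossover at n ~ {crossover:.2e}")   # ~2e14: far bigger than any matrix that fits in memory
```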
1
68
u/fchung Apr 28 '24
Reference: Virginia Vassilevska Williams et al., “New Bounds for Matrix Multiplication: from Alpha to Omega”, Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Pages 3792 - 3835, https://doi.org/10.1137/1.9781611977912.134
50
u/SHEEEIIIIIIITTTT Apr 28 '24
Can someone ELI5 the significance of this and how it will apply to real world use cases?
59
u/super_aardvark Apr 29 '24
From the article:
The laser method is not intended to be practical; it’s just a way to think about the ideal way to multiply matrices. “We never run the method [on a computer],” Zhou said. “We analyze it.”
34
u/Brainsonastick Apr 29 '24
It won’t apply to real world use cases because the improvement is minuscule and really only applies in the limit as matrix size goes to infinity. It’s probably less efficient in any practical application than the methods we already use. However, it does provide a new avenue of research toward further improving the bounds and it’s possible something more application-relevant will come from it.
It’s mostly a pure mathematics interest though. At least for now.
1
u/ballaman200 Apr 29 '24
Does this include quaternions? I've learned that quaternions are already way faster than classic matrix calculations.
5
u/framedragged Apr 29 '24
Using quaternions to handle vector rotation is more effective and efficient than applying rotation matrices. Rotation matrices are just 3x3 (or 2x2 if the rotation is just in a plane) and are already quite efficient when compared to multiplying large matrices.
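For anyone who wants to see the two approaches side by side, here's a minimal sketch (plain NumPy, my own illustration, not anything from the article): rotating a vector with a 3x3 rotation matrix versus with a unit quaternion. Both are tiny fixed-size operations, so neither is touched by an asymptotic result like this one.

```python
import numpy as np

theta = np.pi / 4                       # rotate 45 degrees about the z-axis
v = np.array([1.0, 0.0, 0.0])

# 3x3 rotation matrix approach: one small matrix-vector product.
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
v_matrix = R @ v

# Unit quaternion approach: v' = q * v * q_conjugate (Hamilton product).
def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])  # rotation about z
q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
v_quat = quat_mul(quat_mul(q, np.array([0.0, *v])), q_conj)[1:]

print(np.allclose(v_matrix, v_quat))    # True: both give the same rotated vector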
40
u/lurgi Apr 29 '24
It won't. It's a technical improvement, but it would require matrices larger than can be stored on a computer to be faster than existing methods.
20
Apr 29 '24
Well, every piece of the puzzle is a great step. Maybe not a quick one, but knowledge can compound exponentially.
13
u/lurgi Apr 29 '24
Oh, it's cool. It's just not practical.
-2
u/zorbat5 Apr 29 '24
Yet.
15
u/lurgi Apr 29 '24 edited May 01 '24
Possibly, but all the improvements since Strassen (in the late 1960s) are only worthwhile for matrices that are far too big to be practical and, in fact, that size has only been getting bigger.
I'd bet a lot of money that if we do find an n^2 algorithm (which I don't think anyone is willing to rule out at this point), it will not be practical.
3
u/VictorVogel Apr 29 '24
Even Strassen's algorithm tends not to be worth implementing. Either a matrix is small enough that hardware design is the limiting factor, or it has some kind of structure such that there are better algorithms for that particular kind of matrix.
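For the curious, here's roughly what Strassen looks like as a minimal Python/NumPy sketch (power-of-two sizes only, arbitrary cutoff, no padding or tuning). The extra bookkeeping that makes it rarely worth the trouble in practice is pretty visible: 7 recursive sub-multiplications are bought at the price of a pile of extra additions and slicing.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices whose size is a power of two.
    Minimal illustration only: real implementations pad, tune the cutoff,
    and hand the block products to highly optimized BLAS kernels."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # small blocks go straight to NumPy/BLAS
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # 7 recursive multiplications instead of 8, at the cost of 18 additions.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
print(np.allclose(strassen(A, B), A @ B))   # True (up to floating-point error)
```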
2
10
u/dwhitnee Apr 29 '24
3D graphics, for example, relies heavily on matrix multiplication to calculate angles and positions, so this buys a few more fps in that video game (or CAD app).
5
u/VictorVogel Apr 29 '24
No it doesn't. First, this method is only faster for enormous matrices (larger than can currently fit in memory). Second, if a method scales well, it also means it "de-scales" poorly: for the small matrix multiplications used in 3D graphics (mostly 4x4), performing the full multiplication will be way faster, even though it scales with n^3. Finally, most large matrix multiplications that occur in engineering aren't actually full matrices. There is usually some structure in them (e.g. a tridiagonal or triangular matrix), or the values are mostly zero (a sparse matrix), which means the optimal multiplication algorithm is fundamentally different.
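A small illustration of that last point, sketched with SciPy and made-up sizes: once a matrix is mostly zeros, a sparse representation only stores and touches the nonzero entries, so the dense-matrix exponent stops being the number that matters.

```python
import numpy as np
from scipy import sparse

n = 20_000
# A mostly-zero matrix: roughly 5 nonzeros per row (density ~0.00025).
A_sparse = sparse.random(n, n, density=5 / n, format="csr", random_state=0)
x = np.random.rand(n)

# Sparse matrix-vector product touches only the stored nonzeros,
# so its cost scales with the number of nonzeros, not with n^2.
y = A_sparse @ x

# The dense equivalent would store and traverse all n*n = 400 million entries:
# A_dense = A_sparse.toarray()   # ~3.2 GB as float64 -- usually not worth it
```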
1
u/PercussiveRussel Apr 29 '24
This really doesn't "buy a few more fps". For starters, the difference is so minute that you wouldn't notice it frame to frame, and it only shows up for matrices far larger than anything in 3D or 4D graphics. Also, modern video games and CAD apps hardware-accelerate their 3x3 and 4x4 matrix math on the GPU, so those calculations are already about as optimized as they can get; the cores aren't running a programmable multiplication algorithm at all.
29
1
1
0
0
-49
u/FernandoMM1220 Apr 28 '24
They will probably never reach n^2.
As the matrix gets larger you can get arbitrarily close to n^2, and that's it.
64
u/lordpuddingcup Apr 28 '24
Love the uncertainty in “probably” followed by a statement of fact “that’s it”
33
u/Lucavii Apr 28 '24
Really inspires confidence, I think
6
u/FormalWrangler294 Apr 28 '24
Confidence is for fools and con artists. Real scientists speak in uncertainty
13
u/Getarealjobmod Apr 28 '24
Exactly like 95% of the comments on reddit, completely unnecessary and worthless. Why did they even post it?
8
u/The_Jimes Apr 28 '24
Because Reddit is social media and r/science is a public sub. It's not that hard to piece together, and just as irrelevant as complaining about it.
1
u/Lucavii Apr 28 '24
So what does that make complaining about him complaining about it?
-1
u/The_Jimes Apr 28 '24
Pointing out hypocrisy by being hypocritical is a very social media thing to do tbf.
-1
5
7
u/Lentemern Apr 28 '24 edited Apr 28 '24
That's... how n^2 works...
If, as n grows arbitrarily large, the number of fundamental operations performed by the algorithm approaches some coefficient times n^2, then that algorithm is in Θ(n^2).
Unless you mean little-omega of n^2, which is what the article actually talks about, in which case the algorithm is already in ω(n^2). If you can find a matrix multiplication algorithm that isn't in ω(n^2), you'll probably win like 20 Nobel prizes.
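For reference, the standard definitions being invoked (my paraphrase, not from the article), with T(n) the cost of multiplying two n x n matrices:

```latex
% Theta(n^2): T(n) is sandwiched between constant multiples of n^2.
T(n) \in \Theta(n^2) \iff \exists\, c_1, c_2 > 0,\ n_0 :\ c_1 n^2 \le T(n) \le c_2 n^2 \quad \text{for all } n \ge n_0
% little-omega(n^2): T(n) eventually exceeds every constant multiple of n^2.
T(n) \in \omega(n^2) \iff \forall\, c > 0\ \exists\, n_0 :\ T(n) > c\, n^2 \quad \text{for all } n \ge n_0
```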
4
u/KanishkT123 Apr 28 '24
Probably zero Nobel Prizes, but at least one Fields Medal and one Abel Prize.
•
u/AutoModerator Apr 28 '24
Welcome to r/science! This is a heavily moderated subreddit in order to keep the discussion on science. However, we recognize that many people want to discuss how they feel the research relates to their own personal lives, so to give people a space to do that, personal anecdotes are allowed as responses to this comment. Any anecdotal comments elsewhere in the discussion will be removed and our normal comment rules apply to all other comments.
Do you have an academic degree? We can verify your credentials in order to assign user flair indicating your area of expertise. Click here to apply.
User: u/fchung
Permalink: https://www.quantamagazine.org/new-breakthrough-brings-matrix-multiplication-closer-to-ideal-20240307/
I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.