130 Comments

  • karasaj - Wednesday, October 31, 2012 - link

    We should see what kinds of frames we get :)

    (Yes, you'd have a single-threaded CPU bottleneck, but I can dream.)
  • N4g4rok - Wednesday, October 31, 2012 - link

    I bet that microstutter's a bastard, though.
  • Alexvrb - Saturday, November 03, 2012 - link

    Put it in AFR mode! :P
  • hansmuff - Sunday, November 04, 2012 - link

    Hmm, with this kind of power, just render ALL POSSIBLE frames ahead for a full second, then flip the display to the framebuffer that corresponds to the gameplay :)
  • Rookierookie - Wednesday, October 31, 2012 - link

    Yes, but can it run Crysis?
  • SilthDraeth - Wednesday, October 31, 2012 - link

    Wrong question. The correct question is:

    Will it blend?
  • losttsol - Wednesday, October 31, 2012 - link

    Yes it can, as long as Crysis isn't running on top of Windows Vista.
  • inighthawki - Wednesday, October 31, 2012 - link

    What does Vista have to do with this?
  • RussianSensation - Wednesday, October 31, 2012 - link

    Over 9000 fps!

    Good to see GPUs gaining traction outside of video games, paving the way for their use as general-purpose devices that can benefit a wide variety of usage patterns outside of games :) Hopefully the profits from these will mean even better GPUs for us gamers down the line.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    You mean NVIDIA GPUs gaining traction, and far outperforming AMD cores.
  • UltraTech79 - Monday, November 05, 2012 - link

    It could simulate a CPU/GPU through Minecraft redstone that could play Crysis at 4K better than anything any of us have.
  • yottabit - Wednesday, October 31, 2012 - link

    Probably about 5-10% more than 4-way SLI. LOL
  • martixy - Thursday, November 01, 2012 - link

    And where exactly do you see a parallel between game code and a complex project like one of those?
  • karasaj - Thursday, November 01, 2012 - link

    I'm not sure if you're trolling or don't get it.
  • This Guy - Thursday, November 01, 2012 - link

    Looked on Bench. I can't find 18,688x Tesla K20s anywhere. I also looked for 18,688x AMD Opterons. This ain't like AnandTech; normally Bench is updated when the article is released.
  • jleach1 - Sunday, November 04, 2012 - link

    1 GPU per CPU. No SLI here.

    These clusters don't parallelize workloads the way SLI does.
  • lambchowder - Thursday, November 01, 2012 - link

    50 spreadsheets per second
  • mike55 - Wednesday, October 31, 2012 - link

    This is pretty awesome. I'm jealous you got to go. The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.

    Some of the videos seem to be stopping early when I play them, and I have to skip ahead a bit to continue watching.
  • Peanutsrevenge - Wednesday, October 31, 2012 - link

    I've had that problem with all YouTube videos when I watch the HD stream for a while.
    It's not specific to AnandTech at all, for me at least.

    This doesn't happen with <480p video.

    Nice to know it's not just me.
  • B3an - Wednesday, October 31, 2012 - link

    Same. For me, most YouTube vids will get about 75% of the way through and then stop.

    And great article.
  • Strunf - Wednesday, October 31, 2012 - link

    He's probably not telling the whole story; there's no way you could reduce the wire thickness by a factor of 20 or more just by increasing the voltage to 480V.
  • A5 - Wednesday, October 31, 2012 - link

    Uh, yes you can. Higher voltage = less current for the same power, which means you can use a thinner cable.
  • Kevin G - Wednesday, October 31, 2012 - link

    There is likely a reduction in size of the insulating layer due to lower amperage as well.
  • relztes - Wednesday, October 31, 2012 - link

    Voltage is 2.3 times higher, so current is 2.3 times lower for the same power. A wire 2.3x thinner (5.3x less cross-sectional area) will give the same power loss. Insulation thickness would be slightly higher because it's based on voltage, not current.
  • HighTech4US - Wednesday, October 31, 2012 - link

    > The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.

    V = voltage
    I = current
    R = resistance
    P = power

    P = V × I

    So if you double the voltage, you halve the current for the same amount of power.

    Power loss in the cables is I² × R. Since I is roughly halved at 480 volts, the power loss is roughly 1/4 (1/2 squared) as much.

    So they fixed a target power loss in the cables and reduced the conductor size (which increased the resistance) so that the thinner cables (at 480 volts) had the same loss as the thicker cables (at 208 volts).
  • Jaybus - Tuesday, February 19, 2013 - link

    A 480 Vrms circuit draws less than half the current of a 208 Vrms circuit at the same power level, so for the same power loss the wire's resistance can be more than five times higher. Resistance is the resistivity of the copper times the length divided by the cross-sectional area, so the cross-sectional area can be more than five times smaller. This means the diameter of the wire for 480 V can be less than half the diameter of the wire for 208 V.
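
    To make the arithmetic in this sub-thread concrete, here is a minimal Python sketch; the load, loss budget, and cable-run figures are made-up illustrative values (only the ratio matters):

        import math

        RHO_CU = 1.68e-8  # resistivity of copper, ohm*m

        def wire_diameter(volts, watts, loss_watts, length_m):
            # Diameter (m) of a copper conductor that dissipates
            # loss_watts over length_m meters while carrying the load.
            current = watts / volts                # P = V * I
            resistance = loss_watts / current**2   # P_loss = I^2 * R
            area = RHO_CU * length_m / resistance  # R = rho * L / A
            return 2 * math.sqrt(area / math.pi)

        # Same 10 kW load, same 100 W loss budget, 30 m run:
        d208 = wire_diameter(208, 10e3, 100, 30)   # ~3.9 mm
        d480 = wire_diameter(480, 10e3, 100, 30)   # ~1.7 mm
        print(d208 / d480)                         # 480/208, i.e. ~2.3x thinner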
  • ishbuggy - Wednesday, October 31, 2012 - link

    This is an awesome article, Anand! I would love to see more supercomputing coverage like this, and maybe some in-depth discussion of how supercomputing works and differs from traditional computing architectures. Thanks for the great article!
  • truman5 - Wednesday, October 31, 2012 - link

    +1

    I just registered as a user to say how awesome this article is!
  • itnAAnti - Friday, November 09, 2012 - link

    +1

    I also registered just to say that this is a great article! One of the best I have seen on AnandTech; keep up the awesome work. Perhaps you can look into the Adapteva Parallella project next!
  • mayankleoboy1 - Wednesday, October 31, 2012 - link

    Is there any scope for an FPGA, or a group of FPGAs, to replace standard algorithms with hardware implementations?

    Examples: Fourier transforms, matrix multiplication.
  • prashanth07 - Wednesday, October 31, 2012 - link

    Yes, there is significant research going on. In our lab we had a pretty big group working on using FPGAs for HPC. The RC (reconfigurable computing) supercomputer is called Novo-G; it was the world's biggest publicly known RC supercomputer.

    It is very small in physical size compared to the top conventional supercomputers, but for some specific compute requirements it comes close to beating them. There was a major upgrade planned around the time I was graduating, so it might be even better now.
    What exact types of computations? I don't remember very well (I didn't work on RC; I was mostly a s/w guy in the conventional HPC part of the lab), but you might be able to get some info by checking out a few posters or paper abstracts.

    See:
    http://chrec.org/
    http://hcs.ufl.edu/ (very outdated; we didn't have anyone updating this site regularly)
    http://www.alligator.org/news/campus/article_36cb1... (very abstract, low on specific info)
  • Guspaz - Wednesday, October 31, 2012 - link

    Just think: if Moore's Law holds for another few decades, you'll see this performance in a smartphone in 20-30 years...
  • Montrey - Saturday, November 03, 2012 - link

    http://www.top500.org/blog/2008/01/20/top500_proje...

    According to the paper, it takes 6 to 8 years for the #1 computer on the list to move to #500, and then another 8 to 10 years for that performance to be available in your average notebook computer. Not sure on notebook to smartphone, but it can't be very long.
  • Doh! - Wednesday, October 31, 2012 - link

    This kind of article keeps me coming back to anandtech.com. Awesome stuff.
  • bl4C - Wednesday, October 31, 2012 - link

    Indeed, I was thinking:
    "now THIS is an anandtech.com article"

    Great, thx!
  • gun_will_travel - Wednesday, October 31, 2012 - link

    With all the settings turned up.
  • dragonsqrrl - Wednesday, October 31, 2012 - link

    Anand, I just want to confirm the core count on the Tesla K20. So this means one of the 15 SMX blocks is disabled on the K20?
  • Ryan Smith - Wednesday, October 31, 2012 - link

    We're basing our numbers off of the figures published by HPCWire.

    http://www.hpcwire.com/hpcwire/2012-10-29/titan_se...

    For a given clock speed of 732MHz and DP performance of 1.3 TFLOPS, it has to be 14 SMXes. The math doesn't work for anything else.
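
    For reference, the arithmetic being alluded to, as a quick sketch (it assumes GK110's 64 FP64 units per SMX and 2 FLOPS per fused multiply-add, per NVIDIA's GK110 documentation):

        # Peak DP = SMX count * FP64 units per SMX * 2 (FMA) * clock
        clock_ghz = 0.732
        fp64_per_smx = 64

        for smx in (13, 14, 15):
            tflops = smx * fp64_per_smx * 2 * clock_ghz / 1000
            print(smx, round(tflops, 2))
        # 13 -> 1.22, 14 -> 1.31, 15 -> 1.4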
  • RussianSensation - Wednesday, October 31, 2012 - link

    The article only states a range for DP of 1.2-1.3 TFLOPS.

    The specification could be 705MHz with GPU Boost to 732MHz x 2496 CUDA cores ~ 1.22 TFLOPS.

    http://www.heise.de/newsticker/meldung/Finale-Spez...

    Not saying it can't be 2688 CUDA cores, but you are using the high end of the range when the article clearly lists a range of 1.2-1.3 TFLOPS. I don't think you can just assume it's 2688 without confirmation, given the range of values provided.
  • Ryan Smith - Wednesday, October 31, 2012 - link

    We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it.
  • RussianSensation - Wednesday, October 31, 2012 - link

    Hey Ryan, what about this?

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    The Jaguar is thus renamed into Titan, and the sheer numbers are quite impressive:
    46,645,248 CUDA Cores (yes, that's 46 million)
    299,008 x86 cores
    91.25 TB ECC GDDR5 memory
    584 TB Registered ECC DDR3 memory
    Each x86 core has 2GB of memory

    1 Node = the new Cray XK7 system, consists of 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.

    The Titan supercomputer has 18,688 nodes.

    46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
  • Ryan Smith - Thursday, November 01, 2012 - link

    Among other things: note that Titan has 6GB of memory per K20 (and this is published information).

    http://nvidianews.nvidia.com/Releases/NVIDIA-Power...

    "The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."

    18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB

    (Press agencies are bad about using powers of 10, hence "710" TB.)
  • Ryan Smith - Thursday, November 01, 2012 - link

    The 6GB number is also in the slide deck: http://images.anandtech.com/reviews/video/NVIDIA/T...
  • RussianSensation - Wednesday, October 31, 2012 - link

    Tom's Hardware reported that Titan Supercomputer Packs 46,645,248 Nvidia CUDA Cores
    http://www.tomshardware.com/news/oak-ridge-ORNL-nv...

    46,645,248 CUDA Cores / 18,688 Tesla K20s also gives 2,496 CUDA cores per GPU, instead of 2,688.
  • ypsylon - Wednesday, October 31, 2012 - link

    Great article. A fantastic way of showing us tiny PC users what really big stuff looks like. A data center is one thing, but my word, this stuff is, is... well, this is the Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the supercomputer centers, it is quite something. Enjoyed that very much!

    @Guspaz

    If we get that kind of performance in phones, then it's a really scary prospect. :D
  • twotwotwo - Wednesday, October 31, 2012 - link

    We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years. If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices isn't exactly what happens, we'll have *something* plenty exciting on the other end.

    *Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.

    I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available. It would have been hard, before 1980, to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or having ads matching your interests.

    I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
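
    The transistor math above checks out; a quick sketch, taking roughly 1.5 years per doubling as the comment does:

        count, doublings = 1e9, 0          # ~1-billion-transistor chips today
        while count < 128e12:              # "Titan-magnitude" transistor budget
            count *= 2
            doublings += 1
        print(doublings, doublings * 1.5)  # 17 doublings -> ~25 years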
  • JAH - Wednesday, October 31, 2012 - link

    Must've been a fun trip for a geek/nerd. I'm jealous!

    Question: what do they do with the old CPUs that get replaced? Resale, recycling, donation?
  • silverblue - Wednesday, October 31, 2012 - link

    I was wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.

    Okay, I've checked: these are 6274s, which are Interlagos, clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
  • Anand Lal Shimpi - Wednesday, October 31, 2012 - link

    It was an awesome trip, seriously one of the best. Talking to Dr. Messer was one of the highlights for sure; that guy is insanely smart and very passionate about his work.

    Old hardware is traded in when you order the next round of upgrades :)

    Take care,
    Anand
  • Jaybus - Tuesday, February 19, 2013 - link

    Yes, great work! I suggest seeing about a trip to IBM Research or HRL Labs to investigate the DARPA SyNAPSE project. That could be another really interesting trip and article.
  • Mumrik - Wednesday, October 31, 2012 - link

    I guess we're finally beyond the bad "But will it run Crysis?" jokes.

    This was pretty amazing to watch. The challenges of putting something together at that scale are fascinating and intimidating.
  • dishayu - Wednesday, October 31, 2012 - link

    I'm sad to have never even visited a datacenter. I would love to take a tour like this some day.

    Also, gaming has finally started paying off in the real world; that's pretty sweet. :D
  • poohbear - Wednesday, October 31, 2012 - link

    Sure, but can it play Crysis??
  • GTRagnarok - Wednesday, October 31, 2012 - link

    Wouldn't it be so much more power efficient if they were able to use Intel's chips? Maybe they will redesign the whole thing in the future.
  • A5 - Wednesday, October 31, 2012 - link

    It would, but you'd have to take that up with Cray.
  • Reikon - Wednesday, October 31, 2012 - link

    Did anyone else notice in the second picture of the Titan installation gallery that the guy is using a ridiculous amount of thermal paste for each CPU?
  • Ian Cutress - Wednesday, October 31, 2012 - link

    In this environment, where stability is key, he was probably taught that having a bit more is safer than having a bit less. No doubt the data center was designed around airflow software to ensure that heating issues do not arise based on an 'average' application of thermal material.
  • maximumGPU - Wednesday, October 31, 2012 - link

    Here's to us gamers for advancing science and making the world a better place.
    You're welcome!

    Awesome article.
  • piroroadkill - Wednesday, October 31, 2012 - link

    That sounds like a downgrade, no matter how you slice it.
  • extide - Wednesday, October 31, 2012 - link

    x2, I was thinking the same, especially at only 2.2GHz!! I bet they are ~flat on CPU power and all the gain is from the GPUs.
  • SunLord - Friday, November 02, 2012 - link

    HPC is highly multi-threaded by its very nature, which just happens to be about the only thing Bulldozer is somewhat good at.
  • Jorange - Wednesday, October 31, 2012 - link

    I wonder how many petaflops this beast would have achieved if it used Sandy Bridge EP-class chips? AnandTech's review of the Opteron 6276 vs. the Sandy Bridge Xeon EP showed that Intel was far more performant.
  • SunLord - Friday, November 02, 2012 - link

    I doubt it would make enough of a difference to be worth it, given the main focus is on the CUDA GPU compute side.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    The AMD crap cores probably cause huge bottlenecks and lag the entire system, and wind up as a large loss overall as they waste compute time.
  • Jorange - Wednesday, October 31, 2012 - link

    In a world in which millions of morons are enthralled by Honey Boo Boo and her band of genetic regressionists, it is great that scientists are advancing our understanding of the Universe. Without that 1%, one can only imagine the state our planet would be in.
  • Ian Cutress - Wednesday, October 31, 2012 - link

    I ported some Brownian motion code from CPU to GPU for my thesis and got a considerable increase (4000x over previously published data). The best thing was that the code scaled with GPUs. Having access to 20k GPUs with 2688 CUDA cores each would just be gravy, especially when simulating 10^12 and beyond independent particles.
  • maximumGPU - Wednesday, October 31, 2012 - link

    4000x?! I don't think I've ever seen such a speedup. Was that simply from one CPU to one GPU?
    I ported a Monte Carlo risk simulation (which also uses Brownian motion, although I suspect for different purposes than yours) and saw about a 300-400x speedup; I thought that was at the top end of what you can get in terms of speed increases.
  • Ian Cutress - Thursday, November 01, 2012 - link

    It helped that the previously published data was a few generations back, so I had some Moore's Law advantage. The type of simulation for that research was essentially dropped there and then because it was so slow, and no one had ever bothered to do it on newer hardware. I think a 2.2 GHz Nehalem single-core simulation of my code compared to a GTX 480 version of the code was a 350x jump or so. Make that 16 cores vs. 1 GPU (for a DP system) and it's more like 23x.
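
    For readers wondering why this workload maps so well to GPUs: every particle's random walk is independent, so the simulation is embarrassingly parallel. A toy NumPy sketch of the idea (not the commenters' actual code; all parameters are made up):

        import numpy as np

        def brownian_step(positions, dt, diffusion, rng):
            # Each particle advances independently: no particle reads another's
            # state, which is why this parallelizes trivially across GPU cores
            # (and across whole GPUs).
            sigma = np.sqrt(2.0 * diffusion * dt)
            return positions + rng.normal(0.0, sigma, positions.shape)

        rng = np.random.default_rng(42)
        pos = np.zeros((1_000_000, 3))     # a million particles in 3D
        for _ in range(1000):              # 1000 steps of dt = 1e-3
            pos = brownian_step(pos, dt=1e-3, diffusion=1.0, rng=rng)
        print(pos.std(axis=0))             # spread ~ sqrt(2*D*t) ~ 1.41 per axis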
  • Krysto - Wednesday, October 31, 2012 - link

    It's 46 million GPU cores:

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    This is embarrassing.
  • iMacmatician - Wednesday, October 31, 2012 - link

    No, it's not. BSN's numbers are incorrect.
  • MetalManTN - Wednesday, October 31, 2012 - link

    I've read articles on AnandTech for years, but I registered an account for the first time today to comment on how wonderful this article is. The scope of what is covered is nothing short of fascinating, and the quality of the writing and attention to detail is superb. Thank you!
  • Creig - Wednesday, October 31, 2012 - link

    And we already know the answer is 42.
  • spaghetti_taco - Wednesday, October 31, 2012 - link

    Very interesting article. I loved the 30,000-foot explanation of the supernova modeling; it really helped me understand in more concrete detail what types of things scientists are using these supercomputers for.

    One thing I'd love to see is more in-depth discussion of the networking. As you pointed out, the network connectivity is just as important as the data processing, but you really just glossed over it. At least something as simple as vendors, models, host bus adapters, etc.
  • WaitingForNehalem - Wednesday, October 31, 2012 - link

    Anand, you should have visited me at the University of Tennessee!
  • zero2dash - Wednesday, October 31, 2012 - link

    I want to work there. :D
    Holy Santa Claus shit. I think I'd need a motorized cart for the tour; I'd be too weak to walk.
    [drool]

    Great article and major kudos for all the photos...a few of these are gonna be desktop wallpapers. ;)
  • BSMonitor - Wednesday, October 31, 2012 - link

    If they used Sandy Bridge Xeons, that'd be about 4 megawatts and no giant pipes with coolant!!
  • Braincruser - Wednesday, October 31, 2012 - link

    And new motherboards, memory systems, optimizations... practically, they would have to exclude the GPUs to fit it into any realistic budget.
  • JMC2000 - Wednesday, October 31, 2012 - link

    The problem is, there more than likely was a queue of clients who couldn't wait for ORNL/Cray to completely replace every node, which would have taken much longer.
  • Death666Angel - Thursday, November 08, 2012 - link

    When they built this, the Intel stuff wasn't better than AMD's. And now they already have all this hardware, which is tuned for the AMD stuff. It wouldn't make sense to switch to Intel this time.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    It wouldn't make sense only because NVIDIA is smoking the daylights out of the barely-over-2-petaflops from the crap AMD CPUs, adding multiple DOZENS of petaflops.

    So neither Intel nor AMD can hang.
  • wwwcd - Wednesday, October 31, 2012 - link

    Moore's law will be broken...at basement and the above ;)
  • tomek1984 - Wednesday, October 31, 2012 - link

    Thus even four years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer is finally "yes, it can." LOL
  • davegraham - Wednesday, October 31, 2012 - link

    Anand,

    You missed a huge data item in your article. By saying it's "just a bunch of SATA drives," you completely glossed over the WAY those SATA drives are organized (by DDN). DDN uses a wide/shallow bus topology to keep parallel writes to the drives organized and processed in a VERY optimal manner. Consequently, they're able to ingest at over 6GB/s per head... now multiply that across the requirements from ORNL and you can see why this becomes important.

    Next time, don't just skip over it. ;)

    D
  • webmastir - Wednesday, October 31, 2012 - link

    $9 million a year to power it? Yikes.

    Either way, I'm glad to have this in my state :)
  • mikato - Wednesday, October 31, 2012 - link

    Lots of hydroelectric in Tennessee :)
  • bill.rookard - Wednesday, October 31, 2012 - link

    I just can't imagine trying to build a program that scales up to that kind of performance; it's just staggering.

    That being said, I have this little program I'd like to run on it... called... SkyNet....
  • mfenn - Wednesday, October 31, 2012 - link

    I want more coverage of big iron! Hope you talk about it in depth on the podcast as well.
  • harezzebra - Wednesday, October 31, 2012 - link

    Hi Anand,

    Please do an in-depth virtualization review, as you did earlier. Your reviews are a must for evaluating the latest virtualization offerings from VMware, Microsoft, and Citrix for unbiased decision making.

    Regards,
    Harsh
  • mdlam - Wednesday, October 31, 2012 - link

    Will it run Crysis?
  • tspacie - Wednesday, October 31, 2012 - link

    Did you get any information about the network (YARC-2, Gemini)? Cray's claim to fame has been its network architecture, which is supposed to be a key contributor to the actual performance of the supercomputer.
  • thebluephoenix - Wednesday, October 31, 2012 - link

    They should have used Radeon 7970s. You can buy six for the price of one K20. No ECC, though (and for that there's the FirePro S).
  • HighTech4US - Wednesday, October 31, 2012 - link

    Toy GPUs have no place in HPC computers.
  • thebluephoenix - Wednesday, October 31, 2012 - link

    A 1 TFLOPS double-precision toy?

    http://i.top500.org/system/177430
    http://i.top500.org/system/177154
  • garadante - Wednesday, October 31, 2012 - link

    You missed the point in the article saying ECC memory was a -must- for a usage scenario like this. With nearly 20,000 GPUs, and all of that information being continuously communicated between the GPU memory and the GPU itself, without ECC errors would pop up very quickly and would make useful computation nigh impossible.
  • HighTech4US - Thursday, November 01, 2012 - link

    Can you guarantee that the toy GPU you recommend would not produce a single error on a software run that takes 6 months?

    You may accept an occasional graphics glitch while gaming, but no HPC customer will.
  • RussianSensation - Wednesday, October 31, 2012 - link

    It's also about the specific software that works better with CUDA. GCN GPUs are no toys, but the software support is nowhere near as prevalent in the professional GPGPU space compared to what NV has accomplished. This makes a lot of sense, since NV essentially invented the GPGPU space starting with G80 in 2006. They spent a lot more money creating the CUDA ecosystem and making sure they were the pioneers in this space. Given the widespread adoption of CUDA and a proven track record of working with NV, larger companies are far more likely to go with NVIDIA.

    This is actually no different than what we saw in the Distributed Computing space. For more than half a decade, NV's GPUs were faster in many apps. As the DC community is more dynamic and adapts much more quickly to modern code and technologies, in the last 3 years almost all of the new DC projects have been dominated by AMD GPUs.

    On paper, the HD 7970 GE delivers 1.075 TFLOPS of DP, and a 1200MHz 7970 has 1.23 TFLOPS. Without software support it doesn't mean much in the professional space for now, but the horsepower is already there.
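
    For reference, the same peak-throughput arithmetic as in the K20 sketch above reproduces those Tahiti figures (2048 ALUs and 1/4-rate DP are Tahiti's published specs; the clocks are the ones named in the comment):

        # Peak DP for Tahiti: ALUs * 2 FLOPS (FMA) * clock / 4 (DP rate)
        for clock_mhz in (1050, 1200):   # 7970 GHz Edition, and a 1200MHz card
            tflops = 2048 * 2 * (clock_mhz / 1000) / 4 / 1000
            print(clock_mhz, round(tflops, 3))
        # 1050 -> 1.075, 1200 -> 1.229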
  • mikato - Wednesday, October 31, 2012 - link

    Is this the supercomputer that will also be crunching away on the massive amount of data NSA is storing on everyone from strategic points in the telecom backbone?
    http://www.wired.com/threatlevel/2012/03/ff_nsadat...
  • Luscious - Wednesday, October 31, 2012 - link

    I'm curious if they ever went near F@H during burn-in and testing to see how much PPD that supercomputer could do.
  • just4U - Wednesday, October 31, 2012 - link

    "The evolution of Cray's XT/XK lines simply stemmed from that point, with Opteron being the supported CPU of choice."

    -----

    I would have liked more of an explanation here. Does that mean that Intel's line doesn't work as well? Are there plans by Cray to move to Intel?

    Power draw must be key. I wonder what sort of power use they'd be looking at running Intel's processors.

    Great to see AMD in that supercomputer, though. I just have questions about future plans based on the current situation in the CPU market.
  • Th-z - Wednesday, October 31, 2012 - link

    Very nice article, and I love your last paragraph, Anand. It's a revelation. It is indeed incredible to think that when we wanted that 3D accelerator to play GLQuake, it turned the wheel for great things to come. Looking back, something as ordinary or insignificant as gaming actually paved the way to accelerating our knowledge today. This goes to show that even ordinary things can morph into great things one could never imagine. It humbles you not to look down on anything, and to be respectful in this intertwined world, the same way it humbles us as human beings as we learn more about the universe.
  • pman6 - Wednesday, October 31, 2012 - link

    So that's where all of AMD's revenue came from.

    I was wondering who was buying AMD products.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    What AMD revenue?

    Just look up and down, left and right here; the AMD fanboys are legion. Granted, they can barely pony up 10 cents a week, but after a few years they can buy two generations back.
  • lorribot - Wednesday, October 31, 2012 - link

    Wonder if PC game piracy will be blamed for the failure of the supercomputer industry?
  • Braincruser - Saturday, November 03, 2012 - link

    Well, you see, the more someone pirates games, the more money he has to invest in hardware, so the better the hardware gets. <- Nothing beats simple logic.
  • ClagMaster - Wednesday, October 31, 2012 - link

    I have been working with supercomputers for 25 years.

    Although parallelism is very important for processing large models, there is one important feature of Titan Mr. Anand failed to discuss, choosing instead to obsess about transistor counts and CPUs and GPUs.

    And that is how much memory per box is available. 96GB? 256GB? Of DDR3-1333 memory?

    The problem is usually memory for those large reactor or coupled neutron-gamma transport problems analyzed with Monte Carlo or advanced discrete ordinates, not the number of processors. You need lots of memory for the geometry, depletable materials, and cross-section data.

    And once the computing is done, how much space is available for storing the results? I have seen models so large that they run for 2 weeks with over 2000 processors, only to fail because the file storage system ran out of space for the output files.
  • garadante - Wednesday, October 31, 2012 - link

    You failed to read the entire article. Anand stated there was something like 32 GB of RAM per CPU and 6 GB per GPU (if I remember correctly, going off the top of my head), for a grand total of 710 TB of RAM, as well as 1 PB of HDD storage. Check back through the pages to find exactly what he posted.
  • chemist1 - Wednesday, October 31, 2012 - link

    So Sandy Bridge does ~160 GFLOPS on the LINPACK benchmark, while Titan should do ~20 PFLOPS, making it 125K times faster. 125K ~ 2^17, so with 17 doublings a PC will be as fast as Titan. If we assume 1.5 years per doubling, that gives us 25 years. And just imagine the capabilities of a 2037 supercomputer....
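
    The ratio arithmetic, as a quick sketch (the ~160 GFLOPS desktop figure is the commenter's estimate):

        import math
        doublings = math.log2(20e15 / 160e9)   # ~16.9, i.e. ~2^17
        print(round(doublings * 1.5, 1))       # ~25 years at 1.5 yrs/doubling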
  • pandemonium - Wednesday, October 31, 2012 - link

    What a treat for you to be able to witness this. Thanks for the adventurous article, Anand! :)
  • martixy - Thursday, November 01, 2012 - link

    Thank you for this article! It was absolutely awesome to read through it and a nice break from the usual consumer stuff.
    Faith in humanity restored... :)
  • bigboxes - Thursday, November 01, 2012 - link

    I want to see the Performance tab in Windows Task Manager! :o
  • Abi Dalzim - Thursday, November 01, 2012 - link

    We all know the answer is 42.
  • easp - Thursday, November 01, 2012 - link

    For all the people speculating or suggesting that they should have used AMD GPUs or Intel CPUs, I think you need to think more like engineers, and less like "cowboys."

    To get started, reread this:

    "By adding support for ECC, enabling C++ and easier Visual Studio integration, NVIDIA believes that Fermi will open its Tesla business up to a group of clients that would previously not so much as speak to NVIDIA. ECC is the killer feature there."

    Now, why on earth would ECC memory on a GPU (which, apparently, AMD wasn't offering) be important? The answer is simple: because a supercomputer that doesn't produce trustworthy results is worse than useless. Shaving some money off the power and cooling budget, or even a 50% boost to raw performance and/or price performance doesn't really matter if the results of calculations that take weeks or months to run can't be trusted.

    Since this machine gets much of its compute performance from GPU operations, it is essential that it use GPUs that support ECC memory to allow both detection and recovery from memory corruption.

    As to the CPUs, I'm not suggesting that Intel CPUs are significantly less computationally sound than AMD's, but Cray and ORNL already have extensive experience with AMD's CPUs and supporting hardware. Switching to Intel would almost certainly require additional validation work.

    And don't underestimate the effort that goes into validating or optimizing these systems. Street price on the raw components alone has to be tens of millions of dollars. You can bet there is a lot of time and effort spent making sure things work right before things make it to full-scale production use.

    I know a guy, a PhD in mathematics, who used to work for Cray. These days he's working for Boeing, where his full-time job, as best as I can understand it, is to make sure that some CFD code they run from NASA is used properly so the results can be trusted. When he worked at Cray, his job was much more technical: he hand-optimized the assembly code for critical portions of application code from Cray's clients so it ran optimally on their vector CPU architecture. When doing computation at this scale, things that are completely insignificant on individual consumer systems, or even enterprise servers, can be hugely important.
  • CeriseCogburn - Monday, November 05, 2012 - link

    I note that with 225,000-plus AMD CPU cores, they get barely over 2 petaflops.

    Add just 18,000-plus NVIDIA video cards, and they ACHIEVE 20+ PETAFLOPS.

    LOL - once again, AMD sucks, and NVIDIA does not.
  • Azethoth - Friday, November 02, 2012 - link

    So you are sitting at home playing Monopoly on your iMac?
  • 2kfire - Friday, November 02, 2012 - link

    Can someone ban this joker?
  • Daggarhawk - Friday, November 02, 2012 - link

    Anand, I LOVE this post. It's a breath of fresh air to see some of the real-world applications for all this awesome tech we love. The interviews with the scientists are especially fascinating and eye-opening. I love the use of video to hear the insights, affect, and passion of the researchers and see them at work. Please, more of this sort of thing!!
  • armandc001-tech lover - Saturday, November 03, 2012 - link

    Damn, what an article...!
  • philosofa - Saturday, November 03, 2012 - link

    Thank you Anand!

    I've been noting till I'm blue in the face that GK110 formed NVIDIA's backup plan, should the GCN/Kepler power ratio not have worked out as much to AMD's disadvantage as it did (presumably 'Big Fermi' was a similar action plan being enacted).

    It's not something I've seen anyone else say explicitly, so (confirmation bias aside) it's just lovely to hear that's your take too :)
  • galaxyranger - Sunday, November 04, 2012 - link

    I am not intelligent in any way but I enjoy reading the articles on this site a great deal. It's probably my favorite site.

    What I would like to know is how Titan compares in power to the CPU at the center of the starship Voyager.

    Also, surely a supercomputer like Titan is powerful enough to become self-aware, if it had the right software made for it?
  • Hethos - Tuesday, November 06, 2012 - link

    For your second question: if it had the right software, then any high-end consumer desktop PC could become self-aware. It would work rather sluggishly compared to some sci-fi AIs, like those in the Halo universe, but would potentially start learning and teaching itself.
  • Daggarhawk - Tuesday, November 06, 2012 - link

    Hethos, that is not by any stretch certain. Since "self-awareness" or "consciousness" has never been engineered or simulated, it is still quite uncertain what the specific requirements would be to produce it. Yet here you're not only postulating that all it would take is the right OS, but also how well it would perform. My guess is that Titan would be able to simulate a brain (and therefore be able to learn, think, dream, and do all the things that brains do) much sooner than it would /become/ "a brain." It took a 128-core computer a 10-hour run to render a few-minute simulation of a complete single-celled organism. It's hard to say how much more compute power it would take to fully simulate a brain and be able to interact with it in real time. As for other methods of AI, it may take totally different kinds of hardware and networking altogether.
  • quirksNquarks - Sunday, November 04, 2012 - link

    Thank You,

    This was a perfectly timed article, as people have forgotten why it is important that technology keeps pushing boundaries regardless of *daily use* stagnation.

    It's also a great example of why AMD offers 16-core chips. For these kinds of reasons! More cores on one chip means fewer chips need to be implemented, powered, tested, and maintained.

    An AMD 4-socket mobo offers 64 cores. A personal supercomputer. (Just think of how many they'll stuff full of ARM cores.)

    Why NVIDIA GPUs?
    a) Error Correction Code (ECC)
    b) CUDA

    As to the CPUs...

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $599 for every AMD 6274 chip (obviously they don't pay as much when ordering 300k).

    vs.

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $1329 for an Intel Sandy Bridge equivalent, which isn't really an equivalent considering these do NOT run in 4-socket designs (obviously a little less when ordering in bulk).

    Now multiply that price difference (the ratio) on the order of tens of THOUSANDS!!

    COMMON SENSE, people... Less money for MORE CORES, or more money for LESS CORES? Which road would YOU take, if you were footing the bill?

    But the biggest thing to consider...

    ORNL upgraded from Jaguar to Titan, which meant they ONLY needed a CHIP upgrade in that regard ((SAME SOCKET)). TRY THAT WITH INTEL > :P
  • phoenicyan - Monday, November 05, 2012 - link

    I'd like to see a description of the logical architecture. I guess it could be a 16x16x73 3D torus.
  • XyaThir - Saturday, November 10, 2012 - link

    Nice article; too bad there is nothing about the storage in this HPC cluster!
  • logain7997 - Tuesday, November 13, 2012 - link

    Imagine the PPD this baby could produce folding. 0.0
  • hyperblaster - Tuesday, December 04, 2012 - link

    In addition to the bit about ECC, NVIDIA really made headway over AMD primarily because of CUDA. NVIDIA specifically targeted a whole bunch of developers of popular academic software and loaned out engineers for free. Experienced devs from NVIDIA would actually do most of the legwork to port MPI code to CUDA, while AMD did nothing of the sort. Therefore, there is now a large body of well-optimized computational simulation software that supports CUDA (and not OpenCL). However, this is slowly changing, and OpenCL is catching on.
  • Jag128 - Tuesday, January 15, 2013 - link

    I wonder if it could play Crysis on full?
  • mikbe - Friday, June 28, 2013 - link

    I was actually surprised at how many actual times the word "actually" was actually used. Actually, the way it's actually used in this actual article it's actually meaningless and can actually be dropped, actually, most of the actual time.
