Hmm, with this kind of power, just render ALL POSSIBLE frames a full second ahead, and flip the display to the framebuffer that corresponds to the gameplay :)
Good to see GPUs gaining traction outside of videogames, paving the way for their use as general-purpose devices that can benefit a wide variety of workloads :) Hopefully the profits from these will mean even better GPUs for us gamers down the line.
Looked on Bench. I can't find 18,688x Tesla K20's anywhere. I also looked for 18,688x AMD Opterons. This ain't like AnandTech. Normally Bench is updated when the article is released.
This is pretty awesome. I'm jealous you got to go. The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.
Some of the videos seem to be stopping early when I play them, and I have to skip ahead a bit to continue watching.
Voltage is 2.3 times higher, so current is 2.3 times lower for the same power. A wire 2.3x thinner (5.3x less cross-sectional area) will give the same power loss. Insulation thickness would be slightly greater, because it's based on voltage, not current.
> The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.
V = Voltage, I = Current, R = Resistance, P = Power
P = V × I
So if you double the Voltage you halve the Current for the same amount of Power. (480 V is actually about 2.3x 208 V, so the current drops a bit more than half, but halving keeps the math simple.)
Power Loss (in the cables) is calculated as I² × R. Since I is about 1/2 at 480 Volts, the Power Loss is about 1/4 (1/2 squared) as much.
So they determined a fixed acceptable power loss in the cables and reduced the size of the cables (which increased the resistance) so that the thinner cables (at 480 volts) had the same loss as the thicker cables (at 208 volts).
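If it helps to see the numbers worked out, here's a quick Python sketch of the 480 V vs. 208 V math (the delivered power is an arbitrary illustrative value, not a figure from the video):

```python
# Back-of-the-envelope check of the 480 V vs. 208 V cable math above.
P = 100_000.0              # watts delivered (example value only)
V_low, V_high = 208.0, 480.0

I_low = P / V_low          # current drawn at 208 V
I_high = P / V_high        # current drawn at 480 V
ratio = I_low / I_high
print(f"Current is {ratio:.2f}x lower at 480 V")                # ~2.31x

# Cable loss is I^2 * R. Holding the loss constant, resistance may
# rise by the square of the current ratio...
r_scale = ratio ** 2
print(f"Resistance can rise {r_scale:.2f}x for the same loss")  # ~5.33x

# ...and resistance scales inversely with cross-sectional area, so the
# 480 V conductor needs ~5.3x less copper (~2.3x smaller diameter).
print(f"Cross-sectional area needed: {1 / r_scale:.3f}x")       # ~0.188x
```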
This is an awesome article, Anand! I would love to see more super-computing coverage like this, and maybe some in-depth discussion of how super-computing works and how it differs from traditional computing architectures. Thanks for the great article!
I also registered just to say that this is a great article! One of the best I have seen on AnandTech, keep up the awesome work. Perhaps you can look into the Adapteva Parallella project next!
Yes, there is significant research going on. In our lab we had a pretty big group working on using FPGAs for HPC. The RC (reconfigurable computing) based supercomputer is called Novo-G. It was the world's biggest publicly known RC supercomputer.
It is very small in physical size compared to some of the top conventional supercomputers, but for some specific compute requirements it comes close to beating top supercomputers. There was a major upgrade planned (around the time I was graduating), so it might be even better now. What exact types of computations? I don't remember very well (I didn't work on RC; I was mostly a s/w guy in the conventional HPC part of the lab), but you might be able to get some info by checking out a few posters or paper abstracts.
According to the paper, it takes 6 to 8 years for the #1 computer on the list to move to #500, and then another 8 to 10 years for that performance to be available in your average notebook computer. Not sure on notebook to smartphone, but it can't be very long.
Not saying it can't be 2688 CUDA cores, but you are using the high end of the range when the article clearly lists a range of 1.2-1.3 TFLOPS. I don't think you can just assume that it's 2688 without confirmation, given the range of values provided.
We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it.
The Jaguar is thus renamed Titan, and the sheer numbers are quite impressive:
46,645,248 CUDA cores (yes, that's 46 million)
299,008 x86 cores
91.25 TB ECC GDDR5 memory
584 TB Registered ECC DDR3 memory
Each x86 core has 2GB of memory
1 node in the new Cray XK7 system consists of one 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.
The Titan supercomputer has 18,688 nodes.
46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
"The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."
18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB
(Press agencies are bad about using powers of 10, hence "710" TB.)
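For anyone who wants to replay that arithmetic, a minimal Python sketch using the figures quoted in this thread (VRAM per node taken as 6GB to match the press release's 710 TB total):

```python
# Replaying the Titan node math from the comments above.
nodes = 18_688
cuda_cores_total = 46_645_248
x86_cores_per_node = 16
ram_per_node_gb = 32           # registered ECC DDR3 per node
vram_per_node_gb = 6           # GDDR5 per K20 card, per the press release total

print(cuda_cores_total // nodes)                      # 2496 CUDA cores per card
print(nodes * x86_cores_per_node)                     # 299008 x86 cores
print(nodes * (ram_per_node_gb + vram_per_node_gb))   # 710144 GB, i.e. "710 TB"
```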
Great article. A fantastic way of showing us tiny PC users what the really big stuff looks like. A data center is one thing, but my word, this stuff is, is... well, that is Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the supercomputer centers it is quite something. Enjoyed that very much!
@Guspaz
If we get that kind of performance in phones then it is really scary prospect. :D
We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years. If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices isn't exactly what happens, we'll have *something* plenty exciting on the other end.
*Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.
I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available. Before 1980, it would have been hard to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or seeing ads matched to your interests.
I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
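Replaying the Moore's Law arithmetic from a couple of comments up, as a quick Python sketch (assuming the usual ~18 months per doubling; purely illustrative):

```python
import math

# From ~1 billion transistors today to a Titan-scale ~128 trillion:
doublings = math.log2(128e12 / 1e9)   # ~17 doublings
years = doublings * 1.5               # ~18 months per Moore's Law step

print(f"{doublings:.0f} doublings, roughly {years:.0f} years")
```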
I was wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.
Okay, I've checked - these are 6274s, which are Interlagos and clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
It was an awesome trip, seriously one of the best. Talking to Dr. Messer was one of the highlights for sure, that guy is insanely smart and very passionate about his work.
Old hardware is traded in when you order the next round of upgrades :)
(Yes you'd have single threaded cpu bottleneck, but I can dream)