130 Comments

  • karasaj - Wednesday, October 31, 2012 - link

    We should see what kinds of frames we get :)

    (Yes, you'd have a single-threaded CPU bottleneck, but I can dream.)
  • N4g4rok - Wednesday, October 31, 2012 - link

    I bet that microstutter's a bastard, though.
  • Alexvrb - Saturday, November 03, 2012 - link

    Put it in AFR mode! :P
  • hansmuff - Sunday, November 04, 2012 - link

    Hmm, with this kind of power, just render ALL POSSIBLE frames ahead for a full second, then flip the display to the framebuffer that corresponds to the gameplay :)
  • Rookierookie - Wednesday, October 31, 2012 - link

    Yes, but can it run Crysis?
  • SilthDraeth - Wednesday, October 31, 2012 - link

    Wrong question. The correct question is:

    Will it blend?
  • losttsol - Wednesday, October 31, 2012 - link

    Yes it can, as long as Crysis isn't running on top of Windows Vista.
  • inighthawki - Wednesday, October 31, 2012 - link

    What does Vista have to do with this?
  • RussianSensation - Wednesday, October 31, 2012 - link

    Over 9000 fps!

    Good to see GPUs gaining traction outside of video games, paving the way for their use as general-purpose devices that can benefit a wide variety of usage patterns outside of games :) Hopefully the profits from these will mean even better GPUs for us gamers down the line.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    You mean NVIDIA GPUs gaining traction, and far outperforming AMD cores.
  • UltraTech79 - Monday, November 05, 2012 - link

    It could simulate a CPU/GPU through Minecraft redstone that could play Crysis at 4K better than anything any of us have.
  • yottabit - Wednesday, October 31, 2012 - link

    Probably about 5-10% more than 4-way SLI. LOL
  • martixy - Thursday, November 01, 2012 - link

    And where exactly do you see a parallel between game code and a complex project like one of those?
  • karasaj - Thursday, November 01, 2012 - link

    I'm not sure if you're trolling or don't get it.
  • This Guy - Thursday, November 01, 2012 - link

    Looked on Bench. I can't find 18,688x Tesla K20s anywhere. I also looked for 18,688x AMD Opterons. This ain't like AnandTech; normally Bench is updated when the article is released.
  • jleach1 - Sunday, November 04, 2012 - link

    1 GPU per CPU. No SLI here.

    These clusters don't parallelize workloads the way SLI does.
  • lambchowder - Thursday, November 01, 2012 - link

    50 spreadsheets per second
  • mike55 - Wednesday, October 31, 2012 - link

    This is pretty awesome. I'm jealous you got to go. The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.

    Some of the videos seem to be stopping early when I play them, and I have to skip ahead a bit to continue watching.
  • Peanutsrevenge - Wednesday, October 31, 2012 - link

    I've had that problem with all YouTube videos when I watch the HD stream for a while.
    It's not specific to AnandTech at all, for me at least.

    This doesn't happen with <480p video.

    Nice to know it's not just me.
  • B3an - Wednesday, October 31, 2012 - link

    Same. For me, most YouTube vids will get about 75% of the way through and then stop.

    And great article.
  • Strunf - Wednesday, October 31, 2012 - link

    He's probably not telling the whole story; there's no way you could reduce the wire thickness by a factor of 20 or more just by increasing the voltage to 480V.
  • A5 - Wednesday, October 31, 2012 - link

    Uh, yes you can. Higher voltage = less current for the same power, which means you can use a thinner cable.
  • Kevin G - Wednesday, October 31, 2012 - link

    There is likely a reduction in size of the insulating layer due to lower amperage as well.
  • relztes - Wednesday, October 31, 2012 - link

    Voltage is 2.3 times higher, so current is 2.3 times lower for the same power. A wire 2.3x thinner (5.3x less cross-sectional area) will give the same power loss. Insulation thickness would be slightly higher because it's based on voltage, not current.
  • HighTech4US - Wednesday, October 31, 2012 - link

    > The comment about the thickness requirement of the cables for 480V compared to 208V in the first power delivery video is staggering. I'm surprised there's such a difference.

    V = voltage
    I = current
    R = resistance
    P = power

    P = V × I

    So if you double the voltage, you halve the current for the same amount of power.

    Power loss in the cables is I² × R. Since I is roughly halved at 480 volts, the power loss is roughly 1/4 (1/2 squared) as much.

    So they fixed a target power loss in the cables and reduced the conductor size (which increased the resistance) so that the thinner cables (at 480 volts) had the same loss as the thicker cables (at 208 volts).
  • Jaybus - Tuesday, February 19, 2013 - link

    A 480 Vrms circuit draws less than half the current of a 208 Vrms circuit at the same power level, so for the same power loss the wire's resistance can be more than five times higher. Resistance is the resistivity of the copper times the length divided by the cross-sectional area, so the cross-sectional area can be more than five times smaller. This means the diameter of the wire for 480 V can be less than half the diameter of the wire for 208 V.
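
    To make the arithmetic in this sub-thread concrete, here is a minimal Python sketch; the load, loss budget, and cable-run figures are made-up illustrative values (only the ratio matters):

        import math

        RHO_CU = 1.68e-8  # resistivity of copper, ohm*m

        def wire_diameter(volts, watts, loss_watts, length_m):
            # Diameter (m) of a copper conductor that dissipates
            # loss_watts over length_m meters while carrying the load.
            current = watts / volts                # P = V * I
            resistance = loss_watts / current**2   # P_loss = I^2 * R
            area = RHO_CU * length_m / resistance  # R = rho * L / A
            return 2 * math.sqrt(area / math.pi)

        # Same 10 kW load, same 100 W loss budget, 30 m run:
        d208 = wire_diameter(208, 10e3, 100, 30)   # ~3.9 mm
        d480 = wire_diameter(480, 10e3, 100, 30)   # ~1.7 mm
        print(d208 / d480)                         # 480/208, i.e. ~2.3x thinner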
  • ishbuggy - Wednesday, October 31, 2012 - link

    This is an awesome article, Anand! I would love to see more supercomputing coverage like this, and maybe some in-depth discussion of how supercomputing works and differs from traditional computing architectures. Thanks for the great article!
  • truman5 - Wednesday, October 31, 2012 - link

    +1

    I just registered as a user to say how awesome this article is!
  • itnAAnti - Friday, November 09, 2012 - link

    +1

    I also registered just to say that this is a great article! One of the best I have seen on AnandTech; keep up the awesome work. Perhaps you can look into the Adapteva Parallella project next!
  • mayankleoboy1 - Wednesday, October 31, 2012 - link

    Is there any scope for an FPGA, or a group of FPGAs, to replace standard algorithms with hardware implementations?

    Examples: Fourier transforms, matrix multiplication.
  • prashanth07 - Wednesday, October 31, 2012 - link

    Yes, there is significant research going on. In our lab we had a pretty big group working on using FPGAs for HPC. The RC (reconfigurable computing) supercomputer is called Novo-G; it was the world's biggest publicly known RC supercomputer.

    It is very small in physical size compared to the top conventional supercomputers, but for some specific compute requirements it comes close to beating them. There was a major upgrade planned around the time I was graduating, so it might be even better now.
    What exact types of computations? I don't remember very well (I didn't work on RC; I was mostly a s/w guy in the conventional HPC part of the lab), but you might be able to get some info by checking out a few posters or paper abstracts.

    See:
    http://chrec.org/
    http://hcs.ufl.edu/ (very outdated; we didn't have anyone updating this site regularly)
    http://www.alligator.org/news/campus/article_36cb1... (very abstract, low on specific info)
  • Guspaz - Wednesday, October 31, 2012 - link

    Just think: if Moore's Law holds for another few decades, you'll see this performance in a smartphone in 20-30 years...
  • Montrey - Saturday, November 03, 2012 - link

    http://www.top500.org/blog/2008/01/20/top500_proje...

    According to the paper, it takes 6 to 8 years for the #1 computer on the list to move to #500, and then another 8 to 10 years for that performance to be available in your average notebook computer. Not sure on notebook to smartphone, but it can't be very long.
  • Doh! - Wednesday, October 31, 2012 - link

    This kind of article keeps me coming back to anandtech.com. Awesome stuff.
  • bl4C - Wednesday, October 31, 2012 - link

    Indeed, I was thinking:
    "now THIS is an anandtech.com article"

    Great, thx!
  • gun_will_travel - Wednesday, October 31, 2012 - link

    With all the settings turned up.
  • dragonsqrrl - Wednesday, October 31, 2012 - link

    Anand, I just want to confirm the core count on the Tesla K20. So this means one of the 15 SMX blocks is disabled on the K20?
  • Ryan Smith - Wednesday, October 31, 2012 - link

    We're basing our numbers off of the figures published by HPCWire.

    http://www.hpcwire.com/hpcwire/2012-10-29/titan_se...

    For a given clock speed of 732MHz and DP performance of 1.3 TFLOPS, it has to be 14 SMXes. The math doesn't work for anything else.
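
    For reference, the arithmetic being alluded to, as a quick sketch (it assumes GK110's 64 FP64 units per SMX and 2 FLOPS per fused multiply-add, per NVIDIA's GK110 documentation):

        # Peak DP = SMX count * FP64 units per SMX * 2 (FMA) * clock
        clock_ghz = 0.732
        fp64_per_smx = 64

        for smx in (13, 14, 15):
            tflops = smx * fp64_per_smx * 2 * clock_ghz / 1000
            print(smx, round(tflops, 2))
        # 13 -> 1.22, 14 -> 1.31, 15 -> 1.4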
  • RussianSensation - Wednesday, October 31, 2012 - link

    The article only states a range for DP of 1.2-1.3 TFLOPS.

    The specification could be 705MHz with GPU Boost to 732MHz x 2496 CUDA cores ~ 1.22 TFLOPS.

    http://www.heise.de/newsticker/meldung/Finale-Spez...

    Not saying it can't be 2688 CUDA cores, but you are using the high end of the range when the article clearly lists a range of 1.2-1.3 TFLOPS. I don't think you can just assume it's 2688 without confirmation, given the range of values provided.
  • Ryan Smith - Wednesday, October 31, 2012 - link

    We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it.
  • RussianSensation - Wednesday, October 31, 2012 - link

    Hey Ryan, what about this?

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    The Jaguar is thus renamed into Titan, and the sheer numbers are quite impressive:
    46,645,248 CUDA Cores (yes, that's 46 million)
    299,008 x86 cores
    91.25 TB ECC GDDR5 memory
    584 TB Registered ECC DDR3 memory
    Each x86 core has 2GB of memory

    1 Node = the new Cray XK7 system, consists of 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.

    The Titan supercomputer has 18,688 nodes.

    46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
  • Ryan Smith - Thursday, November 01, 2012 - link

    Among other things: note that Titan has 6GB of memory per K20 (and this is published information).

    http://nvidianews.nvidia.com/Releases/NVIDIA-Power...

    "The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."

    18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB

    (Press agencies are bad about using powers of 10, hence "710" TB.)
  • Ryan Smith - Thursday, November 01, 2012 - link

    The 6GB number is also in the slide deck: http://images.anandtech.com/reviews/video/NVIDIA/T...
  • RussianSensation - Wednesday, October 31, 2012 - link

    Tom's Hardware reported that Titan Supercomputer Packs 46,645,248 Nvidia CUDA Cores
    http://www.tomshardware.com/news/oak-ridge-ORNL-nv...

    46,645,248 CUDA Cores / 18,688 Tesla K20s also gives 2,496 CUDA cores per GPU, instead of 2,688.
  • ypsylon - Wednesday, October 31, 2012 - link

    Great article. A fantastic way of showing us tiny PC users what really big stuff looks like. A data center is one thing, but my word, this stuff is, is... well, this is the Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the supercomputer centers, it is quite something. Enjoyed that very much!

    @Guspaz

    If we get that kind of performance in phones, then it's a really scary prospect. :D
  • twotwotwo - Wednesday, October 31, 2012 - link

    We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years. If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices isn't exactly what happens, we'll have *something* plenty exciting on the other end.

    *Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.

    I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available. It would have been hard, before 1980, to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or having ads matching your interests.

    I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
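
    The transistor math above checks out; a quick sketch, taking roughly 1.5 years per doubling as the comment does:

        count, doublings = 1e9, 0          # ~1-billion-transistor chips today
        while count < 128e12:              # "Titan-magnitude" transistor budget
            count *= 2
            doublings += 1
        print(doublings, doublings * 1.5)  # 17 doublings -> ~25 years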
  • JAH - Wednesday, October 31, 2012 - link

    Must've been a fun trip for a geek/nerd. I'm jealous!

    Question: what do they do with the old CPUs that get replaced? Resale, recycling, donation?
  • silverblue - Wednesday, October 31, 2012 - link

    I was wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.

    Okay, I've checked: these are 6274s, which are Interlagos, clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
  • Anand Lal Shimpi - Wednesday, October 31, 2012 - link

    It was an awesome trip, seriously one of the best. Talking to Dr. Messer was one of the highlights for sure; that guy is insanely smart and very passionate about his work.

    Old hardware is traded in when you order the next round of upgrades :)

    Take care,
    Anand
  • Jaybus - Tuesday, February 19, 2013 - link

    Yes, great work! I suggest seeing about a trip to IBM Research or HRL Labs to investigate the DARPA SyNAPSE project. That could be another really interesting trip and article.
  • Mumrik - Wednesday, October 31, 2012 - link

    I guess we're finally beyond the bad "But will it run Crysis?" jokes.

    This was pretty amazing to watch. The challenges of putting something together at that scale are fascinating and intimidating.
  • dishayu - Wednesday, October 31, 2012 - link

    I'm sad to have never even visited a datacenter. I would love to take a tour like this some day.

    Also, gaming has finally started paying off in the real world; that's pretty sweet. :D
  • poohbear - Wednesday, October 31, 2012 - link

    Sure, but can it play Crysis??
  • GTRagnarok - Wednesday, October 31, 2012 - link

    Wouldn't it be so much more power efficient if they were able to use Intel's chips? Maybe they will redesign the whole thing in the future.
  • A5 - Wednesday, October 31, 2012 - link

    It would, but you'd have to take that up with Cray.
  • Reikon - Wednesday, October 31, 2012 - link

    Did anyone else notice in the second picture of the Titan installation gallery that the guy is using a ridiculous amount of thermal paste for each CPU?
  • Ian Cutress - Wednesday, October 31, 2012 - link

    In this environment, where stability is key, he was probably taught that having a bit more is safer than having a bit less. No doubt the data center was designed around airflow software to ensure that heating issues do not arise based on an 'average' application of thermal material.
  • maximumGPU - Wednesday, October 31, 2012 - link

    Here's to us gamers for advancing science and making the world a better place.
    You're welcome!

    Awesome article.
  • piroroadkill - Wednesday, October 31, 2012 - link

    That sounds like a downgrade, no matter how you slice it.
  • extide - Wednesday, October 31, 2012 - link

    x2, I was thinking the same, especially at only 2.2GHz!! I bet they are ~flat on CPU power and all the gain is from the GPUs.
  • SunLord - Friday, November 02, 2012 - link

    HPC is highly multi-threaded by its very nature, which just happens to be about the only thing Bulldozer is somewhat good at.
  • Jorange - Wednesday, October 31, 2012 - link

    I wonder how many petaflops this beast would have achieved if it used Sandy Bridge EP-class chips? AnandTech's review of the Opteron 6276 vs. the Sandy Bridge Xeon EP showed that Intel was far more performant.
  • SunLord - Friday, November 02, 2012 - link

    I doubt it would make enough of a difference to be worth it, given the main focus is on the CUDA GPU compute side.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    The AMD crap cores probably cause huge bottlenecks and lag the entire system, and wind up as a large loss overall as they waste compute time.
  • Jorange - Wednesday, October 31, 2012 - link

    In a world in which millions of morons are enthralled by Honey Boo Boo and her band of genetic regressionists, it is great that scientists are advancing our understanding of the Universe. Without that 1%, one can only imagine the state our planet would be in.
  • Ian Cutress - Wednesday, October 31, 2012 - link

    I ported some Brownian motion code from CPU to GPU for my thesis and got a considerable increase (4000x over previously published data). The best thing was that the code scaled with GPUs. Having access to 20k GPUs with 2688 CUDA cores each would just be gravy, especially when simulating 10^12 and beyond independent particles.
  • maximumGPU - Wednesday, October 31, 2012 - link

    4000x?! I don't think I've ever seen such a speedup. Was that simply from one CPU to one GPU?
    I ported a Monte Carlo risk simulation (which also uses Brownian motion, although I suspect for different purposes than yours) and saw about a 300-400x speedup; I thought that was at the top end of what you can get in terms of speed increases.
  • Ian Cutress - Thursday, November 01, 2012 - link

    It helped that the previously published data was a few generations back, so I had some Moore's Law advantage. The type of simulation for that research was essentially dropped there and then because it was so slow, and no one had ever bothered to do it on newer hardware. I think a 2.2 GHz Nehalem single-core simulation of my code compared to a GTX 480 version of the code was a 350x jump or so. Make that 16 cores vs. 1 GPU (for a DP system) and it's more like 23x.
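
    For readers wondering why this workload maps so well to GPUs: every particle's random walk is independent, so the simulation is embarrassingly parallel. A toy NumPy sketch of the idea (not the commenters' actual code; all parameters are made up):

        import numpy as np

        def brownian_step(positions, dt, diffusion, rng):
            # Each particle advances independently: no particle reads another's
            # state, which is why this parallelizes trivially across GPU cores
            # (and across whole GPUs).
            sigma = np.sqrt(2.0 * diffusion * dt)
            return positions + rng.normal(0.0, sigma, positions.shape)

        rng = np.random.default_rng(42)
        pos = np.zeros((1_000_000, 3))     # a million particles in 3D
        for _ in range(1000):              # 1000 steps of dt = 1e-3
            pos = brownian_step(pos, dt=1e-3, diffusion=1.0, rng=rng)
        print(pos.std(axis=0))             # spread ~ sqrt(2*D*t) ~ 1.41 per axis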
  • Krysto - Wednesday, October 31, 2012 - link

    It's 46 million GPU cores:

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    This is embarrassing.
  • iMacmatician - Wednesday, October 31, 2012 - link

    No, it's not. BSN's numbers are incorrect.
  • MetalManTN - Wednesday, October 31, 2012 - link

    I've read articles on AnandTech for years, but I registered an account for the first time today to comment on how wonderful this article is. The scope of what is covered is nothing short of fascinating, and the quality of the writing and attention to detail is superb. Thank you!
  • Creig - Wednesday, October 31, 2012 - link

    And we already know the answer is 42.
  • spaghetti_taco - Wednesday, October 31, 2012 - link

    Very interesting article. I loved the 30,000-foot explanation of the supernova modeling; it really helped me understand in more concrete detail what types of things scientists are using these supercomputers for.

    One thing I'd love to see is more in-depth discussion of the networking. As you pointed out, the network connectivity is just as important as the data processing, but you really just glossed over it. At least something as simple as vendors, models, host bus adapters, etc.
  • WaitingForNehalem - Wednesday, October 31, 2012 - link

    Anand, you should have visited me at the University of Tennessee!
  • zero2dash - Wednesday, October 31, 2012 - link

    I want to work there. :D
    Holy Santa Claus shit. I think I'd need a motorized cart for the tour; I'd be too weak to walk.
    [drool]

    Great article and major kudos for all the photos...a few of these are gonna be desktop wallpapers. ;)
  • BSMonitor - Wednesday, October 31, 2012 - link

    If they used Sandy Bridge Xeons, that'd be about 4 megawatts and no giant pipes with coolant!!
  • Braincruser - Wednesday, October 31, 2012 - link

    And new motherboards, memory systems, optimizations... practically, they would have to exclude the GPUs to fit it into any realistic budget.
  • JMC2000 - Wednesday, October 31, 2012 - link

    The problem is, there more than likely was a queue of clients who couldn't wait for ORNL/Cray to completely replace every node, which would have taken much longer.
  • Death666Angel - Thursday, November 08, 2012 - link

    When they built this, the Intel stuff wasn't better than AMD's. And now they already have all this hardware, which is tuned for the AMD stuff. It wouldn't make sense to switch to Intel this time.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    It wouldn't make sense only because NVIDIA is smoking the daylights out of the barely-over-2-petaflops from the crap AMD CPUs, adding multiple DOZENS of petaflops.

    So neither Intel nor AMD can hang.
  • wwwcd - Wednesday, October 31, 2012 - link

    Moore's law will be broken...at basement and the above ;)
  • tomek1984 - Wednesday, October 31, 2012 - link

    Thus even four years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer is finally "yes, it can." LOL
  • davegraham - Wednesday, October 31, 2012 - link

    Anand,

    You missed a huge data item in your article. By saying it's "just a bunch of SATA drives," you completely glossed over the WAY those SATA drives are organized (by DDN). DDN uses a wide/shallow bus topology to keep parallel writes to the drives organized and processed in a VERY optimal manner. Consequently, they're able to ingest at over 6GB/s per head... now multiply that across the requirements from ORNL and you can see why this becomes important.

    Next time, don't just skip over it. ;)

    D
  • webmastir - Wednesday, October 31, 2012 - link

    $9 million a year to power it? Yikes.

    Either way, I'm glad to have this in my state :)
  • mikato - Wednesday, October 31, 2012 - link

    Lots of hydroelectric in Tennessee :)
  • bill.rookard - Wednesday, October 31, 2012 - link

    I just can't imagine trying to build a program that scales up to that kind of performance; it's just staggering.

    That being said, I have this little program I'd like to run on it... called... SkyNet....
  • mfenn - Wednesday, October 31, 2012 - link

    I want more coverage of big iron! Hope you talk about it in depth on the podcast as well.
  • harezzebra - Wednesday, October 31, 2012 - link

    Hi Anand,

    Please do an in-depth virtualization review, as you did earlier. Your reviews are a must for evaluating the latest virtualization offerings from VMware, Microsoft, and Citrix for unbiased decision making.

    Regards,
    Harsh
  • mdlam - Wednesday, October 31, 2012 - link

    Will it run Crysis?
  • tspacie - Wednesday, October 31, 2012 - link

    Did you get any information about the network (YARC-2, Gemini)? Cray's claim to fame has been its network architecture, which is supposed to be a key contributor to the actual performance of the supercomputer.
  • thebluephoenix - Wednesday, October 31, 2012 - link

    They should have used Radeon 7970s. You can buy six for the price of one K20. No ECC, though (and for that there's the FirePro S).
  • HighTech4US - Wednesday, October 31, 2012 - link

    Toy GPUs have no place in HPC computers.
  • thebluephoenix - Wednesday, October 31, 2012 - link

    A 1 TFLOPS double-precision toy?

    http://i.top500.org/system/177430
    http://i.top500.org/system/177154
  • garadante - Wednesday, October 31, 2012 - link

    You missed the point in the article saying ECC memory was a -must- for a usage scenario like this. With nearly 20,000 GPUs, and all of that information being continuously communicated between the GPU memory and the GPU itself, without ECC errors would pop up very quickly and would make useful computation nigh impossible.
  • HighTech4US - Thursday, November 01, 2012 - link

    Can you guarantee that the toy GPU you recommend would not produce a single error on a software run that takes 6 months?

    You may accept an occasional graphics glitch while gaming, but no HPC customer will.
  • RussianSensation - Wednesday, October 31, 2012 - link

    It's also about the specific software that works better with CUDA. GCN GPUs are no toys, but the software support is nowhere near as prevalent in the professional GPGPU space compared to what NV has accomplished. This makes a lot of sense, since NV essentially invented the GPGPU space starting with G80 in 2006. They spent a lot more money creating the CUDA ecosystem and making sure they were the pioneers in this space. Given the widespread adoption of CUDA and a proven track record of working with NV, larger companies are far more likely to go with NVIDIA.

    This is actually no different than what we saw in the Distributed Computing space. For more than half a decade, NV's GPUs were faster in many apps. As the DC community is more dynamic and adapts much more quickly to modern code and technologies, in the last 3 years almost all of the new DC projects have been dominated by AMD GPUs.

    On paper, the HD 7970 GE delivers 1.075 TFLOPS of DP, and a 1200MHz 7970 has 1.23 TFLOPS. Without software support it doesn't mean much in the professional space for now, but the horsepower is already there.
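
    For reference, the same peak-throughput arithmetic as in the K20 sketch above reproduces those Tahiti figures (2048 ALUs and 1/4-rate DP are Tahiti's published specs; the clocks are the ones named in the comment):

        # Peak DP for Tahiti: ALUs * 2 FLOPS (FMA) * clock / 4 (DP rate)
        for clock_mhz in (1050, 1200):   # 7970 GHz Edition, and a 1200MHz card
            tflops = 2048 * 2 * (clock_mhz / 1000) / 4 / 1000
            print(clock_mhz, round(tflops, 3))
        # 1050 -> 1.075, 1200 -> 1.229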
  • mikato - Wednesday, October 31, 2012 - link

    Is this the supercomputer that will also be crunching away on the massive amount of data NSA is storing on everyone from strategic points in the telecom backbone?
    http://www.wired.com/threatlevel/2012/03/ff_nsadat...
  • Luscious - Wednesday, October 31, 2012 - link

    I'm curious if they ever went near F@H during burn-in and testing to see how much PPD that supercomputer could do.
  • just4U - Wednesday, October 31, 2012 - link

    "The evolution of Cray's XT/XK lines simply stemmed from that point, with Opteron being the supported CPU of choice."

    -----

    I would have liked more of an explanation here. Does that mean that Intel's line doesn't work as well? Are there plans by Cray to move to Intel?

    Power draw must be key. I wonder what sort of power use they'd be looking at running Intel's processors.

    Great to see AMD in that supercomputer, though. I just have questions about future plans based on the current situation in the CPU market.
  • Th-z - Wednesday, October 31, 2012 - link

    Very nice article, and I love your last paragraph, Anand. It's a revelation. It is indeed incredible to think that when we wanted that 3D accelerator to play GLQuake, it turned the wheel for great things to come. Looking back, something as ordinary or insignificant as gaming actually paved the way to accelerating our knowledge today. This goes to show that even ordinary things can morph into great things one could never imagine. It humbles you not to look down on anything, and to be respectful in this intertwined world, the same way it humbles us as human beings as we learn more about the universe.
  • pman6 - Wednesday, October 31, 2012 - link

    So that's where all of AMD's revenue came from.

    I was wondering who was buying AMD products.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    What AMD revenue?

    Just look up and down, left and right here; the AMD fanboys are legion. Granted, they can barely pony up 10 cents a week, but after a few years they can buy two generations back.
  • lorribot - Wednesday, October 31, 2012 - link

    Wonder if PC game piracy will be blamed for the failure of the supercomputer industry?
  • Braincruser - Saturday, November 03, 2012 - link

    Well, you see, the more someone pirates games, the more money he has to invest in hardware, so the better the hardware gets. <- Nothing beats simple logic.
  • ClagMaster - Wednesday, October 31, 2012 - link

    I have been working with supercomputers for 25 years.

    Although parallelism is very important for processing large models, there is one important feature of Titan Mr. Anand failed to discuss, choosing instead to obsess about transistor counts and CPUs and GPUs.

    And that is how much memory per box is available. 96GB? 256GB? Of DDR3-1333 memory?

    The problem is usually memory for those large reactor or coupled neutron-gamma transport problems analyzed with Monte Carlo or advanced discrete ordinates, not the number of processors. You need lots of memory for the geometry, depletable materials, and cross-section data.

    And once the computing is done, how much space is available for storing the results? I have seen models so large that they run for 2 weeks with over 2000 processors, only to fail because the file storage system ran out of space for the output files.
  • garadante - Wednesday, October 31, 2012 - link

    You failed to read the entire article. Anand stated there was something like 32 GB of RAM per CPU and 6 GB per GPU (if I remember correctly, going off the top of my head), for a grand total of 710 TB of RAM, as well as 1 PB of HDD storage. Check back through the pages to find exactly what he posted.
  • chemist1 - Wednesday, October 31, 2012 - link

    So Sandy Bridge does ~160 GFLOPS on the LINPACK benchmark, while Titan should do ~20 PFLOPS, making it 125K times faster. 125K ~ 2^17, so with 17 doublings a PC will be as fast as Titan. If we assume 1.5 years per doubling, that gives us 25 years. And just imagine the capabilities of a 2037 supercomputer....
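
    The ratio arithmetic, as a quick sketch (the ~160 GFLOPS desktop figure is the commenter's estimate):

        import math
        doublings = math.log2(20e15 / 160e9)   # ~16.9, i.e. ~2^17
        print(round(doublings * 1.5, 1))       # ~25 years at 1.5 yrs/doubling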
  • pandemonium - Wednesday, October 31, 2012 - link

    What a treat for you to be able to witness this. Thanks for the adventurous article, Anand! :)
  • martixy - Thursday, November 01, 2012 - link

    Thank you for this article! It was absolutely awesome to read through it and a nice break from the usual consumer stuff.
    Faith in humanity restored... :)
  • bigboxes - Thursday, November 01, 2012 - link

    I want to see the Performance tab in Windows Task Manager! :o
  • Abi Dalzim - Thursday, November 01, 2012 - link

    We all know the answer is 42.
  • easp - Thursday, November 01, 2012 - link

    For all the people speculating or suggesting that they should have used AMD GPUs or Intel CPUs, I think you need to think more like engineers, and less like "cowboys."

    To get started, reread this:

    "By adding support for ECC, enabling C++ and easier Visual Studio integration, NVIDIA believes that Fermi will open its Tesla business up to a group of clients that would previously not so much as speak to NVIDIA. ECC is the killer feature there."

    Now, why on earth would ECC memory on a GPU (which, apparently, AMD wasn't offering) be important? The answer is simple: because a supercomputer that doesn't produce trustworthy results is worse than useless. Shaving some money off the power and cooling budget, or even a 50% boost to raw performance and/or price performance doesn't really matter if the results of calculations that take weeks or months to run can't be trusted.

    Since this machine gets much of its compute performance from GPU operations, it is essential that it use GPUs that support ECC memory to allow both detection and recovery from memory corruption.

    As to the CPUs, I'm not suggesting that Intel CPUs are significantly less computationally sound than AMD's, but Cray and ORNL already have extensive experience with AMD's CPUs and supporting hardware. Switching to Intel would almost certainly require additional validation work.

    And don't underestimate the effort that goes into validating or optimizing these systems. Street price on the raw components alone has to be tens of millions of dollars. You can bet there is a lot of time and effort spent making sure things work right before things make it to full-scale production use.

    I know a guy, a PhD in mathematics, who used to work for Cray. These days he's working for Boeing, where his full-time job, as best as I can understand it, is to make sure that some CFD code they run from NASA is used properly so the results can be trusted. When he worked at Cray, his job was much more technical: he hand-optimized the assembly code for critical portions of application code from Cray's clients so it ran optimally on their vector CPU architecture. When doing computation at this scale, things that are completely insignificant on individual consumer systems, or even enterprise servers, can be hugely important.
  • CeriseCogburn - Monday, November 05, 2012 - link

    I note that with 225,000-plus AMD CPU cores, they get barely over 2 petaflops.

    Add just 18,000-plus NVIDIA video cards, and they ACHIEVE 20+ PETAFLOPS.

    LOL - once again, AMD sucks, and NVIDIA does not.
  • Azethoth - Friday, November 02, 2012 - link

    So you are sitting at home playing Monopoly on your iMac?
  • 2kfire - Friday, November 02, 2012 - link

    Can someone ban this joker?
  • Daggarhawk - Friday, November 02, 2012 - link

    Anand, I LOVE this post. It's a breath of fresh air to see some of the real-world applications for all this awesome tech we love. The interviews with the scientists are especially fascinating and eye-opening. I love the use of video to hear the insights, affect, and passion of the researchers and see them at work. Please, more of this sort of thing!!
  • armandc001-tech lover - Saturday, November 03, 2012 - link

    Damn, what an article...!
  • philosofa - Saturday, November 03, 2012 - link

    Thank you Anand!

    I've been noting till I'm blue in the face that GK110 formed NVIDIA's backup plan, should the GCN/Kepler power ratio not have worked out as much to AMD's disadvantage as it did (presumably 'Big Fermi' was a similar action plan being enacted).

    It's not something I've seen anyone else say explicitly, so (confirmation bias aside) it's just lovely to hear that's your take too :)
  • galaxyranger - Sunday, November 04, 2012 - link

    I am not intelligent in any way but I enjoy reading the articles on this site a great deal. It's probably my favorite site.

    What I would like to know is how Titan compares in power to the CPU at the center of the starship Voyager.

    Also, surely a supercomputer like Titan is powerful enough to become self-aware, if it had the right software made for it?
  • Hethos - Tuesday, November 06, 2012 - link

    For your second question: if it had the right software, then any high-end consumer desktop PC could become self-aware. It would work rather sluggishly compared to some sci-fi AIs, like those in the Halo universe, but would potentially start learning and teaching itself.
  • Daggarhawk - Tuesday, November 06, 2012 - link

    Hethos, that is not by any stretch certain. Since "self-awareness" or "consciousness" has never been engineered or simulated, it is still quite uncertain what the specific requirements would be to produce it. Yet here you're not only postulating that all it would take is the right OS, but also how well it would perform. My guess is that Titan would be able to simulate a brain (and therefore be able to learn, think, dream, and do all the things that brains do) much sooner than it would /become/ "a brain." It took a 128-core computer a 10-hour run to render a few-minute simulation of a complete single-celled organism. It's hard to say how much more compute power it would take to fully simulate a brain and be able to interact with it in real time. As for other methods of AI, it may take totally different kinds of hardware and networking altogether.
  • quirksNquarks - Sunday, November 04, 2012 - link

    Thank You,

    This was a perfectly timed article, as people have forgotten why it is important that technology keeps pushing boundaries regardless of *daily use* stagnation.

    It's also a great example of why AMD offers 16-core chips. For these kinds of reasons! More cores on one chip means fewer chips need to be implemented, powered, tested, and maintained.

    An AMD 4-socket mobo offers 64 cores. A personal supercomputer. (Just think of how many they'll stuff full of ARM cores.)

    Why NVIDIA GPUs?
    a) Error Correction Code (ECC)
    b) CUDA

    As to the CPUs...

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $599 for every AMD 6274 chip (obviously they don't pay as much when ordering 300k).

    vs.

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $1329 for an Intel Sandy Bridge equivalent, which isn't really an equivalent considering these do NOT run in 4-socket designs (obviously a little less when ordering in bulk).

    Now multiply that price difference (the ratio) on the order of tens of THOUSANDS!!

    COMMON SENSE, people... Less money for MORE CORES, or more money for LESS CORES? Which road would YOU take, if you were footing the bill?

    But the biggest thing to consider...

    ORNL upgraded from Jaguar to Titan, which meant they ONLY needed a CHIP upgrade in that regard ((SAME SOCKET)). TRY THAT WITH INTEL > :P
  • phoenicyan - Monday, November 05, 2012 - link

    I'd like to see a description of the logical architecture. I guess it could be a 16x16x73 3D torus.
  • XyaThir - Saturday, November 10, 2012 - link

    Nice article; too bad there is nothing about the storage in this HPC cluster!
  • logain7997 - Tuesday, November 13, 2012 - link

    Imagine the PPD this baby could produce folding. 0.0
  • hyperblaster - Tuesday, December 04, 2012 - link

    In addition to the bit about ECC, NVIDIA really made headway over AMD primarily because of CUDA. NVIDIA specifically targeted a whole bunch of developers of popular academic software and loaned out engineers for free. Experienced devs from NVIDIA would actually do most of the legwork to port MPI code to CUDA, while AMD did nothing of the sort. Therefore, there is now a large body of well-optimized computational simulation software that supports CUDA (and not OpenCL). However, this is slowly changing, and OpenCL is catching on.
  • Jag128 - Tuesday, January 15, 2013 - link

    I wonder if it could play Crysis on full?
  • mikbe - Friday, June 28, 2013 - link

    I was actually surprised at how many actual times the word "actually" was actually used. Actually, the way it's actually used in this actual article it's actually meaningless and can actually be dropped, actually, most of the actual time.
