
C++ vs C# performance: file I/O

Herb Sutter likes to point out how Console.WriteLine() calls a virtual method - ToString() - to illustrate the reliance on virtual methods in .NET vs direct method calls (through templates) in C++. He's done it on his blog, and more recently at Lang.NEXT.

What's being implied here of course is that .NET is generally slower than C++. While Sutter might have a point, text input/output is a poor example, because C++'s standard streams are slower than you can imagine. Much, much slower than .NET's. Raymond Chen and Rico Mariani famously competed at writing a fast English/Chinese dictionary in C++ vs C#; the C# version was several times faster until Raymond Chen ultimately scrapped all standard iostreams and wrote his own.

Out of curiosity, I decided to write a simple benchmark and see for myself. The operation would be this:
- read a chunky text file as a list or array of lines
- reverse each line
- write the results to the original file.

That, repeated multiple times and benchmarked.

The approach for .NET: File.ReadAllLines(), a for loop over the result, Array.Reverse, and File.WriteAllLines. That's the way I would write this procedure normally; it's as simple as possible, and I can't think of an obvious way of making it more efficient.

The approach for C++: C++ has std::reverse, which is the equivalent of Array.Reverse (but more generic); however, it doesn't have an equivalent of File.Read/Write-AllLines. I searched a bit online to see what was considered the standard approach, and eventually settled on opening an ifstream, reading the file line by line into a vector of strings using std::getline, and writing it out to an ofstream using operator <<.

If an obvious, faster approach exists for C++, I'm all ears. Note that storing the lines as a vector here is a requirement of the test, not an implementation detail (An array could be used, but then the number of lines would need to be known in advance. I don't see an easy way of doing that.) Of course, the test could be done line-by-line, avoiding the creation of multiple strings and a container - but the goal of the test is precisely to benchmark the performance of all these things. Admittedly, it benchmarks more than just I/O - but I don't do this every day so I might as well benchmark a few things together.

My first implementation was in C++/CLI, and for 20 iterations over a random ~175 KB JSON file I had, the results were:

standard cpp streams: 340 ms
.NET System.IO : 58 ms

Unsure whether these results were skewed towards .NET or C++ due to using the CLR switch, I created separate C++ and C# projects instead. The results:

C++: 134 ms
C#: 57 ms

Note that this is, of course, in Release mode and without a debugger attached. Just for fun, let's see what happens in the typical "F5" scenario: debugging the Debug build.

C++: 13653 ms (that's 13.6 seconds, yes!)
C#: 59 ms

This is consistent with other benchmarks I did in the past: C# is relatively unaffected by being built in Debug mode and having a debugger attached; C++ presents extreme discrepancies in that regard.

So there you have it. C# is twice as fast as C++ at reversing lines of text in a text file (and in debug it's 231 times as fast). So much for avoiding virtual method calls, eh?

I'm aware that a faster C++ version is possible. The C++ version I wrote is pure STL and very straightforward. It is of course possible to use faster libraries, or to write your own, but the purpose of the test is just to compare the standard facilities of both languages.

Update (2/5/2012): Thanks to Brandon Live for pointing out some inconsistencies between the two versions: notably, C++ counted time differently, which resulted in a very slight advantage for C#, and C++ had no "warm-up" call. I've updated the numbers and code, but the difference is small enough not to affect any of the conclusions: C# is comfortably twice as fast stand-alone and 200+ times as fast when debugged (and yes, that includes with "native debugging" on).

C# version:
using System;
using System.Diagnostics;
using System.IO;
namespace PerfTestCSharp {
	class Program {

		static void Main() {
			// Call once for warm-up
			CSharpPerformOperation();
			long csharpTime = 0;
			for (int i = 0; i < 20; ++i) {
				var sw = Stopwatch.StartNew();
				CSharpPerformOperation();
				csharpTime += sw.ElapsedMilliseconds;
			}
			Console.WriteLine("");
			Console.WriteLine("C# time: {0} ms", csharpTime);
			Console.ReadKey();
		}
		static void CSharpPerformOperation() {
			var lines = File.ReadAllLines("text.txt");
			for (int i = 0; i < lines.Length; ++i) {
				var charArr = lines[i].ToCharArray();
				Array.Reverse(charArr);
				lines[i] = new string(charArr);
			}
			File.WriteAllLines("text.txt", lines);
		}
	}
}

C++ version:
#include <vector>
#include <fstream>
#include <string>
#include <algorithm>
#include <iostream>
#include <windows.h>
#include <sstream>
using namespace std;
// implementation of a high-precision counter from http://stackoverflow.com/questions/1739259/how-to-use-queryperformancecounter
double PCFreq = 0.0;
__int64 CounterStart = 0;


void StartCounter()
{
	LARGE_INTEGER li;
	if(!QueryPerformanceFrequency(&li))
		cout << "QueryPerformanceFrequency failed!\n";
	PCFreq = double(li.QuadPart)/1000.0;
	QueryPerformanceCounter(&li);
	CounterStart = li.QuadPart;
}
double GetCounter()
{
	LARGE_INTEGER li;
	QueryPerformanceCounter(&li);
	return double(li.QuadPart-CounterStart)/PCFreq;
}

void CPPPerformOperation() {
	vector<string> lines;
	ifstream inFile("text.txt");
	string line;
	while(getline(inFile, line)) {
	  lines.push_back(line);
	}
	for (size_t i = 0; i < lines.size(); ++i) {
	  reverse(begin(lines[i]), end(lines[i]));
	}
	ofstream outFile("text.txt");
	for (auto it = begin(lines); it != end(lines); ++it) {
	  outFile << *it << "\n";
	}
}
int main() {
	// call once for warm-up...
	CPPPerformOperation();
	double totalTime = 0;

	for (int i = 0; i < 20; ++i) {
		StartCounter();
		CPPPerformOperation();
		totalTime += GetCounter();
	}
	cout << "CPP time : " << totalTime << " milliseconds.";
	system("pause");
}




I'm not even going to try to defend C++ iostreams; they've got to be the most over-designed, poorly implemented part of the C++ standard library (lots of virtual calls and memory allocations). It's well known that if you want to do fast file I/O in C++, you will end up needing to use the C I/O library.
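
For reference, here is a rough sketch of what that could look like for this particular test, using C stdio (fopen/fgets/fputs) instead of iostreams. This is just an illustration, not measured code: the buffer size is arbitrary and it assumes no line is longer than the buffer.

#include <algorithm>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
using namespace std;

void CStdioPerformOperation() {
	vector<string> lines;
	if (FILE* in = fopen("text.txt", "r")) {
		char buf[4096];                          // assumes no line exceeds this
		while (fgets(buf, sizeof buf, in)) {
			size_t len = strlen(buf);
			if (len > 0 && buf[len - 1] == '\n')
				--len;                           // strip the trailing newline
			lines.push_back(string(buf, len));
		}
		fclose(in);
	}
	for (size_t i = 0; i < lines.size(); ++i) {
		reverse(lines[i].begin(), lines[i].end());
	}
	if (FILE* out = fopen("text.txt", "w")) {
		for (size_t i = 0; i < lines.size(); ++i) {
			fputs(lines[i].c_str(), out);
			fputc('\n', out);
		}
		fclose(out);
	}
}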

I think what Herb was really trying to get at with the virtual dispatch is that with C++ templates you can implement something that avoids it.
Obviously you could have a class that implements a virtual ToString just like C# and Java, but there is also the possibility of a faster implementation (such as boost::lexical_cast or boost::spirit). Here, instead of having to go through the virtual dispatch (which could inhibit inlining), the compiler knows which conversion method needs to be called based on the class's type and can potentially inline it.
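
To make the difference concrete, here is a minimal sketch (the names Point, to_string_of, print_static and print_dynamic are made up for the illustration, not from any library):

#include <iostream>
#include <string>
using namespace std;

// A plain type with a free, non-virtual conversion function.
struct Point { int x, y; };
string to_string_of(const Point& p) {
	return "(" + to_string(p.x) + ", " + to_string(p.y) + ")";
}

// Template style: the call to to_string_of is resolved at compile time
// for each T, so the compiler sees the exact target and can inline it.
template <typename T>
void print_static(const T& value) {
	cout << to_string_of(value) << "\n";
}

// C#/Java style: every call goes through the vtable, which can inhibit inlining.
struct Printable {
	virtual ~Printable() {}
	virtual string ToString() const = 0;
};
void print_dynamic(const Printable& value) {
	cout << value.ToString() << "\n";
}

int main() {
	print_static(Point{3, 4});   // prints (3, 4); no virtual dispatch involved
}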

As a note, lexical_cast had a lot of performance problems in the past, solely because it was based on stringstream (which is part of the crap that is iostreams). Later Boost versions have spent time optimising the conversion routines to eliminate this dependency.

Anyway there are two problems with his comment: the first is that you get different behaviour when dealing with inheritance and the second is that this is a micro-optimisation you generally don't care about.
You're using all the notoriously slow stuff in C++ (like vectors). You could write your own fast data structure for holding the lines that would be many times faster than a vector.
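
For illustration only, a sketch of one shape such a structure could take (not the commenter's actual code): read the whole file into a single buffer and remember where each line starts, so there is one big allocation instead of one string per line.

#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>
using namespace std;

struct LineBuffer {
	string data;                            // the whole file in one allocation
	vector<pair<size_t, size_t>> lines;     // (offset, length) of each line

	explicit LineBuffer(const char* path) {
		ifstream in(path);                  // text mode, so "\r\n" becomes "\n"
		data.assign(istreambuf_iterator<char>(in), istreambuf_iterator<char>());
		size_t start = 0;
		for (size_t i = 0; i <= data.size(); ++i) {
			if (i == data.size() || data[i] == '\n') {
				lines.push_back(make_pair(start, i - start));
				start = i + 1;
			}
		}
		// Caveat: a file ending in '\n' produces one empty entry at the end.
	}
};

// A line is then reversed in place inside the shared buffer, e.g.:
//   reverse(buf.data.begin() + off, buf.data.begin() + off + len);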

Of course, it would complicate things and you might not be that much faster than C#, and yes, you wanted to use the standard facilities, but that's just not the way things work in real life. For most things C++ is now getting obsolete, but if you really know how to use it and you really need to get the most out of your hardware, it is still a good choice. Making a comparison like this is rather pointless imho; both have their strengths, weaknesses and their purpose.

If virtual calls are an issue with standard classes, create your own dummy derivative and seal it. The JIT knows what to do. If the type is a sealed class, it'll convert the virtual calls into standard ones.

Example:

class VirtualClass
{
	public virtual void SomeMethod() { /* ... */ }
}

sealed class SealedVirtualClass : VirtualClass
{
}

There. Instant optimization. (Of course, you'll have to clone the constructors as needed.)

Quote

Note that storing the lines as a vector here is a requirement of the test, not an implementation detail (An array could be used, but then the number of lines would need to be known in advance. I don't see an easy way of doing that.)

Why would the number of lines need to be known in advance? Or I guess you mean you'd need that for a fixed-size array.
Depending on the characteristics you expect to find in the files you're parsing, you could start with a fixed array on the stack and switch to a dynamic heap-allocated array if/when you run out of space there.

At the very least you should be calling vector<string>::reserve to cut down allocations.
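
If one wanted to try that, a possible heuristic (a sketch only; the average line length of 40 is an arbitrary guess) is to derive the reserve size from the file size, since reserve only needs a rough number to avoid most reallocations:

#include <fstream>
#include <string>
#include <vector>
using namespace std;

vector<string> ReadAllLinesWithReserve(const char* path) {
	ifstream sizer(path, ios::binary | ios::ate);        // open at the end to get the size
	streamoff bytes = sizer.tellg();
	vector<string> lines;
	if (bytes > 0)
		lines.reserve(static_cast<size_t>(bytes) / 40 + 1); // assume ~40 chars per line
	ifstream in(path);                                    // normal text-mode read
	string line;
	while (getline(in, line))
		lines.push_back(line);
	return lines;
}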

Also, it looks like your C# version is measuring things quite differently? Why do you create a new StopWatch inside each loop iteration, instead of starting one before and getting the time afterward like you do in the C++ version?

And your C# version has an extra call outside the loop for "warm-up." So you aren't really measuring any I/O for the C# version, but you are for the C++ version. I guess it's easy to win when you cheat :-)

@Brandon Live, the "warm-up" prevents the JIT overhead from being measured. C++ has no JIT, thus it's not necessary.
I guess System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod would be fine too, but it's more difficult to use.

Aethec, on 29 April 2012 - 14:22, said:

@Brandon Live, the "warm-up" prevents the JIT overhead from being measured. C++ has no JIT, thus it's not necessary. I guess System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod would be fine too, but it's more difficult to use.

JIT'ing is clearly part of how C# works. It's not useful to compare how C# works without JIT'ing since C# uses JIT'ing. If you want to optimize that, you can ngen the code. I'd be interested in seeing the results of a fair test without the "warm up" (and with the timers changed to work the same).

The "warm-up" call does way more than (maybe) causing it to JIT that code. First, it might not even do that (it's probably already JIT'd. If not, you have no idea if it was inlined, or if other optimizations like loop unrolling may affect what happens here.

Anyway, the main point is that you are measuring way more work in the C++ test (such as the I/O associated with loading the file, getting the code and possibly some data structures into the CPU cache, etc).

Oh, and PrepareMethod has nothing to do with JIT'ing... It's only relevant in a CER which doesn't apply here at all.

Without a warm-up call, the C# time will depend on how many test iterations you set.
I just tried; one iteration takes 1 ms with warm-up and 3-5 ms without on my machine.

Each language and technology requires you to do some stuff if you want to benchmark it.
For example, I could write a "benchmark" to filter and sort collections in C# using LINQ, and in any other language. The C# time would be 0 ms since the query isn't even executed unless you add something to the LINQ query that forces it to execute, which would seem unfair to C# compared to other languages.

Quote

I just tried; one iteration takes 1 ms with warm-up and 3-5 ms without on my machine.

Right, so the C# test is getting an unfair 300-500% advantage on the first run.

If you want to compare warm times (i.e. no JIT'ing, with the file loaded off the disk, CPU cache primed, etc), then you'd need to give C++ the "warm-up" call too. That would give you an accurate number, and give the C++ version a fair shake. Of course, to provide real meaningful data you'd also want to provide a cold comparison, where the C++ version will have the advantage that it doesn't have any JIT cost. If you don't include that in the comparison, it's a bit like saying "Macs boot so much faster than PCs (if you count the OS load time on the PC but not the Mac)."

The debug comments in the article are particularly amusing, given this cheat. In the C++ version the program likely stopped at the first call while it loaded symbols. The C# probably did something similar, just at a different time (i.e. right at first launch, or definitely by the end of the first "warm-up" call). Also, even the C# case involves a lot of native code, but you probably didn't have native debugging on, so it probably didn't bother trying to resolve symbols for all the native DLLs getting loaded. Give C++ the same "warm-up" call and it'll probably win this test too.

Brandon Live, on 29 April 2012 - 03:43, said:

Why would the number of lines need to be known in advance? Or I guess you mean you'd need that for a fixed-size array. Depending on the characteristics you expect to find in the files you're parsing, you could start with a fixed array on the stack and switch to a dynamic heap-allocated array if/when you run out of space there.
But I don't want to implement my own custom data structure here. I want a dynamically growing list because I know nothing of the size of the file, and the standard data structure for doing that in C++ is std::vector.

Quote

At the very least you should be calling vector<string>::reserve to cut down allocations.
How much should I reserve? The size of the file is not known in advance. It can be arbitrarily small or large.

Just out of curiosity, I tried with a reserve of 1000 lines. No difference whatsoever in performance, still 140ms for C++. A reserve of 10000 lines cut times down by 10ms, but that's only an optimization for that particular file, because it has around that number of lines, and reserving a 10000-element array is a large allocation for a file that might be 50 lines long.

Quote

Also, it looks like your C# version is measuring things quite differently? Why do you create a new StopWatch inside each loop iteration, instead of starting one before and getting the time afterward like you do in the C++ version?
Because in the C# version I print a "." after each call and I don't want to count that in. If you remove that and use the same time counting method as in C++, it makes a 5ms difference for C++, so thanks for pointing that out. I'll illustrate next.

Quote

And your C# version has an extra call outside the loop for "warm-up." So you aren't really measuring any I/O for the C# version, but you are for the C++ version. I guess it's easy to win when you cheat :-)
No difference whatsoever. Results for a version where the C# code has the same time counting logic as C++ and C++ calls the operation for warm-up:

C#: 57ms
C++: 135ms

The 5ms speedup for C++ (compared to 140ms previously) comes from fixing the time counting logic. The warm-up call made no difference.


Quote

The debug comments in the article are particularly amusing, given this cheat. In the C++ version the program likely stopped at the first call while it loaded symbols. The C# probably did something similar, just at a different time (i.e. right at first launch, or definitely by the end of the first "warm-up" call).
No difference whatsoever with a "warm-up" call for C++ in debug mode. Results:

C#: 56ms
C++: 14 seconds

C# is still 200x as fast in debug. It's not loading symbols that takes time (those are loaded beforehand), it's running the code. Every iteration is equally slow.

Quote

Also, even the C# case involves a lot of native code, but you probably didn't have native debugging on
With native debugging on:

C#: 57ms

Quote

Of course, to provide real meaningful data you'd also want to provide a cold comparison, where the C++ version will have the advantage that it doesn't have any JIT cost.
Removing the warm-up calls in both versions has no measurable effect; same numbers.

Any other suggestions?

This C# one-liner is faster than your method on my machine, especially with large files (30% for 2 MB, 50% for 5 MB):
File.WriteAllLines( "text.txt", File.ReadAllLines( "text.txt" ).Reverse() );
I'm always amazed at how LINQ and iterators make things both simple and fast.

Aethec, on 04 May 2012 - 19:17, said:

This C# one-liner is faster than your method on my machine, especially with large files (30% for 2 MBs, 50% for 5 MBs):
File.WriteAllLines( "text.txt", File.ReadAllLines( "text.txt" ).Reverse() );
I'm always amazed at how LINQ and iterators make things both simple and fast.
But that doesn't do the same thing. You're reversing the order of lines within the file, not the order of characters within each line.

Oops. I misunderstood what your code was doing. LINQ can't help, it seems. Sorry.

Here's a version that actually does what it's supposed to do :)
using ( var stream = File.Open( "text.txt", FileMode.Open, FileAccess.ReadWrite ) )
using ( var reader = new StreamReader( stream ) )
using ( var writer = new StreamWriter( stream ) )
{
    List<string> lines = new List<string>( 1000 );
    for ( string line = reader.ReadLine(); line != null; line = reader.ReadLine() )
    {
	    var charArr = line.ToCharArray();
	    Array.Reverse( charArr );
	    lines.Add( new string( charArr ) );
    }

    writer.BaseStream.Seek( 0, SeekOrigin.Begin );
    foreach ( var line in lines )
    {
	    writer.WriteLine( line );
    }
    writer.Flush();
}
Faster because it only opens one stream (~20% on a 5MB file here).

Aethec, on 04 May 2012 - 21:12, said:

Faster because it only opens one stream (~20% on a 5MB file here).
Interesting. I don't see any difference in performance if I modify it to open two streams, one for reading and one for writing. The improved speed seems to come from iterating the list of lines twice instead of three times. Also while it is faster for a 5MB file, it is slower for a 175KB one (like the one I was using initially).

Btw there is no need to call Flush() (or Close()) on a FileStream if you call Dispose() - which the using statement ensures.

Guess I should have checked with two streams before making assumptions :)
Didn't know about the Flush() thing, thanks.

It seems the machine matters; my version is always faster than yours on my machine, regardless of the file size. I guess it has something to do with the speed of the disk, and the file's physical location, since it is a hard drive and not an SSD.

I tried one version without any list, using a temporary file:
string path = Path.GetTempFileName();
using ( var writer = new StreamWriter( path ) )
{
    foreach ( var line in File.ReadAllLines( "text.txt" ) )
    {
	    var charArr = line.ToCharArray();
	    Array.Reverse( charArr );
	    writer.WriteLine( charArr );
    }
}

File.Copy( path, "text.txt", true );
File.Delete( path );
A bit faster than my previous method for large files, but much slower (3x) for small files.

This is the fastest (without using unsafe code) I could come up with:
File.WriteAllLines("text.txt", File.ReadAllLines("text.txt").Select(line => {
	var charArr = line.ToCharArray();
	Array.Reverse(charArr);
	return new string(charArr);
}));
