Sunday, 13 March 2016

performance - Swift's Dictionary is slow even with -Ofast




I'm implementing what is essentially a cache using a Dictionary in Swift. The performance is well short of what I would expect. I've read some of the other questions, for example this one about array sorting that seem to suggest that -Ofast is the answer (if you're prepared to accept the changes it brings with it). However, even when compiled -Ofast, performance compares poorly to other languages. I'm using Swift version 1.0 (swift-600.0.34.4.8).



The following is a boiled-down example which illustrates the problem:



import Foundation

class Holder {
var dictionary = Dictionary()

func store(#key: Int, value: Int) {

dictionary[key] = value
}
}

let holder = Holder()

let items = 5000

for (var i: Int = 0; i < 5000; i++) {
holder.store(key: i, value: i)

}


Compiled with -O3 it takes more than two seconds to run:



xcrun swift -sdk $(xcrun --show-sdk-path --sdk macosx) -O3 Test.swift && time ./Test

real 0m2.295s
user 0m2.176s
sys 0m0.117s



Compiling with -Ofast yields a 3-4x improvement:



xcrun swift -sdk $(xcrun --show-sdk-path --sdk macosx) -Ofast Test.swift && time ./Test

real 0m0.602s
user 0m0.484s
sys 0m0.117s



By comparison, this Java implementation:



import java.util.Map;
import java.util.HashMap;

public class Test {
public static void main(String[] args) {
Holder holder = new Holder();
int items = 5000;

for (int i = 0; i < items; i++) {
holder.store(i, i);
}
}
}

class Holder {
private final Map map = new HashMap();

public void store(Integer key, Integer value) {

map.put(key, value);
}
}


is ~6x faster again:



javac Test.java && time java Test

real 0m0.096s

user 0m0.088s
sys 0m0.021s


Is it simply the cost of copying the Dictionary as it's mutated and stored in the Holder instance that's causing Swift to fare so badly? Removing Holder and accessing the Dictionary directly would suggest that it is.



This code:



import Foundation


var dictionary = Dictionary()

let items = 5000

for (var i: Int = 0; i < 5000; i++) {
dictionary[i] = i
}


is significantly faster:




$ xcrun swift -sdk $(xcrun --show-sdk-path --sdk macosx) -O3 NoHolder.swift && time ./NoHolder

real 0m0.011s
user 0m0.009s
sys 0m0.002s

$ xcrun swift -sdk $(xcrun --show-sdk-path --sdk macosx) -Ofast NoHolder.swift && time ./NoHolder

real 0m0.011s

user 0m0.007s
sys 0m0.003s


While it provides a (hopefully) interesting data point, accessing the Dictionary directly isn't possible in my situation. Is there anything else I can do to get closer to this level of performance with Swift in its current form?


Answer



TL;DR It's Beta.



I would think that the answer right now is just that Swift is in beta, the tools are in beta, and a lot of optimizations are yet to be done. Replicating your "Holder" class example in Obj-C shows that even it is quite a bit faster at the same -Ofast level.




@import Foundation;

@interface Holder : NSObject

@property NSMutableDictionary *dictionary;
- (void)storeValue:(NSInteger)value forKey:(NSString *)key;

@end

@implementation Holder


- (instancetype)init {
self = [self initWithDict];
return self;
}


- (instancetype)initWithDict {
if (!self) {
self = [super init];

_dictionary = [NSMutableDictionary dictionary];
}

return self;
}

- (void)storeValue:(NSInteger)value forKey:(NSString *)key {
[self.dictionary setValue:@(value) forKey:key];
}


@end

int main(int argc, const char * argv[]) {

Holder *holder = [Holder new];

for (NSInteger i = 0; i < 5000; i++) {
[holder storeValue:i forKey:[NSString stringWithFormat:@"%ld", i]];
}


}


The Obj-C is fast out of the gate.



time ./loop 

real 0m0.013s
user 0m0.006s
sys 0m0.003s



The similarities in time to the NoHolder example you give is a good indication at how much optimization the Obj-C compiler is doing.



Taking a look at the assembly for the -O3 and -Ofast levels in Swift show there is a big difference in the amount of safety checking done. Looking at the Obj-C assembly shows that, well, there is just a lot less of it to be performed. Since the key to making a program fast is to make it not need to do much…



OS-X-Dos-Equis:~ joshwisenbaker$ wc -l objc.txt 
159 objc.txt
OS-X-Dos-Equis:~ joshwisenbaker$ wc -l oFast.txt
3749 oFast.txt



(Edit: Update with results of finalizing the Holder class.)



So another interesting wrinkle is the use of the @final decoration on the class definition. If you know that your class is never going to be subclassed then try adding the keyword like this: @final class Holder



As you can see it also normalizes the performance when compiled the same way.



OS-X-Dos-Equis:~ joshwisenbaker$ swift -sdk $(xcrun --show-sdk-path --sdk macosx) -Ofast bench.swift && time ./bench


real 0m0.013s
user 0m0.007s
sys 0m0.003s


Even using just -O3 the @final works magic.



OS-X-Dos-Equis:~ joshwisenbaker$ swift -sdk $(xcrun --show-sdk-path --sdk macosx) -O3  bench.swift && time ./bench

real 0m0.015s

user 0m0.009s
sys 0m0.003s


Again, I think the differences you are seeing in performance is probably down to the current optimization levels when compiled.


No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...