KStars loads 41560 stars in 225ms!!!
June 4, 2008
“We all know Linux is great…it does infinite loops in 5 seconds.”
–Linus Torvalds
KStars is now no longer far behind, thanks to James Bowlin. James seems to know every bit of computer architecture. One interesting thing that I learnt from him, after he repeatedly tried to put it into my head, is that malloc()s are like hard disk seeks. You had better do a BIG malloc() at once than make several small malloc()s. He knew for sure, that much of the time KStars spent in loading stars in the StarComponent::readData() method went into the constructors!
So he suggested something that I initially thought was un-implementable: Forget C++ constructors. Let’s do our own memory management. Create a template StarObject with some default data initialized in it. Whenever you want to create a new StarObject, just memcpy() it to make a new one!
Now, the first question I asked is what about those memory allocations that the constructor does? For instance, StarObject::StarObject() creates a new QString to store the spectral type. James said we’d take care of that by allocing them in a StarObject::init() function that sets up our freshly copied StarObject. I still had a lot of doubts in mind.
I decided to jump in and see what happens. Went and made some changes in StarObject - implemented a StarObject::init() that did the job of a constructor, and after a few segfaults, changed StarObject::SpType from QString, to QString *, so that I could alloc it freely at my will. Now, this could create some bugs and memory leaks in KStars if not used properly, but every programmer will have the sense to look at my “WARNING:” comments. I couldn’t believe that it actually worked! And I was really amazed to see the whooping 6x increase in speed, with the readData() method taking 6 microseconds per star instead of the earlier 42!!!
I had some funny idea of how C++ might store the class, but it turned out to be different. I still don’t know how it does that - it looks fuzzier after I implemented this.
I’m pretty sure that things will break (and segfault) when KStars learns to remember the user-added information links and observation log in the Detail dialog. We will probably fix this in the trunk before the KDE4.1 release, but that means that I’ll have to make some more changes before it safely merges into the branch.
I’m now trying another suggestion that James made, about loading all star name data into memory, to avoid the disk head seeking between two different files. Now that the fread() that grabs the data from the binary files takes a significant fraction of the load time, this might be an important factor. Earlier, it would’ve simply gotten buffered by the time StarObject::StarObject() would finish. This requires me to add a header to the star name file describing the number of records, so that’s going to change everything right from the data file generating program. I can’t wait to see how much of an efficiency boost I’ll get. It might not be much, though.
June 5, 2008 at 7:34 pm
Congratulations!
I’m just wondering, have you tried profiling KStars to see where things are getting bottlenecked?
<pimp class=”shameless”>
Anjuta has a great profiler plugin, which gives you a beautiful call graph and stats table among other things…
</pimp>
June 6, 2008 at 1:45 am
Well… I’m still far from using an IDE. I don’t know how to use KDevelop / Eclipse / Anjuta. Some day, I must spend time converting EMacs into the powerful IDE that it can be. Something that integrates with my terminal would be heaven.
Well, you must teach me how to use Anjuta to do such things. I dunno if other IDEs provide such features, so I might want to use Anjuta for these things.
Thanks for the tip, and happy hacking.
June 8, 2008 at 4:00 pm
Looks like very interesting work
Nice to know 
Just in case, you might probably also try to use placement new[1] to allocate StarObject.
Also, would it be possible to load the star data in separate thread so that you can be responsive atleast. Ofcourse this doesn’t increase the loading time, but the stars will get loaded in the background.
[1]
http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10
http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.14
June 8, 2008 at 8:41 pm
Hmm… I hope the placement new will not “construct” the object. Thanks for that information. I’ll test the speed of this operator and of it doesn’t cause any overhead, I’ll consider modifying the code to use that.
We were already considering loading stars in a different thread, but maybe that’s for later. Besides, this portion of the implementation is only for the stars that are loaded once, during startup, where loading in a different thread is of no use. However, that, and nonblocking file reading will help immensely for dynamic loading of stars, during runtime, when we need to display fainter stars.