Laggy performance with large data sets

Is there a way to prevent or manage the laggy performance that Igor exhibits with large data sets? I have a number of experiment files that are ~250MB in size. Each contains about 12,000 waves that are ~5000 points long, displayed in 10 separate graphs with ~600 waves per graph. With this organization, it takes Igor about 40-50 seconds to load the PXP file and run the recreation macros.

Slow loading in itself would not be much of a problem, but afterward, many operations, such as appending a wave to a graph, opening the Curve Fit panel, or performing the fit itself, are accompanied by a pause lasting a few seconds.

This issue is not caused by the system I'm using (17-inch MacBook Pro 2011, with 16GB RAM, 2.5GHz quad-core i7, 512GB SSD with ~100GB free, running OS X 10.8.4), since other applications run quickly even while Igor is crunching numbers on these experiment files. Igor is also using only about 500MB of RAM, significantly less than I'd expect it could use, given that 32-bit applications have a ~4GB limit.

Is there some way to increase Igor's memory allocation if this is a contributing factor, or otherwise make loading and using large data sets faster?
The total amount of memory that you are using is more than likely not affecting your performance here.

I suspect the main issue is that you have many waves in one data folder. Before you do anything else, I recommend that you close the Data Browser window. The Data Browser continually inspects your data, and with 12k waves that can take a while. The presence of a large number of waves also slows the execution of any command that applies to one or more waves, since the wave names have to be looked up. To get around this, you might consider concatenating your individual waves into a 2D matrix in which each wave is a single column. The columns are still easy to display, and this could reduce the overall number of waves by a factor of ~600 if you are displaying that many waves per graph.
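
For example, here is a minimal sketch of that approach (the "sweep_*" naming pattern and the wave name bigMatrix are hypothetical, and it assumes all source waves have the same length):

    Function CollapseToMatrix()
        String srcList = WaveList("sweep_*", ";", "")   // the ~600 source waves
        Concatenate/O srcList, bigMatrix                // each 1D wave becomes one column of a 2D wave
        Wave m = bigMatrix
        Display                                         // start an empty graph
        Variable i, n = DimSize(m, 1)
        for (i = 0; i < n; i += 1)
            AppendToGraph m[][i]                        // show each column as a trace
        endfor
    End

Displaying a matrix column as a trace (the m[][i] subrange) requires Igor 6.1 or later.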

I hope this helps,

A.G.
WaveMetrics, Inc.
Simply drawing a graph with that many traces and points in it may be slow. A curve fit that adds a fit curve to a graph will cause the graph to redraw on every iteration of the fit. To prevent that, try adding the /N flag to the CurveFit or FuncFit command. You could also remove the /D flag from the end of the command if you don't need the fit curve.
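
For example (the wave name myData is a placeholder):

    CurveFit/N gauss, myData /D    // /N: no graph updates while iterating; fit_myData still appears at the end
    CurveFit/N gauss, myData       // without /D, no fit curve is created at all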

Anything else that causes data in the graph to change will cause the graph to redraw. You could try closing the graph and saving the recreation macro while you work on the analysis.
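
A sketch of one way to script that (the window and notebook names are placeholders):

    Function StashAndCloseGraph(winName)
        String winName
        String recMacro = WinRecreation(winName, 0)    // the graph's recreation commands as text
        if (WinType("SavedGraphs") == 0)
            NewNotebook/F=0/N=SavedGraphs              // plain notebook to collect the macros
        endif
        Notebook SavedGraphs, text=recMacro
        DoWindow/K $winName                            // kill the graph so it no longer redraws
    End

When you're done with the analysis, paste the saved text into the procedure window and run the recreation macro to get the graph back.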

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Thanks for the tips. Closing the Data Browser does not seem to have an effect, and the curve fit tweaks do help a touch, but this issue is an overall slowdown, not just during curve fitting.

If I cut out 4,000 of the waves so there are only 8,000, responsiveness increases greatly, even though the waves are still displayed 600 to a graph. This suggests it is how the waves are stored more than anything else, and that there is some threshold beyond which Igor gets bottlenecked. For now I will try organizing the waves into subfolders. The 2D wave approach might also help, but in the long term it would mean redoing close to 5 years and thousands of lines of Igor programming (ugh!). Still, I'll give it a shot, at least for the routines I'm using for this one project.
Igor stores waves in each data folder as a linked list. When it needs to look up a wave name, it must traverse that linked list. When you have thousands of waves in a data folder, this lookup can be slow. That's why splitting your waves across data folders or concatenating multiple 1D waves into a single 2D wave can dramatically decrease lookup time.

I don't think there's much you can do to make loading of experiments you've already saved be faster. But maybe it's possible for you to load each experiment and reorganize the waves in that experiment into multiple data folders.

In any code you write, use functions rather than macros as much as possible; functions are compiled, whereas macros are interpreted and run much more slowly.
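
For instance, here is a sketch of the reorganization written as a function (the folder naming scheme and the group size of 500 are arbitrary, and the $(...) path syntax assumes Igor 6.1 or later):

    Function SplitIntoFolders()
        String waves = WaveList("*", ";", "")    // all waves in the current data folder
        Variable i, n = ItemsInList(waves)
        Variable perFolder = 500                 // arbitrary group size
        String name, sub
        for (i = 0; i < n; i += 1)
            name = StringFromList(i, waves)
            sub = "group" + num2istr(floor(i/perFolder))
            NewDataFolder/O $sub                 // /O: no error if the folder already exists
            Wave w = $name
            MoveWave w, :$(sub):                 // relocate the wave into the subfolder
        endfor
    End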
aclight wrote:
Igor stores waves in each data folder as a linked list. When it needs to look up a wave name, it must traverse that linked list. When you have thousands of waves in a data folder, this lookup can be slow.


Since there is a big re-write going on for IP7, I wonder if this is the time to jump to using a C++ container? I think a map container offers O(ln N) performance, which would be faster than the O(N) of a linked list. There is also unordered_map.
andyfaff wrote:
Since there is a big re-write going on for IP7, I wonder if this is the time to jump to using a C++ container? I think a map container offers O(ln N) performance, which would be faster than the O(N) of a linked list. There is also unordered_map.

And a hash map has constant-time lookup.

It's on the list on the whiteboard in my office, but it is relatively low priority compared to getting a working application out to beta. This topic comes up from time to time; I wonder how many people are affected by lengthy lookups when there are many waves.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
I think thousands of waves in a data folder is quite common. I would normally use data folders, but to each his own.
A map container in C++ would store a wave keyed by its name. The map container is in the STL and is available on most compilers/platforms, whereas hash_map is not. Perhaps I got big O wrong; unordered_map may also be a contender.

From Stack Overflow (treat with caution):
"Some of the key differences are in the complexity requirements.
A map requires O(log(N)) time for inserts and finds.
An unordered_map requires an 'average' time of O(1) for inserts and finds but is allowed to have a worst case time of O(N).
So, usually, unordered_map will be faster, but depending on the keys and the hash function you store, can become much worse."
That's all correct. To get a hashed map (std::unordered_map) as part of the C++ standard library, you need C++11, which requires newer versions of Visual Studio and Xcode than we can require at present. There are other hashed maps available; we will look at the various benefits when this gets high enough on the priority list.

std::map uses a binary search tree, which results in O(log N) lookups. A hash map gets constant time by using the hash as a direct index into an array. The possibility of O(N) comes from hash collisions: when more than one item has the same hash, the colliding items end up in the same bucket, and the lookup goes back to simply walking the bucket's contents. But if your hash is reasonably good, the buckets will be very small.

We also have to be careful that, in implementing a fast wave lookup, we don't slow down wave *creation*. Some containers feature fast lookup but slow insertion.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Just one minor comment:

unordered_map (aka hash map) in namespace std::tr1:: was introduced with C++98 TR1, which, e.g., VS2008 already supports, and I'd guess recent Xcode versions do as well.