How fast can high-speed, multichannel data acquisition run?

Hi,

I have a high-bandwidth data acquisition task that I initially doubted could be done in Igor. To my surprise, it can be done, but not quite at the speed I need it to run.

The task is to acquire analog input from 64 channels at 10 kHz, save it to disk, analyze each chunk of data as it comes in, and then output to a digital port. My hardware is an NI M-series card (USB-6255) and a PC with Windows 7. The rub is that I’d like the read-write-analyze loop to run every 10 ms to minimize the delay between input and output.

Currently, Igor can reach a loop period of 40 ms when setting up the acquisition with DAQmx_Scan and a background procedure that calls FIFO2Wave, etc. When I decrease the loop period to 30 ms, Igor simply crashes, even though the background procedure’s execution time is only about 5 ms. The loop period can go down to 30 ms when tying the FIFO directly to a file, but then I can’t access the input signals AND it’s still too slow.

So my questions are:
Would a faster machine help me get to a 10 ms loop period?
Are XOPs my best and only option from here?

Many thanks,
Tobi
Background tasks can only run as fast as Igor's main loop, and that runs every 20 ms. I think to get faster, you're going to have to write this all as a function that doesn't return. Use DoXOPIdle to give the scan a chance to get the latest data, then do the processing that used to be in your background task. Then go back around for more.
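Schematically, something like this (RunAcquisitionLoop and ProcessLatestChunk are just placeholder names; the latter stands in for your FIFO2Wave readout, analysis, and digital output):

Function RunAcquisitionLoop()
    Variable done = 0
    do
        DoXOPIdle    // give NIDAQ Tools (and any other XOPs) a chance to transfer the latest data
        done = ProcessLatestChunk()    // placeholder: read the new chunk, analyze, write the digital output
    while (!done)
End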

You realize that you are pushing the envelope in a program that wasn't originally written for real-time applications, right?

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
1) Is there a reason why you have to write to disk? This can be quite slow.
2) What is the analysis you have to do?

One possible design pattern would be to have one thread reading the data and posting it to a queue, and one thread reading the queue and analyzing it. Depending on the chores in that thread, you could do the digital out from there as well.

Unfortunately, the FIFO operations don't look like they are threadsafe; perhaps that first thread could be a background task.

Could you post some code (if it's not too long)?
Dear Tobi,

If you want to do it fast, the standard way is typically to have:
1. hardware which does the acquisition on its own, just getting a trigger from outside, with a cache to hold the data
2. asynchronous readout of the data from the machine
3. if you need really fast calculations, do them directly on the hardware (DSP etc.)
4. don't use very slow interfaces like USB ...

Stefan
Tobi-

You're getting some good advice here.

I think the solutions proposed by andyfaff and stefanm would require that you write your own custom XOP. That might not be much harder than coercing Igor and NIDAQ Tools MX to do it for you.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Quote:
You realize that you are pushing the envelope in a program that wasn't originally written for real-time applications, right?

Too true; the main reason that I’m doing this is so that the analysis code can be written in Igor, which everybody in my laboratory uses regularly! That 20 ms loop period means that ANY analysis done in Igor will have a minimum latency of 20 ms – that’s the key parameter here.

Quote:
1) Is there a reason why you have to write to disk? This can be quite slow.

We save to disk so we can also do off-line analysis. Since the signals come from a mildly complicated biological preparation, they’re worth keeping around.

Quote:
2) What is the analysis you have to do?

The analysis is a little ill-defined right now, and will depend a lot on the end user (not me: I’m just the developer!). The minimal case is finding thresholds plus a little matrix math (almost trivial); the next step up is multi-dimensional waveform discrimination based on the shape of short segments (a.k.a. spike sorting). The ultimate goal is to compute the response properties of neurons or neural populations. In any case, there’s much more flexibility here than in the acquire step.

The multi-thread queue idea is a good way to speed up the analysis loop: if I write the acquire task as an XOP, that’s where I’ll start.

StefanM:
Fortunately, these task parameters are pretty stable and unlikely to get more demanding! The hard part is to make a system that’s flexible and easy enough for others to take advantage of... which is why I’m trying to do it in Igor in the first place.

Quote:
4. don't use very slow interfaces like USB ...

I was surprised that USB is fast enough for this data stream, too!

Summing up this advice, it seems like I could achieve an output response latency of just over 20 ms with Igor – half or less of the current latency (40+ ms). Say 1-10 ms from the custom XOP loop plus 20 ms from Igor – that might be good enough.

Thank you all for your help!

Tobi
If I understand correctly, one reason you want to process the data as it comes in is to have a feedback loop for control? Otherwise, you could read quickly into storage (for example, a solid-state device) and process "at your convenience".

In this regard, could you gain anything in processing speed by doing only the absolute minimum of processing demanded by the feedback, leaving everything else for later? If, for example, all the feedback needs is the average intensity of a signal over a time span in order to adjust an input gain, then a simple sum of the incoming intensity over that span should serve (followed by a hardware gain adjustment). All other processing can be dumped off to a "non-real-time" mode.
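For example, the entire real-time computation for that case could be as small as this (chunk is a hypothetical wave holding the latest samples from one channel):

// the only computation the real-time loop needs in this example
Variable avgIntensity = mean(chunk)
// ... use avgIntensity to set the hardware gain; defer everything else ...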

You may already be this far in your developments ... just thought I'd ask.

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
jjweimer wrote:
If I understand correctly, one reason you want to process the data as it comes in is to have a feedback loop for control?

That's exactly it. I wish the analysis step were the rate-limiting one -- it's much easier to speed that up. The problem is that nothing can go faster than Igor's 20 ms loop time. So you don't gain anything by reducing your analysis time from 10 ms to 1 ms!

Tobi
Hello Tobi

TSzuts wrote:
The problem is that nothing can go faster than Igor's 20 ms loop time.


Not so. One solution is to make use of asynchronous acquisition and/or background threads. Basically you need to make some or all of your tasks independent of the timing of the main loop, which means the following:
1) As stefanm suggests, run the acquisition on the device itself, and read out the data asynchronously. Unfortunately, Igor's NIDAQ tools cannot be run in a preemptive thread, so you would do the readout using a background task (see the sketch after this list). That is not a problem, since the device does its own buffering. An added advantage is that this will likely give you the highest timing accuracy (depending on the device), so I would seriously recommend looking into this.

2) If you cannot run the acquisition on the device itself, your best option is to run it in a preemptive thread, and to transfer the data from the preemptive thread to a background task. Since Igor's NIDAQ tools are not thread-safe, you would have to write your own XOP to communicate with the device. If you have any familiarity with XOP programming, that is actually not as hard as it may seem.

3) If any of your downstream processing is slow (and writing to disk certainly is), you will also want to decouple it from the main loop and run it in an Igor thread. Basically this thread receives data folders with new data to write (think ThreadGroupGetDF in the thread and ThreadGroupPutDF in the background task) and takes care of them. Even better, this is fire-and-forget: the background task doesn't have to care about any of it once the data has been submitted.

4) The same is true for the analysis. If it is slow, then run it in its own thread. If not then just do it in the main thread.
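To make point 1 concrete, the background-task readout might be set up roughly like this (the task and function names are made up; ReadAvailableSamples stands in for whatever NIDAQ Tools readout you use, e.g. FIFO2Wave on a FIFO fed by DAQmx_Scan):

Function StartReadout()
    CtrlNamedBackground acqTask, period=1, proc=AcqTaskFunc, start    // period is in ticks (about 17 ms each)
End

Function AcqTaskFunc(s)
    STRUCT WMBackgroundStruct &s
    ReadAvailableSamples()    // hypothetical: grab whatever the device has buffered since the last call
    return 0    // returning 0 keeps the task running
End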

See also the "Slow Data Acq" example experiment included with Igor.

Threading can be complex. However, if you have a clear mental picture of how everything fits together it can actually be surprisingly straightforward. Where it becomes ugly is usually in situations involving error reporting, progress updates, or aborts. And if you can get the device to function asynchronously (strongly recommended), then you may not need to do any threading at all.

By the way, if you were using something other than Igor, you would likely end up with very similar solutions.
I would echo 741's sentiments. Asynchronous programming is definitely the way to go.

Hi 741,
741 wrote:
One solution is to make use of asynchronous acquisition and/or background threads.

True - I was being overly general when I said that "nothing can go faster than Igor's 20 ms loop time." What I meant to say was that the process chain (acquire, analyze, output) will be limited to that loop time if any one part of it is written as an Igor function - slowest step wins. Johnweeks says that the time limit applies to background tasks too, unfortunately. If I write everything in XOPs, that takes away the advantage of letting others write analysis code in Igor quickly and easily.

Tobi
TSzuts wrote:
What I meant to say was that the process chain (acquire, analyze, output), will be limited to that loop time if any one part of it is written as an Igor function - slowest step wins. Johnweeks says that the time limit applies to background tasks too, unfortunately.


I don't really see the limitation. First of all, I don't think there is typically a need to perform GUI updates or data saving at a rate exceeding 50 Hz. If faster GUI updates are important to you, Igor is probably not the appropriate tool for the job.

I think the only aspect where the loop period is excessively slow is the data acquisition. If you perform direct sampling of the I/O ports on the DAQ device from the background task, you will be limited to sampling every 20 ms. However, I think you will want to avoid doing that for a number of reasons. Simply use the NIDAQ tools to set up continuous sampling, and read that out in the background task (disclaimer: I'm not all that familiar with the NIDAQ tools, but I would find it surprising if that functionality didn't exist).

Then set up threads for the data saving and analysis. You can have the logging function take multiple samples at once, which means that you can simply fire off a new batch every 20 ms. Have the background task call ThreadGroupPutDF with the new samples, and have the thread block on ThreadGroupGetDF until data is available. Because these are queues, Igor will make sure that your data is safe while it's waiting to be written to disk. Here's an overview:
// in the background task
wave myData = GetMyData()    // your acquisition function; calls into NIDAQ Tools at some point
DFREF freeFolder = NewFreeDataFolder()
Duplicate myData, freeFolder:myData
ThreadGroupPutDF mySavingThreadID, freeFolder
// do other stuff or end this iteration of the task

// the worker function for the thread that saves the data
ThreadSafe Function SaveDataWorker()
    do
        DFREF newData = ThreadGroupGetDF(0, inf)    // wait forever -- possible deadlock if you're doing things wrong
        // check for some magic 'stop' value in newData to stop the thread;
        // you need to send this magic data when the acquisition stops
        wave myNewData = newData:myData
        SaveMyNewData(myNewData)    // your function; uses fprintf or whatever
    while (1)
End


The same is true for the analysis: have the background task call ThreadGroupPutDF, and the analysis thread likewise blocks on ThreadGroupGetDF. Then, to get the results back out, you do the reverse, but make sure that the background task calls ThreadGroupGetDF with a small timeout value so that it isn't blocked while the analysis of the newest chunk is still running.
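For example, the polling step in the background task might look like this (myAnalysisThreadID and the wave name 'result' are assumptions; use whatever names your analysis thread actually sends back):

// in the background task: check for finished analysis results without blocking
DFREF resultDF = ThreadGroupGetDF(myAnalysisThreadID, 0)    // timeout of 0 -> return immediately if nothing is ready
if (DataFolderRefStatus(resultDF) != 0)
    wave result = resultDF:result
    // ... act on the result: update the digital output, graphs, etc. ...
endif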

TSzuts wrote:
If I write everything in XOPs, that takes away the advantage of letting others write analysis code in Igor quickly and easily.


You can have Igor code in the analysis and use it without blocking the background task by using threading. The only limitation is that the analysis needs to be threadsafe. Consider the following:
// prototype for a data analysis function. Assume that all analysis you're interested in fits in this template,
// but you're free to modify this template to whatever you want.
ThreadSafe Function /WAVE MyAnalysisPrototype(data)
    wave data    // contains the newly acquired data
End

ThreadSafe Function DoAnalysisWorker(myAnalysisFunction)
    FUNCREF MyAnalysisPrototype myAnalysisFunction

    do
        DFREF newData = ThreadGroupGetDF(0, inf)    // wait forever -- possible deadlock if you're doing things wrong
        wave myNewData = newData:myData
        wave result = myAnalysisFunction(myNewData)    // this is cool – the analysis can be completely different
        // depending on what function the user passes to ThreadStart, as long as it is
        // based on MyAnalysisPrototype. But you don't have to change any part of the threading!

        // make a new free data folder and return the result to the main thread with ThreadGroupPutDF
    while (1)
End


And then simply pass the appropriate function when you call ThreadStart. The users of your software can write whatever function they want, in Igor code, provided that it is threadsafe and conforms to your prototype, and it will play nicely with your code. Best of all, even if the function is slow, your background task still fires away every 20 ms because it doesn't have to wait for the preemptive thread! (Assuming that you're not sucking up all the CPU power in the background threads - unlikely on a modern multi-core PC if your analysis uses just a single thread.)
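For instance, hooking it all up could look something like this (MyUserAnalysis is a hypothetical user-supplied function conforming to MyAnalysisPrototype; if your Igor version doesn't accept FUNCREF parameters in ThreadStart, pass the function name as a string instead and resolve it in the worker with FUNCREF $name):

Function StartAnalysisThread()
    Variable /G myAnalysisThreadID = ThreadGroupCreate(1)    // one analysis thread
    ThreadStart myAnalysisThreadID, 0, DoAnalysisWorker(MyUserAnalysis)
End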
TSzuts wrote:
True - I was being overly general when I said that "nothing can go faster than Igor's 20 ms loop time." What I meant to say was that the process chain (acquire, analyze, output) will be limited to that loop time if any one part of it is written as an Igor function - slowest step wins. Johnweeks says that the time limit applies to background tasks too, unfortunately. If I write everything in XOPs, that takes away the advantage of letting others write analysis code in Igor quickly and easily.

Actually, I wasn't quite clear about the 20 ms limit. What I meant was that the recurrence time for a background task is limited by the main event loop, which runs at 20 ms intervals. A tight loop in a user-defined function can run much faster than 20 ms. BUT, a user-defined function *prevents* the main loop from running, and NIDAQ Tools depends on the main loop to get messages for transferring data. You can work around that problem by calling DoXOPIdle in your user-defined function. Calling DoXOPIdle is just like running the main event loop, except that you decide when it should happen.

An alternative might be to use fDAQmx_ScanGetAvailable() to get data in a tight loop inside a user function. That might be better than DoXOPIdle in that it will only run the NIDAQ Tools XOP, and not every other XOP in your Igor installation.

It may not be quite applicable to your situation, but my intention for this sort of feedback loop was that you would call DAQmx_AI_SetupReader and fDAQmx_AI_GetReader to run your servo loop. Those will work in a tight loop in a user function, and should be much faster than what you're trying to do. Your loop would have to save results to the FIFO itself, rather than relying on DAQmx_Scan to do it, though.
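A bare skeleton of that servo loop (structure only; check the NIDAQ Tools MX help for the exact DAQmx_AI_SetupReader flags and fDAQmx_AI_GetReader arguments, and ComputeAndWriteOutput is a hypothetical placeholder for the feedback calculation, digital output, and FIFO bookkeeping):

Function ServoLoop()
    // DAQmx_AI_SetupReader ...    // configure the reader for the 64 AI channels (see the NIDAQ Tools MX help)
    Variable done = 0
    do
        // fDAQmx_AI_GetReader(...)    // fetch the newest samples into a wave
        done = ComputeAndWriteOutput()    // hypothetical: compute feedback, write digital out, add samples to the FIFO
    while (!done)
End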


John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
TSzuts wrote:
...If I write everything in XOPs, that takes away the advantage of letting others write analysis code in Igor quickly and easily.


It seems to me, the only "analysis" parts you need to write in XOP are those that are needed to calculate the feedback signal from the input signal in order to apply it to the output signal. Everything else could be Igor code.

Unless of course you are expecting that your users will want to operate at a refresh rate above 50 Hz too :-)

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
That's a good way to circumvent the main loop speed, John! I was imagining a series of background tasks - since so many people were suggesting them, they seemed like the way to go - but if the code could be written in a tight loop without background tasks, timing could be arbitrary and I could implement all the good ideas that have come up: separate threads to increase analysis speed/avoid conflicts, queuing to conserve processor overhead, writing to disk at a slower rate, etc. In that case, the only parts that would benefit from an XOP treatment would be acquiring data (to allow threading, since Igor's NIDAQ tools aren't thread-safe) and possibly analysis (if computationally intensive).

Tobi
TSzuts wrote:
That's a good way to circumvent the main loop speed, John! I was imagining a series of background tasks - since so many people were suggesting them, they seemed like the way to go

Even that wouldn't work - background tasks are checked and run once each time around the main event loop. If you have five background tasks whose timers have all expired, they will all be run one after another, and then none of them will run again until the main event loop comes around again.
Quote:
but if the code could be written in a tight loop without background tasks, timing could be arbitrary and I could implement all the good ideas that have come up: separate threads to increase analysis speed/avoid conflicts, queuing to conserve processor overhead, writing to disk at a slower rate, etc. In that case, the only parts that would benefit from an XOP treatment would be acquiring data (to allow threading, since Igor's NIDAQ tools aren't thread-safe) and possibly analysis (if computationally intensive).

That's true. Keep in mind that even a tight loop in a user-defined function is much slower than C/C++ code in an XOP.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com