multithreaded curvefitting | Igor Pro by WaveMetrics

That would be because a polynomial fit is linear in the coefficients and uses totally separate code from the nonlinear, iterative fits. The linear fits (line fits, polynomial and poly2d) should still be faster than the equivalent nonlinear fit because they don't iterate.
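To illustrate why a fit that is linear in its coefficients needs no iteration, here is a minimal Python sketch. It is not Igor's implementation: it solves the normal equations by Gaussian elimination rather than the SVD approach mentioned later in this thread, and the name `polyfit_linear` is made up for the example.

```python
def polyfit_linear(x, y, order):
    """Least-squares polynomial fit solved in a single pass (no iteration)."""
    n = order + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    # Back substitution gives the coefficients, lowest order first.
    coefs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * coefs[k] for k in range(i + 1, n))
        coefs[i] = (b[i] - s) / A[i][i]
    return coefs


# Noise-free quadratic: the coefficients are recovered directly.
x = [i / 10 for i in range(-10, 11)]
y = [1.0 + 2.0 * xi - 0.5 * xi ** 2 for xi in x]
coefs = polyfit_linear(x, y, 2)
print(coefs)
```

The whole fit is one linear solve, whereas a nonlinear fit repeats a comparable solve on every iteration until it converges.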

April 30, 2024 at 09:26 am - Permalink

thomas_braun

@tony Can you provide a MWE for playing around? I also don't see a reason why poly order 5 doesn't benefit from multithreading.

May 1, 2024 at 06:31 am - Permalink

johnweeks

OK, I misunderstood a bit. I was thinking of the automatic multithreading internal to the iterated nonlinear fits. But this is a question about doing multiple fits simultaneously via the Multithread keyword.

My guess is that the polynomial fits are sufficiently fast that for whatever number of fits you're doing, the threading overhead is swamping the benefit of multithreading. Just a guess. As Thomas suggests, some play would be in order.
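The overhead argument can be made concrete with a toy model. This is only a sketch in Python: the function name, the fixed-overhead assumption, and all the numbers are invented for illustration and are not measurements from Igor.

```python
def modelled_speedup(n_fits, time_per_fit, threads, overhead):
    """Toy model: MT time = fixed threading overhead + perfectly divided work."""
    single = n_fits * time_per_fit
    multi = overhead + single / threads
    return single / multi


# Hypothetical numbers in arbitrary time units.
fast = modelled_speedup(n_fits=100, time_per_fit=1, threads=8, overhead=500)
slow = modelled_speedup(n_fits=100, time_per_fit=1000, threads=8, overhead=500)
print(fast, slow)
```

In this model a very fast fit ends up slower with threads than without (speedup below 1), while a slow fit approaches the thread count.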

May 1, 2024 at 09:49 am - Permalink

tony

It turns out that the slowdown happens only when a mask wave is involved, and only for the poly fit function, not for others.

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 295399, ST time: 911160, Speedup: 3.0845

  Masked poly fit. poly order: 5, MT time: 8.34295e+06, ST time: 938055, Speedup: 0.112437
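For anyone comparing numbers across machines: the Speedup column in these printouts appears to be ST time divided by MT time, so values below 1 mean the multithreaded run was slower. A small Python sketch for parsing a result line and rechecking the ratio follows; the regular expression and the function name `parse_result` are assumptions based on the output format shown here.

```python
import re

# Matches the three numeric fields of one MTspeedtest result line.
PATTERN = re.compile(
    r"MT time: ([0-9.e+-]+), ST time: ([0-9.e+-]+), Speedup: ([0-9.e+-]+)"
)


def parse_result(line):
    """Return (mt_time, st_time, reported_speedup) as floats."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not an MTspeedtest result line")
    mt, st, reported = map(float, m.groups())
    return mt, st, reported


line = ("Masked poly fit. poly order: 5, MT time: 8.34295e+06, "
        "ST time: 938055, Speedup: 0.112437")
mt, st, reported = parse_result(line)
recomputed = st / mt  # agrees with the reported value up to print rounding
print(recomputed, reported)
```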

Attachments poly fitting speed test_1.ipf

May 2, 2024 at 03:05 am - Permalink

tony

There is also an unexpected dependence on the length of waves to be fitted:

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 209471, ST time: 486002, Speedup: 2.32014

  Masked poly fit. poly order: 5, MT time: 211773, ST time: 509983, Speedup: 2.40816

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 7.55799e+06, ST time: 547901, Speedup: 0.0724929

  Masked poly fit. poly order: 5, MT time: 257492, ST time: 683144, Speedup: 2.65307

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 293396, ST time: 917342, Speedup: 3.12663

  Masked poly fit. poly order: 5, MT time: 8.86679e+06, ST time: 945262, Speedup: 0.106607

Other fit functions don't show this kind of behaviour.

•MTspeedtest("gauss", 5, 200)

  Unmasked gauss fit. poly order: 5, MT time: 214601, ST time: 977831, Speedup: 4.55651

  Masked gauss fit. poly order: 5, MT time: 139170, ST time: 460551, Speedup: 3.30926

•MTspeedtest("gauss", 5, 500)

  Unmasked gauss fit. poly order: 5, MT time: 566795, ST time: 2.91364e+06, Speedup: 5.14056

  Masked gauss fit. poly order: 5, MT time: 306634, ST time: 1.56379e+06, Speedup: 5.09985

•MTspeedtest("gauss", 5, 1000)

  Unmasked gauss fit. poly order: 5, MT time: 1.14809e+06, ST time: 5.9246e+06, Speedup: 5.16039

  Masked gauss fit. poly order: 5, MT time: 658464, ST time: 3.2876e+06, Speedup: 4.99284

Edit: add system info

•Print IgorInfo(3)

  OS:macOS;OSVERSION:14.3.1;LOCALE:US;IGORFILEVERSION:9.06B01;

•Print ThreadProcessorCount

  8

Model Name:   MacBook Pro
Model Identifier:   MacBookPro15,2
Processor Name:   Quad-Core Intel Core i5
Processor Speed:   2.3 GHz
Number of Processors:   1
Total Number of Cores:   4
L2 Cache (per Core):   256 KB
L3 Cache:   6 MB
Hyper-Threading Technology:   Enabled
Memory:   16 GB

May 2, 2024 at 03:25 am - Permalink

johnweeks

That is bizarre. Share the experiment so I can do some profiling?

May 2, 2024 at 12:19 pm - Permalink

tony

Did you take a look at the ipf linked above? Is it (a) reproducible for you and (b) enough to do some profiling?

May 2, 2024 at 12:23 pm - Permalink

ilavsky

There must be more to this: my system (Igor Pro 9.05 on macOS ARM, M1 Pro) shows very different results:

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 1.14053e+06, ST time: 913149, Speedup: 0.800638

  Masked poly fit. poly order: 5, MT time: 1.15828e+06, ST time: 757275, Speedup: 0.653794

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 1.17211e+06, ST time: 987406, Speedup: 0.842417

  Masked poly fit. poly order: 5, MT time: 1.21521e+06, ST time: 902780, Speedup: 0.742902

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 1.11386e+06, ST time: 1.17587e+06, Speedup: 1.05567

  Masked poly fit. poly order: 5, MT time: 1.236e+06, ST time: 1.30618e+06, Speedup: 1.05678

May 2, 2024 at 01:12 pm - Permalink

tony

Now that is strange.

From my home computer, an older MacBook Pro:

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 234119, ST time: 421007, Speedup: 1.79826

  Masked poly fit. poly order: 5, MT time: 194209, ST time: 396807, Speedup: 2.0432

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 585567, ST time: 540801, Speedup: 0.923551

  Masked poly fit. poly order: 5, MT time: 283139, ST time: 613617, Speedup: 2.16719

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 369582, ST time: 925710, Speedup: 2.50474

  Masked poly fit. poly order: 5, MT time: 1.03486e+06, ST time: 1.08674e+06, Speedup: 1.05013

Compared with Gaussian fits (poly order doesn't have any meaning here):

•MTspeedtest("gauss", 5, 200)

  Unmasked gauss fit. poly order: 5, MT time: 352307, ST time: 977642, Speedup: 2.77497

  Masked gauss fit. poly order: 5, MT time: 174721, ST time: 478166, Speedup: 2.73674

•MTspeedtest("gauss", 5, 500)

  Unmasked gauss fit. poly order: 5, MT time: 1.01302e+06, ST time: 2.87903e+06, Speedup: 2.84203

  Masked gauss fit. poly order: 5, MT time: 578342, ST time: 1.55051e+06, Speedup: 2.68096

•MTspeedtest("gauss", 5, 1000)

  Unmasked gauss fit. poly order: 5, MT time: 2.20035e+06, ST time: 7.81715e+06, Speedup: 3.55269

  Masked gauss fit. poly order: 5, MT time: 1.24213e+06, ST time: 3.38625e+06, Speedup: 2.72616

Hardware Overview:

Model Name:   MacBook Pro
Model Identifier:   MacBookPro14,2
Processor Name:   Dual-Core Intel Core i5
Processor Speed:   3.3 GHz
Number of Processors:   1
Total Number of Cores:   2

May 2, 2024 at 01:32 pm - Permalink

KZarzana

There's even more excitement in the Wintel world. Gauss fits with any data length are fine, as are 4th order poly fits of any length. However, with 5th order poly fits, if the data length is greater than 300 or so, the multithreaded fitting causes Igor to essentially lock up and makes the computer super laggy. The memory usage stays the same, but the CPU quickly goes to 100% and the process never completes. I can abort the process, but when I try to close the experiment I often get a window saying that Igor has crashed. Single-threaded 5th order polynomial fits seem fine. I've shipped the crash report off to WM support.

System info: Intel 12th Gen, i5-12600H
Windows 10 Enterprise (22H2), 10.0.19045
Igor 9.0.5.1

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 1.09726e+06, ST time: 1.32983e+06, Speedup: 1.21196

  Masked poly fit. poly order: 5, MT time: 785966, ST time: 1.43032e+06, Speedup: 1.81983

•MTspeedtest("poly", 5, 300)

  Unmasked poly fit. poly order: 5, MT time: 1.17299e+06, ST time: 1.43175e+06, Speedup: 1.2206

  Masked poly fit. poly order: 5, MT time: 861461, ST time: 1.45183e+06, Speedup: 1.68531

•MTspeedtest("gauss", 5, 200)

  Unmasked gauss fit. poly order: 5, MT time: 344054, ST time: 663625, Speedup: 1.92884

  Masked gauss fit. poly order: 5, MT time: 173156, ST time: 299527, Speedup: 1.72981

•MTspeedtest("gauss", 5, 500)

  Unmasked gauss fit. poly order: 5, MT time: 490591, ST time: 1.78317e+06, Speedup: 3.63475

  Masked gauss fit. poly order: 5, MT time: 233319, ST time: 1.02946e+06, Speedup: 4.41223

•MTspeedtest("gauss", 5, 1000)

  Unmasked gauss fit. poly order: 5, MT time: 877258, ST time: 3.63709e+06, Speedup: 4.14598

  Masked gauss fit. poly order: 5, MT time: 324798, ST time: 2.05203e+06, Speedup: 6.31787

•MTspeedtest("poly", 4, 1000)

  Unmasked poly fit. poly order: 4, MT time: 819395, ST time: 1.67335e+06, Speedup: 2.04218

  Masked poly fit. poly order: 4, MT time: 648839, ST time: 1.79097e+06, Speedup: 2.76027

•Print IgorInfo(3)

  OS:Microsoft Windows 10 Enterprise (22H2);OSVERSION:10.0.19045.4291;LOCALE:US;IGORFILEVERSION:9.0.5.1;

•Print IgorInfo(4)

  Intel

•Print ThreadProcessorCount

  16

May 2, 2024 at 04:15 pm - Permalink

johnweeks

It's not reproducible:

•mtspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 279780, ST time: 491701, Speedup: 1.75746

  Masked poly fit. poly order: 5, MT time: 709739, ST time: 516056, Speedup: 0.727106

•mtspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 354910, ST time: 430078, Speedup: 1.21179

  Masked poly fit. poly order: 5, MT time: 705499, ST time: 526373, Speedup: 0.7461

•mtspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 3.62641e+08, ST time: 477736, Speedup: 0.00131738

  Masked poly fit. poly order: 5, MT time: 709477, ST time: 525866, Speedup: 0.741202

May 2, 2024 at 05:14 pm - Permalink

jjweimer

MacBook Pro 16in, 2.4GHz i9, 32GB RAM

•MTspeedtest("poly",5,200)

  Unmasked poly fit. poly order: 5, MT time: 754293, ST time: 406586, Speedup: 0.539029

  Masked poly fit. poly order: 5, MT time: 753732, ST time: 408524, Speedup: 0.542001

•MTspeedtest("poly",5,500)

  Unmasked poly fit. poly order: 5, MT time: 2.29552e+06, ST time: 464351, Speedup: 0.202285

  Masked poly fit. poly order: 5, MT time: 793827, ST time: 576895, Speedup: 0.726727

•MTspeedtest("poly",5,1000)

  Unmasked poly fit. poly order: 5, MT time: 869120, ST time: 838253, Speedup: 0.964485

  Masked poly fit. poly order: 5, MT time: 2.15979e+07, ST time: 849671, Speedup: 0.0393404

•MTspeedtest("gauss",5,1000)

  Unmasked gauss fit. poly order: 5, MT time: 580758, ST time: 4.89651e+06, Speedup: 8.43124

  Masked gauss fit. poly order: 5, MT time: 334341, ST time: 2.64889e+06, Speedup: 7.92272

•Print IgorInfo(3)

  OS:macOS;OSVERSION:14.4.1;LOCALE:US;IGORFILEVERSION:9.06B01;

•Print ThreadProcessorCount

  16

I am getting consistent times on the poly 5 1000.

Dare I propose that this is related to an i5 versus i7 versus i9 chip?

Interesting that the M1 chip does worse in multi-threading versus single threading in the poly 5 1000 test compared to the i9. Indeed, if I read this correctly, I should certainly not switch my i9 for an M1 as long as I am planning to need heavy processing with Igor Pro.

May 2, 2024 at 08:17 pm - Permalink

tony

If instead of using CurveFit poly I fit a user-defined polynomial fitting function (FuncFit UserPoly), fitting is of course way slower, but I see the expected speed increase with multithreading. No surprise there.

It's clear that something is amiss with CurveFit poly. I am curious about how the M1 Pro fares with other fit functions. I would expect to see significant speed improvement with multithreading.

May 3, 2024 at 02:11 am - Permalink

jtigor

Another Windows data point. As KZarzana observed, poly 5 starts to get slow between 300 and 400 points and CPU usage remains at 100% until completion. However, there was no problem with the computer responding to, say, mouse movement.

Completion time for the unmasked multithreaded poly 5 fit hit 2e8 at 400 points and stayed at approximately that value for longer waves. Single-threaded times were on the order of 1e6. Time for the masked MT fit took a big jump somewhere between 500 and 1000 points.

Dell Precision 7760 - 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz; 32 GB RAM; 8 cores/16 logical processors

Poly 5:

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 208505, ST time: 363385, Speedup: 1.74281

  Masked poly fit. poly order: 5, MT time: 140017, ST time: 411845, Speedup: 2.94138

•MTspeedtest("poly", 5, 300)

  Unmasked poly fit. poly order: 5, MT time: 127913, ST time: 418590, Speedup: 3.27246

  Masked poly fit. poly order: 5, MT time: 132262, ST time: 392124, Speedup: 2.96474

•MTspeedtest("poly", 5, 400)

  Unmasked poly fit. poly order: 5, MT time: 2.09506e+08, ST time: 424655, Speedup: 0.00202693

  Masked poly fit. poly order: 5, MT time: 141391, ST time: 537311, Speedup: 3.80017

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 2.16166e+08, ST time: 424359, Speedup: 0.00196312

  Masked poly fit. poly order: 5, MT time: 172648, ST time: 647535, Speedup: 3.7506

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 2.02592e+08, ST time: 841479, Speedup: 0.00415357

  Masked poly fit. poly order: 5, MT time: 2.06259e+08, ST time: 884786, Speedup: 0.00428968

Gauss:

•MTspeedtest("gauss", 5, 200)

  Unmasked gauss fit. poly order: 5, MT time: 117796, ST time: 566506, Speedup: 4.80923

  Masked gauss fit. poly order: 5, MT time: 94848.7, ST time: 248933, Speedup: 2.62453

•MTspeedtest("gauss", 5, 500)

  Unmasked gauss fit. poly order: 5, MT time: 235547, ST time: 1.86926e+06, Speedup: 7.9358

  Masked gauss fit. poly order: 5, MT time: 134776, ST time: 876038, Speedup: 6.49997

•MTspeedtest("gauss", 5, 1000)

  Unmasked gauss fit. poly order: 5, MT time: 462936, ST time: 3.61267e+06, Speedup: 7.80383

  Masked gauss fit. poly order: 5, MT time: 273130, ST time: 1.99793e+06, Speedup: 7.31495

May 3, 2024 at 06:23 am - Permalink

ilavsky

MacBook Pro, M1 Pro (ARM), macOS Sonoma 14.4.1, Igor Pro 9.0.5.1. I think it is working with Gauss as expected.

Igor manages to load all cores. Since some cores are performance cores and some are efficiency cores, I wonder whether that complicates multithreading, as results from different cores must be arriving at very different speeds.

•MTspeedtest("gauss", 5, 200)

  Unmasked gauss fit. poly order: 5, MT time: 126119, ST time: 859307, Speedup: 6.81346

  Masked gauss fit. poly order: 5, MT time: 101142, ST time: 434436, Speedup: 4.29531

•MTspeedtest("gauss", 5, 500)

  Unmasked gauss fit. poly order: 5, MT time: 290593, ST time: 2.53254e+06, Speedup: 8.71508

  Masked gauss fit. poly order: 5, MT time: 149553, ST time: 1.34725e+06, Speedup: 9.00853

•MTspeedtest("gauss", 5, 1000)

  Unmasked gauss fit. poly order: 5, MT time: 632526, ST time: 5.23632e+06, Speedup: 8.27842

  Masked gauss fit. poly order: 5, MT time: 319750, ST time: 2.88794e+06, Speedup: 9.03187

•MTspeedtest("gauss", 5, 10000)

  Unmasked gauss fit. poly order: 5, MT time: 5.79273e+06, ST time: 1.33819e+07, Speedup: 2.31012

  Masked gauss fit. poly order: 5, MT time: 3.36825e+06, ST time: 1.1933e+07, Speedup: 3.54279

May 3, 2024 at 07:29 am - Permalink

johnweeks

KZarzana has sent us a report of an actual crash while doing the multithreaded poly fit test.

Given that the test uses enoise() to set the poly coefficients, it's highly likely that sometimes there is a pathological data set that causes unusual problems with the fitting.

Given the erratic results (see my three runs of N=500 above, where everything looks mostly OK but one run gives an MT time about three orders of magnitude higher) plus KZarzana's crash, I'm betting on some sort of thread-safety violation in the poly fit code.
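Because the failure is intermittent, a single timing is not enough to tell whether a given machine is affected. One way to spot the bad runs is to repeat the test and compare each MT time against the median, sketched here in Python. The helper name `flag_outliers` and the factor of 10 are arbitrary choices for illustration; the input values are the three unmasked MT times from the N=500 runs above.

```python
from statistics import median


def flag_outliers(times, factor=10.0):
    """Mark runs whose time exceeds `factor` times the median of all runs."""
    mid = median(times)
    return [t > factor * mid for t in times]


# Unmasked MT times from the three repeated poly 5, N=500 runs.
mt_unmasked = [279780, 354910, 3.62641e8]
flags = flag_outliers(mt_unmasked)
print(flags)
```

Only the third run is flagged, which fits an intermittent fault rather than a steady overhead cost.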

That wouldn't be surprising given that the basic code was first written 30 years ago. At present, it uses an SVD solution based on Numerical Recipes edition 1.something, from before they tightened up their licensing restrictions. That would be suspect...

I'll let y'all know what I find.

May 3, 2024 at 09:13 am - Permalink

tony

Yeah, using enoise is a poor choice for making speed comparisons. But I doubt that a pathological data set could be created using this method. The fact that problems arise only under restricted combinations of inputs rather suggests otherwise. For this test, creating an input wave of zeroes would probably show the same trends.

I remember coding plenty of Numerical Recipes stuff in Igor in the days before we had so many built-in methods available. I used to translate from the Fortran version because that was the only code I understood.

May 3, 2024 at 01:26 pm - Permalink

KZarzana

Two more data points. On an AMD Ryzen 3700X system (Win10, Igor 9.0.5.1):

•MTspeedtest("Gaus", 5, 200)

  Unmasked Gaus fit. poly order: 5, MT time: 10070.2, ST time: 35370.4, Speedup: 3.51238

  Masked Gaus fit. poly order: 5, MT time: 9805.9, ST time: 35446.2, Speedup: 3.61478

•MTspeedtest("Gaus", 5, 500)

  Unmasked Gaus fit. poly order: 5, MT time: 10190.5, ST time: 61877.8, Speedup: 6.07211

  Masked Gaus fit. poly order: 5, MT time: 9685.1, ST time: 90618.8, Speedup: 9.35652

•MTspeedtest("Gaus", 5, 1000)

  Unmasked Gaus fit. poly order: 5, MT time: 12094.6, ST time: 111623, Speedup: 9.22915

  Masked Gaus fit. poly order: 5, MT time: 11537.4, ST time: 136792, Speedup: 11.8564

•MTspeedtest("Poly", 4, 200)

  Unmasked Poly fit. poly order: 4, MT time: 115901, ST time: 377673, Speedup: 3.25859

  Masked Poly fit. poly order: 4, MT time: 122661, ST time: 370966, Speedup: 3.02433

•MTspeedtest("Poly", 4, 500)

  Unmasked Poly fit. poly order: 4, MT time: 336137, ST time: 611126, Speedup: 1.81809

  Masked Poly fit. poly order: 4, MT time: 159058, ST time: 522750, Speedup: 3.28653

•MTspeedtest("Poly", 4, 1000)

  Unmasked Poly fit. poly order: 4, MT time: 274522, ST time: 837962, Speedup: 3.05244

  Masked Poly fit. poly order: 4, MT time: 351762, ST time: 884927, Speedup: 2.5157

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 372060, ST time: 510901, Speedup: 1.37317

  Masked poly fit. poly order: 5, MT time: 413478, ST time: 460336, Speedup: 1.11333

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 289436, ST time: 641480, Speedup: 2.21631

  Masked poly fit. poly order: 5, MT time: 409983, ST time: 685766, Speedup: 1.67267

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 426360, ST time: 974424, Speedup: 2.28545

  Masked poly fit. poly order: 5, MT time: 399927, ST time: 924882, Speedup: 2.31263

There weren't any slowdown issues on the Ryzen Windows system with the poly 5 fits.

And on an M2 Max (macOS Ventura 13.5.2):

•MTspeedtest("Gauss", 5, 200)

  Unmasked Gauss fit. poly order: 5, MT time: 121638, ST time: 591668, Speedup: 4.86416

  Masked Gauss fit. poly order: 5, MT time: 84415.9, ST time: 295797, Speedup: 3.50404

•MTspeedtest("Gauss", 5, 500)

  Unmasked Gauss fit. poly order: 5, MT time: 218543, ST time: 1.79662e+06, Speedup: 8.22089

  Masked Gauss fit. poly order: 5, MT time: 145612, ST time: 966459, Speedup: 6.63721

•MTspeedtest("Gauss", 5, 1000)

  Unmasked Gauss fit. poly order: 5, MT time: 431504, ST time: 3.63349e+06, Speedup: 8.42052

  Masked Gauss fit. poly order: 5, MT time: 255647, ST time: 2.09708e+06, Speedup: 8.20304

•MTspeedtest("poly", 4, 200)

  Unmasked poly fit. poly order: 4, MT time: 1.84797e+06, ST time: 630529, Speedup: 0.3412

  Masked poly fit. poly order: 4, MT time: 2.62976e+06, ST time: 698544, Speedup: 0.265631

•MTspeedtest("poly", 4, 500)

  Unmasked poly fit. poly order: 4, MT time: 1.86755e+06, ST time: 726264, Speedup: 0.388885

  Masked poly fit. poly order: 4, MT time: 2.57265e+06, ST time: 850029, Speedup: 0.330409

•MTspeedtest("poly", 4, 1000)

  Unmasked poly fit. poly order: 4, MT time: 1.87716e+06, ST time: 969844, Speedup: 0.516654

  Masked poly fit. poly order: 4, MT time: 2.66254e+06, ST time: 1.07144e+06, Speedup: 0.402413

•MTspeedtest("poly", 5, 200)

  Unmasked poly fit. poly order: 5, MT time: 2.20938e+06, ST time: 671477, Speedup: 0.30392

  Masked poly fit. poly order: 5, MT time: 3.08876e+06, ST time: 773982, Speedup: 0.25058

•MTspeedtest("poly", 5, 500)

  Unmasked poly fit. poly order: 5, MT time: 2.22977e+06, ST time: 887805, Speedup: 0.39816

  Masked poly fit. poly order: 5, MT time: 3.16569e+06, ST time: 910858, Speedup: 0.287728

•MTspeedtest("poly", 5, 1000)

  Unmasked poly fit. poly order: 5, MT time: 2.25792e+06, ST time: 1.13934e+06, Speedup: 0.504595

  Masked poly fit. poly order: 5, MT time: 3.2675e+06, ST time: 1.16903e+06, Speedup: 0.357776

May 3, 2024 at 03:23 pm - Permalink