
Abaqus Standard 64 cpus is slower than 16.


fixitben

Aug 14, 2015
thread799-332932
thread799-341011

I have the same problem as the threads above.
We just purchased a new server, a Cisco UCS C460 M4 with 4 physical processors at 18 cores each, running RHEL 6.5. That is a total of 72 cores with hyper-threading turned off.
We also have an HP DL580 G7 with 4 physical processors at 10 cores each, running RHEL 5.5, for a total of 40 cores. We also have a 5-year-old Supermicro cluster with 64 cores across 8 nodes.

I have run the performance benchmarks Abaqus supplies to compare speeds, and neither our DL580 nor our C460 scales past 16 cpus. The run actually gets slower with more cpus.
I have a ticket open with Abaqus, but they haven't gotten anywhere in two weeks. I figured I would ask here and see if anybody else has seen this or knows of a solution.

With the 5-year-old cluster, the job scales all the way out to 64 cpus and is faster each time.

The job I am using is s6.inp, provided in the performance benchmarks from Abaqus: abq6141 fetch job=s6

I have run the other jobs, and the ones with high iteration counts don't scale past 16-18 cpus either; the run gets slower. The Cisco C460 has 512 GB of RAM and 2 TB of SSD drives in RAID 0, so the disk isn't the bottleneck here.

Thanks
 

Try testing an explicit run. I wonder if the inter-core/node communication is behind the drop in performance past a certain threshold number of cores.


 
I just ran the explicit examples e1.inp and e2.inp.

They seem to scale all the way up to 72 cpus with Explicit.


[Attached image: Capture_s0blcw.jpg]


Thanks
 
I think there must be an inflection point in the implicit runs beyond which additional CPUs degrade performance.

Note that these comparisons are tricky and highly problem-dependent. You might want to build a library of 'benchmark'/typical models.

For more, read the Job Execution chapter in the Analysis User's Guide in the documentation.
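
For comparing benchmark runs like these, one option is to tabulate speedup and parallel efficiency against a baseline run. The following is only a minimal shell sketch: the function name `scaling` is hypothetical, and the timings in the example call are placeholders, not measurements from this thread.

```shell
#!/bin/sh
# Speedup and parallel efficiency of one run relative to a baseline run.
# Arguments: baseline_cpus baseline_seconds cpus seconds
scaling() {
    awk -v c0="$1" -v t0="$2" -v c="$3" -v t="$4" 'BEGIN {
        speedup = t0 / t                 # above 1 means faster than the baseline
        efficiency = speedup / (c / c0)  # 1.00 would be ideal scaling
        printf "cpus=%d speedup=%.2f efficiency=%.2f\n", c, speedup, efficiency
    }'
}

# Placeholder wall-clock times in seconds, for illustration only.
scaling 16 1000 32 625
```

A speedup below 1.00 would correspond to the behaviour described in this thread, where the run gets slower as cpus are added.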


 
Well, I got a reply back from Abaqus after doing a bunch of testing. They had me add this option to the command: -mp_host_split 8

abaqus -inp s4e -j s4ec16m2 -cpus 16 -mp_host_split 2
abaqus -inp s4e -j s4ec32m2 -cpus 32 -mp_host_split 2
abaqus -inp s4e -j s4ec32m4 -cpus 32 -mp_host_split 4
abaqus -inp s4e -j s4ec64m4 -cpus 64 -mp_host_split 4
abaqus -inp s4e -j s4ec64m8 -cpus 64 -mp_host_split 8

This basically simulates an 8-node cluster and breaks the problem into 8 standard solvers on the same machine. It now scales much better than running without the option. Without it, a 72-cpu run wasn't any faster than a 16-cpu run. It is a workaround. I really think they have optimized Abaqus to run on clusters and never looked at high-core-count machines. They need to look at this more deeply and find the underlying problem. In theory this machine should be faster than almost any 72-core cluster out there because there is no interconnect latency. I told them they need to do more development on this since core counts are only going to get higher. You can get a 2-processor machine with 36 cores now, and I am pretty sure it will have the same issue past 16-18 cores.
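
To repeat this kind of sweep without typing each line, something like the following could generate the commands. It is just a sketch that prints the command lines rather than running them, and it reuses the same flags and job-name pattern as the runs listed above. The helper names `build_cmd` and `sweep` and the `ABAQUS_CMD` variable are made up for this example; set `ABAQUS_CMD` to whatever launcher your site uses (for example abq6141).

```shell
#!/bin/sh
# Print (do not run) one Abaqus command per cpus/split combination.
# ABAQUS_CMD is an assumed override for the launcher name.
ABAQUS_CMD="${ABAQUS_CMD:-abaqus}"

build_cmd() {
    # $1 = input deck, $2 = cpus, $3 = mp_host_split value
    # Job name follows the pattern <input>c<cpus>m<split>, e.g. s4ec16m2.
    printf '%s -inp %s -j %sc%sm%s -cpus %s -mp_host_split %s\n' \
        "$ABAQUS_CMD" "$1" "$1" "$2" "$3" "$2" "$3"
}

sweep() {
    # $1 = input deck; remaining arguments are "cpus:split" pairs
    inp="$1"; shift
    for pair in "$@"; do
        build_cmd "$inp" "${pair%%:*}" "${pair##*:}"
    done
}

# The five combinations from the runs listed above.
sweep s4e 16:2 32:2 32:4 64:4 64:8
```

Piping the output to `sh` would execute the runs one after another; reviewing the printed lines first is the safer habit.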

Hopefully this helps others out there.
 