Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

Abaqus Parallel Execution to Maximize Throughput

Status
Not open for further replies.

testing

Aerospace
Jul 19, 2013
US
I'm trying to run Abaqus as fast as possible on a cluster. My model isn't large, but I'm coupling it to another solver that is more expensive. I want to minimize the wallclock time spent in the Abaqus analysis, but I'm not seeing any speedup going from one compute node with OpenMP to two compute nodes with MPI. I'm not concerned with the parallel efficiency of the Abaqus solve itself, just with straight-up throughput, so that my expensive analysis is not sitting idle while waiting on Abaqus. Does anyone know ways to increase speed? This can include disabling all Abaqus output, since the variables I care about will be duplicated in the other code's output. Right now I'm not getting more than 90 iterations per minute of wallclock time, and I'd like to push that rate higher.
 

I guess you may have already reached the maximum throughput for this particular problem with 1 node (with 20 or 24 cores). Try using 2, 4, 8, and 16 cores on 1 node and see how your wallclock time changes. You could then repeat the experiment with the same core counts spread over multiple nodes and compare the results. With more nodes, I/O will come into play as well. Think of it as a design of experiments (DOE).
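A minimal sketch of how the results of such a core-count sweep could be compared, assuming you have recorded one wallclock time per core count. The function name `scaling_table` and the timings below are hypothetical placeholders, not measurements from this thread.

```python
def scaling_table(wallclock):
    """Return (cores, speedup, efficiency) rows from measured wallclock times.

    wallclock maps core count -> wallclock seconds and must contain a
    1-core entry, which is used as the baseline.
    """
    baseline = wallclock[1]
    rows = []
    for cores in sorted(wallclock):
        speedup = baseline / wallclock[cores]   # how much faster than 1 core
        efficiency = speedup / cores            # fraction of ideal scaling
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical timings in seconds, for illustration only.
timings = {1: 120.0, 2: 70.0, 4: 45.0, 8: 40.0, 16: 42.0}

for cores, speedup, efficiency in scaling_table(timings):
    print(f"{cores:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

If the speedup column flattens or drops as cores are added, the job has likely hit its throughput limit for that hardware, and extra cores or nodes only add overhead.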

Also, look into GPGPUs. I have read claims of up to 40% speed-up using graphics hardware.

You spurred me on to create a batch file that I've been planning to write. Edit the file in a text editor to set the CPU and GPU options you want, and it will log the clock time before and after each solve. I didn't do anything fancy like loops. Just dump the results into Excel to see how the different simulations stack up. The BAT, INP and XLSX files are in the attached ZIP file.

For my trivial simulation, splitting into multiple domains did not help. Larger jobs typically scale better. This was a 1000-element linear model.

The contents of the batch file are below. I hope this helps.

Rob

@echo off
SetLocal EnableDelayedExpansion
REM Everything echoed in the block below is written to AbqDOE.txt. A timestamp is logged before and after each solve, so consecutive timestamps bracket one run.
>AbqDOE.txt (
echo !time!
call abaqus job=DOE cpus=1 gpus=0 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=2 gpus=0 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=3 gpus=0 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=4 gpus=0 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=1 gpus=1 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=2 gpus=1 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=3 gpus=1 ask_delete=OFF -seq
echo !time!
call abaqus job=DOE cpus=4 gpus=1 ask_delete=OFF -seq
echo !time!
)
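
As an alternative to pasting the log into Excel, the elapsed time per run can be computed with a short script. This is only a sketch: it assumes the timestamps in AbqDOE.txt look like `16:52:49.55` (the `%time%` format on a US-style locale; other locales may differ), and it ignores any other lines that Abaqus writes to the log. The helper names are hypothetical.

```python
import re
from datetime import timedelta

# Assumed cmd.exe %time% format: H:MM:SS.cc, possibly with a leading space.
# Some locales use a comma as the decimal separator, so both are accepted.
TIME_RE = re.compile(r"^\s*(\d{1,2}):(\d{2}):(\d{2})[.,](\d{2})\s*$")


def parse_timestamps(lines):
    """Return the time-of-day stamps found in the log, in order."""
    stamps = []
    for line in lines:
        match = TIME_RE.match(line)
        if match:  # skip anything that is not a bare timestamp
            hours, minutes, seconds, centis = map(int, match.groups())
            stamps.append(timedelta(hours=hours, minutes=minutes,
                                    seconds=seconds, milliseconds=10 * centis))
    return stamps


def elapsed_seconds(stamps):
    """Return the seconds between each pair of consecutive stamps."""
    durations = []
    for start, end in zip(stamps, stamps[1:]):
        delta = (end - start).total_seconds()
        if delta < 0:          # the run crossed midnight
            delta += 24 * 3600
        durations.append(delta)
    return durations


if __name__ == "__main__":
    with open("AbqDOE.txt") as log:
        for run, secs in enumerate(elapsed_seconds(parse_timestamps(log)), 1):
            print(f"run {run}: {secs:.2f} s")
```

The runs are numbered in the order the `call abaqus` lines appear in the batch file, so run 1 corresponds to `cpus=1 gpus=0`, run 2 to `cpus=2 gpus=0`, and so on.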


Rob Stupplebeen
 
@IceBreakerSours: I've got some other runs in the queue to see what I can get from GPU acceleration. I also need to go back and run the lower core counts to see how quickly the speedup falls off before it saturates.

@rstupplebeen: Thanks! My problem isn't quite that small, but it's not too much bigger either. This is probably what I'll see after a few cores. On a different problem, I was seeing speedups up to 6 cores with diminishing returns.
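
One common way to reason about this kind of diminishing return is Amdahl's law, which bounds the speedup by the fraction of the work that actually runs in parallel. The sketch below is a textbook model, not something fitted to either problem in this thread, and the parallel fraction of 0.8 is an assumed value chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law for the given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed parallel fraction, for illustration only.
p = 0.8

for n in (1, 2, 4, 6, 8, 16):
    print(f"{n:>2} cores: {amdahl_speedup(p, n):.2f}x")

# No core count can beat 1 / (1 - p) under this model.
print(f"upper bound: {1.0 / (1.0 - p):.2f}x")
```

Under this model, a small job with a large serial share gains little beyond a handful of cores, which would be consistent with gains tapering off at around 6 cores.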

I need to look at potentially subcycling between the external solver and Abaqus to see the gain I want. The more optimistic conclusion would be that I can run a much finer Abaqus model given the available core count without much of a wall clock hit.

Thanks for the responses!
 