
Wall clock times lower for CPU-only versus CPU+GPU simulation


donaldhume (Bioengineer), May 30, 2015
Hi all,

I'm running a strain model with millions of elements that takes several hours. Our group has recently started running simulation work on GPU clusters, and I thought I might try to take advantage of the new hardware for some of these Abaqus/Implicit jobs. I've found the results quite perplexing in R2018x. I've included the job time summaries for the two cases below and was hoping someone might point me toward the reason for this discrepancy, and/or toward literature on best practices for setting up jobs that take advantage of a CPU+GPU combination. I was under the impression the job that uses the GPU would run significantly faster.

I do monitor the GPU using nvidia-smi, and it is most certainly being used for parts of the simulation.
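In case it helps with diagnosing this, a minimal sketch of how that utilization could be logged to a file rather than watched by eye (plain Python wrapped around nvidia-smi; the helper name, interval, and log file name are just placeholders I made up):

# Poll nvidia-smi while the Abaqus job runs and append a timestamped
# utilization/memory line, so busy and idle phases can be matched to the job.
import subprocess, time, datetime

def log_gpu_utilization(interval_s=5, logfile="gpu_util.log"):
    with open(logfile, "a") as f:
        while True:   # run until killed, alongside the job
            out = subprocess.run(
                ["nvidia-smi",
                 "--query-gpu=utilization.gpu,memory.used",
                 "--format=csv,noheader"],
                capture_output=True, text=True, check=True).stdout.strip()
            f.write(datetime.datetime.now().isoformat() + " " + out + "\n")
            f.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_utilization()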

job=x cpus=16 double=both int
JOB TIME SUMMARY
USER TIME (SEC) = 2.24008E+05
SYSTEM TIME (SEC) = 12359.
TOTAL CPU TIME (SEC) = 2.36367E+05
WALLCLOCK TIME (SEC) = 20171

job=x cpus=16 gpus=1 double=both int
JOB TIME SUMMARY
USER TIME (SEC) = 1.99401E+05
SYSTEM TIME (SEC) = 18010.
TOTAL CPU TIME (SEC) = 2.17411E+05
WALLCLOCK TIME (SEC) = 39364

These two cases were run on the same box. As far as I can tell, the job with the added GPU takes almost twice as long to run based on the wall clock time. It is possible I am misunderstanding this information. I would appreciate any insight the community can lend.
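For reference, USER plus SYSTEM in these summaries adds up to the TOTAL CPU TIME line, which is summed over all threads, while WALLCLOCK TIME is elapsed real time, so the ratio of the two is roughly how many cores were kept busy on average. A quick sketch of that arithmetic (plain Python, with the values copied from the two summaries above):

# Rough check of the two JOB TIME SUMMARY blocks above. CPU time is summed over
# all threads, so TOTAL CPU TIME / WALLCLOCK gives an "effective cores" figure.
cpu_only = {"user": 2.24008e5, "system": 12359.0, "wall": 20171.0}
with_gpu = {"user": 1.99401e5, "system": 18010.0, "wall": 39364.0}

for name, t in (("cpus=16", cpu_only), ("cpus=16 gpus=1", with_gpu)):
    total = t["user"] + t["system"]   # matches the TOTAL CPU TIME line
    print(f"{name}: total CPU {total:.0f} s, wall {t['wall']:.0f} s, "
          f"effective cores {total / t['wall']:.1f}")

# Prints roughly:
# cpus=16: total CPU 236367 s, wall 20171 s, effective cores 11.7
# cpus=16 gpus=1: total CPU 217411 s, wall 39364 s, effective cores 5.5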

 
Thanks for your response, JXB. I may very well have misunderstood, but I thought wall clock time meant the time passed as seen by the "clock on the wall". If user time is the more appropriate metric for elapsed real-world time, then I suppose that would make sense.

Can someone confirm?
 
I thought I should add a follow up to this post.

I ran both jobs from MATLAB and timed each case. My initial assumption was correct: WALLCLOCK TIME refers to the time 'as passed by the clock on the wall'. For some reason the simulation run with the GPU takes almost twice as long as the one that doesn't use it. This boggles the mind.

We are running a K2200, so I don't think the speed of the card should be the limiting factor. Anyway, back to the drawing board.
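For what it's worth, the same external timing can be done without MATLAB; a minimal sketch in plain Python (the job name and flags are just the ones from the first post, and I'm assuming the 'int' there is short for the interactive option, which keeps the call blocking):

# Launch the Abaqus job and time it from the outside; the elapsed figure should
# line up with the WALLCLOCK TIME line in the job output.
import subprocess, time

cmd = ["abaqus", "job=x", "cpus=16", "gpus=1", "double=both", "interactive"]
start = time.time()
subprocess.run(cmd, check=True)
print("elapsed wall time:", round(time.time() - start), "s")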
 
The K2200 is not a new card; it is from 2014.
For double the price of a K2200 you get the P4000, with better performance (roughly 3x, for certain applications). Still, I'm not sure that GPU acceleration is really meant to be used with Quadro cards (crippled double-precision floating point).

Link

Your K2200 has about 2.4% of the floating point performance of a Tesla K80, for example.
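As a rough illustration of where a figure like that comes from (the peak numbers below are approximate, taken from published spec sheets, so treat them as ballpark only):

# Maxwell-generation Quadro cards run double precision at about 1/32 of their
# single precision rate; the K80 is a dedicated double precision part.
k2200_fp32_tflops = 1.3                      # approx peak FP32 for the K2200
k2200_fp64_tflops = k2200_fp32_tflops / 32   # roughly 0.04 TFLOPS FP64
k80_fp64_tflops = 1.9                        # approx peak FP64 for the K80
print(f"K2200 / K80 double precision: {k2200_fp64_tflops / k80_fp64_tflops:.1%}")
# prints roughly 2%, in the same ballpark as the figure quoted above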
 
Hi StefCon,

Thank you for your response. You are 100% correct; in fact the card is actually a Quadro P4000, not a K2200.

Getting to your point about GPU acceleration not being best utilized on Quadro cards, would you mind elaborating? Would I see better results running a similar test on a GTX 1080 Ti?

Also, does anyone know how the problem scaling works? It's not completely clear to me how the GPU supplements and/or replaces work on the CPU. Would I see fairly linear improvement as I move from 1 to 2 to 4 GPUs? We have a system with 4 GPUs and 64 CPU threads, and I'm trying to figure out what hardware configuration would give the best performance improvement over our standard cpus=16 runs.

Thanks for the feedback.
 
I don't use GPUs, so I can't answer your questions.

But have you tried contacting your Simulia technical rep?

We typically get a very detailed response quite quickly from them.
 
Hello donaldhume,

What I mean is that most modern Nvidia cards allow for GPU acceleration.
A consumer-grade card (e.g. a GTX 1080 Ti) cannot be used with Abaqus, I think, since it is not a professional card.
Quadro cards are professional cards and can be used for GPU acceleration. Nvidia does, however, make the Quadro chips slower for calculations compared to the Tesla cards.

From what I've read, you need a balance between CPU and GPU. Adding another GPU might not give you the boost you thought.
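One way to picture that is a plain Amdahl's-law sketch under an assumed split (my own illustration, not anything from the Abaqus documentation): only the portion of the solve that actually gets offloaded to the GPU runs faster, so the overall speedup is capped no matter how many GPUs you add.

# Amdahl's law: if only a fraction of the wall time is offloadable to the GPU,
# the overall speedup saturates regardless of how fast the GPU part becomes.
def overall_speedup(offloadable_fraction, gpu_factor):
    return 1.0 / ((1.0 - offloadable_fraction)
                  + offloadable_fraction / gpu_factor)

for gpu_factor in (2, 4, 8):   # hypothetical acceleration of the offloaded part
    print(f"60% offloadable, {gpu_factor}x GPU -> "
          f"{overall_speedup(0.6, gpu_factor):.2f}x overall")
# prints 1.43x, 1.82x, 2.11x; the hard limit for a 60% offloadable job is 2.50x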

I second Dave442's suggestion of contacting Simulia.
 