[FLASH-USERS] investigating runtime failures

Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] daniel.s.kokron at nasa.gov
Tue Dec 12 11:32:50 EST 2017


Klaus,

Thanks for the suggestions.

I understand I haven't provided much information.  That's an indication of how far I am from a solution.  Investigating sporadic failures without a test case is never fun.

Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
________________________________________
From: Klaus Weide [klaus at flash.uchicago.edu]
Sent: Monday, December 11, 2017 15:29
To: Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
Cc: flash-users at flash.uchicago.edu
Subject: Re: [FLASH-USERS] investigating runtime failures

On Mon, 11 Dec 2017, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] wrote:

> FLASH users,
>
> We have a user on the Pleiades system who is experiencing unexplained
> run time failures while using a variant of FLASH-4.2.2.  The error
> message indicates memory corruption, possibly related to memory used for
> MPI communication, but we haven't been able to isolate the cause.
>
> Is anyone aware of changes since FLASH-4.2.2 that might have addressed
> this type of failure mode?
>
> Daniel Kokron
> NASA Ames (ARC-TN)
> SciCon group


Daniel,

Without more information about the run time failures it is hard to say
whether the cause may have been removed since FLASH 4.2.2.  But that
version of FLASH was released nearly 3.5 years ago, while FLASH 4.5 was
released only 3 days ago, and it is quite likely that among the changes in
between there has been a related fix.  In fact I am sure there has been at
least some change that relates to "memory used for MPI communication".

The user should be encouraged to try a newer FLASH version. It is likely
that FLASH 4.5 can be used in place of 4.2.2 without major (or any)
changes.

There have been mentions of MPI-related failures on "pleiades" in the
mailing list before. I do not know whether they are related, or whether
and how they have been resolved. The search field at the bottom of <URL:
http://flash.uchicago.edu/site/flashcode/user_support/> can be useful for
finding related messages in the mailing list archives.

Klaus


