Why does PyMechanical in remote mode give a solver error on an HPC cluster (Linux CentOS 7)?

CollinsJnr Member Posts: 17

I have had this error for over a month now and couldn't find a solution on the PyAnsys GitHub repo.

Any time I run PyMechanical in remote mode, it has a problem with the solver.

When the script reaches the line that solves the analysis, it produces an error.

This is a simple case from the PyMechanical documentation examples:

Link: https://examples.mechanical.docs.pyansys.com/examples/00_basic/example_01_simple_structural_solve.html#sphx-glr-examples-00-basic-example-01-simple-structural-solve-py

Kindly follow the link to see the PyMechanical code I ran.

I copied all the code into a Python script and ran it on an HPC cluster.

I have already installed ansys-mechanical-core in my virtual environment.

I noticed that the script runs to completion when I remove the line that solves the static analysis:

# Solve static analysis.
STAT_STRUC.Solution.Solve(True)

Result without the solver line:

(Pyansys) [HPC ~]$ python /users/Projects/Cantilever-Project/HPC_Test.py
Downloaded the geometry file to: /users/.local/share/ansys_mechanical_core/examples/example_01_geometry.agdb
Initialize() started
Initialize() done
Ansys Mechanical [Ansys Mechanical Enterprise]
Product Version:232
Software build date: 05/30/2023 05:40:47

project directory = /users/.ansys/AnsysMech4B92/Project_Mech_Files/
Uploading example_01_geometry.agdb to 127.0.0.1:10000:/users/.ansys/AnsysMech4B92/Project_Mech_Files/.: 100%|███████████████████████████████████████████████████| 17.0k/17.0k [00:00<00:00, 3.92MB/s]
part_file_path on server: /users/.ansys/AnsysMech4B92/Project_Mech_Files/example_01_geometry.agdb
exit code: 0
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:48:39 2024
AnsMeshingServer, okay, Mon Apr 29 10:48:40 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:48:40 2024
AnsMeshingServer, okay, Mon Apr 29 10:48:41 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:48:42 2024
AnsMeshingServer, okay, Mon Apr 29 10:48:42 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:48:43 2024
AnsMeshingServer, okay, Mon Apr 29 10:48:44 2024
exit code: 255
{"Maximum": "0 [m]", "Minimum": "0 [m]", "Average": "0 [m]"}

Without the solver line, the code runs fine (although it produces no results), which shows that the gRPC connection is okay.

But with this line, the solver generates this error:

Error with the solver line:

(Pyansys) [HPC ~]$ python /user/Projects/Cantilever-Project/HPC_Test.py
Downloaded the geometry file to: /users/.local/share/ansys_mechanical_core/examples/example_01_geometry.agdb
Initialize() started
Initialize() done
Ansys Mechanical [Ansys Mechanical Enterprise]
Product Version:232
Software build date: 05/30/2023 05:40:47

project directory = /users/.ansys/AnsysMechD1B5/Project_Mech_Files/
Uploading example_01_geometry.agdb to 127.0.0.1:10000:/users/.ansys/AnsysMechD1B5/Project_Mech_Files/.: 100%|███████████████████████████████████████████████████| 17.0k/17.0k [00:00<00:00, 4.06MB/s]
part_file_path on server: /users/.ansys/AnsysMechD1B5/Project_Mech_Files/example_01_geometry.agdb
exit code: 0
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:51:18 2024
AnsMeshingServer, okay, Mon Apr 29 10:51:19 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:51:19 2024
AnsMeshingServer, okay, Mon Apr 29 10:51:20 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:51:21 2024
AnsMeshingServer, okay, Mon Apr 29 10:51:21 2024
exit code: 255
AnsMeshingServer, compiled May 30 2023 07:02:06, DS Mesher, Mon Apr 29 10:51:22 2024
AnsMeshingServer, okay, Mon Apr 29 10:51:23 2024
exit code: 255
Running Solver : /opt/apps/testapps/el7/software/staging/ANSYS/2023R2/v232/aisol/../ansys/bin/ansys232 -b nolist -s noread -i dummy.dat -o solve.out -dis -np 50 -p ansys
/opt/apps/testapps/el7/software/staging/ANSYS/2023R2/v232/aisol/.workbench: line 271: 13947 Killed $Pgm $Args
CRITICAL - - logging - handle_exception - Uncaught exception
Traceback (most recent call last):
File "/users/Projects/Cantilever-Project/HPC_Test.py", line 29, in <module>
output = mechanical.run_python_script(
File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/ansys/mechanical/core/mechanical.py", line 981, in run_python_script
result_as_string = self.__call_run_python_script(
File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/ansys/mechanical/core/mechanical.py", line 1702, in __call_run_python_script
for runscript_response in self._stub.RunPythonScript(request):
File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/grpc/_channel.py", line 542, in __next__
return self._next()
File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/grpc/_channel.py", line 968, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "Socket closed" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-04-29T10:52:12.118576839+01:00"}" >
(Pyansys) [HPC ~]$ (Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)

*-----------------------------------------------------------------------------*
|                                                                             |
|                                RUN COMPLETED                                |
|                                                                             |
|-----------------------------------------------------------------------------|
|                                                                             |
|  Ansys MAPDL 2023 R2         Build 23.2        UP20230530        LINUX x64  |
|                                                                             |
|-----------------------------------------------------------------------------|
|                                                                             |
|  Database Requested(-db)  1 MB      Scratch Memory Requested      0 MB      |
|  Maximum Database Used    1 MB      Maximum Scratch Memory Used   1 MB      |
|                                                                             |
|-----------------------------------------------------------------------------|
|                                                                             |
|  CP Time (sec)      =      0.189        Time = 10:52:42                     |
|  Elapsed Time (sec) =     26.000        Date = 04/29/2024                   |
|                                                                             |
*-----------------------------------------------------------------------------*

slurmstepd: error: Detected 1624 oom_kill events in StepId=2511326.5. Some of the step tasks have been OOM Killed.
I_MPI_JOB_TIMEOUT = -1 second(s): job ending due to startup timeout
srun: error: node002: task 0: Out Of Memory
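
The tail of this log points at memory rather than at the gRPC layer: the solver process is reported as `Killed`, and Slurm reports `oom_kill` events for the job step. A minimal Python sketch for pulling those indicators out of captured job output follows. The function name `find_oom_indicators` and the returned keys are hypothetical, and the regular expressions assume the message wording shown in the log above.

```python
import re


def find_oom_indicators(log_text: str) -> dict:
    """Collect out-of-memory hints from Slurm and solver wrapper output."""
    # slurmstepd line, e.g. "Detected 1624 oom_kill events in StepId=2511326.5."
    oom = re.search(r"Detected (\d+) oom_kill events? in StepId=([\d.]+)", log_text)
    # Shell wrapper line, e.g. ".workbench: line 271: 13947 Killed $Pgm $Args"
    killed = re.search(r"line \d+:\s+(\d+) Killed", log_text)
    return {
        "oom_kill_events": int(oom.group(1)) if oom else 0,
        "step_id": oom.group(2).rstrip(".") if oom else None,
        "killed_pid": int(killed.group(1)) if killed else None,
        "out_of_memory_task": "Out Of Memory" in log_text,
    }


sample_log = """\
/opt/apps/testapps/el7/software/staging/ANSYS/2023R2/v232/aisol/.workbench: line 271: 13947 Killed $Pgm $Args
slurmstepd: error: Detected 1624 oom_kill events in StepId=2511326.5. Some of the step tasks have been OOM Killed.
srun: error: node002: task 0: Out Of Memory
"""

print(find_oom_indicators(sample_log))
```

If the indicators are present, the `Socket closed` gRPC error is more likely a consequence of the killed processes than an independent networking problem, although the log alone does not prove that.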

Answers

  • Rajesh Meena
    Rajesh Meena Moderator, Employee Posts: 67

    Can you please verify if you are able to run this job manually on this HPC? You seem to be using 50 cores.
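
One way to check whether 50 solver processes fit the job's allocation is to compare that number with what Slurm granted. The sketch below assumes the script runs inside a Slurm job (the log mentions `slurmstepd` and `srun`) and reads the `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` environment variables, treating either as 1 when it is not set. `allocated_cores` and `fits_allocation` are hypothetical helper names, not part of PyMechanical.

```python
import os
from typing import Mapping


def allocated_cores(env: Mapping[str, str] = os.environ) -> int:
    """Cores granted to the job: tasks times CPUs per task (each defaults to 1)."""
    ntasks = int(env.get("SLURM_NTASKS", "1"))
    cpus_per_task = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    return ntasks * cpus_per_task


def fits_allocation(requested_cores: int, env: Mapping[str, str] = os.environ) -> bool:
    """True when the requested solver core count does not exceed the allocation."""
    return requested_cores <= allocated_cores(env)


# Example with a made-up allocation of 4 tasks x 2 CPUs each.
example_env = {"SLURM_NTASKS": "4", "SLURM_CPUS_PER_TASK": "2"}
print("allocated cores:", allocated_cores(example_env))
print("50 solver processes fit:", fits_allocation(50, example_env))
```

This only compares core counts. A memory limit can still be exceeded when the core count fits, so it is a first check rather than a diagnosis.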

  • CollinsJnr
    CollinsJnr Member Posts: 17
    edited April 29

    Thank you
    @Rajesh Meena

    I have not tried to run this particular job manually, but I have run other jobs both manually and with PyMechanical, and I still got exactly the same error.

    How can you tell that I am using 50 cores?

  • Rajesh Meena
    Rajesh Meena Moderator, Employee Posts: 67

    @CollinsJnr Are you saying that you are not able to run any job on HPC? Is this issue specific to PyMechanical?

  • CollinsJnr
    CollinsJnr Member Posts: 17

    @Rajesh Meena

    I am able to submit jobs (input files from the Mechanical GUI) on the HPC, and they run smoothly.

    But when I submit the job through PyMechanical, it gives this error.

  • Rajesh Meena
    Rajesh Meena Moderator, Employee Posts: 67

    This seems to be the solve command being triggered:

    /opt/apps/testapps/el7/software/staging/ANSYS/2023R2/v232/aisol/../ansys/bin/ansys232 -b nolist -s noread -i dummy.dat -o solve.out -dis -np 50 -p ansys
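
The `-np 50` argument in that command is where the core count comes from, and `-dis` requests a distributed solve. As a small illustration, the sketch below reads both from a solver command line like the one above. `solver_launch_options` is a hypothetical helper name, not part of PyMechanical, and it assumes the flags are space-separated as shown.

```python
import shlex


def solver_launch_options(command: str) -> dict:
    """Extract the process count (-np) and distributed flag (-dis) from a command."""
    tokens = shlex.split(command)
    cores = None
    # Look at each token together with the one that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-np":
            cores = int(value)
            break
    return {"cores": cores, "distributed": "-dis" in tokens}


command = (
    "/opt/apps/testapps/el7/software/staging/ANSYS/2023R2/v232/aisol/../ansys/bin/ansys232 "
    "-b nolist -s noread -i dummy.dat -o solve.out -dis -np 50 -p ansys"
)
print(solver_launch_options(command))
```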

  • CollinsJnr
    CollinsJnr Member Posts: 17
    edited April 29

    @Rajesh Meena
    Yes it is.

    But after that,

    CRITICAL - - logging - handle_exception - Uncaught exception
    Traceback (most recent call last):
    File "/users/Projects/Cantilever-Project/HPC_Test.py", line 29, in <module>
    output = mechanical.run_python_script(
    File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/ansys/mechanical/core/mechanical.py", line 981, in run_python_script
    result_as_string = self.__call_run_python_script(
    File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/ansys/mechanical/core/mechanical.py", line 1702, in __call_run_python_script
    for runscript_response in self._stub.RunPythonScript(request):
    File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/grpc/_channel.py", line 542, in __next__
    return self._next()
    File "/users/.conda/envs/Pyansys/lib/python3.10/site-packages/grpc/_channel.py", line 968, in _next
    raise self

    And the gRPC connection closes

    grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "Socket closed" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-04-29T10:52:12.118576839+01:00"}" >

  • CollinsJnr
    CollinsJnr Member Posts: 17

    @Rajesh Meena

    Any idea what the problem might be?

  • CollinsJnr
    CollinsJnr Member Posts: 17

    @Pernelle Marone-Hitz

    @Rajesh Meena

    Any ideas as to what the problem could be?

    Any help would be highly appreciated.

  • Pernelle Marone-Hitz
    Pernelle Marone-Hitz Member, Moderator, Employee Posts: 804

    @CollinsJnr It seems you have already reported this on GitHub (https://github.com/ansys/pymechanical/issues/694), where the issue is marked as closed. Can you please clarify?

  • CollinsJnr
    CollinsJnr Member Posts: 17

    @Pernelle Marone-Hitz
    Thank you.

    The conclusion on GitHub was that it is not a PyMechanical problem, so I had to close the issue and post it here.

  • Pernelle Marone-Hitz
    Pernelle Marone-Hitz Member, Moderator, Employee Posts: 804

    Thanks for the info, @CollinsJnr. My understanding from the GitHub issue is that the problem is not related to code at all, since it also happens when you run from the UI. This forum is dedicated to scripting questions, so you should not have been routed here.
    If you are a paying customer I would advise you to contact your local support provider and simply mention the issue that you are having through the UI. If you do not have access to official support, you can create a post in this other forum, which takes care of all other aspects except scripting: https://forum.ansys.com/

  • CollinsJnr
    CollinsJnr Member Posts: 17
    edited May 3

    Thank you, @Pernelle Marone-Hitz

    I opened the Ansys GUI on the HPC and noticed that the material is not showing in Mechanical.

    Kindly see the picture below.

    Meanwhile, I can see the materials in the Engineering Data Sources.

    Is there a material loading setting to see all available materials in Mechanical?

  • Pernelle Marone-Hitz
    Pernelle Marone-Hitz Member, Moderator, Employee Posts: 804

    @CollinsJnr We can only help with scripting questions on this forum. This is likely an installation issue. Please contact your local support provider, or create a post in this other forum, which takes care of all other aspects except scripting: https://forum.ansys.com/

  • CollinsJnr
    CollinsJnr Member Posts: 17

    @Pernelle Marone-Hitz

    Thank you so much.
    Super grateful for your continued help and support.