Rename directories in multi-gpu regression tests

When running the regression tests on 4 and 8 GPUs at the same time, one of them fails due to missing files.

ℹ Copying files...
✔  Done
🚀  Launched job 2684546
✔  Wrote job ID 2684546 to file hpcrocket4GPU.log
ℹ Collecting files...
❌ FileNotFoundError: multigpu_test/output/4GPU/
✔  Done
ℹ Cleaning files...
❌ ResourceNotFound: resource '' not found
               ╷          ╷              
  ID           │ Name     │ State        
╶──────────────┼──────────┼─────────────╴
  2684546      │ Regr4GPU │ ❌ FAILED    
  2684546.bat+ │ batch    │ ❌ FAILED    
  2684546.ext+ │ extern   │ ✔ COMPLETED  
               ╵          ╵              

I think the cause is the clean stage in the rocketGPU.yml file.

clean:
  - multigpu_test/*

Since both tests used the same directory name, the clean stage of one test may have deleted the data of the other test before it could be collected. Therefore, I renamed the directories so that each test uses its own.
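
To illustrate the suspected cause, here is a minimal local sketch in Python. It is not how hpc-rocket works internally: the helpers `clean`, `collect`, and `simulate` are hypothetical, and the directory names `multigpu_test_4gpu` and `multigpu_test_8gpu` are placeholders, not necessarily the names used in this merge request. It only assumes that one job's clean stage runs before the other job collects its output.

```python
import shutil
import tempfile
from pathlib import Path


def clean(root: Path, pattern: str) -> None:
    """Delete everything under root that matches the glob pattern."""
    for path in list(root.glob(pattern)):
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()


def collect(root: Path, relative: str) -> list[str]:
    """Return the file names in root/relative, or raise if it is gone."""
    target = root / relative
    if not target.is_dir():
        raise FileNotFoundError(relative)
    return sorted(p.name for p in target.iterdir())


def simulate(dir_4gpu: str, dir_8gpu: str) -> bool:
    """Return True if the 4-GPU results survive the 8-GPU clean stage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Both jobs write their results below the same root directory.
        for base, gpus in ((dir_4gpu, "4GPU"), (dir_8gpu, "8GPU")):
            out = root / base / "output" / gpus
            out.mkdir(parents=True, exist_ok=True)
            (out / "result.txt").write_text("data")
        # Assumption: the 8-GPU job finishes first and runs its clean stage.
        clean(root, f"{dir_8gpu}/*")
        # The 4-GPU job then tries to collect its output.
        try:
            collect(root, f"{dir_4gpu}/output/4GPU")
        except FileNotFoundError:
            return False
        return True


# Same directory for both tests (old setup) vs. one directory per test.
shared = simulate("multigpu_test", "multigpu_test")
separate = simulate("multigpu_test_4gpu", "multigpu_test_8gpu")
print(shared, separate)
```

With a shared directory, the glob `multigpu_test/*` also removes the other test's `output` folder, so the later collect step fails; with one directory per test, each clean stage only touches its own files.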

Merged by Anna Wellmann 1 year ago (Feb 15, 2024 3:41pm UTC)

Merge details

  • Changes merged into develop with 362a6629 (commits were squashed).
  • Deleted the source branch.
  • Auto-merge enabled

Pipeline #37082 passed

Pipeline passed for 362a6629 on develop
