Difference: CondorHowTo (r2 vs. r1)

r2 - 19 Sep 2017 - 18:52 - ForrestPhillips r1 - 14 Sep 2017 - 17:52 - ForrestPhillips

HTCondor

What is HTCondor?

HTCondor is a high-throughput computing system that can run many related tasks simultaneously. Most commonly, this means splitting a job that runs over N events into M jobs that each run over N/M events. That is only the simplest example of what HTCondor can do.
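The N-events-into-M-jobs split can be sketched in a few lines of Python. This is only an illustration: the function name `split_events` is hypothetical, and giving the leftover events to the first few jobs when N is not divisible by M is an assumption, not something HTCondor does for you.

```python
def split_events(n_events, n_jobs):
    """Return one (first_event, n_to_process) pair per job.

    Hypothetical helper: HTCondor does not split events itself, so the
    executable is assumed to accept a starting event and an event count.
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    # Each job gets N // M events; the first N % M jobs get one extra.
    base, extra = divmod(n_events, n_jobs)
    ranges = []
    first = 0
    for job in range(n_jobs):
        count = base + (1 if job < extra else 0)
        ranges.append((first, count))
        first += count
    return ranges
```

Each pair could then be passed to one job as its arguments, so that together the M jobs cover all N events exactly once.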

In the case of MSU's tier3, the HTCondor setup is made up of several interacting systems:

  
  1. The login nodes (green, maron, and white at time of writing): These are used to submit jobs, run small scripts, and manage files. Note: They should not be used for resource-intensive computing.
  2. Home directories: These live on a file server separate from the login nodes, but this is where a user is dropped when they log in. This is where a user's home folder is located. Home directories are backed up and should be used for things like code, plots, theses, etc.
  3. The work disks (t3work1-9, for example): These are file servers with large amounts of disk space (many TBs) and are used for storing large data sets. They use RAID 6(?) to protect against failed disks. They are not backed up.
  4. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job each, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
  
Using HTCondor
  

To use HTCondor, a user needs at minimum an executable and a condor submit script. The process for creating these can be found here, as well as information on more advanced topics.
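As a rough illustration of what a submit script contains, the sketch below builds a minimal submit description as text. The helper name `make_submit_description`, the executable name `run_analysis.sh`, and the `logs` directory are hypothetical, and the tier3 may need additional settings that are not shown here; the linked instructions remain the reference.

```python
def make_submit_description(executable, n_jobs, log_dir="logs"):
    """Build the text of a minimal HTCondor submit description.

    Hypothetical helper for illustration. $(Process) is expanded by
    HTCondor to 0, 1, ..., n_jobs - 1, one value per queued job.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # Pass the job index to the executable so each job can pick its own work.
        "arguments  = $(Process)",
        # Per-job stdout and stderr; log_dir is assumed to exist already.
        f"output     = {log_dir}/job_$(Process).out",
        f"error      = {log_dir}/job_$(Process).err",
        # One shared HTCondor event log for the whole set of jobs.
        f"log        = {log_dir}/jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the description to a file that can be handed to condor_submit.
    with open("analysis.sub", "w") as handle:
        handle.write(make_submit_description("run_analysis.sh", 4))
```

The resulting file would then be submitted from a login node with `condor_submit analysis.sub`.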

-- ForrestPhillips - 14 Sep 2017
