HTCondor

What is HTCondor?

HTCondor is a high-throughput computing system that can run many related tasks simultaneously. Most commonly, this means splitting a job that runs over N events into M jobs that each run over N/M events. That is only one simple example, however, of what HTCondor can do.
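When N is not an exact multiple of M, N/M is not a whole number, so some jobs must process one extra event for every event to be covered. The following is a minimal Python sketch of that bookkeeping. It assumes each job is told its first event and how many events to process; the function name `split_events` is hypothetical and not part of HTCondor.

```python
def split_events(n_events: int, n_jobs: int) -> list[tuple[int, int]]:
    """Return (first_event, event_count) for each of n_jobs jobs."""
    if n_events < 0 or n_jobs <= 0:
        raise ValueError("need n_events >= 0 and n_jobs > 0")
    # base events per job, plus `extra` leftover events to hand out
    base, extra = divmod(n_events, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional event.
        count = base + (1 if job < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    print(split_events(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

Giving the leftover events to the first few jobs keeps every job within one event of the others, so no single job becomes much slower than the rest.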

On MSU's tier3, HTCondor is made up of several interacting systems:
  1. The login nodes (green, maron, and white at the time of writing): used to submit jobs, run small scripts, and manage files. Note: the login nodes should not be used for resource-intensive computing.
  2. Home directories: a file server, separate from the login nodes, that holds each user's home folder. This is where a user is placed on login. Home directories are backed up and should be used for things like code, plots, and theses.
  3. The work disks (for example, t3work1-9): file servers with large amounts of disk space (many TB), used for storing large data. They are thought to use RAID 6 (unconfirmed) to protect against disk failures. They are not backed up.
  4. The worker nodes/job slots: at the time of writing, about 50 machines are used for running jobs, each with 8, 12, or 24 cores. Most cores run one job, while hyper-threaded cores can run more than one. In total, there are about 450 job slots at the time of writing.

Using HTCondor

To use HTCondor, a user needs, at minimum, an executable and a condor submit script. The process for this can be found here, along with information on more advanced topics.
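As a rough illustration of what a minimal submit script contains, the Python sketch below builds the text of an HTCondor submit description file. The submit commands `executable`, `arguments`, `output`, `error`, `log`, and `queue`, and the `$(Process)` macro (which takes the values 0 to n_jobs-1), are standard HTCondor submit syntax. The executable name `run_analysis.sh`, its argument convention (job index, then events per job), and the function name `submit_description` are hypothetical assumptions for this example.

```python
def submit_description(executable: str, n_jobs: int, events_per_job: int) -> str:
    """Build a minimal HTCondor submit description queuing n_jobs jobs.

    Assumes (hypothetically) that the executable takes the job index and
    the number of events per job as its two command-line arguments.
    """
    lines = [
        f"executable = {executable}",
        # $(Process) expands to 0, 1, ..., n_jobs-1 for the queued jobs.
        f"arguments  = $(Process) {events_per_job}",
        # Separate stdout/stderr files per job, one shared HTCondor log.
        "output     = job_$(Process).out",
        "error      = job_$(Process).err",
        "log        = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(submit_description("run_analysis.sh", 10, 100), end="")
```

Saving the printed text as, for example, `jobs.sub` and running `condor_submit jobs.sub` from a login node would queue the ten jobs.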

-- ForrestPhillips - 14 Sep 2017
Topic revision: r3 - 19 Oct 2017, ForrestPhillips
 
