
HTCondor

What is HTCondor?

HTCondor is a high-throughput computing system that can run many related tasks simultaneously. The most common use is splitting a job that runs over N events into M jobs that each run over roughly N/M events, but that is just one simple example of what HTCondor can do.
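The N/M splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Tier3 setup: the function name `split_events` is hypothetical. It assumes jobs are numbered 0 to M-1, which is how HTCondor's `$(Process)` macro numbers the jobs in a submission. When N is not divisible by M, the leftover events go one each to the first jobs.

```python
def split_events(n_events: int, n_jobs: int) -> list[tuple[int, int]]:
    """Return (first_event, n_events_for_this_job) for each of n_jobs jobs.

    Hypothetical helper: job sizes differ by at most one event when
    n_events is not divisible by n_jobs.
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    base, extra = divmod(n_events, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional event.
        count = base + (1 if job < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    for job, (first, count) in enumerate(split_events(1000, 3)):
        print(f"job {job}: events {first}..{first + count - 1}")
    # job 0: events 0..333
    # job 1: events 334..666
    # job 2: events 667..999
```

A job that receives its own index as an argument can use `split_events(N, M)[index]` to find which events it should process.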

At MSU's Tier3, the HTCondor setup is made up of several interacting systems:
  1. The login/submit nodes (heracles, alpheus, and maron at the time of writing): these are used to submit jobs, run small scripts, and manage files. Note: they should not be used for resource-intensive computing.
  2. Home directories: these live on a file server separate from the login nodes, and your home directory is where you land when you log in. Home directories are backed up and should be used for things like code, plots, and theses.
  3. The work disks (t3work1-9, for example): file servers with large amounts of disk space (many TB each), used for storing large data. They are believed to use RAID 6 to protect against failed disks, but they are not backed up.
  4. The worker nodes/job slots: at the time of writing, about 50 machines are used for running jobs, each with 8, 12, or 24 cores. Most cores run one job each, while hyper-threaded cores can run more than one. In total there are about 450 job slots at the time of writing.

The Login Nodes

Storage Areas

The tier3 has several different storage disks you can make use of to store code, datasets, plots, etc.

Home Areas

The home areas (~ or /home/<username>) are physically on a machine named gorgo, but everything is set up so that you see them at /home/. This disk has 1.8 TB of space, shared among all users, so the home areas are not meant for storing large amounts of data such as data files.

The home area is backed up periodically (though not daily), which makes it a good place to store lightweight but important files such as code and plots. It is still recommended that you use a version control system such as SVN or git to manage and back up your code, and that you copy plots to your desktop or laptop.

Work Disks

Located at /msu/data are several disks meant for storing large amounts of data, such as datasets (CxAODs, etc.). These disks are automounted when accessed, so a disk may not appear until you access it with a command such as "ls -l /msu/data/<diskname>". At the time of writing, the available disks (and their total storage) are:
  • t3work1: 12 TB
  • t3work2: 12 TB
  • t3work3: 19 TB
  • t3work4: 19 TB
  • t3work5: 19 TB
  • t3work6: 19 TB
  • t3work7: 28 TB
  • t3work8: 28 TB
  • t3work9: 37 TB
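The automount behavior described above can be illustrated with a short Python sketch that touches each disk the same way `ls -l` would and reports which ones are reachable. This is a minimal sketch under assumptions: the disks sit directly under /msu/data with the names listed above, and simply listing a directory is enough to make the automounter mount it. The function name `accessible_work_disks` is hypothetical.

```python
import os
from pathlib import Path

# Disk names as listed above (t3work1 through t3work9); this list may change.
WORK_DISKS = [f"t3work{i}" for i in range(1, 10)]


def accessible_work_disks(base="/msu/data", names=WORK_DISKS):
    """Return the names of work disks under `base` that can be listed.

    Hypothetical helper. Listing a directory, as `ls -l` does, is assumed
    to be enough to trigger the automounter; disks that fail to list are
    skipped.
    """
    available = []
    for name in names:
        try:
            os.listdir(Path(base) / name)  # accessing the path triggers the mount
        except OSError:
            continue  # missing, not mountable, or not readable
        available.append(name)
    return available


if __name__ == "__main__":
    print(accessible_work_disks())
```

On a machine without /msu/data this simply returns an empty list, because every lookup raises `FileNotFoundError`, which is a subclass of `OSError`.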

After the disks have been mounted (using the "ls" command as before), you can use the command "df -h" to see the total, used, and available space on each of them (note that the space shown for the home areas is shared among all users). Use this command to find a disk with enough room for your datasets. Note that the work disks do not come with user areas by default; you must create one yourself with the command "mkdir /msu/data/<diskname>/<username>".
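The two steps above, checking free space and creating a user area, can be sketched in Python with `shutil.disk_usage` (which reports the same total/used/free figures that `df` is based on) and `Path.mkdir`. This is a minimal sketch, not a Tier3 tool: the helper names `human`, `emptiest_disk`, and `make_user_area` are hypothetical, and picking the disk with the most free space is just one reasonable policy.

```python
import os
import shutil
from pathlib import Path


def human(n_bytes):
    """Format a byte count roughly the way `df -h` does (powers of 1024)."""
    size = float(n_bytes)
    for unit in ["B", "K", "M", "G", "T"]:
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}P"


def emptiest_disk(base, names):
    """Print a df-like line per disk and return the one with the most free space."""
    free = {}
    for name in names:
        try:
            usage = shutil.disk_usage(os.path.join(base, name))  # (total, used, free)
        except OSError:
            continue  # disk not available; skip it
        free[name] = usage.free
        print(f"{name:10s} size={human(usage.total)} "
              f"used={human(usage.used)} avail={human(usage.free)}")
    if not free:
        raise RuntimeError(f"no work disks reachable under {base}")
    return max(free, key=free.get)


def make_user_area(base, disk, username):
    """Like `mkdir /msu/data/<diskname>/<username>`, but no error if it exists."""
    area = Path(base) / disk / username
    area.mkdir(exist_ok=True)  # no parents=True: the disk itself must be mounted
    return area
```

For example, `make_user_area("/msu/data", emptiest_disk("/msu/data", ["t3work7", "t3work8", "t3work9"]), getpass.getuser())` (after `import getpass`) would create your area on whichever of those disks currently has the most room. Free space on a shared disk changes as other users write to it, so this is a starting point rather than a guarantee.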

Using HTCondor

To use HTCondor, a user minimally needs an executable and a condor submit script. The process for creating these can be found here, along with information on more advanced topics.
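As an illustration of what a submit script contains, the Python sketch below generates a minimal HTCondor submit description. The submit commands it uses (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard HTCondor. The executable name `run_events.sh`, its argument convention, the file name `jobs.sub`, and the function `submit_description` are hypothetical. Check the instructions referenced above for the settings recommended on this Tier3.

```python
# Minimal HTCondor submit description. HTCondor expands $(Process) to the
# job index (0, 1, ..., N-1) for each job created by the `queue` line.
SUBMIT_TEMPLATE = """\
universe   = vanilla
executable = {executable}
arguments  = $(Process) {n_jobs} {n_events}
output     = job_$(Process).out
error      = job_$(Process).err
log        = jobs.log
queue {n_jobs}
"""


def submit_description(executable, n_jobs, n_events):
    """Return submit-file text that queues n_jobs copies of `executable`.

    Each job receives its index, the job count, and the total event count
    as arguments, so the (hypothetical) executable can pick its own range.
    """
    return SUBMIT_TEMPLATE.format(executable=executable, n_jobs=n_jobs, n_events=n_events)


if __name__ == "__main__":
    # "run_events.sh" is a hypothetical user script, not something provided on the Tier3.
    print(submit_description("run_events.sh", n_jobs=10, n_events=100000), end="")
```

Saving the printed text as `jobs.sub` and running `condor_submit jobs.sub` on a login node would queue the 10 jobs, and `condor_q` shows their status. Because the executable receives its index and the total counts, it can compute its own event range instead of having one submit file per job.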
Topic revision: r4 - 18 Sep 2019, ForrestPhillips
 
