What is HTCondor? VIDEO
HTCondor is a high-throughput computing system that can run many related tasks simultaneously. Most commonly, this means splitting a job that runs over N events into M jobs that each run over N/M events. However, that is just a simple example of what HTCondor can do.
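The N-events-into-M-jobs split described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic: the function name event_range is hypothetical, and giving leftover events to the first few jobs when N is not divisible by M is an assumption, not something HTCondor does for you.

```shell
#!/bin/sh
# Hypothetical helper: print "first_event n_events" for one job.
#   $1 = total number of events (N)
#   $2 = number of jobs (M)
#   $3 = job index, counted from 0
# If N is not divisible by M, the first (N mod M) jobs get one extra event.
event_range() {
    total=$1; njobs=$2; idx=$3
    base=$((total / njobs))
    extra=$((total % njobs))
    if [ "$idx" -lt "$extra" ]; then
        count=$((base + 1))
        first=$((idx * (base + 1)))
    else
        count=$base
        first=$((extra * (base + 1) + (idx - extra) * base))
    fi
    echo "$first $count"
}

# Example: split 1000 events across 8 jobs
i=0
while [ "$i" -lt 8 ]; do
    echo "job $i: $(event_range 1000 8 "$i")"
    i=$((i + 1))
done
```

Each job can then be handed its own starting event and event count as arguments.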
In the case of MSU's tier3, condor is made up of several interacting systems.
- The login/submit nodes (heracles, alpheus, and maron at time of writing): These are used to submit jobs, run small scripts, and manage files. Note: They should not be used for resource-intensive computing.
- Home directories: These live on a file server separate from the login nodes, and a user lands in their home directory when they log in. Home directories are backed up and should be used for things like code, plots, theses, etc.
- The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs), used for storing large datasets. They use RAID 6(?) to protect against failed disks. They are not backed up.
- The worker nodes/job slots: At the time of writing, there are about 50 machines used for running jobs. Each has 8 or 12 cores, and each core is set up to run 1 job. In total, there are about 450 job slots at the time of writing.
The Login Nodes
At the time of writing, there are 3 login nodes for submitting jobs: heracles, alpheus, and maron.
These nodes can be logged into from a terminal or x2go.
The command for ssh'ing to one of the login nodes from a terminal is:
$ ssh <username>@<login node>.aglt2.org
The tier3 has several different storage disks you can make use of to store code, datasets, plots, etc.
Home Areas VIDEO
The home areas (~ or /home/<username>) are on a machine named gorgo, but everything is set up so that you see them at /home/.
This disk has 1.8 TB of space, which is shared among all users.
That means the home areas are not meant for storing large amounts of information, such as data files.
The home area is backed up periodically (though not daily), which makes it a good place to store lightweight files like important code and plots.
It is still recommended that you use a version control system such as SVN or git to manage and back up your code, and that you copy plots to your desktop or laptop.
Work Disks VIDEO
Located at /msu/data are several disks meant for storage of large amounts of information, such as datasets (CxAODs, etc.).
These disks automount on first access, so they might not be available until you use a command such as "ls -l /msu/data/<diskname>".
The work disks and the storage available on them can be found here
After the disks have been mounted (using the "ls" command as before), you can use the command "df -h" to see the total, used, and available space on each of them (note that the space associated with the users' home areas is shared between them).
Use this command to find a good disk for storing your datasets and the like.
Note that the work disks do not come with user areas by default; instead, you must make one yourself using the command "mkdir /msu/data/<diskname>/<username>".
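The three steps above (trigger the automount, check the free space, create your own area) can be combined into one small shell function. This is a minimal sketch: the function name make_work_area is hypothetical, and the base directory is passed as a parameter (it would be /msu/data on the tier3) so the function can also be tried on another machine.

```shell
#!/bin/sh
# Hypothetical helper: make_work_area BASE DISK USER
#   BASE = directory holding the work disks (/msu/data on the tier3)
#   DISK = name of the work disk
#   USER = name of the user area to create
make_work_area() {
    base=$1; disk=$2; user=$3
    # Listing the path is what triggers the automount
    ls -l "$base/$disk" > /dev/null || return 1
    # Show total, used, and available space on that disk
    df -h "$base/$disk"
    # Create the user area; -p makes this harmless if it already exists
    mkdir -p "$base/$disk/$user"
}

# Usage on a login node:
#   make_work_area /msu/data <diskname> "$USER"
```

The function returns a non-zero status if the disk cannot be listed, so a typo in the disk name does not create a stray directory.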
There are approximately 50 worker nodes with about 500 job slots available for running various types of computing tasks.
A summary of how much these nodes are working can be found here
To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
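As a rough illustration of the two pieces mentioned above, the shell snippet below writes a tiny executable and a matching submit description. The file names run_job.sh and job.sub, the job count of 10, and the output file names are all placeholders chosen for this sketch; the tier3 may need additional site-specific settings that are not shown here.

```shell
#!/bin/sh
# Write a small wrapper script that each job will execute.
# (run_job.sh and its contents are placeholders.)
cat > run_job.sh <<'EOF'
#!/bin/sh
# $1 is the job index that HTCondor passes in via $(Process)
echo "Running job $1 on $(hostname)"
EOF
chmod +x run_job.sh

# Write a minimal submit description for 10 jobs.
cat > job.sub <<'EOF'
# $(Cluster) is the submission ID and $(Process) runs from 0 to 9
universe   = vanilla
executable = run_job.sh
arguments  = $(Process)
output     = job_$(Cluster)_$(Process).out
error      = job_$(Cluster)_$(Process).err
log        = job_$(Cluster).log
queue 10
EOF

# On a login node, submit and monitor with:
#   condor_submit job.sub
#   condor_q
```

Each of the 10 jobs gets its own index as an argument, which is how a single executable can process a different slice of events in every job.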