Difference: CondorHowTo (1 vs. 11)

Revision 11
17 Oct 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Line: 45 to 45
 

Located at /msu/data are several disks meant for storing large amounts of information, such as datasets (CxAODs, etc.). These disks are automounted on demand, so a disk might not be visible until you access it with a command such as "ls -l /msu/data/<diskname>".
Changed:
<
<
At the time of writing, the disks available (and the amount of total storage they have) are:
  • t3work1: 12 TB
  • t3work2: 12 TB
  • t3work3: 19 TB
  • t3work4: 19 TB
  • t3work5: 19 TB
  • t3work6: 19 TB
  • t3work7: 28 TB
  • t3work8: 28 TB
  • t3work9: 37 TB
>
>
The work disks and the storage available on them can be found here.
 

After a disk has been mounted (using the "ls" command as above), you can use the command "df -h" to see its total, used, and available space (note that the space associated with the users' home areas is shared among them). Use this command to find a good disk on which to store your datasets and the like.
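As a concrete sketch of this workflow (the disk name t3work3 is just one of the disks listed above, and the user-area step is only needed the first time), the commands might look like:

$ ls -l /msu/data/t3work3                # accessing the path triggers the automount
$ df -h /msu/data/t3work3                # total, used, and available space on that disk
$ mkdir /msu/data/t3work3/<username>     # work disks have no user areas by default; make your own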
Revision 10
03 Oct 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Changed:
<
<

What is HTCondor?

>
>

What is HTCondor? VIDEO

 

HTCondor is a high throughput computing system that can run multiple related tasks simultaneously. Most commonly, this means splitting up a job that runs over N events into M jobs that each run N/M events. However, that is just a simple example of what HTCondor can do.
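For example, a minimal condor submit script implementing this kind of splitting might look like the sketch below (the executable name, arguments, and output paths are hypothetical; $(Process) is the standard HTCondor variable that runs from 0 to M-1, here with M = 10):

universe   = vanilla
executable = run_analysis.sh
arguments  = --events-per-job 10000 --job-index $(Process)
output     = logs/job_$(Process).out
error      = logs/job_$(Process).err
log        = logs/job.log
# "queue 10" creates 10 jobs, each processing its own slice of the events
queue 10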
Line: 31 to 32
 

The tier3 has several different storage disks you can make use of to store code, datasets, plots, etc.
Changed:
<
<

Home Areas

>
>

Home Areas VIDEO

 

The home areas (~ or /home/<username>) are on a machine named gorgo, but everything is set up so that you see them at /home/<username>. This disk has 1.8 TB of space on it, but it is shared among all the users.
Line: 40 to 41
The home area is backed up periodically (just not daily), which means it is the perfect place to store lightweight files like important code and plots. It is still recommended that you use a version control system such as SVN or git to manage and back up your code, as well as copy plots to your desktop or laptop.
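For instance, a minimal (purely illustrative) git workflow for keeping a copy of your code outside the tier3 might be:

$ cd ~/my_analysis_code
$ git init
$ git add .
$ git commit -m "back up analysis code"
$ git remote add origin <repository URL>      # e.g. a repository on GitHub or CERN GitLab
$ git push -u origin master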
Changed:
<
<

Work Disks

>
>

Work Disks VIDEO

 

Located at /msu/data are several disks meant for storing large amounts of information, such as datasets (CxAODs, etc.). These disks are automounted on demand, so a disk might not be visible until you access it with a command such as "ls -l /msu/data/<diskname>".
Revision 9
03 Oct 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Line: 21 to 23
 

The command for ssh'ing to one of the login nodes from a terminal is:
Changed:
<
<
$ ssh <username>@<login node>.aglt2.org
>
>
$ ssh <username>@<login node>.aglt2.org
 


Revision 8
20 Sep 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Line: 16 to 16
 

The Login Nodes

Added:
>
>
At the time of writing there are 3 login nodes for submitting jobs: heracles, alpheus, and maron. These nodes can be logged into from a terminal or via x2go.

The command for ssh'ing to one of the login nodes from a terminal is:
$ ssh <username>@<login node>.aglt2.org
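For example, with a hypothetical username jsmith, logging into heracles would be:
$ ssh jsmith@heracles.aglt2.org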
 


Line: 51 to 58
 

Worker Nodes

Added:
>
>
There are approximately 50 worker nodes with about 500 job slots available for running various types of computing tasks. A summary of the current load on these nodes can be found here.
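In addition to the summary page, the pool can be queried directly from a login node with standard HTCondor commands, for example:

$ condor_status              # list the job slots and their current state
$ condor_status -total       # show only the summary totals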
 


Revision 7
19 Sep 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Line: 11 to 11
 
  1. The login/submit nodes (heracles, alpheus, and maron at time of writing): These are used to submit jobs, run small scripts, and manage files. Note: they should not be used for resource-intensive computing.
  2. Home directories: A file server separate from the login nodes, but it is where a user is dropped when they log in. This is where a user's home folder is located. These are backed up and should be used for things like code, plots, theses, etc.
  3. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks. They are not backed up.
Changed:
<
<
  1. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
>
>
  1. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8 or 12 cores. Each of these cores is set up to run 1 job. In total, there are about 450 job slots at the time of writing.
 


Line: 46 to 46
 

After a disk has been mounted (using the "ls" command as above), you can use the command "df -h" to see its total, used, and available space (note that the space associated with the users' home areas is shared among them). Use this command to find a good disk on which to store your datasets and the like.
Changed:
<
<
 Note that t he work disks do not come with user areas by default, instead you must make one yourself using the command "mkdir /msu/data/<diskname>/<username>".
>
>
Note that the work disks do not come with user areas by default; instead, you must make one yourself using the command "mkdir /msu/data/<diskname>/<username>".


Worker Nodes

 


Revision 6
18 Sep 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Line: 13 to 13
 
  1. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks. They are not backed up.
  2. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
Added:
>
>

 

The Login Nodes

Added:
>
>

 

Storage Areas

The tier3 has several different storage disks you can make use of to store code, datasets, plots, etc.
Line: 27 to 31
  It is still recommended that you use a version control system such as SVN or git to manage and backup your code though, as well as copy plots to your desktop or laptop.

Work Disks

Changed:
<
<
Located at /msu/data are several disks meant for storage of large amounts of information, such as datasets (CxAODs, etc.).
>
>
Located at /msu/data are several disks meant for storage of large amounts of information, such as datasets (CxAODs, etc.).
  These disks automount once called for, so they might not be available until you use a command such as "ls -l /msu/data/<diskname>". At the time of writing, the disks available (and the amount of total storage they have) are:
  • t3work1: 12 TB
Line: 44 to 48
  Use this command to find a good disk for you to store your datasets and the like.  Note that t he work disks do not come with user areas by default, instead you must make one yourself using the command "mkdir /msu/data/<diskname>/<username>".
Added:
>
>

 

Using HTCondor

To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
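As a sketch of that workflow from a login node (the file name job.sub is hypothetical), submitting and monitoring looks like:

$ condor_submit job.sub      # submit the job(s) described in the submit script
$ condor_q                   # check the status of your jobs in the queue
$ condor_rm <cluster id>     # remove a job cluster if something goes wrong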
Revision 5
18 Sep 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Added:
>
>
 

What is HTCondor?

HTCondor is a high throughput computing system that can run multiple related tasks simultaneously. Most commonly, this means splitting up a job that runs over N events into M jobs that each run N/M events. However, that is just a simple example of what HTCondor can do.
Revision 4
18 Sep 2019 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Deleted:
<
<

What is HTCondor?

 
Added:
>
>

What is HTCondor?

  HTCondor is a high throughput computing system that can run multiple related tasks simultaneously. Most commonly, this means splitting up a job that runs over N events into M jobs that each run N/M events. However, that is just a simple example of what HTCondor can do.

In the case of MSU's tier3, condor is made up of several interacting systems.
Changed:
<
<
  1. The login nodes (green, maron, and white at time of writing): Which are used to submit jobs, run small scripts, and manage files. Note: Should not be used for resource intensive computing.
>
>
  1. The login/submit nodes (heracles, alpheus, and maron at time of writing): Which are used to submit jobs, run small scripts, and manage files. Note: Should not be used for resource intensive computing.
 
  1. Home directories: A file server separate from the login nodes, but is where a user is dropped when they login. This is where a users home folder is located. These are backed up and should be used for things like code, plots, theses, etc.
  2. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks. They are not backed up.
  3. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
Deleted:
<
<

Using HTCondor

 
Changed:
<
<
To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
>
>

The Login Nodes

Storage Areas

The tier3 has several different storage disks you can make use of to store code, datasets, plots, etc.

Home Areas

The home areas (~ or /home/<username>) are on a machine named gorgo, but everything is setup so that you see them at /home/ This disk has 1.8 TB of space on it, but is shared among all the users. That means the home areas are not meant for storing large amounts of information, such as data files.
 
Changed:
<
<
-- ForrestPhillips - 14 Sep 2017
>
>
The home area is backed up periodically though (just not daily), which means it is the perfect place to store lightweight files like important code and plots. It is still recommended that you use a version control system such as SVN or git to manage and backup your code though, as well as copy plots to your desktop or laptop.

Work Disks

Located at /msu/data are several disks meant for storage of large amounts of information, such as datasets (CxAODs, etc.). These disks automount once called for, so they might not be available until you use a command such as "ls -l /msu/data/<diskname>". At the time of writing, the disks available (and the amount of total storage they have) are:
  • t3work1: 12 TB
  • t3work2: 12 TB
  • t3work3: 19 TB
  • t3work4: 19 TB
  • t3work5: 19 TB
  • t3work6: 19 TB
  • t3work7: 28 TB
  • t3work8: 28 TB
  • t3work9: 37 TB

After the disks have been mounted (using the "ls" command as before), you can use the command "df -h" to see how much total space, used space, and available space they have on them (note that the space associated with the users home areas is shared between them). Use this command to find a good disk for you to store your datasets and the like.  Note that t he work disks do not come with user areas by default, instead you must make one yourself using the command "mkdir /msu/data/<diskname>/<username>".

Using HTCondor

To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
Revision 3
19 Oct 2017 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

Changed:
<
<
What is HTCondor?
>
>

What is HTCondor?

 

HTCondor is a high throughput computing system that can run multiple related tasks simultaneously. Most commonly, this means splitting up a job that runs over N events into M jobs that each run N/M events. However, that is just a simple example of what HTCondor can do.
Line: 10 to 10
 
  1. Home directories: A file server separate from the login nodes, but is where a user is dropped when they login. This is where a users home folder is located. These are backed up and should be used for things like code, plots, theses, etc.
  2. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks. They are not backed up.
  3. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
Changed:
<
<
Using HTCondor
>
>

Using HTCondor

 

To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
Revision 2
19 Sep 2017 - Main.ForrestPhillips
Line: 1 to 1
 
META TOPICPARENT name="WebHome"

HTCondor

What is HTCondor?
Line: 7 to 7
 

In the case of MSU's tier3, condor is made up of several interacting systems.
  1. The login nodes (green, maron, and white at time of writing): Which are used to submit jobs, run small scripts, and manage files. Note: Should not be used for resource intensive computing.
Changed:
<
<
  1. Home directories: Technically part of the login nodes? This is where a users home folder is located. These are backed up and should be used for things like code, plots, theses, etc.
  2. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks.
>
>
  1. Home directories: A file server separate from the login nodes, but is where a user is dropped when they login. This is where a users home folder is located. These are backed up and should be used for things like code, plots, theses, etc.
  2. The work disks (t3work1-9 for example): These are file servers with large amounts of disk space (many TBs). These are used for storing large data. They use RAID 6(?) to protect against failed disks. They are not backed up.
 
  1. The worker nodes/job slots: At the time of writing, there are about 50 machines that are used for running jobs. Each of these has 8, 12, or 24 cores. Most of these cores run 1 job, while others are hyper-threaded and can run more than 1 job. In total, there are about 450 job slots at the time of writing.
Added:
>
>
Using HTCondor

To use HTCondor, a user minimally needs an executable and a condor submit script. The process for this, as well as information on more advanced topics, can be found here.
 

-- ForrestPhillips - 14 Sep 2017
 