Monday, June 24, 2019

High Throughput Computing (HTC)

  • What is HTCondor?
HTCondor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, HTCondor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to HTCondor, HTCondor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.
https://research.cs.wisc.edu/htcondor/description.html

  • High Throughput Computing (HTC) 
For many scientists, the quality of their research is heavily dependent on computing throughput. It is not uncommon to find problems that require weeks or months of computation to solve. Scientists involved in this type of research need a computing environment that delivers large amounts of computational power over a long period of time. Such an environment is called a High-Throughput Computing (HTC) environment. In contrast, High-Performance Computing (HPC) environments deliver a tremendous amount of power over a short period of time. HPC environments are often measured in terms of FLoating point OPerations per Second (FLOPS). Many scientists today do not care about FLOPS; their problems are on a much larger scale. These people are concerned with floating point operations per month or per year. They are interested in how many jobs they can complete over a long period of time.

As computers became smaller, faster and less expensive, scientists moved away from mainframes and purchased personal computers or workstations. An individual or a small group could afford a computing resource that was available whenever they wanted it. The resource might be slower than the mainframe, but it provided exclusive access. Recently, instead of one large computer for an institution, there are many workstations. Each workstation is owned by its user. This is distributed ownership. While distributed ownership is more convenient for the users, it is also less efficient. Machines sit idle for long periods of time, often while their users are busy doing other things. HTCondor takes this wasted computation time and puts it to good use. The situation today matches that of yesterday, with the addition of clusters in the list of resources. These machines are often dedicated to tasks. HTCondor manages a cluster's effort efficiently, as well as handling other resources.

To achieve the highest throughput, HTCondor provides two important functions. First, it makes available resources more efficient by putting idle machines to work. Second, it expands the resources available to users, by functioning well in an environment of distributed ownership.

http://research.cs.wisc.edu/htcondor/overview/
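
To make "jobs completed over a long period of time" concrete, here is a rough back-of-the-envelope sketch in Python. All of the numbers (200 workstations, idle 60% of the time, 6-hour jobs) are hypothetical and do not come from the HTCondor documentation. The estimate also assumes idle time can be used without loss, which ignores jobs being interrupted when an owner returns.

```python
def jobs_completed(machines, idle_percent, hours_per_job, days):
    """Estimate how many jobs finish when only idle machine time is used.

    Simplifying assumption: every idle hour is usable, i.e. no work is
    lost when a machine's owner comes back and the job has to move.
    """
    total_hours = machines * 24 * days
    usable_hours = total_hours * idle_percent // 100
    return usable_hours // hours_per_job


# Hypothetical pool: 200 workstations, idle 60% of the time, 6-hour jobs.
per_month = jobs_completed(machines=200, idle_percent=60, hours_per_job=6, days=30)
per_year = jobs_completed(machines=200, idle_percent=60, hours_per_job=6, days=365)

print("jobs per month:", per_month)
print("jobs per year:", per_year)
```

The point of the sketch is the unit: the interesting figure is jobs per month or per year, not operations per second.
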
High Throughput Computing Facilities

High throughput computing (HTC) is an efficient and effective way to solve many research problems: the problems are broken up into numerous small, independent sub-tasks, and the work is distributed across a grid of many different computers. HTC is a complement to supercomputing and is particularly well suited to applications in which there is much data to be analyzed but little need for communication, such as data mining and molecular docking.
https://www.its.hku.hk/services/research/htc/system
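
The "many small, independent sub-tasks" idea can be sketched in a few lines of Python. This is only an illustration on one machine: a local thread pool stands in for the grid of computers, and `score_chunk` and `split` are hypothetical names invented for the example. On a real HTC system each chunk would be a separate job handed to the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    """Hypothetical sub-task: process one slice of the data on its own."""
    return sum(x * x for x in chunk)


def split(data, size):
    """Break the full problem into independent pieces of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


data = list(range(1000))
chunks = split(data, 100)

# The sub-tasks never talk to each other, so they can run in any order
# and on any worker. Here the workers are local threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(score_chunk, chunks))

# Combining the partial results is the only step that needs all of them.
total = sum(partials)
print(len(chunks), "sub-tasks, combined result:", total)
```

Because no sub-task depends on another, adding more workers only changes how quickly the batch finishes, not the result.
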
What Is High Throughput Distributed Computing
Parallel & Cluster Computing High Throughput Computing

  • In this tutorial, we will learn how to apply DAGMan to help us manage jobs and job interdependencies. First, we will revisit the optimization example from the previous section. Second, we will manage a set of molecular dynamics (MD) simulations using the NAMD program. NAMD is conventionally used in highly parallel HPC settings, scaling to thousands of cores managed by a single job. One can achieve the same scaling and ease of management in HTC systems by running thousands of individual jobs coordinated by workflow tools such as DAGMan.

https://swc-osg-workshop.github.io/OSG-UserTraining-Internet2-2018/novice/DHTC/04-dagman.html

  • DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor. It manages dependencies between jobs at a higher level than the HTCondor Scheduler.

https://research.cs.wisc.edu/htcondor/dagman/dagman.html
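
To show what "managing dependencies between jobs" means, here is a small Python sketch of running jobs in dependency order. It is not how DAGMan is implemented and it does not use DAGMan's input file format. It uses the standard library's `graphlib` module (Python 3.9 or later), and the job names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it must wait for.
dependencies = {
    "prepare": set(),
    "simulate_1": {"prepare"},
    "simulate_2": {"prepare"},
    "analyze": {"simulate_1", "simulate_2"},
}


def run_workflow(deps, run_job):
    """Run every job only after all of the jobs it depends on have finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    completed = []
    while sorter.is_active():
        # Everything returned here has no unfinished parents, so a real
        # system could submit these jobs at the same time.
        for job in sorted(sorter.get_ready()):
            run_job(job)
            completed.append(job)
            sorter.done(job)  # may unlock jobs that were waiting on this one
    return completed


order = run_workflow(dependencies, lambda job: print("running", job))
```

In this example the two simulate jobs become ready together once "prepare" is done, and "analyze" is held back until both have finished. That is the kind of bookkeeping a meta-scheduler takes over from the user.
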