Monday, July 29, 2019

workload managers

Slurm and Moab
Slurm and Moab are two workload manager systems that have been used to schedule and manage user jobs run on Livermore Computing (LC) clusters. Currently, LC runs Slurm natively on most clusters, and provides Moab "wrappers" now that Moab has been decommissioned. This tutorial presents the essentials for using Slurm and Moab wrappers on LC platforms.

What is a Workload Manager?
The typical LC cluster is a finite resource that is shared by many users.
In the process of getting work done, users compete for a cluster's nodes, cores, memory, network, etc.
In order to fairly and efficiently utilize a cluster, a special software system is employed to manage how work is accomplished.
This system is commonly called a Workload Manager. It may also be referred to (sometimes loosely) as:
Batch system
Batch scheduler
Workload scheduler
Job scheduler
Resource manager (usually considered a component of a Workload Manager)
Tasks commonly performed by a Workload Manager:
Provide a means for users to specify and submit work as "jobs"
Evaluate, prioritize, schedule and run jobs
Provide a means for users to monitor, modify and interact with jobs
Manage, allocate and provide access to available machine resources
Manage pending work in job queues
Monitor and troubleshoot jobs and machine resources
Provide accounting and reporting facilities for jobs and machine resources
Efficiently balance work over machine resources; minimize wasted resources
https://computing.llnl.gov/tutorials/moab/
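As a rough illustration of "evaluate, prioritize, schedule and run jobs", here is a toy Python sketch. It is not how Slurm or Moab work internally; the job tuple format and the schedule function are hypothetical. Jobs are started strictly in priority order (ties broken by submission order) until the highest-priority waiting job no longer fits on the free nodes.

```python
import heapq


def schedule(jobs, free_nodes):
    """Start jobs strictly in priority order while they fit on free nodes.

    jobs: list of (name, priority, nodes_needed) tuples (hypothetical format).
    Returns (started, pending): lists of job names.
    """
    # heapq is a min-heap, so store -priority to pop the highest priority first;
    # the submission index breaks ties (first come, first served).
    queue = [(-prio, i, name, need) for i, (name, prio, need) in enumerate(jobs)]
    heapq.heapify(queue)

    started = []
    while queue:
        _, _, name, need = queue[0]  # peek at the highest-priority job
        if need > free_nodes:
            break  # no lower-priority job may jump ahead of it
        heapq.heappop(queue)
        free_nodes -= need
        started.append(name)

    pending = [name for _, _, name, _ in sorted(queue)]
    return started, pending


started, pending = schedule(
    [("a", 1, 10), ("b", 5, 50), ("c", 5, 30), ("d", 2, 20)], free_nodes=90
)
print(started, pending)  # ['b', 'c'] ['d', 'a']
```

Stopping at the first job that does not fit protects large jobs from starvation, at the cost of leaving nodes idle; backfill scheduling is the usual way to reclaim some of that idle time.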
Deploying a Burstable and Event-driven HPC Cluster on AWS Using SLURM, Part 1
Google Codelab for creating two federated Slurm clusters on Google Cloud Platform
OpenStack and HPC Workload Management
Increasing Cluster Performance by Combining rCUDA with Slurm
Docker vs Singularity vs Shifter in an HPC environment
Helix - HPC/SLURM Tutorial


  • SchedMD® is the core company behind the Slurm workload manager software, a free open-source workload manager designed specifically to satisfy the demanding needs of high performance computing. 

https://www.schedmd.com/

  • Slurm vs Moab/Torque on Deepthought HPC clusters

Intro and Overview: What is a scheduler?
A high performance computing (HPC) cluster (hereafter abbreviated HPCC) like the Deepthought clusters consists of many compute nodes, but at the same time has many users submitting many jobs, often very large ones. The HPCC needs a mechanism to distribute jobs across the nodes in a reasonable fashion; this is the task of a program called a scheduler.
This is a complicated task: jobs can have various requirements (e.g. CPU, memory, disk space, network transport) as well as differing priorities. And because we want to enable large parallel jobs to run, the scheduler needs to be able to reserve nodes for larger jobs (e.g. if a user submits a job requiring 100 nodes, and only 90 nodes are currently free, the scheduler might need to keep other jobs off the 90 free nodes so that the 100-node job can eventually run). The scheduler must also account for nodes that are down, or that have insufficient resources for a particular job. As such, a resource manager is also needed (which can either be integrated with the scheduler or run as a separate program). The scheduler will also need to interface with an accounting system (which can also be integrated into the scheduler) to handle charging allocations for time used on the cluster.
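The 100-node example above is the classic case for a reservation plus backfilling. Below is a simplified sketch of EASY-style backfilling in Python, under my own assumptions: time is in arbitrary units, running jobs are given as (end_time, nodes) pairs, and runtimes are the users' estimates. It computes when the waiting 100-node job could start (its "shadow time") and lets a smaller job use the free nodes only if that cannot delay the reserved start. The function names and numbers are hypothetical, and real schedulers such as Slurm's backfill scheduler weigh much more (priorities, limits, many reservations).

```python
def reservation(free_now, running, head_need, now=0):
    """Return (shadow_time, extra_nodes) for the job at the head of the queue.

    running: list of (end_time, nodes) for jobs that are already running.
    shadow_time: earliest time the head job can start.
    extra_nodes: nodes still spare at shadow_time after the head job starts.
    """
    free = free_now
    if head_need <= free:
        return now, free - head_need
    # Release nodes in the order running jobs are expected to finish.
    for end_time, nodes in sorted(running):
        free += nodes
        if free >= head_need:
            return end_time, free - head_need
    raise ValueError("head job needs more nodes than the cluster has")


def can_backfill(need, runtime, free_now, shadow_time, extra_nodes, now=0):
    """Can a smaller job start now without delaying the reserved head job?"""
    if need > free_now:
        return False
    # OK if it finishes before the reservation starts, or if it only uses
    # nodes the head job will not need even after it starts.
    return now + runtime <= shadow_time or need <= extra_nodes


# Hypothetical 120-node cluster: 30 nodes busy, 90 free, a 100-node job waiting.
running = [(8, 20), (5, 10)]
shadow, extra = reservation(free_now=90, running=running, head_need=100)
print(shadow, extra)                           # 5 0
print(can_backfill(20, 4, 90, shadow, extra))  # True: finishes at t=4
print(can_backfill(20, 6, 90, shadow, extra))  # False: would delay the big job
```

This checks a single candidate; with several, the free and extra node counts have to be reduced as each backfilled job is placed.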

The original Deepthought HPC cluster at the University of Maryland used the Maui scheduler for scheduling jobs, along with the Torque Resource Manager and the Gold Allocation Manager.
In 2009, we migrated to the Moab scheduler, still keeping Torque as our resource manager and Gold for allocation management. Moab is derived from Maui, so the user interface was mostly unchanged during this migration.
Slurm includes its own resource management and accounting system, so with the move to Slurm, Torque and Gold are no longer used.

http://hpcc.umd.edu/hpcc/help/slurm-vs-moab.html
Intelligent HPC Workload Management Across Infrastructure and Organizational Complexity
Running computations on the Torque cluster
Workload Management in HPC and Cloud
Cluster as a Service: Managing multiple clusters for openstack clouds and other diverse frameworks

Overview of the UL HPC Viridis cluster, with its OpenStack-based private Cloud setup.

OpenStack and Virtualised HPC
How the Vienna Biocenter powers HPC with OpenStack

Tuesday, July 9, 2019

How Emails Work

  • How Emails Work

First, the sender enters the recipient's email address along with the message in an email application on their local computer. Once the “Send” button is clicked, the email goes to the MTA (Mail Transfer Agent). This communication uses the SMTP protocol.

The next step is a DNS lookup to find the recipient's MTA. This is done with the help of the MX (Mail Exchanger) record: the DNS zone of the recipient address's domain contains an MX record, a DNS resource record that specifies the mail server for that domain. The lookup returns the host name of the recipient's mail server, which is then resolved to an IP address. This way the destination (‘to’) mail server is identified.
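To make the MX step concrete, here is a small Python sketch that mimics it with in-memory tables instead of real DNS queries. The MX_RECORDS and A_RECORDS tables, the host names and the mail_servers_for function are all made up for illustration (192.0.2.0/24 is an address range reserved for documentation). MX records carry a preference value and the lowest value is tried first; if a domain has no MX record at all, SMTP falls back to the domain's own address record.

```python
# Hypothetical stand-ins for DNS data; a real MTA sends DNS queries instead.
MX_RECORDS = {
    "example.com": [(20, "backup-mx.example.com"), (10, "mx1.example.com")],
}
A_RECORDS = {
    "mx1.example.com": "192.0.2.10",
    "backup-mx.example.com": "192.0.2.20",
    "example.net": "192.0.2.30",
}


def mail_servers_for(address):
    """Return (host, ip) pairs to try for a recipient, most preferred first."""
    domain = address.rsplit("@", 1)[1].lower()
    # Step 1: MX lookup. Lower preference values are tried first. With no MX
    # record, the domain itself is used as an implicit mail exchanger.
    records = sorted(MX_RECORDS.get(domain, [])) or [(0, domain)]
    # Step 2: resolve each mail server's host name to an IP address.
    return [(host, A_RECORDS.get(host)) for _, host in records]


print(mail_servers_for("bob@Example.com"))
# [('mx1.example.com', '192.0.2.10'), ('backup-mx.example.com', '192.0.2.20')]
print(mail_servers_for("carol@example.net"))
# [('example.net', '192.0.2.30')]
```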

Next, the message is transferred between the mail servers, again using SMTP. The message now sits on the recipient's mail server (MTA).
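The SMTP conversation between the two servers is a short sequence of text commands. The sketch below only builds the client side of a minimal session in the usual order (greeting, envelope sender, recipients, message data, quit); it opens no network connection, and smtp_commands is a hypothetical helper. In practice, Python's standard smtplib module carries out this exchange (for example through smtplib.SMTP and its send_message method).

```python
def smtp_commands(client_host, sender, recipients, message):
    """Build the client lines of a minimal SMTP session (server replies omitted).

    Each returned line would be sent followed by CRLF. The client waits for a
    reply code after each command, but not after each line of message data.
    """
    lines = [f"EHLO {client_host}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")  # server answers 354, then the message is sent
    for text in message.splitlines():
        # Dot-stuffing: a leading "." is doubled so the line is not mistaken
        # for the end-of-data marker.
        lines.append("." + text if text.startswith(".") else text)
    lines += [".", "QUIT"]  # a lone "." ends the message data
    return lines


for line in smtp_commands(
    "client.example.org",
    "alice@example.org",
    ["bob@example.com"],
    "Subject: Hi\n\nHello Bob\n.signature",
):
    print(line)
```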

The message is then handed to the Mail Delivery Agent (MDA), and from there it reaches the recipient's local computer. Two protocols can be used for this last step: POP3 and IMAP. With POP3, the whole email is downloaded to the local computer and the copy on the server is, by default, deleted. With IMAP, the email stays on the mail server, but the user can manipulate it there just as if it were on the local computer. That is the main difference between the two protocols, and this is how your email gets delivered.

If an error occurs while sending, the email is delayed rather than lost. Every mail server has a mail queue where such emails wait while the server keeps retrying. If sending fails permanently, the mail server may send a bounce message back to the sender's email address.
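The retry-then-bounce behavior can be sketched as a simple queue. This is a toy model under stated assumptions: try_deliver is a hypothetical callable standing in for one SMTP delivery attempt, and max_attempts replaces the time-based limits real mail servers use (they wait between retries and typically give up after days, not after a fixed count).

```python
from collections import deque


def process_queue(messages, try_deliver, max_attempts=3):
    """Deliver queued messages, retrying failures and bouncing the rest.

    messages: iterable of message ids (hypothetical).
    try_deliver: callable(message, attempt) -> bool; True means delivered.
    Returns (delivered, bounced) lists of message ids.
    """
    queue = deque((msg, 1) for msg in messages)
    delivered, bounced = [], []
    while queue:
        msg, attempt = queue.popleft()
        if try_deliver(msg, attempt):
            delivered.append(msg)
        elif attempt < max_attempts:
            # Temporary failure: keep it in the queue and try again later.
            queue.append((msg, attempt + 1))
        else:
            # Give up: this is where the sender gets a bounce message.
            bounced.append(msg)
    return delivered, bounced


def flaky(msg, attempt):
    # Hypothetical delivery results: "ok" works at once,
    # "slow" works on the 2nd try, "dead" never works.
    return msg == "ok" or (msg == "slow" and attempt == 2)


print(process_queue(["ok", "slow", "dead"], flaky))  # (['ok', 'slow'], ['dead'])
```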

This explains why you may sometimes get bounce-back emails. The reason for the bounce is explained in the message. There are many possible reasons for an email to bounce back, such as an incorrect email address in the ‘to’ field.
https://www.interserver.net/tips/kb/exactly-emails-works-steps-explanation/

  • What is SMTP?


SMTP is part of the application layer of the TCP/IP protocol suite. Using a process called "store and forward," SMTP moves your email on and across networks. It works closely with the Mail Transfer Agent (MTA) to send your message to the right computer and email inbox.

SMTP spells out and directs how your email moves from your computer's MTA to an MTA on another computer, and even several computers. Using that "store and forward" feature mentioned before, the message can move in steps from your computer to its destination.
https://whatismyipaddress.com/smtp
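One visible trace of "store and forward" is the Received header: each MTA that accepts the message adds a Received line at the top before passing it on, so reading them from the bottom up shows the path the email took. The sketch below simulates only that header behavior, with hypothetical host names and a simplified Received format (real ones also carry timestamps, message IDs and more).

```python
def relay(headers, path):
    """Simulate the header side of store-and-forward through a chain of MTAs.

    headers: the message's header lines before it leaves the sender.
    path: host names from the sender's machine to the final mail server.
    Only headers are modelled: each receiving hop prepends a simplified
    Received line before forwarding the message to the next hop.
    """
    headers = list(headers)  # work on a copy
    for sender_host, receiver_host in zip(path, path[1:]):
        headers.insert(0, f"Received: from {sender_host} by {receiver_host}")
    return headers


hops = ["pc.example.org", "mta1.example.org", "mx1.example.com"]
for line in relay(["Subject: Hi"], hops):
    print(line)
# Received: from mta1.example.org by mx1.example.com
# Received: from pc.example.org by mta1.example.org
# Subject: Hi
```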

  • 250 – This SMTP server response simply means everything went well and your message was delivered to the recipient server.

450 – Your message was not delivered because the other user's mailbox was not available. This can happen if the mailbox is locked or is not routable. Like all 4xx codes, this is a temporary failure, so the sending server can retry later.
https://sendgrid.com/blog/smtp-server-response-codes-explained/
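SMTP reply codes are grouped by their first digit, which is what a sending server mostly acts on: 2xx means success, 3xx means the server expects more input (such as 354 after DATA), 4xx is a temporary failure worth retrying (like 450 above), and 5xx is a permanent failure that leads to a bounce. A minimal Python helper along those lines (the function name is my own):

```python
def classify_reply(code):
    """Classify a three-digit SMTP reply code by its first digit."""
    return {
        2: "success",            # e.g. 250: requested action completed
        3: "intermediate",       # e.g. 354: start mail input
        4: "transient failure",  # e.g. 450: keep in queue, retry later
        5: "permanent failure",  # e.g. 550: give up, bounce to sender
    }.get(code // 100, "invalid")


for code in (250, 354, 450, 550):
    print(code, classify_reply(code))
```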

  • Mail servers are published in DNS with MX records. MX stands for Mail Exchanger. An MX record specifies the host name of the mail server for a domain (that host name is then resolved to an IP address). This mail server accepts mail from senders over the SMTP protocol.

https://www.poftut.com/linux-dig-command-tutorial-examples/


  • To retrieve a domain's MX records, use the MX option together with the name of the domain you wish to query, for example: dig example.com MX

Add the +short option to print only the mail exchange (MX) records, for example: dig example.com MX +short

If you need to troubleshoot your own DNS server locally while the domain's name server is not yet set, you can point dig to any local or remote DNS server you wish to query by using @HOST/IP syntax.
https://linuxconfig.org/how-to-check-domain-s-mx-mail-exchange-records-using-dig-command-on-linux
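For scripting, the +short output is easy to work with: for MX queries each line holds a preference value and a host name ending in a dot. A small Python parser, assuming exactly that two-field format (parse_mx_short is a hypothetical name, and the sample text is made up):

```python
def parse_mx_short(output):
    """Parse 'dig <domain> MX +short' output into sorted (preference, host)."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        preference, host = fields
        # Lower preference = tried first; drop the trailing root dot.
        records.append((int(preference), host.rstrip(".")))
    return sorted(records)


sample = "20 alt.example.com.\n10 mx.example.com.\n"  # made-up sample output
print(parse_mx_short(sample))  # [(10, 'mx.example.com'), (20, 'alt.example.com')]
```

On a machine with dig installed, the real output could be obtained with subprocess.run(["dig", "example.com", "MX", "+short"], capture_output=True, text=True).stdout and passed to the parser.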
Email Client
SMTP Server - mail carrier
Hello command (HELO/EHLO)
MTA - local post office
DNS - to map out the path
SMTP Server
Email Client
user agent (UA) - email application such as Outlook, Thunderbird
web-based email - Gmail, etc.
private email system - Thunderbird, Outlook, etc.
UA - MTA1 - MTA2 - UA
MTAs use SMTP to communicate with each other
MIME header - describes what the email contains (text, image, etc.)
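To see MIME headers in action, Python's standard email package can build a message with a text part and an image attachment. The addresses are placeholders and the image bytes are not a real PNG. Adding an attachment turns the message into multipart/mixed, and each part gets its own Content-Type header saying what it contains.

```python
from email.message import EmailMessage


def build_message():
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Photo"
    msg.set_content("Here is the picture.")  # becomes the text/plain part
    # Placeholder bytes stand in for a real image file's contents.
    msg.add_attachment(
        b"\x89PNG placeholder", maintype="image", subtype="png", filename="photo.png"
    )
    return msg


msg = build_message()
print(msg.get_content_type())                            # multipart/mixed
print([p.get_content_type() for p in msg.iter_parts()])  # ['text/plain', 'image/png']
```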