fakecineaste

Wednesday, September 19, 2018

hub switch router modems

Routers, Switches, Packets and Frames

Modem vs Router - What's the difference?

Hub, Switch, & Router Explained - What's the difference?

Hubs and switches are used to create networks

Routers are used to connect networks

Port Forwarding Explained
A packet carries an IP address along with a port number. The router looks at the IP address and port number and forwards the packet accordingly.
Ports range from 0 to 65535.
Privileged ports range from 0 to 1023.
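
The port ranges and the forwarding step above can be sketched in a few lines of Python. This is only an illustration: the forwarding table, the internal addresses, and the function names are hypothetical and do not correspond to any particular router.

```python
from typing import Optional, Tuple

# Hypothetical forwarding table: external port -> (internal IP, internal port).
FORWARDING_RULES = {
    8080: ("192.168.1.10", 80),
    2222: ("192.168.1.20", 22),
}


def is_privileged(port: int) -> bool:
    """Return True for ports in the privileged range 0-1023."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port <= 1023


def forward(external_port: int) -> Optional[Tuple[str, int]]:
    """Look up where a packet arriving on external_port should be sent."""
    return FORWARDING_RULES.get(external_port)


if __name__ == "__main__":
    print(is_privileged(22), is_privileged(8080))
    print(forward(8080), forward(9999))
```

A packet arriving on a port with no rule simply has no forwarding target in this sketch; a real router would typically drop or reject it.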

Thursday, September 13, 2018

Consensus is a fundamental problem in fault-tolerant distributed systems. Consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).

https://raft.github.io/
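
The majority rule quoted above reduces to simple arithmetic. A minimal sketch, with hypothetical function names, assuming a plain majority quorum:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return cluster_size - majority(cluster_size)


def can_make_progress(cluster_size: int, failed: int) -> bool:
    """True if the surviving servers still form a majority."""
    return cluster_size - failed >= majority(cluster_size)


if __name__ == "__main__":
    for size in (3, 4, 5):
        print(size, majority(size), tolerated_failures(size))
```

For a 5-server cluster this gives a majority of 3 and 2 tolerated failures, matching the example in the text. An even-sized cluster of 4 tolerates only 1 failure, the same as a cluster of 3.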

A curated selection of artisanal consensus algorithms and hand-crafted distributed lock services.

https://github.com/dgryski/awesome-consensus

Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failure.

Consensus protocols are the basis for the state machine replication approach to distributed computing

https://en.wikipedia.org/wiki/Paxos_(computer_science)

Paxos algorithm, which is obtained by the straightforward application of consensus to the state machine approach for building a distributed system
The Problem
Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen.
https://www.microsoft.com/en-us/research/uploads/prod/2016/12/paxos-simple-Copy.pdf

Paxos is a simple protocol that a group of machines in a distributed system can use to agree on a value proposed by a member of the group
The basic idea is that each proposal has a unique number. Higher numbered proposals override lower-numbered ones
http://read.seas.harvard.edu/%7Ekohler/class/08w-dsi/mazieres07paxos.pdf
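
The idea that higher-numbered proposals override lower-numbered ones can be illustrated with the acceptor role alone. The sketch below is a simplified, in-memory illustration of a single-decree acceptor; the class and method names are hypothetical, and it omits proposers, learners, networking, and durable storage.

```python
from typing import Any, Optional, Tuple


class Acceptor:
    """Acceptor role of single-decree Paxos (illustrative sketch only)."""

    def __init__(self) -> None:
        self.promised_n = -1  # highest proposal number promised so far
        self.accepted_n: Optional[int] = None
        self.accepted_value: Any = None

    def prepare(self, n: int) -> Tuple[bool, Optional[int], Any]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted proposal so the proposer
            # can adopt its value.
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n: int, value: Any) -> bool:
        """Phase 2: accept unless a higher-numbered proposal was promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


if __name__ == "__main__":
    acceptor = Acceptor()
    print(acceptor.prepare(1))
    print(acceptor.prepare(2))
    print(acceptor.accept(1, "x"))  # rejected: proposal 2 was promised
    print(acceptor.accept(2, "y"))
```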

What is Raft?

Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. The difference is that it's decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems.

An Introduction to the Raft Distributed Consensus Algorithm

Wednesday, September 12, 2018

responsibility vs task vs duty

Difference Between Duties and Responsibilities

Everyone has come across the terms duty and responsibility. Duty is a moral commitment to something or someone, whereas responsibility is a condition of being responsible.

As duty refers to moral commitment, it denotes an active feeling for doing something. Once a person engages himself with some duty or if he has been entrusted with a duty, then that person fully commits himself to it. In the case of duty, the person will be involved in activity without any self-interest. As a citizen of a country, a person has many duties to perform. It is his duty to adhere to the constitution

Responsibility can be termed as an ability to act at one’s own will, without any supervision. It is the obligation to successfully complete an assigned task. In responsibility, a person takes upon the duty to complete the task and to make the task a success.

http://www.differencebetween.net/miscellaneous/difference-between-duties-and-responsibilities/

Job responsibilities are what an organization uses to define the work that needs to be performed in a role and the functions that an employee is accountable for. Job responsibilities also include the information most vital to your other talent management processes since it defines the criteria that should be used for employee assessment and development.

The detailed task list is perhaps the easiest approach, and the more traditional way of describing job responsibilities.
https://www.saba.com/resources/how-tos/writing-effective-job-responsibilities-essential-functionscompetencies

The distinction hinges on the question, “Why do I do this?” The responsibility is high level, and the task is specific. One responsibility may carry five (or more) associated tasks. If you can eliminate one responsibility through clarification, you may eliminate several tasks. You carry out tasks to fulfill responsibilities.

https://www.whatsbestnext.com/2010/06/the-difference-between-responsibilities-and-tasks/

Tuesday, September 11, 2018

sre vs devops

For the purposes of this chapter, we’ll define toil as the repetitive, predictable, constant stream of tasks related to maintaining a service.

System maintenance inevitably demands a certain amount of rollouts, upgrades, restarts, alert triaging, and so forth. These activities can quickly consume a team if left unchecked and unaccounted for. Google limits the time SRE teams spend on operational work (including both toil- and non-toil-intensive work) to 50%.

What Is Toil?

Here, we provide a concrete example for each toil characteristic:

Manual

When the tmp directory on a web server reaches 95% utilization, engineer Anne logs in to the server and scours the filesystem for extraneous log files to delete.

Repetitive

A full tmp directory is unlikely to be a one-time event, so the task of fixing it is repetitive.

Automatable

If your team has remediation documents with content like “log in to X, execute this command, check the output, restart Y if you see…,” these instructions are essentially pseudocode to someone with software development skills! In the tmp directory example, the solution has been partially automated. It would be even better to fully automate the problem detection and remediation by not requiring a human to run the script. Better still, submit a patch so that the software no longer breaks in this way.

Nontactical/reactive

When you receive too many alerts along the lines of “disk full” and “server down,” they distract engineers from higher-value engineering and potentially mask other, higher-severity alerts. As a result, the health of the service suffers.

Lacks enduring value

Completing a task often brings a satisfying sense of accomplishment, but this repetitive satisfaction isn’t a positive in the long run. For example, closing that alert-generated ticket ensured that the user queries continued to flow and HTTP requests continued to serve with status codes < 400, which is good. However, resolving the ticket today won’t prevent the issue in the future, so the payback has a short duration.

Grows at least as fast as its source

Many classes of operational work grow as fast as (or faster than) the size of the underlying infrastructure. For example, you can expect time spent performing hardware repairs to increase in lock-step fashion with the size of a server fleet. Physical repair work may unavoidably scale with the number of machines, but ancillary tasks (for example, making software/configuration changes) don’t necessarily have to.

Case Study 1: Reducing Toil in the Datacenter with Automation

Case Study 2: Decommissioning Filer-Backed Home Directories

Conclusion

At minimum, the amount of toil associated with running a production service grows linearly with its complexity and scale.

Automation is often the gold standard of toil elimination, and can be combined with a number of other tactics. Even when toil isn’t worth the effort of full automation, you can decrease engineering and operations workloads through strategies like partial automation or changing business processes.

https://sre.google/workbook/eliminating-toil/

Prior to Google’s creation of the SRE position, System Administrators ran company operations.

System Administrators worked on the “operations” side of things, whereas engineers worked on the “development” side of things.

Now let’s say that one of the servers in the data center went down and needed to be replaced. With the “old way,” a new server would be configured manually by a system administrator. What this means is that the sysadmin would manually make sure the new machine has the proper operating system, software, tags, etc. Now imagine that 1,000 servers need to be replaced. See where I am going with this? It would take forever, or the company would need a lot of sysadmins to do the labor.

“You will automate the server provisioning process to reduce the labor of our networking engineering and datacenter operations teams. Once we plug a new server in, it walks itself through all aspects of provisioning to join the fleet without any human involvement.”

“SREs are Software Engineers who specialize in reliability. SREs apply the principles of computer science and engineering to the design and development of computer systems: generally, large distributed ones.”

By eliminating human interaction through automation, SREs make systems more reliable

https://hackernoon.com/so-you-want-to-be-an-sre-34e832357a8c

Site Reliability Engineering

This systems administrator, or sysadmin, approach involves assembling existing software components and deploying them to work together to produce a service.
Sysadmins are then tasked with running the service and responding to events and updates as they occur.
As the system grows in complexity and traffic volume, generating a corresponding increase in events and updates, the sysadmin team grows to absorb the additional work. Because the sysadmin role requires a markedly different skill set than that required of a product’s developers, developers and sysadmins are divided into discrete teams: "development" and "operations" or "ops."

The sysadmin model of service management has several advantages. For companies deciding how to run and staff a service, this approach is relatively easy to implement: as a familiar industry paradigm, there are many examples from which to learn and emulate.

The sysadmin approach and the accompanying development/ops split has a number of disadvantages and pitfalls. These fall broadly into two categories: direct costs and indirect costs.

Direct costs are neither subtle nor ambiguous. Running a service with a team that relies on manual intervention for both change management and event handling becomes expensive as the service and/or traffic to the service grows, because the size of the team necessarily scales with the load generated by the system.

The indirect costs of the development/ops split can be subtle, but are often more expensive to the organization than the direct costs. These costs arise from the fact that the two teams are quite different in background, skill set, and incentives. They use different vocabulary to describe situations; they carry different assumptions about both risk and possibilities for technical solutions; they have different assumptions about the target level of product stability. The split between the groups can easily become one of not just incentives, but also communication, goals, and eventually, trust and respect

Traditional operations teams and their counterparts in product development thus often end up in conflict, most visibly over how quickly software can be released to production. At their core, the development teams want to launch new features and see them adopted by users. At their core, the ops teams want to make sure the service doesn’t break while they are holding the pager. Because most outages are caused by some kind of change—a new configuration, a new feature launch, or a new type of user traffic—the two teams’ goals are fundamentally in tension.

("We want to launch anything, any time, without hindrance" versus "We won’t want to ever change anything in the system once it works"). And because their vocabulary and risk assumptions differ

The ops team attempts to safeguard the running system against the risk of change by introducing launch and change gates

The dev team quickly learns how to respond. They have fewer "launches" and more "flag flips," "incremental updates," or "cherrypicks." They adopt tactics such as sharding the product so that fewer features are subject to the launch review.

our Site Reliability Engineering teams focus on hiring software engineers to run our products and to create systems to accomplish the work that would otherwise be performed, often manually, by sysadmins.

What exactly is Site Reliability Engineering, as it has come to be defined at Google? My explanation is simple: SRE is what happens when you ask a software engineer to design an operations team

As a whole, SREs can be broken down into two main categories.

50–60% are Google Software Engineers, or more precisely, people who have been hired via the standard procedure for Google Software Engineers. The other 40–50% are candidates who were very close to the Google Software Engineering qualifications (i.e., 85–99% of the skill set required), and who in addition had a set of technical skills that is useful to SRE but is rare for most software engineers. By far, UNIX system internals and networking (Layer 1 to Layer 3) expertise are the two most common types of alternate technical skills we seek.

Therefore, SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, design and implement automation with software to replace human labor.

Eventually, a traditional ops-focused group scales linearly with service size: if the products supported by the service succeed, the operational load will grow with traffic. That means hiring more people to do the same tasks over and over again.

we want systems that are automatic, not just automated. In practice, scale and new features keep SREs on their toes.

DevOps or SRE?
Its core principles—involvement of the IT function in each phase of a system’s design and development, heavy reliance on automation versus human effort, the application of engineering practices and tools to operations tasks—are consistent with many of SRE’s principles and practices

One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel. One could equivalently view SRE as a specific implementation of DevOps with some idiosyncratic extensions.

https://landing.google.com/sre/book.html

while R&D focused on creating new features and pushing them to production, the Operations group was trying to keep production as stable as possible—the two teams were pulling in opposite directions.

Instead of having an Ops team built solely from system administrators, software engineers—with an R&D background and mentality—could enrich the way the team worked with the development group, change its goals and help with automating solutions.

According to Google, SRE engineers are responsible for the stability of the production environment, but at the same time are committed to new features and operational improvement.
Google decided its SRE teams should be composed of 50 percent software engineers and 50 percent system administrators.

Both SRE and DevOps are methodologies addressing organizations’ needs for production operation management
While DevOps raise problems and dispatch them to Dev to solve, the SRE approach is to find problems and solve some of them themselves.
While DevOps teams would usually choose the more conservative approach, leaving the production environment untouched unless absolutely necessary,

For each service, the SRE team sets a service-level agreement (SLA) that defines how reliable the system needs to be to end users. If the team agrees on a 99.9 percent SLA, that gives them an error budget of 0.1 percent. An error budget is exactly what the name suggests: the maximum allowable threshold for errors and outages. Here’s the interesting thing: the development team can “spend” this error budget in any way they like. If the product is currently running flawlessly, with few or no errors, they can launch whatever they want, whenever they want. Conversely, if they have met or exceeded the error budget, and are operating at or below the defined SLA, all new releases are frozen until they reduce the number of errors to a level that allows the launch to proceed
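
The error-budget arithmetic above can be made concrete. The sketch below is illustrative only: the function names, the 30-day window, and the freeze rule expressed as a single comparison are assumptions, not something the quoted article specifies.

```python
def error_budget(target_percent: float) -> float:
    """Fraction of requests or time allowed to fail, e.g. 99.9 -> about 0.001."""
    if not 0 < target_percent <= 100:
        raise ValueError("target must be in (0, 100]")
    return (100.0 - target_percent) / 100.0


def allowed_downtime_minutes(target_percent: float, days: int = 30) -> float:
    """Downtime budget in minutes over a window of `days` days."""
    return error_budget(target_percent) * days * 24 * 60


def releases_frozen(target_percent: float, observed_error_rate: float) -> bool:
    """Freeze launches once the budget has been met or exceeded."""
    return observed_error_rate >= error_budget(target_percent)


if __name__ == "__main__":
    # A 99.9 percent target leaves roughly 43.2 minutes of downtime in 30 days.
    print(round(allowed_downtime_minutes(99.9), 1))
    print(releases_frozen(99.9, observed_error_rate=0.0004))
    print(releases_frozen(99.9, observed_error_rate=0.002))
```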

To retain their positions, SysAdmins should now be more code-oriented, have better technological knowledge and be receptive to new methods of conducting the work they already do.
https://devops.com/sre-vs-devops-false-distinction/

The DevOps movement began because developers would write code with little understanding of how it would run in production. They would throw this code over the proverbial wall to the operations team, which would be responsible for keeping the applications up and running. This often resulted in tension between the two groups, as each group's priorities were misaligned with the needs of the business

DevOps emerged as a culture and a set of practices that aims to reduce the gaps between software development and software operation. However, the DevOps movement does not explicitly define how to succeed in these areas

SRE happens to embody the philosophies of DevOps, but has a much more prescriptive way of measuring and achieving reliability through engineering and operations work.
In other words, SRE prescribes how to succeed in the various DevOps areas.

If you think of DevOps like an interface in a programming language, class SRE implements DevOps.
DevOps and SRE are not two competing methods for software development and operations, but rather close friends designed to break down organizational barriers to deliver better software faster.

SLIs, SLOs and SLAs tie back closely to the DevOps pillar of "measure everything" and one of the reasons we say class SRE implements DevOps.

https://cloudplatform.googleblog.com/2018/05/SRE-vs-DevOps-competing-standards-or-close-friends.html

an SDET should have at least the following skills and attributes:

Has a tester mindset, is curious and can come up with interesting test scenarios
Has a solid understanding of testing principles and methodologies
Knows that all testing is exploratory in nature and appreciates the difference between testing and checking.
Can apply appropriate test methods for a given scenario
Knows the difference between testing and QA
Can code in at least one scripting or programming language (Java and JavaScript happen to be the most popular)
Understands HTTP and how modern web applications are built
Can write UI as well as API automated tests. One or the other is not good enough!
Knows Git, Pull Requests, Branching, etc…
Is agile in nature and knows how testing fits in the agile model
Can write performance test scripts (Gatling and/or JMeter)
Thinks about security and is aware of OWASP
Understands CI/CD and Build pipelines
Knows the services offered by cloud platform providers such as AWS, Azure and Google Cloud

https://www.testingexcellence.com/sdet-hiring-software-developers-in-test/

Site reliability engineers will spend up to 50% of their time doing "ops" related work such as handling issues, integrating services, building up CI/CD flows. Since the software system that an SRE is working with is expected to be highly automatic and self-healing, the SRE should spend the other 50% of their time on development tasks like adding new features, such as autoscaling or doing automation.

A site reliability engineer is usually either a software engineer with a good integration knowledge and troubleshooting approach or a skilled system administrator with knowledge of coding and automation.

https://www.impressiongroup.biz/site-reliability-engineer

What Problems Do SREs Solve?

Site Reliability Engineering teams focus on safety, health, uptime, and the ability to remedy unforeseen problems

during an incident, helping devise remedies for problems until the engineering teams can make proper remediation

combating incidents, and SREs spend a good deal of time making sure the firefight doesn’t occur with their vast expertise.

By removing some of the complex burdens in how to scale and maintain uptime in distributed systems, SRE practices allow development teams to focus on feature development instead of the nuances of achieving and maintaining service level commitments.

SLAs, SLOs, and SLIs

Both DevOps and SRE teams value metrics, as you can’t improve on what you can’t measure. Indicators and measurements of how well a system is performing can be represented by one of the Service Level (SLx) commitments. There is a trio of metrics, SLAs, SLOs, and SLIs, that paint a picture of the agreement made vs the objectives and actuals to meet the agreement. With SLOs and SLIs, you can garner insight into the health of a system.

SLAs

Service Level Agreements are the commitment/agreement you make with your customers. Your customers might be internal, external, or another system. SLAs are usually crafted around customer expectations or system expectations. SLAs have been around for some time, and most engineers would consider an SLA to be “we need to reply in 2000ms or less,” which in today’s nomenclature would actually be an SLO. An SLA, in that case, would be “we require 99% uptime.”

SLOs

Service Level Objectives are goals that need to be met in order to meet SLAs. Looking at Tom Wilkie’s RED method can help you come up with good metrics for SLOs: requests, errors, and duration. In the above example of “we need to reply in 2000ms or less 99% of the time,” that would fall under duration, or the amount of time it takes to complete a request in your system. Google’s Four Golden Signals are also great metrics to have as SLOs, and they additionally include saturation. Measuring SLOs is the purpose of SLIs.

SLIs

Service Level Indicators measure compliance/conformance with an SLO. Harping on the “we need to reply in 2000ms or less 99% of the time” SLO from above, the SLI would be the actual measurement. Maybe 98% of requests have a reply in less than 2000ms, which is not up to the goal of the SLO. If SLOs/SLIs are being broken, time should be spent to remedy/fix issues related to the slowdowns.

https://harness.io/blog/sre-vs-devops#:~:text=In%20a%20nutshell%2C%20DevOps%20Engineers,operational%2Fscale%2Freliability%20problems.
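
The relationship between the SLO ("reply in 2000ms or less 99% of the time") and its SLI (the actual measurement) can be sketched as follows. The function names and the sample latencies are made up for illustration.

```python
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float = 2000) -> float:
    """Fraction of requests answered within threshold_ms (the SLI)."""
    if not latencies_ms:
        raise ValueError("no requests measured")
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms)


def meets_slo(sli: float, target: float = 0.99) -> bool:
    """Compare the measured SLI against the SLO target."""
    return sli >= target


if __name__ == "__main__":
    # Hypothetical sample: 98 fast requests and 2 slow ones.
    sample = [150.0] * 98 + [3500.0, 4200.0]
    sli = latency_sli(sample)
    print(sli, meets_slo(sli))
```

With this sample the SLI is 98%, which falls short of the 99% objective, mirroring the example in the text.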

SLA vs. SLO vs. SLI: What’s the difference?

SLAs, SLOs, and SLIs—three initialisms that represent

the promises we make to our users,

the internal objectives that help us keep those promises,

and the trackable measurements that tell us how we’re doing.

The goal of all three things is to get everybody—vendor and client alike—on the same page about system performance

How often will your systems be available?

How quickly will your team respond if the system goes down?

What kind of promises are you making about speed and functionality?

https://www.atlassian.com/incident-management/kpis/sla-vs-slo-vs-sli

Telemetry is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment (telecommunication) for monitoring

Although the term commonly refers to wireless data transfer mechanisms (e.g., using radio, ultrasonic, or infrared systems), it also encompasses data transferred over other media such as a telephone or computer network, optical link or other wired communications like power line carriers. Many modern telemetry systems take advantage of the low cost and ubiquity of GSM networks by using SMS to receive and transmit telemetry data.

A telemeter is a physical device used in telemetry. It consists of a sensor, a transmission path, and a display, recording, or control device. Electronic devices are widely used in telemetry and can be wireless or hard-wired, analog or digital. Other technologies are also possible, such as mechanical, hydraulic and optical

Telemetry may be commutated to allow the transmission of multiple data streams in a fixed frame.

https://en.wikipedia.org/wiki/Telemetry

Sunday, September 2, 2018

AWS cloud

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure

https://aws.amazon.com/cloudtrail/

AWS specific tactics to automate your infrastructure

http://dumay.info/pdf/S3/5.pdf

Virtual Private Cloud (VPC)

A VPC is a set of contained subnets with a common Classless Inter-Domain Routing (CIDR) block (up to a /16 netmask) running in a single geographic area (Region) across multiple data centers (Availability Zones). A VPC is like a virtual data center, except that it’s physically spread out across Availability Zones. VPCs have network connectivity within the Region in which they are created. You can use Internet connectivity, virtual private network (VPN) connectivity, and VPC peering to connect VPCs to other networks
https://aws.amazon.com/blogs/apn/amazon-vpc-for-on-premises-network-engineers-part-one/
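
To get a feel for how a VPC's CIDR block is divided into subnets, the standard-library ipaddress module is enough. This is a planning sketch only: the 10.0.0.0/16 block, the /24 subnet size, and the function name are assumptions, and nothing here calls AWS.

```python
import ipaddress
from itertools import islice
from typing import List


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> List[ipaddress.IPv4Network]:
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return list(islice(vpc.subnets(new_prefix=new_prefix), count))


if __name__ == "__main__":
    vpc = ipaddress.ip_network("10.0.0.0/16")
    print(vpc.num_addresses)  # size of a /16 block
    public, private = carve_subnets("10.0.0.0/16", 24, 2)
    print(public, private)
    print(ipaddress.ip_address("10.0.1.5") in private)
```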

Scenario 2: Extend On-Premises AD DS Installation to the AWS Cloud

This scenario is for users who want to use their existing installation of AD DS and extend their on-premises network to the VPC, when a new deployment of AD DS is not an option
https://docs.aws.amazon.com/quickstart/latest/active-directory-ds/scenario-2.html

AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts

https://aws.amazon.com/cloudformation/
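
As a small illustration of the "simple text file" idea, the sketch below builds a minimal CloudFormation template in JSON with Python's standard library. The logical ID ExampleBucket and the description are placeholders; the script only prints the template and does not deploy anything.

```python
import json


def render_template(bucket_logical_id: str = "ExampleBucket") -> str:
    """Render a minimal CloudFormation template that declares one S3 bucket."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single S3 bucket",
        "Resources": {
            # Logical ID is a placeholder chosen for this example.
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
    }
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    print(render_template())
```

The printed text could be saved to a file and passed to CloudFormation through the console or CLI to create the stack.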

This Quick Start automates the deployment of a Puppet master and Puppet agents from scratch, using AWS CloudFormation templates.

https://aws.amazon.com/about-aws/whats-new/2016/03/puppet-on-the-aws-cloud-quick-start-reference-deployment/

The configuration for this scenario includes a virtual private cloud (VPC) with a public subnet and a private subnet. We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren't publicly accessible

A common example is a multi-tier website, with the web servers in a public subnet and the database servers in a private subnet. You can set up security and routing so that the web servers can communicate with the database servers.
The instances in the public subnet can send outbound traffic directly to the Internet, whereas the instances in the private subnet can't.
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html

This topic explains how to use the AWS Management Console to upload one or more files or entire folders to an Amazon S3 bucket.

Before you can upload files and folders to an Amazon S3 bucket, you need write permissions for the bucket.
When you upload a file to Amazon S3, it is stored as an S3 object. Objects consist of the file data and metadata that describes the object. You can have an unlimited number of objects in a bucket.
You can upload files by dragging and dropping or by pointing and clicking. To upload folders, you must drag and drop them. Drag and drop functionality is supported only for the Chrome and Firefox browsers.
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/upload-objects.html

Running Kubernetes on AWS EC2

To create a Kubernetes cluster on AWS, you will need an Access Key ID and a Secret Access Key from AWS
conjure-up is an open-source installer for Kubernetes that creates Kubernetes clusters with native AWS integrations on Ubuntu
https://kubernetes.io/docs/setup/turnkey/aws/

There are two main ways to use Kubernetes on AWS: run it yourself on Amazon EC2 virtual machine instances, or use the Amazon EKS service.

https://aws.amazon.com/kubernetes/

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS.
Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.
https://aws.amazon.com/ecs/

Kubernetes can be installed on AWS, as explained in the Kubernetes documentation, using conjure-up, Kubernetes Operations (kops), CoreOS Tectonic, or kube-aws. Out of those options I found kops much easier to use, and it's nicely designed for customizing the installation, executing upgrades, and managing Kubernetes clusters over time. In this article I will explain how to use the Kubernetes Operations tool to install a Kubernetes cluster on AWS in a few minutes.

https://medium.com/containermind/how-to-create-a-kubernetes-cluster-on-aws-in-few-minutes-89dda10354f4

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

https://aws.amazon.com/kinesis/?nc1=f_ls

Friday, August 31, 2018

scrum interview questions

What is “Agile”

It’s a set of guidelines that helps you deliver useful, valuable increments of software to customers at the end of each sprint (like two weeks). It is done by constantly re-prioritizing the backlog of requirements and reviewing incremental progress with the product owner.

What are the differences between Agile and traditional project management (Waterfall)?

Agile encourages that a little of everything, including design, development, and testing is done at the same time. Conversely, the traditional approach to projects closes and completes one phase before the next begins. Agile encourages short, frequent feedback loops and embraces changes to requirements. In Waterfall, feedback is usually not collected until the very end of the project, and changes are discouraged

When should you use Waterfall over Scrum?
Use waterfall if the requirements are simple, predictable, fully defined and understood, and will not change.

Name some other Agile frameworks.
Kanban, Test Driven Development, and Feature Driven Development.

What are the most important components of Agile?

Daily stand-up meetings.

CRC (Class Responsibilities and Collaborators) cards

timeboxed task boards.

TDD (Test Driven Development), Continuous Integration, regular code reviews, pair programming, automated builds, continuous deployment and delivery, etc.

You have iteration planning meetings and carry out iterative development.

How can a Story Board be defined in Agile?

A Story Board is a visual representation of a software project’s progress. There are generally four columns: ‘To do’, ‘In Progress’, ‘Test’, and ‘Done’. Different colored Post-it notes are placed in each column, indicating the progress of individual development items. A story board is typically used in agile development.

What do daily stand up meetings entail?

This meeting addresses Scrum’s three questions listed below.
– What have you completed since the last meeting?
– What do you plan to complete by the next meeting?
– What is getting in your way?

What is the Daily Stand-Up?
Every day, preferably in the morning, the team meets for no more than 15 minutes to answer three questions:
What did you do yesterday?
What do you plan on doing today?
Are there any blocks or impediments that keep you from doing your work?
This Scrum ceremony is not meant to be a status meeting for stakeholders, but a way to energize the team and get them to set focus for the day.

What is a Release candidate?

A Release candidate is a build or version of software that can be released to production.

Further, testing such as UAT may be performed on this version of the product.

What is difference between Epic, User stories & Tasks?
Epic is a group of related user stories.
User Stories define the actual business requirement. Generally created by the business owner.
Task: To accomplish the business requirements, development team create tasks.

Explain Velocity in Agile?
Velocity is a metric calculated by adding up all the effort estimates associated with the user stories completed in one iteration. It predicts how much work the team can complete in a sprint and how much time it will need to complete a project.

What is velocity?
Velocity is the average number of points completed over the past 3 – 4 sprints. It is used to help predict when backlog items will be delivered.
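The averaging and forecasting described above can be sketched in a few lines. This is a minimal illustration, not a standard formula from any Scrum tool: the function names, the 3-sprint window, and the sample numbers are all assumptions made for the example.

```python
import math

def average_velocity(completed_points, window=3):
    """Mean of the story points completed in the last `window` sprints."""
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return sum(recent) / len(recent)

def sprints_to_finish(backlog_points, velocity):
    """Whole sprints needed to finish `backlog_points` at a steady velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)

# Hypothetical history: points completed in each of the last four sprints.
history = [18, 22, 20, 24]
velocity = average_velocity(history, window=3)  # (22 + 20 + 24) / 3 = 22.0
remaining = sprints_to_finish(110, velocity)    # 110 / 22.0 -> 5 sprints
```

The forecast is only as good as the assumption that future sprints resemble recent ones, which is why a short rolling window is used instead of the whole history.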

How is the velocity of a sprint measured?
If capacity is measured as a percentage of a 40-hour week, then
completed story points * team capacity
If capacity is measured in man-hours, then
completed story points / team capacity

Have you ever tracked velocity? How would you determine a team's Sprint velocity?
Team velocity is tracked by comparing the number of estimated story points with the story points actually completed.
This can be measured on the burndown chart. You can estimate your team's velocity over time from the previous sprints.

Explain what is a product backlog in Scrum?
Before a Scrum sprint starts, the product owner reviews the list of all new features, change requests, enhancements and bug reports and determines their priority. This prioritized list is the product backlog.

What type of metrics or reports have you used?
Sprint, release burn-down and burn-up charts are standard reports

What does a scrum burn down chart comprise?
A scrum burn down chart should consist of
X-axis that displays working days
Y-axis that displays remaining effort
Ideal effort as guideline
Real progress of effort
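As a rough sketch of the data behind such a chart, the two series (ideal guideline and real progress) can be computed as below. The function names and the sample sprint (40 hours of effort over 5 working days) are assumptions for illustration only.

```python
def ideal_burndown(total_effort, working_days):
    """Straight guideline from total_effort on day 0 down to zero on the last day."""
    return [total_effort - total_effort * day / working_days
            for day in range(working_days + 1)]

def actual_burndown(total_effort, burned_per_day):
    """Remaining effort at the end of each day, given the effort burned that day."""
    remaining = [total_effort]
    for burned in burned_per_day:
        remaining.append(remaining[-1] - burned)
    return remaining

# X-axis: working days 0..5. Y-axis: remaining effort in hours.
ideal = ideal_burndown(40, 5)                     # [40.0, 32.0, 24.0, 16.0, 8.0, 0.0]
actual = actual_burndown(40, [5, 10, 0, 15, 10])  # [40, 35, 25, 25, 10, 0]
```

Plotting both lists against the day index gives the chart: wherever the actual line sits above the ideal line, the team is behind the guideline.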

What is a burn-down chart?
A burn-down chart displays the amount of work a team has burned through—such as hours during the sprint

What is a retrospective?
A retrospective is a meeting to inspect and adapt the process

What is Scrum Sprint?
A Scrum Sprint is a regular, repeated work cycle in scrum methodology during which work is completed and made ready for review.
Generally, scrum sprints are less than 30 days long.

Describe what happens in the Sprint planning meeting.
In Sprint planning, the Product Owner presents the goal of the sprint and discusses the high priority product backlog items. The Delivery team then chooses the amount of work for the next sprint.

What are the roles in Scrum?
Scrum prescribes only three roles: the Product Owner, Scrum Master, and the Delivery Team.
These roles should ideally be cross-functional and not shared among other projects.

What is the role of the Scrum Master?
The Scrum Master serves the team and shields them from any distractions that could prevent them from completing a sprint goal.
They also remove blocks, teach the team to become self-organized and serve as a coach who teaches Agile and Scrum values and principles

How many Scrum teams have you managed at one time?
Notice the use of the word “managed” versus “led.” Scrum Masters do not manage, they lead teams. You may be required to lead more than one team.

What are the artifacts of Scrum process?

Sprint backlog – The Sprint Backlog is the set of Product Backlog items selected for the Sprint, plus a plan for delivering the product Increment and realizing the Sprint Goal. The Sprint Backlog is a forecast by the Development Team about what functionality will be in the next Increment and the work needed to deliver that functionality into a “Done” Increment

Product backlog – The Product Backlog is an ordered list of everything that might be needed in the product and is the single source of requirements for any changes to be made to the product. The Product Owner is responsible for the Product Backlog, including its content, availability, and ordering.

Velocity chart- A velocity chart shows the sum of estimates of the work delivered across all iterations

Burn-down chart – It is a chart that shows how quickly you and your team are burning through your customer’s user stories. It shows the total effort against the amount of work we deliver on each iteration.

https://intellipaat.com/interview-question/agile-scrum-master-interview-questions/
https://www.glassdoor.com/Interview/us-project-manager-scrum-master-interview-questions-SRCH_IL.0,2_IN1_KO3,31.htm
https://easybacklog.com/
https://www.simplilearn.com/agile-scrum-master-interview-questions-article

What is Kanban?

Kanban is a method for managing the creation of products with an emphasis on continual delivery while not overburdening the development team. Like Scrum, Kanban is a process designed to help teams work together more effectively.
Kanban is based on 3 basic principles:

Visualize what you do today (workflow): seeing all the items in context of each other can be very informative
Limit the amount of work in progress (WIP): this helps balance the flow-based approach so teams don't start and commit to too much work at once
Enhance flow: when something is finished, the next highest thing from the backlog is pulled into play

https://www.versionone.com/what-is-kanban/
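The WIP limit and the pull rule from the principles above can be illustrated with a tiny in-memory board. This is only a sketch: the `KanbanBoard` class, its method names, and the sample items are hypothetical and not taken from any Kanban tool.

```python
from collections import deque

class KanbanBoard:
    def __init__(self, backlog, wip_limit):
        self.backlog = deque(backlog)  # highest priority first
        self.in_progress = []
        self.done = []
        self.wip_limit = wip_limit

    def pull(self):
        """Pull the next backlog item, unless the WIP limit is reached."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None
        item = self.backlog.popleft()
        self.in_progress.append(item)
        return item

    def finish(self, item):
        """Mark an item done, then pull the next one to keep work flowing."""
        self.in_progress.remove(item)
        self.done.append(item)
        return self.pull()

board = KanbanBoard(["login", "search", "checkout"], wip_limit=2)
board.pull()                     # "login" starts
board.pull()                     # "search" starts
blocked = board.pull()           # None: the WIP limit of 2 is reached
pulled = board.finish("login")   # finishing frees a slot, "checkout" is pulled
```

Refusing the third pull is the whole point of the limit: new work only starts when something in progress has been finished.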

Scrum

Scrum (software development)

Scrum is an agile framework for managing work with an emphasis on software development.
It is designed for development teams of three to nine members who break their work into actions that can be completed within timeboxed iterations, called sprints (30 days or less, most commonly two weeks), and who track progress and re-plan in 15-minute stand-up meetings, called daily scrums.

Scrum is an iterative and incremental agile software development framework for managing product development.
https://en.wikipedia.org/wiki/Scrum_(software_development)

Product Backlog: an ordered list of the work to be done in order to create, maintain and sustain a product. Managed by the Product Owner.

Sprint Planning: time-boxed event of 8 hours, or less, to start a Sprint. It serves for the Scrum Team to inspect the work from the Product Backlog that’s most valuable to be done next and design that work into Sprint backlog.

Sprint Review: time-boxed event of 4 hours, or less, to conclude the development work of a Sprint. It serves for the Scrum Team and the stakeholders to inspect the Increment of product resulting from the Sprint, assess the impact of the work performed on overall progress and update the Product backlog in order to maximize the value of the next period.

Daily Scrum: daily time-boxed event of 15 minutes, or less, for the Development Team to re-plan the next day of development work during a Sprint. Updates are reflected in the Sprint Backlog.

Sprint Retrospective: time-boxed event of 3 hours, or less, to end a Sprint. It serves for the Scrum Team to inspect the past Sprint and plan for improvements to be enacted during the next Sprint.

Sprint: time-boxed event of 30 days, or less, that serves as a container for the other Scrum events and activities. Sprints are done consecutively, without intermediate gaps.

Scrum Team: a self-organizing team consisting of a Product Owner, Development Team and Scrum Master.

Scrum Master: the role within a Scrum Team accountable for guiding, coaching, teaching and assisting a Scrum Team and its environments in a proper understanding and use of Scrum

Product Owner: the role in Scrum accountable for maximizing the value of a product, primarily by incrementally managing and expressing business and functional expectations for a product to the Development Team(s).

Burn-down Chart: a chart showing the evolution of remaining effort against time. Burn-down charts are an optional implementation within Scrum to make progress transparent.

Velocity: an optional, but often used, indication of the average amount of Product Backlog turned into an Increment of product during a Sprint by a Scrum Team, tracked by the Development Team for use within the Scrum Team

Increment: a piece of working software that adds to previously created Increments, where the sum of all Increments -as a whole - form a product.

https://www.scrum.org/scrum-glossary

scrum framework

https://s3.amazonaws.com/scrumorg-website-prod/drupal/2016-06/ScrumFramework_17x11.pdf

The Scrum Team consists of a Product Owner, the Development Team, and a Scrum Master.

Scrum Teams are self-organizing and cross-functional.
Self-organizing teams choose how best to accomplish their work, rather than being directed by others outside the team
Cross-functional teams have all competencies needed to accomplish the work without depending on others not part of the team
https://www.scrum.org/resources/what-is-scrum

Professional Scrum Developer™ (PSD) is a 3-day course where students make up an entire Scrum Team where they concurrently do requirements engineering, design, development, testing, integration and deployment within a single iteration.

Course Topics

Using Scrum
Working within a Scrum Team
Definition of Done
Development Practices
Test Driven Development
Pair Programming
Code Review
Using ALM tools with Scrum

https://www.scrum.org/courses/professional-scrum-developer-java-munchen-2017-10-18-7795

The Scrum Events

Prescribed events are used in Scrum to create regularity and to minimize the need for meetings not defined in Scrum
All events are time-boxed.
Once a Sprint begins, its duration is fixed and cannot be shortened or lengthened.
Sprint
Sprint Planning
Daily Scrum
Sprint Review
Sprint Retrospective

Scrum Artifacts
Scrum’s artifacts represent work or value to provide transparency and opportunities for inspection and adaptation.
Product Backlog
Sprint Backlog
Increment
https://www.scrum.org/resources/what-is-scrum

iceScrum is a web application for using Scrum while keeping the spirit of a collaborative workspace. It also offers virtual boards with post-its for sprint backlog, product backlog and others.

https://github.com/icescrum/iceScrum

Planning Poker® is a consensus-based estimating technique. Agile teams around the world use Planning Poker to estimate their product backlogs

To start a poker planning session, the product owner or customer reads an agile user story or describes a feature to the estimators.
Each estimator is holding a deck of Planning Poker cards with values like 0, 1, 2, 3, 5, 8, 13, 20, 40 and 100, which is the sequence we recommend. The values represent the number of story points, ideal days, or other units in which the team estimates.
The estimators discuss the feature, asking questions of the product owner as needed. When the feature has been fully discussed, each estimator privately selects one card to represent his or her estimate. All cards are then revealed at the same time.
If all estimators selected the same value, that becomes the estimate. If not, the estimators discuss their estimates. The high and low estimators should especially share their reasons. After further discussion, each estimator reselects an estimate card, and all cards are again revealed at the same time.

The poker planning process is repeated until consensus is achieved or until the estimators decide that agile estimating and planning of a particular item needs to be deferred until additional information can be acquired.
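The reveal-and-repeat loop described above can be sketched as follows. The deck values come from the text; the function names, the estimator names, and the rule of returning `None` for a deferred item are assumptions made for this illustration.

```python
DECK = (0, 1, 2, 3, 5, 8, 13, 20, 40, 100)

def reveal(votes):
    """Evaluate one round of simultaneously revealed cards.

    Returns (estimate, outliers): the agreed value with no outliers, or
    None plus the lowest and highest estimators, who explain their reasons.
    """
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played a card that is not in the deck: {card}")
    if len(set(votes.values())) == 1:
        return next(iter(votes.values())), ()
    low = min(votes, key=votes.get)
    high = max(votes, key=votes.get)
    return None, (low, high)

def estimate(rounds):
    """Go through the rounds until one reaches consensus; None means deferred."""
    for votes in rounds:
        value, _ = reveal(votes)
        if value is not None:
            return value
    return None

# Hypothetical session: the first round disagrees, the second converges.
rounds = [
    {"ana": 3, "ben": 8, "cho": 5},
    {"ana": 5, "ben": 5, "cho": 5},
]
first = reveal(rounds[0])   # (None, ("ana", "ben"))
final = estimate(rounds)    # 5
```

The discussion between rounds is the valuable part and cannot be coded; the sketch only captures who should speak and when the team may stop.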

How does poker planning work with a distributed team?

Simple: go to PlanningPoker.com. A product owner, ScrumMaster or agile coach can log in and preload a set of items to be estimated. A private URL can then be shared with estimators who log in and join a conference call or Skype session. Agile estimating and planning then proceeds as it would in person.

https://www.mountaingoatsoftware.com/agile/planning-poker

What’s a Spike?

Sometimes a story is too large or overly complex. Perhaps the implementation or a 3rd party tool or library is poorly understood. The team can’t estimate the story. Perhaps we’re unsure if we’ll be able to complete the story due to some potential blocker.
In these cases, we might want to build a functional or technical experiment to figure it out. We might want to look into something for a day. We might want to look up alternatives. Do some googling. Do an experiment with some other library or software package. Consider alternative refactoring paths.
https://www.leadingagile.com/2016/09/whats-a-spike-who-should-enter-it-how-to-word-it/

The first sprint

Now that the prep work, whether in the form of a DAD Inception phase, a "sprint zero," or a "project before the project," has been completed, it's time for that first sprint.
Your first agile sprint is a baseline and getting everything "right" isn't as important as getting the team to understand the general spirit of agile. The iterations are short so the team should be able to quickly gather feedback and continue to adapt and improve over time.

https://techbeacon.com/your-first-agile-sprint-survival-guide

Weighted Shortest Job First is a technique for a) assigning a weight, or value, to each job, and then b) dividing that by the length of the job, in order to c) determine a relative ranking

https://techbeacon.com/prioritize-your-backlog-weighted-shortest-job-first-wsjf-improved-roi

Weighted Shortest Job First (WSJF) is a technique for backlog prioritization recommended by Dean Leffingwell. The calculation involves measures for User Value, Time Value, Risk Reduction / Opportunity Enablement Value, and Job Size:

WSJF = (User Value + Time Value + RROE Value) / Job Size
https://www.scrum.org/forum/scrum-forum/5509/weighted-shortest-job-first
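A short sketch of the formula above applied to a backlog. The job names and scores are made up for illustration; only the formula itself comes from the text.

```python
def wsjf(user_value, time_value, rroe_value, job_size):
    """WSJF = (User Value + Time Value + RROE Value) / Job Size."""
    if job_size <= 0:
        raise ValueError("job size must be positive")
    return (user_value + time_value + rroe_value) / job_size

# Hypothetical backlog items with relative scores.
jobs = {
    "export":   dict(user_value=8, time_value=5, rroe_value=2, job_size=5),   # 15 / 5
    "sso":      dict(user_value=5, time_value=8, rroe_value=8, job_size=13),  # 21 / 13
    "fix-typo": dict(user_value=2, time_value=1, rroe_value=1, job_size=1),   # 4 / 1
}

# Highest WSJF first: small, valuable jobs rise to the top.
ranking = sorted(jobs, key=lambda name: wsjf(**jobs[name]), reverse=True)
```

Here the tiny "fix-typo" job outranks the more valuable but much larger "sso" job, which is the trade-off WSJF is meant to expose.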

Velocity is calculated at the end of the Sprint by totaling the Points for all fully completed User Stories.

https://www.scruminc.com/velocity/
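The "fully completed" rule matters: a story that is almost done contributes nothing to the sprint's velocity. A minimal sketch, where the story records and field names are assumptions for the example:

```python
def sprint_velocity(stories):
    """Total the points of fully completed stories; partial work counts as zero."""
    return sum(story["points"] for story in stories if story["done"])

# Hypothetical sprint: the 8-point story was started but not finished.
sprint = [
    {"title": "password reset", "points": 5, "done": True},
    {"title": "audit log",      "points": 3, "done": True},
    {"title": "bulk import",    "points": 8, "done": False},
    {"title": "dark mode",      "points": 2, "done": True},
]
velocity_this_sprint = sprint_velocity(sprint)  # 5 + 3 + 2 = 10
```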

Large-Scale Scrum (LeSS)

a lightweight (agile) framework for scaling Scrum to more than one team.
LeSS consists of the LeSS Principles, the Framework, the Guides and a set of experiments. The LeSS framework is divided into two frameworks: basic LeSS for 2-8 teams and LeSS Huge for 8+ teams.
https://www.agilealliance.org/resources/sessions/introduction-to-large-scale-scrum-less/

Scaling Scrum starts with understanding standard one-team Scrum. From that point, your organization must be able to understand and adopt LeSS, which requires examining the purpose of one-team Scrum elements and figuring out how to reach the same purpose while staying within the constraints of the standard Scrum rules.

LeSS provides two different large-scale Scrum frameworks.
The two frameworks – which are basically single-team Scrum scaled up – are:
LeSS: Up to eight teams (of eight people each).
LeSS Huge: Up to a few thousand people on one product.

In LeSS, you will find:
a single Product Backlog (because it’s for a product, not a team),
one Definition of Done for all teams,
one Potentially Shippable Product Increment at the end of each Sprint,
one Product Owner,
many complete, cross-functional teams (with no single-specialist teams),
one Sprint
https://less.works/less/framework/index.html

Scaled Agile Framework® (SAFe®) empowers complex organizations to achieve the benefits of Lean-Agile software and systems development at scale.
SAFe is designed to help businesses continuously and more efficiently deliver value on a regular and predictable schedule. It provides a knowledge base of proven, integrated principles and practices to support enterprise agility.

https://www.scaledagileframework.com/

The Scaled Agile Framework (abbreviated as SAFe), is a set of organization and workflow patterns intended to guide enterprises in scaling lean and agile practices.
https://en.wikipedia.org/wiki/Scaled_agile_framework

In traditional scaling frameworks, specific practices (e.g. daily standups) are how the framework is executed, whereas the Spotify model focuses on how businesses can structure an organization to enable agility.

Key elements of the Spotify model

Squads

Similar to a scrum team, Squads are cross-functional, autonomous teams (typically 6-12 individuals) that focus on one feature area. Each Squad has a unique mission that guides the work they do, an agile coach for support, and a product owner for guidance. Squads determine which agile methodology/framework will be used.

Tribes

When multiple Squads coordinate with each other on the same feature area, they form a Tribe. Tribes help build alignment across Squads and typically consist of 40 - 150 people in order to maintain alignment. Each Tribe has a Tribe Lead who is responsible for helping coordinate across Squads and for encouraging collaboration.

Chapter

Even though Squads are autonomous, it’s important that specialists (e.g. Javascript Developer, DBAs) align on best practices. Chapters are the family that each specialist has, helping to keep engineering standards in place across a discipline. Chapters are typically led by a senior technology lead, who may also be the manager for the team members in that Chapter.

Guild

Team members who are passionate about a topic can form a Guild, which essentially is a community of interest. Anyone can join a Guild and they are completely voluntary. Whereas Chapters belong to a Tribe, Guilds can cross different Tribes. There is no formal leader of a Guild. Rather, someone raises their hand to be the Guild Coordinator and help bring people together.

Trio

The Trio (aka TPD Trio) is a combination of a Tribe Lead, product lead, and design lead. Each Tribe has a Trio in place to ensure there is continuous alignment between these three perspectives when working on feature areas.

Alliance

As organizations scale, sometimes multiple Tribes need to closely work together to accomplish a goal. Alliances are a combination of Tribe Trios (typically three or more) that work together to help their Tribes collaborate on a goal that is bigger than any one Tribe.

Squads may have ceremonies like sprint planning and retrospectives, but the focus of the Spotify model is on how teams organize around work. It’s up to Squads to figure out the best way to get the job done.

The benefits of the Spotify model

Less formal process and ceremony

More self-management and autonomy

The Spotify model focuses on decentralizing decision making and transferring that responsibility to Squads, Tribes, Chapters, and Guilds.

The challenges of the Spotify model

Some organizations experienced more success than others, but it’s likely no organization experienced the same success as Spotify. The reason? Like any way of working, an organization's current culture and structure need to be taken into account. The model is simple, but the environment it's implemented in is complex.

To some, it may seem like a simple matrix organizational structure where people report to a functional area (Chapter), but work with a cross-functional team (Squad).

Although it may look like a matrix organization, the key cultural elements of the model need to be in place to allow the structure to thrive, such as trust and autonomy.

If an organization doesn’t shift its behaviors (and ultimately its culture), the benefits of the Spotify model will never be realized.

Spotify model best practices

Don’t copy the model

Seek to understand the structure, practices, and mindset behind Spotify’s approach. With that understanding, tweak the aspects of the model to fit your own environment. Your goal is not to be Spotify, but to leverage their model to improve how your organization works together.

Autonomy and trust is key

Allowing teams to pick their own development tools and modify another team's code are just some examples. Within your organization, determine if there are decisions that can be pushed to the teams instead of being mandated by parts of the organization that are disconnected from the day-to-day work.

Transparency with community

Establish your first Guild around the Spotify model adoption and encourage participation from everyone in the organization. Build trust by creating transparent, inclusive ways to gather feedback, and gain alignment on how your organization wants to work in the future.

Encourage mistakes

Spotify doesn’t leverage the original implementation of the Spotify model anymore; they evolved and adapted the model to fit their changing organization. Trios and Alliances are actually newer elements in Spotify as they were brought about to solve new problems the organization faced as it grew larger.

https://www.atlassian.com/agile/agile-at-scale/spotify

Dunbar's number is a suggested cognitive limit to the number of people with whom one can maintain stable social relationships—relationships in which an individual knows who each person is and how each person relates to every other person

https://en.wikipedia.org/wiki/Dunbar%27s_number

Scaled Agile Framework (SAFe):

The current version is SAFe 5.1, which was revised in February 2021. It has four configurations: Essential SAFe, Large Solution SAFe, Portfolio SAFe, and Full SAFe. You need either Portfolio SAFe or Full SAFe to achieve business agility through SAFe. The SAFe structure grows as you move from one level to another by adding new roles and responsibilities.

Large-Scale Scrum (LeSS):

It was developed by taking Scrum and trying many different experiments to discover what works. In 2013, the experiments were solidified into the LeSS framework rules.

In contrast to SAFe, LeSS talks about descaling structure and organizational complexity to enable simplicity at scale.

Spotify Model Framework:

The Spotify model talks about squad, tribe, chapter, and guild, where the squad is a cross-functional team designed based on the customer journey.

The tribe is a collection of squads, and the chapter is more like a community of practice. However, the definitions of chapter and tribe have changed a lot lately, which gives the feeling of a matrix organization. The Spotify model doesn’t have any rules or guidelines publicly available like SAFe and LeSS. It gives many organizations a quick win because of multiple factors, including renaming the existing team as a squad. In the Spotify model, teams use LeSS, SAFe, Scrum, or Kanban at the squad or tribe level.

Nexus Framework

Nexus is similar to LeSS from the outside, apart from the Nexus Integration Team. It is a framework built upon Scrum that keeps Scrum untouched. If multiple teams work on the same product and experience coordination chaos, then Nexus may be helpful. A Nexus is a group of approximately three to nine Scrum Teams who work together to deliver a single product; it connects people and things. A Nexus has a single Product Owner who manages a single Product Backlog from which the Scrum Teams work. Unlike LeSS, Nexus doesn’t come with its own principles and uses the Scrum framework’s principles.

Look at some simple parameters to decide:

Is it about product development or service delivery?

Are you talking about a program/portfolio or products?

How difficult is it to change the core system?

Are you a multi-site, distributed team?

Do you have challenges with alignments?

Is it a coordination problem because of silos?

What’s your priority? Innovation or Improvement?

Are you running some digital transformation?

https://agilemania.com/safe-vs-less-vs-spotify-or-nexus-to-be-agile/

Assessment of 6 approaches to scale Scrum

LeSS stays closest to Scrum’s purpose

Scrum@Scale is Scrum scaled for the whole organisation

Spotify Engineering Culture may be a best fit for component teams

SAFe puts the purpose of Scrum under heavy pressure due to its top-down approach, additional layers and, as a result, additional roles, events and artifacts.

Spotify allows for a lot of autonomy for the teams to work according to Scrum. The focus on small decoupled systems however is a different perspective than Scrum’s perspective of optimizing the value of a product. Spotify can be your scaling approach of choice if you don’t have the concerns of systems vs product.

https://medium.com/serious-scrum/assessment-of-6-approaches-to-scale-scrum-46319fcbca1a

SAFe® vs. Scrum@Scale

The Scaled Agile Framework

It is essential to note that SAFe® is intended to accommodate DevOps, a process frequently deemed for future-proof Agile organizations.

It is most desirable for large organizations to retain as much organizational and process structure as possible while reaping the advantages of a decentralized Agile method.

SAFe® is not as efficiently customizable as Scrum at Scale

Scrum@Scale

The Scrum@Scale structure is easily manageable but hard to master. It is made up of 2 cycles, the Scrum Master cycle and Product Owner cycle, and 12 components necessary to execute Scrum at scale.

SAFe® vs. Large-Scale Scrum (LeSS)

Scaled Agile Framework

Benefits of SAFe®

SAFe® separates business strategies into actions, then features, then stories of work at the team level, and maps the pathway.

Drawbacks of SAFe®

The implementation pathway is required to be tweaked to meet the requirements of your organization

Large Scale Scrum (LeSS)

A key characteristic of LeSS is directing team awareness toward the entire organization.

Limitations of LeSS

Scaling only works for an organization which has an extensive Scrum foundation

LeSS is formed around Scrum and is not a straightforward extension of other methodologies.

A single Product Owner may attempt to control multiple teams

SAFe® vs Spotify

Spotify: a lightweight framework

Unlike SAFe®, the Spotify model is not considered an extensive toolbox. The Spotify model offers a relatively lightweight framework that stresses the need to generate many interactions to limit the silos formed by teams.

It is essential to determine whether every team must work in a standard way or whether the teams are given 100% freedom. Moreover, the Spotify model doesn’t deliver any solution to control the Portfolio like SAFe®.

SAFe® a heavy framework

Unlike the Spotify model, everything is already fixed, and only an expert can be expected to have the imagination to improve it.

Challenges of scaling agile principles and practices

The Scaled Agile Framework discusses the obstacles encountered when scaling agile beyond a single team.

https://www.knowledgehut.com/blog/agile/safe-vs-scrum-scale-vs-less-vs-spotify

scrum team becomes squad team

scrum is an agile development approach

agile > scrum

scrum master becomes agile coach

servant leader > process master

autonomous squad

cross-functional squad

self-organizing squad

# members < 8

autonomy is what to build

autonomy is how to build

autonomy is how to work together to build

loosely coupled squad

tightly aligned squad

alignment vs autonomy graph sections 1,2,3,4 detailed

section 1 - low alignment low autonomy

section 1 - micro management culture

section 1 - shut up and follow orders

section 1 - no high level purpose

section 2 - high alignment low autonomy

section 2 - leaders are good at what problems need to be solved

section 2 - leaders tell people how to solve problem

section 3 - high alignment high autonomy

section 3 - leaders focus on what problem is solved

section 3 - leaders let the team figure out how to solve the problem

section 4 - low alignment high autonomy

section 4 - teams do whatever they want

section 2 tells how to solve, section 3 lets people decide on how to solve

alignment enables autonomy

leader's job is to communicate what problem needs to be solved and why.

tribe is a group of squads

team member is squad and chapter member

guild - communicating via mailing list

infra squad

client app squad

feature squad

infra squad is on CD, operations, tools for squad, monitoring etc

self-service model

infra enables teams to serve themselves

feature toggle is unfinished hidden feature

feature toggle is for A/B testing, integration testing

unmerged code hides problems

feature toggle reduces the need for code branches
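A feature toggle in its simplest form is a named switch checked at runtime, so unfinished code can be merged and released while staying hidden. The sketch below is an assumption-laden illustration: the `FeatureToggles` class, the `render_home` function, and the toggle name `new-recommendations` are all hypothetical.

```python
class FeatureToggles:
    """In-memory set of enabled feature names (a real system would load config)."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_enabled(self, name):
        return name in self._enabled

    def enable(self, name):
        self._enabled.add(name)

def render_home(toggles):
    """Build the home page; the unfinished section ships hidden behind a toggle."""
    sections = ["header", "playlist"]
    if toggles.is_enabled("new-recommendations"):
        sections.append("recommendations")
    return sections

toggles = FeatureToggles()
before = render_home(toggles)          # ["header", "playlist"]
toggles.enable("new-recommendations")  # e.g. for an A/B test group
after = render_home(toggles)           # ["header", "playlist", "recommendations"]
```

Because the hidden code lives on the main line, it is integrated and tested continuously, which is how toggles reduce the need for long-lived branches.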

fear kills trust and innovation

fail fast - learn fast - improve fast - long term strategy

driven from below, supported from above - continuous improvement

limited blast radius via decoupled architecture

lean startup - idea/problem - narrative + prototypes - build Minimum Viable Product - Release

toyota improvement kata

This practice of practicing a routine until it became a refined habit now extends beyond martial arts and into business.

the Toyota Motor Corporation developed continuous improvement through practiced patterns.

This technique, called Improvement Kata

Improvement Kata is a method where team leaders and members continually practice a kata routine that develops and channels their abilities to problem-solve. Over time, the practices become second nature.

It seeks to do so with a four-part model:

Understand the direction or challenge

Grasp the current condition

Define the target destination

Move toward the target iteratively, which reveals obstacles to overcome

These techniques are particularly helpful when the route to a destination is unclear, as experimentation can help you better understand the problem and find unique solutions.

What are the benefits of Improvement Kata?

Teams aligned toward a single goal

Experimentation leads to results

Reduce waste significantly

What are some examples of Improvement Kata?

Say you want to build a new service based on an idea, but you’re not sure whether it will work. Rather than trying to get every layer perfectly worked out and incrementally expanding it until it’s feature-complete, try picking a target that delivers some value and brings you closer to the system you envisioned. You’ll probably have lots of unknowns, but you can learn a lot from challenges, and experiment with different ideas to find one that works. Once it does, reevaluate where you are, pick your next target, iterate, and reflect on your progress.

What are the differences between Improvement Kata and lean?

Kata and lean are different, yet they complement each other. Lean refers to processes to be implemented, while kata refers to techniques to be practiced. Kata became a mainstream business practice when Toyota adopted it into its lean production system. When combined into a unified approach, these concepts provide powerful results.

https://www.atlassian.com/agile/agile-at-scale/using-improvement-kata-to-support-lean