Monday, June 22, 2020

SLx (SLI, SLO, SLA, SLT) and Toil in a nutshell

These terms come from Google's SRE principles:

  • SLx (SL"I/O/A/T")
  • SLI - Service Level Indicator
  • SLO - Service Level Objective
  • SLA - Service Level Agreement
  • SLT - Service Level Targets
  • Error Budget
  • Toil/Toil Budget

SLx

SLx values can be derived from good monitoring data and shown on dashboards, e.g. Grafana or New Relic.

SLI (Service Level Indicator)

SLIs are quantitative measurements of service behavior, such as:
  • Request latency
  • Batch throughput
  • Failures per request
Example: 95th percentile latency of homepage requests over past 5 minutes < 300ms
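
As a rough illustration of the example above, the sketch below evaluates a latency SLI over one window of samples. The nearest-rank percentile method, the function names, and the sample latencies are all assumptions made for illustration; a monitoring system may compute percentiles differently.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def sli_met(latencies_ms, threshold_ms=300, pct=95):
    """True if the pct-th percentile latency is below the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms

# Hypothetical homepage latencies (ms) collected over the past 5 minutes.
window = [120, 135, 150, 180, 210, 220, 240, 250, 260, 280,
          110, 140, 160, 170, 190, 200, 230, 270, 290, 950]

print(percentile(window, 95), sli_met(window))
```

The single 950 ms outlier does not break the SLI here, because the 95th percentile ignores the slowest 5% of requests.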

SLO (Service Level Objective)

Binding target for a collection of SLIs.

Example: the 95th percentile homepage SLI will succeed 99.9% of the time over the trailing year
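
A minimal sketch of checking such an objective, assuming the SLI is evaluated once per 5-minute window and each result is stored as a boolean. The function names and the counts of failed windows are hypothetical.

```python
def slo_compliance(window_results):
    """Fraction of measurement windows in which the SLI was met."""
    if not window_results:
        raise ValueError("no windows")
    return sum(1 for ok in window_results if ok) / len(window_results)

def slo_met(window_results, target=0.999):
    """True if the SLI succeeded in at least `target` of the windows."""
    return slo_compliance(window_results) >= target

# A 365-day year holds 365 * 24 * 12 = 105,120 five-minute windows.
# Hypothetical outcome: the SLI was missed in 100 of them.
year = [True] * 105_020 + [False] * 100
print(slo_met(year))
```

With a 99.9% target over 105,120 windows, roughly 105 windows may miss the SLI before the objective is broken.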

SLA (Service Level Agreement)

Business agreement between a customer and service provider typically based on SLOs.

Example: service credits are issued if the 95th percentile homepage SLI succeeds less than 99.5% of the time over the trailing year.

SLT (Service Level Targets)

A service level target is a key element of an SLA between you as a service provider and an end-user customer.
SLTs measure your performance as a service provider and are designed to avoid disputes between the two parties caused by misunderstanding.

Error budget

The rate at which the SLOs can be missed, tracked on a daily or weekly basis.

The main advantage of an error budget is that it is a quantitative measurement shared between the product and SRE teams, which means we can balance innovation (feature rollouts) and stability (reliability) at an appropriate level.
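
The arithmetic behind an error budget can be sketched as below. The function names, the request counts, and the "release only while budget remains" rule are illustrative assumptions, not a prescribed policy.

```python
def error_budget(slo_target, total_events):
    """Number of events that may fail without breaking the SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (1 - slo_target) * total_events

def budget_remaining(slo_target, total_events, failed_events):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_events)
    return (budget - failed_events) / budget

def can_release(slo_target, total_events, failed_events):
    """Assumed policy: ship new features only while budget remains."""
    return budget_remaining(slo_target, total_events, failed_events) > 0

# Hypothetical week: 1,000,000 requests, 250 failures, 99.9% SLO.
# The budget is about 1,000 failed requests, so about 75% is left.
print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

When the remaining budget drops to zero or below, the same number tells both teams to favor stability work over new rollouts.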

Toil

Toil is the kind of work tied to running a production service that tends to be:
  • Manual
  • Repetitive
  • Automatable
  • Tactical
  • Devoid of enduring value
  • Scaling linearly as the service grows

Friday, June 8, 2018

Q&A: any impact on active connections if we change the security/cipher policy of AWS ELB

During the SSL connection negotiation process, the client and the load balancer each present a list of ciphers and protocols that they support, in order of preference. Once these are selected for a connection, they do not change for that connection even if the SSL security policy is changed on the load balancer. In other words, any new connections made after the policy change are served with the new configuration, and all existing connections continue to be served under the old policy.

However, note that for an Application Load Balancer, connections that are kept open for longer than 60 minutes will be forcibly terminated during SSL-related configuration changes.

And yes, once the ELB is removed, all the custom policies are also removed.
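
For reference, a policy change on an Application Load Balancer HTTPS listener could be scripted roughly as follows. This is a sketch that assumes a boto3 `elbv2` client is passed in; the listener ARN is a placeholder, and the policy name is only one example of a predefined policy.

```python
def apply_ssl_policy(elbv2_client, listener_arn, policy_name):
    """Switch an HTTPS listener to another security policy.

    Connections negotiated before this call keep their ciphers;
    only connections negotiated afterwards use the new policy.
    Returns the policy names reported back for the listener.
    """
    response = elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        SslPolicy=policy_name,
    )
    return [listener["SslPolicy"] for listener in response["Listeners"]]

# Usage (requires AWS credentials; values are placeholders):
#   import boto3
#   client = boto3.client("elbv2")
#   apply_ssl_policy(client, "<listener ARN>",
#                    "ELBSecurityPolicy-TLS-1-2-2017-01")
```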

Wednesday, May 17, 2017

chef: bootstrap windows servers.

NonCloud Machines

Prerequisites

  1.  If the user is not the built-in Administrator, disable UAC so that remote commands run with full administrator rights, then reboot. Execute

cmd> reg add HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\policies\system /v EnableLUA /t REG_DWORD  /d 0 /f
cmd> shutdown -r

  2.  Allow WinRM to execute commands remotely. Execute

CALL winrm quickconfig -q
CALL winrm set winrm/config/winrs @{MaxMemoryPerShellMB="1024"}
CALL winrm set winrm/config @{MaxTimeoutms="1800000"}
CALL winrm set winrm/config/service @{AllowUnencrypted="true"}
CALL winrm set winrm/config/service/auth @{Basic="true"}
CALL netsh advfirewall firewall add rule name="WinRM 5985" protocol=TCP dir=in localport=5985 action=allow
CALL netsh advfirewall firewall add rule name="WinRM 5986" protocol=TCP dir=in localport=5986 action=allow
CALL net stop winrm
CALL net start winrm
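
Before bootstrapping, it can help to confirm from the workstation that the WinRM ports opened above are reachable. The check below is a minimal sketch using plain TCP connects; the function names are hypothetical, and note that 5986 will normally answer only if an HTTPS listener has also been configured.

```python
import socket

# Ports opened by the firewall rules above: 5985 (HTTP) and 5986 (HTTPS).
WINRM_PORTS = (5985, 5986)

def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def winrm_reachable(host, ports=WINRM_PORTS, timeout=3.0):
    """Map each WinRM port to whether it accepts TCP connections."""
    return {port: port_open(host, port, timeout) for port in ports}

# Usage (hostname is a placeholder):
#   print(winrm_reachable("someserver.somedomain.com"))
```

A successful connect only shows that the port is open; it does not verify WinRM authentication.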

Bootstrap standalone machine

cmd> knife bootstrap windows winrm someserver.somedomain.com -x 'somedomain\someuser' -P somePass -E tron-v2_stage -r 'recipe[somecookbook]'


AWS machine

  1. Add the content below to the CloudFormation "UserData":

"UserData": { "Fn::Base64": { "Fn::Join": ["", [
    "<script>", "\n",
    "netsh advfirewall set allprofiles state off", "\n",
    "REM Executing winrm commands", "\n",
    "CALL winrm quickconfig -q", "\n",
    "CALL winrm set winrm/config/winrs @{MaxMemoryPerShellMB=\"300\"}", "\n",
    "CALL winrm set winrm/config @{MaxTimeoutms=\"1800000\"}", "\n",
    "CALL winrm set winrm/config/service @{AllowUnencrypted=\"true\"}", "\n",
    "CALL winrm set winrm/config/service/auth @{Basic=\"true\"}", "\n",
    "CALL net stop winrm", "\n",
    "CALL net start winrm", "\n",
    "REM Chef client install", "\n",
    "powershell curl -OutFile 'C:/chef-client.msi' https://opscode-omnibus-packages.s3.amazonaws.com/windows/2008r2/x86_64/chef-client-12.18.12-1.msi", "\n",
    "msiexec.exe /passive /i C:\\chef-client.msi", "\n",
    "set PATH=%PATH%;C:\\opscode\\chef\\embedded\\bin;C:\\opscode\\chef\\bin", "\n",
    "powershell curl -OutFile 'C:/awscli.msi' https://s3.amazonaws.com/aws-cli/AWSCLI64.msi", "\n",
    "msiexec.exe /passive /i C:\\awscli.msi", "\n",
    "set PATH=%PATH%;C:\\Program Files\\Amazon\\AWSCLI\\", "\n",
    "CALL aws s3 cp s3://somebucket//validator.pem C:\\chef\\validation.pem", "\n",
    "CALL aws s3 cp s3://somebucket//client.rb C:\\chef\\client.rb", "\n",
    "chef-client -S https://chefserver.domain.com/organizations/someorg -E some_env -r 'recipe[somecookbook]'", "\n",
    "</script>", "\n"
]] } }
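
Hand-escaping each script line inside "Fn::Join" is error-prone, so one option is to generate the structure from plain script lines. The helper below is a sketch; `user_data_join` is a hypothetical name, and only two of the script lines are shown.

```python
import json

def user_data_join(script_lines):
    """Wrap batch script lines in <script> tags and build the
    Fn::Base64 / Fn::Join structure used for Windows UserData."""
    parts = ["<script>", "\n"]
    for line in script_lines:
        parts += [line, "\n"]
    parts += ["</script>", "\n"]
    return {"Fn::Base64": {"Fn::Join": ["", parts]}}

lines = [
    "CALL winrm quickconfig -q",
    'CALL winrm set winrm/config/service/auth @{Basic="true"}',
]

# json.dumps takes care of escaping quotes and backslashes.
print(json.dumps({"UserData": user_data_join(lines)}, indent=2))
```

The printed fragment can be pasted into the instance's "Properties" section of the template.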

Saturday, April 15, 2017

aws: boto3: volume snapshot and ami backup plus retention

About
The idea is to automate AMI and volume snapshot backups, with the retention period for each instance read from its tags.

Scripts
The scripts are on GitLab:
     https://gitlab.com/vickeyrihal/aws_snapshot_and_ami_retention
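
Independently of the linked scripts, the backup-and-retention idea could be sketched as below. This is not the code from the repository: the "Retention" tag key, the 7-day default, the description prefix, and the function names are assumptions, and AMI deregistration is left out for brevity. The client is expected to be a boto3 EC2 client, and `instance` one entry from its `describe_instances` output.

```python
from datetime import datetime, timedelta, timezone

RETENTION_TAG = "Retention"        # hypothetical tag key holding days to keep
DEFAULT_RETENTION_DAYS = 7         # assumed default when the tag is absent
DESCRIPTION_PREFIX = "auto-backup"  # marks snapshots created by this sketch

def retention_days(tags, default=DEFAULT_RETENTION_DAYS):
    """Read the retention period from EC2-style tags ([{'Key':..,'Value':..}])."""
    for tag in tags or []:
        if tag["Key"] == RETENTION_TAG:
            try:
                return int(tag["Value"])
            except ValueError:
                return default
    return default

def expired_snapshot_ids(snapshots, days, now):
    """IDs of this sketch's own snapshots that are older than `days`."""
    cutoff = now - timedelta(days=days)
    return [
        s["SnapshotId"] for s in snapshots
        if s.get("Description", "").startswith(DESCRIPTION_PREFIX)
        and s["StartTime"] < cutoff
    ]

def backup_instance(ec2_client, instance, now=None):
    """Snapshot each attached EBS volume, prune expired snapshots, create an AMI."""
    now = now or datetime.now(timezone.utc)
    instance_id = instance["InstanceId"]
    days = retention_days(instance.get("Tags"))
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping.get("Ebs", {}).get("VolumeId")
        if not volume_id:
            continue
        ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"{DESCRIPTION_PREFIX} {instance_id} {volume_id}",
        )
        existing = ec2_client.describe_snapshots(
            OwnerIds=["self"],
            Filters=[{"Name": "volume-id", "Values": [volume_id]}],
        )["Snapshots"]
        for snapshot_id in expired_snapshot_ids(existing, days, now):
            ec2_client.delete_snapshot(SnapshotId=snapshot_id)
    # NoReboot=True avoids downtime at the cost of filesystem consistency.
    ec2_client.create_image(
        InstanceId=instance_id,
        Name=f"{instance_id}-{now:%Y%m%d%H%M%S}",
        NoReboot=True,
    )
```

Pruning is limited to snapshots carrying the description prefix so that snapshots backing registered AMIs are not touched.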