Text

Lonely Planet is hiring devs, ops and more!

Lonely Planet has always been a great place to work. The London office is in need of programmers in various languages, DBAs skilled in MS SQL, Postgres and AWS, and great Linux engineers who are excited about the future of operations. Sounds like you, or someone you know? Contact me!

Do’s and Don’ts

  • You don’t have to have a degree.

  • You do have to be in London.

  • You do have to have enthusiasm.

  • You do want to enable people to make a great product.

  • You do think speed is a requirement.

  • You do think metrics are important.

Text

EC2 and Ephemeral Disks

Every EC2 instance has a number of local disks available to use instead of relying on the typically unpredictable EBS storage. For temporary usage, swap, RAID or even a database backend, this is a far better storage option. Each instance size offers a different amount of ephemeral storage, outlined in this table.

To enable the disks, when booting an instance using the CLI tools, pass the -b option:

ec2-run-instances -t m1.large ami-a29943cb -f default -k admin -b '/dev/sdb=ephemeral0' -b '/dev/sdc=ephemeral1'
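Instance sizes with more ephemeral disks need more -b flags, so it can be handy to generate them instead of typing each one. This is only a sketch: the function name ephemeral_flags is made up, it assumes the device letters simply continue from /dev/sdb as in the command above, and the cap of four disks is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print one -b mapping per ephemeral disk,
# assuming device names run /dev/sdb, /dev/sdc, ... in order.
ephemeral_flags() {
    count=$1
    i=0
    flags=""
    for letter in b c d e; do
        [ "$i" -ge "$count" ] && break
        flags="$flags -b /dev/sd$letter=ephemeral$i"
        i=$((i + 1))
    done
    # Trim the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Print the command instead of running it, so it can be checked first.
echo ec2-run-instances -t m1.large ami-a29943cb -f default -k admin $(ephemeral_flags 2)
```

Printing the command first keeps the sketch harmless; drop the leading echo once the output looks right.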

Text

Airbnb SRE Challenge

Recently, I interviewed with Airbnb for a Site Reliability Engineer role. Airbnb are a really interesting and exciting company with one of the most beautifully designed websites I’ve ever seen. I’ve been aware of them for quite some time as I currently work for Lonely Planet and also travel quite often. Part of the interview process was a technical challenge. This is the writeup I submitted along with my solution.

Challenge Description

You received a hostname and a username/password for an EC2 box. Make sure the credentials work. The user can sudo.

Please install java, git and maven on the host if they’re not already installed. (Completed with Chef.)

Build a script that can start / stop / restart the service as a background process on the provided machine. (Runit, via Chef)

Capture the sysout & syserr from the process while it’s running, and redirect them to Syslog. (Runit logger and rsyslog file monitoring)

Build a tool that periodically checks the service for its health. If a health check fails, the tool should trigger an alert via appropriate channels (e.g. email). (Runit, Nagios)

Make sure the service endpoint returns the expected response. (Nagios)

Parse this page and trigger an alert if:

  • Daemon thread count > 10
  • de.leibert.ExampleResource/p999 > 5ms
  • percent-4xx-15m > 0.4

Bonus points: everything is configurable and it takes very little time to add monitoring for new metrics (Chef, Nagios, Graphite)

Nagios will work, but it’s not the right choice. Ideally I wanted to implement the http://metrics.codahale.com/manual/graphite/ module into the webapp. Unfortunately, I’m not skilled in Scala. This would be my solution, given access to a dev.

Choice of tools and design

I used Nagios for the monitoring and alerting framework as it’s very easy to add additional checks. Graphite interfaces well with the metrics library in use and allows easy creation of new graphs. Ideally, Graphite would be installed on a separate server. As the application is further developed, I’d encourage the developers to implement statsd or similar and output application metrics to Graphite too.
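To give a feel for how small an additional check can be, here is a rough sketch of a threshold check in the style of a Nagios plugin. It is not the code from my submission. The function name check_threshold is invented for illustration, and it assumes the caller has already pulled the value out of the metrics page. The exit codes follow the standard Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch of a Nagios-style threshold check (hypothetical helper name).
# Arguments: metric name, current value, limit.
# Exit codes follow the Nagios plugin convention:
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_threshold() {
    name=$1
    value=$2
    limit=$3
    if [ -z "$value" ]; then
        echo "UNKNOWN - $name: no value found"
        return 3
    fi
    # awk does the comparison so fractional limits such as 0.4 work.
    if awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v + 0 > l + 0) }'; then
        echo "CRITICAL - $name is $value (limit $limit)"
        return 2
    fi
    echo "OK - $name is $value (limit $limit)"
    return 0
}
```

Each of the three alerts in the challenge then becomes one call with a different name and limit, for example check_threshold "percent-4xx-15m" "$value" 0.4, with $value extracted however suits the page format.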

I opted for a simple chef-solo solution for general server configuration as I like to represent infrastructure as code. When scaling this infrastructure, a chef-client and a dedicated centralised chef-server setup would be preferred. Not all software was installed via Chef, as the ‘solo’ solution doesn’t support some key features and I felt the full server install was too bulky for this project.

To run Chef, after making changes to a cookbook:

cd ~/chef-repo
sudo chef-solo -j node.json -c solo.rb

Potential problems

The major potential problem with this system is a severe lack of redundancy due to the limitation of one server.

At a minimum, I’d implement an Amazon ELB and a second app server. EC2 Auto-scaling would be a quick win too.

As there are no sessions or databases, caching is very straightforward, so I’d quickly add a Varnish layer.

I’d ideally centralise logging to a syslog server to allow correlation of data across nodes and layers of the web stack. Splunk or Loggly would be great solutions for this which also minimise operational overhead.

Location of code and configuration

A copy of all relevant configuration files and code is available in /home/ubuntu/MarkBarger.tar

Video
Tags: linux ubuntu
Link

Hoping this saves us.

Tags: linux
Link
Photo
nicklovin703:

(via An Update is Available for Your Computer)
Tags: linux
Link

Registration is now open.

Tags: linux tech
Link

I need to end up there, for a while.

Tags: linux tech
Text

Ode to netcat

nc -z, you make me happy.
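For anyone who hasn’t met it: -z tells nc to check whether a port accepts connections without sending any data, and the exit status gives the answer. A small sketch of how that might be wrapped; port_open is a made-up name, the two-second -w timeout is an added assumption, and port 22 on localhost is only an example target.

```shell
#!/bin/sh
# Hypothetical wrapper around nc -z: succeeds if host:port accepts a
# TCP connection, fails otherwise. -w 2 caps the wait at two seconds.
port_open() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Example target only; swap in whatever host and port you care about.
if port_open 127.0.0.1 22; then
    echo "ssh is listening"
else
    echo "ssh is not reachable"
fi
```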

Tags: linux tech
Text

Persistent environment variables when using sudo

If you’re exporting HTTP_PROXY or other environment variables via /etc/profile.d/ or another login script and then execute a command such as ‘sudo curl’, your command won’t respect the environment variable unless you’ve added HTTP_PROXY to env_keep in /etc/sudoers.
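As a rough illustration, the sudoers entry could be generated like this. The helper name env_keep_line and the drop-in file name in the comments are assumptions, and the drop-in approach only works if your /etc/sudoers includes /etc/sudoers.d.

```shell
#!/bin/sh
# Print a sudoers Defaults line that keeps the named variables.
# env_keep_line is a hypothetical helper name.
env_keep_line() {
    printf 'Defaults env_keep += "%s"\n' "$*"
}

# Preview the line. Installing it needs root; one cautious way
# (file names here are assumptions) is to syntax-check a copy first:
#   env_keep_line HTTP_PROXY HTTPS_PROXY > /tmp/proxy
#   sudo visudo -cf /tmp/proxy && sudo install -m 0440 /tmp/proxy /etc/sudoers.d/proxy
env_keep_line HTTP_PROXY HTTPS_PROXY
```

Checking the file with visudo before it goes live matters, because a syntax error in sudoers can lock you out of sudo entirely.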

Tags: linux tech