OpenStack Lockout Recovery

Story time

The perfect storm culminated in a dire rescue operation.

Some Ansible-managed machines in a hyper-secure cloud environment based on OpenStack. No problem, I’m a partner at OVH and manage a lot of infrastructure there for several clients. OpenStack internals and APIs are things I’m intimately familiar with.

Fire off a playbook to update the machines and everything looks good. Expected updates marked as changes, no anomalies. Reboot time, as Ubuntu has once again updated base-files. Wait… the machines are up and the applications are running, however SSH access is nowhere to be found. No sweat yet, we’ll just console in and figure out what went awry.

VPN Woes

Load up the VPN client… credentials expired. Alright, we’ll open a ticket for that.

Too many hours later we get a tech on the line and he extends our VPN credentials. We’re back in action and ready to solve our SSH problem.

Console nightmare

Log in to OpenStack, launch the console, painstakingly type 64-character passwords in a race against the login timeout. Eventually, muscle memory kicks in and I can succeed in typing the passwords before the console resets to the username prompt… but no credentials are working.

At this point, I know what the root problem is and how to fix it. I just need access.

Single user attempt

Reboot the machines to init=/bin/bash them. Nope, disabled on cloud-based Ubuntu images.

Time to reset the password and make it something reasonable to type into a console without copy/paste ability.

OpenStack password

OpenStack has the ability to recover the password via the web interface. That’s been disabled. No worries, we can also use the API to reset it.

openstack server set --root-password <instance>

Disabled too.

Start sweating

Sweat starting to form on the brows.

Get creative

We can generate an image of the existing instance and launch a new instance from it with the --password option.

openstack server backup create instance1
openstack server create --image instance1 --password test \
--flavor 12C64R128G --key-name=vince newinstance

Error, --password disabled.

The sleepies

It’s now 5am. I’m tired, brain fogged, frustrated, and really… a liability. Fire off a mail to tech support asking them if they can enable password injection for us temporarily followed promptly by sleep.

Negative ghostrider

I awake to discover that, as expected, the provider is unable to enable password injection.

At this point we have exhausted all avenues of access.

Pooched… or maybe not

We have our instances that are completely inaccessible administratively, a backup image generated from them, the cloud provider that cannot assist us by enabling OpenStack features…

Wait. I noticed on the instance launch screen a Post-Creation tab that will execute cloud-init directives.

cloud-init injection

Let’s craft a cloud-init config that injects a user with sudo privileges and launch an instance with it. This was performed via the GUI, but could have easily been performed with a --user-data argument via the openstack CLI.

  1. New instance based off the backup we took earlier
  2. Post-creation direct input the following cloud-init config to create our tommy test user. The hash below is the password test. You can generate your own password with mkpasswd --method=SHA-512, but we’re on a console here!
#cloud-config
# Add groups to the system
# The following adds the tommy group and the empty group cloud-users.
groups:
  - tommy: [sudo]
  - cloud-users

# Add users to the system. Users are added after groups are added.
# Note: Most of these configuration options will not be honored if the user
#       already exists. Following options are the exceptions and they are
#       applicable on already-existing users:
#       - 'plain_text_passwd', 'hashed_passwd', 'lock_passwd', 'sudo',
#         'ssh_authorized_keys', 'ssh_redirect_user'.
users:
  - default
  - name: tommy
    gecos: tommy testing
    primary_group: tommy
    groups: users, sudo
    lock_passwd: false
    passwd: $6$KIsw4r/gdE$OpQgMUEPYkDhKBMUKsoPds9neFkHchA7xiGTUusuZgUWUcwZclNjY0pJVc2zA.C/lA1FM57VHA2K6eLEzVEyw/
  3. Launch instance!
  4. Load up console
  5. Tommy comes through for us, we’re in.
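
For the record, here is roughly what the same launch might look like from the command line with --user-data instead of the Post-Creation tab. Treat it as a sketch, not what was actually run: the file name user-data.yaml is made up for the example, the user entry is trimmed to the essentials, and the launch command is only printed so nothing is created by accident.

```shell
#!/bin/sh
# Sketch: write a trimmed cloud-init config and show the launch command.
# Assumption: user-data.yaml is a hypothetical file name; the image, flavor
# and server names are the ones used earlier in this post.

# The quoted 'EOF' stops the shell from expanding the $6$... password hash.
cat > user-data.yaml <<'EOF'
#cloud-config
users:
  - default
  - name: tommy
    gecos: tommy testing
    groups: users, sudo
    lock_passwd: false
    passwd: $6$KIsw4r/gdE$OpQgMUEPYkDhKBMUKsoPds9neFkHchA7xiGTUusuZgUWUcwZclNjY0pJVc2zA.C/lA1FM57VHA2K6eLEzVEyw/
EOF

# Dry run: print the command. Drop the leading "echo" to run it for real.
echo openstack server create --image instance1 --flavor 12C64R128G \
    --user-data user-data.yaml newinstance
```

The quoted heredoc is the one detail worth copying: with an unquoted EOF the shell would treat the $6 and friends in the hash as variables and silently mangle it.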

The reveal

One of the expected changes from the ansible play was the addition of a default deny firewall rule. There were exceptions for SSH, however, they specified the interface of eth0 explicitly. OpenStack deployments use the ensX interface convention. That’s right, 2 characters created this whole mess.
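
In hindsight, a tiny pre-flight check could have flagged this before the reboot. The sketch below is hypothetical and was not part of the playbook: the rules are illustrative stand-ins for the real ones, and the helper names iface_exists and rule_ifaces are invented for the example. It pulls every interface named by an -i match out of a rule listing and warns if the host has no such interface.

```shell
#!/bin/sh
# Sketch of a pre-flight check: warn about iptables rules that name an
# interface this host does not have. Illustrative rules, hypothetical helpers.

# Linux exposes one entry per network interface under this directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds if the named interface exists on this host.
iface_exists() {
    [ -e "$SYSFS_NET/$1" ]
}

# Reads rules on stdin, prints each interface named by an "-i <iface>" match.
rule_ifaces() {
    sed -n 's/.* -i \([^ ]*\) .*/\1/p' | sort -u
}

# Default deny with an SSH exception pinned to eth0, as in the story.
rules='-A INPUT -i lo -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
-A INPUT -j DROP'

for iface in $(printf '%s\n' "$rules" | rule_ifaces); do
    iface_exists "$iface" || echo "warning: rules reference missing interface $iface"
done
```

On a machine whose NIC is named ens3, the eth0 rule would trigger the warning. Dropping the -i match from the SSH exception avoids the dependency on interface naming altogether.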

Time to remove the superfluous interface from the iptables rules, and reset all the user passwords.

Image and reimage

Now that we have an instance deployed and accessible, we should be able to simply redirect traffic to this node instead. That would work on any other provider, however, it would require a ticket and several network changes here. We need to retain our original IP to avoid all of that.

openstack server backup create newinstance

Now that we have an image of the fixed instance, we can reimage the original instances with this correct image retaining the original IPs.
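
The reimage step maps onto OpenStack's rebuild operation, which swaps the disk image while the server keeps its identity and addresses. A minimal sketch, assuming the provider allows rebuilds and that the fixed image is named newinstance: rebuild_all is a hypothetical helper, and it only prints the commands until you remove the echo.

```shell
#!/bin/sh
# Sketch: rebuild each original server in place from the repaired image.
# Assumption: rebuild_all is a hypothetical helper; only instance1 is named
# in this post, so add the other server names as needed.

# Usage: rebuild_all <image> <server>...
rebuild_all() {
    image="$1"
    shift
    for server in "$@"; do
        # Dry run: remove "echo" to actually rebuild the server.
        echo openstack server rebuild --image "$image" "$server"
    done
}

rebuild_all newinstance instance1
```

Rebuilding replaces the server's disk contents, so anything written to the original instance after the backup was taken is gone once this runs.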

Delete that tommy user too.

Console passwords

Why didn’t the passwords work? I haven’t tracked this down yet, but I was passing crypted hashes via the user module. I’ll update this once I get to the root of that. For now, here is some Ansible 2.10.2 output, which differs from previous versions of Ansible: note that the password value is the repr of a generator object rather than a hash.

changed: [instance1] => (item={'key': 'vince', 'value': {'shell': '/bin/bash', 'create_home': True, 'groups': 'sudo', 'uid': 1031, 'state': 'present', 'update_password': 'always', 'password': '<generator object _wrap_sequence.<locals>.<genexpr> at 0x7781a4f30c78>'}})


Vince Hillier is the President and Founder of Revenni Inc. He is an open source advocate specializing in system engineering and infrastructure. Outside of building solid infrastructure that doesn't break the bank, he's interested in information security, privacy, and performance.