Story time
The perfect storm culminated in a dire rescue operation.
Some Ansible-managed machines in a hyper-secure cloud environment based on OpenStack. No problem, I’m a partner at OVH and manage a lot of infrastructure there for several clients. OpenStack internals and APIs are things I’m intimately familiar with.
Fire off a playbook to update the machines and everything looks good. Expected updates marked as changes, no anomalies. Reboot time, as Ubuntu has once again updated base-files. Wait… the machines are up and the applications are running, however SSH access is nowhere to be found. No sweat yet, we’ll just console in and figure out what went awry.
VPN Woes
Load up the VPN client… credentials expired. Alright, we’ll open a ticket for that.
Too many hours later we get a tech on the line and he extends our VPN credentials. We’re back in action and ready to solve our SSH problem.
Console nightmare
Log in to OpenStack, launch the console, painstakingly type 64-character passwords in a race against the login timeout. Eventually, muscle memory kicks in and I can succeed in typing the passwords before the console resets to the username prompt… but no credentials are working.
At this point, I know what the root problem is and how to fix it. I just need access.
Single user attempt
Reboot the machines to init=/bin/bash them. Nope, disabled on cloud-based Ubuntu images.
Time to reset the password and make it something reasonable to type into a console without copy/paste ability.
OpenStack password
OpenStack has the ability to recover the password via the web interface. That’s been disabled. No worries, we can also use the API to reset it.
openstack server set --root-password <instance>
Disabled too.
Start sweating
Sweat starting to form on the brows.
Get creative
We can generate an image of the existing instance and launch it with the --password option.
openstack server backup create instance1
openstack server create --image instance1 --password test \
  --flavor 12C64R128G --key-name=vince newinstance
Error, --password disabled.
The sleepies
It’s now 5am. I’m tired, brain fogged, frustrated, and really… a liability. Fire off a mail to tech support asking them if they can enable password injection for us temporarily followed promptly by sleep.
Negative ghostrider
I awake to discover that, as expected, the provider is unable to enable password injection.
At this point we have exhausted all avenues of access.
Pooched… or maybe not
We have our instances that are completely inaccessible administratively, a backup image generated from them, the cloud provider that cannot assist us by enabling OpenStack features…
Wait. I noticed on the instance launch screen a Post-Creation tab that will execute cloud-init directives.
cloud-init injection
Let’s craft a cloud-init config that injects a user with sudo privileges and launch an instance with it. This was performed via the GUI, but could just as easily have been done with a --user-data argument to the openstack CLI.
- New instance based off the backup we took earlier
- In Post-Creation, use direct input with the following cloud-init config to create our tommy test user. The hash below is for the password test. You can generate your own password hash with mkpasswd --method=SHA-512, but we’re on a console here!
#cloud-config
# Add groups to the system.
# This creates the tommy group and the empty group cloud-users.
groups:
  - tommy: [sudo]
  - cloud-users
# Add users to the system. Users are added after groups are added.
# Note: Most of these configuration options will not be honored if the user
#       already exists. Following options are the exceptions and they are
#       applicable on already-existing users:
#       - 'plain_text_passwd', 'hashed_passwd', 'lock_passwd', 'sudo',
#         'ssh_authorized_keys', 'ssh_redirect_user'.
users:
  - default
  - name: tommy
    gecos: tommy testing
    primary_group: tommy
    groups: users, sudo
    lock_passwd: false
    passwd: $6$KIsw4r/gdE$OpQgMUEPYkDhKBMUKsoPds9neFkHchA7xiGTUusuZgUWUcwZclNjY0pJVc2zA.C/lA1FM57VHA2K6eLEzVEyw/
- Launch instance!
- Load up console
- Tommy comes through for us, we’re in.
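For reference, the GUI steps above could be sketched from the command line roughly as follows. This is a minimal sketch, not what was actually run: the function names and the rescue.yaml file name are made up for illustration, the image, flavor, and instance names are the ones used earlier in this post, and it assumes --user-data has not been disabled on the deployment the way --password was. openssl passwd -6 is one way to produce a SHA-512 crypt hash where mkpasswd is not installed.

```shell
# Hypothetical helper: write a minimal cloud-config that adds a sudo-capable
# rescue user. $1 = output file, $2 = user name, $3 = SHA-512 crypt hash.
write_rescue_userdata() {
  local out="$1" user="$2" hash="$3"
  cat > "$out" <<EOF
#cloud-config
users:
  - default
  - name: ${user}
    groups: users, sudo
    lock_passwd: false
    passwd: ${hash}
EOF
}

# Hypothetical helper: boot a new server from the backup image with that
# user-data. $1 = image, $2 = flavor, $3 = user-data file, $4 = server name.
launch_rescue_instance() {
  openstack server create --image "$1" --flavor "$2" \
    --user-data "$3" "$4"
}

# Example (not executed here):
#   hash=$(openssl passwd -6 'test')
#   write_rescue_userdata rescue.yaml tommy "$hash"
#   launch_rescue_instance instance1 12C64R128G rescue.yaml newinstance
```

Keep the hash in single quotes if you paste it by hand, otherwise the shell will try to expand the $6$… parts.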
The reveal
One of the expected changes from the Ansible play was the addition of a default deny firewall rule. There were exceptions for SSH, however, they specified the interface of eth0 explicitly. These OpenStack instances use the ensX interface naming convention, so the SSH exception never matched and the default deny dropped everything. That’s right, 2 characters created this whole mess.
Time to remove the superfluous interface from the iptables rules, and reset all the user passwords.
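A check along these lines, run before a default deny goes live, might have caught the mismatch. It is a rough sketch only: the function name is hypothetical, it looks only at -i/-o matches in iptables-save style output, it takes the existing interface names as arguments instead of discovering them, and it will flag wildcard matches such as eth+ as missing.

```shell
# Hypothetical helper: read iptables-save style rules on stdin and print any
# interface named by -i/-o that is not among the names given as arguments.
missing_rule_interfaces() {
  local known=" $* "
  awk '{ for (i = 1; i < NF; i++) if ($i == "-i" || $i == "-o") print $(i + 1) }' |
    sort -u |
    while read -r iface; do
      case "$known" in
        *" $iface "*) ;;               # interface exists, nothing to report
        *) printf '%s\n' "$iface" ;;   # rule references a missing interface
      esac
    done
}

# Example (not executed here), using the names under /sys/class/net:
#   iptables-save | missing_rule_interfaces $(ls /sys/class/net)
```

Any output means a rule is pinned to an interface the machine does not have, which is exactly what happened here with eth0.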
Image and reimage.
Now that we have an instance deployed and accessible, we should be able to simply redirect traffic to this node instead. That would work on any other provider, however, it would require a ticket and several network changes here. We need to retain our original IP to avoid all of that.
openstack server backup create newinstance
Now that we have an image of the fixed instance, we can reimage the original instances with this corrected image, retaining the original IPs.
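One way this reimage step could be done from the CLI is openstack server rebuild, which replaces a server's disk with the given image while keeping its ID and IP addresses. Treat this as a sketch under assumptions: the function name is hypothetical, the image and server names are the ones from this post, and rebuild may itself be restricted on a deployment this locked down.

```shell
# Hypothetical helper: rebuild each original server from the fixed image.
# $1 = image name, remaining arguments = servers to rebuild.
reimage_servers() {
  local image="$1"
  shift
  for server in "$@"; do
    # Stop at the first failure so a bad image is not pushed to every server.
    openstack server rebuild --image "$image" "$server" || return 1
  done
}

# Example (not executed here):
#   reimage_servers newinstance instance1
```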
Delete that tommy user too.
Console passwords
Why didn’t the passwords work? I haven’t tracked this down yet, but I was passing crypted hashes via the user module. I’ll update this once I get to the root of that. For now, here is some Ansible 2.10.2 output, which differs from previous versions of Ansible. Note that the password field holds the string form of a Python generator object rather than a hash.
changed: [instance1] => (item={'key': 'vince', 'value': {'shell': '/bin/bash', 'create_home': True, 'groups': 'sudo', 'uid': 1031, 'state': 'present', 'update_password': 'always', 'password': '<generator object _wrap_sequence.<locals>.<genexpr> at 0x7781a4f30c78>'}})