EC2 – Bundling Images

The image bundling process (ec2-bundle-image from the Amazon EC2 AMI Tools) is the first step in creating an Amazon Machine Image (AMI). It creates an XML manifest and one or more image parts. The image parts are created by compressing, encrypting and then dividing the original image.

For the record, there is an upper limit on the size of an image that can be turned into an AMI (10GB, I believe?)

This is a description of the bundling process as I understand it. Be warned that my understanding may well be wrong. :( While writing this post I have discovered why my bundled manifests were not working. Sort of.

In the following discussion the file input to the bundling process is mymirage.img.

Input Parameters


  • the image! (mymirage.img)
  • user’s private key
  • user’s certificate
  • ec2 certificate (optional)

Encrypting the original image

ec2-bundle-image runs this command:

$ openssl sha1 < /tmp/ec2-bundle-image-digest-pipe-10174 & tar -c -h -S --owner 0 --group 0 -C /tmp mymirage.img | tee /tmp/ec2-bundle-image-digest-pipe-10174 | gzip -9 | openssl enc -e -aes-128-cbc -K aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -iv bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb > ec2_tmp/mymirage.img.tar.gz.enc

The image is tarred, gzipped and then encrypted (AES-CBC with a 128-bit key). The result is stored in mymirage.img.tar.gz.enc. A digest (sha1) is calculated simultaneously; note that tee sits directly after tar in the command, so the digest is taken over the tar stream, before compression and encryption. Amazon will attempt to recreate this digest when launching the VM, terminating the instance if it cannot.

A note on the key & iv

The key and iv are generated using gensymkey

# Load and generate necessary keys.
key = Format::bin2hex( Crypto::gensymkey )
iv = Format::bin2hex( Crypto::gensymkey )

despite the fact that Amazon implements a geniv function. But why?

# Generate an initialization vector suitable use with symmetric cipher.
def Crypto.geniv


# Generate a key suitable for use with a symmetric cipher.
def Crypto.gensymkey

The key and iv are both encrypted twice, once with the public key from the user’s X.509 certificate and once with the public key from Amazon’s (the optional ec2 certificate mentioned earlier). Both encryptions are included in the manifest.
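On the "But why?": in the command above both -K and -iv are 32 hex digits, i.e. 16 bytes. That is the AES-128 key size and also the AES block size, which is the length CBC needs for its iv, so a 16-byte key generator happens to produce a usable iv too. Whether that is Amazon's actual reason I can't say. A minimal OCaml sketch of generating both the same way, assuming /dev/urandom is available (random_hex is a made-up name, not part of the AMI tools):

```ocaml
(* Read [n] bytes from /dev/urandom and return them hex-encoded.
   Assumption: a Unix-like system where /dev/urandom exists. *)
let random_hex n =
  let ic = open_in_bin "/dev/urandom" in
  let raw =
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> really_input_string ic n)
  in
  let buf = Buffer.create (2 * n) in
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    raw;
  Buffer.contents buf

let () =
  (* 16 bytes each: AES-128 key size and AES block size *)
  let key = random_hex 16 in
  let iv = random_hex 16 in
  Printf.printf "-K %s -iv %s\n" key iv
```

The two strings can be dropped straight into the openssl enc command shown earlier.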

Dividing the image into parts

The encryption result mymirage.img.tar.gz.enc is divided into parts of 10MB, resulting in the files

$ ls
# etc

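I have yet to get this working properly. My test mirage images are about 5MB so I’ve just been copying the contents of mymirage.img.tar.gz.enc to a file called mymirage.img.part.0. Which works.

Actually I have tried to implement this, but my first attempt only wrote 65536 bytes per part instead of the desired 10MB. The most likely culprit is that OCaml’s Unix.read returns at most 64KB per call however large the buffer is, so the version below keeps reading until the chunk is full. I haven’t tried it on a large image yet.

(* Split a file into parts of 10MB or less; return the names of the
   files created, in order. [tmp] and [name_only] are helpers that are
   not shown here. *)
let split file =
  let open Unix in
  let chunk_size = 1024 * 1024 * 10 in
  let buffer = Bytes.create chunk_size in
  let fd_in = openfile file [O_RDONLY] 0 in
  let name = Filename.(chop_extension @@ chop_extension @@ name_only file) in
  (* a single read may return fewer bytes than asked for, so loop
     until the chunk is full or we reach end of file *)
  let rec fill off =
    if off = chunk_size then off
    else match read fd_in buffer off (chunk_size - off) with
      | 0 -> off
      | r -> fill (off + r) in
  let rec copy_loop n ps =
    match fill 0 with
    | 0 -> close fd_in; List.rev ps
    | len ->
      (* only create the part file once there is data to put in it *)
      let part = tmp @@ Printf.sprintf "%s.img.part.%i" name n in
      let fd_out = openfile part [O_WRONLY; O_CREAT; O_TRUNC] 0o666 in
      Printf.printf "writing %i bytes\n" len;
      ignore (write fd_out buffer 0 len);
      close fd_out;
      copy_loop (succ n) (part :: ps) in
  copy_loop 0 []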
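For filling in the manifest it helps to know in advance how many parts there should be and how big each one is. A small sketch of that arithmetic, assuming 10MB means 10 * 1024 * 1024 bytes as in the code above and the mymirage.img.part.N naming (part_layout is a hypothetical helper):

```ocaml
let chunk_size = 10 * 1024 * 1024

(* Names and byte lengths of the parts for a bundle of [size] bytes.
   Every part is full except possibly the last one. *)
let part_layout ~name ~size =
  let count = (size + chunk_size - 1) / chunk_size in
  List.init count (fun i ->
      let len = min chunk_size (size - (i * chunk_size)) in
      (Printf.sprintf "%s.part.%d" name i, len))

let () =
  part_layout ~name:"mymirage.img" ~size:(25 * 1024 * 1024)
  |> List.iter (fun (file, len) -> Printf.printf "%s %d bytes\n" file len)
```

A 25MB bundle gives two full parts and a 5MB remainder. A 5MB bundle gives a single part, which is why copying the whole file to mymirage.img.part.0 works for my test images.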

Components of the XML manifest

The XML manifest has this structure:

  • `manifest`
    • `version` a string that identifies the type of manifest
    • `bundler` (optional?) information about who created the manifest
      • `name`
      • `version`
      • `release`
    • `machine_configuration`
      • `architecture` `x86_64` for our purposes
      • `kernel` (optional) a string eg `aki-fc8f11cc`
    • `image`
      • `name` string eg `mymirage.img`
      • `user` 12-digit AWS user id
      • `type` the string “machine” for our purposes
      • `digest` SHA1 digest of the tarred image, taken before compression and encryption (created during the compression & encryption process mentioned above)
      • `size` size of the original image (eg size of `mymirage.img`)
      • `bundled_size` size of the compressed, encrypted image (eg size of `mymirage.img.tar.gz.enc`)
      • `ec2_encrypted_key` RSA public encryption of the key used to encrypt the compressed image. The public key is from the ec2 certificate
      • `user_encrypted_key` RSA public encryption of the key used to encrypt the compressed image. The public key is from the user’s certificate
      • `ec2_encrypted_iv` RSA public encryption of the iv used to encrypt the compressed image. The public key is from the ec2 certificate
      • `user_encrypted_iv` RSA public encryption of the iv used to encrypt the compressed image. The public key is from the user’s certificate
      • `parts` info on the 10MB parts into which the encrypted image was split
        • `part` one or more of these
          • `filename` string eg `mymirage.img.part.0`
          • `digest` SHA1 digest from the contents of the part
    • `signature` SHA1 digest of the XML “ and “ info signed with the user’s private key
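Written down as OCaml types, the structure above looks roughly like this. It is only a sketch of the fields listed here: the type and field names are my own, `kind` stands in for `type` (a reserved word in OCaml), and I am assuming hex strings are an adequate representation for the digests and encrypted values.

```ocaml
type bundler = {
  bundler_name : string;
  bundler_version : string;
  release : string;
}

type machine_configuration = {
  architecture : string;   (* "x86_64" for our purposes *)
  kernel : string option;  (* optional, e.g. "aki-fc8f11cc" *)
}

type part = { filename : string; part_digest : string }

type image = {
  name : string;           (* e.g. "mymirage.img" *)
  user : string;           (* 12-digit AWS user id *)
  kind : string;           (* "machine" for our purposes *)
  digest : string;
  size : int;              (* size of the original image *)
  bundled_size : int;      (* size of the compressed, encrypted image *)
  ec2_encrypted_key : string;
  user_encrypted_key : string;
  ec2_encrypted_iv : string;
  user_encrypted_iv : string;
  parts : part list;
}

type manifest = {
  version : string;
  bundler : bundler option;  (* optional? *)
  machine_configuration : machine_configuration;
  image : image;
  signature : string;
}

(* Placeholder values only, to show the shape. *)
let example_config = { architecture = "x86_64"; kernel = Some "aki-fc8f11cc" }
let example_part = { filename = "mymirage.img.part.0"; part_digest = "<sha1 hex>" }
```

Having the optional fields as option types at least makes it explicit which elements can be left out when writing the XML.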

A number of these fields are easy to fill (eg size or version). Unfortunately a number of them (eg ec2_encrypted_key) are not easily replicated, making it difficult to pinpoint exactly what is wrong with the manifests I’ve created.

Verifying manifest correctness

ec2-unbundle promises to extract an image given a manifest and private key. Sadly….

$ ec2-unbundle -m mymirage.img.manifest.xml -k onekeytorulethemall.pem --debug
ERROR: padding check failed
/usr/local/ec2/ec2-ami-tools-1.5.3/lib/ec2/amitools/unbundle.rb:49:in `private_decrypt'
/usr/local/ec2/ec2-ami-tools-1.5.3/lib/ec2/amitools/unbundle.rb:49:in `unbundle'
/usr/local/ec2/ec2-ami-tools-1.5.3/lib/ec2/amitools/unbundle.rb:100:in `main'
/usr/local/ec2/ec2-ami-tools-1.5.3/lib/ec2/amitools/tool_base.rb:201:in `run'
/usr/local/ec2/ec2-ami-tools-1.5.3/lib/ec2/amitools/unbundle.rb:109:in `'

…so I hardcoded my keys in…

# Extract key and IV from xml manifest
# key = pk.private_decrypt(Format::hex2bin( manifest.ec2_encrypted_key))
# iv = pk.private_decrypt(Format::hex2bin( manifest.user_encrypted_iv))
key = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
iv = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

… and discovered that my digest is incorrect!

$ ec2-unbundle -m mymirage.img.manifest.xml -k onekeytorulethemall.pem --debug
Pipeline.execute: command = [/bin/bash -c 'openssl sha1 /tmp/image-unbundle-pipeline-pipestatus-020140731-20445-zhcuih & echo ${PIPESTATUS[1]} > /tmp/image-unbundle-pipeline-pipestatus-120140731-20445-y3ajkk & echo ${PIPESTATUS[2]} > /tmp/image-unbundle-pipeline-pipestatus-220140731-20445-13o3moo & echo ${PIPESTATUS[3]} > /tmp/image-unbundle-pipeline-pipestatus-320140731-20445-5l5cs2 & echo ${PIPESTATUS[4]} > /tmp/image-unbundle-pipeline-pipestatus-420140731-20445-vxbp20']
Pipeline.execute: output = [(stdin)= e65de62e203671c803b532f046aa5479277790d6]
ERROR: invalid digest, expected da39a3ee5e6b4b0d3255bfef95601890afd80709 received e65de62e203671c803b532f046aa5479277790d6

Oddly enough re-creating the manifest /can/ produce the digest amazon calculates, but most times will not. I have no idea why this is, but at least I can identify when it is happening. I suspect the problem is more with the cmd being run than with the OCaml code.

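(* calculate digest; compress & encrypt image *)
let pipeline ~digest_pipe ~tar ~key ~iv ~encrypted_destination =
  let open Unix in
  (* Same shape as the ec2-bundle-image command shown earlier; [tar] is
     assumed to hold the whole tar invocation. *)
  let cmd = Printf.sprintf
      "openssl sha1 < %s & %s | tee %s | gzip -9 | openssl enc -e -aes-128-cbc -K %s -iv %s > %s"
      digest_pipe tar digest_pipe key iv encrypted_destination in
  let ic = open_process_in cmd in
  (* the only output on stdout is the digest line from openssl sha1 *)
  let digest = input_line ic in
  ignore (close_process_in ic);
  digest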
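One clue in the error above: the "expected" value, da39a3ee5e6b4b0d3255bfef95601890afd80709, is what SHA-1 gives for empty input, which suggests that in the bad runs openssl sha1 was hashing nothing at all. To at least catch that case before uploading anything, the digest line can be checked against that constant. The sketch below assumes openssl prints a line of the form `(stdin)= <hex>` as in the debug output; parse_digest and is_empty_digest are hypothetical helpers, not part of the AMI tools.

```ocaml
(* SHA-1 of zero bytes of input. *)
let empty_sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

(* Extract the hex digest from a line such as "(stdin)= e65d...".
   Falls back to the whole trimmed line if there is no '='. *)
let parse_digest line =
  match String.rindex_opt line '=' with
  | Some i ->
    String.trim (String.sub line (i + 1) (String.length line - i - 1))
  | None -> String.trim line

(* True when the digest can only have come from hashing empty input. *)
let is_empty_digest line = parse_digest line = empty_sha1

let () =
  let line = "(stdin)= e65de62e203671c803b532f046aa5479277790d6" in
  if is_empty_digest line then
    prerr_endline "digest was computed over empty input"
  else print_endline (parse_digest line)
```

This does not explain why the input is sometimes empty, but it turns a failed launch into an immediate error at bundling time.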

What remains to be done

Unfortunately I still haven't launched a proper, working instance:

Warning: unable to open an initial console.
Kernel panic - not syncing: No init found. Try passing init= option to kernel.

A diff (thank you internet) on the original image and ec2-unbundled image showed that I hadn’t, say, inadvertently corrupted something during the bundling process. So I’m about 99.997% certain that (digest issue aside) the problem here is related to how I turned the Xen kernel into an image.

EC2 Documentation (warning: pdf link!) says an initrd needs to be generated. I don’t see my script doing that. I have a feeling I’ve accidentally deleted that part (I am prone to these kinds of accidents…). Alas the original script on the Mirage wiki is either broken or has mysteriously disappeared. (I’m actually not sure which it is. The first time I checked it out, the page was empty. The second time, literally nothing happened when I clicked the link.)

But anyway, what I have left to do

  • fix (or find) the image creating script
  • consistently calculate the correct digest

(and less pressingly)

  • split compressed/encrypted large images into 10MB pieces