Recovering VMware snapshot after parent changed
UPDATE (21/05/2010): I've been alerted to the ridiculous amount of comment spam this page has gotten; apologies to those who were further spammed by the email notifications. I have therefore disabled the email and commenting features, and all future comments will be moderated. Damn spammers have to ruin everything, grrrr.
Scroll down to the problem or solution section below if you want to cut to the chase.
I upgraded my Kubuntu installation to Gutsy today - of course, it wasn't as smooth as it should've been. First I had to work out how to do it - the instructions were brief, screenshots confusing, and the process just didn't feel natural. The 'version upgrade' button only appears after you have satisfied certain conditions, conditions that you don't know. It just magically appears when it wants to, after pressing a special sequence of buttons.
Then the 'distribution upgrade' process crashed and packages wouldn't install. It ended up working after a few tries.
For some stupid reason, they still haven't fixed the 'failed to set xfermode' bug that heaps of people have encountered and that really cripples the system, because it doesn't boot at all. In fact, the upgrade removes the fix for it too - adding irqpoll to the end of the kernel line for the appropriate entry in /boot/grub/menu.lst.
Plus they introduced a new bug by adding tablet settings into /etc/X11/xorg.conf by default, even if no tablet exists, tripping up the system. And did I mention that the network connection is flaky and standby/hibernate still doesn't work? Linux is still Linux it seems.
Anyway, it all worked out in the end after some googling, so I went to install VMware Server on it so I could run my virtual machines there as well as in Windows. There is no package install available for it, so follow the instructions here, but use this patch instead.
Once all that was working, I ran the VMware Console, about to run my Windows Server 2003 Standard Edition virtual machine, when I thought, hmm..., I don't want this VMware instance fudging with the Windows VMware instance, so I'll create a new virtual machine, and link it to the existing virtual hard disk.
Problem
All sounded cool, until I accidentally linked to the base parent hard disk, and not the latest snapshot. So once I booted it, not only did I not have the latest changes, but when I re-linked it to the latest snapshot, it wouldn't boot anymore. Instead I got the error message, "Cannot open the disk ... Reason: The parent virtual disk has been modified since the child was created."
Did I mention that the virtual machine housed the test instance for this website, including the changes I had been working on all weekend, and I had no other backup?
After a few minutes of cursing and swearing, banging on tables, wondering wtf I had done, and pondering redoing all those changes again, I did what every self-respecting nerd does when they're stuck - turn to google.
Solution
I found these links:
Here is my solution, which is basically a rewrite of the process in the last link above, with a few more details. I used Linux to do the recovery, mainly because it had commands that I needed. I assume you have some Linux command line knowledge, as all this will be performed in the terminal.
- Make a copy of the virtual machine folder in case you screw up.
- Look at the size of the snapshot virtual hard disk. If it is more than 2GB and you're running a 32-bit OS, or it is more than the amount of memory that you have available, the following method will probably not work. You're welcome to try though.
  The virtual hard disk files all end in .vmdk. The snapshot one has -xxxxxx on the end of the file name, indicating the snapshot number. For example, if my virtual machine was called Windows Server 2003 Standard Edition, my base parent virtual disk will be named Windows Server 2003 Standard Edition.vmdk, and my snapshot may be named Windows Server 2003 Standard Edition-000002.vmdk.
- Find out the CID of the base parent virtual hard disk. Because this virtual hard disk will most likely be larger than 2GB, you won't be able to open it in nano, vi etc. As we only need to read from it, we can use a Linux command to print out only the first 20 or so lines.
  head --lines=20 {base parent vmdk path}
  Replace {base parent vmdk path} with the path to the base parent virtual hard disk file, e.g.
  head --lines=20 /media/sda1/"Virtual Machines"/"Windows Server 2003 Standard Edition"/"Windows Server 2003 Standard Edition.vmdk"
  The CID is the 8-character hexadecimal string on the line starting with CID=. Write this down somewhere.
- Now open up the snapshot virtual hard disk in a text editor, and change the parentCID (not CID) to the CID you recorded in the previous step. Then save. You can use nano, vi or some other Linux editor, e.g.
  sudo nano {snapshot vmdk path}
  Make sure to sudo the command, and also be patient - it could take a few minutes, during which the console may remain black; it is loading.
  I chose to do this in Windows instead, using EditPad Lite, which is amazingly fast.
- That's it, your virtual machine should now start up again.
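If the snapshot is too big to open in an editor, one alternative is to overwrite just the eight parentCID characters in place. The following is only a sketch and has not been run against real VMware disks: patch_parent_cid is a made-up helper name, it assumes GNU grep and dd, and it assumes the descriptor text sits within the first 64 KiB of each file. Try it on a copy first.

```shell
#!/bin/sh
# Sketch: overwrite the snapshot's parentCID in place, without an editor.
# patch_parent_cid is a hypothetical helper, not a VMware tool.
# Assumption: the text descriptor lies within the first 64 KiB of each file.

patch_parent_cid() {  # usage: patch_parent_cid BASE_VMDK SNAPSHOT_VMDK
    base=$1
    snap=$2
    # CID of the base disk: first line that starts with "CID=".
    new_cid=$(head -c 65536 "$base" | grep -a -m1 '^CID=' | cut -d= -f2 | cut -c1-8)
    # Byte offset of the text "parentCID=" inside the snapshot file.
    offset=$(head -c 65536 "$snap" | grep -a -b -o -m1 '^parentCID=' | cut -d: -f1)
    if [ "${#new_cid}" -ne 8 ] || [ -z "$offset" ]; then
        echo "could not find CID or parentCID" >&2
        return 1
    fi
    # "parentCID=" is 10 bytes long, so the value starts at offset + 10.
    # conv=notrunc leaves every other byte of the file untouched.
    printf '%s' "$new_cid" |
        dd of="$snap" bs=1 seek=$((offset + 10)) conv=notrunc 2>/dev/null
}

# Demo on two tiny stand-in files (not real disks).
demo=$(mktemp -d)
printf 'version=1\nCID=d68511e8\nparentCID=ffffffff\n' > "$demo/base.vmdk"
printf 'version=1\nCID=0d55cd6c\nparentCID=b1ce363c\nDATA' > "$demo/snap.vmdk"
patch_parent_cid "$demo/base.vmdk" "$demo/snap.vmdk"
grep -a 'parentCID=' "$demo/snap.vmdk"   # prints parentCID=d68511e8
rm -r "$demo"
```

Because only eight bytes are written, the file size and everything after the descriptor stay exactly as they were, which is the point of using dd here rather than an editor.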
Further explanation
If you're interested, here's a deeper look into what you just did. At the beginning of each vmdk file is a disk descriptor section, which contains the properties of that virtual hard disk in text. The CID is a random unique identifier that identifies a particular state of the virtual disk - each time a change is made to the virtual hard disk, the CID changes.
In normal operation, the CID property of the base parent virtual hard disk is synced with the parentCID property of the snapshot virtual hard disk to show that the two files work together. The snapshot has to work with the base parent to be useful, as it only contains the differences from the base parent virtual hard disk. It is important to note that it is the snapshot's parentCID property that is synced with the base parent's CID property, not just the two CID properties in the virtual hard disks - the two virtual hard disks are in a parent-child relationship.
When you start up the base parent virtual hard disk on its own however, changes are made to that virtual hard disk without being in sync with the snapshot, so the CIDs no longer match.
And when the CIDs no longer match, VMware complains because the snapshot is out of sync and the changes in the snapshot may not apply properly to the base parent anymore, possibly resulting in data corruption.
By forcing the CIDs to match again, you effectively trick VMware into thinking it was never out of sync.
Depending on how complex your virtual machine is though, it may be worth recreating your virtual machine after recovering your data because it won't be known where the corruption is, if any. If you did anything to the base parent virtual hard disk before realising and shutting down, e.g. copied files around, the risk of corruption is higher.
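To make the parent-child relationship concrete, here is a rough sketch that walks from a snapshot down to the base disk and compares each parentCID with the parent's CID. The helper names descriptor_field and check_chain are my own inventions, and the script assumes that the descriptor text sits in the first 64 KiB of each file and that parentFileNameHint is a file name in the same folder as the snapshot.

```shell
#!/bin/sh
# Sketch: verify that each disk's parentCID matches its parent's CID.
# descriptor_field and check_chain are hypothetical helper names.
# Assumptions: descriptor text is in the first 64 KiB of each file, and
# parentFileNameHint names a file in the snapshot's own directory.

descriptor_field() {  # usage: descriptor_field FILE KEY
    head -c 65536 "$1" 2>/dev/null | grep -a -m1 "^$2=" | cut -d= -f2- | tr -d '"\r'
}

check_chain() {  # usage: check_chain SNAPSHOT_VMDK
    child=$1
    dir=$(dirname "$child")
    while :; do
        want=$(descriptor_field "$child" parentCID)
        if [ -z "$want" ]; then
            echo "no parentCID found in $child"
            return 2
        fi
        # ffffffff is the parentCID carried by a base disk: no parent.
        if [ "$want" = "ffffffff" ]; then
            echo "chain OK"
            return 0
        fi
        parent=$dir/$(descriptor_field "$child" parentFileNameHint)
        have=$(descriptor_field "$parent" CID)
        if [ "$have" != "$want" ]; then
            echo "MISMATCH: $child expects $want but $parent has CID ${have:-none}"
            return 1
        fi
        # This link is fine; move one step closer to the base disk.
        child=$parent
    done
}
```

Calling check_chain with the path of the latest snapshot prints either "chain OK" or the first link where the IDs disagree, which is the link VMware is complaining about.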
Thank you!
That certainly saved me from my own stupidity. Even before I had a chance to lose any sleep.
From now on my snapshots are going to experience very short lives.
Test and commit shall be the new motto.
fantastic advice
You friggen rock! You saved my 6 hours of a night shift and 2 secs of stupidity!
Thank you! Great manual!
I would like to say thank you very much! This manual was very helpful. Now i will live longer.
if you have windows 32bit system you can open and save big files with the program "winhex". It is very fast - i tried it out because i had not linux on my notebook.
What a day.. This really really saved me. Now I'll have to re-do our backup policy, keep everybody out of our vmware, but most of all CONGRATULATE you for your skills and knowledge. This saved me and now I have a much better understanding of those freaking snapshots. You are the MEN!
The outline of the fix is this:
1) BACK EVERYTHING UP
2) lookup the CID of the parent disk image
3) lookup the (incorrect) parentCID of the curdled snapshot
(you'll need both to make the sed command as restrictive as possible)
4) KEEPING THE BACKUP, remove the original of the curdled snapshot file
5) pipe just the beginning of the curdled snapshot through sed to change the parentCID
and save that as the beginning of the reconstructed snapshot
6) append the rest of the curdled snapshot to the end of the reconstructed snapshot
dd is the tool for snipping pieces of a HUGE file
And here's how it looks in practice:
[root@build12 virtual_machines]# cp -R sea-cm-winvm01 /backup
[root@build12 virtual_machines]# cd sea-cm-winvm01
[root@build12 sea-cm-winvm01]# head -10 /backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk
KDMV
# Disk DescriptorFile
version=1
CID=0d55cd6c
parentCID=b1ce363c <-- INCORRECT PARENT CID
createType="monolithicSparse"
parentFileNameHint="sea-cm-winvm01.vmdk"
# Extent description
RW 83886080 SPARSE "sea-cm-winvm01-000001.vmdk"
[root@build12 sea-cm-winvm01]# head -10 /backup/sea-cm-winvm01/sea-cm-winvm01.vmdk
KDMV
# Disk DescriptorFile
version=1
CID=d68511e8 <-- CORRECT PARENT CID
parentCID=ffffffff
createType="monolithicSparse"
# Extent description
RW 83886080 SPARSE "sea-cm-winvm01.vmdk"
[root@build12 sea-cm-winvm01]# rm sea-cm-winvm01-000001.vmdk
[root@build12 sea-cm-winvm01]# dd if=/backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk count=10 | sed 's/parentCID=b1ce363c/parentCID=d68511e8/' >sea-cm-winvm01-000001.vmdk
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00722415 seconds, 709 kB/s
[root@build12 sea-cm-winvm01]# dd if=/backup/sea-cm-winvm01/sea-cm-winvm01-000001.vmdk skip=10 seek=10 of=sea-cm-winvm01-000001.vmdk oflag=append
75301238+0 records in
75301238+0 records out
38554233856 bytes (39 GB) copied, 716.488 seconds, 53.8 MB/s
THANK YOU .... YOU JUST SAVED ME WITH YOUR BLOG...
I used "010 Editor" to edit the 30G file, which was very fast.. no loading time even.
Thank you so much, you saved my life!
A MILLION THANKS!
I messed up the VMDKs of our main production server after attaching the main VMDK to another virtual machine to add some Windows files. When I attached the HD to the original virtual machine, it didn't boot any more, and came up with the dreaded "parent modified..." message.
Fixed it on ESX server 3.5 from the console, with "head --lines=20" and "nano", following your instructions. Worked perfectly! The main file was 137GB and there were 3 snapshot files, about 10GB each. The snapshots were linked from last to first and then to the main file (3->2->1->original)
After fixing the CIDs, the machine worked fine, even after writing and then deleting some files inside the VMDK.
You are a Star!
Angel, Santiago de Compostela, Spain.
I solved the issue by following your steps.
But I didn't work in Linux.
I made a simple tool for Windows.
Main code:
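try
{
    txtResult.Clear();
    // Read only the first N lines, so a huge .vmdk is never loaded in full.
    using (StreamReader sr = new StreamReader(txtPath.Text))
    {
        int maxLines = (int)nudLines.Value;
        for (int i = 0; i < maxLines; i++)
        {
            string line = sr.ReadLine();
            if (line == null) break; // reached the end of the file early
            txtResult.AppendText(line + "\r\n");
        }
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.ToString());
}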
Great solution!!
It saved me a lot of time, because I don't have to reinstall the whole system.
Thank you very much.
I googled, found you, and you just saved my day. Quick, comprehensive, and easy.
Thanks a lot!
Thank you. You saved my life. I moved our primary domain controller only to find it wouldn't start up. AHH.
Your fix did the trick. In ESX 3.5 the files you mention are much smaller now and the main disk is called ***flat.vmdk
Guys... to sum it up : THANK YOU!!!
I too had the bad luck of a non-booting VM.
This page contains more relevant info than the rest of the web...
Again... THANKS, you guys saved me weeks of work!!
Bert
Perfect, this saved my bacon. We had the issue described but the problem occurred during a VCB backup.
You saved my day. 2 weeks of work were in that snapshot, then I just clicked an old "Copy of ".vmx file.
I had more adrenaline than blood in me. If you are ever looking for someone to marry you... ;-)
Thanx Ruediger
Thanks for your post. It got us through quite a pickle last night when ESXi blew up a VM during a snapshot deletion. Great stuff!
Thanks for many hours saved
For me this was the most useful blog entry since the beginning of blogs!
Somehow the CID's got messed up with the vmware-mount.pl command, so be careful with this and make a backup before using this command!
Thanks a lot!
Just a quick note. You can also get the Parent CID from the vmware.log It will say something like "Content ID mismatch (f6c96825 != f6c96826)."
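For anyone who wants to pull those two IDs out of the log without scrolling through it, here is a small sketch. mismatch_cids is a made-up helper name, and the pattern is based only on the line quoted above, so the real log may need a different pattern. The quote does not say which of the two values belongs to the parent, so compare both against the descriptors before editing anything.

```shell
#!/bin/sh
# Sketch: extract both IDs from "Content ID mismatch (xxxxxxxx != yyyyyyyy)" lines.
# mismatch_cids is a hypothetical helper; the line format is assumed from the
# quote in the comment, not from VMware documentation.

mismatch_cids() {  # usage: mismatch_cids LOGFILE
    sed -n 's/.*Content ID mismatch (\([0-9a-f]\{8\}\) != \([0-9a-f]\{8\}\)).*/\1 \2/p' "$1"
}

# Demo with a stand-in log holding only the quoted line.
log=$(mktemp)
printf '%s\n' 'Content ID mismatch (f6c96825 != f6c96826).' > "$log"
mismatch_cids "$log"   # prints f6c96825 f6c96826
rm "$log"
```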
After reading all of the Techno Babble, I finally came to an article that I can understand!!! Thank you many times!!!
I moved my VM (including snapshot) to a different blade, got this error message and had no idea what to do. VMware forums not really that helpful or clear!
Thanks to your great article I'm up and running again.
Thanks very much, you saved my bacon >8)
And here's one more sucker you saved with this article! Thank you very much!! Yesterday i noticed my vm-harddisk (60Gb) had grown to use 160 Gb (no typo..) of diskspace. Of course i didn't backup at that time because of lack of backup-facilities/diskspace at that moment.. So went further and further from home... :0)
Anyway, thnx once more!!
regards,
Peter
guys, i am new to vmware and have an esx 3i server running with 3 vm's. one of my colleagues has tried to clean up some snapshots and is now getting this error. how do i edit/access the vmdk files? is there a way to do this from the Infrastructure client or can i run the linux commands on the actual VMware server itself? pretty desperate - this is (or was) a live server. i have data backups but don't want to rebuild if the fix here is valid for my situation.
You are a genius! Thank you
I use the following to fix this problem:
1. putty into the host
2. run vmware-cmd -l to find the path of the bad VM
3. CD /path/to/vm/
4. cat NAMEOFTHEDISK-xxxxxx.vmdk (for hard disk 1)
5. (A) cat NAMEOFPARENTDISK.vmdk (shown in the previous command's output for parentFileNameHint)
5. (B) keep running cat parent.vmdk until you have displayed each snapshot and its parent --> down to the base.vmdk disk
example...
[root@VMHost01 Server1]# cat SERVER1-000001.vmdk
# Disk DescriptorFile
version=1
CID=fe498eca
parentCID=332a8cca
createType="vmfsSparse"
parentFileNameHint="SERVER1.vmdk"
# Extent description
RW 35358082 VMFSSPARSE "SERVER1-000001-delta.vmdk"
# The Disk Data Base
#DDB
[root@VMHost01 Server1]# cat SERVER1.vmdk
# Disk DescriptorFile
version=1
CID=66ed665b
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 35358082 VMFS "SERVER1-flat.vmdk"
# The Disk Data Base
#DDB
ddb.adapterType = "buslogic"
ddb.geometry.sectors = "63"
ddb.geometry.heads = "255"
ddb.geometry.cylinders = "2200"
ddb.uuid = "60 00 C2 9e 7c 4c 5e c4-ea f5 d8 1e 6c 36 06 40"
ddb.geometry.biosSectors = "63"
ddb.geometry.biosHeads = "255"
ddb.geometry.biosCylinders = "2200"
ddb.toolsVersion = "7299"
ddb.virtualHWVersion = "4"
6. Notice the CID and ParentCID entries of the output:
server1-000001.vmdk
CID=fe498eca
parentCID=332a8cca <---- THIS ONE IS NOT POINTING TO...
server1.vmdk
CID=66ed665b <---- THIS ONE
parentCID=ffffffff
7. run the following:
nano server1-000001.vmdk
edit the parentCID by overwriting 332a8cca with 66ed665b
Do CTRL+X and then answer 'Y' to save the changes
8. now go back and show the output again (use the cat commands like before) each parentCID should be pointing to the parent file that VMWare expects as listed in the parentFileNameHint.
9. once this is completed if you do not need the snapshot you should also go to the VMClient and go to snapshot manager and delete all snapshots. If there are no snapshots to delete, create one, then immediately delete it. This should remove all snapshots.
*** NOTE *** if you have to create a snapshot, you may want to check its CID/ParentCID for all disks to make sure VMware didn't do something stupid like create a snapshot file with a CID and ParentCID pointing to itself. If that occurs, just fix the pointers like before and then delete all snapshots.
This works for me 100% of the time when I have any of the corruptRedo log errors, Parent had been modified errors, bad CID/ParentCID issues, or VM in stuck state due to failure to consolidate snapshots after VCB backups
Thanks for your help guys !
You saved me :)
Particularly regarding the tool to edit a 60GB vmdk file very quickly with no delay !! (010 Editor... great tool !)
You are a true Saint. After accidentally clicking on the vmdk file while backing up my Mac, I got the dreaded error. 5 hours into the ordeal, I finally got things back up and running. It took hours to back everything up first. Then, I couldn't find a good editor that could handle a 14GB snapshot on the Mac. I finally found 0xED which Rocked!!!
www.suavetech.com/.../0xed.html
Great text editor loads the file instantly -a quick hex edit of the Parent CID on the snapshot file and I was back up and running. Keep up the good work and thanks for posting such a concise solution. Going to bed now...
I'm having a crisis with the same issue! I need to get the Outlook data off of this silly VMWare Fusion windows XP partition.
Thanks for this article!! You saved my ass, thank you!
Thanks a lot for googling around and cutting the information down to the essential point - you saved me really much time rescuing my VM :o)
And as it seems, I'm not the only stupid one who screws up links to parent VM-Disks etc. :o)
Hi all,
just a maybe stupid question:
can I recover a VM with snapshots from the main *.vmdk, *flat.vmdk and only *delta.vmdk (without the descriptor file *.vmdk)?
more precise, I have:
server.vmdk
server-flat.vmdk
server-000002-delta.vmdk
server-000003-delta.vmdk
I successfully recovered the VM from only *flat.vmdk, however without snapshots - so is it possible to recover all the remaining files from *delta.vmdk, like *.vmsn and the *.vmdk descriptor (for *delta.vmdk)?
Hi,
i follow your steps but when i try to start vm i get "Failed to retrieve disk information for: xxx.vmdk" Success
and i can't startup :(
can you help me?
In case anyone needs more background to solve similar problems see sanbarrow.com/sickbay.html
Thank You very very much.
I was trying to be slick by using a common VMDK on a RAM disk and run multiple concurrent copies of VMs each with their own vmx and snapshots. That part had been working for months.
Then I set one of my vmx's to use the base vmdk as independent-nonpersistent. That broke the entire chain and nothing would boot! I got this sick sick feeling in my stomach. Then I read your blog and I was hopeful. I updated the CID in my snapshots and it all worked.
Thank You, Thank You, Thank You.
If you have a large number of snapshots, finding the broken CID can take a while.
Here is a script that will do that check and many others for you.
And here is a video to help you avoid surprises with the snapshots.
Works like a charm saved my tons of work
I am about to try this now, wish me luck!
Unfortunately this didn't work for me, I had to use this method rdowell.blogspot.com/.../parent-virtual-disk-has-been-modified.html
Thanks, this saved me. BTW, another JGsoft product, PowerGREP (from the makers of EditPad), lets you find and edit the CID very easily.
Dude! You saved my life with this article. Thank you so much!
Hi Samuel --
One suggestion: instead of opening the snapshot file to replace the parentCID number (which, as you point out, doesn't work if the snapshot is >2GB), use command line utilities to make the change.
I found my parent CID from the base vmdk with:
grep --text -m2 CID= {base vmdk}
and the "wrong" parent CID in the snapshot vmdk:
grep --text -m2 CID= {snapshot vmdk}
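Then replaced the parentCID in the snapshot using a sed command, writing to a new file that then takes the place of the original (redirecting straight back onto the same file would truncate it before sed reads it):
sed -e 's/parentCID={wrong CID}/parentCID={right CID}/' {snapshot vmdk} > {fixed snapshot vmdk}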
That should get it done!
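Whichever way the edit is made, it may be worth checking the result against the backup before booting. The sketch below uses a made-up helper, verify_patch, which assumes that a correct fix changes nothing except the eight characters of the parentCID value, so the two files should be the same size and differ in at most 8 bytes. Note that cmp reads both files in full, so this is slow on large disks.

```shell
#!/bin/sh
# Sketch: compare the backup with the edited snapshot.
# verify_patch is a hypothetical helper. Assumption: a correct fix leaves the
# file size unchanged and alters at most the 8 bytes of the parentCID value.

verify_patch() {  # usage: verify_patch ORIGINAL_VMDK FIXED_VMDK
    size_a=$(wc -c < "$1" | tr -d ' ')
    size_b=$(wc -c < "$2" | tr -d ' ')
    if [ "$size_a" != "$size_b" ]; then
        echo "sizes differ: $size_a vs $size_b"
        return 1
    fi
    # cmp -l prints one line per differing byte; count those lines.
    changed=$(cmp -l "$1" "$2" | wc -l | tr -d ' ')
    if [ "$changed" -gt 8 ]; then
        echo "too many bytes changed: $changed"
        return 1
    fi
    echo "ok: $changed byte(s) changed"
}
```

A result of 0 bytes changed means the edit did not take effect, and anything over 8 suggests the editor or command touched more than the parentCID.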