  1. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #1

    Default Snapshot Removal: Anyone ever have a snapshot removal hang @ 99%?

    Hey guys,

    Have you guys ever had a snapshot removal hang @ 99% and just stay there? So far it's been at 99% for 2 hours on our 6.7TB file server. Been doing snapshot cleanup, and all of the other servers take like 30 to 40 minutes max.

    Is it normal for large VMs to take a few hours or so and just sit at 99%?
    Last edited by Deathmage; 06-23-2015 at 09:39 PM.

  3. VCDX in 2017 Essendon's Avatar
    Join Date
    Sep 2007
    Location
    Melbourne
    Posts
    4,489

    Certifications
    VCIX-NV, VCAP5-DCD/DTA/DCA, VCP-5/DT, MCSA: 2008, MCITP: EA, MCTS x5, ITIL v3, MCSA: M, MS in Telecom Engg
    #2
    How big and old was the delta? Were there multiple levels of deltas?
    VCDX: DCV - Round 2 rescheduled (by VMware) for December 2017.

    Blog >> http://virtual10.com

  4. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #3
    Okie... So I had to dig on the internet for these commands... but case in point, vCenter lies!!!!!

    CLI rules! It's just taking its jolly old time to delete!!!!!

    Below:

    Picture 1: yes it's a sliver (white on white lol!!!) you may not see it.... notice the time, it's now 5:28 PM here.

    99% my asssss.JPG


    Picture 2:

    99% ya right....jpg

    it's still working, just really fracking slow....notice the percentage....

    Picture 3:

    aleays trust cli!.JPG
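
    The commands behind these screenshots aren't reproduced in the thread, so the following is only a rough sketch of what you can do with the percentage once the CLI gives you one: take two readings some time apart and extrapolate. The function name and the sample readings are hypothetical, and it assumes the removal proceeds at a roughly constant rate, which real consolidations often don't.

```python
from typing import Optional, Sequence, Tuple


def estimate_remaining_seconds(
    samples: Sequence[Tuple[float, float]],
) -> Optional[float]:
    """Extrapolate seconds left from (elapsed_seconds, percent_done) readings.

    Only the first and last readings are used (simple linear extrapolation).
    Returns None when there are too few readings or no measurable progress.
    """
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    progress = p1 - p0
    elapsed = t1 - t0
    if progress <= 0 or elapsed <= 0:
        return None  # stuck (or readings out of order): nothing to extrapolate
    return (100.0 - p1) * elapsed / progress


# Hypothetical readings: 40% at the first check, 55% thirty minutes later.
readings = [(0.0, 40.0), (1800.0, 55.0)]
remaining = estimate_remaining_seconds(readings)
if remaining is not None:
    print(f"roughly {remaining / 60:.0f} more minutes at the current rate")
```

    If the function returns None for readings taken a good while apart, the task really isn't moving, which is a different situation from "slow but working".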
    Last edited by Deathmage; 06-23-2015 at 10:33 PM.

  5. Member
    Join Date
    Apr 2010
    Location
    Kansas
    Posts
    56

    Certifications
    CCNA:S, CCNA:R&S; VCA(DCV,WM,C); A,Net,Sec,Proj+, Some CIW, BS:IT, AAS
    #4
    It can happen, I've had an old snapshot that was a couple of months old and it hung at 99% for about 5 hours before it finished.

  6. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #5
    Quote Originally Posted by Reibe View Post
    It can happen, I've had an old snapshot that was a couple of months old and it hung at 99% for about 5 hours before it finished.
    Yes, that's me; this snapshot is from March. I've been looking into why the IOPS are so bad. I started digging and found a blog about keeping too many snapshots, and we had 5 on the SQL VM. So this is the 1st one of 5 to be purged.

    So we shall see what this does for performance, so far the IO has decreased as this sucker is being removed.

    Learned something new about snapshots; I remember it from the exam but never actually put two and two together.

    Quote Originally Posted by Essendon View Post
    How big and old was the delta? Were there multiple levels of deltas?
    ...a few deltas, as you can see from above...
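
    For anyone without the screenshots: one way to answer the "multiple levels of deltas" question is to count the delta files per disk in the VM's folder. This is a minimal sketch that assumes the usual naming pattern for snapshot data files (`<disk>-<six digits>-delta.vmdk`, or `-sesparse.vmdk` for SE sparse disks); the file listing below is made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed naming pattern for snapshot data files, e.g. sql-000001-delta.vmdk
_DELTA_RE = re.compile(r"^(?P<disk>.+)-(?P<seq>\d{6})-(?:delta|sesparse)\.vmdk$")


def delta_levels(filenames: Iterable[str]) -> Dict[str, int]:
    """Count snapshot delta files per base disk from a list of file names."""
    counts: Counter = Counter()
    for name in filenames:
        match = _DELTA_RE.match(name)
        if match:
            counts[match.group("disk")] += 1
    return dict(counts)


# Hypothetical directory listing for a VM with two disks.
listing = [
    "sql.vmdk",
    "sql-flat.vmdk",
    "sql-000001.vmdk",
    "sql-000001-delta.vmdk",
    "sql-000002.vmdk",
    "sql-000002-delta.vmdk",
    "sql_1-000001-delta.vmdk",
]
for disk, levels in sorted(delta_levels(listing).items()):
    print(f"{disk}: {levels} delta file(s)")
```

    Descriptor files such as `sql-000001.vmdk` are deliberately ignored so each snapshot level is counted once per disk.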

  7. VCDX in 2017 Essendon's Avatar
    Join Date
    Sep 2007
    Location
    Melbourne
    Posts
    4,489

    Certifications
    VCIX-NV, VCAP5-DCD/DTA/DCA, VCP-5/DT, MCSA: 2008, MCITP: EA, MCTS x5, ITIL v3, MCSA: M, MS in Telecom Engg
    #6
    I hope you've been backing up this VM.
    VCDX: DCV - Round 2 rescheduled (by VMware) for December 2017.

    Blog >> http://virtual10.com

  8. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #7
    It's working fine; the snapshot removal went smoothly and just finished a few minutes ago. Only took 5 hours though.

    The server is backed up with Backup Assist and BE 12.5.

  9. VCDX in 2017 Essendon's Avatar
    Join Date
    Sep 2007
    Location
    Melbourne
    Posts
    4,489

    Certifications
    VCIX-NV, VCAP5-DCD/DTA/DCA, VCP-5/DT, MCSA: 2008, MCITP: EA, MCTS x5, ITIL v3, MCSA: M, MS in Telecom Engg
    #8
    Did you forget about the snapshots or were they left behind by the backup software?

    Take a moment to read this > VMware KB: Best practices for virtual machine snapshots in the VMware environment
    VCDX: DCV - Round 2 rescheduled (by VMware) for December 2017.

    Blog >> http://virtual10.com

  10. kj0
    Apple and VMware kj0's Avatar
    Join Date
    Apr 2012
    Location
    Brisbane, Australia.
    Posts
    744

    Certifications
    vExpert x 4 | Apple Mac OS X Associate | Cert III - IT.
    #9
    Keep an eye on your Snapshots. I run RVTools regularly to keep up with all those metrics.

    I've had it a few times. It got to the point where we couldn't wait any longer as it was affecting a customer's work, so we started migrating everything off the host, and as soon as everything was off except that VM, it completed straight away.

    If you're backing up VMs, be really careful with your Snapshots as you may end up backing up the snapshot, so you would ultimately end up with a "Backup of a backup" situation.
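
    If you'd rather script the check than eyeball a report, here is a minimal sketch of flagging old snapshots from whatever inventory you can export. The record layout (`vm`, `name`, `created`) and the sample data are hypothetical, not the format of any particular tool, and the 72-hour default is an assumption based on commonly cited VMware guidance, so adjust it to your own policy.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_snapshots(
    snapshots: List[Dict],
    now: datetime,
    max_age: timedelta = timedelta(hours=72),  # assumed policy, adjust as needed
) -> List[Dict]:
    """Return the snapshots older than max_age, oldest first."""
    old = [s for s in snapshots if now - s["created"] > max_age]
    return sorted(old, key=lambda s: s["created"])


# Hypothetical inventory export.
inventory = [
    {"vm": "SQL01", "name": "pre-patch", "created": datetime(2015, 3, 10, 9, 0)},
    {"vm": "PRINT01", "name": "driver-test", "created": datetime(2015, 6, 16, 18, 0)},
    {"vm": "FILE01", "name": "nightly", "created": datetime(2015, 6, 23, 1, 0)},
]
now = datetime(2015, 6, 23, 21, 0)
for snap in stale_snapshots(inventory, now):
    age_days = (now - snap["created"]).days
    print(f"{snap['vm']}: '{snap['name']}' is {age_days} days old")
```

    Run on a schedule, something like this catches the "forgot about it for three months" case before the delta gets big.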
    2017 Goals: VCP6-DCV | VCIX
    Blog: http://readysetvirtual.wordpress.com

  11. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #10
    Ya I basically forgot about them.... just got done with a 3 week wireless project and I neglected the cluster for a few weeks. I mean I'd check vCOPS and graphs, but the snapshots slipped my mind.

    Going to be way more cautious from now on though.

    I am going to sit in the CLI more often though; I hear for the VCAP I need to graft it to my skull.

  12. Senior Member
    Join Date
    Oct 2013
    Posts
    1,145

    Certifications
    RHCE
    #11
    If you're using Snapshot Manager, the snapshot usually is deleted at that point (95%-99%) but can take hours to update host management agents. I ran into this issue a few times and found an article that explained what was happening and how to fix it. Of course the commands were different for our version of ESXi:

    ESXi Remove All Snapshots hangs at 99% | Blog-Stack.net

    This article had the correct commands:

    VMware KB: Committing snapshots when there are no snapshot entries in the Snapshot Manager

    You basically end up restarting the host management agents (I end up having to run services.sh restart since the other commands almost never work on 4.1):

    VMware KB: Restarting the Management agents on an ESXi or ESX host

    Don't spend 5 hours picking your nose...follow those links and get the process finished.
    Last edited by Verities; 06-23-2015 at 11:50 PM.

  13. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #12
    Then I presume migrating the VMs to our other hosts and rebooting the troubled host would be just as good as resetting the agents. The uptime on the hosts is 4 months now; maybe a reboot would be good.

  14. Reticulating splines... iBrokeIT's Avatar
    Join Date
    Jul 2013
    Location
    Twin Cities, MN
    Posts
    1,045

    Certifications
    GCIH, GSEC, VCAP5-DCA, VCP5-DCV, MCITP:EA, MCSA 2003/08
    #13
    Quote Originally Posted by kj0 View Post
    Keep an eye on your Snapshots. I run RVTools regularly to keep up with all those metrics.
    I love RVTools for exactly this reason. It is nice to run monthly to make sure your LUNs are staying clean of orphaned files.

  15. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #14
    Quote Originally Posted by kj0 View Post
    Keep an eye on your Snapshots. I run RVTools regularly to keep up with all those metrics.

    I've had it a few times. It got to the point where we couldn't wait any longer as it was affecting a customer's work, so we started migrating everything off the host, and as soon as everything was off except that VM, it completed straight away.

    If you're backing up VMs, be really careful with your Snapshots as you may end up backing up the snapshot, so you would ultimately end up with a "Backup of a backup" situation.
    RVTools huh, is that an add-on or a command-line tool?

  16. kj0
    Apple and VMware kj0's Avatar
    Join Date
    Apr 2012
    Location
    Brisbane, Australia.
    Posts
    744

    Certifications
    vExpert x 4 | Apple Mac OS X Associate | Cert III - IT.
    #15
    RVTools - Home

    Do you use Twitter? It's very popular on there.
    2017 Goals: VCP6-DCV | VCIX
    Blog: http://readysetvirtual.wordpress.com

  17. Reticulating splines... iBrokeIT's Avatar
    Join Date
    Jul 2013
    Location
    Twin Cities, MN
    Posts
    1,045

    Certifications
    GCIH, GSEC, VCAP5-DCA, VCP5-DCV, MCITP:EA, MCSA 2003/08
    #16
    The vHealth tab is a good place to start

  18. Senior Member
    Join Date
    Oct 2013
    Posts
    1,145

    Certifications
    RHCE
    #17
    Quote Originally Posted by Deathmage View Post
    Then I presume migrating the VMs to our other hosts and rebooting the troubled host would be just as good as resetting the agents. The uptime on the hosts is 4 months now; maybe a reboot would be good.
    Not to sound like a dick, but you presume wrong; the host management agents are services that are restarted on the host, and restarting them doesn't affect the VMs as long as you don't have the automatic startup/shutdown option enabled. 4 months uptime on a host is really good..... ESXi does not need to be rebooted for no reason.

  19. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #18
    Naaa no disrespect seen. I have pretty thick skin

    I'll give it a try next time.

    With each snapshot being deleted the performance of the array is improving. Just two more to go and I'm done. Going to let them cook while I sleep and check them in the morning on the remote terminal server.

    Also doing a defrag on all of them except SQL. We're doing a database packing Saturday.

  20. Senior Member
    Join Date
    Oct 2013
    Posts
    1,145

    Certifications
    RHCE
    #19
    Quote Originally Posted by Deathmage View Post
    Naaa no disrespect seen. I have pretty thick skin

    I'll give it a try next time.

    With each snapshot being deleted the performance of the array is improving. Just two more to go and I'm done. Going to let them cook while I sleep and check them in the morning on the remote terminal server.

    Also doing a defrag on all of them except SQL. We're doing a database packing Saturday.
    A few tips for SQL DBs:

    -Verify the size of the logs vs what vCenter expected size is based on logging level

    -Shrink the DB, by reducing white space

    -Truncate tables

    I've run into issues like 91GB transaction logs on a 7 host/56VM setup with level 3 logging. It's one of those things people don't think about until vCenter starts to run slow as mud.

  21. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #20
    Thanks for the pointers.

    I've seen over the past few months that SQL is its own kind of animal; it sometimes defies logic, more so than at my previous employers. Especially concerning are SQL queries that aren't written for speed but just-to-get-r-done copy-n-paste and 'ooo look it works, let's leave it like this' (16 lines of code when it just needs to be 2), and a SQL database that has never been packed in 7+ years, ya know, the things normal people call 'preventative maintenance'. -- Case in point: I did a purge a few months back of temp files and other safely removable system cache files like memory dumps and windows/dns/font logs; on the SQL server it was 65.6 GB in size for TEMP FILES!!!!!!!!

    I'm having to read blog after blog on SQL performance because my predecessor had no idea about it (makes me wonder if I should get my 'MCSE: SQL' before SI next before my VCAP) and my co-worker the programmer is shaking his head over the programming code and how inefficient it is. Some queries take fracking forever, and it's not the array, it's just the length of the poor coding.

    I'm having to think like it's a woman (SQL) and treat it that way and be very very cautious about what changes are made to the systems.

    It's funny, our SQL and Syteline ERP VMs are the only problem children; all the others are working fine, but you always have those select few that give you headaches and stress.
    Last edited by Deathmage; 06-24-2015 at 11:52 AM.

  22. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #21
    These numbers look much better this morning. Still two snapshots to be removed, but they're from last week. Will purge them tonight after hours and do a consolidation of the logs afterwards.

    Be nice to see what the database packing will do next week.

    much better.jpg

  23. Senior Member
    Join Date
    Oct 2013
    Posts
    1,145

    Certifications
    RHCE
    #22
    Quote Originally Posted by Deathmage View Post
    These numbers look much better this morning. Still two snapshots to be removed, but they're from last week. Will purge them tonight after hours and do a consolidation of the logs afterwards.

    Be nice to see what the database packing will do next week.

    Attachment 6881
    These performance charts remind me of a 5 year old trying to draw mountains on a piece of paper.

  24. Google Ninja jibbajabba's Avatar
    Join Date
    Jun 2008
    Location
    Ninja Cave
    Posts
    4,240

    Certifications
    TechExam Certified Alien Abduction Professional
    #23
    I remember panicking back in the day about a removal getting stuck for ages. It took 36!!! hours to remove. A colleague made the mistake of shutting the VM down in the hope it would speed things up; it didn't. Maybe it did and it would otherwise have taken 72 hours, but because it was in the middle of the removal you weren't able to power it back on, so effectively the server was down for over a day.

    And that was an orphaned Veeam snapshot.

  25. Senior Member
    Join Date
    Apr 2013
    Posts
    2,413
    #24
    Yup I learned that too. I took down the print server after hours at 7pm in the middle of the removal and I also found you can't power it back on while the removal is happening.

    Glad the last two snapshots were successfully removed from the SQL VM last night, so the performance is definitely way better.

    I guess sometimes book smarts don't always teach you real-life stuff with VMware or anything IT for that matter...

    Won't let this happen again...

  26. kj0
    Apple and VMware kj0's Avatar
    Join Date
    Apr 2012
    Location
    Brisbane, Australia.
    Posts
    744

    Certifications
    vExpert x 4 | Apple Mac OS X Associate | Cert III - IT.
    #25
    I attempted to migrate a 1.5TB vmdk across two datastores to convert it to thin provisioning before business hours (it was a school, and a video library server)... Let's just say that cancelling it at 13% after an hour still took 2 hours to roll back!
    2017 Goals: VCP6-DCV | VCIX
    Blog: http://readysetvirtual.wordpress.com
