00:01
Well, thank you all so much for joining us. I'm personally very excited to hear this story again. My name is John Kimmerly, and for Pure, I support healthcare providers globally. When I talk with healthcare CIOs and CTOs, three things always top the list: cyber, AI, and creating options for virtualization. The last one is what
00:23
we're gonna touch on today within the Epic environment. I'll let Mark introduce himself, we'll go through the Legacy story, and afterwards we'll have a panel with some other industry perspectives, so I'm looking forward to that as well. As I think about virtualization and the Legacy story, I'm reminded of Abraham Lincoln's quote that
00:45
the best way to predict the future is to create it — the best way to manage uncertainty and unpredictability in the future is to create the reality yourself. The Legacy story is about exactly that: taking the future into their own hands. So with that, I'm gonna let Mark introduce
01:07
himself and tell the Legacy story, and we'll reserve 10 or so minutes at the end for a panel discussion. So Mark, over to you. Great, thank you, John. I'm mic'd, so I'm good. My name is Mark Hendricks. I am a principal infrastructure engineer for Legacy Health System.
01:26
I'm gonna add just a couple of quick disclaimers. I'm an engineer, not a public speaker. My opinions are my own and not necessarily those of Legacy Health, Pure Storage, Nutanix, or any of the fantastic partners we've been able to work with on this project. I have forgotten the order of my slides,
01:44
so we're gonna learn together. Legacy Health is Portland, Oregon's largest non-denominational nonprofit community health system. We run Portland's Tier 1 trauma center, Randall Children's Hospital, and the Unity Center for Behavioral Health, along with five additional hospitals spanning about a 30-mile range up into
02:08
southwest Washington. We have over 100 clinics and locations, including medical imaging and GoHealth, and about 14,000 employees and 3,000 providers inside that network. We also do research, we partner with Life Flight of Oregon, and we have an ownership stake in PacificSource,
02:33
the health plan — our health plans, excuse me. We serve 2.5 million Oregonians and Washingtonians. We have 1.6 million clinic visits a year, with 300,000 inpatient visits following that, and tens of thousands of surgeries.
02:53
And Legacy is unique in that we serve a disproportionate share of Medicare and Medicaid patients in the Portland area — 70% of our patients are some of the most vulnerable in our region. We give $640 million back in charitable care every year because of that. We're about a $2.8 billion company.
03:18
I bring that up because there's a lot going on in this world that could greatly impact our ability to deliver care, so keep that in mind as we talk through this. I wanna quickly show you guys our journey with Pure Storage. This started long before my time at Legacy, back with some good old M20s and
03:39
M50 R2s. We're coming up on 10 years, and there are a couple of arrays missing from here because we've done some upgrades, replacements, and migrations, but we have been a very happy Pure customer for the better part of a decade. Our environment when I joined Legacy had
04:00
just gone through a major shift. We went from 180 racks down to 18 racks using a traditional three-tier model with a FlashStack setup, right? Cisco UCS, Cisco MDS, Pure Storage backing it. The best part of that system has been the MDS and the Pure Storage. Then we had to replace our operational database environment for
04:25
Epic. We're a large Epic customer. One of the things I've left out so far is that we provide Connect services, meaning we provide Epic services to other hospitals and providers who can't afford Epic on their own. Epic has a 200-bed minimum, if you aren't aware — you cannot buy Epic if you're a smaller shop; you just don't have the organizational structure they require.
04:50
So we'd been running on some B480 M5 UCS blades — pretty beefy guys, but the next upgrade was gonna push us over the limit. The really interesting thing about our FlashStack, a decision made long before my time, was to run Microsoft Hyper-V. We were one of two Epic Hyper-V customers running operational database workloads, and the other one was under 3,000
05:15
total sessions — about 3 million GRefs. So we had a project ahead of us, right, and healthcare doesn't move that fast. We reviewed our solutions, we took a look at the data we had available to us, and at that time Epic had kind of changed their viewpoint. They had been a strong Intel supporter.
05:36
They had changed and were seeing great results with AMD. We ended up moving to an AMD-based environment for this, and we decided we were going to stick with and build upon the existing three-tier infrastructure we had — Pure Storage and the MDS — but just switch out our compute layer.
05:52
The other really big decision — and I feel like I gave myself a black eye when I say this — was that we decided to switch hypervisors. We wanted to get in line with Epic's recommendations and join the other 84% of Epic customers who use VMware. This was two weeks before the Broadcom merger announcement, by the way.
06:16
So we had the absolute best of intentions going into this, and it went completely sideways. Broadcom would not recognize us as a customer and would not honor the perpetual licenses that we had purchased. This was November through February of 2024, right during that key migration where they were pulling the portals from HPE and from the VARs to register
06:41
stuff. It was a nightmare. And we're sitting here going, we're going to have to postpone a major Epic upgrade, and the hospital presidents and the physicians are going to be breaking our doors down. Well, not my door, but my VP's door, and that's close enough. Then we had some saving grace happen. There were some major improvements in
07:05
backend performance based upon some IRIS upgrades — that's the database platform Epic runs on. It gave us breathing room, and we were able to take a step back and say, what do we do? We talked to our friends at Nutanix at the time, we talked to Microsoft, and Nutanix let us know that they just
07:25
weren't there yet based upon the workload we were trying to run. We were about a 10-to-12-million-GRef shop at that point in time, with about 10,000 active sessions. So we went back to Hyper-V. Epic was very unhappy with us, and they actually asked that we find a new plan.
07:48
So we moved forward, converted that environment, rebuilt it, and successfully moved to it January 1st — well, January 12th, 2025. But I had a wild idea based upon a technology I had been interested in. I'm a SNIA-certified architect — Storage Networking Industry Association —
08:14
so I stay kind of up to date on what's going on, and anyone who's familiar with NVMe over Fabrics — you know, RDMA and the like — knows that its baby brother, NVMe over TCP, has been gaining traction in the industry. Through conversations with my account SE at Nutanix, Dave Weber, who's here with us, he introduced me to the performance lead at
08:39
Nutanix, John Koller, who sadly can't be here with us today. We started to talk about how we could make this a reality because, like many other customers, we had found a lot of success with Nutanix at our branch sites and in other projects. But we still needed to solve the problem of external storage.
08:59
I mean, Nutanix offered a great hypervisor and management platform for us, but at the time of this discussion we were limited by the HCI storage. It turned out Epic had the same question, and it was actually a conversation one of their lead performance engineers had that really kickstarted this initiative. One thing that really helped with this: all the pieces fell into place.
09:26
Pure started to support NVMe over TCP in 6.4.2. Nutanix AOS 7 was announced and coming out, with major storage and network performance improvements in the background. So we decided to go to Epic and sit down with their performance team, and anyone who's worked with Epic knows that they have very strict guidelines that they
09:55
recommend you follow — I mean, if you deviate from their recommendations, they'll still support you, they'll just be very unhappy with you. So we twisted their arm, Nutanix joined us, and we had a very open conversation with their performance team. They were watching what was happening in the industry with Broadcom.
10:18
They were watching what was happening to customers like us who just got completely murdered on licensing costs. We needed a path forward, and we were willing to take the risk as an organization to proof-of-concept this in our lab. I had some spare gear — don't tell my boss I snuck it in on a quote.
10:38
So my leadership wanted to provide an alternative for the other 238 VMware Epic customers. Anyone who's familiar with Galaxy knows it's one of the few places where there's a BBS-style board for Epic customers, and the number of people posting in there saying, what do I do? I got a 3-times, 7-times increase in renewal costs. They're only doing 3-year renewals now.
11:02
You know, it's just crazy, right? We were stuck right along with every other customer. So I put my lab together — pretty simple setup. It's just 3 lower-spec HP nodes, a Pure XL130 running some 6.5.2 code, a Juniper 100-gig backend, and we use ACI top-of-rack.
11:28
The front end's not that important. We had some beta code provided to us by the performance team at Nutanix, and we're running RHEL 9.5, right? From the Pure side, it was as easy as scheduling a downtime, throwing in some network adapters, and porting it over to the Juniper switches.
11:49
I'm gonna put an asterisk on that because we'll talk about some of the things I've learned. We had some fantastic initial results here. For those not familiar — I've picked up some shorthand in the Epic world from Thomas Whalen (TW) with Pure Storage here, and this is Epic-specific — an XL130 is good for about 330,000 IOPS and an XL170 is good
12:17
for about 750,000 IOPS when running the GenIO testing Epic uses to certify arrays. We hit within 5% of Pure's own internal testing using NVMe over TCP connected to this setup. We had a lot of challenges to overcome as part of this, right? We had a really short window in which we
12:45
could prove this out before that Hyper-V go-live happened. This is all happening in about the December 2024 time frame. The results were really promising: throughput was fantastic, and latency stayed below 0.5 milliseconds across all of our testing with NVMe over TCP.
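A note for readers: GenIO is Epic's own certification tool and isn't publicly available, so the latency figure above can't be reproduced directly from this transcript. As a rough, hedged stand-in, a small random-read test with fio against an NVMe/TCP namespace might look like the sketch below; the device path, block size, queue depth, and runtime are illustrative assumptions only, not Epic's or Pure's parameters.

```python
#!/usr/bin/env python3
"""Rough fio stand-in for a latency sanity check on an NVMe/TCP namespace.

This only approximates a small random-read test; it is not GenIO. The device
path, block size, queue depth, and runtime are assumptions. Never point this
at a device that holds real data.
"""
import subprocess

DEVICE = "/dev/nvme1n1"  # hypothetical NVMe/TCP namespace presented to the guest

fio_cmd = [
    "fio",
    "--name=nvmetcp-latency-check",
    f"--filename={DEVICE}",
    "--rw=randread",      # random reads, the latency-sensitive side of an ODB
    "--bs=8k",            # assumed block size, not an Epic-specified value
    "--iodepth=32",
    "--numjobs=4",
    "--direct=1",         # bypass the page cache so the array path is measured
    "--ioengine=libaio",
    "--time_based",
    "--runtime=120",
    "--group_reporting",
]

# fio reports completion-latency percentiles; the interesting check against the
# result quoted above is whether they stay under roughly 0.5 ms.
subprocess.run(fio_cmd, check=True)
```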
13:07
But our CPU utilization — we had those lower-spec cores, and when we were running our GenIO testing, we ran out of CPU headroom. That's one of the downsides of IP-based storage, right? You have a little bit of additional CPU overhead. So we decided to come back and purchase a production-spec node.
13:29
HPE was able to deliver it very quickly for us, which was fantastic, and we reran those tests, which we'll talk about. And that asterisk I mentioned earlier: when we started to do failover testing — when you build an Epic production environment, you are very thorough about pulling power, making sure network redundancy works, making sure storage connectivity redundancy works as expected,
13:52
and making sure live migration happens as expected, right? They have really strict guidelines for how they want you to handle this really important operational database, because it's everything Epic needs at its heart. It didn't work when we pulled those first cables out of the back of the hosts. We actually ended up corrupting some data in our test database,
14:14
right? But thankfully, Scott, my SE who's here, put us in contact with the NVMe over TCP engineering team at Pure, and we were able to rewrite some of the documentation — now public-facing for you guys — on how the storage connectivity should look for these hosts.
14:38
So when you take a look at Pure's diagrams today, you're going to see a really similar model to what I have drawn here. Here's what makes this work from a storage connectivity standpoint. Our RHEL VM lives at the very top up there. We have 4 separate VLANs, with an IP address unique to each adapter
14:59
on each controller. So I've got 4 ports on the Pure array, I've got 4 IP addresses. Those pass up through my LACP groups into my Nutanix AHV host, and then it's just handed to the guest OS, right? Inside the guest OS I have a guest initiator, and that's what does my storage connectivity.
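A note for readers: as a minimal sketch of that guest-initiator step, assuming nvme-cli is installed inside the RHEL guest, connecting one NVMe/TCP path per array data IP might look like the snippet below. The portal addresses, subsystem NQN, and port are placeholders; the real values and the persistent configuration should come from the public Pure documentation mentioned earlier, not from this sketch.

```python
#!/usr/bin/env python3
"""Sketch of a guest-initiator NVMe/TCP connection, one path per array portal.

The four portal IPs, the subsystem NQN, and the port are placeholders for
illustration only. Assumes nvme-cli is installed and native NVMe multipathing
is enabled in the guest kernel (the RHEL 9 default).
"""
import subprocess

PORTAL_IPS = ["10.10.1.10", "10.10.2.10", "10.10.3.10", "10.10.4.10"]  # one per VLAN/array port
SUBSYS_NQN = "nqn.2010-06.com.purestorage:flasharray.example"  # placeholder NQN
PORT = "4420"  # default NVMe/TCP port

def connect_all_paths() -> None:
    for ip in PORTAL_IPS:
        # Optional: confirm the subsystem is advertised on this portal first.
        subprocess.run(["nvme", "discover", "-t", "tcp", "-a", ip, "-s", PORT],
                       check=True)
        # Establish one NVMe/TCP controller connection per portal; the kernel's
        # native multipathing aggregates the paths into a single namespace.
        subprocess.run(["nvme", "connect", "-t", "tcp", "-a", ip, "-s", PORT,
                        "-n", SUBSYS_NQN], check=True)
    # Quick sanity check: list controllers and paths for the subsystem.
    subprocess.run(["nvme", "list-subsys"], check=True)

if __name__ == "__main__":
    connect_all_paths()
```

With one connection per portal, the guest ends up with multiple independent paths to the namespace, which is the kind of redundancy the cable-pull failover testing described earlier is meant to exercise.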
15:20
This is different from the announcement that happened a month ago. There, they're talking directly to the hypervisor — you can go and allocate storage via a storage container. I didn't have the time, and they weren't there yet. We're looking at this fall
15:35
or winter for that to be a viable solution with Pure, and I'm really looking forward to testing it further — we'll talk about that more at the end. So, here are our GenIO results, a really boring spreadsheet. The really cool thing to me is we tested within 1% of our Fibre Channel environment running on Microsoft Hyper-V 2022.
16:04
We performed better in key areas like write performance, which is database update performance. So at the end of the day, we were faster on NVMe over TCP on the same hardware stack. And based upon our success, we were given the green light to move forward with this solution. Then our leadership came back with an absolutely insane request that I don't think any
16:33
healthcare company in their right mind would move forward with. We made this decision January 1st of this year. January 12th is when we cut over to the new Hyper-V environment. And March 12th is when we recycled those Hyper-V nodes into Nutanix hosts and moved our production environment over. We've been live since March 12th on that
16:57
environment. We did it in 3 months, and this had been a year-long project before that. We cut over in 1.5 hours — we had a 4-hour window to do this work. I had to play some shell games with the hardware. I had just enough hardware between my lab and the new production-spec node where
17:19
I was able to add and remove nodes, reimage them as AHV, throw some disks in them just for the local OS, and bring everything online — and it worked flawlessly. Following our testing, at this point in time, our growth has kind of caught up with us.
17:39
We're about a 15-million-GRef shop on average now. In the chart on the right, the bar in the middle is where it cuts over. The only difference you see is about a 1-to-2-second improvement in one of the response times. Anecdotally, I've had a lot of reports from physicians and from MAs — my wife works for one of our Connect partners.
18:08
Epic got snappier. I don't know why — I can't back it up with data — but I constantly have people telling me that Epic is faster now. And that's the impact we make at the end of the day on these folks' lives, right? Because when a patient comes in, the faster
18:24
they can get them into Epic and look at their imaging, the more it matters, right? My own grandmother was a stroke patient — she was 101, the oldest person in our stroke unit — and it was because of the fast response that she lived to 103.
18:40
So there was one unexpected result — I'm gonna call it an unfortunate result, because we were so focused on the production aspect of this that we didn't think about what would happen during backups. When we moved to the new Hyper-V environment, things got faster, but they did not get this fast.
18:59
We maxed our array out. We didn't take anything down doing so, but we hit 6 gigabytes per second of throughput over those 100-gig links out of that Pure array, which is just awesome, right? I had to go back through and put some throttling on all of my logical volumes to slow it down.
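A note for readers: the throttling described here was applied to the logical volumes, and the exact mechanism isn't detailed in the talk. Purely as a related, hedged illustration of one possible control point, capping per-volume bandwidth from the array side with Pure's py-pure-client SDK might look roughly like this; the array address, token, volume names, limit value, and even the exact call shape are assumptions that can vary by Purity and SDK version, and this is not necessarily how Legacy implemented it.

```python
#!/usr/bin/env python3
"""Hedged illustration only: an array-side bandwidth cap with py-pure-client.

This is not necessarily how the throttling in the talk was implemented (that
was described as being on the logical volumes). Array address, API token,
volume names, and the limit are placeholders; verify the SDK call against the
py-pure-client version in use.
"""
from pypureclient import flasharray

ARRAY = "flasharray.example.local"    # placeholder management address
API_TOKEN = "REPLACE-WITH-API-TOKEN"  # placeholder credential
BACKUP_VOLUMES = ["epic-backup-vol1", "epic-backup-vol2"]  # hypothetical names
LIMIT_BYTES_PER_SEC = 2 * 1024 ** 3   # e.g. cap each volume at ~2 GB/s

client = flasharray.Client(ARRAY, api_token=API_TOKEN)

# Apply a per-volume QoS bandwidth limit so a backup stream can't saturate the
# 100 Gb links the way the un-throttled run described above did.
response = client.patch_volumes(
    names=BACKUP_VOLUMES,
    volume=flasharray.VolumePatch(
        qos=flasharray.Qos(bandwidth_limit=LIMIT_BYTES_PER_SEC)
    ),
)
print(response.status_code)
```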
19:19
And we were down to like a 2.5-to-3-hour backup window from 11 hours on the original environment. So that was a big win, right? [Audience question] Say that again? Yes — I'd have to ask my Linux admin, but if you find me afterwards, I'll have my badge to show you,
19:40
and I'm happy to put you in contact. So we completed this entire thing during that shell game, right, which was fantastic. Yeah, that's really it. I'm happy to answer questions for you guys, but I want to stop real quick and just thank the partners who are here in this room
20:06
and the people who weren't able to come. We had such fantastic support from Nutanix, from our account teams, from Epic, and from my internal leadership for supporting this kind of crazy idea. At the end of the day, I think we've delivered on the goal of providing a viable solution for healthcare companies of our size and larger when they're faced with these crazy
20:31
Broadcom bills. This would not have been possible — if anyone works directly with Epic, please thank Gavin Edwards if you ever have the opportunity to talk with him. Gavin was the driving force inside Epic. He was our internal
20:48
voice of reason inside Epic, getting us the go-ahead to move forward with this. And we couldn't have done it without Pure Storage and the support we had from them. We just can't thank them enough, and thank them for being here today as we tell you guys about it.