Customer Interview with Go Endo, DevOps Manager at UnitedHealth Group (Optum)
Could you tell us about your role at UnitedHealth Group (Optum)?
Sure. My current role is the Architect for the DevOps Team. I focus on improving our processes, systems, and approaches on the Medicare and Retirement Team at Optum. Our team is responsible for the AARP digital experience.
Day-to-day I work with teams, applications, and systems to help people be more effective when building software. I set the direction for where my team (DevOps) wants to go and help the organization set the general direction with new technology, new ideas, and a general approach on how we adopt them.
Where did you start your career and how long have you been working at Optum?
I've been with Optum for eight years now. Before Optum, I worked for McKesson for 10 years which is how I got into the medical field. Before that, I worked for a game development company. I've been a coder for most of my life.
What sort of games and languages did you work on?
I worked on children's games. Back then it was mostly C and C++. I also worked on some of the proprietary languages for Apple Media.
Wow. That's awesome. In terms of Optum, how many engineers work there?
Oh, that's a lot. So Optum is divided into many different divisions. I work for Medicare & Retirement and we are one of the bigger divisions. Our total team size is about 150 people. About 100 developers, product owners, architects, and QA. Beyond our team, there are dozens of different divisions, like Optum Bank, OptumRx and the list goes on. In terms of total engineers at Optum, I can’t say exactly…a lot!
What are the main consumer-facing projects that you are working on? Is it mainly consumer-facing applications that you are working on?
We manage the AARP website. Our division is responsible for the entire shopping experience. Basically, UnitedHealthcare subcontracts all of the Medicare products for AARP, and to help them sell more effectively, we manage the sales that happen online and through traditional sales channels. We also maintain the sister site, UnitedHealthcare, where we sell the Medicare and Retirement products directly to consumers ourselves. So those two are my main consumer-facing products.
Do you know roughly how many people are using that website in a given year?
Wow. That's a big number…The best way to think about it is that AARP has around 60 million customers. All of them can access the site, but not all of them do. Our business is not just digital; we have a conventional sales channel too. So 60 million is our total customer base. It’s safe to say that tens of millions of people access the site.
Wow, that's cool. In terms of engineering software and innovating new products, what are the biggest challenges that you commonly face?
So our biggest challenge and our motto is “reliability”. We cannot go down. It is like a banking business. If the bank’s website is down, they lose money. We are similar. If we go down, we get penalized by the government because we are selling Medicare on behalf of the United States Government! If we go down, not only do we lose money, but we lose credibility with the government. So it becomes a huge challenge to keep our system up and running and provide the appropriate content. Those are the two biggest challenges we have. We are heavily regulated.
Wow, that’s a huge responsibility! So do you have standardized processes in terms of engineering practices to keep reliability high? CI/CD processes, things like that?
Yes, we do. The challenge is that Optum is a really large company, and what works for our Medicare & Retirement Team may not work for Optum Bank. So Optum tends to set loose guidelines for how we do it. Each organization gets to decide how much of that to adopt and how much to change for an approach that works better for its team.
So you can steer your own path to a degree?
Yes. And that's where my job comes in. I evaluate the corporate direction and say yes, this is the way we want to go in some areas. In others, I say, okay, this is a good idea, but we've been doing things differently and are better off staying with our approach vs. adopting the corporate direction. Another possibility is making changes over time where we lay out the roadmap to convert our way to the more corporate way or vice versa. It’s constantly changing.
In terms of building and deploying your software, how does that happen?
We focus on Maven and then we use Jenkins. We are going to deploy directly using Jenkins going forward. We are also heavily invested in Azure. So the Azure CI/CD might come into our ecosystem as well. Adobe is another technology we work with closely. We are moving over to their platform which has some CI/CD capabilities too.
Beyond these, what other technologies, tools and/or platforms are you guys using?
As for other tools, we use Dynatrace as monitoring on our team. There are other tools for monitoring that are offered by the central corporate team. They are moving towards Elastic. This is one of those areas that our team (Medicare & Retirement) says, “No, no, no. We are going to stay with Dynatrace” because we are so entrenched in the technology.
Oh cool! We have an integration with Dynatrace.
Yes, we are very excited about the integration. We have a metric dashboard in Dynatrace that shows us the very dynamic nature of our digital business: things like how many people are accessing “X” page, how many seconds they spend on it, the steps they take to enroll, etc. We want to be able to see if a user has gone through certain steps to buy our product. Why did they stop at the fifth step and not complete it? That's a huge question. This analysis shows us where we need to improve. That's where Flagsmith can come in, when we start going into A/B testing. We want to create different paths to purchase and measure the success rate of Path A vs. Path B.
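As a rough illustration of the Path A vs. Path B comparison described above, the sketch below assigns each visitor to a path deterministically and computes a completion rate per path. It is not Flagsmith's or Dynatrace's API; `assign_path`, `completion_rate`, and the experiment name `enrollment_flow` are hypothetical names chosen for this example.

```python
import hashlib


def assign_path(user_id: str, experiment: str = "enrollment_flow") -> str:
    """Deterministically place a user on Path A or Path B.

    Hashing the experiment name together with the user id means the same
    user always sees the same path, without storing any assignment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def completion_rate(events):
    """Compute the share of completed enrollments per path.

    `events` is a list of (path, completed) tuples, for example one
    tuple per visitor who started the enrollment flow.
    """
    totals, done = {}, {}
    for path, completed in events:
        totals[path] = totals.get(path, 0) + 1
        done[path] = done.get(path, 0) + (1 if completed else 0)
    return {path: done[path] / totals[path] for path in totals}
```

Comparing the two rates only says which path completed more often in the sample; deciding whether the difference is meaningful would still need a proper statistical test.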
So what made you decide on Flagsmith?
So my journey with feature toggles started about four years ago. Feature toggles aren’t really a new idea at Optum, but they used to be very clumsy. We had a configuration file on the server that we would upload. We had a monolithic application, and we just managed that at deployment time. But that was painful! Some of our deployments were 48 to 70 hours long. If something went wrong, everybody had to stay up and try to fix it. So eventually we were fed up with that model. Could we have a better way to control our application? This became a bigger issue when we started moving to a microservice architecture. Now we were controlling two dozen applications. And once we started having redundant servers, you're talking hundreds of change points, and it becomes unmanageable.
People started saying, okay, we've got to be able to do this. So we talked about things like ZooKeeper for the configurations and so forth, but it was still very complicated. We had a few proprietary things that helped, but they didn't solve the problem.
So I said, “There is a product that does this for us. Why spend hundreds of hours developing stuff that other people can do for us, and that is already established?” So I started looking for a feature management tool. We came across LaunchDarkly and a few other providers that came up in my search. We started looking into those, but Optum generally prefers open source. Not just for the cost and the option to self-host, but for the transparency. If something breaks, we know we can look at the source code. We're not dependent on other people to fix it for us. If there is a security concern, we can look at it and say, “Nope, there isn't a security concern,” or, “There is!”
That's why, you know, we always push for open source and that's why Flagsmith won out over the other solutions because you guys support enterprise, you guys support SaaS and you are also open source.
Am I right in saying that your workflow has changed quite a lot since you started using feature flags?
Yes. One of the biggest changes was last year during the annual enrollment period. We call it “AEP” and that's our business time. Regulations allow us to sell our product only during a certain period of the year. We cannot sell our product before that. If our product leaks early, that's a violation of the contract. So when that date hits, our products must be presented to the customer at the right time with the right content/offer. If something is wrong, we get penalized. If the digital shopping experience isn’t working, or is active for customers to purchase outside of the Annual Enrollment Period, we lose money. That's by contract.
Traditionally, we had a lower environment where we tested everything as if we were within the future AEP dates. So we would set the server to that future date in the environment, but that causes all kinds of issues. How are you going to deal with things that are out there in the cloud? We can't set the cloud's date. We can't set the CDN's date either. So as our technology expanded away from our data center and more into the cloud, that sort of testing model became very, very difficult.
And of course, you’re still testing on staging. Once you push this out to the production, who knows if it's gonna work. Flagsmith changed that. We now push things into the production early without breaking our contract. Even if it's in production, Flagsmith hides that for us.
So the general public would never know, but we can go into the production environment and test it as if it were live. And all of a sudden we can prepare for the AEP a month ahead of time. That was huge for us and our QA Team loved it.
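The pattern described in this answer, shipping AEP content to production early while keeping it hidden from the public and visible to testers, can be sketched roughly as follows. This is a toy model, not Flagsmith's SDK or Optum's implementation; the `Flag` class, the flag name `aep_next_year_plans`, and the user ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    """A toy feature flag: off by default, with per-user overrides."""
    name: str
    enabled: bool = False
    overrides: dict = field(default_factory=dict)  # user id -> bool

    def is_enabled_for(self, user_id: str) -> bool:
        # A per-user override wins; otherwise use the flag's default state.
        return self.overrides.get(user_id, self.enabled)


def plans_page(flag: Flag, user_id: str) -> str:
    """Return next year's plan content only when the flag allows it."""
    if flag.is_enabled_for(user_id):
        return "next-year plans"
    return "current plans"


# The new content is deployed to production, but the flag keeps it hidden
# from everyone except a QA tester who has an explicit override.
aep_flag = Flag(name="aep_next_year_plans", overrides={"qa-tester-1": True})
```

In this sketch, flipping `aep_flag.enabled` to `True` on the enrollment start date would reveal the content to everyone without a new deployment.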
Oh, that's awesome to hear! Flagsmith really helps with the quality and stability of your application as opposed to just moving faster, right?
That's correct. In fact we can't move fast because if we move too early we get dinged for it. At the same time, we can't stay behind because if our application isn’t ready, we lose. It doesn't have to be fast, but it has to be on time.
I guess feature flags can give you a lot more confidence around that side of things?
Exactly, because we can build it early and make sure it is correct.
The term we throw around in our team is Shift-Left. We want to shift everything early, early, early; but there is a limit to what we can do. We can shift left, but we can’t publish to production…with Flagsmith, we can. We are now realizing that production is not a sacred area where we can’t ship anything. It is an environment that is more open where we can ship new things early as long as they are behind Flagsmith.
It's really interesting. It's as much about peace of mind as the speed of engineering.
It's actually all about confidence. That’s what I sold my management and QA owners on. I asked them, “When we release software, how much confidence do you have?” That number wasn't very high. Because no matter how much testing you do, the moment that hits production there is a doubt in the back of our minds that says, “Is this gonna actually work?” And that lack of confidence causes a release to take overnight, or all day, or even multiple days. Flagsmith has been the driving factor in increasing that confidence.
Beyond increasing your team’s confidence, has Flagsmith also helped to increase the number of releases your team is doing?
Yes! We are now doing a release every two weeks. The confidence that we gained from Flagsmith actually led to us being able to reach the number of releases that we were hoping for.
What’s next for your team?
Now that we’ve been able to leverage Flagsmith for testing in production, we’d like to now start using Flagsmith for A/B Testing and Feature Rollout capabilities.
Go, thank you so much for your time and for trusting us with your business. We wouldn’t be here without you!