Talking DSPM: Episode 3 - Omkhar Arasaratnam

00:00:03
Dr. Mohit Tiwari:
Omkhar, it’s a pleasure to have you on this interview.

00:00:06
Omkhar Arasaratnam:
Thank you for inviting me.

00:00:07
Dr. Mohit Tiwari:
Just as a very quick introduction: I know we met when you were at JPMorgan. You’ve since been at Google. Now you’re running OpenSSF. Especially on OpenSSF and the Linux Foundation, if you could just give us an intro to what it is and what the mission is, that would be awesome.

00:00:24
Omkhar Arasaratnam:
Absolutely. So at the Open Source Security Foundation, the OpenSSF, we’re here to make open source more secure. Open source software, as you are aware, is contained within 90% of software, if not more, out there. Everything from your mobile phone to satellites to internet routers to Mars rovers. Open source is everywhere. And as it is such a critical part of our infrastructure, it is very important for us to ensure the security and safety of that software. Open source itself is quite a distributed development environment, and as a result of that… for the last quarter century I’ve been working in various forms of software engineering through large organizations, and you always have this top-down SDLC that you can rely upon to ensure that the right security invariants are always present through your code. With open source, it’s much more organic. So we provide the right tools and create the right incentives for all members of the open source community to contribute code in a secure manner. And sometimes we do that through providing infrastructure like Sigstore and Scorecard and other OpenSSF projects. Sometimes we do that through education. So we have a free course, LFD121, which is available to everybody to learn how to develop software securely. Sometimes we do that through public sector advocacy. Just last month I was at the United Nations talking about how to create not only secure software, but a safe community around that. We’re frequently partnering with government here, with federal organizations like CISA, the National Security Council, the Office of the National Cyber Director, but also internationally. So we’ve done a lot of work with the Japanese government, with the European Union, and we believe that there are three primary stakeholders that are required to participate in order to make open source software secure.
It is the private sector, so member organizations such as Google and Microsoft and JPMorgan… the public sector, some of the colleagues that I’ve mentioned previously, as well as the open source community. So we provide convening, tooling and education so that we can all enjoy secure open source software and we can produce the right infrastructure for everyone.

00:03:00
Dr. Mohit Tiwari:
Amazing. That’s awesome. One direction that I was wondering here is it seems that the Linux Foundation or OpenSSF specifically can take a large, complex, many-stakeholder area and kind of bring order, distributed order, to the chaos, right? The field that we are in and you’ve championed and worked in for a long time is also similar. Now there are many stakeholders. Traditionally there have been many different ones. There’s a lot of noise and we are sensing that data security and all the different associated pieces, there could be value to bringing order to that. Is this something that, colloquially speaking, ‘dspm.org’ or ‘datasecurity.org’… is this something that could live under OpenSSF?

00:03:46
Omkhar Arasaratnam:
Absolutely. So we believe open source isn’t just availability of source code. It is the right licensing to allow for open collaboration. But it all begins with community, and that’s what we do our best to foster at the OpenSSF. How we do that is bringing things under neutral governance. There have been, without naming names, there’s been a long history of organizations, single organization open source projects that will pivot their licensing on a dime or take their open source project in a different direction than maybe the community desires. So while the code is available, the openness, as in the OSI definition of open source, isn’t respected. By bringing projects such as those to a foundation such as the Linux Foundation, OpenSSF, one… you have neutral governance. So it is no longer the autonomous control of one organization, but rather a distributed accountability across many parties. The second, as a precondition to bring a project within the OpenSSF, it first has to be adopted by one of our ‘workgroups,’ which inherently provides a multi-stakeholder set of interested parties that are going to foster a community around this project. Once that’s been established, the project can then be brought to our TAC, our Technical Advisory Committee, in order to get approval as an open source OpenSSF project. Once approved, there’s many benefits that these projects enjoy. So we recently, as of this fiscal year, created something called “technical initiative funding.” So once you reach a particular level of maturity within the OpenSSF, you’re able to apply for funding, and that funding can be used towards… just to recall some of the recent… some of the recent funding approvals we’ve provided for our technical initiatives could be used for cloud credits. So if you’re doing some experimentation or if you’re running a public goods service, it could be used for technical documentation.
We just approved a technical writer for one of our projects. It could be used for better engaging the community. So we have one of our technical initiatives that is sponsoring early career folks to come and work on the project as a paid fellowship. So there’s a number of different ways that once the project is adopted by the OpenSSF, that we provide many resources to the project. Of course, the standard accoutrement of GitHub organization and swag and logos and marketing effort and all that. But we… I mean, we’re a nonprofit. We’re not here to make money off of it. We quite literally put all the funds that come in back into the technical initiatives.

00:07:02
Dr. Mohit Tiwari:
Amazing. And as an outcome, can the community then look forward to having standardized ways of benchmarking or evaluating, kind of giving order to the field as well? Like I know in code security, there are similar topics, data security feels very similar in that it’s very noisy. But is this something that the community would also be able to…

00:07:23
Omkhar Arasaratnam:
Oh, absolutely. So in addition to being able to create de facto standards through these communities and organizations, there’s also a number of efforts that we put into making de facto standards de jure. So as an example, in Europe in particular, they have this legislation called the “Cyber Resilience Act.” Part of that is the European Parliament and Commission and member states being able to find international standards that can correlate to the policy statements they’ve made within the regulation. As part of this, one of the things that the OpenSSF is doing is we’re embarking on a program to take our projects that do have standards and put them through ISO certification. In doing so, you have… I don’t think, as… and, I’ve worked in international standards for a very long time, I don’t think a top-down push to say “thou shalt” always works. I don’t think a bottom-up kind of organic growth in standards is always perfect either. But being able to do both, being able to build that community and then ratify it as an international standard, is how we get things done. If left unto, you know… without the kind of guidance and shepherding that we can provide, it may not get to that level of adoption. But I think that’s definitely an area that we can partner in. And perhaps diffu… or diffuse is the wrong word, perhaps provide the right scaffolding or structure without having to rely on analysts coming up with interesting acronyms and buzzwords for their bi-monthly reports. Right? This should be something technically driven and ratified by engineers and community, not necessarily by marketing and analysts.

00:09:22
Dr. Mohit Tiwari:
Shifting gears into the data security side of things, folks may know already that you championed and led data security initiatives at JPMorgan and ran regulated cloud at Google, with just… the most stringent measures. We had a talk at CloudSec from the Google team talking about the regulated cloud, which is amazing. From your lens, if a large company, a Fortune 1000 company, is setting out on this data security journey, what are the sort of milestones they should look forward to?

00:09:55
Omkhar Arasaratnam:
As with any of these things, it depends, and forgive the consulting answer, but there’s… I’ve been… I’ve been very lucky in my career to have worked at incredibly large organizations. When working in organizations that size, it forces you to think about scalability right out of the gate. When I say scalability, I don’t mean like hundreds of thousands of transactions per second, or typical performance or distributed system scalability, but organizational scalability. When you’re trying to deploy something like data security in an organization with hundreds of thousands of employees and millions of customers, you need to be very thoughtful as to how you are doing this. So at JP Morgan… and for reference, JPMorgan Chase is about 300,000 employees. The Chase retail bank itself, which really only operates within the US, has 60 million customers. It’s a lot of people. Being able to… to the point that you were making earlier, when there’s a lack of standards, to come up with a flexible yet deterministic way to govern who should have access to the right data at the right time is a really vexing problem and a challenge in which some of the traditional methods of doing this, like “use data classification…” or “use RBAC…” rapidly fall apart. To pick on those two kind of concepts for a minute: RBAC by technical nature, can only create inclusive sets, and the moment that you want to refine a role, you have to create another role. As a result of that, what I found, having worked in many large organizations, is you’re often faced with about ten times the amount of roles as people, which means, if you recall what I said earlier: 300,000 people… 3 million roles… That’s a lot of roles. It’s not realistic for somebody, for a human to reason over what the right role is, and that in itself leads to a person creating yet another role, because they don’t want to pick the wrong role. 
So we quickly migrated to an ABAC, an ‘Attribute Based Access Control’ model where we could create in a constrained language set a relatively simple rule that could be used to assert who should have access to a particular data element. The other thing that worked in our favor, and these are things that you really need to focus on with large organizations, is find where there’s already an organizational tailwind to go and execute or resolve a business challenge. In our case, there was a very strong impetus from the Chief Data Officer to have the adoption of a standard data taxonomy across the bank. So we leveraged that from the perspective that we were able to provide them a programmatic method of identifying data tied to the taxonomy that they wanted to enforce. In doing so, came up with a very easy way, rather than engineers having to maintain architectural diagrams, that we could basically have them annotate their code in order to say: “this data is this type, this data is this type” and then as it went through the build pipeline, just automatically generate the data catalog. So that allowed us to do this ‘ABAC’ kind of work. The second thing that cropped up very quickly at this scale is traditional data classification falls apart incredibly quickly. This notion that you can kind of subdivide data into public, private, confidential, highly confidential… whatever you want to say, that was born out of defense networks and when you had physically discrete cables telling you, you know, what you could do. That worked well in those kind of operations where there was very clandestine and enclaved kind of work going on. That doesn’t work in corporate America.
We used to do this raise of hands when we would do our roadshow to get internal stakeholders bought in to adopting our technology, which was to get a raise of hands as to who knew whether a Social Security Number was ‘confidential’ or ‘highly confidential’ and the room would be about 50/50 split. And as amusing as that is, what that really reflects is that the framework, or the ontology, doesn’t serve the purpose. What we determined to be a much more effective method, getting back to the Chief Data Officer’s objective, was to authoritatively tag each element of data and then to come up with those data rules that I was mentioning earlier to express access control. So we wouldn’t have a data type called ‘PII’ because that’ll be based on whether you’re in California, whether you’re in Europe, whether you’re in India, whether… and rather, what we would do is we would tag all these elements. So this data structure has a first name, last name, address, US Social Security Number… whatever. And then we would create a data rule that would say, for CCPA, it would mean a first name and last name or address, and… and we’d build a boolean string that would express what that legislation represented. And in doing so, we had this incredible balance between having a very deterministic outcome that a computer needs, and being flexible enough to scale across millions of customers… hundreds of thousands of employees… in a way that made sense. Now, there were also a bunch of interesting distributed system challenges that we had to work through, but from an organizational perspective, that’s key. And you asked what some of the tripping hazards were with large organizations… one of the tripping hazards is oversimplification. If you try and too narrowly scope your initial foray into data security, you’ll find that you have to refactor the entire thing by the time you do your second phase.
So being able to balance the typical “analysis paralysis” versus execution, is more art than it is science. When I was doing regulated cloud at Google, the other challenge that we encountered is often regulation, especially stuff to do with, say, defense or some of these other highly regulated industries, aren’t written by engineers. Being able to find the right balance between, again, a very deterministic computer science, heavily driven infrastructure and law is one of these kind of negotiations. And through that negotiation, we had a lot of very productive discussions domestically here, with some friends over in Europe as well, in order to have a meeting of the minds between how lawyers will legislate and how computers will actually act. There was… to use one example, and out of politeness, I’ll leave the country name out of it. We were speaking with a cyber agency within a nation, and one of their major concerns, as many public sector organizations kind of default to, is they wanted to make sure all their data was present within a particular geographical boundary, down to the fact that even firewall rules should not egress that country’s boundary. And we were able to have a very productive discussion in which we reasoned over the fact that while they may feel this very nationalistic desire for that, one of the benefits of migrating to Google Cloud was rather than waiting for us to stop bad traffic at the ingress of that country, we could stop it wherever there was a Google point of presence on the planet. And that’s how Google serves many, many terabytes a second to the internet, not by allowing everything to hit a geographical boundary and then make a decision over that. While this is one example, what this comes back to is that stakeholder… bring the stakeholders along for the ride. 
So whether it be getting the right stakeholders from a business perspective with a business problem that you’re trying to solve… from a regulatory perspective, working with the right regulators and legislators… and most importantly, working with your engineers. If you build an SDK, or you build tooling, that engineers don’t want to use, they’re not going to use it. There has to be some kind of win for them. What are you doing to make your developer experience better? I was… our initial foray at JPMorgan was to make developers’ lives easier when it came to cryptography. And rather than any complex kind of data rule or data classification or whatever it was… the first problem we solved was building an SDK that had two methods: encrypt, decrypt, and a bunch of protected classes. This was within the internal JPMorgan Java Spring Boot framework, and that was it. And the reason that developers jumped on that so quickly: they didn’t have to worry about key management anymore. They didn’t have to worry about ciphers and key strength and all of… KEK, DEK, key hierarchy, key span… They didn’t have to worry about anything anymore. And most of all, they didn’t have to worry about getting audit failures for not doing that properly. They could just adopt this library and they were off to the races. So as we think about engineering for engineers, the best security isn’t done through clipboards and checklists. The best security is done through giving our developers opinionated, easy-to-use security tooling so they don’t have to worry about the intricacies of how to do cryptography correctly, or how to do authorization correctly, or how to do authentication correctly. They can worry about the cool app that they’re building or the new feature that they’re implementing. And to take that cognitive toil out of having to figure out how to do security properly is how we best serve our developers.
In fact, there was another lesson… I was talking to my buddy Christoph Kern from Google, who’s been in the internal security organization for years. And one of the ways that Google pretty much eliminated cross-site scripting through every single app that they have is by simply using a well-vetted input validation library rather than each developer having to figure that out themselves. Nobody wants to reinvent the wheel, they just want to build the next cool app.
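
Editor’s note: the tag-then-rule approach described in this answer (annotate fields in code, generate a catalog at build time, then express a regulation as a boolean rule over tags) can be sketched roughly as below. This is a minimal illustration in Python, not the system built at JPMorgan. The tag names, the `tagged` helper, the `CustomerRecord` type, the subject attributes, and the rule itself are all hypothetical, and the rule is not a legal reading of CCPA.

```python
from dataclasses import dataclass, field, fields


def tagged(tag: str):
    """Declare a dataclass field annotated with a (hypothetical) taxonomy tag."""
    return field(metadata={"tag": tag})


@dataclass
class CustomerRecord:
    # Each field carries one authoritative tag instead of a coarse
    # label such as "confidential" or "PII".
    first_name: str = tagged("person.first_name")
    last_name: str = tagged("person.last_name")
    address: str = tagged("person.address")
    ssn: str = tagged("person.us_ssn")
    branch_code: str = tagged("bank.branch_code")


def build_catalog(*record_types):
    """Collect {type name: {field name: tag}}, as a build step might."""
    return {
        t.__name__: {f.name: f.metadata["tag"] for f in fields(t)}
        for t in record_types
    }


def illustrative_privacy_rule(tags: set) -> bool:
    """Boolean data rule over tags: (first name AND last name) OR address.

    Illustrative only; a real rule would be written with legal counsel.
    """
    has_full_name = {"person.first_name", "person.last_name"} <= tags
    return has_full_name or "person.address" in tags


def can_read(subject: dict, tags: set) -> bool:
    """Hypothetical ABAC decision combining subject attributes and data tags."""
    if illustrative_privacy_rule(tags):
        return bool(
            subject.get("privacy_trained", False)
            and subject.get("purpose") == "servicing"
        )
    return True


catalog = build_catalog(CustomerRecord)
record_tags = set(catalog["CustomerRecord"].values())
```

The point of the design is that jurisdiction-specific meaning lives in the rule, not in the tag: adding another regulation means adding another rule over the same tags rather than re-labelling the data.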

00:22:14
Dr. Mohit Tiwari:
This is really good. So if I’m a company and I’m orienting on data security, identify some business user that has some tailwinds at the top. Identify what tooling can I make so that you have buy-in, not at the clipboard level, but at the implementation layer so it scales… getting both these ends sorted out is a priority. One related point: I noticed that you mentioned there’s a lot of focus on developers and how they protect customer data, which is a really important and large segment. There’s also the corp environment. We’re seeing a lot where you have Microsoft OneDrive type environments, or Google Drive, and lots of different personas in the company who work with the data and share it and so on. What principles carried over into the corporate environment?

00:23:02
Omkhar Arasaratnam:
Yeah, the corporate environment was much more difficult. And even at JP, that wasn’t something that we had… especially for office productivity tools, that was hard. And the most difficult part of that, of course, was having to contend with unstructured data. So at least in an application environment at some level, you have a structural semantic representation of what that data is. If you have a document or a presentation or a spreadsheet or email, that gets pretty knotty pretty quickly. We had some ideas for how to reason over that, extending the same idea of data tagging and then inferencing based on the content. But it was always imperfect and it was always relatively difficult. The other reason that we didn’t optimize for that first, other than it being a technically challenging problem, is that from a regulatory perspective, the most important data, certainly from the perspective of the Chief Data Officer, was what the bank called “reference data.” So if you were making a trade or a loan based on a particular risk calculation, the authoritative data, the reference data that went into determining whether you should make that trade or loan, was the most important thing, not the email you had in your inbox from Bob. So whether it be an email from Bob or a spreadsheet from Sally, that was kind of irrelevant. What really mattered was the lineage of that reference data. So that was another reason that we didn’t focus on it. And to get back to the comment about tailwinds: the big business problem was traceability of the reference data. That’s where we focused first. So I don’t have a good answer, I guess. It is definitely a difficult and challenging problem, and perhaps one where there’s opportunity for the industry to innovate.

00:25:03
Dr. Mohit Tiwari:
That makes sense. It’s actually really interesting also because these are environments where folks are layering on Copilots, which brings up the so-called confused deputy: it’s like a deputy that you have going around, ranging over unstructured data and surfacing it for others. There’s an interesting open problem, pretty much.

00:25:24
Omkhar Arasaratnam:
I think there’s a lot of interesting work in that space. I think there’s a lot of interesting opportunity. There’s also potential for pitfalls with hallucinations and whatnot. But I was on stage last year at DEF CON and one of my co-presenters, I think it was Matt Knight from OpenAI, said: while there’s a lot of hype around LLMs and what AI can provide today, whatever your opinion, just remember this. As of today, right now, the most, you know, ridiculous hallucination that you’ve seen coming out of an LLM, remember, that’s worst case scenario. It’s only going to get better. It’s not like things are getting worse. So if we think of the potential, I think it could be great. I also think the market right now is contending with what we as cynical engineers have kind of known for a while, which is, I’m sorry if this is provocative, but AI is not going to solve everything. And we’re kind of getting to that trough of disillusionment where people are realizing where it’s actually applicable. My friend Mark Russinovich from Microsoft, who’s the CTO of Azure, said, you know, what people view as a negative in terms of hallucinations can actually be quite helpful if you’re writing a script or writing a book or coming up with an interesting paper, because that kind of creativity can help make for a better read. It’s about figuring out the right ways and places to leverage this. Whereas people are trying to throw LLMs at math problems. No, math is pretty well structured. There are good ways computers can do these things. Maybe we’ll go with the traditional ways.

00:27:05
Dr. Mohit Tiwari:
That makes sense. Another interesting direction, I think you mentioned this earlier, was also authorization is a thorny long term problem. We’re trying to understand the data and get it all mapped out tagged so that we can apply either compliance driven or security driven authorization, which means there’s some implication that it has to be tied to identity systems. How would you think of that? Because in the industry at least, it seems like data security and classification, etcetera, has been traditionally very separate from identity, quote unquote, security and policies kind of stop at the roles. They don’t really push all the way to data. Is this something that you see? There are ways people have solved it, or are there any leads that you should…

00:27:53
Omkhar Arasaratnam:
So I don’t think it’s been well solved. I think the idea that these two were kind of bifurcated to begin with is a failure mode of our industry. At the end of the day, the primary idea about authorization is ensuring the right people have access to the right data at the right time. And if that’s not how you’re operating, then the entire RBAC that you were concocting for your organization was all a fallacy. So I think as we evolve, being able to balance, whether it be ABAC, whether it be RBAC, but being able to provide a performant way of reasoning over the right people, having the right access to the right data, and ensuring that that is one of the primary invariants is key. And the reason I say one of the primary invariants, what you see in non-holistic authorization schemes, is a scheme where… or a scenario in which a particular type of data, you have access to it in a particular application, and then for some reason you don’t in another application when you should, or conversely, due to poor authorization rules, you get access to the wrong thing, or a thing that you shouldn’t have. And I think where this actually comes back in terms of LLMs is being able to ensure that your prompts don’t inadvertently give you access to answers you shouldn’t have. So if you have something like an enterprise Copilot, you shouldn’t be able to say: “what’s so and so’s salary?” unless you’re appropriately authorized. But yeah, I think figuring out the right and scalable ways of doing that, and to give a specific example, the more elaborate you get with ABAC rules, while they may be easy to reason over, from a distributed systems perspective, they become very expensive at runtime. If you are going to query all these policy information points before the policy enforcement point can make an authorization decision every single time, your system is going to collapse. 
And you have to be thoughtful about things like coming up with data structures that allow you to cache some of this in a reasonable manner, like: job titles don’t change on a per transaction basis, you can probably cache some of that overnight. But finding the right balance between data freshness, caching and the CAP theorem when trying to do these runtime queries is another one of those scale challenges. And I like this point that you sort of have to try to bring these two systems together. You must, yeah. It blows up the complexity as well. That was a provocative statement about the failure mode. So one of the just two quick questions, just throwing all the way back to OpenSSF.
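
Editor’s note: the caching trade-off described above (slow-changing attributes such as job titles can be cached rather than fetched on every authorization decision) can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions: `CachedAttributeSource`, `authorize`, and the in-memory `directory` standing in for a remote policy information point are all hypothetical names, and a 24-hour time-to-live is just one possible freshness budget.

```python
import time


class CachedAttributeSource:
    """Wraps a slow attribute lookup (a policy information point) with a TTL cache."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable: key -> attribute value
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can control time
        self._cache = {}             # key -> (expires_at, value)
        self.misses = 0              # number of calls that reached the backend

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # fresh enough: no backend call
        self.misses += 1
        value = self._fetch(key)
        self._cache[key] = (now + self._ttl, value)
        return value


def authorize(user_id, required_title, titles):
    """Policy decision that reads a cached attribute instead of a live lookup."""
    return titles.get(user_id) == required_title


# Hypothetical directory standing in for a remote HR system.
directory = {"u1": "trader"}
fake_now = [0.0]
titles = CachedAttributeSource(
    directory.get, ttl_seconds=24 * 3600, clock=lambda: fake_now[0]
)

first = authorize("u1", "trader", titles)   # cache miss: fetches from the directory
directory["u1"] = "analyst"                 # the title changes upstream
stale = authorize("u1", "trader", titles)   # cache hit: still sees the old title
fake_now[0] += 24 * 3600 + 1                # advance past the TTL
fresh = authorize("u1", "trader", titles)   # cache miss: refetches the new title
```

The `stale` decision is the cost of the cache: for up to one TTL the enforcement point may act on an outdated attribute, which is the freshness-versus-load balance the answer refers to.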

00:30:43
Dr. Mohit Tiwari:
As a founder, I wasn’t aware that a cute little startup should be also talking to OpenSSF and figuring out how to get engaged. What would be like a good takeaway for other founders, whether they are small or medium companies, not just the Google and JP Morgan, of how to best engage.

00:31:24
Omkhar Arasaratnam:
So we have over 120 members. Not all of them are huge mega-corporations: we’ve got startups, we’ve got academia, we have large organizations as well. And as I mentioned, it’s going to take everybody working together on these very challenging problems for us to have success. The best way to figure out how to get involved is to go to openssf.org/getinvolved and we have a number of different ways. It is completely open to everybody. You can participate on Slack, you can join workgroup meetings, join our events. We have a number of events throughout the year. Typically a big event in North America. This year it was in Seattle in April. A big event in Europe. So in September we’ll be in Vienna. A big event in Japan. So we’ll be in Japan at the end of October. We’re also having an event in Atlanta in mid-October called “Fusion.” The reason we’re calling it Fusion: so we’re here right now at Black Hat, and Black Hat brings a lot of the CISO, corporate security professionals… It’s a trade show, much like RSA. DEF CON brings out a little more of the hacker community, the researcher community, and we’ll enjoy that later this week. FOSDEM on the open source side is this wonderful conference that’s in Brussels at the beginning of the year and really brings the academic open source kind of community together. But what was missing was this intersection between the open source community and the security communities. So Fusion will bring the two together. That’s going to be in Atlanta and also is going to bring policymakers together and all members of our community that are interested in working together on this. Open source, as I mentioned, is ubiquitous. As it is everywhere, it’s upon us as stewards of open source to ensure that it’s secure for everybody.

00:33:37
Dr. Mohit Tiwari:
Amazing. Looking forward to it. One final question as we take off. Is there anything that you’re super keen about at either Black Hat or DEF CON this week? Any session or talk or any team even…?

00:33:49
Omkhar Arasaratnam:
I have to plug AIxCC, the AI Cyber Challenge. We are challenge facilitators. It is a challenge from DARPA. And the AI Cyber Challenge has this thesis that with this kind of distributed development environment of open source software, some of our traditional security tools aren’t doing that great a job in terms of static analysis. What if we used large language models to help make open source more secure? So this year in proper DARPA fashion is the midterm. Next year will be the finals. And we are very excited because as the Open Source Security Foundation, the winning solutions will be open sourced to us as stewards next August. So I’m very excited about that. But what I’m most excited about is meeting up with colleagues such as you that I haven’t seen as regularly as I would have liked. But this is a big… this is like a family gathering for the community, and I’m really happy to participate in it.

00:34:57
Dr. Mohit Tiwari:
Thank you so much.

00:35:00
Omkhar Arasaratnam:
It’s a pleasure. Thank you.
