Security and Privacy when Building Great Products

Continuous Delivery
UXDX USA 2022

While often tackled separately, security and privacy are two sides of the same coin. They carry external implications for the company and internal complexities, which makes them one of the biggest challenges facing tech teams today. Frédéric will give key insights into how Dashlane thinks about privacy as an organization and how internal decisions have translated into what is delivered to the customer. He will also walk through scenarios showing how privacy and security are reflected in the technical stack.

Frédéric R, CTO, Dashlane

My name is Frederic Rivain. I am the CTO of Dashlane. We built a password manager to help you manage your digital identity in a safe and user-friendly way. Today, I'd like to talk to you about how we think about privacy and security at Dashlane, and why I feel it is really important if you want to build great products.

We were founded 12 years ago, and we have a globally distributed team, mostly between France, Portugal, and the US. Our products serve both consumers (B2C) and the enterprise world (B2B). And just to give you a sense of scale, we have around 20 agile product engineering teams. When we think about privacy and security, they are really two sides of the same coin for us. You can't have one without the other. And in our case, because we store very sensitive data for our customers, their logins, their passwords, their payment information, it's obviously mandatory to ensure that the data is really secure, really private, and only accessible by the users themselves. Products like password managers, in general, must be built by design for privacy and security. But even though this is true for password managers, I believe it also applies to most products to some extent. We all store sensitive information on behalf of our customers, whether it's medical information, payment information, and so on.

So let's start with the perspective of privacy. Firstly, the important thing is that you need to be clear and explicit about what data you have and how you classify it. So, as you can see on the screen, we have three main categories. The first is the customer vault. That's really the core of our product. Everything the user stores is in there, and obviously it's critically sensitive. Then the middle layer is the user data, and we have two flavors of it. The account data is things like the email which you use to create an account, and we try to minimize the amount of data that we have on that front. We don't want much and we shouldn't have much, because it's what you call PII, personally identifiable information. And then the second flavor is more traditional event data, the logs that we collect about the users of our product. And yet again, we try to be frugal about it. We do need some of it to operate, but we want to minimize it and aggregate only what is needed for us to improve the product, do marketing, and so on. Finally, in our case, we have what we call activity logs. They are the logs related to how our users use the product on a daily basis and how they interact with websites and online services. Of course, in that case, we don't want to know what each specific user is doing. I don't want to know that you're using Amazon, that you're using Dropbox, and so on. So we consider those logs as sensitive data, and they are anonymized and aggregated. But let's do a bit of a deep dive into each type of data and how we address them. First, managing the customer vault. This is very specific to our needs and our business, and the way we manage the customer vault is based on a security concept called zero-knowledge architecture. It implies that nobody can access the user data except the users themselves. Everything is built around that very simple notion, which, to be honest, is very complex to put into practice.
The short version of it is that the vault is always encrypted locally on the user's device, using a key derived from a secret that we call the master password, which only the user knows and nobody else. That master password is never transmitted to or stored by Dashlane. And that's really important, because this is what protects you as a user and what makes your data private. By the way, it also means that if you forget that master password, we can't offer a forgotten-password flow like traditional online services do. We can't give you back your access because we don't have the key. So as a user, you will have to start over if you lose that master password. Obviously, you may not have such sensitive information in your own product, but if you do, whether it is financial information, health data, or personal content, you should consider the level of encryption and the technical complexity required to offer the best level of protection for the privacy of your customers. So that's really something that's important even for you.
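To make the idea of "a key derived from the master password, on the device only" concrete, here is a minimal sketch. It is not Dashlane's implementation: the talk does not say which key derivation function or parameters are used, so PBKDF2 from the Python standard library, the iteration count, and the function names `derive_vault_key` and `new_salt` are all assumptions chosen for illustration.

```python
import hashlib
import os

# Assumption: illustrative parameters, not the settings of any real product.
ITERATIONS = 200_000
KEY_LENGTH = 32  # bytes, enough for a 256-bit symmetric key


def new_salt() -> bytes:
    """Random per-user salt. It is not secret and can be stored next to the vault."""
    return os.urandom(16)


def derive_vault_key(master_password: str, salt: bytes) -> bytes:
    """Derive the vault key locally from the master password.

    Only the salt and the encrypted vault would ever leave the device;
    the master password and the derived key stay on the client.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        ITERATIONS,
        dklen=KEY_LENGTH,
    )
```

The same password and salt always yield the same key, which is why the user can unlock the vault on any device. A server that never sees the password has nothing to hand back if it is forgotten, which matches the "start over" consequence described above.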

The two other types of data that I mentioned are generated by the client applications, and they are stored in a segregated database. On the one side, the activity logs are anonymized locally in the client before being sent to the servers. And of course, we are also dropping parts of the timestamps to avoid the risk of correlation. On top of that, we do more filtering on the server side to ensure that if there is an issue on the client side, we don't have unwanted details going up by mistake. We also try to really, really limit PII, personally identifiable information. Those are the really sensitive data points, the ones that are regulated. So the less you have, the better off you are. And because of the diversity of those logs and data, I actually run what we call a data privacy committee. It's a group made up of myself as CTO, our CSO, our general counsel, and representatives from product and the other departments. What we do in that committee is an ongoing review of our practices, and in particular we evaluate and approve the addition of new data points inside the logs. Do we actually need those new logs? What level of granularity should we have? What type of classification do they belong to? And so on. One more thing to keep in mind regarding the types of data is that with the explosion of SaaS tools that we all use, some of the data will be duplicated to third-party services, and that increases the risk of what we call contamination. So think about platforms like Salesforce or another CRM, SendGrid, and Zendesk. All those platforms create some form of privacy exposure for you, so you need to keep them in mind.
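The client-side anonymization, timestamp truncation, and server-side safety net described above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the choice to keep hour-level precision, and the allowlist contents are hypothetical and do not come from the talk.

```python
from datetime import datetime, timezone

# Assumption: hypothetical field names for illustration only.
IDENTIFYING_FIELDS = {"user_id", "email", "ip_address", "device_id"}
SERVER_ALLOWED_FIELDS = {"event", "domain_category", "timestamp"}


def coarsen_timestamp(ts: datetime) -> str:
    """Keep only the date and the hour so events are harder to correlate."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")


def anonymize_on_client(event: dict) -> dict:
    """Runs in the client: drop identifying fields and coarsen the timestamp."""
    cleaned = {k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS}
    if isinstance(cleaned.get("timestamp"), datetime):
        cleaned["timestamp"] = coarsen_timestamp(cleaned["timestamp"])
    return cleaned


def filter_on_server(event: dict) -> dict:
    """Runs on the server: keep only allowlisted fields, in case a client bug lets extra details through."""
    return {k: v for k, v in event.items() if k in SERVER_ALLOWED_FIELDS}
```

The server side uses an allowlist rather than a blocklist, so a field nobody anticipated is dropped by default instead of being stored by mistake.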

A word about the compliance side of privacy. Our world is becoming more and more regulated, so this is not something you can avoid. And you should always, yet again, try to minimize your compliance exposure. The more you can limit the need for data, for PII, for regulated data types, the better off you are. It will make things easier for you. Also, regarding compliance and audits and those types of activities, it's not always easy depending on your business, but at the very least you should ensure that you have strong segregation between the different categories of data and that you maintain an inventory of all the data items you collect. Where are they coming from? What is their sensitivity? How are you using them? Where could they be forwarded? To which third parties? And you also need to define the retention policy per data type. How long should you keep each data type? That's very important from a regulatory standpoint. There are a lot of mandatory flows in regulations like GDPR or CCPA, which is the Californian version of GDPR. So ideally you want to automate all those flows so that you respect the rules and you don't have to think about it. Just as an example: a few months back, Apple made it mandatory for iOS applications to offer in-app access to the deletion of your account. That's typically a flow that you want to automate, so that you delete the account automatically across the board, inside your own system but also across the different third-party systems where you are sending data. Another point about compliance is that all those practices should be compiled into your privacy policy, and that privacy policy should obviously be published publicly for your customers, either on your website or inside your applications. And you have a screenshot of the privacy policy here.
One thing that I really like about the Dashlane one is that there is a summary in plain English before all the legal jargon, which makes it nicer to read for our customers. That's something our general counsel implemented, and I found it a very nice tip.
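The data inventory, per-type retention policy, and automated account deletion discussed in the compliance section can be sketched as a small data structure. Everything here is hypothetical: the `DataItem` fields mirror the questions asked in the talk (source, sensitivity, usage, third parties, retention), while the example entries, retention periods, and third-party names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataItem:
    name: str
    source: str               # where it comes from
    sensitivity: str          # e.g. "pii", "event", "activity"
    purpose: str              # how it is used
    retention_days: int       # how long it may be kept
    third_parties: tuple = () # where it could be forwarded


# Assumption: example entries only, not a real inventory.
INVENTORY = [
    DataItem("email", "signup form", "pii", "account management", 365,
             ("email provider", "support desk")),
    DataItem("usage_event", "client app", "event", "product analytics", 90),
]


def is_expired(item: DataItem, collected_on: date, today: date) -> bool:
    """True once a record has outlived the retention period of its data type."""
    return today - collected_on > timedelta(days=item.retention_days)


def deletion_targets(inventory) -> list:
    """Every system an account deletion must reach: internal storage plus all third parties."""
    targets = {"internal"}
    for item in inventory:
        targets.update(item.third_parties)
    return sorted(targets)
```

Deriving the deletion targets from the inventory, rather than from a hand-maintained list, means that adding a new third party to the inventory automatically adds it to the deletion flow.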

One last thought about privacy. It must be an ongoing conversation inside the organization. We are not in a static world, so it is important to always review our practices. How can we find the best solution for the business while ensuring the privacy of our users? For that purpose, at Dashlane we have built what we call our privacy stance. It's a bit like the three laws of robotics from Asimov. We have a set of principles, from the highest order to the most specific, and the principles must be considered in order. So we start from the top, and of course, principle two cannot go against principle one. In our case, principle number one is that Dashlane as a company should never be able to access any user data. Here we are really referring to the customer vault and the fact that only the customer can access their own data. Principle number two is related to the B2B context. As an employee in an organization, you have a workspace and a personal space in your vault when you use Dashlane. Of course your admin, the organization, can know certain things, such as what's happening in your workspace, but they shouldn't know what's happening in your personal space. So you have strong segregation of that customer data. Principle number three refers to logs and account data and how we aggregate that data, as I mentioned previously. And finally, number four is about customization and personalization of the user experience. Most of the time, in our case, it needs to happen on the client side if you want to maintain privacy. If you don't have a privacy stance for your organization, I really encourage you to do the exercise. It's an interesting way to align the organization on how you think about privacy and then have a guideline for your business.

Let's look at security. Security is about keeping everybody safe. When we say everybody, it's obviously our customers first, since this is one of the key purposes of our product. But it's also about employees, and it's also about the company, the shareholders, and so on. So how do we protect everybody and keep the business running? It can be very existential in our case because we build a security product and security is one of our core features, but it can also be a unique selling point for marketing. In our case, we want security to be one of the reasons why you should use Dashlane. It can also be one for you, depending on your own product. And in our case there are also two additional important considerations. We think we have a mission to educate our users about security. Most of them will start originally because they have the pain point of managing their passwords. But hopefully, because we can educate them and teach them about security, they will onboard and raise their awareness of security and privacy as they start using Dashlane. And last but not least, our second mission is that we hope that at our level we can influence the digital world, so we can have an impact and make sure that privacy and security become more and more foundational on the Internet. An example for us is how we participate in consortiums like the FIDO Alliance, which is a group of organizations that are trying to push multi-factor authentication solutions.

Let's start by saying that security is mission impossible. We know that one day Dashlane will be breached or that we will suffer a critical security incident. So we really need to be pragmatic about it. The goal of our effort is really not to guarantee that nothing bad will ever happen, because something will. The goal is first and foremost to minimize the likelihood that something happens. We want to make it very complex and expensive for attackers to target us. It is kind of a game of cat and mouse. And if you think about it, if it's too hard for the cat to catch the mouse, the cat will go someplace else. So that way, hopefully, the hackers will go after easier targets. Number two, we want to make sure that we are as ready as possible, which means that if something happens, we are able to react to a security crisis very fast. That readiness can make a huge difference. Are you able to react fast? Do you have the right countermeasures in place, the right internal crisis organization, the right investigation capabilities? What I strongly encourage you to do is to design your own code red plan, which is the plan for what happens in that case. And you need to rehearse it and practice it with the organization. And finally, the last goal here is to minimize the impact if something bad were to happen. How can we limit the impact as much as possible for our customers and, of course, for our company?

All in all, so far so good. Touch wood, we have never had a public security breach as far as we are aware. But yet again, that doesn't mean it won't happen. So we should not stop investing in security, and we should keep rehearsing for the moment when it happens.

So if someone wanted to hack Dashlane, how would they do it? We are going to try to identify the attack vectors and the main threats against us. Doing that exercise of building your own threat model is a very important one. I strongly encourage you to do the same for your own organization and be aware of what happens if someone tries to hack you.

In our case, we have five main attack vectors against Dashlane. We are going to deep dive on each of them, but here is the high-level view. Attack vector number one is compromising the Dashlane application; that's essentially bugs and vulnerabilities. Attack vector number two is compromising the user's device, like planting malware, a Trojan horse, or a keylogger, for instance. Attack vector number three, which is the main one, is attacking the servers, compromising the server infrastructure of Dashlane. Attack vector number four is breaching our internal IT. What if someone gets into the Dashlane network and accesses our systems? And finally, last but not least, the human factor, the insider threat. What happens if someone is bribed or threatened, or an employee goes rogue?

Let's look first at the first attack vector. Well, no code is perfect. There are always bugs and issues, and those bugs and issues can be ways for malicious actors to get access to the customer's data. It could be a bug in the client application. It could be a man-in-the-middle attack to intercept communication between the server and the client and get data in transit. Here the impact is really about leaking some of the user data, and it depends on where the vulnerability is. The common theme, by the way, across all those attack vectors is that reputation is always at risk, even if the actual impact can be limited. The impact on Dashlane's reputation and the communication that you have to do during a security crisis are always very critical to us, and they should be to everybody. But there are many ways to try and mitigate that risk. Most of them relate to best practices of the secure development lifecycle, for instance code analysis or code reviews, having multiple layers of security review, and removing old code continuously. I just want to highlight two of them. The main one for us is obviously the concept of zero-knowledge architecture. It is a very simple principle, but it's really a way for us to be protected and to make the life of hackers very hard. The second one that I want to mention is bug bounty programs. Even if you have a team with dozens of security engineers, there will never be enough engineers to look at everything. You can use bug bounty programs. It's a way for you to have more eyeballs on your code and more eyeballs on your platform, and it will give you a greater chance of finding those vulnerabilities and bugs before the hackers find them. We do use bug bounty programs, one of the main ones being HackerOne, and we ask the security researchers to investigate our code and try to find vulnerabilities. Our head of security likes to call those platforms the Uber of security.
And even if you're a small start-up or an early-stage company, I strongly encourage you to start with a smaller program on one of those bug bounty platforms. It's really a good start.

The second attack vector is a compromise of the user's device. What if the user's laptop or mobile is infected by malware or a keylogger? In that case, the attacker can steal the user's data directly from the device's memory. I think it's kind of game over already, since the attacker is already inside the house. They have already been given the key to the house, so they can do whatever they want. And unfortunately, at Dashlane, there is not so much we can do to prevent this. At the end of the day, it's really more of an OS-level challenge to mitigate those malicious attacks. Still, we try to do our best. We try to make the life of the attacker more complicated by using techniques such as obfuscation in memory, hardware encryption, advanced two-factor authentication, and so on. But that's a tricky one, to be honest.

The third attack vector is the servers. An attacker could try to penetrate the servers we host on AWS, and they could try to steal the encrypted user files that we store on our servers on behalf of our customers. In practice, I don't think that's the most effective attack, because by doing so, the attackers would get access to millions of small encrypted files without having any of the keys. That's thanks to our zero-knowledge architecture. They could, of course, try to brute force each of those files, but that would require a massive amount of computing power, and I'm not sure it's efficient. On that front, there are very traditional server-side security hardening practices. It's a well-known activity. You don't need to reinvent the wheel. You need to leverage the best practices from the industry along with compliance standards like PCI-DSS and SOC 2. In our case, we also benefit from all the built-in security from AWS and the many years the tech community has spent working on and improving security. So it's really about doing things right in that case.
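The brute-force argument above is a back-of-the-envelope calculation, and it can be sketched as one. The entropy of a master password and the attacker's guessing rate are pure assumptions here (the talk gives no numbers); the sketch also assumes each vault has its own key, which is what makes the attacker repeat the work per file.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_seconds(entropy_bits: int, guesses_per_second: float) -> float:
    """Expected time to guess one secret: on average, half of the search space."""
    return 2 ** (entropy_bits - 1) / guesses_per_second


def expected_years_for_all(entropy_bits: int,
                           guesses_per_second: float,
                           vault_count: int) -> float:
    """Each vault has its own key, so the effort is repeated for every stolen file."""
    per_vault = expected_seconds(entropy_bits, guesses_per_second)
    return per_vault * vault_count / SECONDS_PER_YEAR
```

Calling `expected_years_for_all` with your own assumptions for password strength, guessing rate (which a slow key derivation function deliberately lowers), and number of vaults shows how quickly the total cost grows with the number of files.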

The fourth attack vector is more sensitive. If someone were to compromise the internal IT infrastructure, that would be a bigger deal for us. Here, the concern is not really intellectual property. It's more about someone accessing, for instance, the pipeline of our engineering team and being able to plant malicious code in the software. We would then see Dashlane applications shipped with malicious software inside, a backdoor or something along those lines. That's the traditional supply chain attack. We had the SolarWinds incident a while back, which was a typical example of that type of attack. As a start-up, it's the trickiest one, because you need to find the right balance between strict IT security practices and employee productivity. We still need to go fast as a start-up. Of course, we lean more toward the security side because we are a security product, and that's why we have 2FA everywhere. We have strong network security and threat detection mechanisms. We try to put in place tight IT processes such as least-privilege access: the fact that you should only have access to the minimum you need for your job, only the systems that are required for you to do your job, and not much more. But still, it's always a trade-off, a balance to find between the productivity of the company and security.
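Least-privilege access combined with "2FA everywhere" can be sketched as a default-deny check. The roles, system names, and the `is_allowed` function are hypothetical; the talk names the principle but not how it is implemented.

```python
# Assumption: hypothetical roles and systems for illustration only.
ROLE_PERMISSIONS = {
    "support": {"ticketing"},
    "engineer": {"source_control", "ci"},
    "release_manager": {"source_control", "ci", "release_signing"},
}


def is_allowed(role: str, system: str, second_factor_verified: bool) -> bool:
    """Default deny: access requires a verified second factor and an explicit grant."""
    if not second_factor_verified:
        return False
    return system in ROLE_PERMISSIONS.get(role, set())
```

An unknown role or an unlisted system falls through to a refusal, so forgetting to configure something fails closed rather than open.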

The last attack vector has a similar impact to the IT breach. One comes from the outside, the other one from the inside, but the risk is similar. Of course, we trust our employees, but for their own safety and the safety of the company, we need to think about it. What if one or several employees were bribed with money or threatened, and one of them went rogue? What could they do to harm our customers and our company? Yet again, the sensitive system here is our software factory. So we make sure we have a very secure pipeline with full traceability. You need multiple approvals to ship code to production. We sign the Dashlane applications to guarantee their integrity. The goal here is really to make sure that one employee alone cannot ship a corrupted Dashlane build.
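The two controls mentioned above, multiple approvals and signed builds, can be sketched as follows. This is a simplified illustration, not Dashlane's pipeline: the number of required approvals is an assumption, and an HMAC with a shared key stands in for real code signing, which normally relies on asymmetric keys and platform signing tools.

```python
import hashlib
import hmac

REQUIRED_APPROVALS = 2  # assumption: the talk only says "multiple approvals"


def can_ship(author: str, approvers: set) -> bool:
    """The author's own approval never counts, so one person alone cannot ship."""
    independent = approvers - {author}
    return len(independent) >= REQUIRED_APPROVALS


def sign_build(artifact: bytes, signing_key: bytes) -> str:
    """Stand-in for code signing: a keyed digest of the build artifact."""
    return hmac.new(signing_key, artifact, hashlib.sha256).hexdigest()


def verify_build(artifact: bytes, signature: str, signing_key: bytes) -> bool:
    """Reject any artifact whose content changed after it was signed."""
    expected = sign_build(artifact, signing_key)
    return hmac.compare_digest(expected, signature)
```

The approval gate covers the human side (no single employee can push a change through), while the signature check covers tampering between the pipeline and the user's device.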

That's it for today. I want to leave you with three main takeaways. Very basic ones. The first one is that privacy and security are not a topic just for the security or the legal teams. It really is a company conversation. It needs to be something everybody is accountable for, and you need to have everybody in the organization talking about privacy and security. Number two, it will not happen by magic. There is work, there is effort that needs to be put behind privacy and security. So you should be proactive about it. You should invest some of your time in this. And finally, last but not least, most of the time it's really about common sense. You don't have to reinvent the wheel. Just do what seems to be the most obvious in your own context. Keep it simple, but fully integrate it from the start into everything you do.

Thank you for listening. Be safe. Be the ambassadors of privacy and security in your own organization. And of course, if you don't already, use a password manager. Thank you very much.