What is a Software Audit and Why Should Every Company Conduct One?
The core principle behind open source software is software freedom: the right to access the source code and know what’s in the software you’re using. In this podcast, the open source software attorneys at Traverse Legal discuss why the fundamental right of software freedom is so important and why companies should perform audits of open source code to avoid expensive legal liabilities.
What is an open source license audit?
Enrico Schaefer: Welcome to Tech Law Radio. My name is Enrico Schaefer. And today we are going to be talking about software audits. It’s an issue that a lot of tech companies, a lot of software development companies, don’t really think much about. But they should. What is in your source code? Well, number one, what’s in your source code is copyrighted lines of code.
Who owns the copyright to your code? Now, there’s a lot of different issues we could dig into today, but today we’re going to be talking about some very important aspects of doing due diligence on your software code. And the best way to do that is to do a software audit.
And that means actually doing a line-by-line analysis of your code to make sure that you are not violating any third-party copyrights or licenses, that you have the rights you think you do in your code, and that you resolve any issues you find. Today we have Russell Gelvin. Russell is a licensing and software attorney for Traverse Legal. He is an expert in software auditing. Welcome to the show, Russell.
Russell Gelvin: Hi, Enrico. Thank you for having me back on the podcast.
Enrico Schaefer: Yeah, no problem. So, Russell, let’s talk a little bit about a software audit. What is a software audit?
Russell Gelvin: I mean, really it’s exactly what it sounds like. You’re just taking a look at your code to make sure you’re not violating anybody else’s rights or any applicable laws. In the context of source code, you’ve got a lot of legal issues you can run into. We’ve all heard about emissions scandals and things like that at auto companies, where code was written to do various nefarious things.
That sort of thing is a little bit more difficult to look out for, but probably a more likely scenario that companies would find themselves running into are violations of open source or other third-party licenses that they might not even know are in their code. Nowadays it’s basically impossible to really write code without leveraging some open source tools.
And for that reason it’s really important to at least give your code a check, especially if you’re distributing it. If you’re distributing your code to any third parties, for example if you’ve got an application that you’re selling to third parties, then I would definitely recommend a source code audit, because distribution can trigger a lot of really high-risk open source licenses that, quite frankly, could result in you losing any and all profits attached to your software.
For example, if it turns out that you violated an open source license and then, therefore, you’re now in the territory of copyright infringement, it’s very realistic to say that the person whose code you’re using could come back and sue you and demand all your profits that you made off of that software because you’ve infringed their copyright.
Enrico Schaefer: So let’s just kind of break that up a little bit. Sometimes we’ll get approached by a software development company. And they’re like, “Hey, we’ve got this code. We’ve been developing it for years.” And one of my first questions, Russell, is always like, well, how many coders were involved in this project, right? And I can only think of one instance over the last couple of decades where in a complex software situation the person said, “Just me.”
Typically there are lots of people contributing to code, or the company actually outsourced the development of the code. What that means is that the owner of the company, the owner of the intellectual property that is the proprietary software, may not know exactly where all the code came from. And if they work with outside developers, what rights did those developers have in the code that they uploaded into your system?
If you have internal developers, where did they get the code? What did they use from potentially open source software in order to perform their job to get the code functionality that the company’s looking for? So let’s talk a little bit about that. Knowing where the keystrokes came from is critical, isn’t it?
Russell Gelvin: Yeah, and it’s really difficult. Like you said, there could be tons of developers. You might even have had some offshore development companies that are doing the work and then just sending you the code. If you get sent millions and millions of lines of code, you can’t just go through and read that by hand. That’s impossible. So as a manager or somebody running a business, you do have to worry.
Are my programmers writing all the code themselves? They might have gone and downloaded some of the code. How do I know that? Do I have to sit there and monitor my programmers all day long while they’re writing code? The simple answer is, no, you don’t have to do that because there are tools — there are even open source and free tools that you can use to sort of do a scan of your code, which will essentially identify particular keywords in the text that will allow you to identify a lot of your legal risk.
Some of these tools are really powerful. One important thing they’ll do is help you identify any copyright notices. That’s important for two reasons. A, if there’s a copyright notice that means that somebody else might own the code and you might not have written that code. And, B, a lot of the open source licenses are going to actually require you to reproduce that copyright notice when you go to distribute your software.
And there is case law in federal courts that says that simply failing to put that copyright notice on your redistributions constitutes a breach of the open source license, and, in some jurisdictions, you’re looking at both damages for breach of contract and damages for copyright infringement, which can be significant.
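The keyword scan Russell describes can be approximated in a few lines. What follows is a minimal Python sketch, not any particular product: the regular expression, the file suffixes, and the function names find_copyright_notices and scan_tree are illustrative assumptions. It flags lines that look like copyright notices so a reviewer can check who owns that code and which notices must be reproduced on distribution.

```python
import re
from pathlib import Path

# Illustrative pattern (an assumption, not taken from any specific scanner).
# Matches lines such as "Copyright (c) 2015 Jane Doe" or "Copyright 2019-2021 Acme Inc."
COPYRIGHT_RE = re.compile(
    r"copyright\s+(?:\(c\)\s*|©\s*)?(\d{4}(?:\s*-\s*\d{4})?)\s+(.+)",
    re.IGNORECASE,
)

def find_copyright_notices(text):
    """Return (line_number, year_or_range, holder) for each notice-like line."""
    notices = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = COPYRIGHT_RE.search(line)
        if match:
            # Strip trailing comment markers such as "*/" from the holder name.
            holder = match.group(2).strip().rstrip(" */")
            notices.append((lineno, match.group(1), holder))
    return notices

def scan_tree(root, suffixes=(".c", ".h", ".py", ".js", ".java", ".go")):
    """Map each matching source file under root to the notices found in it."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            notices = find_copyright_notices(text)
            if notices:
                results[str(path)] = notices
    return results
```

Calling scan_tree on an unpacked source directory returns a dictionary keyed by file path. Each hit is a prompt for human review, not a legal conclusion.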
When should I audit my source code for open source licenses?
Enrico Schaefer: So it’s easier really, Russell, to take a look at your code once it’s completed, once you’ve got those thousands, tens of thousands, hundreds of thousands, millions of lines of code, go ahead and take a look backwards at what’s in the code. You mentioned the software tool. I know you run our clients through a process where you take a look at the lines of code that are potentially triggers based on keywords to identify potential legal issues.
So let’s talk a little bit about that software tool. Let’s say I’m a client. I have a software technology company. I’ve been in the market for a long time. But now I’m about to take on an investment round. Or I’m about to look for an exit valuation. And I want to maximize my valuation on exit.
So one of the things I’m going to do is go look for an attorney who specializes in software licensing, and potentially even an attorney who specializes in open source, to make sure there are no open source licenses in my code, because open source presents a whole special set of liability issues if you happen to have proprietary software. So I go out there, and I say, “Okay, I found Russell Gelvin. He is an attorney who specializes in these software audits and licensing issues.”
I provide you my source code. You’re going to run that through your software. And it’s going to spit out what?
Russell Gelvin: Let me back up really quick. I don’t want to say that it’s better to do the audit after you’ve written millions and millions of lines of code because you might have created so many issues at that point that you can’t go back and undo them. And we’ve certainly seen that with some of our clients. I would say definitely what you want to do is have a policy in place.
You want to make sure you educate your employees just on the basic rules of how open source works. Here’s a couple of open source licenses — the most common licenses. And you can tell them these licenses are fine. Just make sure you document them. Make sure you keep records of which licenses you’re using. And then you can say, “Certain licenses are not fine.”
Like for a distributed mobile app, you do not want to be putting GPL 3.0 code in your binary, for sure. So definitely it’s something that you should start from the beginning, you should be mindful of from the beginning. And, yes, you can do a scan of the code. But if it’s millions and millions of lines of code that could be quite a process. It’s not undoable, but you might end up with a lot of work if there are thousands and thousands of open source licenses you have to sort through.
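A written open source policy like the one Russell recommends can also be kept in a machine-checkable form. This Python sketch is hypothetical: the POLICY table, its license lists (given as SPDX identifiers), and the helper names are illustrative assumptions, not a recommended policy. It sorts each dependency into approved, prohibited, or needs review, and records every decision so the team keeps the documentation the policy asks for.

```python
# Hypothetical policy keyed by how the software ships.
# The license lists are illustrative examples using SPDX identifiers.
POLICY = {
    "distributed_mobile_app": {
        "allowed": {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"},
        "prohibited": {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only",
                       "GPL-3.0-or-later", "AGPL-3.0-only", "AGPL-3.0-or-later"},
    },
}

def check_license(spdx_id, context):
    """Classify one license under the policy for the given distribution context."""
    rules = POLICY[context]
    if spdx_id in rules["prohibited"]:
        return "prohibited"
    if spdx_id in rules["allowed"]:
        return "approved"
    # Anything not on either list goes to a person (or counsel) for review.
    return "needs review"

def record_dependency(inventory, name, spdx_id, context):
    """Check a dependency and append the decision to the inventory (the paper trail)."""
    status = check_license(spdx_id, context)
    inventory.append({"name": name, "license": spdx_id, "status": status})
    return status
```

Keeping unknown licenses in a "needs review" bucket, rather than silently approving or rejecting them, mirrors the idea that some licenses are fine, some are not, and the rest need a closer look.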
So coming to how an audit works, what you’ll do is start with the source code. Usually it’s easiest to just package it all together in a nice little zip file or tar file or something like that. There are tools that generally run as servers to which you can upload your code, and you can download the software and set up one of these servers yourself.
I wouldn’t say it’s easy, but it’s not writing code from scratch for sure. Once you run it through, there are a couple of different options of how you can have your information displayed. Essentially the most important thing you’re going to want to get is a listing of all the licenses. And the software is really good because it’ll look for key terms or words, the names of the licenses, but also some of the key phrases that help identify the unique licenses.
You’ll get a listing of all the open source licenses in your code. From there the first step is you need to identify if there are any problem licenses. So in the context of a distributed mobile application, you want to avoid anything that’s going to be copyleft, because copyleft means you have to make your own source code available to anybody you give the application to.
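The audit flow described here (package the source into a zip or tar file, scan it for license names and key phrases, list the licenses, then pick out the copyleft ones) can be sketched in Python. This is a deliberately small keyword scanner written for illustration; the KEY_PHRASES table, the copyleft prefixes, and all function names are assumptions. Dedicated scanners use far larger rule sets, and naive phrase matching over-reports (the LGPL text, for instance, mentions the GNU General Public License by name), so the output is a starting list for review rather than a finding.

```python
import re
import tarfile
import zipfile
from collections import Counter

# Illustrative key phrases -> license label (assumed, far from complete).
KEY_PHRASES = {
    "GNU AFFERO GENERAL PUBLIC LICENSE": "AGPL",
    "GNU LESSER GENERAL PUBLIC LICENSE": "LGPL",
    "GNU GENERAL PUBLIC LICENSE": "GPL",
    "Apache License": "Apache",
    "Permission is hereby granted, free of charge": "MIT",
    "Redistribution and use in source and binary forms": "BSD",
}
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")
COPYLEFT_PREFIXES = ("GPL", "AGPL", "LGPL")

def detect_licenses(text):
    """Return license labels suggested by key phrases or SPDX tags in text."""
    normalized = " ".join(text.split()).lower()  # collapse line breaks and spacing
    found = {label for phrase, label in KEY_PHRASES.items() if phrase.lower() in normalized}
    found.update(SPDX_RE.findall(text))
    return found

def scan_files(files):
    """Count, per license label, how many files mention it. files yields (name, text)."""
    counts = Counter()
    for _name, text in files:
        counts.update(detect_licenses(text))
    return counts

def iter_archive(path):
    """Yield (member_name, text) for each regular file in a .zip or .tar(.gz) archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info).decode("utf-8", errors="replace")
    else:
        with tarfile.open(path) as tf:  # default mode auto-detects compression
            for member in tf.getmembers():
                if member.isfile():
                    data = tf.extractfile(member).read()
                    yield member.name, data.decode("utf-8", errors="replace")

def flag_copyleft(counts):
    """List the detected labels that belong to copyleft license families."""
    return sorted(label for label in counts if label.startswith(COPYLEFT_PREFIXES))
```

For a hypothetical archive named release-source.tar.gz, scan_files(iter_archive("release-source.tar.gz")) returns how many files mention each license, and flag_copyleft on that result lists the labels to deal with first.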
Assessing the risk of open source software
Enrico Schaefer: So let me just stop you there. A mobile application — I develop this proprietary mobile application, right? And I’m selling it for 99 cents a download on the app store or through my mobile device download. Now you’re saying I have to provide to the end user not only the functionality of the application of the compiled application but I’ve got to give them the source code?
Russell Gelvin: Not only do you have to give them the source code, they are allowed to do whatever they want with it. They can modify it. They can go build their own app. And then they can redistribute it and do whatever they want with it. So in the context of proprietary software, it is not something you want to deal with because your right to control your intellectual property, the underlying copyrights in the code and in the audio-visual displays of the user interface, that’s what you have value in.
You spent all your time creating that intellectual property. If you’re unable to control redistribution, if you’re unable to control a modification, then you’re really losing your ability to monetize it.
Enrico Schaefer: Yeah, so that’s really important. So I hire you as my attorney. You’ve run the software program. You’ve run my software through your software program to do this software audit. It spits out these results. Your next step is to tell me where the problems might be, what the risk is in my code based on the output. I take it you’re also going to tell me what my options are in order to reduce the risk or eliminate the risk from my code.
Russell Gelvin: Exactly. So my risk assessment is going to contain a comprehensive listing of all licenses identified. I will tell you how many instances of each license, and I’ll tell you any terms or conditions you’re going to have to comply with. All that would be included in the risk assessment. We’ll basically try to take all that legalese and translate it into plain English. Really short bullet points. Quick do’s and do-not’s to make sure you’re in compliance.
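A risk assessment of the kind described here pairs each license with its instance count and a few plain-English obligations. The Python sketch below assumes the counts come from an earlier scan; the OBLIGATIONS summaries and the risk_report name are abbreviated illustrations, not complete statements of what each license requires.

```python
# Abbreviated, illustrative summaries; real ones come from reading each license.
OBLIGATIONS = {
    "MIT": ["Keep the copyright notice and permission notice with all copies."],
    "Apache-2.0": [
        "Include a copy of the license with redistributions.",
        "Keep existing copyright, patent, trademark, and attribution notices.",
        "Mark files you modified, and pass along any NOTICE file contents.",
    ],
    "GPL-3.0-only": ["Copyleft: distributing a work based on this code "
                     "requires offering its corresponding source under the GPL."],
}
DEFAULT = ["No summary on file; review the full license text."]

def risk_report(counts):
    """Render {license_id: instance_count} as a plain-text report, most frequent first."""
    lines = []
    for license_id, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{license_id}: {n} instance(s)")
        for item in OBLIGATIONS.get(license_id, DEFAULT):
            lines.append(f"  - {item}")
    return "\n".join(lines)
```

Licenses without a prepared summary fall back to a "review the full text" bullet, so nothing silently drops out of the report.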
That being said, there are going to be certain open source licenses that you don’t want to just accept a risk for. The GPL, the AGPL, even the LGPL under certain circumstances could have that copyleft triggered. So, yeah, my first advice for a distributed software is going to be we need to remove all that copyleft code. There are various methods we can employ using different legal standards.
The simplest of course would be to just remove it. If it’s something that’s non-essential to the functionality of your software, you can just delete it. You can remove those libraries or whatever. Other options would be to reach out to the copyright holder, the person who wrote that open source code and say, “Hey, can I have permission to use this under a different open source license?”
You’d be surprised at how willing a lot of programmers will be to just give you a less restrictive license. I’ve seen that happen. That’s really easy. Sometimes they’ll do it without even asking anything from you. They’ll just do it because you asked nicely. Another option is to look at whether the code is actually protected by copyright. That’s something where we would work in detail with you, looking at the code to see whether, for example, your use might qualify as fair use.
It might be so little code (they might have written so little in that one file) that it doesn’t necessarily qualify for copyright protection, things like that. So we would definitely be able to work with you to get those licenses out of your code. And there are a lot of options. It just really depends on how many you have. If 90 percent of your code is covered by GPL licenses, that’s going to be more difficult to monetize.
But typically if you’re writing proprietary software, you likely would have written a substantial amount of the code yourself. So once we pull out all those copyleft licenses, we’re just left with less restrictive open source licenses: Apache 2.0, MIT, BSD, common licenses that don’t have those copyleft provisions. From here the terms are not onerous, but you can still mess up.
So there are certain things you’re going to have to look out for. For example, with the Apache license, if you file a lawsuit alleging that the Apache-licensed code infringes a patent, any patent licenses you were granted under that Apache license automatically terminate, and now you might be liable for patent infringement yourself. So things like that. The most important thing you’ll probably want to keep in mind is that you need acknowledgements of the copyright notices and the licenses for all the open source code you have.
Usually as like an appendix to my risk assessment, I’m going to provide a document that you’re going to want to attach or include in your terms for your software. Or you could have it in your user interface. Just somewhere your end users are likely to see it. And it’s just saying, hey, this software may contain code from the following licensors under the following licenses.
And we try to keep these acknowledgements as clean and simple as possible, but sometimes you do end up with several pages of just acknowledgements because with open source code there can be a lot of contributors. And you’re going to want to make sure that you acknowledge each one of those contributors.
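Assembling the acknowledgements appendix is mostly bookkeeping. This Python sketch assumes each component’s name, copyright notice, license identifier, and full license text have already been collected; the Component class and build_acknowledgements function are hypothetical. It reproduces the notice and license text for every component rather than deduplicating, which is one reason such documents can run to several pages.

```python
from dataclasses import dataclass

HEADER = ("This software may contain code from the following licensors "
          "under the following licenses.")

@dataclass
class Component:
    name: str              # open source package name
    copyright_notice: str  # notice exactly as it appears in the package
    license_id: str        # e.g. an SPDX identifier such as "MIT"
    license_text: str      # full license text shipped with the package

def build_acknowledgements(components):
    """Return a plain-text acknowledgements appendix, ordered by license then name."""
    out = [HEADER, ""]
    for c in sorted(components, key=lambda c: (c.license_id, c.name.lower())):
        out.append(f"{c.name} ({c.license_id})")
        out.append(c.copyright_notice)
        out.append("")
        out.append(c.license_text.strip())
        out.append("-" * 40)  # separator between components
    return "\n".join(out)
```

The resulting text can be attached to the software’s terms or shown in the user interface, wherever end users are likely to see it.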
Who does this matter to?
Enrico Schaefer: This is all kind of basic to us as software and technology attorneys, but it is surprisingly unfamiliar territory to technology companies. When you were working with Ford, they obviously had a whole due diligence program that ran every piece of software through this process to make sure they were managing their risk in terms of software licenses.
Yet there are very few other technology companies who tend to be thinking about these issues. Number one, who is this going to be important to? And why is it important? So I assume that if you’re the software technology company, you’re protecting yourself against potentially getting sued for copyright infringement or license violation?
Russell Gelvin: Yeah, absolutely. So the owner of the software company definitely has an interest in making sure that they have the right to distribute, monetize, or otherwise use the software they’re developing. But quite honestly, from what I’ve seen the people that really care about this are going to be the investors, the people who are putting the money down.
So these audits, I’ve seen them usually come up in merger and acquisition transactions where a hot, new software startup has this amazing app and a larger company or just an investment company or something is either looking to invest in the company or acquire the company. And they’ll say, “Okay, what is your policy for open source? Do you have any open source?”
And sometimes these software companies will be like, “What do you mean, policy for open source? We were just hitting the ground running. We were focused on developing. We were focused on growing. Like we don’t have time for that. Our programmers are working around the clock trying to get this done. Like we didn’t have time to sit there and double check every commit to our GitHub server to see if there was any open source in it.”
And the investors are right to ask, because they’re the ones taking the risk. If I buy a software company and a couple of months later it turns out that it doesn’t own anything proprietary, gets hit by a lawsuit over open source software, and all of a sudden can’t sell anything, the value of the company goes to zero. That’s something I’m concerned about.
Quite frankly we’re seeing more and more of these lawsuits. Just a couple of years ago, there were only a handful of these opinions on open source in the federal courts. Now we’re pushing probably close to 100 cases, and that’s probably all within the last 10 years. So it’s only getting bigger and bigger. As people start to realize how serious some of these terms can be and the consequences of violating these licenses, I think we’re going to see more and more litigation and more and more concern about open source.
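Checking every commit by hand is unrealistic, as the developers in this example point out, but a lightweight automated check is not. The Python sketch below is a hypothetical pre-commit helper: it reads the staged diff with git diff --cached and flags added lines that look like third-party copyright or license notices, skipping lines that name the company’s own copyright holder. The marker pattern, the own_holder parameter, and the function names are assumptions; a hit means "look at this," not "this is a violation."

```python
import re
import subprocess
import sys

# Illustrative markers of third-party notices (an assumption, not exhaustive).
MARKERS = re.compile(
    r"copyright|SPDX-License-Identifier|GNU (?:Affero |Lesser )?General Public License",
    re.IGNORECASE,
)

def flag_added_lines(diff_text, own_holder=""):
    """Return added lines in a unified diff that look like third-party notices."""
    flagged = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # new file section: "---"/"+++" header lines follow
        elif line.startswith("@@"):
            in_hunk = True   # hunk body: "+" now marks an added line
        elif in_hunk and line.startswith("+"):
            content = line[1:]
            if own_holder and own_holder in content:
                continue     # the company's own notice is expected
            if MARKERS.search(content):
                flagged.append(content.strip())
    return flagged

def staged_diff():
    """Return the staged changes with no context lines."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def main(own_holder):
    """Exit code for a pre-commit hook: nonzero blocks the commit."""
    hits = flag_added_lines(staged_diff(), own_holder)
    for hit in hits:
        print(f"Possible third-party notice: {hit}", file=sys.stderr)
    return 1 if hits else 0
```

A hook script could call sys.exit(main("Example Corp")), where "Example Corp" is a placeholder for the company’s own copyright holder name, so that a flagged commit stops until someone looks at it.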
Enrico Schaefer: So if I’m the owner of the company or an investor in a company that is developing software, I want to complete a software audit at every stage along the way to make sure that I’m not taking on unnecessary risk and putting myself in a position to fix any issues as we go. If I’m an investor looking to put money into a software development company, then I want to make sure there’s a software audit before I purchase so I know what I’m buying and whether or not I’m getting what I think I’m paying for.
And if I’m purchasing a company that has any software that it’s developed or any software that it claims as providing value to the company, proprietary or otherwise, I want to do a software audit on a line-by-line basis of the code in order to make sure that, again, I’m not taking on any unnecessary risk and that I’m getting the intellectual property asset that I think I’m getting and I’m looking at more than just what the revenue from the software is.
I’m seeing whether or not I can get sued for copyright infringement or for violating copyleft terms when I actually purchase the company. And this could apply not only to software development companies but to any company that lists software as an asset.
Russell Gelvin: Absolutely. So as an investor I would definitely just be looking out for are they aware of open source? Do they have any policy in place? Like do they take this seriously? To me, I think when I see software developers taking open source licensing seriously and understanding it, it really shows a level of professionalism and just that they have it together, they know what they’re doing.
So just as an investor, having a startup or a software developer that’s taking it seriously from the start, I think that’s a good sign. But even the best programmers, even the most conscientious developers, make mistakes. I’ve done probably over a thousand audits already. Some of the best programmers, I’ve seen them make mistakes where they submit an audit request and tell me which license applies.
I do my scan, and they couldn’t be further from the truth. Once you do a deep dive into the code, there could be any number of licenses because these projects have various contributors. And in a lot of cases, open source involves the combination of smaller open source projects into one big project. And then you’re dealing with dozens of different licenses for every different [smaller] project.
There could be a different license with different terms. What you’ve got to remember is when you go to download open source, the most common Web site you’re going to get your open source code from is GitHub. You’ll have that license on the front page. But that’s really just the top-level license. That license applies to the whole packaged code you’re downloading.
But there are very likely going to be additional open source licenses in the sub-modules, in the subcomponents of that package. And each one of those licenses applies to whatever particular code it’s attached to. I’ve had situations where I asked the developer what the licenses are, and they tell me there are no licenses whatsoever.
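The point about top-level versus sub-module licenses can be made concrete: the license file nearest to a source file is the first place to look for the terms that govern it. This Python sketch works on a list of paths (for example, the member names of an uploaded archive); the set of license file names, the nearest-file rule, and the function names are simplifying assumptions, and individual files may carry their own license headers, which this sketch ignores.

```python
from pathlib import PurePosixPath
from typing import Optional

# File names treated as license files (an illustrative, incomplete set).
LICENSE_NAMES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING", "COPYING.txt"}

def license_files(paths):
    """Map each directory containing a license file to that file's path."""
    found = {}
    for p in paths:
        pp = PurePosixPath(p)
        if pp.name in LICENSE_NAMES:
            found[str(pp.parent)] = p  # if a directory has several, the last one wins
    return found

def governing_license(source_path, found) -> Optional[str]:
    """Return the nearest license file in source_path's directory or any parent."""
    for parent in PurePosixPath(source_path).parents:
        if str(parent) in found:
            return found[str(parent)]
    return None

def licenses_in_use(paths):
    """Group source files by the license file that most directly covers them."""
    found = license_files(paths)
    groups = {}
    for p in paths:
        if PurePosixPath(p).name in LICENSE_NAMES:
            continue
        key = governing_license(p, found) or "(no license file found)"
        groups.setdefault(key, []).append(p)
    return groups
```

A package whose front page shows one license can still produce several groups here, one for each nested license file.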
And then upon further research we find some pretty high-risk licenses. Or sometimes they can just be confusing, what licenses apply. Especially in the context of freeware licenses. A lot of the things you get from IBM and Microsoft, it can be a labyrinth to just even figure out what licenses apply to what code. And then licenses are constantly changing.
So you want to keep up-to-date. You want to do annual reviews for open source that has been known to switch licenses. We’ve seen cases of this; one example is the Java Runtime Environment. That had been free since its inception, and then Oracle decided to start charging license fees for it. You can still use the older versions of the Java Runtime Environment, but newer Oracle releases (I think after Version 9) require a paid license for commercial use, which is a big risk. It goes from being free to being a commercial product you have to pay for. For a larger enterprise, that could cost millions in additional license fees, because the JRE is used to run everything.
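An annual review can start by comparing this year’s license inventory against last year’s. The Python sketch below assumes each inventory is a simple mapping from component name to recorded license; the license_changes name and the component names in the example are placeholders. Any component whose license changed, appeared, or disappeared is returned for review.

```python
def license_changes(previous, current):
    """Compare {component: license} snapshots; return {component: (before, after)}.

    None on either side means the component was added or removed.
    """
    changes = {}
    for name in sorted(previous.keys() | current.keys()):
        before, after = previous.get(name), current.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes

# Example with placeholder data: a runtime whose license moved from free to paid.
last_year = {"example-runtime": "free binary license", "json-lib": "MIT"}
this_year = {"example-runtime": "commercial license", "json-lib": "MIT",
             "new-lib": "Apache-2.0"}
for name, (before, after) in license_changes(last_year, this_year).items():
    print(f"{name}: {before} -> {after}")
```

Unchanged components (json-lib here) are left out, so the review can focus on what moved.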
How do I get started with an open source license audit?
Enrico Schaefer: Let’s say I own a company or I’m an investor or I’m looking at purchasing a company, and I want to do a software audit. How is it that I get a hold of Russell Gelvin to help me understand what’s involved and what the cost and fees will be in order to run that software audit?
Russell Gelvin: You can definitely just reach out to our firm Traverse Legal at our Web site traverselegal.com. There are some forms you can fill out just giving us some quick information about what you’re looking for. And then we can definitely get back to you. What we can do is we’ll get you the necessary confidentiality provisions in place. Obviously anything you send to your attorney is going to be covered by attorney-client privilege.
Enrico Schaefer: People still worry about that.
Russell Gelvin: Yeah, people still worry about that.
Enrico Schaefer: It’s their code. It’s their baby.
Russell Gelvin: It’s their code. So this is us right now promising that we take very good care of your code. We respect your code. We’re just here to help out. We’re your attorney. We’re here to protect you. Your code is certainly safe in our hands. Just send us over your code. Like I said, a zip or a tar usually works best. I can run it in my tool, and that will give me an initial scan.
And I can tell you, hey — that’ll give me an idea of what we’re working with. Usually the cost and size of these audits is driven by the size of the software code base. So if it’s several hundred megabytes of source code, that could be a pretty sizable audit. Most audits aren’t too bad. Once I get a listing of licenses, if it’s a couple hundred licenses, that’s going to be one price.
But if I do the scan and only see one license, that’s probably only going to be a couple hours, which isn’t going to cost you too much. If it saves you from an open source lawsuit down the road, there’s value in that, trust me.
Enrico Schaefer: Yeah, there’s value in that. And there’s value in being able to say we’ve done the audit, our code is clean, and here’s the documentation to show it.
Russell Gelvin: Yeah.
Enrico Schaefer: Think about that. Think about your IP value on exit or on any other valuation [event]. Being able to show that your code is clean, that other licenses are not being violated is going to add real money to the top of the pile. So all very valuable stuff, Russ. Thank you for being on the show today. I know that you love open source and you’re a big part of the open source community.
So I appreciate you giving us your time today. And we will see you next time on Tech Law Radio. Until then, have a great day.