Announcing our first public product release
by Josh Schwartz

At Phaselab, our goal is to help our customers deliver on the privacy promises they make to their customers. After over a hundred research conversations and several months of development with a small group of design partners, we’re thrilled to announce that we’re opening up our product for alpha users immediately.

Companies’ approach to data is fundamentally changing. Recent advances in AI have radically expanded what data can be used for, and as a result companies are investing more in storing rich, unstructured data for model training. Meanwhile, data regulation is only increasing: new state- and country-level privacy laws are announced every month, and AI regulation is imminent on a global scale. While these two trends may seem at odds with one another, we think that good governance is actually a force multiplier when implemented in an automated and scalable way.

Across company stages and sectors, and despite virtually everyone agreeing on the value of and need for strong privacy compliance, we’re seeing a clear and multifaceted disconnect between policy writers and technology builders. We think this is both a regulatory problem and a tooling problem, and we’re eager to work with lawyers and technologists to get under the hood of how privacy programs are implemented and to build tools that make operationalizing your privacy posture as easy, automated, transparent, and accessible as possible.

To start, we think there’s a massive gap between how policies are conceived and written and how they’re technically implemented. We’re building a translation layer that understands both JSON and jargon, logs and legalese. Building at this intersection is the founding idea of our company, and we’re excited to get started by biting off a small chunk of the problem space: unstructured data.

In our conversations with technical leaders, we often heard outright resignation about the idea of trying to keep tabs on what’s being stored in unstructured data. Unlike structured data with clearly defined content and format, unstructured stores accumulate a lot of hard-to-inspect ROT data (redundant, obsolete, trivial), kind of like the Downloads folder on your desktop. The context and rationale for storing this data quickly become lost as employees come and go, leaving in their wake a large repository of unknown risk. Legal teams want to understand and quantify this risk, but technical leaders don’t have a way to inspect and characterize this data at scale.

Today’s Release

We’re excited to release our first set of features to alpha users: a tool that identifies and characterizes data compliance issues across large volumes of unstructured cloud storage.

These features aim to bridge this gap by scanning an organization’s entire object storage footprint to help technical leaders:

  • Quickly characterize the contents of their buckets to identify data that’s redundant, obsolete, trivial, or non-compliant.
  • Identify personal data and understand where it is, what it is, how much of it there is, how long it has been there, and whether it’s a problem.
  • Assess what retention and deletion rules are in place for this data and whether they are being applied correctly.
  • Fix issues through suggested configuration and/or policy changes that minimize their footprint and risk and cut costs on an ongoing basis.
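
To make the checks in this list concrete, here is a minimal sketch in Python. It is illustrative only and is not the product’s implementation: the `StoredObject` record, the `summarize` helper, the one-year retention window in the usage note, and the email-only pattern are all assumptions made for the example. Real personal-data detection needs far more than a single regular expression.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical record describing one stored object (not a real data model).
@dataclass
class StoredObject:
    key: str
    size_bytes: int
    last_modified: datetime
    sample_text: str = ""  # a small sample of the object's contents


# Deliberately simple pattern for email-like strings; an assumption for
# illustration, not a complete personal-data detector.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def past_retention(obj: StoredObject, retention: timedelta, now: datetime) -> bool:
    """Return True if the object is older than the assumed retention window."""
    return now - obj.last_modified > retention


def has_email_like_text(obj: StoredObject) -> bool:
    """Return True if the sampled contents contain an email-like string."""
    return EMAIL_PATTERN.search(obj.sample_text) is not None


def summarize(objects: list[StoredObject], retention: timedelta, now: datetime) -> dict:
    """Summarize which objects look out of policy or may hold personal data."""
    expired = [o for o in objects if past_retention(o, retention, now)]
    flagged = [o for o in objects if has_email_like_text(o)]
    return {
        "object_count": len(objects),
        "expired_keys": [o.key for o in expired],
        "expired_bytes": sum(o.size_bytes for o in expired),
        "possible_personal_data_keys": [o.key for o in flagged],
    }
```

Calling `summarize(objects, timedelta(days=365), datetime.now(timezone.utc))` on a list of such records returns the keys that have outlived the assumed window, the bytes they occupy, and the keys whose sampled contents look like they contain an email address.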

Scanning is comprehensive and cloud native, and it takes only hours even at petabyte scale. Integration is as simple as setting up a new IAM role, and our small but mighty founding team will be there to help every step of the way.

Of course, unstructured data storage is only one small piece of the larger privacy puzzle, but we think it’s a high-leverage opportunity that isn’t well served by existing tools. In our experience working with design partners, it delivers both reduced risk and meaningful cost savings from day one of using the product. We can’t wait to show it to you!


In our first scans of customer data, we’ve found three types of opportunities to improve customers’ data practices:

  • Shadow personal data: Beyond the systems intended to store customer data, personal data easily creeps into other systems via backups, exports, ad hoc data analyses, and log files. So far we’ve found millions of records of unintended personal data for our early customers.
  • Data out of retention policy: Even companies with well-defined data minimization policies often keep data for longer than intended, especially non-production data that may never have been covered by an explicit governance policy. So far we’ve found more than 30 TB of old and unused data for our early customers.
  • Cost optimization: By highlighting data that is out of compliance and helping our customers delete it, we help them reduce their data privacy and security risks. Deleting out-of-policy data also comes with a significant cost upside: one customer has already saved more than $800/month in storage costs, allowing privacy programs to literally pay for themselves.
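
For a rough sense of how deleted data turns into monthly savings, the arithmetic can be sketched as below. This is a back-of-the-envelope illustration under stated assumptions: the function name and the price per GB-month in the usage note are hypothetical placeholders, not figures from any cloud provider or from our customers.

```python
def monthly_storage_savings(deleted_bytes: int, price_per_gb_month: float) -> float:
    """Estimate monthly savings from deleting data, using decimal gigabytes.

    price_per_gb_month is an assumed flat rate; real pricing varies by
    provider, region, and storage class.
    """
    if deleted_bytes < 0 or price_per_gb_month < 0:
        raise ValueError("inputs must be non-negative")
    deleted_gb = deleted_bytes / 10**9  # 1 GB = 10^9 bytes (decimal convention)
    return deleted_gb * price_per_gb_month
```

For example, `monthly_storage_savings(10 * 10**12, 0.02)` estimates the monthly savings from deleting 10 TB at an assumed rate of $0.02 per GB-month. Substitute the rate from your own storage bill to get a meaningful number.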

Try for yourself

Seeing what’s inside your old unstructured data can be a real eye-opener for a company. We’d love for you to try it out and give us your feedback. We’ll run one scan of your data completely free, starting today.

If you’re interested in this free data scan, add yourself to our list and we’ll get back to you with next steps right away. If you have any questions, just email David and me at
