Learn how Digit leverages the Figure Eight platform to solve for data drift
Transaction data changes every day. Digit uses human-in-the-loop machine learning practices to make sure they’re up to date on all of it.
Digit analyzes their customers’ spending and automatically saves them money. They’ve saved their users over a billion dollars since their launch in 2013 by using sophisticated spending and transaction models to identify when and how much money can safely be moved to savings accounts without adversely affecting their users’ day-to-day lifestyle.
To automatically save their users money, Digit needs to understand what every transaction is. The problem is, transaction data is messy. For example, sometimes, it’s just a string of numbers. Other times, keywords are abbreviated and tough to understand.
But what’s really difficult about this kind of data is that it changes over time. New companies start every day and old ones change their billing names. Customers move and get new landlords or employers. Digit’s transaction data provider also made changes of its own that Digit could not control. It boils down to this: new, unseen data crops up each and every day, and a model built in 2016 simply won’t be as accurate in 2018 as it was at its creation.
This concept is sometimes called “data drift.” It refers to the fact that while you may have captured the ground truth when you built your model, that ground truth changes. And while an NLP practitioner understands that new slang and idioms appear constantly, we don’t often think about financial technology companies struggling with this. But Digit learned it was something they’d need to solve to keep their users happy and saving.
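One rough way to see this kind of drift is to measure how many incoming transaction descriptors never appeared in the data a model was trained on. The sketch below is purely illustrative and is not Digit's method: the function names, the normalization rule, and the sample descriptors are all assumptions made for the example.

```python
def normalize(descriptor: str) -> str:
    """Lowercase a descriptor and keep only letters, collapsing whitespace.

    Hypothetical rule: digits and punctuation are treated as noise, so
    'SQ *COFFEE 0412' and 'sq coffee 9999' map to the same key.
    """
    letters_only = "".join(ch if ch.isalpha() else " " for ch in descriptor.lower())
    return " ".join(letters_only.split())


def unseen_rate(training_descriptors, new_descriptors) -> float:
    """Fraction of new descriptors whose normalized form was never seen in training."""
    known = {normalize(d) for d in training_descriptors}
    incoming = [normalize(d) for d in new_descriptors]
    if not incoming:
        return 0.0
    unseen = sum(1 for d in incoming if d not in known)
    return unseen / len(incoming)


# Made-up sample data, for illustration only.
training = ["SQ *COFFEE 0412", "ACME RENT"]
this_month = ["sq coffee 9999", "NEWCO LLC", "ACME RENT", "ZED PAY"]
rate = unseen_rate(training, this_month)
```

A rate that climbs month over month would suggest the model is seeing more and more data it was never trained on.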
The solution is simple: when your ground truth has changed, change your ground truth. And for Digit, that meant labeling more training data.
By annotating new data for their transaction models, Digit is able to categorize and learn from the fresh, unfamiliar information their old model was struggling with. They used Figure Eight to label the same kinds of data they built their original algorithms on, which keeps those models current. Those businesses that just popped up in the last year? Those companies that changed names? Those pesky changes from their original data provider? Now Digit understands them all. And since Digit runs on AWS, those models are updated seamlessly, allowing them to stay current whenever their model needs a tweak.
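A common human-in-the-loop pattern for this kind of refresh is to send only the transactions a model is unsure about to human annotators, then fold the resulting labels back into the training set. The sketch below shows that idea under stated assumptions: the confidence threshold, the `toy_predict` stand-in model, and the merge step are hypothetical, and nothing here reflects Digit's actual pipeline or the Figure Eight API.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Tuple[str, float]  # (predicted category, confidence between 0 and 1)


def select_for_labeling(
    transactions: List[str],
    predict: Callable[[str], Prediction],
    threshold: float = 0.8,
) -> List[str]:
    """Return the transactions whose prediction confidence falls below the threshold."""
    return [t for t in transactions if predict(t)[1] < threshold]


def merge_labels(training_set: Dict[str, str], human_labels: Dict[str, str]) -> Dict[str, str]:
    """Return a new training set in which fresh human labels override older entries."""
    updated = dict(training_set)
    updated.update(human_labels)
    return updated


def toy_predict(transaction: str) -> Prediction:
    """Hypothetical stand-in model: confident only about descriptors it already knows."""
    known = {"ACME RENT": "housing", "SQ COFFEE": "dining"}
    if transaction in known:
        return known[transaction], 0.95
    return "unknown", 0.30


# Made-up descriptors: only the unfamiliar one is queued for annotation.
queue = select_for_labeling(["ACME RENT", "NEWCO LLC", "SQ COFFEE"], toy_predict)

# Labels returned by annotators (invented here) are merged before retraining.
training = merge_labels({"ACME RENT": "housing"}, {"NEWCO LLC": "income"})
```

Routing only low-confidence items keeps labeling effort focused on the data the old model handles worst.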
By running more data and iterating constantly on their algorithms, Digit knows that they can stay ahead of changing ground truth instead of catching up after the damage is done.
Put simply, having a model that understands this ever-changing landscape of transactional data is a key driver for Digit’s success. Solving for data drift means their customers get a better product that works for them. It keeps active users active, reducing churn, and helps make Digit indispensable for users who rely on it for saving money automatically.