Richard Craib is a 29-year-old South African who runs a hedge fund in San Francisco. Or rather, he doesn’t run it. He leaves that to an artificially intelligent system built by several thousand data scientists whose names he doesn’t know.
Under the banner of a startup called Numerai, Craib and his team have built technology that masks the fund’s trading data before sharing it with a vast community of anonymous data scientists. Using a method similar to homomorphic encryption, this tech works to ensure that the scientists can’t see the details of the company’s proprietary trades, but also organizes the data so that these scientists can build machine learning models that analyze it and, in theory, learn better ways of trading financial securities.
“We give away all our data,” says Craib, who studied mathematics at Cornell University in New York before going to work for an asset management firm in South Africa. “But we convert it into this abstract form where people can build machine learning models for the data without really knowing what they’re doing.”
He doesn’t know these data scientists because he recruits them online and pays them for their trouble in a digital currency that can preserve anonymity. “Anyone can submit predictions back to us,” he says. “If they work, we pay them in bitcoin.”
The company comes across as a Silicon Valley gag. All that’s missing is the virtual reality.
So, to sum up: They aren’t privy to his data. He isn’t privy to them. And because they work from encrypted data, they can’t use their machine learning models on other data—and neither can he. But Craib believes the blind can lead the blind to a better hedge fund.
Numerai’s fund has been trading stocks for a year. Though he declines to say just how successful it has been, due to government regulations around the release of such information, he does say it’s making money. And an increasingly large number of big-name investors have pumped money into the company, including the founder of Renaissance Technologies, an enormously successful “quant” hedge fund driven by data analysis. Craib and company have just completed their first round of venture funding, led by the New York venture capital firm Union Square Ventures. Union Square has invested $3 million in the round, with an additional $3 million coming from others.
Hedge funds have been exploring the use of machine learning algorithms for a while now, including established Wall Street names like Renaissance and Bridgewater Associates as well as tech startups like Sentient Technologies and Aidyia. But Craib’s venture represents new efforts to crowdsource the creation of these algorithms. Others are working on similar projects, including Two Sigma, a second data-centric New York hedge fund. But Numerai is attempting something far more extreme.
The company comes across as some sort of Silicon Valley gag: a tiny startup that seeks to reinvent the financial industry through artificial intelligence, encryption, crowdsourcing, and bitcoin. All that’s missing is the virtual reality. And to be sure, it’s still very early for Numerai. Even one of its investors, Union Square partner Andy Weissman, calls it an “experiment.”
But others are working on similar technology that can help build machine learning models more generally from encrypted data, including researchers at Microsoft. This can help companies like Microsoft better protect all the personal information they gather from customers. Oren Etzioni, the CEO of the Allen Institute for AI, says the approach could be particularly useful for Apple, which is pushing into machine learning while taking a hardline stance on data privacy. But such tech can also lead to the kind of AI crowdsourcing that Craib espouses.
On the Edge
Craib dreamed up the idea while working for that financial firm in South Africa. He declines to name the firm, but says it runs an asset management fund spanning $15 billion in assets. He helped build machine learning algorithms that could help run this fund, but these weren’t all that complex. At one point, he wanted to share the company’s data with a friend who was doing more advanced machine learning work with neural networks, and the company forbade him. But its stance gave him an idea. “That’s when I started looking into these new ways of encrypting datalooking for a way of sharing the data with him without him being able to steal it and start his own hedge fund,” he says.
The result was Numerai. Craib put a million dollars of his own money in the fund, and in April, the company announced $1.5 million in funding from a group that included Howard Morgan, one of the founders of Renaissance Technologies. Morgan has invested again in the Series A round alongside Union Square and First Round Capital.
It’s an unorthodox play, to be sure. This is obvious just when you visit the company’s website, where Craib describes the company’s mission in a short video. He’s dressed in black-rimmed glasses and a silver racer jacket, and the video cuts him into a visual landscape reminiscent of The Matrix. “When we saw those videos, we thought: ‘this guy thinks differently,’” says Weissman.
As Weissman admits, the question is whether the scheme will work. The trouble with homomorphic encryption is that it can significantly slow down data analysis tasks. “Homomorphic encryption requires a tremendous about of computation time,” says Ameesh Divatia, the CEO of Baffle, a company that’s building encryption similar to what Craib describes. “How do you get it to run inside a business decision window?” Craib says that Numerai has solved the speed problem with its particular form of encryption, but Divatia warns that this may come at the expense of data privacy.
According to Raphael Bost, a visiting scientist at MIT’s Computer Science and Artificial Intelligence Laboratory who has explored the use of machine learning with encrypted data, Numerai is likely using a method similar to the one described by Microsoft, where the data is encrypted but not in a completely secure way. “You have to be very careful with side-channels on the algorithm that you are running,” he says of anyone who uses this method.
Turning Off the Sound at a Party
In any event, Numerai is ramping up its effort. Three months ago, about 4,500 data scientists had built about 250,000 machine learning models that drove about 7 billion predictions for the fund. Now, about 7,500 data scientists are involved, building a total of 500,000 models that drive about 28 billion predictions. As with the crowdsourced data science marketplace Kaggle, these data scientists compete to build the best models, and they can earn money in the process. For Numerai, part of the trick is that this is done at high volume. Through a statistics and machine learning technique called stacking or ensembling, Numerai can combine the best of myriad algorithms to create a more powerful whole.
Though most of these data scientists are anonymous, a small handful are not, including Phillip Culliton of Buffalo, New York, who also works for a data analysis company called Multimodel Research, which has a grant from the National Science Foundation. He has spent many years competing in data science competitions on Kaggle and sees Numerai as a more attractive option. “Kaggle is lovely and I enjoy competing, but only the top few competitors get paid, and only in some competitions,” he says. “The distribution of funds at Numerai among the top 100 or so competitors, in fairly large amounts at the top of the leaderboard, is quite nice.”
Each week, one hundred scientists earn bitcoin, with the company paying out over $150,000 in the digital currency so far. If the fund reaches a billion dollars under management, Craib says, it would pay out over $1 million each month to its data scientists.
Culliton says it’s more difficult to work with the encrypted data and draw his own conclusions from it, and another Numerai regular, Jim Fleming, who helps run a data science consultancy called the Fomoro Group, says much the same thing. But this isn’t necessarily a problem. After all, machine learning is more about the machine drawing the conclusions.
In many cases, even when working with unencrypted data, Culliton doesn’t know what it actually represents, but he can still use it to build machine learning models. “Encrypted data is like turning off the sound at the party,” Culliton says. “You’re no longer listening in on people’s private conversations, but you can still get very good signal on how close they feel to one other.”
If this works across Numerai’s larger community of data scientists, as Richard Craib hopes it will, Wall Street will be listening more closely, too.