Is Open Source Machine Intelligence Simply Impossible?
Competitive machine intelligence requires massive data sets, and massive data sets are expensive. That implies competitive AI is possible only at giant, for-profit corporations, not in the open-source world. What can we do about this?
A decade ago, we in the free and open-source community could build our own versions of pretty much any proprietary software system out there, and we did. Publishing, collaboration, commerce, you name it. Some apps were worse and some were better than their closed alternatives, but much of what we built was clearly good enough to use every day.
But is this still true? For example, voice control is clearly going to be a primary way we interact with our gadgets in the future. Speaking to an Amazon Echo-like device while sitting on my couch makes a lot more sense than using a web browser. Will we ever be able to do that without going through somebody’s proprietary silo like Amazon’s or Apple’s? Where are the free and/or open-source versions of Siri, Alexa and so forth?
The trouble, of course, lies not so much in the code as in the training. Even the best speech recognition code won't be competitive unless it has been trained on about as many millions of hours of example speech as the closed engines from Apple, Google and the rest have been. How can we do that?
The same problem exists with AI generally. There's plenty of open-source AI code, but how good is it unless it is trained and retrained on gigantic data sets? We don't have those in the FLOSS world, and even if we did, would we have the money to run gigantic graphics-card farms 24×7? Will we ever see truly open AI: not black-box machinery guarded closely by some overlord company, but something that we "can study how it works, change it so it does our computing as we wish," along with all the other values embodied in the Free Software Definition?
If you are pondering the same thing, come by! You'll find me at OSCON, Digital Identity World, the World Economic Forum, Enterprise Data World and other events.