“The more data the better” and “whoever has the most data wins” are common perceptions in artificial intelligence (AI). But winning is really about more information, not more data, because data is usually full of inefficient redundancies. Throwing more data at the problem means higher costs for your budget and for the environment. The billions of dollars being invested in new AI processors and better algorithms are not enough as costs continue to spiral upward. You need information efficiency. With Summary Analytics’ mathematically proven AI techniques, you can bring big data down to size by summarizing and prioritizing your data without loss of fidelity, delivering better insight while reducing time, cost, and bias for any form of data analytics. The savings can be orders of magnitude! Now you can use AI in ways never before possible.
As AI models get more complex, the amount of data needed to train them, and therefore the computation, storage, and energy costs, are growing exponentially. Billions of dollars are being invested in new processor technology and new algorithms, but costs are still increasing. A new tool for “information efficiency” is needed as well.
Insiders know that data labeling is the skeleton in the closet for nearly every AI project. Automated labeling is mediocre at best, and manual labeling is plagued with errors caused by operator fatigue, largely from the repetition of redundant data. Labeling random samples reduces training accuracy by missing corner cases. Mathematically eliminating redundant data, however, would mean less labeling while still keeping all the information content. This approach would also result in fewer errors, as a smaller set of unique records keeps human labelers more vigilant.
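For illustration only, here is a minimal sketch of the general idea of shrinking a labeling queue by dropping near-duplicate records. This is a generic baseline, not Summary Analytics’ method; the function name, similarity threshold, and synthetic data are all hypothetical:

```python
import numpy as np

def drop_near_duplicates(X, threshold=0.95):
    """Keep a record only if its cosine similarity to everything already
    kept stays below `threshold`. Naive O(n * kept) pass, for illustration."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    kept = []
    for i, x in enumerate(Xn):
        if not kept or float((Xn[kept] @ x).max()) < threshold:
            kept.append(i)
    return kept  # indices of the records worth sending to human labelers

# Toy pool: 1,000 records plus a near-identical copy of each.
rng = np.random.default_rng(0)
base = rng.normal(size=(1_000, 64))
pool = np.vstack([base, base + 0.01 * rng.normal(size=base.shape)])
queue = drop_near_duplicates(pool)  # roughly 1,000 survive; the copies are dropped
```

Even this crude threshold pass halves the hypothetical labeling workload; a principled information-based selection, as described above, goes much further.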
Data labeling is not the only area of data analytics where human fatigue causes errors. Cybersecurity and financial decision making are just two examples of AI applications prone to too many alerts and the fatigue errors that follow. Prioritized summary alerts would help analysts focus on the most important alerts and avoid boredom and fatigue-driven mistakes, while linking redundant alerts to those prioritized in the summary would ensure that nothing is missed.
Not a day goes by without a press article on the evils of AI and ML due to data bias. But how do you find the unknown bias in your overwhelmingly large datasets? A tool is needed to identify the corner cases in your data, ascertain that ALL desired constituents are included, and make sure those corner cases are not dominated by the majority data when training your models.
Summary Analytics was founded in 2018 by Professor Jeff Bilmes of the University of Washington in Seattle, after more than 25 years of research in artificial intelligence (AI) and submodular optimization. From his work in AI, Jeff saw how the computational power required to train state-of-the-art AI models was growing exponentially, just as Moore’s Law continued losing steam. The problem was being addressed primarily through machine learning algorithmic advances and increased parallel compute power, and these approaches help. More was needed, however, to stop runaway AI analytics costs and delays! A new, complementary strategy was needed: information efficiency.
Jeff realized that submodular optimization was a potential answer. This mathematical technique can order data by diminishing marginal returns, prioritizing records according to their contribution to the information content of the entire data set and relegating redundant data to the end. If submodular optimization could be automated and greatly simplified, it could quickly and economically reduce the amount of data required for many AI processes by orders of magnitude!
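To make the idea concrete, here is a minimal sketch of greedy maximization of a facility-location function, a textbook submodular objective, on hypothetical synthetic feature vectors. This is not Summary Analytics’ proprietary CaSM code, and the function name and data are assumptions for illustration:

```python
import numpy as np

def greedy_summary(X, k):
    """Order k records of X by diminishing marginal returns under a
    facility-location objective f(S) = sum_i max_{j in S} sim(i, j).
    Each pick adds the most not-yet-covered information, so redundant
    records fall to the back; for monotone submodular objectives this
    greedy choice is within (1 - 1/e) of optimal (Nemhauser et al., 1978).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = np.clip(Xn @ Xn.T, 0.0, None)  # nonnegative cosine similarities
    covered = np.zeros(len(X))           # covered[i] = max sim(i, j) over chosen j
    order, remaining = [], list(range(len(X)))
    for _ in range(min(k, len(X))):
        # Marginal gain of candidate j: how much it raises total coverage.
        gains = [np.maximum(covered, sim[j]).sum() - covered.sum() for j in remaining]
        best = remaining.pop(int(np.argmax(gains)))
        order.append(best)
        covered = np.maximum(covered, sim[best])
    return order

# Toy usage: pick the 50 most informative of 1,000 synthetic feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 32))
summary = greedy_summary(X, 50)
```

The first records in the returned ordering carry the most new information, and near-duplicates land at the end; that is exactly the prioritization by diminishing marginal returns described above.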
Jeff and his team got to work and developed proprietary calibrated submodular (CaSM) functions, which dramatically reduce the labeling and training data required for AI. The CaSM functions don’t replace AI algorithms; they simply make machine learning run much faster, since the data sets are vastly smaller yet still contain all the important information. And the calibration process is easy for any kind of featurized data, whether health records, customer profiles, network logs, biological signals, sensor data, or even images, audio, and video streams. The team productized this breakthrough technology and delighted their first customers in the summer of 2020.
Copyright © Summary Analytics, Inc.