About the Problems We Address

  • AI financial and environmental costs

    As AI models grow more complex, the amount of data needed to train them, and therefore the computation, storage, and energy costs, grows exponentially. Billions of dollars are being invested in new processor technology and new algorithms, but costs are still increasing. A new tool for “information efficiency” is needed as well.

  • Data labeling

    Insiders know that data labeling is the skeleton in the closet of nearly every AI project. Automated labeling is mediocre at best, and manual labeling is plagued with errors from operator fatigue, largely caused by repetition across redundant data. Labeling random samples reduces training accuracy by missing corner cases. Mathematically eliminating redundant data, however, would mean less labeling while still keeping all the information content. This approach would also produce fewer errors, as a smaller set of unique records keeps human labelers more vigilant.

  • Human analyst alert and decision fatigue

    When dealing with data analytics, data labeling is not the only area where human fatigue causes errors. AI used for cybersecurity and financial decision making are just two examples of applications prone to too many alerts and the fatigue errors that result. Prioritized summary alerts would help analysts focus on the most important alerts and avoid boredom and the resulting fatigue errors, while linking redundant alerts to those prioritized in the summary would ensure that nothing is missed.

  • Data bias

    Not a day goes by without a press article on the evils of AI and ML due to data bias. But how do you find the unknown bias in your overwhelming datasets? A tool is needed to identify the corner cases in your data, ascertain that ALL desired constituents are included, and make sure those corner cases are not dominated by the majority data when training your models.

About the Company

Summary Analytics was founded in 2018 by Professor Jeff Bilmes of the University of Washington in Seattle, after more than 25 years of research in artificial intelligence (AI) and submodular optimization. From his work in AI, Jeff saw how the computational power required to train state-of-the-art AI models was growing exponentially, just as Moore’s Law continued losing steam. The problem was being addressed primarily through machine learning algorithmic advances and increased parallel compute power, and while these help, more was needed to stop runaway AI analytics costs and delays. A new complementary strategy, namely information efficiency, was needed.

Jeff realized that submodular optimization was a potential answer. This mathematical technique can order data according to diminishing marginal returns, prioritizing the records that contribute most to the information content of the entire data set and relegating redundant data to the end. If submodular optimization could be automated and greatly simplified, it could quickly and economically reduce the amount of data required for many AI processes by orders of magnitude!
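To make the idea concrete, here is a minimal sketch of how greedy submodular optimization orders data by diminishing marginal returns. It is not Summary Analytics’ proprietary method; it uses a standard facility-location objective, where each selection’s gain is the extra “coverage” it adds over points already chosen, so near-duplicate records end up with tiny gains and fall to the back of the ordering.

```python
import numpy as np

def greedy_submodular_order(X, k):
    """Greedily order k rows of X by marginal gain under a
    facility-location function: f(S) = sum_i max_{j in S} sim(i, j),
    using cosine similarity between rows."""
    X = np.asarray(X, dtype=float)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = U @ U.T                       # pairwise cosine similarities
    n = sim.shape[0]
    best = np.zeros(n)                  # each point's best similarity to the chosen set
    chosen, gains = [], []
    for _ in range(k):
        # marginal gain of each candidate = coverage it adds beyond 'best'
        gain = np.maximum(sim - best, 0.0).sum(axis=1)
        gain[chosen] = -np.inf          # never re-pick a chosen record
        j = int(np.argmax(gain))
        chosen.append(j)
        gains.append(float(gain[j]))
        best = np.maximum(best, sim[j])
    return chosen, gains

# Two near-duplicate records and one distinct record: the duplicate
# adds almost nothing once its twin is selected.
X = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
order, gains = greedy_submodular_order(X, 3)
```

Running this on the tiny example, the distinct record is selected second and the redundant near-duplicate last, with a marginal gain close to zero, which is exactly the diminishing-returns behavior that lets redundant data be dropped.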

Jeff and his team got to work and developed proprietary calibrated submodular (CaSM) functions, which dramatically reduce the labeling and training data required for AI. The CaSM functions don’t replace AI algorithms; they make machine learning run much faster because the data sets are vastly smaller yet still contain all the important information. And the calibration process is easy for any kind of featurized data, whether health records, customer profiles, network logs, biological signals, sensor data, or even images, audio, and video streams. The team productized this breakthrough technology and delighted their first customers in the summer of 2020.


Contact Information