Floating-Point Arithmetic for AI Inference - Hit or Miss?

Qualcomm

Tuesday, April 11, 2023

OnQ Blog

Artificial intelligence (AI) has become pervasive in our lives, improving our phones, cars, homes, medical centers, and more. As currently structured, these models primarily run in power-hungry, network-dependent data centers. Running AI on edge devices such as smartphones and PCs would improve reliability, latency, privacy, network bandwidth usage, and overall cost.

To move AI workloads to devices, we need to make neural networks considerably more efficient. Qualcomm has been investing heavily in the tools to do so, most recently showcasing the world's first Stable Diffusion model running on an Android phone. Bringing models like GPT, with their hundreds of billions of parameters, to devices will require even more work.

The Qualcomm AI Research team has been advancing deep learning model efficiency for the past several years, with state-of-the-art results in neural architecture search, compilation, conditional compute, and quantization. Quantization, which reduces the number of bits needed to represent information, is particularly important because it enables the largest effective reduction in the size of weights and activations, improving power efficiency and performance while maintaining accuracy. It also helps enable use cases that run multiple AI models concurrently, which is relevant for industries such as mobile, XR, automotive, and more.
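
To make the idea concrete, here is a minimal sketch of symmetric INT8 weight quantization (an illustrative example only, not Qualcomm's implementation): float weights are mapped onto 8-bit integers with a single per-tensor scale, and the rounding error introduced can be measured directly.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: 8 bits instead of 32."""
    scale = np.max(np.abs(w)) / 127.0              # representable-range hyper-parameter
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(512, 512)).astype(np.float32)   # toy weight tensor
q, scale = quantize_int8(w)
print("mean absolute rounding error:", np.mean(np.abs(w - dequantize(q, scale))))
```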

Recently, a new 8-bit floating-point format (FP8) has been suggested for efficient deep-learning network training. Because some layers of a neural network can be trained in FP8 rather than the incumbent FP16 and FP32 formats, FP8 promises a substantial efficiency improvement for training. However, integer formats such as INT4 and INT8 have traditionally been used for inference, offering an optimal trade-off between network accuracy and efficiency.

We investigate the differences between the FP8 and INT8 formats for efficient inference and conclude that the integer format is superior from a cost and performance perspective. For transparency, we have also open-sourced the code for our investigation.

Differences between floating-point and integer quantization

Our whitepaper compares the efficiency of floating-point and integer quantization. For training, the floating-point formats FP16 and FP32 are commonly used, as they offer sufficient accuracy and require no hyper-parameters. They mostly work out of the box, making them easy to use.

Reducing the number of bits greatly improves the efficiency of networks, but the ease-of-use advantage disappears. For formats like INT8 and FP8, you have to set hyper-parameters for the representable range of the distributions. To recover the original network accuracy, you also have to spend some extra time quantizing these networks: either with a few simple quantization steps, called post-training quantization (PTQ), or by training the network with quantization in the loop, called quantization-aware training (QAT).
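
The sketch below makes the range hyper-parameter concrete: a minimal PTQ-style min/max calibration (illustrative only; production toolchains use more sophisticated range estimators) derives the scale and zero-point of an asymmetric unsigned INT8 quantizer from a small set of calibration activations.

```python
import numpy as np

def calibrate_minmax(calibration_batches):
    """Estimate the representable range from calibration data (the PTQ hyper-parameters)."""
    lo = min(float(b.min()) for b in calibration_batches)
    hi = max(float(b.max()) for b in calibration_batches)
    scale = (hi - lo) / 255.0                      # 255 integer steps for unsigned 8-bit
    zero_point = int(round(-lo / scale))           # integer code that represents 0.0
    return scale, zero_point

def quantize_uint8(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

rng = np.random.default_rng(1)
batches = [np.maximum(rng.normal(size=(32, 128)), 0.0) for _ in range(4)]   # ReLU-like activations
scale, zp = calibrate_minmax(batches)
print("scale:", scale, "zero_point:", zp)
print("quantized sample:", quantize_uint8(batches[0][:1, :8], scale, zp))
```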

Given that most training in the industry is currently conducted with entire networks in FP32, or sometimes in FP16 with mixed precision, running parts of a network in FP8 is an appealing potential speed-up for the costly and time-intensive training procedures in deep learning. This topic has gained considerable traction lately, so we set out to determine what this development means for efficient inference on edge devices. Specifically, we look at both the hardware considerations for the formats and the effect of the chosen formats on neural network accuracy.

Our whitepaper shows that the hardware implementation of the FP8 format is between 50% and 180% less efficient than INT8 in terms of chip area and energy usage. This is because of the additional logic needed to accumulate floating-point formats compared with integer formats. The range is broad because the actual efficiency depends on many hardware design choices that vary greatly. A similar conclusion was reached recently by Microsoft and Meta: floating-point arithmetic is simply much less efficient than integer arithmetic.

This means that FP8 will have to be significantly more accurate than INT8 to be worthwhile from a hardware-efficiency perspective.

Quantization-aware training (QAT) results

Quantization-aware training is the quantization scenario closest to how a format like FP8 would be used in practice: you train with the format while optimizing your neural network. We show the QAT results below for the different tested formats. All quantized networks get close to their original floating-point performance, and in most cases we even see an improvement over the FP32 baseline. The reason is simply that training these models for longer generally improves results, even if we were to keep training in FP32.
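
Before looking at the results, here is a minimal sketch of the fake-quantization mechanism behind QAT, assuming PyTorch (illustrative only, not Qualcomm's training recipe): weights are rounded to INT8 in the forward pass while gradients flow through to the FP32 weights via a straight-through estimator.

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized to INT8 in the forward pass."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.05)

    def forward(self, x):
        scale = float(self.weight.detach().abs().max()) / 127.0
        w_q = torch.fake_quantize_per_tensor_affine(
            self.weight, scale, 0, -128, 127)       # rounds forward, straight-through backward
        return x @ w_q.t()

layer = FakeQuantLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = ((layer(x) - target) ** 2).mean()
loss.backward()                                      # gradients update the FP32 shadow weights
opt.step()
print("one QAT step done, loss:", float(loss))
```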

The results are quite clear: INT8 tends to perform better than other formats for most types of networks. Only for transformers does FP8 perform better, but in the paper we delve deeper into transformers and show that this difference is easily mitigated. The conclusion, however, is simple: there is no a priori reason to believe that the FP8 format is more accurate for neural networks. In some cases, even going as low as 4-bit weights with the W4A8 format (as indicated in the rightmost column of Figure 2) yields accuracy comparable to the FP8 format.

Can we convert FP8 to INT8 with good accuracy?

Since there are some benefits to using the FP8 data format for training, we also investigated the performance when networks trained in FP8-E4 (an FP8 format with four exponent bits) are converted naively to INT8 for inference. We found that INT8 can precisely represent roughly 90% of the range covered by the FP8-E4 format without any quantization error. The remaining 10% of the range, close to zero, incurs a small quantization error.

The general conclusion is that for networks that were originally easy to quantize from FP32 to INT8, the conversion is expected to be smooth, and can in several cases be done directly.

For networks that were already problematic to convert from FP32 to INT8 with simple PTQ techniques, mostly networks with significant outliers, similar issues arise when converting from FP8 to INT8. However, because these networks were trained to cope with the reduced precision of the FP8 format, converting them from FP8 to INT8 gives better results than converting them directly from FP32 to INT8. Moreover, INT8 QAT can be employed to recover even more accuracy in such cases.
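
To illustrate what such a naive conversion looks like mechanically, the sketch below (an assumption-laden example that does not attempt to reproduce the whitepaper's numbers) snaps weights onto a simplified FP8-E4M3-style value grid as a stand-in for an FP8-trained tensor, then re-quantizes that tensor to INT8 with a single max-based scale and reports the conversion error.

```python
import numpy as np

def fp8_e4m3_grid():
    """Positive values of a simplified FP8 E4M3 grid (bias 7; NaN encoding excluded)."""
    vals = [(m / 8.0) * 2.0 ** -6 for m in range(8)]                 # zero and subnormals
    vals += [(1.0 + m / 8.0) * 2.0 ** (e - 7)
             for e in range(1, 16) for m in range(8)
             if not (e == 15 and m == 7)]                            # skip the NaN code point
    grid = np.array(sorted(set(vals)))
    return np.concatenate([-grid[::-1], grid])

def snap(x, grid):
    """Round each value to the nearest grid point (stand-in for an FP8 cast)."""
    return grid[np.abs(x[..., None] - grid).argmin(axis=-1)]

rng = np.random.default_rng(2)
w_fp8 = snap(rng.normal(0.0, 0.05, size=2048), fp8_e4m3_grid())      # "FP8-trained" weights

# Naive INT8 conversion: one max-based scale, round to the nearest integer code.
scale = np.max(np.abs(w_fp8)) / 127.0
w_int8 = np.clip(np.round(w_fp8 / scale), -128, 127) * scale
print("max FP8-to-INT8 conversion error:", np.max(np.abs(w_fp8 - w_int8)))
```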

The path towards better AI inference on device

Overall, integer quantization is still the way to do efficient AI inference. With varying effort levels, you can achieve significant efficiency benefits without sacrificing much accuracy.

For optimizing networks even further, opting for QAT can bring networks into the W4A8 (4-bit weight and 8-bit activation) regime. This is very achievable for a wide range of networks. Transformer-based large language models such as GPT, Bloom and Llama benefit greatly from this jump in efficiency from 8- to 4-bit weights, as their performance is bound by weight size. Several works have shown that 4-bit weights are not only feasible for large language models but also close to optimal, and achievable even in the PTQ setting. This is an efficiency boost that currently does not exist in the floating-point world.
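
As a rough illustration of the weight side of W4A8 (a sketch under assumed settings, not the method evaluated in the whitepaper), weights can be quantized to 4 bits with a separate scale per small group of values, which keeps the rounding error manageable even at such low bit widths.

```python
import numpy as np

def quantize_w4_per_group(w, group_size=64):
    """4-bit symmetric weight quantization with one scale per group of values."""
    w = w.reshape(-1, group_size)
    scales = np.max(np.abs(w), axis=1, keepdims=True) / 7.0          # 4-bit signed range: -8..7
    q = np.clip(np.round(w / scales), -8, 7)
    return q * scales                                                 # dequantized view for inspection

rng = np.random.default_rng(3)
w = rng.normal(0.0, 0.02, size=(4096,)).astype(np.float32)            # toy LLM weight slice
w_hat = quantize_w4_per_group(w).reshape(-1)
print("mean absolute error at 4-bit weights:", np.mean(np.abs(w - w_hat)))
```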

To sum it all up, we see that the floating-point format FP8-E4 is not a replacement for INT8 in terms of performance and accuracy; in most cases it performs worse. Only in a few very specific scenarios, where layers have significant outliers, can the floating-point format perform better in terms of accuracy. We are confident that our proposed solutions will lead to a better and more seamless implementation of large AI models on edge devices. For this purpose, the Qualcomm Innovation Center has open-sourced the AI Model Efficiency Toolkit (AIMET), which allows developers to quantize their models more easily and implement AI on device more efficiently.
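
For readers who want to try this themselves, the snippet below sketches how a model is typically quantized with AIMET's PyTorch quantization simulation. The class and argument names follow the AIMET 1.x PyTorch API as we understand it and should be verified against the documentation of the version you install; the ResNet-18 model is used purely as a placeholder.

```python
import torch
from torchvision.models import resnet18
from aimet_torch.quantsim import QuantizationSimModel   # AIMET's PyTorch front end

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Simulate 8-bit weights and activations (argument names assumed from AIMET 1.x).
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def calibrate(sim_model, _):
    # Forward passes over calibration data set the quantizer ranges (the PTQ step).
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate, forward_pass_callback_args=None)
sim.export(path=".", filename_prefix="resnet18_int8", dummy_input=dummy_input)
```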

View additional multimedia and more ESG storytelling from Qualcomm on 3blmedia.com.

Contact Info:

Spokesperson: Qualcomm
Website: https://www.3blmedia.com/profiles/qualcomm
Email: info@3blmedia.com

SOURCE: Qualcomm



View source version on accesswire.com:
https://www.accesswire.com/748643/Floating-Point-Arithmetic-for-AI-Inference--Hit-or-Miss

News Provided by ACCESSWIRE via QuoteMedia
