What are the unique features of OpenAI's self-developed chips? I

release time:2023-10-10Author source:SlkorBrowse:8437

Recently, Reuters reported that OpenAI is considering self-developed chips. According to the report, since last year, OpenAI has been actively seeking solutions to the shortage of training chips for artificial intelligence models (specifically, the scarcity of Nvidia GPUs). Currently, OpenAI is actively preparing for self-developed chips to meet future demands for AI chips. In fact, not long ago, OpenAI's CEO Sam Altman publicly acknowledged the significant impact of Nvidia GPU shortages on OpenAI and the entire AI industry. Additionally, starting this year, OpenAI has been recruiting talents in hardware-related fields, with several positions related to software-hardware co-design advertised on their official website. Moreover, in September of this year, OpenAI also recruited Andrew Tulloch, a renowned expert in the field of AI compilers, further indicating their investment in self-developed chips. OpenAI officials have declined to comment on this matter, but if it comes to fruition, OpenAI will join the ranks of Silicon Valley tech giants such as Google, Amazon, Microsoft, and Tesla in developing their own chips.

Why does OpenAI want to develop its own chips?

As mentioned earlier, the main reason behind OpenAI's development of their own chips is due to the shortage of GPUs. More specifically, the prices of both purchasing Nvidia GPUs and using GPU-based cloud services are too high, especially considering the exponential increase in computing power required for OpenAI's future model training.

OpenAI has been strategically investing in generative artificial intelligence for several years. With the release of GPT-3 last year and ChatGPT in the latter half of the year, the capabilities of these large-scale generative language models have improved significantly over the past few years, reaching a stage where they can engage in meaningful conversations with humans. As a result, OpenAI has emerged as a leader in the field of artificial intelligence, and generative AI has become one of the most impactful technologies expected to shape human society in the coming years. According to Reuters, OpenAI recorded revenue of $28 million last year, but incurred an overall loss of $540 million, primarily due to the expenses related to computing power. It is worth noting that this massive loss of $540 million occurred in 2022, just before the surge in generative artificial intelligence.

Looking ahead, the expenses related to computing power are expected to increase exponentially. This is primarily due to:

The competition in large-scale models is becoming more intense, and the speed of model evolution is getting faster, requiring rapidly increasing computing power. In addition to OpenAI, tech giants such as Google are also pushing their own large-scale models, which has significantly accelerated the pace of model evolution. It is expected that a new generation of models will need to be updated every quarter to half a year, and the computing power required for the most cutting-edge models is estimated to increase by an order of magnitude every year.
Large-scale models are being applied in a wider range of scenarios. Currently, Microsoft and Google have already begun using large-scale models in the fields of search and code writing. It is expected that there will be more applications for large-scale models in the future, including automatic task processing, multimodal Q&A, etc. These will greatly increase the number of different models, and also significantly increase the total computing power required for model deployment.

According to analysis by the American financial company Bernstein, if the traffic of ChatGPT reaches one-tenth of Google search (which is also one of OpenAI's important goals in the future), OpenAI's annual GPU expenses would amount to $16 billion. Such expenses could potentially become a significant bottleneck for OpenAI's further scaling.

So, how much cost can OpenAI save if they develop their own chips? Currently, the purchase cost of a server with eight Nvidia H100 GPUs is approximately $300,000, and the total cost of using this server for three years, including the premium from cloud service providers, is around $1 million (this is the official price quoted by AWS, with prices from other cloud service providers expected to be in the same range). If OpenAI can reduce the cost of such an eight-GPU server to below $100,000 with their self-developed chips, it would significantly lower their expenses. On the other hand, if the self-developed chips are successful, it is very promising to control the cost of a single accelerator card to below $10,000 in the case of large-scale deployment. In other words, controlling the cost of an eight-GPU server below $100,000 is not an unattainable goal.

To be continued...

Previous:What are the unique features of OpenAI's self-developed chips? II
Next:New advances in the development of domestically-produced EUV lithography machine light sources