Deep Learning Hardware


SkyHighTech

Uploaded on Sep 10, 2022

Category Technology

“But what about GPUs?” one might ask. The GPU was a development of “generalized execution” for graphics applications. It was a step toward specialization, albeit a wide one. The highly repetitive nature of graphics operations made it possible to amortize the instruction overhead across large numbers of data elements. The GPU therefore achieves greater efficiency than the CPU for certain kinds of applications, but it is still an ISA-based machine and, as such, it is hitting the same evolutionary wall.
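The amortization idea can be sketched in Python with NumPy. This is an illustrative analogy, not actual GPU code: a single vectorized call applies one logical instruction to many data elements, while a scalar loop pays dispatch overhead per element.

```python
import numpy as np

# One million pixels, each needing the same brightness adjustment.
pixels = np.arange(1_000_000, dtype=np.float32)

# Scalar style: the "instruction" (per-element dispatch) is paid once
# per datum -- analogous to a CPU issuing one operation at a time.
scalar_result = np.empty_like(pixels)
for i in range(len(pixels)):
    scalar_result[i] = pixels[i] * 1.5 + 10.0

# Vector style: one call amortizes the dispatch overhead across all
# elements -- analogous to a GPU running one kernel over many pixels.
vector_result = pixels * 1.5 + 10.0

assert np.allclose(scalar_result, vector_result)
```

The two computations produce identical results; the difference is purely in how often the instruction overhead is paid, which is the efficiency gap the paragraph above describes.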


On top of this, we hit the end of Dennard scaling, due to the decrease in our ability to reduce the supply voltage when moving from one process technology to the next. For over 30 years, the evolution of general-purpose computing was impressive and fast-paced, doubling performance every 2 years or so. But then, some would say not unexpectedly, it hit the metaphorical wall. As so many have commented, process progress has started to slow down, and Moore's Law has entered its last throes. The process technology, which did most of the heavy lifting in increasing performance from generation to generation, is slowing down.

Additionally, the growth in single-thread general-purpose performance has peaked, and cooling devices have reached their "power envelope" (1). As a result, we have reached the stage where not every transistor on a die can operate simultaneously; only a portion of the on-die functions may be fully active at any given time. This circumstance is known as "Dark Silicon." The same is true for multi-core chips, as not all cores can operate at full speed simultaneously.

Meanwhile, we have observed the usage of various applications in the cloud and on the edge increase exponentially. Due to the performance demands of these new applications, compute performance was quickly crowned king, with an ever-increasing demand for power (2). Thus, the inevitable target of our current computing environment is to increase the possible performance per unit of power, i.e. to reach the best possible power efficiency.
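The end of Dennard scaling can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative, assuming the classic dynamic-power model P = C * V^2 * f and the traditional per-generation shrink factor of about 0.7: as long as voltage scales down with dimensions, power density stays flat; once voltage stops scaling, power density roughly doubles each node.

```python
# Dynamic power model: P = C * V^2 * f (illustrative, unitless values).
def dynamic_power(c, v, f):
    return c * v ** 2 * f

S = 0.7  # classic per-generation scaling factor for linear dimensions

# Baseline transistor: capacitance, voltage, frequency, area all 1.0.
c0, v0, f0, area0 = 1.0, 1.0, 1.0, 1.0

# Dennard era: C and V scale by S, frequency rises by 1/S,
# area shrinks by S^2 -- power density stays constant.
p_dennard = dynamic_power(c0 * S, v0 * S, f0 / S)
density_dennard = p_dennard / (area0 * S ** 2)

# Post-Dennard: voltage can no longer be reduced (V stays at v0),
# so power density climbs with every shrink.
p_post = dynamic_power(c0 * S, v0, f0 / S)
density_post = p_post / (area0 * S ** 2)

print(f"Dennard-era power density:  {density_dennard:.2f}")
print(f"Post-Dennard power density: {density_post:.2f}")
```

With these assumed numbers, the Dennard-era density stays at 1.0 while the post-Dennard density exceeds 2.0 per generation, which is exactly why not all transistors on a die can be powered at once.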