Graphics processing unit

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.

Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard; in certain CPUs, it is embedded on the CPU die.

In the 1970s, the term “GPU” originally stood for graphics processor unit and described a programmable processing unit that worked independently from the CPU and was responsible for graphics manipulation and output. Sony later used the term (now standing for graphics processing unit) in reference to the PlayStation console’s Toshiba-designed Sony GPU in 1994. The term was popularized by Nvidia in 1999, which marketed the GeForce 256 as “the world’s first GPU”. It was presented as a “single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines”. Rival ATI Technologies coined the term “visual processing unit” or VPU with the release of the Radeon 9700 in 2002, but this term has generally fallen into disuse. Nintendo uses the term “integrated graphics processor” (IGP) in reference to its own GPUs (such as the one used in the Wii), which incorporate an ARM CPU core, video acceleration hardware, and on-chip memory; however, an IGP may also refer to a graphics card that employs separate processors on a single board.

Early mobile phones with 3D-capable graphics often used GPUs designed specifically for mobile devices, typically as part of a system-on-a-chip (SoC), which is essentially an entire computer and graphics subsystem on a single integrated circuit.

Since 2010, the use of GPUs in smartphones and tablet computers such as Apple’s iPhone and iPad has increased dramatically. For example, in 2011 HTC Corporation announced the formation of a new graphics processing division at the company, headed by Raj Talluri, who had been with the company since 1992. In 2010, Apple integrated a PowerVR graphics processing unit, licensed from Imagination Technologies, into its A4 processor, which is used in many models of the iPhone and iPad.

In 2013, Nvidia announced that it would integrate GPGPU capabilities into its GPUs starting in 2015. GPGPU (general-purpose computing on graphics processing units) uses the GPU’s highly parallel hardware to accelerate computations traditionally handled by the CPU.

On 14 May 2016, the Verge reported that Nvidia’s flagship GPU chip, the 10 nm Volta, was designed to combine a CPU and a GPU on a single chip, the Nvidia Tesla V100. “NVIDIA has always been known for its CPU-first approach to designing GPUs and now it looks like it is going to stay that way. The company is pushing its new Volta GPU with deep learning and AI features, but they are also bringing them to the masses with performance that rivals that of Intel’s upcoming Kaby Lake chips.”

From 2016, high-end Nvidia graphics cards (for example, the Nvidia GeForce GTX 1080 Ti) began to incorporate the GP102 chip; the chip supports raytracing, which is especially important for VR applications. In March 2017, Nvidia released a Tesla P100 card based on the GP100 chip, aimed at machine learning. As a result of the considerable differences in performance between models, Nvidia generally recommends that customers choose a GPU based on its intended use rather than the particular model number.

In April 2017, Nvidia announced a new line of GPUs, the Quadro RTX series, which supports deep learning and ray tracing for cinematic experiences.

In May 2017, AMD launched new GPUs called Radeon RX Vega. They support ray tracing.

In December 2017, Nvidia released the Titan V, its first consumer GPU based on the Volta architecture, aimed at machine learning applications. “The card is meant to take advantage of Nvidia’s new NVLINK technology, which allows multiple cards to communicate with each other in order to get more work done without slowing down the entire system. Each device can also communicate directly with Tesla V100 accelerators that are deployed into the data center. This card has no issue crunching through dense math problems.”

Since the 2010s, it has become increasingly common for a GPU to use multiple graphics processing clusters (or ‘streaming multiprocessors’, SMs) to increase the parallelization of computations. The GeForce 8 series was the first to do this, and starting with the GeForce 8800 GTX, Nvidia has included growing numbers of streaming multiprocessors in its high-end GPUs. An in-depth analysis of the use of multiple SMs in Nvidia’s latest GPUs was presented by Nvidia at SIGGRAPH 2017.
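The number of SMs on an installed Nvidia GPU can be inspected through the CUDA runtime API. The following is a minimal sketch, assuming a CUDA-capable device and toolkit are available; the file name is arbitrary.

```cuda
// sm_count.cu -- sketch: report the number of streaming multiprocessors (SMs)
// on each CUDA-capable device in the system.
// Build (assuming the CUDA toolkit is installed): nvcc sm_count.cu -o sm_count
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA-capable device found.\n");
        return 1;
    }
    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // multiProcessorCount is the number of SMs on the chip;
        // maxThreadsPerMultiProcessor bounds the resident threads per SM.
        std::printf("Device %d (%s): %d SMs, up to %d resident threads per SM\n",
                    d, prop.name, prop.multiProcessorCount,
                    prop.maxThreadsPerMultiProcessor);
    }
    return 0;
}
```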

As of August 2017, AMD had announced six SKUs (product variants) in its “Vega” graphics architecture: Vega 10 and Vega 20 (x86 and Graphics Core Next), Vega 8 and Vega 16 (GPU only), and their mobile counterparts embedded in mobile chipsets such as Vega Mobile.

A key driver of the adoption of GPUs for general-purpose computing is their ability to perform many simultaneous operations (typically referred to as threads) in a relatively short period of time. Performance scales roughly with the number of concurrent threads: ideally, using twice as many threads yields close to 2× the throughput, up to the limits of the hardware.
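As a rough illustration of this scaling (a sketch, not a rigorous benchmark), the same element-wise CUDA kernel can be timed at several launch sizes; on a large GPU the elapsed time stays nearly flat until the chip is saturated, so the measured throughput grows roughly with the number of threads launched. The kernel and file names below are arbitrary.

```cuda
// thread_scaling.cu -- sketch: time one-element-per-thread work at several
// problem sizes and report throughput; more concurrent threads generally
// means more elements processed per second until the GPU is saturated.
// Build: nvcc thread_scaling.cu -o thread_scaling
#include <cstdio>
#include <cuda_runtime.h>

__global__ void axpy(const float* x, const float* y, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
    if (i < n) out[i] = 2.0f * x[i] + y[i];
}

int main() {
    const int maxN = 1 << 24;                        // largest problem size
    float *x, *y, *out;
    cudaMalloc((void**)&x, maxN * sizeof(float));
    cudaMalloc((void**)&y, maxN * sizeof(float));
    cudaMalloc((void**)&out, maxN * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int n = 1 << 14; n <= maxN; n <<= 2) {
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

        axpy<<<blocks, threadsPerBlock>>>(x, y, out, n);   // warm-up launch
        cudaEventRecord(start);
        axpy<<<blocks, threadsPerBlock>>>(x, y, out, n);   // timed launch
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("%9d threads: %.3f ms, %.1f million elements/s\n",
                    blocks * threadsPerBlock, ms, n / (ms * 1e3));
    }

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```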

This has an increasing impact on the design and implementation of multi-core processors, where designers strive to keep power consumption in check while increasing performance through large numbers of cores. The growing use of GPUs in many applications allows manufacturers to invest some of their R&D budget in developing more efficient multi-core processors that provide better performance per watt than CPUs alone.

The number of concurrent threads that can be processed by a GPU also depends on the choice of programming model, the GPU architecture, and the runtime library implementation. Today’s GPUs are programmable in several ways. The original model for parallel processing is a “fine-grained” programming model that allows threads to be scheduled for execution independently. Examples of this model include the original OpenCL, Microsoft DirectCompute, and CUDA.

This model makes it possible to write multi-threaded programs in which each thread is bound to a single GPU execution unit. It simplifies the design of multi-threaded software by eliminating synchronization overhead between threads, but performance per thread is often lower than when threads are allowed to share execution units.
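A minimal sketch of this fine-grained style, using CUDA as one example of the models named above: each thread is bound to exactly one independent element of work, so the kernel needs no synchronization between threads.

```cuda
// vector_add.cu -- sketch of the fine-grained model in CUDA: every thread
// independently handles exactly one array element, with no synchronization.
// Build: nvcc vector_add.cu -o vector_add
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n)                                      // guard the final partial block
        c[i] = a[i] + b[i];                         // independent work, no sync needed
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *da, *db, *dc;
    cudaMalloc((void**)&da, n * sizeof(float));
    cudaMalloc((void**)&db, n * sizeof(float));
    cudaMalloc((void**)&dc, n * sizeof(float));
    cudaMemcpy(da, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // One thread per element: the launch configuration, not the kernel body,
    // determines how much parallelism is exposed.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    cudaMemcpy(c.data(), dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("c[0] = %.1f (expected 3.0)\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```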

A second programming model, known as “coarse-grained”, is more suitable for applications with irregular parallelism. The underlying principle of coarse-grained parallelism is similar to that of vectorized processing, in that the programmer only specifies a single program loop without any parallel execution control. Coarse-grained programs are typically written with compiler directives (such as OpenMP pragmas) in C/C++ or using the Parallel.For construct in C#. However, the GPU architecture requires that this loop be run multiple times to process the data in parallel. To minimize the amount of redundant work, the loop iterations assigned to each thread are often calculated using only one multiplier and one additive offset, which can result in suboptimal performance.
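The index arithmetic described above can be made concrete with a CUDA grid-stride loop, shown here as an illustrative sketch of how a single sequential loop is mapped onto GPU threads: each thread derives its starting iteration from one multiplier (its block index times the block size) plus one additive offset (its thread index within the block), and then strides by the total number of launched threads.

```cuda
// grid_stride.cu -- sketch: map a single sequential loop onto GPU threads.
// Each thread starts at (blockIdx.x * blockDim.x + threadIdx.x), i.e. one
// multiplier and one additive offset, then strides by the total thread count
// so the loop body is run multiple times across the data in parallel.
// Build: nvcc grid_stride.cu -o grid_stride
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int start  = blockIdx.x * blockDim.x + threadIdx.x;  // multiplier + offset
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid
    for (int i = start; i < n; i += stride)              // the single program loop
        data[i] *= factor;
}

int main() {
    const int n = 1 << 22;
    float* d;
    cudaMalloc((void**)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Far fewer threads than elements: each thread executes the loop body
    // several times, covering the whole range cooperatively.
    scale<<<128, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    std::printf("processed %d elements with %d threads\n", n, 128 * 256);
    cudaFree(d);
    return 0;
}
```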

A growing trend is to combine both approaches within a single program. For example, Nvidia CUDA initially used only a fine-grained programming model but added support for coarse-grained programs in CUDA 4.0.
