Archive for January, 2008

Get to know a radeon part 3

Wednesday, January 30th, 2008

Command Processor

The Command Processor (CP) is the part of the chip that parses incoming packets from the driver and programs the GPU based on those packets. This is a more efficient programming method than MMIO: command streams can be stored in buffers and queued up for eventual processing, and since a single command packet can encompass a fairly substantial set of register writes, bandwidth requirements are reduced.

Ring Buffer

To use the CP, a block of GART memory is allocated for the ring buffer. This buffer is shared by the driver and the CP. The driver writes new packets into the ring and the CP reads packets out of the ring and processes them, programming the card appropriately. To work properly, the driver and the CP must have a consistent view of the buffer, so each side keeps track of both a read pointer and a write pointer. The write pointer tracks where in the ring buffer the driver is writing new packets and the read pointer tracks where the CP is reading packets for processing. If the CP’s read and write pointers are equal, the queue is empty and the CP will go idle. Periodically, the driver updates the CP’s copy of the write pointer and the CP updates the driver’s copy of the read pointer so both sides stay in sync.
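To make the pointer bookkeeping concrete, here is a minimal software sketch of a ring with a read and a write pointer. It is not how the hardware or any real driver implements it: the ring size, the struct layout, and all function names are made up for illustration, and a plain array stands in for the GART-backed buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_DWORDS 1024u /* hypothetical ring size; must be a power of two */

struct ring {
    uint32_t buf[RING_DWORDS]; /* stands in for the GART-backed ring */
    uint32_t rptr;             /* read pointer: next dword the CP fetches */
    uint32_t wptr;             /* write pointer: next dword the driver fills */
};

/* Equal pointers mean the queue is empty, which is when the CP goes idle. */
static bool ring_empty(const struct ring *r)
{
    assert(r);
    return r->rptr == r->wptr;
}

/* Free dwords. One slot stays unused so a full ring never looks empty. */
static uint32_t ring_free(const struct ring *r)
{
    assert(r);
    return (r->rptr - r->wptr - 1u) & (RING_DWORDS - 1u);
}

/* Driver side: store one dword and advance the write pointer. */
static bool ring_write(struct ring *r, uint32_t dword)
{
    if (ring_free(r) == 0)
        return false;
    r->buf[r->wptr] = dword;
    r->wptr = (r->wptr + 1u) & (RING_DWORDS - 1u);
    return true;
}

/* CP side: fetch one dword and advance the read pointer. */
static bool ring_read(struct ring *r, uint32_t *dword)
{
    assert(dword);
    if (ring_empty(r))
        return false;
    *dword = r->buf[r->rptr];
    r->rptr = (r->rptr + 1u) & (RING_DWORDS - 1u);
    return true;
}
```

On real hardware the two sides live in different places, which is why each keeps its own copy of the other side's pointer and they sync up periodically; the sketch collapses both copies into one struct.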

Packets

There are three types of packets that are primarily used with the CP: type 0, type 2, and type 3. Type 0 packets write data to a number of consecutive registers starting at a particular offset. Type 2 packets are filler packets (NOPs). Type 3 packets are opcode packets that are used to program specific 2D/3D/video tasks.
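As a rough illustration of how the three packet types can be told apart, here is a sketch of header encoding. The bit layout is an assumption modeled on the macros in the open source radeon driver (type in the top two bits, count minus one in bits 16 and up, register dword offset or opcode in the low bits); check the register documentation before relying on it. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: packet type lives in the top two bits of the header. */
#define PKT_TYPE0 0x00000000u
#define PKT_TYPE2 0x80000000u
#define PKT_TYPE3 0xC0000000u

/* Type 0: write `count` dwords to consecutive registers starting at the
 * byte offset `reg`. The header stores count - 1 and the dword offset. */
static uint32_t pkt0_header(uint32_t reg, uint32_t count)
{
    assert(count >= 1 && count <= 0x4000u && (reg & 3u) == 0);
    return PKT_TYPE0 | ((count - 1u) << 16) | (reg >> 2);
}

/* Type 2: a filler packet, header only, no payload. */
static uint32_t pkt2_header(void)
{
    return PKT_TYPE2;
}

/* Type 3: an opcode followed by `count` payload dwords. */
static uint32_t pkt3_header(uint32_t opcode, uint32_t count)
{
    assert(count >= 1 && count <= 0x4000u && opcode <= 0xffu);
    return PKT_TYPE3 | ((count - 1u) << 16) | (opcode << 8);
}

/* Recover the packet type (0, 2, or 3) from a header dword. */
static unsigned pkt_type(uint32_t header)
{
    return (unsigned)(header >> 30);
}
```

A type 0 packet is then the header followed by `count` register values, so one header can carry a long run of register writes.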

Indirect Buffers

In addition to the ring buffer, the CP is able to read from buffers in GART memory called Indirect Buffers. The driver can use indirect buffers to store 2D/3D/video command streams. When the driver wants to execute one of these buffers, it queues up writes to the indirect buffer control registers (base and size) via type 0 packets in the ring buffer. When the CP encounters these writes, it starts fetching the command stream from the indirect buffer and continues until the end of that buffer, at which point it goes back to processing the ring buffer.
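Kicking off an indirect buffer could look roughly like the sketch below: a single type 0 packet that writes the base and size registers back to back. The register offset is a placeholder, the assumption that the size register directly follows the base register is mine, and the header layout (count minus one in bits 16 and up, dword offset in the low bits) is assumed rather than taken from documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offset of the IB base register. The size register is
 * assumed to sit at the next dword so one type 0 packet covers both. */
#define IB_BASE_REG 0x0738u

/* Write a 3-dword type 0 packet into `ring` that points the CP at an
 * indirect buffer. Returns the number of dwords emitted, or 0 if there
 * is not enough space. */
static size_t emit_indirect_buffer(uint32_t *ring, size_t space,
                                   uint32_t ib_gpu_addr, uint32_t ib_dwords)
{
    assert(ring);
    if (space < 3)
        return 0;
    /* Type 0 header: two register writes (count - 1 = 1) at IB_BASE_REG. */
    ring[0] = (1u << 16) | (IB_BASE_REG >> 2);
    ring[1] = ib_gpu_addr; /* base: GPU address of the indirect buffer */
    ring[2] = ib_dwords;   /* size: length of the command stream in dwords */
    return 3;
}
```

Note that the base is a GPU address, not a CPU pointer, since the CP fetches the buffer through the GPU's own address space.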

Get to know a radeon part 2

Tuesday, January 29th, 2008

Memory Controller

There are two views of vram, the GPU’s view and the CPU’s view. When I say CPU, that could be anything that’s running on your main processor: a device driver, a GL application, the xserver, a window manager. In most cases the CPU accesses vram via a PCI BAR (Base Address Register).

BARs are resources on PCI devices that are used for configuration and resource access. Video devices generally have at least two BARs, one for mapping MMIO register space, and one for mapping the framebuffer. Since the framebuffer BAR is typically limited to 256 MB, this poses a problem for CPU access to vram beyond 256 MB. Vendors could add multiple BARs for access, but this would eat up a lot of address space.

Fortunately, the GPU does not have a problem accessing the full amount of vram as it has direct access to it via the built-in memory controller. So while your new 512 MB card may only provide CPU access to 256 MB of vram, the GPU can use the other 256 MB to store things that the CPU doesn’t need ready access to.

The GPU has its own address space and its resources can be mapped into that address space however it likes. It can map vram to one block of addresses and GART memory (system memory) to another block. Internal GPU clients (CRTCs, overlays, 2D/3D engines, video decoders, etc.) access vram or GART space via the memory controller’s address space. So if you wanted to blit (copy) something from one location in vram to another, the 2D engine would use GPU addresses.
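Here is a small sketch of what "vram in one block, GART in another" means in practice. The struct, the function names, and the example addresses in the note below are all made up; real cards set up these ranges through memory controller registers that are not covered here.

```c
#include <assert.h>
#include <stdint.h>

enum gpu_mem { GPU_MEM_VRAM, GPU_MEM_GART, GPU_MEM_UNMAPPED };

/* One contiguous block of GPU addresses. */
struct gpu_range {
    uint64_t start;
    uint64_t size;
};

/* Hypothetical layout: where vram and GART sit in the GPU's address space. */
struct gpu_layout {
    struct gpu_range vram;
    struct gpu_range gart;
};

static int in_range(const struct gpu_range *r, uint64_t addr)
{
    return addr >= r->start && addr - r->start < r->size;
}

/* Which kind of memory does a GPU address refer to? */
static enum gpu_mem gpu_classify(const struct gpu_layout *l, uint64_t addr)
{
    assert(l);
    if (in_range(&l->vram, addr))
        return GPU_MEM_VRAM;
    if (in_range(&l->gart, addr))
        return GPU_MEM_GART;
    return GPU_MEM_UNMAPPED;
}

/* Turn an offset into vram into the GPU address a blit would be given. */
static uint64_t vram_offset_to_gpu(const struct gpu_layout *l, uint64_t offset)
{
    assert(l && offset < l->vram.size);
    return l->vram.start + offset;
}
```

With 512 MB of vram mapped at a made-up GPU address of 0x10000000, an offset of 384 MB lands at GPU address 0x28000000. The 2D engine can use that address even though it lies outside a 256 MB CPU-visible window.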

Get to know a radeon part 1

Monday, January 28th, 2008

ATOMBIOS

You’ve probably heard about it if you’ve been following xorg development over the last few months; however, there is some confusion as to what it is and how it can be used. ATOMBIOS is a collection of card specific data tables and scripts stored in the ROM on recent radeon cards (r4xx cards had the initial version of it). The data tables store card specific information such as connector information and memory timings. The scripts allow you to program specific functionality on a particular card using a common API. Some of the functionality includes: setting crtc timing, setting up DACs, TMDS, and TV encoders, DPMS, crtc routing, and card initialization. The x86 real mode video bios, the windows driver, and fglrx all use ATOMBIOS to initialize and program the card.

In order to use ATOMBIOS, the driver provides a wrapper for the parser and some hooks for touching the hw (read/write memory mapped register, read/write PCI register, allocate/free memory, sleep, etc.). To run one of the scripts you point the parser at the script and it parses it, calling the driver supplied hw hooks to actually program the hw. Each script takes a struct that specifies what parameters you want to use for that script as an input.

For example, if you execute a script to program the pixel clock, you would supply the parameters you want to program (pll1/2, dot clock, M, N, P values, etc.) and execute the script. The parser would then run through the script calling the hw access hooks and programming the card. It might read in one of the PLL regs, then adjust some values and write it out, then wait for a few microseconds for the clock to lock, then read back the value, then write some other register value, etc.
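The hook-and-parameter arrangement can be sketched as follows. None of this is the real ATOMBIOS interface: the hook struct, the parameter struct, the register index, and the bit layout are all invented, and a hand-written C function stands in for what the parser would do while walking a script. A fake register file is included so the sketch can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hw access hooks supplied by the driver. */
struct hw_hooks {
    uint32_t (*reg_read)(void *ctx, uint32_t reg);
    void (*reg_write)(void *ctx, uint32_t reg, uint32_t val);
    void (*delay_us)(void *ctx, unsigned usec);
};

/* Hypothetical input struct for a "set pixel clock" script. */
struct pixel_clock_params {
    uint8_t pll;        /* which PLL to program: 1 or 2 */
    uint32_t clock_khz; /* requested dot clock; unused by this sketch */
    uint16_t m, n;      /* made-up 10-bit divider fields */
    uint8_t p;          /* made-up 4-bit post divider field */
};

#define FAKE_PLL_REG(pll) (0x10u + (pll)) /* made-up register index */

/* Stand-in for the parser running a script: read, modify, write, wait for
 * the clock to lock, read back. All hw access goes through the hooks. */
static void run_set_pixel_clock(const struct hw_hooks *hw, void *ctx,
                                const struct pixel_clock_params *p)
{
    assert(hw && p && (p->pll == 1 || p->pll == 2));
    uint32_t v = hw->reg_read(ctx, FAKE_PLL_REG(p->pll));
    v &= 0xff000000u; /* keep the unrelated upper bits */
    v |= ((uint32_t)(p->p & 0xfu) << 20)
       | ((uint32_t)(p->n & 0x3ffu) << 10)
       | (uint32_t)(p->m & 0x3ffu);
    hw->reg_write(ctx, FAKE_PLL_REG(p->pll), v);
    hw->delay_us(ctx, 5);
    (void)hw->reg_read(ctx, FAKE_PLL_REG(p->pll));
}

/* Fake hardware so the hooks have something to talk to. */
struct fake_hw {
    uint32_t regs[32];
    unsigned slept_us;
};

static uint32_t fake_read(void *ctx, uint32_t reg)
{
    return ((struct fake_hw *)ctx)->regs[reg];
}

static void fake_write(void *ctx, uint32_t reg, uint32_t val)
{
    ((struct fake_hw *)ctx)->regs[reg] = val;
}

static void fake_delay(void *ctx, unsigned usec)
{
    ((struct fake_hw *)ctx)->slept_us += usec;
}
```

The point of the indirection is that the script logic never touches the hardware directly, so the same script can run under any driver that supplies the hooks.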

The scripts do basically the same thing you would do if you were programming the registers directly, but since they can be tailored to specific hw, there is less need for card specific workarounds in the driver code and hw differences are hidden behind a common API.

Get to know a radeon part 0

Monday, January 28th, 2008

Over the next few days I’ll be writing a series of posts detailing, at a relatively high level, how various aspects of the radeon programming model work. If anyone has any questions, please feel free to ask. The hardware is really not as complicated to program as it might seem :)

R300 Render Accel… more or less

Wednesday, January 16th, 2008

I’ve just pushed the first pass at EXA render accel for r3xx/r4xx cards.  Right now it only supports transforms for rotation, no blending yet.  It’s based on the initial implementation from Wolke Liu with additional lock-up fixes by Dave Airlie.