1) Create a device-independent bitmap using CreateDIBSection (use a negative height to make the DIB top-down, so the pointer is to pixel (0,0) in the upper left corner)
2) Create a memory DC using CreateCompatibleDC with the DC for your window (which should have the 'Own DC' flag set in the window class)
3) Select the DIB into the DC created in step #2
4) Create a device-dependent bitmap using CreateCompatibleBitmap with the DC for your window
5) Create a memory DC using CreateCompatibleDC with the DC for your window
6) SelectObject the bitmap into the DC created in step #5
loop
7) Call GdiFlush() to ensure nothing is messing with the DIB (mainly relevant if you're using GDI functions to draw things onto the DIB)
8) Draw to the DIB created in #1 using the pointer given by CreateDIBSection (don't forget each line is padded to a multiple of 4 bytes, even if you're using 24-bit color, so if your line would be 15 bytes {5 pixels @ 24bpp}, increment the pointer by 16 to get to the next line)
9) Use BitBlt or StretchBlt to copy from the DC in step #2 to the DC in step #5
10) Use BitBlt or StretchBlt to copy from the DC in step #5 to your window
endloop
11) SelectObject the old bitmaps back into their DCs, DeleteDC on both created DCs, DeleteObject on the bitmap and DIB
The reason for the separate bitmap is basically as a double buffer (it stores the complete scene, so you could draw it on the WM_PAINT event to ensure the window is redrawn with the latest frame, for example), but IME it also helps speed things up for some reason. The reason I didn't use StretchDIBits to blit the DIB is that it seemed to be completely sporadic - sometimes it didn't work at all, sometimes it would bring the program to a crawl, and sometimes it worked just fine. It could be the video drivers or something like that, but whatever the cause it was not reliable.
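The width of each DIB row in bytes (the stride) must be a multiple of 4. Picking a pixel width that is a multiple of 4 avoids the padding entirely at 8, 16, 24 and 32 bpp.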
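To make the row padding from step 8 concrete, here is a rough sketch in plain C of how the stride could be computed and how a pixel might be addressed through the pointer from CreateDIBSection. The names dib_stride and put_pixel24 are made up for illustration (they are not GDI functions), and the sketch assumes a top-down 24bpp DIB, where each pixel is stored as blue, green, red.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per scanline, rounded up to a multiple of 4.
   width_px * bits_per_pixel is the row size in bits; adding 31 and dividing
   by 32 rounds up to whole 32-bit units, and * 4 converts back to bytes. */
static size_t dib_stride(size_t width_px, size_t bits_per_pixel)
{
    return ((width_px * bits_per_pixel + 31) / 32) * 4;
}

/* Hypothetical helper: write one pixel into a top-down 24bpp buffer.
   'bits' stands in for the pointer returned by CreateDIBSection.
   Row y starts at bits + y * stride, NOT bits + y * width * 3. */
static void put_pixel24(uint8_t *bits, size_t stride, size_t x, size_t y,
                        uint8_t r, uint8_t g, uint8_t b)
{
    assert(bits != NULL);
    uint8_t *p = bits + y * stride + x * 3;
    p[0] = b;  /* 24bpp DIB pixels are stored in B, G, R order */
    p[1] = g;
    p[2] = r;
}
```

With these, dib_stride(5, 24) gives 16 for the 5 pixel example above (15 bytes of pixels plus 1 byte of padding), while dib_stride(4, 24) gives 12 with no padding at all.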