Scraping Text from the Screen
Contents
- Introduction
- Locating the Text to Read
- Locating the Text's Window
- Reading Text from the Static Control
- Naïve Method
- Naïve Method II
- The Memory Mapped File Standard Allocator
- Naïve Method III
- The Final Method
- Reading Text from the List View Control
- Reading Text from the Tab Control
- Bringing it all Together
- Footnotes
Introduction
In this article, we examine methods for reading text displayed on screen by another process (also called screen scraping). The methods presented here are useful for programmatically determining the state of another program from the user's point of view. Though the attached example code is written for Windows Mobile 5 and newer, most of the concepts presented should transfer to desktop Windows.
Locating the Text to Read
The first thing we must do is determine what text we want to read. Microsoft introduces a useful concept in their Spy++ tool called the Finder which extends handily to our application.
The user can drag the target icon from the Finder to the object containing the text to be read.
In our application, we implement the Finder control by handling the WM_COMMAND and WM_LBUTTONUP messages. We use SetCapture() so that our application receives the WM_LBUTTONUP message even when the cursor moves outside the boundary of our application's dialog. Note that while the image above shows the target cursor over the item, that won't actually happen in Windows Mobile. The image is altered to make its meaning more obvious.
BEGIN_MSG_MAP( CMainDlg )
MESSAGE_HANDLER( WM_INITDIALOG, OnInitDialog )
// ...
COMMAND_ID_HANDLER( IDC_FINDER, OnFinder )
MESSAGE_HANDLER( WM_LBUTTONUP, OnLButtonUp )
END_MSG_MAP()
LRESULT OnInitDialog( UINT /*uMsg*/,
WPARAM /*wParam*/,
LPARAM /*lParam*/,
BOOL& bHandled )
{
// ...
// pre-load the images used by the finder control
finder_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER ) );
finder_empty_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER_EMPTY ) );
return ( bHandled = FALSE );
}
LRESULT OnFinder( WORD /*wNotifyCode*/,
WORD /*wID*/,
HWND /*hWndCtl*/,
BOOL& /*bHandled*/ )
{
// capture the cursor so we can detect WM_LBUTTONUP messages even when the
// cursor is not within our window boundary.
SetCapture();
finder_.SetBitmap( finder_empty_image_ );
return 0;
}
LRESULT OnLButtonUp( UINT /*uMsg*/,
WPARAM /*wParam*/,
LPARAM lParam,
BOOL& /*bHandled*/ )
{
if( m_hWnd == GetCapture() )
{
ReleaseCapture();
finder_.SetBitmap( finder_image_ );
// get the screen coordinates of our cursor
// get the text on the screen at the given point
// display the text to the user
}
return 0;
}
/// finder control
CStatic finder_;
/// image of the finder in its native state
CBitmap finder_image_;
/// image of the finder in its empty state
CBitmap finder_empty_image_;
Locating the Text's Window
WM_LBUTTONUP gives us the client coordinates of the point where the user releases the stylus or left mouse button. We convert them to screen coordinates and use the WindowFromPoint() function to determine which window lives at that point.
LRESULT OnLButtonUp( UINT /*uMsg*/,
WPARAM /*wParam*/,
LPARAM lParam,
BOOL& /*bHandled*/ )
{
// ...
// get the screen coordinates of our cursor
POINT screen_point = { GET_X_LPARAM( lParam ),
GET_Y_LPARAM( lParam ) };
ClientToScreen( &screen_point );
// locate the window at the given point
HWND target = ::WindowFromPoint( screen_point );
// ...
}
Unfortunately, WindowFromPoint() has a limitation. From its MSDN page:
The WindowFromPoint function does not retrieve a handle to a hidden or disabled window, even if the point is within the window. An application should use the ChildWindowFromPoint function for a nonrestrictive search.
To use ChildWindowFromPoint() as the documentation suggests, we must provide a parent window and client coordinates relative to that parent. So, our code must be changed to:
{
// ...
// get the screen coordinates of our cursor
POINT screen_point = { GET_X_LPARAM( lParam ),
GET_Y_LPARAM( lParam ) };
ClientToScreen( &screen_point );
// Locate the parent of the window at the given coordinates
HWND parent = ::GetParent( ::WindowFromPoint( screen_point ) );
// the cursor position in the child-most window's client coordinates
POINT client_point;
// perform a non-restrictive search to find the child-most window
HWND target = GetChildMost( parent, screen_point, &client_point );
// ...
}
/// Get the child-most window of a given parent window at a specific point
HWND GetChildMost( HWND parent_window,
const POINT& screen_point,
POINT* parent_point )
{
// reset our coordinate system to the current window
*parent_point = screen_point;
::ScreenToClient( parent_window, parent_point );
// Find this window's child (if any)
HWND child = ::ChildWindowFromPoint( parent_window, *parent_point );
if( NULL == child || child == parent_window )
return parent_window;
// get the next child-most window in the stack
return GetChildMost( child, screen_point, parent_point );
}
Now, we will always locate the correct window regardless of its state.
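The recursive descent can be tried without a device by modelling windows as nested rectangles. The following is a minimal portable sketch, not the Win32 API: MockWindow, Point, ChildAt() and ChildMost() are hypothetical names introduced only for illustration, with ChildAt() standing in for ChildWindowFromPoint().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a window: a rectangle positioned relative to
// its parent's client area, plus a list of child windows.
struct MockWindow
{
    int left, top, width, height;   // position in parent client coordinates
    std::vector< MockWindow > children;
};

struct Point { int x, y; };

// Analogue of ChildWindowFromPoint(): the first child containing the point,
// or the parent itself when no child does. Visibility is ignored on purpose.
const MockWindow* ChildAt( const MockWindow& parent, const Point& client )
{
    for( std::size_t i = 0; i < parent.children.size(); ++i )
    {
        const MockWindow& c = parent.children[ i ];
        if( client.x >= c.left && client.x < c.left + c.width &&
            client.y >= c.top && client.y < c.top + c.height )
        {
            return &c;
        }
    }
    return &parent;
}

// Descend until no deeper child contains the point. On return, *client holds
// the point in the client coordinates of the window that was found.
const MockWindow* ChildMost( const MockWindow& parent, Point* client )
{
    assert( client != 0 );
    const MockWindow* child = ChildAt( parent, *client );
    if( child == &parent )
        return &parent;
    // re-express the point relative to the child before recursing
    client->x -= child->left;
    client->y -= child->top;
    return ChildMost( *child, client );
}
```

Unlike GetChildMost(), which converts from screen coordinates at every level, this sketch subtracts each child's offset as it descends. Both approaches end with the point expressed relative to the window that was found.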
Reading Text from the Static Control
We've located the control with the text we want to read, so we will now examine several methods of extracting that text and discuss the limitations of each method. We will start with the simplest control to read - the Static control or Label. Later in this article, we will examine more complex controls.
Naïve Method
The most obvious and easiest way to get the text of a given window is to call GetWindowTextLength() and GetWindowText(). We could implement that as below:
// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::GetWindowTextLength( target );
if( text_length > 0 )
{
// buffer to hold the text from WM_GETTEXT
std::vector< wchar_t > window_text_buffer( text_length + 1 );
// text returned by WM_GETTEXT
wchar_t* window_text = &window_text_buffer.front();
if( ::GetWindowText( target, window_text, text_length + 1 ) )
{
// We've successfully received the text from the other process.
}
}
}
Unfortunately, this method has a number of limitations. As Raymond Chen points out in The Old New Thing - "The secret life of GetWindowText", GetWindowText() is mainly used to get the title of a frame. It doesn't work if you're using it from another process to get the text of a control that does custom text management. For that, we need to use WM_GETTEXT and WM_GETTEXTLENGTH.
Naïve Method II
Changing from GetWindowText() to WM_GETTEXT isn't much work, really. Just replace GetWindowText() with a couple of SendMessage() calls, and voilà:
// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
if( text_length > 0 )
{
// buffer to hold the text from WM_GETTEXT
std::vector< wchar_t > window_text_buffer( text_length + 1 );
// text returned by WM_GETTEXT
wchar_t* window_text = &window_text_buffer.front();
if( ::SendMessage( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ) ) )
{
// We've successfully received the text from the other process.
}
}
}
Too bad it doesn't work. If we run this code, we find that WM_GETTEXTLENGTH returns the length of the text string as expected. But, though WM_GETTEXT succeeds, it returns an empty string (see Footnotes). Why? Consider what we're doing: we're sending a message to another process and asking it to populate a buffer that lives in our process's memory. That's a big no-no, because a pointer into our address space is not guaranteed to refer to the same memory in theirs. For this to work, we need memory that both processes can reach. Memory-mapped files to the rescue!
The Memory Mapped File Standard Allocator
The memory-mapped file gives us access to a piece of the virtual address space that can be used to share either a file or memory between processes. In our case, we aren't likely to need to share enough data to warrant using a file, so we will use a "RAM-backed mapping", created by passing INVALID_HANDLE_VALUE instead of a file handle. On Windows Mobile, the data then resides entirely in RAM and is never paged out.
The memory-mapped file API consists of three functions that are relevant to our program:
- CreateFileMapping - Creates a memory-mapped file in the shared virtual address space
- MapViewOfFile - Gives us a pointer to the file
- UnmapViewOfFile - Releases and invalidates our pointer to the file
It would be a terrible thing to go from the elegance of a std::vector<> to wrapping all that memory-mapped file code around every call to SendMessage(). Fortunately, the standard library has a little-used facility for just this sort of situation. Standard library containers such as std::vector<> take at least two template parameters. The first (and by far the most commonly used) defines what will be stored in the container. The second defines how the container allocates space for those objects. We will define an allocator that std::vector<> can use to allocate space in a memory-mapped file.
/// Standard library allocator implementation using memory mapped files
template< class T >
class MappedFileAllocator
{
public:
typedef T value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
pointer address( reference r ) const { return &r; };
const_pointer address( const_reference r ) const { return &r; };
void construct( pointer p, const_reference val ) { new( p ) T( val ); };
void destroy( pointer p ) { p; p->~T(); };
/// convert a MappedFileAllocator< T > to a MappedFileAllocator< U >
template< class U >
struct rebind { typedef MappedFileAllocator< U > other; };
MappedFileAllocator() throw() : mapped_file_( INVALID_HANDLE_VALUE )
{
};
template< class U >
explicit MappedFileAllocator( const MappedFileAllocator< U >& other ) throw()
: mapped_file_( INVALID_HANDLE_VALUE )
{
::DuplicateHandle( GetCurrentProcess(),
other.mapped_file_,
GetCurrentProcess(),
&this->mapped_file_,
0,
FALSE,
DUPLICATE_SAME_ACCESS );
};
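pointer allocate( size_type n, const void* /*hint*/ = 0 )
{
// the mapping is sized in bytes, not in elements
const DWORD bytes = static_cast< DWORD >( n * sizeof( T ) );
mapped_file_ = ::CreateFileMapping( INVALID_HANDLE_VALUE,
NULL,
PAGE_READWRITE,
0,
bytes,
NULL );
// CreateFileMapping() returns NULL, not INVALID_HANDLE_VALUE, on failure
if( NULL == mapped_file_ )
{
mapped_file_ = INVALID_HANDLE_VALUE;
throw std::bad_alloc(); // requires <new>
}
void* view = ::MapViewOfFile( mapped_file_,
FILE_MAP_READ | FILE_MAP_WRITE,
0,
0,
bytes );
if( NULL == view )
{
::CloseHandle( mapped_file_ );
mapped_file_ = INVALID_HANDLE_VALUE;
throw std::bad_alloc();
}
return static_cast< pointer >( view );
};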
void deallocate( pointer p, size_type n )
{
if( NULL != p )
{
::FlushViewOfFile( p, n * sizeof( T ) );
::UnmapViewOfFile( p );
}
if( INVALID_HANDLE_VALUE != mapped_file_ )
{
::CloseHandle( mapped_file_ );
mapped_file_ = INVALID_HANDLE_VALUE;
}
};
size_type max_size() const throw()
{
return std::numeric_limits< size_type >::max() / sizeof( T );
};
private:
/// disallow assignment
void operator=( const MappedFileAllocator& );
/// handle to the memory-mapped file
HANDLE mapped_file_;
}; // class MappedFileAllocator
Now, we are able to define a memory buffer with all the advantages of std::vector<> that can be shared between processes.
/// a sequential byte-buffer backed by a memory-mapped file.
typedef std::vector< byte, MappedFileAllocator< byte > > MappedBuffer;
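The allocator mechanism itself can be tried on any platform. The sketch below is not the article's MappedFileAllocator: it is a hypothetical CountingAllocator that simply forwards to malloc() and keeps a running byte count, to show that std::vector<> really does route its storage requests through the second template parameter. It assumes a C++11 or newer compiler, where std::allocator_traits supplies the typedefs and the construct/destroy members that the allocator above spells out by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Running total of bytes handed out by CountingAllocator (illustration only).
static std::size_t g_bytes_allocated = 0;

// Hypothetical minimal allocator: storage comes from malloc() and every
// request is added to g_bytes_allocated.
template< class T >
struct CountingAllocator
{
    typedef T value_type;

    CountingAllocator() {}

    // allow a CountingAllocator< U > to be converted to a CountingAllocator< T >
    template< class U >
    CountingAllocator( const CountingAllocator< U >& ) {}

    T* allocate( std::size_t n )
    {
        // guard against overflow when converting elements to bytes
        assert( n <= static_cast< std::size_t >( -1 ) / sizeof( T ) );
        void* p = std::malloc( n * sizeof( T ) );
        if( p == 0 )
            throw std::bad_alloc();
        g_bytes_allocated += n * sizeof( T );
        return static_cast< T* >( p );
    }

    void deallocate( T* p, std::size_t /*n*/ ) { std::free( p ); }
};

// All instances are interchangeable, so they always compare equal.
template< class T, class U >
bool operator==( const CountingAllocator< T >&, const CountingAllocator< U >& )
{
    return true;
}

template< class T, class U >
bool operator!=( const CountingAllocator< T >&, const CountingAllocator< U >& )
{
    return false;
}

/// a wide-character buffer whose storage is tracked by CountingAllocator
typedef std::vector< wchar_t, CountingAllocator< wchar_t > > CountedBuffer;
```

Constructing a CountedBuffer of 16 elements should raise g_bytes_allocated by at least 16 * sizeof( wchar_t ). Swapping malloc() for a file mapping is then a change local to allocate() and deallocate().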
Naïve Method III
Let's revisit our last method with this memory-mapped buffer and see how it works.
// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
if( text_length > 0 )
{
// buffer to hold the text from WM_GETTEXT
MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
// text returned by WM_GETTEXT
wchar_t* window_text = reinterpret_cast< wchar_t* >( &window_text_buffer.front() );
if( ::SendMessage( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ) ) )
{
// We've successfully received the text from the other process.
}
}
}
Our code barely changed at all, but the result is exactly what we want, with one exception: SendMessage() waits for the target process to respond before returning. What if the target process is frozen? As written, our application would freeze too, waiting for the other process to get its act together. Fortunately, Microsoft has taken care of this eventuality with SendMessageTimeout().
The Final Method
Putting it all together, we end up with an algorithm that can safely retrieve text from any static, button, check-box, combo-box, or edit control.
// define some arbitrary, but reasonable timeout value
DWORD timeout = 1000;
// verify the static doesn't have the SS_ICON style or WM_GETTEXT will
// return an icon handle instead of text.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
// length of the text in the window
DWORD text_length = 0;
if( ( ::SendMessageTimeout( target,
WM_GETTEXTLENGTH,
0,
0,
SMTO_NORMAL,
timeout,
&text_length ) ) &&
( text_length > 0 ) )
{
// memory-mapped buffer to hold the text from WM_GETTEXT
MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
// text returned by WM_GETTEXT
wchar_t* window_text =
reinterpret_cast< wchar_t* >( &window_text_buffer.front() );
// amount of text copied by WM_GETTEXT
DWORD copied = 0;
if( ( ::SendMessageTimeout( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ),
SMTO_NORMAL,
timeout,
&copied ) > 0 ) &&
( copied > 0 ) )
{
// We've successfully received the text from the other process.
}
}
}
Reading Text from the List View Control
Being able to get the text from statics, buttons, check-boxes, combo-boxes, and edit controls is great, but there are lots of other controls out there. Let's take a look at a more complex control, the List View, where WM_GETTEXT doesn't work. The List View is used in applications like File Explorer and Task Manager. There is a three-step process for retrieving its text:
- Verify there are items in the list view - LVM_GETITEMCOUNT
- Locate the item our cursor is over - LVM_SUBITEMHITTEST
- Get the text of that item - LVM_GETITEM
Since there's no sense in our program looking for text in an empty List View, let's first check to see if there are any items in the view.
bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
DWORD item_count = 0;
if( ::SendMessageTimeout( target,
LVM_GETITEMCOUNT,
0,
0,
SMTO_NORMAL,
timeout,
&item_count ) > 0 )
{
return item_count > 0;
}
return false;
};
You may have wondered why our GetChildMost() function needed to return the mouse point in client coordinates for the child window whose text we were scraping. After all, we didn't need it to get the static control text. But more complex controls, like the List View, have multiple text elements. We will use the client coordinates to determine which text element we're looking at using a "hit test".
typedef struct {
int item;
int subitem;
} item_type;
bool LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE )
{
MappedBuffer hti_buffer( sizeof( LVHITTESTINFO ) );
LVHITTESTINFO* hti =
reinterpret_cast< LVHITTESTINFO* >( &hti_buffer.front() );
hti->pt = pt;
int res = 0;
if( ::SendMessageTimeout( target,
LVM_SUBITEMHITTEST,
0,
reinterpret_cast< LPARAM >( hti ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &res ) ) > 0 &&
res > -1 )
{
item->item = hti->iItem;
item->subitem = hti->iSubItem;
return true;
}
return false;
};
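To make the idea of a hit test concrete, here is a minimal portable sketch of what a report-style list might do when asked which cell lies under a client-coordinate point. It is an illustration only, not how the List View is implemented: GridHit and GridHitTest() are hypothetical names, and the layout (fixed row height, columns laid out left to right, no scrolling, no header) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the hypothetical hit test: row (item) and column (subitem).
struct GridHit { int item; int subitem; };

// Map a client-coordinate point to a cell. Returns false when the point is
// outside every cell, which mirrors a hit test reporting -1.
bool GridHitTest( int x,
                  int y,
                  int row_height,
                  int row_count,
                  const std::vector< int >& column_widths,
                  GridHit* hit )
{
    assert( hit != 0 && row_height > 0 );
    if( x < 0 || y < 0 )
        return false;

    // rows are stacked top to bottom with a fixed height
    const int row = y / row_height;
    if( row >= row_count )
        return false;

    // walk the columns left to right until one contains x
    int column_left = 0;
    for( std::size_t col = 0; col < column_widths.size(); ++col )
    {
        if( x < column_left + column_widths[ col ] )
        {
            hit->item = row;
            hit->subitem = static_cast< int >( col );
            return true;
        }
        column_left += column_widths[ col ];
    }
    return false;
}
```

With columns of width 100 and 60 and rows 20 units tall, the point (120, 45) falls in the third row (item 2) and the second column (subitem 1).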
Now that we know which item and sub item our coordinates point to, we send the List View an LVM_GETITEM message to receive the text for that item.
bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE )
{
MappedBuffer lvi_buffer(
sizeof( LV_ITEM ) + sizeof( wchar_t ) * length );
LV_ITEM* lvi =
reinterpret_cast< LV_ITEM* >( &lvi_buffer.front() );
lvi->mask = LVIF_TEXT;
lvi->iItem = item.item;
lvi->iSubItem = item.subitem;
lvi->cchTextMax = length;
lvi->pszText = reinterpret_cast< wchar_t* >(
&lvi_buffer.front() + sizeof( LV_ITEM ) );
BOOL success = FALSE;
if( ::SendMessageTimeout( target,
LVM_GETITEM,
0,
reinterpret_cast< LPARAM >( lvi ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &success ) ) > 0 &&
success )
{
*text = lvi->pszText;
return true;
}
return false;
};
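GetText() above relies on a layout trick: the request structure and the text area it points to live in one contiguous buffer, so a single shared allocation carries both. The following portable sketch shows that layout in isolation. TextRequest, MakeRequest(), AnswerRequest() and ReadText() are hypothetical names, and the sketch records a byte offset where the article stores a raw pointer in pszText, which is a deliberate variation rather than the article's method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical request header, standing in for LV_ITEM.
struct TextRequest
{
    int item;                 // which item's text is wanted
    std::size_t max_chars;    // capacity of the text area, in characters
    std::size_t text_offset;  // start of the text area, in bytes
};

typedef std::vector< unsigned char > ByteBuffer;

// Build one buffer holding the header followed by room for the text.
ByteBuffer MakeRequest( int item, std::size_t max_chars )
{
    ByteBuffer buffer( sizeof( TextRequest ) + sizeof( wchar_t ) * max_chars );
    TextRequest header = { item, max_chars, sizeof( TextRequest ) };
    std::memcpy( &buffer[ 0 ], &header, sizeof( header ) );
    return buffer;
}

// Plays the role of the control: copies at most max_chars - 1 characters
// plus a terminator into the text area.
bool AnswerRequest( ByteBuffer* buffer, const std::wstring& text )
{
    assert( buffer != 0 && buffer->size() >= sizeof( TextRequest ) );
    TextRequest header;
    std::memcpy( &header, &( *buffer )[ 0 ], sizeof( header ) );
    if( header.max_chars == 0 )
        return false;
    const std::size_t count = std::min( text.size(), header.max_chars - 1 );
    unsigned char* dest = &( *buffer )[ header.text_offset ];
    std::memcpy( dest, text.data(), count * sizeof( wchar_t ) );
    const wchar_t terminator = L'\0';
    std::memcpy( dest + count * sizeof( wchar_t ), &terminator, sizeof( terminator ) );
    return true;
}

// Plays the role of the caller: reads the text back out of the buffer.
std::wstring ReadText( const ByteBuffer& buffer )
{
    TextRequest header;
    std::memcpy( &header, &buffer[ 0 ], sizeof( header ) );
    if( header.max_chars == 0 )
        return std::wstring();
    std::wstring text( header.max_chars, L'\0' );
    std::memcpy( &text[ 0 ], &buffer[ header.text_offset ],
                 header.max_chars * sizeof( wchar_t ) );
    // trim at the first terminator
    const std::size_t end = text.find( L'\0' );
    if( end != std::wstring::npos )
        text.resize( end );
    return text;
}
```

An offset stays meaningful wherever the buffer happens to be mapped, whereas a raw pointer is only valid if both sides see the buffer at the same address. The sketch copies with memcpy() rather than casting the buffer to a structure pointer, which sidesteps alignment and aliasing questions in portable code.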
Reading Text from the Tab Control
As with the List View control, there is a three-step process for scraping the text from a Tab control:
- Verify there are items in the tab control - TCM_GETITEMCOUNT
- Locate the tab our cursor is over - TCM_HITTEST
- Get the text of that tab - TCM_GETITEM
As before, we first check to see if there are any tabs in the control.
bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
DWORD item_count = 0;
if( ::SendMessageTimeout( target,
TCM_GETITEMCOUNT,
0,
0,
SMTO_NORMAL,
timeout,
&item_count ) > 0 )
{
return item_count > 0;
}
return false;
};
Then, we determine which tab our pointer is over.
BOOL LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE )
{
MappedBuffer tch_buffer( sizeof( TCHITTESTINFO ) );
TCHITTESTINFO* tch =
reinterpret_cast< TCHITTESTINFO* >( &tch_buffer.front() );
tch->pt = pt;
// for the tab control, item_type is an int holding the tab index
item_type it = -1;
if( ::SendMessageTimeout( target,
TCM_HITTEST,
0,
reinterpret_cast< LPARAM >( tch ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &it ) ) > 0 )
{
if( it > -1 )
{
*item = it;
return true;
}
}
return false;
};
Lastly, we query the tab control for the text of that tab.
bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE )
{
MappedBuffer tc_buffer( sizeof( TCITEM ) + sizeof( wchar_t ) * length );
TCITEM* tc = reinterpret_cast< TCITEM* >( &tc_buffer.front() );
tc->cchTextMax = length;
tc->mask = TCIF_TEXT;
tc->pszText = reinterpret_cast< wchar_t* >(
&tc_buffer.front() + sizeof( TCITEM ) );
BOOL success = FALSE;
if( ::SendMessageTimeout( target,
TCM_GETITEM,
item,
reinterpret_cast< LPARAM >( tc ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &success ) ) > 0 )
{
if( success )
{
*text = tc->pszText;
return true;
}
}
return false;
}
Bringing it all Together
By now, it is obvious a pattern is emerging. We can get the screen text for any control type by following a fairly general procedure:
- Check the validity of the control.
- Locate the text item within the control.
- Get the length of the text.
- Get the text.
We can generalize each of these procedural elements into a 'traits' structure:
/// traits for reading the text of a tab control
struct TabTraits
{
/// type of item contained within this control
typedef int item_type;
/// name of the window class these traits are relevant to
static const wchar_t* ClassName() { return WC_TABCONTROL; };
/// Does the target window contain text to read?
static bool CheckValiditiy( HWND target, DWORD timeout = INFINITE );
/// locate the text item within the control at the given point
static BOOL LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE );
/// get the length of the text string to retrieve
static DWORD GetTextLength( HWND target,
const item_type& item,
DWORD timeout = INFINITE );
/// retrieve the text
static bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE );
}; // struct TabTraits
We supply the 'traits' structure as a template parameter to a generalized algorithm that performs each step.
/// read the text at a specific point on a target control window.
template< class T >
bool DoReadScreenText( HWND target,
const POINT& client_point,
std::wstring* screen_text,
DWORD timeout = INFINITE )
{
if( T::CheckValiditiy( target, timeout ) )
{
typename T::item_type item;
if( T::LocateItem( target, client_point, &item, timeout ) )
{
DWORD length = T::GetTextLength( target, item, timeout );
if( length > 0 )
{
return T::GetText( target, item, length, screen_text, timeout );
}
}
}
return false;
}
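The traits pattern can be exercised without any window at all. The sketch below is a portable illustration under stated assumptions: MockTabs, MockTabTraits and ReadMockText() are hypothetical names, the "control" is just a vector of labels, and every tab is assumed to be 40 units wide. The four steps are the same ones DoReadScreenText() performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical control: a row of labelled tabs, each 40 units wide.
struct MockTabs { std::vector< std::wstring > labels; };

// Traits describing how to read text from a MockTabs control.
struct MockTabTraits
{
    typedef int item_type;
    typedef MockTabs control_type;

    // step 1: is there anything to read?
    static bool CheckValidity( const MockTabs& c ) { return !c.labels.empty(); }

    // step 2: which tab is under the given x coordinate?
    static bool LocateItem( const MockTabs& c, int x, item_type* item )
    {
        if( x < 0 )
            return false;
        const int index = x / 40;
        if( index >= static_cast< int >( c.labels.size() ) )
            return false;
        *item = index;
        return true;
    }

    // step 3: how long is the text?
    static std::size_t GetTextLength( const MockTabs& c, const item_type& item )
    {
        return c.labels[ item ].size();
    }

    // step 4: fetch the text
    static bool GetText( const MockTabs& c, const item_type& item, std::wstring* text )
    {
        *text = c.labels[ item ];
        return true;
    }
};

// Generic algorithm: the same four steps for any traits type.
template< class Traits >
bool ReadMockText( const typename Traits::control_type& control,
                   int x,
                   std::wstring* text )
{
    assert( text != 0 );
    if( !Traits::CheckValidity( control ) )
        return false;
    typename Traits::item_type item;
    if( !Traits::LocateItem( control, x, &item ) )
        return false;
    if( Traits::GetTextLength( control, item ) == 0 )
        return false;
    return Traits::GetText( control, item, text );
}
```

Supporting another control type then means writing one more traits structure. The algorithm itself does not change, and because the dispatch is resolved at compile time there is no virtual call overhead.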
Using GetClassName() we can determine the type of control we're reading. This allows us to create a control structure that can read the text from any on-screen control.
bool ReadScreenText( HWND target,
const POINT& client_point,
std::wstring* screen_text,
DWORD timeout )
{
// get the window class for the target window
wchar_t class_name[ 257 ] = { 0 };
::GetClassName( target, class_name, _countof( class_name ) );
// different window classes require different methods of getting their
// screen text.
if( wcsstr( class_name, TabTraits::ClassName() ) )
{
return DoReadScreenText< TabTraits >( target,
client_point,
screen_text,
timeout );
}
else if( wcsstr( class_name, ... ) )
{
// ...
}
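// ... further window classes are handled the same way ...
// no reader is available for this window class
return false;
}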
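As the number of supported controls grows, the chain of else-if tests can be replaced by a small lookup table. The following is one possible sketch, not the attached code's approach: ControlKind, ClassEntry and ClassifyWindowClass() are hypothetical names, and the class-name strings are the ones desktop Windows uses for these controls, which is an assumption that should be checked against the target platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical identifiers for the control types we know how to read.
enum ControlKind { kUnknown, kStatic, kTab, kListView, kListBox };

// One row of the lookup table: a class-name fragment and what it means.
struct ClassEntry
{
    const wchar_t* fragment;
    ControlKind kind;
};

// Class names as spelled on desktop Windows (assumed here).
static const ClassEntry kClassTable[] =
{
    { L"SysTabControl32", kTab },
    { L"SysListView32",   kListView },
    { L"ListBox",         kListBox },
    { L"Static",          kStatic },
};

// Return the kind of control a window class name refers to.
ControlKind ClassifyWindowClass( const wchar_t* class_name )
{
    assert( class_name != 0 );
    const std::size_t count = sizeof( kClassTable ) / sizeof( kClassTable[ 0 ] );
    for( std::size_t i = 0; i < count; ++i )
    {
        // the same case-sensitive substring match that wcsstr() gives above
        if( std::wcsstr( class_name, kClassTable[ i ].fragment ) != 0 )
            return kClassTable[ i ].kind;
    }
    return kUnknown;
}
```

A switch on the returned ControlKind can then call the matching DoReadScreenText< ... >() instantiation. Note that window class names are case-insensitive while wcsstr() is not, so a production version may want a case-insensitive comparison.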
The attached code has methods for reading from Static, Tab, List View, and List Box controls. Reading from other control types such as Headers, Menus, Tree Views, Today-Screen plugins, or other custom controls is left as an exercise for the interested reader.
Footnotes
- This isn't strictly true. WM_GETTEXT won't always return an empty string. There are three window messages that are treated specially: WM_GETTEXT, WM_SETTEXT, and WM_COPYDATA. The result of sending these messages with process-local memory buffers seems to vary depending on the version of Windows in use and how the control handles the message. For this to work in the general case, we provide a memory-mapped file. It doesn't hurt in cases where it isn't necessary.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)