Angelo Lee's Blog
This is my kingdom. If I don't fight for it, who will?

Using cURL to download a remote file from a valid URL in C++


Introduction

cURL is an open-source command-line tool and library (libcurl) that compiles and runs on a wide variety of operating systems. It transfers files using URL syntax and supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS and FILE.

cURL supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form-based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos...), file transfer resume, proxy tunneling and many other useful features.

This article assumes that you use Ubuntu as the OS and Eclipse as the IDE. To install curl, open "Synaptic Package Manager" (System → Administration) and install the following packages:

[Screenshot: synaptic3.png, packages to install in Synaptic Package Manager]

After creating a new Eclipse C++ project, go to "Project properties -> C/C++ Build -> Settings -> Tool Settings". Under "GCC C++ Linker", add "curl" to the "Libraries (-l)" list (the equivalent of passing -lcurl to the linker). That is all you need to do to link against libcurl.

[Screenshot: Untitled.png, "curl" added to "Libraries (-l)" under "GCC C++ Linker"]

Using libcurl to download a remote file from a valid URL

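To use libcurl, include "curl/curl.h" in your source files. The header is usually installed as /usr/include/curl/curl.h; because /usr/include is on GCC's default search path, Eclipse normally finds it without extra settings. If it is installed elsewhere, add its parent directory to the "Includes" section of your Eclipse project.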

Before calling any other libcurl function, you must call:

 CURLcode curl_global_init(long flags);

This function sets up the program environment that libcurl needs; think of it as an extension of the library loader. The flags argument is a bit pattern that tells libcurl exactly which features to initialize. Set the desired bits by ORing the values together.

In normal operation, you must specify CURL_GLOBAL_ALL. Don't use any other value unless you are familiar with it and mean to control internal operations of libcurl. When the program no longer needs libcurl, call the matching void curl_global_cleanup(void).

After calling curl_global_init, create an easy handle with CURL *curl_easy_init(void), which returns NULL if it fails. When you no longer need the handle (at the latest, at the end of your application), release it with void curl_easy_cleanup(CURL *handle).

Once you have a handle, tell libcurl how to behave by passing the appropriate options to curl_easy_setopt().

Each call takes the handle, the option and a parameter. Depending on the option, the parameter is a long, a function pointer, an object pointer or a curl_off_t. Read the libcurl manual carefully: a value of the wrong type (for example, an int where a long is expected) can make libcurl misbehave.

You can only set one option in each function call. A typical application uses many curl_easy_setopt() calls in the setup phase.

To perform the transfer, call CURLcode curl_easy_perform(CURL *handle).

Call it after curl_easy_init() and all the curl_easy_setopt() calls; it performs the transfer described by the options and does not return until the transfer has finished or failed.

You must never call this function simultaneously from two places using the same handle. Let the function return first before invoking it another time. If you want parallel transfers, you must use several curl handles.

The source code

The following code downloads a file from a valid URL and, optionally, retrieves the server's response headers.

First, define the following structure:

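    #include <curl/curl.h>
    #include <cstdio>
    #include <string>

    // Passed to writefunction through libcurl's void* user-data pointer.
    struct DATA
    {
        std::string* pstr;   // buffer that receives the data
        bool bGrab;          // false: refuse the data and abort the transfer
    };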

  1. std::string* pstr - the buffer that receives the URL content.
  2. bool bGrab - whether to grab the content. When it is false, the callback refuses the data, so libcurl aborts the transfer as soon as the body starts arriving: the request is still sent to the server, but the content is not downloaded.

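    // Called by libcurl whenever received data needs to be saved. The chunk
    // is size * nmemb bytes long and is NOT zero-terminated.
    static size_t writefunction( char *ptr , size_t size , size_t nmemb , void *stream )
    {
        DATA* data = static_cast<DATA*>(stream);
        if ( !data->bGrab )
            return 0;                  // any value other than size * nmemb aborts the transfer
        data->pstr->append(ptr, size * nmemb);
        return size * nmemb;           // report that every byte was handled
    }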

This is a callback function that libcurl calls as soon as received data needs to be saved. The data pointed to by ptr is size multiplied by nmemb bytes long, and it is not zero-terminated. The callback must return the number of bytes it actually handled. If that amount differs from the amount passed to it, libcurl treats this as an error, aborts the transfer and returns CURLE_WRITE_ERROR.

libcurl passes as much data as possible in each call, but you cannot make any assumptions about the chunk size: it may be one byte, it may be thousands. The maximum amount of data passed to the write callback in a single call is CURL_MAX_WRITE_SIZE, defined in curl.h.
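Because the callback is an ordinary function, its contract can be exercised without any network traffic. The sketch below uses the hypothetical names BoundedSink and BoundedWrite: it appends each chunk to a std::string and deliberately returns 0 once a size limit would be exceeded, which makes libcurl abort with CURLE_WRITE_ERROR. The size limit is only an illustration and is not part of the original code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical sink passed via CURLOPT_WRITEDATA: collects the body but
// refuses to grow beyond max_bytes.
struct BoundedSink {
    std::string data;
    std::size_t max_bytes;
};

// Same signature libcurl expects for CURLOPT_WRITEFUNCTION.
std::size_t BoundedWrite(char* ptr, std::size_t size, std::size_t nmemb, void* userdata) {
    auto* sink = static_cast<BoundedSink*>(userdata);
    const std::size_t n = size * nmemb;   // chunk length; not zero-terminated
    if (sink->data.size() + n > sink->max_bytes)
        return 0;                         // mismatch: libcurl stops the transfer
    sink->data.append(ptr, n);
    return n;                             // every byte was handled
}
```

To use it with a handle, pass BoundedWrite to CURLOPT_WRITEFUNCTION and a BoundedSink* to CURLOPT_WRITEDATA.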

    // Placeholder user agent: the original listing uses MY_USR_AGENT without
    // showing its definition, so this value is hypothetical.
    #define MY_USR_AGENT "libcurl-agent/1.0"

    static bool DownloadURLContent( std::string strUrl , std::string & strContent,
                                    std::string &headers, bool grabHeaders = true,
                                    bool grabUrl = true )
    {
        CURL *curl_handle;
        DATA data = { &strContent, grabUrl };
        DATA headers_data = { &headers, grabHeaders };
        // Declared before the first goto: C++ does not allow a jump past an
        // initialized variable into its scope.
        char stdError[CURL_ERROR_SIZE] = { '\0' };

        if ( curl_global_init(CURL_GLOBAL_ALL) != CURLE_OK )
            return false;
        if ( (curl_handle = curl_easy_init()) == NULL )
        {
            curl_global_cleanup();   // balance the successful global init
            return false;
        }
    #if 0
        // Enable to let libcurl print protocol details (debugging only)
        if ( curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L) != CURLE_OK )
            goto clean_up;
        if ( curl_easy_setopt(curl_handle, CURLOPT_STDERR, stdout) != CURLE_OK )
            goto clean_up;
    #endif
        if ( curl_easy_setopt(curl_handle, CURLOPT_ERRORBUFFER, stdError) != CURLE_OK )
            goto clean_up;
        if ( curl_easy_setopt(curl_handle, CURLOPT_URL, strUrl.c_str()) != CURLE_OK )
            goto clean_up;
        if ( curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, writefunction) != CURLE_OK )
            goto clean_up;
        if ( grabHeaders )
        {
            // Same callback for the headers, but writing into a different buffer
            if ( curl_easy_setopt(curl_handle, CURLOPT_HEADERFUNCTION, writefunction) != CURLE_OK )
                goto clean_up;
            if ( curl_easy_setopt(curl_handle, CURLOPT_WRITEHEADER, (void *)&headers_data) != CURLE_OK )
                goto clean_up;
        }
        if ( curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&data) != CURLE_OK )
            goto clean_up;
        if ( curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, MY_USR_AGENT) != CURLE_OK )
            goto clean_up;
        // When grabUrl is false, writefunction aborts the transfer on the first
        // body chunk, so the resulting error is expected and ignored.
        if ( curl_easy_perform(curl_handle) != CURLE_OK && grabUrl )
            goto clean_up;

        curl_easy_cleanup(curl_handle);
        curl_global_cleanup();
        return true;

    clean_up:
        fprintf(stderr, "(%s %d) error: %s\n", __FILE__, __LINE__, stdError);
        curl_easy_cleanup(curl_handle);
        curl_global_cleanup();
        return false;
    }

  • CURL *curl_handle; - our curl handle
  • DATA data = { &strContent, grabUrl }; - buffer for the URL content, plus its grab flag
  • DATA headers_data = { &headers, grabHeaders }; - buffer for the headers, plus its grab flag

The following lines set the options on the curl handle; the libcurl documentation describes each of them in detail.

Below is a short summary of the most important options used in this article.

  • curl_easy_setopt(curl_handle, CURLOPT_URL, strUrl.c_str()) - sets the URL to retrieve.
  • curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, writefunction) - sets the callback function that receives the URL body.
  • curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&data) - sets the buffer where the callback stores the body; libcurl passes this pointer as the callback's last argument.
  • curl_easy_setopt(curl_handle, CURLOPT_HEADERFUNCTION, writefunction) - sets the callback function that receives the headers. It is the same function that grabs the URL body; only the storage buffer changes, through the CURLOPT_WRITEHEADER option.
  • curl_easy_setopt(curl_handle, CURLOPT_WRITEHEADER, (void *)&headers_data) - sets the buffer where the callback stores the headers. CURLOPT_WRITEHEADER is the older name of CURLOPT_HEADERDATA.
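libcurl calls the header callback once per complete header line, so instead of concatenating all headers into one string, a callback can split each line into a name and a value as it arrives. A minimal sketch with the hypothetical names HeaderMap and ParseHeaderLine; it keeps only the last value for a repeated name, which also means that after a redirect the final response's headers win:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical header store: lower-cased header name -> value.
using HeaderMap = std::map<std::string, std::string>;

// Same signature libcurl expects for CURLOPT_HEADERFUNCTION. Each call
// receives one complete header line (with its CRLF), not zero-terminated.
std::size_t ParseHeaderLine(char* buffer, std::size_t size, std::size_t nitems, void* userdata) {
    auto* headers = static_cast<HeaderMap*>(userdata);
    const std::size_t n = size * nitems;
    std::string line(buffer, n);

    // Strip the trailing CR/LF.
    while (!line.empty() && (line.back() == '\r' || line.back() == '\n'))
        line.pop_back();

    // The status line and the blank line that ends the headers have no ':'.
    const std::size_t colon = line.find(':');
    if (colon != std::string::npos) {
        std::string name = line.substr(0, colon);
        for (char& c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        std::size_t start = colon + 1;
        while (start < line.size() && (line[start] == ' ' || line[start] == '\t'))
            ++start;
        (*headers)[name] = line.substr(start);
    }
    return n;   // accept the whole line
}
```

It would be installed with CURLOPT_HEADERFUNCTION set to ParseHeaderLine and CURLOPT_HEADERDATA pointing at a HeaderMap.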

Using this code

    int main(void)
    {
        std::string content;
        std::string headers;
        if ( DownloadURLContent("http://www.intelliproject.net", content, headers) )
        {
            printf("Headers : %s \n", headers.c_str());
            FILE *fp = fopen("out.html", "wb");   // binary mode keeps the bytes unchanged
            if ( fp )
            {
                fwrite(content.c_str(), sizeof(char), content.length(), fp);
                fclose(fp);
            }
            else
            {
                printf("Could not open file: out.html!\n");
                return 1;   // non-zero exit status signals failure
            }
            return 0;       // success
        }
        return 1;           // download failed
    }

This application saves the URL body to a file (out.html) and prints the headers to the console.

[Screenshot: out.png, console output of the program]

As this example shows, the main purpose of cURL is to automate unattended file transfers or sequences of operations. For example, it is a good tool for simulating a user's actions in a web browser.


posted on 2012-12-28 13:12 Angelo Lee