HDU OJ Problem 4018: Parsing URL
Problem 4018 on the HDU Online Judge (杭电OJ), Parsing URL (problem link).
Parsing URL
Problem Description
In computing, a Uniform Resource Locator or Universal Resource Locator (URL) is a character string that specifies where a known resource is available on the Internet and the mechanism for retrieving it.
The syntax of a typical URL is:
scheme://domain:port/path?query_string#fragment_id
In this problem, the scheme, domain is required by all URL and other components are optional. That is, for example, the following are all correct urls:
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox
http://en.wikipedia.org/wiki/Bowser_(character)
ftp://fs.fudan.edu.cn/
telnet://bbs.fudan.edu.cn/
http://mail.bashu.cn:8080/BsOnline/
Your task is to find the domain for all given URLs.
Input
There are multiple test cases in this problem. The first line of input contains a single integer denoting the number of test cases. For each of test case, there is only one line contains a valid URL.
Output
For each test case, you should output the domain of the given URL.
Sample Input
3
http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
http://www.mariowiki.com/Mushroom
https://mail.google.com/mail/?shva=1#inbox
Sample Output
Case #1: dict.bing.com.cn
Case #2: www.mariowiki.com
Case #3: mail.google.com
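Approach: this is straightforward string parsing. The one thing to watch is that the port number must not appear in the output. In Java, a regular expression handles it easily: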
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Scanner cin = new Scanner(System.in);
        // Group 1 is the scheme plus "://"; group 2 is the domain, which ends
        // at the first ':' (port), '/' (path), '?' (query) or '#' (fragment).
        // Everything after the domain is optional.
        Pattern pattern = Pattern.compile("([A-Za-z]+://)([^:/?#]+).*");
        int n = cin.nextInt();
        cin.nextLine(); // consume the rest of the line holding the case count
        for (int i = 1; i <= n; i++) {
            String url = cin.nextLine().trim();
            Matcher matcher = pattern.matcher(url);
            if (matcher.matches()) {
                System.out.println("Case #" + i + ": " + matcher.group(2));
            }
        }
    }
}
C works too, if you prefer it. In principle, C can use the regular-expression functions from <regex.h> that GNU libc provides:
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

#define MAX_LENGTH 1000

int main(void)
{
    int i, n;
    char url[MAX_LENGTH];
    regmatch_t pmatch[3];
    regex_t match_regex;

    /* Group 2 is the domain; it ends at the port, path, query or fragment. */
    if (regcomp(&match_regex, "^([A-Za-z]+://)([^:/?#]+)", REG_EXTENDED) != 0)
        return EXIT_FAILURE;
    if (scanf("%d", &n) != 1)
        return EXIT_FAILURE;
    for (i = 1; i <= n; i++) {
        if (scanf("%999s", url) != 1)
            break;
        if (regexec(&match_regex, url, 3, pmatch, 0) != 0)
            continue;
        /* Cut the string at the end of the domain, then print from its start. */
        url[pmatch[2].rm_eo] = '\0';
        printf("Case #%d: %s\n", i, &url[pmatch[2].rm_so]);
    }
    regfree(&match_regex);
    return EXIT_SUCCESS;
}
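However, HDU OJ runs on a Windows server and its gcc is the MinGW build, which does not support these regular-expression functions. A C submission therefore has to parse the string by hand. The C code is as follows: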
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define MAX_LENGTH 1000

int main(void)
{
    int i, j, n, len;
    bool starturl;
    char url[MAX_LENGTH];
    char outputurl[MAX_LENGTH + 32]; /* extra room for the "Case #i: " prefix */

    if (scanf("%d", &n) != 1)
        return EXIT_FAILURE;
    for (i = 1; i <= n; i++) {
        starturl = false;
        if (scanf("%999s", url) != 1)
            break;
        sprintf(outputurl, "Case #%d: ", i);
        len = (int)strlen(outputurl);
        for (j = 0; url[j] != '\0'; j++) {
            if (!starturl) {
                /* Find the "//" of "://"; the loop's j++ then lands on the domain. */
                if (url[j] == '/' && url[j + 1] == '/') {
                    j++;
                    starturl = true;
                }
            } else {
                /* The domain ends at the port, path, query or fragment. */
                if (url[j] == ':' || url[j] == '/' || url[j] == '?' || url[j] == '#')
                    break;
                outputurl[len++] = url[j];
            }
        }
        outputurl[len] = '\0';
        puts(outputurl);
    }
    return EXIT_SUCCESS;
}
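The hand-written scan above can also be expressed with strstr and strcspn from <string.h>, which are available under MinGW as part of the standard C library. The following is only a minimal sketch, not the submitted solution: the helper name extract_domain, its parameters, and its 0 / -1 return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the domain of `url` into `out`, whose capacity
 * `out_size` includes the terminating NUL.
 * Returns 0 on success, or -1 if `url` has no "://" or `out` is too small.
 */
int extract_domain(const char *url, char *out, size_t out_size)
{
    const char *sep;
    const char *host;
    size_t len;

    assert(url != NULL && out != NULL);

    sep = strstr(url, "://");      /* end of the scheme */
    if (sep == NULL)
        return -1;
    host = sep + 3;                /* first character after "://" */

    /* Length of the run before a port, path, query or fragment begins. */
    len = strcspn(host, ":/?#");
    if (len + 1 > out_size)
        return -1;

    memcpy(out, host, len);
    out[len] = '\0';
    return 0;
}
```

Inside the per-case loop, a call such as extract_domain(url, domain, sizeof domain) followed by printf("Case #%d: %s\n", i, domain) would produce the required output line, where domain is a caller-supplied char array.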