Extract domain information from a URL with and without regex

by Php Ninza on July 12, 2009

At times, PHP developers need to parse domain information from a given URL, usually one taken from a value such as $_SERVER['REQUEST_URI'] (note that REQUEST_URI itself carries only the path and query string). The PHP function parse_url (www.php.net/parse_url) extracts the domain name and the other components from a URL string. The components it can return are:

  • scheme – e.g. http, https, etc.
  • host – the domain name (e.g. www.google.com)
  • port – the port number, if specified
  • user – the username, if specified in the URL
  • pass – the password, if specified in the URL
  • path – the path of the page after the host
  • query – everything after the question mark ?
  • fragment – everything after the hash mark #
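
As a minimal sketch of the components listed above, the following passes a made-up URL to parse_url and prints the result. The credentials, port, path and query in it are placeholders chosen for illustration. Only components that are actually present in the URL appear as keys in the returned array.

```php
<?php
// Hypothetical URL used purely for illustration.
$url = 'http://user:secret@www.google.com:8080/search/index.php?q=php%20ninza#results';

// parse_url() returns an associative array holding only the components
// that are present in the URL, or false if the URL is seriously malformed.
$parts = parse_url($url);

print_r($parts);

// A single component can be requested with the optional second argument.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n"; // www.google.com
```

Note that parse_url does not decode anything: the query component above still contains %20.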


After parsing, parts of the URL may still be URL-encoded, for example with %20 standing in for the spaces between words. These can be decoded with urldecode (www.php.net/urldecode).
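
A small sketch of that decoding step, assuming a made-up query string of the kind found in the query component. It also shows parse_str, which is not mentioned above but is a standard PHP function that splits a query string into variables and decodes each value itself.

```php
<?php
// Hypothetical encoded query string, as it might appear in the 'query' component.
$query = 'q=php%20ninza&lang=en';

// urldecode() turns percent-encoded sequences (and '+') back into characters.
$decoded = urldecode($query);
echo $decoded, "\n"; // q=php ninza&lang=en

// parse_str() splits the query string into an array and decodes each value.
parse_str($query, $vars);
echo $vars['q'], "\n"; // php ninza
```

Because urldecode also converts a plus sign into a space, rawurldecode may be the safer choice for values where a literal + has to survive.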

An alternative method for parsing a URL's contents is to use a regex (regular expression):

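// Build the pattern piece by piece; $url is the URL string to parse.
$r  = "(?:([a-z0-9+._-]+)://)?";                                    // 1: scheme
$r .= "(?:";
$r .= "(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9a-f]{2})*)@)?";         // 2: user info (user:pass)
$r .= "(?:\[((?:[a-z0-9:])*)\])?";                                  // 3: bracketed IPv6 host
$r .= "((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9a-f]{2})*)";                // 4: host
$r .= "(?::(\d*))?";                                                // 5: port
$r .= "(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9a-f]{2})*)?";           // 6: path after a host
$r .= "|";
$r .= "(/?";                                                        // 7: path when no host is given
$r .= "(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9a-f]{2})+";
$r .= "(?:[a-z0-9-._~!$&'()*+,;=:@\/]|%[0-9a-f]{2})*";
$r .= ")?";
$r .= ")";
$r .= "(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";    // 8: query
$r .= "(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";     // 9: fragment
// Backticks delimit the pattern; the i flag makes it case-insensitive.
// On a match, $match[1] to $match[9] hold the groups numbered above.
preg_match("`$r`i", $url, $match);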


Amel July 18, 2009 at 7:47 am

Thanks for that. I haven't tried it yet, but I've been looking for something like it. Looks good.


Php Ninza July 18, 2009 at 11:20 am

@Amel

You are most welcome! Thanks for stopping by.
