Regexp on URLs with Ruby
In Ruby, the usual way to get the components of a URI is the URI module. It is part of the standard library rather than the core library, so you have to require it first:
require 'uri'
uri = URI("https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b")
uri.path # "/foo/asdf/asda"
If you are only going to use the URI once, you can skip the variable and call the method directly:
require 'uri'
URI("https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b").path # "/foo/asdf/asda"
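The same object exposes the other components too. Here is a minimal sketch, assuming the example URL with a `#fragment` appended (my addition, to have something to show for `fragment`):

```ruby
require 'uri'

uri = URI("https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b#fragment")

parts = {
  scheme:   uri.scheme,   # "https"
  host:     uri.host,     # "hello.asdas.com"
  path:     uri.path,     # "/foo/asdf/asda"
  query:    uri.query,    # "aaa=bbb&a=b"
  fragment: uri.fragment  # "fragment"
}

# The query string can be turned into a Hash with decode_www_form
params = URI.decode_www_form(uri.query).to_h # {"aaa"=>"bbb", "a"=>"b"}
```

Note that URI returns the query and fragment without their leading `?` and `#`, and the scheme without `://`.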
The regexp below removes everything before the path instead of capturing the path itself, simply because I think that is easier. Note that the result keeps the query string and loses the leading slash. Here's how it looks:
url = "https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b"
url.gsub(/^http(s)?:\/\/(([a-z]+)|([\.]+))+\//, "") # foo/asdf/asda?aaa=bbb&a=b
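One thing to watch out for: the pattern above only accepts lowercase letters and dots in the host, so a host with digits, hyphens, or a port does not match and the string comes back unchanged. A rough sketch of a more forgiving variant follows, assuming that everything between the scheme and the next slash is the host. `strip_origin` and the second URL are made up for illustration.

```ruby
# Hypothetical helper: drop the scheme and host, keep path and query.
def strip_origin(url)
  # \A anchors at the start of the string; [^\/]+ accepts any host
  # characters (digits, hyphens, a port) up to the first slash.
  url.sub(/\Ahttps?:\/\/[^\/]+\//, "")
end

original = /^http(s)?:\/\/(([a-z]+)|([\.]+))+\//

simple = "https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b"
tricky = "https://api-2.example.com:8080/foo?a=b" # made-up URL

old_simple = simple.gsub(original, "") # "foo/asdf/asda?aaa=bbb&a=b"
old_tricky = tricky.gsub(original, "") # unchanged, the hyphen stops the match
new_simple = strip_origin(simple)      # "foo/asdf/asda?aaa=bbb&a=b"
new_tricky = strip_origin(tricky)      # "foo?a=b"
```

Since the pattern is anchored at the start, it can only match once, so `sub` does the same job as `gsub` here.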
A better version captures every part of the URL as a group:
url = "https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b&#fragment"
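# Groups: 1 scheme, 2 authority, 3 path, 4 query, 5 fragment.
# Inside a character class "|" is a literal pipe, not "or", so [^?#] is enough.
/^(http[s]?:\/\/)?([^\/]*)([^?#]*)([^#]*)(.*)$/.match(url)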
# <MatchData
# "https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b&#fragment"
# 1:"https://"
# 2:"hello.asdas.com"
# 3:"/foo/asdf/asda"
# 4:"?aaa=bbb&a=b&"
# 5:"#fragment"
# >
We can do even better with named captures:
m = /^(?<scheme>http[s]?:\/\/)?(?<authority>[^\/]*)(?<path>[^?#]*)(?<query>[^#]*)(?<fragment>.*)$/.match(url)
m.named_captures
# {
# "scheme"=>"https://",
# "authority"=>"hello.asdas.com",
# "path"=>"/foo/asdf/asda",
# "query"=>"?aaa=bbb&a=b&",
# "fragment"=>"#fragment"
# }
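To reuse this, the pattern can be wrapped in a small method. This is only a sketch: `URL_PARTS` and `split_url` are names I made up, and the character classes are written as `[^?#]` and `[^#]`.

```ruby
URL_PARTS = /^(?<scheme>http[s]?:\/\/)?(?<authority>[^\/]*)(?<path>[^?#]*)(?<query>[^#]*)(?<fragment>.*)$/

# Hypothetical helper: returns the named captures as a Hash with
# symbol keys, or nil if the pattern does not match.
def split_url(url)
  m = URL_PARTS.match(url)
  m && m.named_captures.transform_keys(&:to_sym)
end

parts = split_url("https://hello.asdas.com/foo/asdf/asda?aaa=bbb&a=b&#fragment")
parts[:path]  # "/foo/asdf/asda"
parts[:query] # "?aaa=bbb&a=b&"

# Without a scheme the optional group does not take part in the match,
# so it is nil, while the empty query and fragment are empty strings.
bare = split_url("hello.asdas.com/foo")
bare[:scheme]    # nil
bare[:authority] # "hello.asdas.com"
bare[:fragment]  # ""
```

A single capture can also be read straight from the MatchData with `m[:path]`, without building the hash.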