Xtreme .Net Talk

Posted

I have some relative URLs like the following:

 

<a href="/some/folder/index.html">Sports</a>

<a href="some2/folder2/default.htm">Weather</a>

 

What I want to do:

 

<a href="http://www.domain.com/some/folder/index.html">Sports</a>

<a href="http://www.domain.com/some2/folder2/default.htm">Weather</a>

 

Basically, I want to insert the domain name at a given position. I can match the URLs with a regular expression without any problem, but I did not want to use grouping, because then I would have to loop over the matches. So I want to use the regular expression Replace function in C# to insert the domain name.

 

This is something I have used in the past -

 

(?<regSRC>href=[^"']*["'])

 

Then I could easily replace the entire text in C# with a different one using the ${regSRC} variable. I would like to use a similar trick for this problem. Can anyone help?

 

Thanks.

  • 2 weeks later...
Posted

Simple answer?

 

You could search for

 

\<a href\=\"

 

and replace with

 

<a href="http://www.domain.com

 

but this may be too simple. Is this what you are thinking, or am I misunderstanding your question?
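For what it's worth, here is a minimal C# sketch of that plain search and replace, assuming the links always look exactly like `<a href="` with double quotes. The class and method names (`SimpleReplaceDemo`, `AddDomain`) are made up for illustration. It also shows one thing to watch for: the second sample URL has no leading slash, so the domain and the path run together.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleReplaceDemo
{
    // Hypothetical helper: puts the domain right after every <a href="
    public static string AddDomain(string html)
    {
        // Same literal search as suggested above. The backslash escapes
        // before <, = and " are optional in .NET, so they are omitted here.
        return Regex.Replace(html, "<a href=\"", "<a href=\"http://www.domain.com");
    }

    public static void Main()
    {
        Console.WriteLine(AddDomain("<a href=\"/some/folder/index.html\">Sports</a>"));
        // <a href="http://www.domain.com/some/folder/index.html">Sports</a>

        Console.WriteLine(AddDomain("<a href=\"some2/folder2/default.htm\">Weather</a>"));
        // <a href="http://www.domain.comsome2/folder2/default.htm">Weather</a>
        // Note the missing slash between the domain and the path.
    }
}
```

Adding a trailing slash to the replacement text would fix the second case but double the slash in the first, so a single literal replacement cannot cover both forms.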

Posted
Hi Richard,

I think this will replace all URL patterns, including ones like

<a href="some/x.htm">...

<a href="www.somesite.com/x.htm">...

<a href="http://www.xtremedotnettalk.com

which is a wrong pattern match.

I think that, apart from matching <a href=", we should pick only those URLs that do not start with http or www, or at least those that do not already start with the literal string he wants to insert.

I have just started working with regexes, so I may be wrong.

Posted

You are correct

 

You are correct. :cool:

 

My suggestion would do just as you said, so you have a good understanding of regex. It was based on my assumption (and you know what assume does) that all his candidate strings had the form of his examples, which did not show www or http as part of the URL. If his data does contain www or http, then further analysis is warranted, and situations like the ones you brought up have to be handled.

 

To handle the situations you have brought up, you could search for:

 

(\<a href\=\")([^hw][^tw][^tw])

 

and replace with:

 

\1http://www.domain.com\2

 

This says search for:

 

<a href="

followed by three characters, where the first is not h or w and the second and third are not t or w.

 

This will find strings where the first three characters after the double quote are not htt and not www. Depending on the data, it may also exclude some desirable strings, such as two.three, because one excluded letter in any of the three positions is enough to block the match. However, the search string above errs on the side of safety.

 

Parentheses in the search string allow references to groups. The first pair of parentheses is group 1, the second is group 2, and so on. This comes in handy in the replacement string: the replacement above inserts the desired string between the two parenthesized groups of the search string.

 

Folks please comment on this, because there are many ways to accomplish regex things, all depending on data analysis and desired results, as we have seen by dev2dev's response. :cool:
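Since the original question was about C#, here is a rough sketch of the two-group search and replace in .NET. One thing I am sure of: .NET replacement strings refer to groups as `$1` or `${1}`, not `\1`. The class and method names are made up for illustration, and the sample links are taken from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReplaceDemo
{
    // The search string from the post, without the optional backslash escapes.
    // Group 1 is the <a href=" prefix, group 2 is the first three URL characters.
    private const string Pattern = "(<a href=\")([^hw][^tw][^tw])";

    // ${1} is used instead of $1 so the group number cannot run into the text after it.
    private const string Replacement = "${1}http://www.domain.com$2";

    // Hypothetical helper that applies the replacement to a piece of HTML.
    public static string AddDomain(string html)
    {
        return Regex.Replace(html, Pattern, Replacement);
    }

    public static void Main()
    {
        // Root-relative URL: the domain is inserted between the two groups.
        Console.WriteLine(AddDomain("<a href=\"/some/folder/index.html\">Sports</a>"));

        // Starts with h: no match, so the link is left untouched.
        Console.WriteLine(AddDomain("<a href=\"http://www.xtremedotnettalk.com\">Forum</a>"));

        // Relative URL with w as its second character: also left untouched.
        Console.WriteLine(AddDomain("<a href=\"two.three/x.htm\">Two</a>"));
    }
}
```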

Posted
To which post are you referring?

The one I posted in response to your new regex, i.e., the post before the one I posted yesterday.

I wrote a very lengthy post. I can't rewrite it completely now, but in short:

The regex you gave in your previous post has a logical error: it skips URLs like

<a href="wwwtutorial/chap1.htm">

<a href="http/basics.asp">

I think it is better to skip only URLs that start with http://, https://, ftp://, or www.

What do you say?
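One way to express "skip only those prefixes" in .NET is a negative lookahead. The sketch below is only an illustration under a few assumptions: it reuses the regSRC named group from the first post, treats www.domain.com as the placeholder domain, and also swallows one optional leading slash so that both sample URLs end up with exactly one slash after the domain. The class and method names are made up. Other schemes (mailto:, protocol-relative // links, and so on) are not handled and would need to be added to the lookahead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadReplaceDemo
{
    // regSRC captures href= plus the opening quote.
    // (?!...) rejects URLs that start with http://, https://, ftp:// or www.
    // The trailing /? consumes one leading slash, if there is one.
    private const string Pattern =
        @"(?<regSRC>href=[""'])(?!https?://|ftp://|www\.)/?";

    // Put back the captured prefix, then the domain and a single slash.
    private const string Replacement = "${regSRC}http://www.domain.com/";

    // Hypothetical helper that applies the replacement to a piece of HTML.
    public static string AddDomain(string html)
    {
        return Regex.Replace(html, Pattern, Replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(AddDomain("<a href=\"/some/folder/index.html\">Sports</a>"));
        // <a href="http://www.domain.com/some/folder/index.html">Sports</a>

        Console.WriteLine(AddDomain("<a href=\"some2/folder2/default.htm\">Weather</a>"));
        // <a href="http://www.domain.com/some2/folder2/default.htm">Weather</a>

        Console.WriteLine(AddDomain("<a href=\"wwwtutorial/chap1.htm\">Tutorial</a>"));
        // <a href="http://www.domain.com/wwwtutorial/chap1.htm">Tutorial</a>

        Console.WriteLine(AddDomain("<a href=\"http://www.xtremedotnettalk.com\">Forum</a>"));
        // unchanged
    }
}
```

No loop over the matches is needed: Regex.Replace rewrites every qualifying href in one call.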
