In this article, we demonstrate how to use some methods and properties of the HTMLElement class (Aspose.HTML for .NET) to modify an existing HTML document.
Introduction
Assume we have an HTML snippet that we want to modify using C#:
<div id="snippet">This is a <mark>specialSearchWord</mark> that I want to link to <img src="anImage.jpg"/> <a href="foo.htm">A hyperlink</a> Some more text and that <mark>specialSearchWord</mark> again. </div>
We want to transform it by replacing each <mark> element with a hyperlink:
<div id="snippet">This is a <a class="special" href="http://example.com/search/specialSearchWord">specialSearchWord</a> that I want to link to <img src="anImage.jpg" /> <a href="foo.htm">A hyperlink</a> Some more text and that <a class="special" href="http://example.com/search/specialSearchWord">specialSearchWord</a> again. </div>
Loading and modifying the page
We use an instance of HTMLDocument to load the content of our demo page (http://asposedemo20170904120448.azurewebsites.net/home/ModifyingPage).
If something goes wrong, we can catch an exception and handle it.
const string inputHtml = "http://asposedemo20170904120448.azurewebsites.net/home/ModifyingPage";
HTMLDocument document = null;
try
{
    document = new HTMLDocument(inputHtml);
}
catch (Exception e)
{
    Console.WriteLine(e);
}
To find the snippet, we use the GetElementById method. This method returns the Element whose ID attribute has the given value, or null if no such element exists.
if (document == null) return;
var element = document.GetElementById("snippet");
if (element != null)
{
    // TODO: Replace elements
}
We will look at two ways to solve this problem:
- Get the collection of <mark> elements and replace them with <a> elements
- Get the inner HTML as a string and replace the text segment
Get the collection of <mark> elements and replace them with <a> elements
For the first approach, we need to create an HTMLAnchorElement that will take the place of each <mark> element:
var anchorElement = (HTMLAnchorElement)document.CreateElement("a");
anchorElement.Href = "http://example.com/search/specialSearchWord";
anchorElement.SetAttribute("class", "special");
anchorElement.TextContent = "specialSearchWord";
To get the collection of <mark> elements, we use the GetElementsByTagName method (https://apireference.aspose.com/net/html/aspose.html.dom/element/methods/getelementsbytagname). This method returns an HTMLCollection of all descendant elements with the given tag name, in document order.
var markElements = element.GetElementsByTagName("mark");
We can now traverse markElements and replace each element. HTMLElement inherits the ReplaceChild method from Node: it replaces the child node oldChild with newChild in the list of children and returns the oldChild node.
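if (element != null)
{
    var markElements = element.GetElementsByTagName("mark");
    // In the DOM model this collection is live: replacing item i removes it from the
    // collection, so walk it backwards to avoid skipping elements.
    for (int i = markElements.Length - 1; i >= 0; i--)
    {
        var markElement = markElements[i];
        // Insert a fresh copy each time; inserting the same node twice would only move it,
        // leaving the first <mark> replaced by nothing.
        markElement.ParentNode.ReplaceChild(anchorElement.CloneNode(true), markElement);
    }
}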
Here is a full example:
using System;
using Aspose.Html;

private static void Solution01()
{
    const string inputHtml = "http://asposedemo20170904120448.azurewebsites.net/home/ModifyingPage";
    HTMLDocument document = null;
    try
    {
        document = new HTMLDocument(inputHtml);
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
    if (document == null) return;

    var element = document.GetElementById("snippet");

    // Template anchor; it is cloned for every <mark> element
    var anchorElement = (HTMLAnchorElement)document.CreateElement("a");
    anchorElement.Href = "http://example.com/search/specialSearchWord";
    anchorElement.SetAttribute("class", "special");
    anchorElement.TextContent = "specialSearchWord";

    if (element != null)
    {
        var markElements = element.GetElementsByTagName("mark");
        // Walk the live collection backwards so replacements do not shift unvisited items
        for (int i = markElements.Length - 1; i >= 0; i--)
        {
            var markElement = markElements[i];
            // A fresh copy per <mark>: reusing one node would only move it around
            markElement.ParentNode.ReplaceChild(anchorElement.CloneNode(true), markElement);
        }
    }
}
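The DOM-based approach can also be tried without the Aspose.HTML package. The sketch below is a stand-in, not Aspose.HTML code: it swaps in System.Xml's XmlDocument, which implements the same W3C DOM Core calls (GetElementsByTagName, ReplaceChild, CloneNode), and it can parse the snippet only because the snippet happens to be well-formed XML. It makes two DOM details visible: a node can occupy only one place in the tree, so each <mark> receives its own clone of the anchor, and the list returned by GetElementsByTagName reflects later changes to the document, so the loop walks it backwards.

```csharp
using System;
using System.Xml;

// Stand-in for Aspose.HTML: System.Xml's XmlDocument parses the snippet as XML.
var doc = new XmlDocument { PreserveWhitespace = true };
doc.LoadXml(
    "<div id=\"snippet\">This is a <mark>specialSearchWord</mark> that I want to link to " +
    "<img src=\"anImage.jpg\"/> <a href=\"foo.htm\">A hyperlink</a> Some more text and that " +
    "<mark>specialSearchWord</mark> again. </div>");

XmlElement snippet = doc.DocumentElement!;

// Template link; it is never inserted itself, only cloned.
XmlElement template = doc.CreateElement("a");
template.SetAttribute("class", "special");
template.SetAttribute("href", "http://example.com/search/specialSearchWord");
template.InnerText = "specialSearchWord";

// The list is live: replacing item i removes it from the list, but items 0..i-1
// keep their indexes, so walking backwards visits every <mark> exactly once.
XmlNodeList marks = snippet.GetElementsByTagName("mark");
for (int i = marks.Count - 1; i >= 0; i--)
{
    XmlNode mark = marks[i]!;
    // A fresh deep clone per <mark>: reusing one node would just move it around.
    mark.ParentNode!.ReplaceChild(template.CloneNode(true), mark);
}

// Prints the snippet with both <mark> elements replaced by special links.
Console.WriteLine(snippet.OuterXml);
```

It runs as a .NET 6 or later console program with top-level statements; no NuGet packages are needed.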
Get the inner HTML as a string and replace the text segment
The HTMLElement class also gives access to the inner and outer HTML markup of an element through the InnerHTML and OuterHTML properties.
For the snippet above, the value of InnerHTML is:
This is a <mark>specialSearchWord</mark> that I want to link to <img src="anImage.jpg"> <a href="foo.htm">A hyperlink</a> Some more text and that <mark>specialSearchWord</mark> again.
The value of OuterHTML is the same markup wrapped in the element's own tags:
<div id="snippet">This is a <mark>specialSearchWord</mark> that I want to link to <img src="anImage.jpg"> <a href="foo.htm">A hyperlink</a> Some more text and that <mark>specialSearchWord</mark> again.</div>
Therefore, we can replace each <mark> element as a string fragment using the String.Replace method:
if (element != null)
{
    string stringToFind = "<mark>specialSearchWord</mark>";
    string stringToReplace = anchorElement.OuterHTML;
    element.InnerHTML = element.InnerHTML.Replace(stringToFind, stringToReplace);
}
Here is a full example:
using System;
using Aspose.Html;

private static void Solution02()
{
    const string inputHtml = "http://asposedemo20170904120448.azurewebsites.net/home/ModifyingPage";
    HTMLDocument document = null;
    try
    {
        document = new HTMLDocument(inputHtml);
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
    if (document == null) return;

    var anchorElement = (HTMLAnchorElement)document.CreateElement("a");
    anchorElement.Href = "http://example.com/search/specialSearchWord";
    anchorElement.SetAttribute("class", "special");
    anchorElement.TextContent = "specialSearchWord";

    var element = document.GetElementById("snippet");
    if (element != null)
    {
        string stringToFind = "<mark>specialSearchWord</mark>";
        string stringToReplace = anchorElement.OuterHTML;
        element.InnerHTML = element.InnerHTML.Replace(stringToFind, stringToReplace);
        // Print inside the null check: element is null if no "snippet" ID exists
        Console.WriteLine(element.OuterHTML);
    }
}
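The second solution works only when the markup matches the search string exactly, so it is suitable only for simple cases: a <mark> element that carries an attribute (for example, <mark class="hl">) or contains slightly different text will not be found.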
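To make the limitation concrete, here is a plain C# sketch (no Aspose.HTML needed) that applies the same String.Replace call to two inner-HTML strings: one with the exact markup and one where the <mark> element carries a class attribute. Both sample strings are hypothetical and made up for illustration.

```csharp
using System;

// Search and replacement strings as in the second solution.
string stringToFind = "<mark>specialSearchWord</mark>";
string stringToReplace =
    "<a class=\"special\" href=\"http://example.com/search/specialSearchWord\">specialSearchWord</a>";

// Hypothetical inner-HTML fragments; only the first matches stringToFind exactly.
string exact = "Some text and that <mark>specialSearchWord</mark> again.";
string withAttribute = "Some text and that <mark class=\"hl\">specialSearchWord</mark> again.";

string exactResult = exact.Replace(stringToFind, stringToReplace);
string attributeResult = withAttribute.Replace(stringToFind, stringToReplace);

Console.WriteLine(exactResult);      // the <mark> element is replaced by the link
Console.WriteLine(attributeResult);  // unchanged: "<mark class=\"hl\">" is not "<mark>"
```

The DOM-based first solution does not have this problem, because GetElementsByTagName matches elements by tag name regardless of their attributes.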
ReplaceChild is not the only way to modify HTML markup. You can also use other methods, such as RemoveAttribute or SetAttribute.
The following example shows how to remove the "class" attribute from the hyperlinks in our snippet:
if (element != null)
{
    Console.WriteLine($"Initial content:\n{element.InnerHTML}");
    var links = element.GetElementsByTagName("a");
    foreach (var link in links)
    {
        // RemoveAttribute changes the element but keeps it in the collection
        link.RemoveAttribute("class");
    }
    Console.WriteLine($"Modified content:\n{element.InnerHTML}");
}
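Unlike ReplaceChild, RemoveAttribute leaves the elements themselves in the tree, so a simple forward loop over the collection is safe. The sketch below illustrates this with System.Xml's XmlDocument as a stand-in (not Aspose.HTML), loaded with the transformed snippet from the introduction, which is well-formed XML.

```csharp
using System;
using System.Xml;

// Stand-in for Aspose.HTML: XmlDocument loaded with the expected result from the introduction.
var doc = new XmlDocument { PreserveWhitespace = true };
doc.LoadXml(
    "<div id=\"snippet\">This is a " +
    "<a class=\"special\" href=\"http://example.com/search/specialSearchWord\">specialSearchWord</a>" +
    " that I want to link to <img src=\"anImage.jpg\" /> <a href=\"foo.htm\">A hyperlink</a>" +
    " Some more text and that " +
    "<a class=\"special\" href=\"http://example.com/search/specialSearchWord\">specialSearchWord</a>" +
    " again. </div>");

XmlElement snippet = doc.DocumentElement!;
Console.WriteLine($"Initial content:\n{snippet.InnerXml}");

// RemoveAttribute keeps every <a> element in the tree, so the list keeps the same
// items and a forward index loop visits each link exactly once.
XmlNodeList links = snippet.GetElementsByTagName("a");
for (int i = 0; i < links.Count; i++)
{
    var link = (XmlElement)links[i]!;
    link.RemoveAttribute("class");   // no-op for links without a class attribute
}

Console.WriteLine($"Modified content:\n{snippet.InnerXml}");
```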