Export HTML content to CSV or Excel

Let’s say you are interested in downloading some HTML content from a website based on some filters.

PowerShell is the tool to use. It just takes a few lines.

Let’s take this blog as an example.

I would like to get hold of the titles of all posts in the category ‘PowerShell’.

The URL for this is https://audministrator.wordpress.com/category/powershell/

If you open it in a browser you get all the post content listed, but we only need the titles.

Therefore we need to use the DOM (Document Object Model) to find what we need.

Here we go:

Open up the IE browser and press F12. This will open up the DOM explorer.

In the search box in the right corner, type a keyword from one of the post titles.

Next inspect the section needed.

Next we need a DOM method to access the data. In this case:

getElementsByTagName("h2")

Next I wanted to get the text of the title, so we can use the property

innerText

Here is the code, plus some commented-out debugging lines in between, so you can get more information out of this example.

CLS

$URI = "https://audministrator.wordpress.com/category/powershell/"
$HTML = Invoke-WebRequest -Uri $URI

# echo $HTML

# $HTML | Get-Member

# $HTML.ParsedHtml | Get-Member

<#
Check out these Members

getElementById
getElementsByName
getElementsByTagName
#>

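# Note: ParsedHtml is only available in Windows PowerShell (5.1 and earlier), not in PowerShell 6+

# First attempt: every h2 element (this also returns the widget headers)
$Ret = $HTML.ParsedHtml.getElementsByTagName("h2") #| Where { $_.className  }
$Ret | % { $_.innerText }

# Second attempt: keep only the h2 elements without a class name and prefix each title with its source index
$Ret = $HTML.ParsedHtml.getElementsByTagName("h2") | Where { ([string]$_.className).Trim().Length -eq 0 } | % { ([String]$_.sourceIndex + " " + $_.innerText) }
$Ret | Select-Object @{Name='Name';Expression={$_}} | Export-Csv ($Env:USERPROFILE+"\Desktop\Test.csv") -NoTypeInformation

# Open the CSV file in the default application
Invoke-Item ($Env:USERPROFILE+"\Desktop\Test.csv")

echo $Ret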

At first I got a bit too much information. The last items are not posts but the headers of the widgets on the right.

So we have to filter them out by keeping only the h2 elements whose class name is empty. See the code:

([string]$_.className).Trim().Length -eq 0
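To try the filter without fetching a page, here is a minimal sketch that runs it on a few made-up objects. The sample data, the variable names and the 'widget-title' class are assumptions for illustration only; the real elements come from getElementsByTagName. The [string] cast is my own precaution so that a missing class name is treated as an empty string instead of causing an error.

```powershell
# Hypothetical stand-ins for the h2 elements; only className and innerText matter here.
$Headers = @(
    [pscustomobject]@{ className = '';             innerText = 'First post title'  }
    [pscustomobject]@{ className = $null;          innerText = 'Second post title' }
    [pscustomobject]@{ className = '   ';          innerText = 'Third post title'  }
    [pscustomobject]@{ className = 'widget-title'; innerText = 'Archives'          }
)

# Casting to [string] turns $null into an empty string, so Trim() is always safe to call.
# Elements with an empty (or whitespace-only) class name are kept; the others are dropped.
$Titles = @(
    $Headers |
        Where-Object { ([string]$_.className).Trim().Length -eq 0 } |
        ForEach-Object { $_.innerText }
)

# Show what survived the filter.
$Titles
```

In this sample only the 'Archives' entry carries a class name, so it is the one that gets dropped.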

Finally we have what we wanted, and we can now export it to a CSV file if we want.
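If you prefer the index and the title in separate columns, the export could look roughly like the sketch below. It uses made-up sample objects instead of a live page, and the column names 'Index' and 'Title' as well as the temporary file name are my own choices, not something the blog or the DOM prescribes.

```powershell
# Hypothetical sample data standing in for the filtered h2 elements.
$Posts = @(
    [pscustomobject]@{ sourceIndex = 101; innerText = 'First post title'  }
    [pscustomobject]@{ sourceIndex = 140; innerText = 'Second post title' }
)

# Write to the temp folder so the sketch does not touch the Desktop.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Titles.csv'

# Calculated properties rename the DOM properties to friendlier column headers.
$Posts |
    Select-Object @{ Name = 'Index'; Expression = { $_.sourceIndex } },
                  @{ Name = 'Title'; Expression = { $_.innerText } } |
    Export-Csv -Path $CsvPath -NoTypeInformation

# Read the file back to check what was written (Import-Csv returns every value as a string).
$Rows = @(Import-Csv -Path $CsvPath)
$Rows | Format-Table -AutoSize
```

Excel opens the resulting file directly, so for a simple list like this a CSV is usually all you need.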
