In this article, I will show you how to create a website URL crawler using ASP.NET and C#. You can crawl a web page and extract its data by entering the URL. The application requests the web page and then reads the response data programmatically.
The following code, written in C#, crawls a web page and extracts data from a website.
Html Design Code:
Create an ASP.NET web application, right-click the application, add a new web form, and name it CrawlData.aspx. Copy and paste the following design code into it.
<form id="form1" runat="server" style="text-align: center">
    <div style="border: 1px solid #DED8D8; width: 750px; height: 550px; font-family: Arial;">
        <h2>Crawl website URL</h2>
        <asp:TextBox ID="txtUrl" runat="server"></asp:TextBox>
        <asp:Button ID="btnCrawl" Text="Crawl" runat="server" OnClick="btnCrawl_Click" Style="height: 26px" />
        <br />
        <br />
        <iframe style="width: 750px; height: 100%;" id="irm1" src="CrawlData/new.html" runat="server"></iframe>
    </div>
</form>
Code behind:
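using System;
using System.IO;
using System.Net;

// Class name assumed to match the CrawlData.aspx web form created above.
public partial class CrawlData : System.Web.UI.Page
{
    protected void btnCrawl_Click(object sender, EventArgs e)
    {
        // URL typed into the text box
        string url = txtUrl.Text.Trim();

        // Physical folder where the downloaded page is saved;
        // create it if it does not exist yet so the write cannot fail on a missing folder.
        string path = Server.MapPath("~/CrawlData/");
        Directory.CreateDirectory(path);

        // Request the page and read the whole response body as text
        WebRequest request = WebRequest.Create(url);
        string responseData;
        using (WebResponse response = request.GetResponse())
        using (StreamReader responseReader = new StreamReader(response.GetResponseStream()))
        {
            responseData = responseReader.ReadToEnd();
        }

        // Save the downloaded HTML as new.html
        File.WriteAllText(Path.Combine(path, "new.html"), responseData);

        // Point the iframe at the saved copy
        irm1.Src = "CrawlData/new.html";
    }
}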
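The code-behind saves the whole page, but it does not pull any specific data out of it. As a rough sketch of the "extract data" step, the helper below collects the href values of anchor tags from an HTML string such as responseData. The names LinkExtractor and ExtractLinks are hypothetical and not part of the original code, and a regular expression is only an approximation here; a dedicated HTML parser would likely be more robust on real-world pages.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original article's code).
public static class LinkExtractor
{
    // Matches <a ... href="value"> or <a ... href='value'>, ignoring case.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the raw href values in the order they appear in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        if (string.IsNullOrEmpty(html))
        {
            return links;
        }

        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Inside btnCrawl_Click, it could be called as LinkExtractor.ExtractLinks(responseData) after the response has been read. The returned values are left exactly as written in the page, so relative links such as "/about" would still need to be resolved against the crawled URL.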
Description: Run the application and enter the URL of the page you want to crawl. Click the “Crawl” button. The application requests the web page, downloads its content, and saves it as new.html in the project folder “CrawlData”. Here I entered the URL "https://www.google.co.in"; the application crawled the web page and displayed it in the iframe.
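Because the URL comes straight from the text box, it may help to check it before passing it to WebRequest.Create, which throws on malformed input. The sketch below is one possible way to do that, assuming only absolute http and https URLs should be crawled. The names CrawlUrlValidator and TryParseCrawlUrl are hypothetical and do not appear in the original code.

```csharp
using System;

// Hypothetical helper (not part of the original article's code).
public static class CrawlUrlValidator
{
    // Returns true and sets 'uri' only when 'input' is an absolute http or https URL.
    public static bool TryParseCrawlUrl(string input, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        Uri candidate;
        if (!Uri.TryCreate(input.Trim(), UriKind.Absolute, out candidate))
        {
            return false;
        }

        // Reject other schemes such as ftp: or file:
        if (candidate.Scheme != Uri.UriSchemeHttp && candidate.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        uri = candidate;
        return true;
    }
}
```

In the click handler, the request would then be created only when TryParseCrawlUrl(txtUrl.Text, out uri) returns true. Note that an input without a scheme, such as "www.google.co.in", is rejected by this check, so the user has to type the full "https://..." address.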