我正在尝试从
http://www.dsebd.org/displayCompany.php?name=NBL提取单个数据,
我在所附图片中显示了Xpath的必填字段:/ html / body / table [2] / tbody / tr / td [2] / table / tbody / tr [3] / td 1 / p 1 / table 1 / tbody / tr / td 1 / table / tbody / tr [2] / td [2] / font
错误:发生异常,并且使用该Xpath找不到数据。“ HtmlAgilityPack.dll中发生了'System.Net.WebException'类型的未处理异常”
源代码:
static void Main(string[] args)
{
/************************************************************************/
string tickerid = "Bse_Prc_tick";
HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");
if (doc != null)
{
// Fetch the stock price from the Web page
string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td1/p1/table1/tbody/tr/td1/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
Console.WriteLine(stockprice);
}
Console.WriteLine("ReadKey Starts........");
Console.ReadKey();
}
好吧,我检查了。我们使用的XPath根本不正确。当您尝试找到错误所在时,真正的乐趣就开始了。
只要检查出您正在使用的页面的源代码,就可以避免许多错误,这些错误会阻碍XPath甚至包含多个HTML标记...
Chrome Dev Tools和您使用的工具可在浏览器校正的dom树上使用(全部打包到单个html节点中,添加了一些tbody等)。
由于html结构只是被破坏了,因此成为了HtmlAgilityPack的解析器。
就目前情况而言,您可以使用RegExp或仅搜索源中的已知元素(速度更快,但敏捷性更低)。
例如:
...
using System.Net; //required for Webclient
...
class Program
{
//entry point of console app
static void Main(string[] args)
{
// url to download
// "var" means I am too lazy to write "string" and let compiler decide typing
var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";
// creating object in using makes Garbage Collector delete it when using block ends, as opposed to standard cleaning after whole function ends
using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
{
// simply download result to string, in this case it will be html code
string htmlCode = client.DownloadString(url);
// cut html in half op position of "Last Trade:"
// searching from beginning of string is easier/faster than searching in middle
htmlCode = htmlCode.Substring(
htmlCode.IndexOf("Last Trade:")
);
// select from .. to .. and then remove leading and trailing whitespace characters
htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
Console.WriteLine(htmlCode);
}
Console.ReadLine();
}
}
// http://stackoverflow.com/a/17253735/3147740 <- copied from here
// this is Extension Class which adds overloaded Substring() I used in this code, it does what its comments says
public static class StringExtensions
{
/// <summary>
/// takes a substring between two anchor strings (or the end of the string if that anchor is null)
/// </summary>
/// <param name="this">a string</param>
/// <param name="from">an optional string to search after</param>
/// <param name="until">an optional string to search before</param>
/// <param name="comparison">an optional comparison for the search</param>
/// <returns>a substring based on the search</returns>
public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
{
var fromLength = (from ?? string.Empty).Length;
var startIndex = !string.IsNullOrEmpty(from)
? @this.IndexOf(from, comparison) + fromLength
: 0;
if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }
var endIndex = !string.IsNullOrEmpty(until)
? @this.IndexOf(until, startIndex, comparison)
: @this.Length;
if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }
var subString = @this.Substring(startIndex, endIndex - startIndex);
return subString;
}
}
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句