I have:
library(XML)
my_URL <- "http://www.velocitysharesetns.com/viix"
tables <- readHTMLTable(my_URL)
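The code above only returns the table at the top of the page. The pie chart seems to be ignored, probably because it is rendered by JavaScript. Is there a simple way to extract the two percentage figures from the chart?
I looked at RSelenium, but I ran into errors that I could not find a solution for.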
> RSelenium::startServer()
Error in if (file.exists(file) == FALSE) if (!missing(asText) && asText == :
argument is of length zero
In addition: Warning messages:
1: startServer is deprecated.
Users in future can find the function in file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see vignette("RSelenium-docker", package = "RSelenium")
2: running command '"java" -jar "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/selenium-server-standalone.jar" -log "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/sellog.txt"' had status 127
3: running command '"wmic" path win32_process get Caption,Processid,Commandline /format:htable' had status 44210
>
Based on Phillip's answer, I came up with the following solution:
library(XML)
# extract the HTML
doc.html <- htmlTreeParse('http://www.velocitysharesetns.com/viix',
                          useInternalNodes = TRUE)
# convert the parsed document to a single text string
htmltxt <- paste(capture.output(doc.html, file=NULL), collapse="\n")
# get the position of the first matching label
pos <- regexpr('CBOE SHORT-TERM VIX FUTURE', htmltxt)
# extract the 99 characters starting at "pos", which covers both data rows
keep <- substr(htmltxt, pos, pos+98)
Output:
> keep
[1] "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"
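The two weights can then be pulled out of `keep` as numbers. This is a minimal sketch, assuming `keep` has exactly the content shown above (it is pasted in here as a literal so the snippet runs on its own); the variable names `m` and `weights` are illustrative only.

```r
# `keep` is copied from the output above; in a live session it comes from substr().
keep <- "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"

# Match only numbers preceded by "', " so the years 2016/2017 are not picked up.
m <- regmatches(keep, gregexpr("(?<=', )[0-9]+(\\.[0-9]+)?", keep, perl = TRUE))[[1]]

# Convert the matched strings to numeric values.
weights <- as.numeric(m)
weights
```

The lookbehind ties each match to the closing quote of a label, which is less brittle than relying on the position of the numbers within the string.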
Using RSelenium
This solution using RSelenium worked for me (on Windows 7, after inspecting the page's source code). Note that I use chromedriver.exe.
library(RSelenium)
checkForServer(update = TRUE)
#### I use Chromedriver
startServer(args = c("-Dwebdriver.chrome.driver=C:/Stuff/Scripts/chromedriver.exe"))
remDr <- remoteDriver(remoteServerAddr = "localhost", browserName="chrome", port=4444)
### Open Chrome
remDr$open()
remDr$navigate("http://www.velocitysharesetns.com/viix")
b <- remDr$findElements(using="class name", value="jqplot-pie-series")
sapply(b, function(x){x$getElementAttribute("outerHTML")})
The last command returns
[[1]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>"
[[2]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
You can see that the percentage figures appear here and can easily be extracted.
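One way to do that extraction is a regular expression over the returned strings. The sketch below assumes the `outerHTML` values look exactly like the two shown above; they are pasted in as literals (the name `html` is illustrative), whereas in a live session you would pass `unlist()` of the `sapply()` result instead.

```r
# Sample strings copied from the output above.
html <- c(
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>",
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
)

# Capture the digits that sit between ">" and "%<", i.e. the label text.
pct <- as.numeric(sub(".*>([0-9.]+)%<.*", "\\1", html))
pct
```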
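Using plain HTML only
Alternatively, you can get the data just by reading the HTML source, since the data is already included there. Somewhere in the source you will find: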
<script type="text/javascript" language="javascript">
$(document).ready(function(){
var data = [
['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],
['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],
];
This is what you are looking for. The numbers are rounded before being displayed in the chart.
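As a rough sketch of this approach, the label/value pairs can be matched with a regular expression and collected into a data frame. It assumes the script block has the form shown above; `src` is a pasted sample (in a live session it could be something like `paste(readLines(my_URL), collapse = "\n")`), and the names `pattern`, `pairs` and `holdings` are illustrative.

```r
# Sample of the page source, copied from the snippet above.
src <- "var data = [\n['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n];"

# One match per ['label', number] pair.
pattern <- "\\['([^']+)',\\s*([0-9.]+)\\]"
pairs <- regmatches(src, gregexpr(pattern, src, perl = TRUE))[[1]]

# Split each match into its label and its numeric weight.
holdings <- data.frame(
  label  = sub(pattern, "\\1", pairs, perl = TRUE),
  weight = as.numeric(sub(pattern, "\\2", pairs, perl = TRUE)),
  stringsAsFactors = FALSE
)

# Rounding reproduces the percentages shown on the pie chart labels.
holdings$shown <- round(holdings$weight)
holdings
```

Reading the source gives the unrounded values (81.64 and 18.36), whereas the chart labels only carry the rounded 82% and 18%.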