Node.js: force waiting for a function to finish

Posted by debugcn in Dev

Hannah Murphy

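I have a for loop in a program that I run with Node.js. The function is x() from the xray package, and I use it to scrape data from a web page and then write that data to a file. The program works when I scrape ~100 pages, but I need to scrape ~10000 pages. When I try to scrape that many pages, the files are created but they contain no data. I believe the problem is that the for loop does not wait for x() to return its data before moving on to the next iteration.

Is there a way to make Node wait for the x() function to finish before continuing to the next iteration?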

//takes in file of urls, 1 on each line, and splits them into an array. 
//Then scrapes webpages and writes content to a file named for the pmid number that represents the study
 
//split urls into arrays
var fs = require('fs');
var array = fs.readFileSync('Desktop/formatted_urls.txt').toString().split("\n");


var Xray = require('x-ray');
var x = new Xray();
 
for(var i in array){
    //get unique number and url from the array to be put into the text file name
    var number = array[i].substring(35);
    var url = array[i];

    //use .write function of x from xray to write the info to a file
    x(url, 'css selectors').write('filepath' + number + '.txt');
}

Note: some of the pages I am scraping do not return any value.

University Teacher

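The problem with your code is that you are not waiting for the files to be written to the file system. Rather than handling the files one at a time and waiting for each one before moving on to the next, a better approach is to start all of the downloads at once and then wait until every file has been written.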
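
To make "waiting for the file to be written" concrete, here is a minimal sketch that uses only Node's built-in fs module rather than x-ray. The helper name writeAndWait and the temporary file name are assumptions made for illustration. It relies on the standard 'finish' and 'error' events that Node write streams emit.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `text` to `filePath` and return a promise that
// settles only once the stream has flushed its data (or has failed).
function writeAndWait(filePath, text) {
  return new Promise(function (resolve, reject) {
    const stream = fs.createWriteStream(filePath);
    stream.on('finish', function () { resolve(filePath); });
    stream.on('error', reject);
    stream.end(text); // write the text, then signal that no more data follows
  });
}

// Usage: the callback runs only after the file has been fully written.
const demoPath = path.join(os.tmpdir(), 'write-and-wait-demo.txt');
const done = writeAndWait(demoPath, 'scraped content');
done.then(function (writtenPath) {
  console.log('Finished writing ' + writtenPath);
});
```

Code that simply calls fs.createWriteStream() and moves on has no way of knowing when, or whether, the data actually reached the disk. Listening for 'finish' is what closes that gap.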

One of the recommended libraries for working with promises in Node.js is bluebird.

http://bluebirdjs.com/docs/getting-started.html

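In the updated example (see below), we loop over all of the URLs, start each download, and keep track of its promise. Each promise is resolved once its file has been written. Finally, we use Promise.all() to wait until all of the promises have been resolved.

Here is the updated code: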

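var promises = [];

// Start one download and return a promise that settles when its file is written.
var getDownloadPromise = function(url, number){
    return new Promise(function(resolve, reject){
        x(url, 'css selectors').write('filepath' + number + '.txt')
            .on('finish', function(){
                console.log('Completed ' + url);
                resolve();
            })
            // Reject on a stream error so that Promise.all() cannot hang forever.
            .on('error', reject);
    });
};

// Start every download and keep track of its promise.
for(var i = 0; i < array.length; i++){
    var url = array[i];
    var number = url.substring(35);

    promises.push(getDownloadPromise(url, number));
}

// Wait until all of the files have been written.
Promise.all(promises).then(function(){
    console.log('All urls have been completed');
}).catch(function(err){
    console.error('A download failed: ' + err);
});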
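
With ~10000 URLs, starting every request at once may still strain the network or the file system. One possible variation, sketched below, is to process the URLs in batches and wait for each batch before starting the next. This is not part of the answer above. The names fakeScrape and runInBatches are hypothetical, and fakeScrape is only a timer-based stand-in for the real x-ray call, so the sketch runs without any network access.

```javascript
// Stand-in for the real scrape-and-write step. `fakeScrape` is hypothetical:
// it resolves after a short delay instead of calling x-ray.
function fakeScrape(url) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('done:' + url); }, 10);
  });
}

// Process `items` in batches of `batchSize`, waiting for each batch to
// settle before the next batch is started.
async function runInBatches(items, batchSize, task) {
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Promise.allSettled() never rejects, so one failing page
    // does not abort the remaining batches.
    const settled = await Promise.allSettled(
      batch.map(function (item) { return task(item); })
    );
    results.push(...settled);
  }
  return results;
}

// Usage with placeholder "URLs" and a batch size of 2.
const urls = ['a', 'b', 'c', 'd', 'e'];
const batchRun = runInBatches(urls, 2, fakeScrape);
batchRun.then(function (results) {
  const ok = results.filter(function (r) { return r.status === 'fulfilled'; });
  console.log(ok.length + ' of ' + results.length + ' succeeded');
});
```

A batch size of 1 makes the loop strictly sequential, which is the behavior the question asks for. A larger batch size trades that strictness for throughput. In a real run, the task argument would be a function such as getDownloadPromise from the code above.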
