BeautifulSoup cannot retrieve links from a web page

snr

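I am trying to collect the URLs of the listings on a website's listing page, but I cannot get it to work with BeautifulSoup. I get the following exception, even when I send request headers: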

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 317, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='www.sahibinden.com', port=80): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/soner/PycharmProjects/bitirme2/main.py", line 8, in <module>
    r = requests.get(url)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='www.sahibinden.com', port=80): Read timed out. (read timeout=None)

Process finished with exit code 1

However, when I submit the URL from my code to https://hackertarget.com/extract-links/, it does return the links.

import requests
from bs4 import BeautifulSoup


url = 'http://www.sahibinden.com/satilik/istanbul-kartal?pagingOffset=50&pagingSize=50'
url2 = 'http://www.stackoverflow.com'

r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')

for link in soup.find_all("a", {"class": "classifiedTitle"}):
    print(link.get('href'))


'''
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
print(requests.get(url, headers=headers, timeout=5).text)
'''

Note that you may find yourself blocked by the website (sahibinden). I have not looked into using a proxy list with BeautifulSoup.
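The traceback above ends with "read timeout=None", which means the request was sent without a timeout and waited until the operating system gave up. The following is a minimal sketch of one way to fail fast and to tell a timeout apart from a refused request. The helper name fetch_listing_html, the (5, 15) second timeout, and the optional session parameter are assumptions made for illustration; they do not come from the original post.

```python
import requests


# Hypothetical helper: the name and default values are illustrative only.
def fetch_listing_html(url, headers=None, timeout=(5, 15), session=None):
    """Return the page HTML, or None if the request timed out or was refused."""
    http = session if session is not None else requests
    try:
        # timeout=(connect, read) in seconds; with timeout=None requests
        # waits indefinitely for the server to answer.
        response = http.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers ConnectTimeout and ReadTimeout (the error in the traceback).
        return None
    except requests.exceptions.ConnectionError:
        # The connection was refused or dropped.
        return None
    if not response.ok:
        # A 4xx/5xx status (for example 403 or 429) may indicate blocking.
        return None
    return response.text
```

A caller would check for None before handing the result to BeautifulSoup. Whether a browser-like User-Agent header is enough to avoid being blocked depends on the site and may change over time.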

卡迪亚斯

Here is the code snippet I ran, and it works as expected:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

url = 'http://www.sahibinden.com/satilik/istanbul-kartal?pagingOffset=50&pagingSize=50'

r = requests.get(url, headers=headers)
if r.ok:
    soup = BeautifulSoup(r.text, 'lxml')
    for a in soup('a', 'classifiedTitle'):
        print(a.get('href'))

Here is the output of the code above:

/ilan/emlak-konut-satilik-directten%2Ccift-wc-li%2Cgenis-m2de%2Ciskanli%2Culasimi-kolay-sik-3-plus1-671049902/detay
/ilan/emlak-konut-satilik-nesrin-den-kartal-ugurmumcuda-satilik-3-plus1-yunus-emre-caddesinde-692133846/detay
/ilan/emlak-konut-satilik-akelden-karliktepe-de-genis-m2-li-krediye-uygun-daire-659458837/detay
/ilan/emlak-konut-satilik-ikea-ve-metro-yani-teknik-yapi-uprise-elite_mukemmel-firsat-3-plus1-692131163/detay
/ilan/emlak-konut-satilik-kartal-atalar-da-iskanli-5-plus1-dubleks-satilik-daire-692125302/detay
/ilan/emlak-konut-satilik-satilik-daire-kartal-atalar-da-2-plus1-lux-100-m2-671083034/detay
/ilan/emlak-konut-satilik-kartal-ugurmumcuda-3-plus1-genis-masrafsiz-satilik-daire-681180607/detay
/ilan/emlak-konut-satilik-soner-den-manzara-adalar-da-satilik-kacirilmayacak-kelepir-daire-653973723/detay
/ilan/emlak-konut-satilik-mertcan-dan-tarihi-ayazma-caddesinde-2-plus1-satilik-ters-dubleks-692122837/detay
/ilan/emlak-konut-satilik-cinar-emlak%2Ctan-hurriyet-mah-105-m2-toprak-tapulu-692117031/detay
/ilan/emlak-konut-satilik-kartal-cumhuriyet-te-arsa-hisseli-yuksek-giris-daire-692116930/detay
/ilan/emlak-konut-satilik-temiz-emlaktan-petroliste-2-plus1-satilik-sifir-deniz-manzarali-671086029/detay
/ilan/emlak-konut-satilik-cemal-yalcin-dan-ozel-mimarili-luks-satilik-dubleks-623158476/detay
/ilan/emlak-konut-satilik-la-marin-kartal-da-site-icerisinde-ozel-bahce-kati-sifir-daire-645480180/detay
/ilan/emlak-konut-satilik-sen-kardeslerden-merkezde-3-plus1%2Ccok-temiz-satilik-daire%2C350.000tl-692103788/detay
/ilan/emlak-konut-satilik-kartal-petrol-is-mah-de-3-plus1-deniz-manzarali-yatirimlik-daire-619762304/detay
/ilan/emlak-konut-satilik-remax-red-rukiye-korkmaz-dan-panorama-velpark-ta-esyali-1-plus1-616596826/detay
/ilan/emlak-konut-satilik-yakacik-demirli-twinstar-sitesi-ultra-luks-174-m2-3-plus1-daire-692104680/detay
/ilan/emlak-konut-satilik-kartal-soganlikta-yatirimlik-kiracili-firsat-2-plus1-daire-682793715/detay
/ilan/emlak-konut-satilik-istmarinada-devirli-taksitli-satilik-studyo-gulsen-yanmazdan-638548163/detay
/ilan/emlak-konut-satilik-sahibinden-satilik-kartal-merkezde-kaymakamligin-karsisinda-2-plus1-692054497/detay
/ilan/emlak-konut-satilik-petrolis______ara-kat-2-plus1-110-m2-lux-panjurlu_____carsiya-yakin-692100683/detay
/ilan/emlak-konut-satilik-ful-deniz-manzarali-3-plus1-ana-yola-cok-yakin-115m2-sifir-daire-585807696/detay
/ilan/emlak-konut-satilik-kartal-karlitepe-de-ters-dublek-2-plus2-satilik-daire-692085141/detay
/ilan/emlak-konut-satilik-kartal-dap-yapi-istmarina-full-deniz-manzarali-2-plus1-satilik-621795699/detay
/ilan/emlak-konut-satilik-aybars-dan-site-icinde-havuzlu-satilik-daire-671063936/detay
/ilan/emlak-konut-satilik-soganlik-yeni-mah-5-yillik-binada-adalar-manzarali-satilik-dair-679308838/detay
/ilan/emlak-konut-satilik-kartal-soganlik-orta-mah-e-5-yani-yeni-bina-kelepir-daire-573785719/detay
/ilan/emlak-konut-satilik-sahibinden-site-icerisinde-1-plus1-644746509/detay
/ilan/emlak-konut-satilik-3-plus1-luks-sitede-646420303/detay
/ilan/emlak-konut-satilik-mirac-dan-ayazma-koru-da-lux-yapili-3-plus1-135m2-masrafsiz-daire-535382195/detay
/ilan/emlak-konut-satilik-sahibinden-site-icerisinde-3-plus1-644729603/detay
/ilan/emlak-konut-satilik-cevizli-de-satilik-daire-2-plus1-lux-85-m2-671030197/detay
/ilan/emlak-konut-satilik-esentepe-de-bahceli-acik-otoparkli-125m2-ferah-kullansli-daire-670847710/detay
/ilan/emlak-konut-satilik-atalarda-ara-katta-sifir-binada-2-plus1-85-m2-otoparkli-510436215/detay
/ilan/emlak-konut-satilik-sahil-mesa-marmara-10.kat-122m2-deniz-manzarali-0-satilik-3-plus1-692085951/detay
/ilan/emlak-konut-satilik-kartal-da-sifir-ara-kat-3-plus1-satilik-daire-692090351/detay
/ilan/emlak-konut-satilik-pega-kartal-satis-ofisinden-2-plus1-kat-mulkiyetli-hemen-teslim-644626657/detay
/ilan/emlak-konut-satilik-adalilar-dan-kartal-hurriyet-mah-de-satilik-kelepir-3-plus1-dublex-682761629/detay
/ilan/emlak-konut-satilik-kartal-kordonboyunda-2-plus1-sifir-daire-647037679/detay
/ilan/emlak-konut-satilik-aklife-den_yakacik_carsi_mah_ultra_lux_katta_tek_sifir_2-plus1-654883140/detay
/ilan/emlak-konut-satilik-aklife-den_yakacik_da_mukanbel_yapi_kaliteli_3-plus1_arakat_sifir-657772595/detay
/ilan/emlak-konut-satilik-ciceksan-insaat-dan-3-plus1-daireler-hemen-tapu-hemen-teslim-682770303/detay
/ilan/emlak-konut-satilik-satilik-daire-ofis-2-1-85-mt-klepir-634724740/detay
/ilan/emlak-konut-satilik-ricar-dan%2C7-24-guvenlik%2Cyuzme-havuzu%2Ckapali-otopark%2Csifir%2Csitede-682744629/detay
/ilan/emlak-konut-satilik-ricar-dan%2Cana-cadde-uzeri%2Cgenis%2Cferah%2Csifir%2Clux%2Cara-kat-649504313/detay
/ilan/emlak-konut-satilik-mertcan-dan-e5-e-yurume-mesafesinde-iskanli-2-plus1-sifir-daire-692078490/detay
/ilan/emlak-konut-satilik-kartal-atalar-da-sahile-yurume-mesafesinde-iskanli-masrafsiz-3-plus1-454709956/detay
/ilan/emlak-konut-satilik-tugcan-pala-dan-mesa-kartall-da-satilik-2-kat-buyuk-tip-2-plus1-670434988/detay
/ilan/emlak-konut-satilik-satilik-sifir-daire-soganlik-yeni-mah-2-plus1-kat-mulkiyetli-682522237/detay
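The href values printed above are site-relative paths, so they need the site's scheme and host before they can be requested. A minimal sketch using the standard library's urljoin follows; the helper name to_absolute and the BASE_URL constant are assumptions added for illustration, and the sample path is one line taken from the output above.

```python
from urllib.parse import urljoin

# The listing page URL from the question, used as the base for resolution.
BASE_URL = 'http://www.sahibinden.com/satilik/istanbul-kartal?pagingOffset=50&pagingSize=50'


# Hypothetical helper: resolves relative hrefs against the page they came from.
def to_absolute(hrefs, base_url=BASE_URL):
    # Skip anchors without an href (link.get('href') returns None for those).
    return [urljoin(base_url, href) for href in hrefs if href]


sample = ['/ilan/emlak-konut-satilik-3-plus1-luks-sitede-646420303/detay']
print(to_absolute(sample))
# ['http://www.sahibinden.com/ilan/emlak-konut-satilik-3-plus1-luks-sitede-646420303/detay']
```

Because each path starts with "/", urljoin keeps only the scheme and host of the base URL and drops its path and query string.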
