Trying to extract football data

菠萝69

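I am trying to extract the statistics for each match from the following table. I can reach this tbody, but I don't know how to proceed from there.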

<tbody class="matchCentreStatsContainer">
<tr><td><p class="higher">64.6</p></td><td><p>Possession %</p></td><td><p class="">35.4</p></td></tr>
<tr><td><p class="higher">7</p></td><td><p>Shots on target</p></td><td><p class="">1</p></td></tr>
<tr><td><p class="higher">15</p></td><td><p>Shots</p></td><td><p class="">4</p></td></tr>
<tr><td><p class="higher">757</p></td><td><p>Touches</p></td><td><p class="">510</p></td></tr>
<tr><td><p class="higher">543</p></td><td><p>Passes</p></td><td><p class="">301</p></td></tr>
<tr><td><p class="higher">24</p></td><td><p>Tackles</p></td><td><p class="">23</p></td></tr>
<tr><td><p class="">12</p></td><td><p>Clearances</p></td><td><p class="higher">22</p></td></tr>
<tr><td><p class="higher">9</p></td><td><p>Corners</p></td><td><p class="">0</p></td></tr>
<tr><td><p class="">3</p></td><td><p>Offsides</p></td><td><p class="higher">2</p></td></tr>
<tr><td><p class="">2</p></td><td><p>Yellow cards</p></td><td><p class="higher">1</p></td></tr>
<tr><td><p class="">15</p></td><td><p>Fouls conceded</p></td><td><p class="higher">12</p></td></tr></tbody>

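I have the following code to access it, but I am stuck from there. Any help extracting data such as passes, touches, possession, etc. would be greatly appreciated.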

import requests
import pandas as pd
import bs4

url = "https://www.premierleague.com/match/46889"
page = requests.get(url)
soup = bs4.BeautifulSoup(page.content, 'lxml')
# Locate the "Match Stats" tab, then look at its table body
tablediv = soup.find(name='div', attrs={'data-ui-tab': 'Match Stats'})
tablediv.tbody

Bertrand Martel

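The site uses a REST API to fetch the data on the client side, so the statistics are not in the HTML that requests downloads. You need to call: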

GET https://footballapi.pulselive.com/football/stats/match/{matchID}

The result is JSON. You can get the statistics from the data field; the stats object is indexed by the IDs of the two teams:

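import requests

matchID = "46889"
match = requests.get(
    f"https://footballapi.pulselive.com/football/stats/match/{matchID}",
    # the origin header makes the request look like it comes from the site
    headers={
        "origin": "https://www.premierleague.com"
    }
)
match.raise_for_status()  # stop early on an HTTP error
data = match.json()

# (id, name) pairs for the two teams in the match
teams = [(t["team"]["id"], t["team"]["name"]) for t in data["entity"]["teams"]]

print(teams)

for team_id, team_name in teams:
    print(f"stats for team {team_name} with id {team_id}")
    # the keys of data["data"] are the team ids as strings
    stats = data["data"][str(team_id)]
    print(stats)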
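
If you do end up with the rendered tbody from the question (for example copied from the browser's inspector), its rows can also be parsed directly. The following is only a rough sketch: parse_match_stats is a hypothetical helper name, the embedded HTML is a shortened copy of the rows shown in the question, and it assumes every row has exactly three cells holding numeric values on the left and right with the stat name in the middle.

```python
import bs4

# Shortened copy of the rows from the question, wrapped in a <table>
html = """
<table><tbody class="matchCentreStatsContainer">
<tr><td><p class="higher">64.6</p></td><td><p>Possession %</p></td><td><p class="">35.4</p></td></tr>
<tr><td><p class="higher">757</p></td><td><p>Touches</p></td><td><p class="">510</p></td></tr>
<tr><td><p class="higher">543</p></td><td><p>Passes</p></td><td><p class="">301</p></td></tr>
</tbody></table>
"""


def parse_match_stats(markup):
    """Return {stat name: (left value, right value)} for each table row.

    Hypothetical helper: assumes three cells per row and numeric values.
    """
    soup = bs4.BeautifulSoup(markup, "html.parser")
    tbody = soup.find("tbody", class_="matchCentreStatsContainer")
    stats = {}
    for row in tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 3:
            continue  # skip rows that do not match the expected layout
        left, name, right = cells
        stats[name] = (float(left), float(right))
    return stats


stats = parse_match_stats(html)
print(stats["Passes"])
```

Which team the left and right columns belong to is not visible in the snippet, so the tuple order is kept neutral here. For live pages the API call above is likely the more dependable route, since the table is filled in by JavaScript.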