I have been trying to extract all the titles (for example GitLab, Deel, Fivetran, etc.) from this web page: https://www.ycombinator.com/companies?industry=B2B%20Software%20and%20Services&status=Active&status=Public&status=Inactive&tags=Fintech&tags=Developer%20Tools&tags=Artificial%20Intelligence&tags=Analytics. I am using the BeautifulSoup package for this, but it is not giving me the right results.
import requests
from bs4 import BeautifulSoup
url = 'https://www.ycombinator.com/companies?industry=B2B%20Software%20and%20Services&status=Active&status=Public&status=Inactive&tags=Fintech&tags=Developer%20Tools&tags=Artificial%20Intelligence&tags=Analytics'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
a_tags = soup.find_all('a')
for a_tag in a_tags:
    div_tag = a_tag.find('div', class_='right')
    if div_tag:
        span_tags = div_tag.find_all('span')
        for span_tag in span_tags:
            text = span_tag.get_text(strip=True)
            print(text)
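This is the code I am using. I first find all the <a> tags on the page, then go into the <div class="right"> tag inside each one, and finally into the <span>, which is where I think the titles I want are. However, it still prints nothing. Does anyone know how to solve this?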
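You are not getting any results because the titles you are looking for (GitLab, Deel, Fivetran, etc.) are part of search results that are rendered by JavaScript and take a few seconds to load onto the page. The requests library only downloads the initial HTML and does not execute JavaScript, so a plain request never sees those elements. You can get them with Selenium, which drives a real browser.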
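To see why the static approach comes back empty, here is a minimal sketch using only the standard library. The STATIC_HTML string is a made-up stand-in for what a plain HTTP request might return from a JavaScript-rendered page (it is not the real Y Combinator markup), and TagCounter and count_tag are hypothetical helper names. The point is only that an HTML parser treats the script body as text and never runs it.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML a plain HTTP request might receive from
# a JavaScript-rendered page: an empty container plus a script tag.
STATIC_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script>
      // In a browser, this would add the result links after the page loads.
      document.getElementById("root").innerHTML =
        '<a href="/companies/gitlab">GitLab</a>';
    </script>
  </body>
</html>
"""


class TagCounter(HTMLParser):
    """Counts start tags by name. Script bodies are handled as plain text."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return how many <tag> start tags the parser sees in the given HTML."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.get(tag, 0)


# The <a> only exists inside the script source, so the parser never sees it.
print(count_tag(STATIC_HTML, "a"))    # 0
print(count_tag(STATIC_HTML, "div"))  # 1
```

A quick check along the same lines for a real page is to search response.text for a title you expect (for example "GitLab"). If it is missing, the content is most likely being added by JavaScript.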
Here is how to solve this using Selenium:
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC
driver = Chrome()
url = "https://www.ycombinator.com/companies?industry=B2B%20Software%20and%20Services&status=Active&status=Public&status=Inactive&tags=Fintech&tags=Developer%20Tools&tags=Artificial%20Intelligence&tags=Analytics"
driver.get(url)
results = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'a.WxyYeI15LZ5U_DOM0z8F')))
print(f"total search results: {len(results)}")
for result in results:
    print(result.find_element(By.CSS_SELECTOR, 'div.right>div>span').text)
Output:
total search results: 40
GitLab
Deel
Fivetran
Checkr
Retool
Podium
Algolia
Modern Treasury
Sift
Pave
Mux
Jasper.ai
Apollo
Mashgin
Veriff
Airbyte
Teleport
People.ai
Sendbird
Mixpanel
Human Interest
Heap
Frubana Inc
Replit
QuickNode
Middesk
Supabase
TRM Labs
InfluxData
Bitrise
Hightouch
Instabug
Mezmo
Routable
HackerRank
Duffel
Deepgram
RevenueCat
TrueNorth
Nuvocargo
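If you would rather keep BeautifulSoup for the parsing step, one possible approach is to let Selenium render the page and then hand the rendered HTML (for example driver.page_source, after the wait has succeeded) to BeautifulSoup. The sketch below assumes the a > div.right > div > span structure used in the selector above. The SAMPLE_RENDERED_HTML string and the extract_company_names helper are hypothetical and only illustrate the idea; the real page markup may differ.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical sample of rendered markup, modelled on the
# 'div.right>div>span' selector used above. Not the real page source.
SAMPLE_RENDERED_HTML = """
<a href="/companies/gitlab">
  <div class="right"><div><span>GitLab</span><span>San Francisco</span></div></div>
</a>
<a href="/companies/deel">
  <div class="right"><div><span>Deel</span><span>San Francisco</span></div></div>
</a>
<a href="/about">About</a>
"""


def extract_company_names(rendered_html: str) -> List[str]:
    """Return the first 'div.right > div > span' text inside each <a>."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    names = []
    for a_tag in soup.find_all("a"):
        # select_one returns the first match or None, which mirrors
        # find_element picking the first matching span per result.
        span = a_tag.select_one("div.right > div > span")
        if span is not None:
            names.append(span.get_text(strip=True))
    return names


print(extract_company_names(SAMPLE_RENDERED_HTML))  # ['GitLab', 'Deel']
```

With the Selenium script above, this would be called as extract_company_names(driver.page_source) once the WebDriverWait call has returned.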