Python Crawler for Tianyancha

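Python is a powerful programming language, and one of its most common uses is collecting data from the internet with web crawlers. One popular target site is Tianyancha.

Tianyancha is a platform that offers company information lookup, risk alerts, and business information analysis. Many people want to pull data from the site, but doing so by hand is tedious. This article therefore gives an example Python crawler for Tianyancha to help you collect that data quickly.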

import requests
from lxml import etree


def main():
    # Search results page for the keyword "阿里巴巴" (Alibaba)
    url = "https://www.tianyancha.com/search?key=阿里巴巴"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
        # Placeholder: replace with a real cookie value
        "cookie": "cookie_value",
    }
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)
    # Collect the link to each company's detail page from the results list
    company_urls = html.xpath("//div[@class='content']/div/div[2]/a/@href")
    for company_url in company_urls:
        # Fetch and parse the detail page of one company
        detail_url = "https://www.tianyancha.com" + company_url
        detail_response = requests.get(detail_url, headers=headers)
        detail_html = etree.HTML(detail_response.text)
        # Company name, legal representative, and registered capital
        company_name = detail_html.xpath("//h1[@class='name']/text()")[0]
        legal_name = detail_html.xpath("//div[@class='humancompany']//div[@class='human']/a/text()")[0]
        register_fund = detail_html.xpath("//div[@class='company-content']/table/tbody//tr[4]/td[2]/text()")[0]
        print(company_name, legal_name, register_fund)


if __name__ == '__main__':
    main()

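In the example above, we first send an HTTP request that carries a cookie and parse the HTML of the returned page. From the company list page we extract the URL of each company's detail page. Then, for each company, we send another HTTP request to fetch the detail page, extract the relevant fields, and print them.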
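The URL handling in these steps can be made a little more robust. The following is a minimal sketch using only the standard library, not part of the original example: the helper names `build_search_url` and `resolve_detail_url` are hypothetical, and the path `/company/123` is an invented placeholder rather than a real Tianyancha URL. It percent-encodes the search keyword and resolves each extracted href against the site root, so that relative and absolute links are both handled.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.tianyancha.com"


def build_search_url(keyword: str) -> str:
    """Return the search URL with the keyword percent-encoded as UTF-8."""
    return f"{BASE_URL}/search?{urlencode({'key': keyword})}"


def resolve_detail_url(href: str) -> str:
    """Resolve an href taken from the results page against the site root.

    A relative href is joined to BASE_URL; an absolute href is returned as is.
    """
    return urljoin(BASE_URL + "/", href)


if __name__ == "__main__":
    print(build_search_url("阿里巴巴"))
    # "/company/123" is a made-up path used only for illustration
    print(resolve_detail_url("/company/123"))
    print(resolve_detail_url("https://www.tianyancha.com/company/123"))
```

In the crawler above, `resolve_detail_url(company_url)` could stand in for the string concatenation that builds `detail_url`, assuming the hrefs on the results page may be either relative or absolute.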

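Crawling Tianyancha with Python is not difficult, and this example only retrieves a few simple fields. Python crawlers have a much wider range of applications, though: whether you want to collect data or gather knowledge on a topic, a Python crawler can do the job.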
