Python is a popular programming language with strong data-processing capabilities and a flexible programming style. It makes scraping tasks straightforward, such as fetching China's province, city, and district data.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2019/'
response = requests.get(url)
response.encoding = 'gbk'  # the pages are GBK-encoded
soup = BeautifulSoup(response.text, 'html.parser')

# Each province link sits in a row with class "provincetr"
provinces = soup.select('tr.provincetr td a')
for p in provinces:
    province_url = urljoin(url, p['href'])
    province_name = p.get_text()  # the link text is the province name
    province_response = requests.get(province_url)
    province_response.encoding = 'gbk'
    province_soup = BeautifulSoup(province_response.text, 'html.parser')

    # City rows carry the class "citytr": the first cell holds the code,
    # the second the name; follow the link in the row to reach the city page
    cities = province_soup.select('tr.citytr')
    for c in cities:
        city_code = c.select('td')[0].get_text()
        city_name = c.select('td')[1].get_text()
        city_link = c.select_one('td a')
        city_url = urljoin(province_url, city_link['href'])
        city_response = requests.get(city_url)
        city_response.encoding = 'gbk'
        city_soup = BeautifulSoup(city_response.text, 'html.parser')

        # District/county rows carry the class "countytr"
        areas = city_soup.select('tr.countytr')
        for a in areas:
            area_code = a.select('td')[0].get_text()
            area_name = a.select('td')[1].get_text()
            print(province_name, city_code, city_name, area_code, area_name)
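Since every level of the crawl repeats the same fetch, decode, and parse steps, one practical variation is to pull that pattern into a small helper and pause briefly between requests. The get_soup name, the timeout, and the one-second delay below are illustrative assumptions, not part of the original script.

import time
import requests
from bs4 import BeautifulSoup

def get_soup(page_url, encoding='gbk', delay=1.0):
    # Pause briefly so the crawl stays polite to the server (delay value is an assumption)
    time.sleep(delay)
    resp = requests.get(page_url, timeout=10)
    resp.encoding = encoding  # force the expected encoding before reading .text
    return BeautifulSoup(resp.text, 'html.parser')

Each of the three BeautifulSoup objects in the script above could then be built with a single call, for example soup = get_soup(url).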
The code above relies on the third-party libraries requests and beautifulsoup4. First, we fetch the page with requests.get and set the response encoding to gbk, since the pages are GBK-encoded. Then we parse the HTML with BeautifulSoup. The soup.select method picks elements out of the page using CSS-style selectors, as the short sketch below illustrates. From there, we follow the link for each city and district page and parse out the corresponding data.
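To make the selector step concrete, here is a minimal, self-contained sketch. The HTML fragment is made up to mimic the structure the script relies on; it is not taken from stats.gov.cn.

from bs4 import BeautifulSoup

# A made-up fragment shaped like a province row
html = '''
<table>
  <tr class="provincetr">
    <td><a href="11.html">Beijing</a></td>
    <td><a href="12.html">Tianjin</a></td>
  </tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('tr.provincetr td a'):
    print(a['href'], a.get_text())  # prints: 11.html Beijing, then 12.html Tianjin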
In the end, we obtain the code and name of every province, city, and district.
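If you want to keep the results instead of only printing them, one option is to write each row to a CSV file. The regions.csv filename and the column order below are assumptions chosen for illustration.

import csv

# Open the output file before the crawl and replace the print() call in the
# innermost loop with writer.writerow(...)
with open('regions.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['province', 'city_code', 'city_name', 'area_code', 'area_name'])
    # inside the innermost loop of the script above:
    # writer.writerow([province_name, city_code, city_name, area_code, area_name])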