環境

windows 10

檢查python版本

（使用git bash）

$ python --version

Python 3.7.7

查pip版本

$ pip --version

pip 19.2.3 from c:\users\user\appdata\local\programs\python\python37-32\lib\site-packages\pip (python 3.7)

看目前系統有安裝哪些套件

$ pip list

Package Version

---------- -------

pip 19.2.3

setuptools 41.2.0

WARNING: You are using pip version 20.2.3; however, version 20.3.3 is available.

You should consider upgrading via the 'c:\users\bear\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.

升級pip

$ python -m pip install --upgrade pip

Collecting pip

Downloading pip-21.0-py3-none-any.whl (1.5 MB)

|████████████████████████████████| 1.5 MB 469 kB/s

Installing collected packages: pip

Attempting uninstall: pip

Found existing installation: pip 20.2.3

Uninstalling pip-20.2.3:

Successfully uninstalled pip-20.2.3

Successfully installed pip-21.0

$ pip --version

pip 21.0 from c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip (python 3.9)

升級python

https://stackoverflow.com/a/57292808 How do I upgrade the Python installation in Windows 10?

直接到python官網下載最新的python安裝檔

如果你是要升級小版本3.x.y 到 3.x.z，可以直接【Upgrade Now】

如果你是要升級中版本3.x 到 3.y，安裝檔會提示你【Install Now】

安裝完成（我有禁用path length limit）

In this case, you are not upgrading, but you are installing a new version of Python. You can have more than one version installed on your machine. They will be located in different directories.

這種情況，你不是升級，你是安裝新版本的python。你的系統有多個python版本在不同的目錄。

使用py指定python版本

$ py -3.7

Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 09:44:33) [MSC v.1900 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>>

$ py -3.9

Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>>

設定環境變量

設定=》系統=》關於=》進階系統設定

環境變數

編輯Path

將原本python 3.7的路徑改成3.9的

C:\Users\user\AppData\Local\Programs\Python\Python37-32\Scripts\

C:\Users\user\AppData\Local\Programs\Python\Python37-32\

改成

C:\Users\user\AppData\Local\Programs\Python\Python39\Scripts\

C:\Users\user\AppData\Local\Programs\Python\Python39\

然後開新的prompt檢查版本

$ python --version

Python 3.9.1

$ pip --version

pip 20.2.3 from c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip (python 3.9)

編輯器

PyCharm Professional 2020.3

斷點

File => Open （打開空目錄時，如C:\python\hello），會自動生成 main.py

這個檔案會教你怎麼用PyCharm的斷點，Debug（Shift+F9）即可開始斷點

熱鍵

自定義

Resume Program => F9

正確打開新項目的姿勢

File => New Project

New environment using（環境使用）：【Virtualenv】（venv）

Base interpreter: C:\Users\user\AppData\Local\Programs\Python\Python39\python.exe （如果你裝了2個python（3.7和3.9），要在這邊選IDE環境的python版本）

這樣項目下才會有venv目錄

.gitignore 忽略掉venv和.idea目錄

.gitignore

venv/

.idea/

爬蟲

https://www.learncodewithmike.com/2020/05/python-selenium-scraper.html [Python爬蟲教學]整合Python Selenium及BeautifulSoup實現動態網頁爬蟲

在PyCharm 中安裝Selenium

https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html Install, uninstall, and upgrade packages

Settings => Project: project_name => Python Interpreter => +

搜尋Selenium =》 Install Package

然後就可以看到成功安裝selenium，和相依賴的package（urllib3）

這樣只裝在IDE的環境中，沒安裝在系統上，所以git bash上面pip list還是沒有selenium。直接命令行執行會報錯

$ python crawler.py

Traceback (most recent call last):

File "C:\bear\python\hello3\crawler.py", line 1, in <module>

from selenium import webdriver

ModuleNotFoundError: No module named 'selenium'

系統安裝selenium

$ pip install selenium

Collecting selenium

Using cached selenium-3.141.0-py2.py3-none-any.whl (904 kB)

Collecting urllib3

Using cached urllib3-1.26.2-py2.py3-none-any.whl (136 kB)

Installing collected packages: urllib3, selenium

Successfully installed selenium-3.141.0 urllib3-1.26.2

命令行執行（需先配置好webdriver）

$ python crawler.py

DevTools listening on ws://127.0.0.1:60865/devtools/browser/c51ba9ea-08d4-45f8-bee0-5780695b15ef

老天尊的死期

安裝Webdriver

前往Python套件儲存庫PyPI(Python Package Index) 查詢Selenium

點進去後，往下可以看到Drivers的地方，下載chromedriver

下載你chrome瀏覽器相對應的 ChromeDriver

解壓縮 chromedriver_win32.zip 放到Python網頁爬蟲的專案資料夾中，如下圖：

第一個爬蟲程式

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument("--disable-notifications")

chrome = webdriver.Chrome('./chromedriver', options=options)
chrome.get("https://carlislebear.blogspot.com/")

print(chrome.title)
chrome.quit()

Line 5-6：options物件，主要用途為取消網頁中的彈出視窗，避免妨礙網路爬蟲的執行。

Line 8：就是建立webdriver物件，傳入剛剛所下載的「瀏覽器驅動程式路徑（chromedriver）-可略」及「瀏覽器設定(options)-可選」

執行

Line 11：chrome.title打印爬蟲網頁的標題，在Console 中也可以打印變數

第二個爬蟲程式

https://officeguide.cc/windows-python-selenium-automation-scripts-tutorial-examples/ Windows 使用 Python + Selenium 自動控制瀏覽器教學與範例

https://stackoverflow.com/a/39191349 NoSuchElementException - Unable to locate element

https://stackoverflow.com/a/44834542 Switch to an iframe through Selenium and python

目的：打開blog =》搜尋python =》取第一篇文章的標題

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--disable-notifications")

# chrome = webdriver.Chrome('./chromedriver', options=options)
chrome = webdriver.Chrome()
chrome.get("https://carlislebear.blogspot.com/")

iframe = chrome.find_element_by_name('navbar-iframe')
chrome.switch_to.frame(iframe)
b_query = chrome.find_element_by_id('b-query').send_keys('python')
chrome.find_element_by_xpath('//*[@id="b-query-icon"]').click()

try:
    WebDriverWait(chrome, 3).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Blog1"]/div[1]')))
    print(chrome.find_element_by_xpath('//*[@id="Blog1"]/div[1]/div[3]/div/div/div/h3').text)
except TimeoutException:
    print('等待逾時！')

chrome.quit()

Line 5-7 & Line 23-24：搜尋後直到內容出來後再打印出第一篇文章標題

Line 16-17：切換到上方iframe，否則會報錯

NoSuchElementException: Message: no such element: Unable to locate element

Line 16、18、19：可以使用find_element_by_name、find_element_by_id、find_element_by_xpath選擇DOM

如何快速寫出XPATH？

chrome 開發者工具DOM上右鍵 =》Copy=》 Copy XPath

剪貼簿中得到：

//*[@id="b-query-icon"]

如何驗證XPath？

開發者工具Console中使用 $x ，如：

$x('//*[@id="b-query-icon"]')

使用BeautifulSoup抓取所有文章標題

需安裝beautifulsoup4

from selenium.webdriver.support import expected_conditions as EC

from selenium.webdriver.common.by import By

+from bs4 import BeautifulSoup

options = Options()

options.add_argument("--disable-notifications")

@@ -21,6 +23,11 @@ chrome.find_element_by_xpath('//*[@id="b-query-icon"]').click()

try:

WebDriverWait(chrome, 3).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Blog1"]/div[1]')))

print(chrome.find_element_by_xpath('//*[@id="Blog1"]/div[1]/div[3]/div/div/div/h3').text)

+ soup = BeautifulSoup(chrome.page_source, 'html.parser')

+ titles = soup.find_all('h3', {

+ 'class': 'post-title'})

+ for title in titles:

+ print(title.getText())

except TimeoutException:

print('等待逾時！')

輸出：

起因

如果你的wireshark抓不到包，可以嘗試使用fiddler抓包備用

我遇過windows 10 使用wireshark抓hyper-v上的laradock http服務抓不到包，估計是hyper-v被map了外網IP造成的，因為其他內網實體機器可以抓到包

安裝版本

Fiddler Everywhere 1.4.1

chrome抓不到包

https://stackoverflow.com/a/19905099 Fiddler not capturing traffic from browsers

如果你的chrome裝了 Proxy SwitchySharp 擴充功能，必須選擇【系統代理】而不是【直接連線】，才能抓到包

抓HTTPS的包

點擊這個驚歎號

信任和啟用HTTPS

安裝DO_NOT_TRUST_FiddlerRoot憑證

設定過濾器

只能對HTTP Header 過濾（沒有wireshark 強大）

瀏覽器打開頁面即可抓到包

Replay請求

在請求上【右鍵】=》【Replay】=》【Reissue Requests】

查看請求封包和返回結果

Request =》 Raw： HTTP 封包

Response =》 Body（JSON）：返回結果

老天尊的死期

2021年1月21日星期四

python心得

環境

檢查python版本

查pip版本