This article shows how to save a web page with Python, Selenium and pyautogui, including the page's resource files (JS, CSS, images, etc.).

1. Solution

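Selenium cannot interact with the browser's native context menu or its Save As dialog. To invoke the browser's "Save as", you can use an external automation library such as pyautogui, which sends OS-level keystrokes: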

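pyautogui.hotkey('ctrl', 's')             # open the Save As dialog (('command', 's') on macOS)
time.sleep(1)                             # wait for the dialog to appear
pyautogui.typewrite(SEQUENCE + '.html')   # type the file name
pyautogui.press('enter')                  # confirm and save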
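The four keystrokes above can be wrapped in a small helper. This is a sketch, not part of the original code: the function names `save_as_actions` and `perform` are hypothetical, and the platform check assumes macOS uses Command+S while Windows and Linux use Ctrl+S. Keeping the key sequence as plain data makes it testable without a display.

```python
import sys
import time


def save_as_actions(filename, platform=sys.platform):
    """Return the Save As keystrokes as (pyautogui method name, args) pairs.

    "sleep" is a pseudo-action meaning "wait for the native dialog to open".
    """
    modifier = "command" if platform == "darwin" else "ctrl"
    return [
        ("hotkey", (modifier, "s")),   # open the browser's Save As dialog
        ("sleep", ()),                 # wait for the dialog
        ("typewrite", (filename,)),    # the name field is usually pre-selected, so typing replaces it
        ("press", ("enter",)),         # confirm and save
    ]


def perform(actions, keyboard, dialog_delay=1.0):
    """Replay the actions on a pyautogui-like object (pyautogui itself in practice)."""
    for name, args in actions:
        if name == "sleep":
            time.sleep(dialog_delay)
        else:
            getattr(keyboard, name)(*args)
```

In the full script, the last four lines could then become `perform(save_as_actions(SEQUENCE + '.html'), pyautogui)`, with pyautogui imported only where a display is available.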

2. Full code and explanation

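The Save As window is opened with its keyboard shortcut CTRL+S (Command+S on macOS), and pressing Enter then saves the page and its resources to the default download location. The code names the file after the query sequence so that each saved page gets a unique name, but you can change this to suit your use case. If needed, you can also change the download location by moving through the dialog with the Tab and arrow keys. Note that pyautogui types into whichever window has focus, so the browser window must stay in the foreground while the script runs.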
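Using the raw sequence as the file name works for the example, but the example sequence is already more than 160 characters long, and longer queries can run into file-name or path-length limits (255 characters per name on common filesystems; Chrome typically also creates a resource folder named after the file). A minimal sketch, using a hypothetical helper `filename_for_sequence`, keeps short sequences readable and shortens long ones with a hash so that names stay unique:

```python
import hashlib


def filename_for_sequence(sequence, max_stem=60, ext=".html"):
    """Build a file name from a query sequence.

    Short sequences are used as-is. Long ones are truncated and suffixed with a
    short SHA-1 digest of the full sequence, so different sequences still get
    different names while the stem stays at most `max_stem` characters.
    """
    if max_stem < 14:
        raise ValueError("max_stem must be at least 14")
    seq = sequence.strip().upper()
    if len(seq) <= max_stem:
        return seq + ext
    digest = hashlib.sha1(seq.encode("utf-8")).hexdigest()[:12]
    # prefix + "_" + 12-character digest == max_stem characters
    return f"{seq[:max_stem - 13]}_{digest}{ext}"


print(filename_for_sequence("cctaaactatag"))  # CCTAAACTATAG.html
```

In the full code, `SEQUENCE + '.html'` would then become `filename_for_sequence(SEQUENCE)`. On Windows and most Linux file dialogs, typing an absolute path into the file-name field usually saves straight to that folder, which can be simpler than tabbing through the dialog; verify this on your system.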

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.expected_conditions import visibility_of_element_located
from selenium.webdriver.support.ui import WebDriverWait
import pyautogui
URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome'
SEQUENCE = 'CCTAAACTATAGAAGGACAGCTCAAACACAAAGTTACCTAAACTATAGAAGGACAGCTCAAACACAAAGTTACCTAAACTATAGAAGGACAGCTCAAACACAAAGTTACCTAAACTATAGAAGGACAGCTCAAACACAAAGTTACCTAAACTATAGAAGGACA' #'GAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGAGAAGA'
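# Open the page with Selenium
# (Selenium 4.6+ downloads a matching driver automatically via Selenium Manager;
#  older versions need chromedriver or geckodriver installed and on PATH)
driver = webdriver.Chrome()
driver.get(URL)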
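# Enter SEQUENCE in the query box and click the "BLAST" button to run the search
# (find_element_by_id was removed in Selenium 4.3; use find_element(By.ID, ...))
seq_query_field = driver.find_element(By.ID, "seq")
seq_query_field.send_keys(SEQUENCE)
blast_button = driver.find_element(By.ID, "b1")
blast_button.click()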
# Wait up to 60 seconds for the results to load (BLAST jobs can take longer; raise the timeout if needed)
WebDriverWait(driver, 60).until(visibility_of_element_located((By.ID, 'grView')))
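# Open "Save as" to save the HTML and its resource files
pyautogui.hotkey('ctrl', 's')  # use ('command', 's') on macOS
time.sleep(1)                  # give the native dialog time to open
pyautogui.typewrite(SEQUENCE + '.html')
pyautogui.press('enter')
# Give the browser time to finish writing the files before closing it
time.sleep(5)
driver.quit()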
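The script presses Enter without checking that the page was actually written to disk. A minimal sketch of such a check, assuming the default download folder is `~/Downloads` (adjust it if you changed it), polls for the saved HTML file until its size stops changing; the helper name `wait_for_saved_page` is hypothetical. With the "Webpage, Complete" save type, Chrome typically also creates a `<name>_files` folder next to the HTML file for the resources.

```python
import time
from pathlib import Path


def wait_for_saved_page(directory, filename, timeout=30.0, poll=0.5):
    """Poll until directory/filename exists with a non-zero size that is the
    same on two consecutive polls, then return its path."""
    target = Path(directory) / filename
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if target.exists():
            size = target.stat().st_size
            if size > 0 and size == last_size:
                return target  # size unchanged since the last poll: assume writing is done
            last_size = size
        time.sleep(poll)
    raise TimeoutError(f"{target} was not saved within {timeout} seconds")
```

For example, calling `wait_for_saved_page(Path.home() / "Downloads", SEQUENCE + '.html', timeout=60)` after pressing Enter, and only then calling `driver.quit()`, avoids closing the browser mid-save.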