Saving Xuexitong (学习通) self-test question banks with Selenium

Before running the script, manually install the required modules and ChromeDriver.

selenium: a library for automating interaction with web browsers.

pip install selenium

bs4 (BeautifulSoup4): a library for parsing HTML and XML documents, commonly used for web scraping.

pip install beautifulsoup4
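As a quick check that both installs above succeeded, the following minimal sketch looks up the installed versions with the standard library. The helper name installed_version is an assumption made for illustration and is not part of the original script.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The names are the ones passed to pip, not the import names.
    for name in ("selenium", "beautifulsoup4"):
        version = installed_version(name)
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If either line prints NOT INSTALLED, rerun the corresponding pip command in the same Python environment that will run the script.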

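To install ChromeDriver, follow these steps:

Download ChromeDriver

First, find out which version of Chrome you have: in the Chrome menu, choose "Help" -> "About Google Chrome". Then go to https://sites.google.com/a/chromium.org/chromedriver/downloads and pick the ChromeDriver version that matches your Chrome version.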
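"Matching" here can be read as having the same major version, that is, the first number of the dotted version string. The sketch below compares two version strings on that basis. The helper names and the version strings are illustrative assumptions, not values taken from this project.

```python
def major_version(version):
    """Return the leading number of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """Return True if Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative version strings only.
print(versions_match("114.0.5735.199", "114.0.5735.90"))  # True
print(versions_match("115.0.5790.102", "114.0.5735.90"))  # False
```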

Extract ChromeDriver

The downloaded file is a compressed archive that must be extracted. Right-click the downloaded file and choose "Extract".

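Add the ChromeDriver path to the system environment variables

This is the key step that lets Selenium find ChromeDriver. First, note the path of the folder that contains chromedriver.exe.

On Windows, add the environment variable as follows:

- Right-click "This PC" or "Computer" and choose "Properties".
- Click "Advanced system settings".
- In the window that opens, click the "Environment Variables" button.
- Under "System variables", select "Path" and click "Edit".
- In the new window, click "New" and enter the path of the folder that contains chromedriver.exe.
- Click "OK" to save your changes.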
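To confirm that the folder really ended up on PATH, a small check like the one below may help. It is a sketch only: the function name dir_on_path and the example folder are assumptions, and the check must run in a process started after the environment variable was changed.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize so trailing separators and (on Windows) letter case do not matter.
    target = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target for entry in entries)


# Hypothetical folder; replace it with the folder that holds chromedriver.exe.
print(dir_on_path(r"C:\tools\chromedriver"))
```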

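Verify the installation

Open a new Command Prompt window and type chromedriver. If the installation succeeded, you should see a message saying that ChromeDriver has started. Press Ctrl+C to stop it.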
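The same check can be scripted. The sketch below looks for the executable on PATH and asks it for its version with the --version flag, which returns immediately instead of leaving a server running. The function name driver_version is an assumption for illustration.

```python
import shutil
import subprocess


def driver_version(executable="chromedriver"):
    """Return the output of `<executable> --version`, or None if it cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None  # not found on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


print(driver_version() or "chromedriver was not found on PATH")
```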

Run

Before running the code, make sure that all modules are installed and that Selenium can find ChromeDriver.

The code is as follows:

# Imports
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
import random
from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException


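chrome_options = webdriver.ChromeOptions()
# Headless mode
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
# Open the login page
driver = webdriver.Chrome(options=chrome_options)
driver.get('http://passport2.chaoxing.com/login?fid=&newversion=true&refer=http://i.chaoxing.com')
# Sleep for a random duration between a and b seconds (1-2 s by default)
def sleep(a=1, b=2):
    time.sleep(random.uniform(a, b))
# Wait up to 10 seconds for the account field of the login form to appear
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[1]/input"))
)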
# Log in
account = input("Enter your account:\n")
password = input("Enter your password:\n")
sjh = driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[1]/input")
sjh.send_keys(account)
mima = driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[2]/input")
mima.send_keys(password)
login = driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[3]/button")
login.click()
# Give the login request a moment to complete
sleep(2, 3)
# Open the quiz page
shijuan = input("Enter the link of the quiz you want to export:\n")
driver.get(shijuan)
# Get the page HTML
html = driver.page_source
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all elements that contain a question
questions = soup.find_all('div', class_='Sub_tit_box')

# Open a file named "题库.txt" (question bank) on drive G: for writing
output_path = "G:\\题库.txt"
with open(output_path, "w", encoding="utf-8") as file:
    for question in questions:
        # Write the question text
        file.write(question.h3.text.strip() + "\n")

        # Try to find and write the choices
        choices = question.find_next_sibling('ul', class_='mark_letter colorDeep')
        if choices:
            for li in choices.find_all('li'):
                file.write(li.text.strip() + "\n")
        else:
            file.write("No Choices for this question.\n")

        # Try to find and write the answer
        answer = question.find_next_sibling('div', class_='mark_answer')
        if answer:
            correct_answer = answer.find('span', class_='colorGreen marginRight40 fl')
            fill_answer = answer.find('dl', class_='mark_fill colorGreen')
            if correct_answer:
                # Strip the "正确答案:" ("Correct answer:") label from the text
                file.write(correct_answer.text.strip().replace("正确答案:", "").strip() + "\n\n")
            elif fill_answer:
                file.write(fill_answer.dd.text.strip() + "\n\n")
            else:
                file.write("No Answer for this question.\n\n")
        else:
            file.write("No Answer for this question.\n\n")

# Close the browser once the export is finished
driver.quit()
print(f"Export complete! Saved to: {output_path}")
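The writing loop above mixes text formatting with file I/O and assumes that drive G: exists. As one possible variation, the sketch below separates the two and accepts any output path. The names format_question and save_bank, and the (title, choices, answer) record layout, are assumptions introduced for illustration and are not part of the original script.

```python
from pathlib import Path


def format_question(title, choices, answer):
    """Format one question using the same layout as the export loop."""
    lines = [title]
    lines.extend(choices if choices else ["No Choices for this question."])
    lines.append(answer if answer else "No Answer for this question.")
    # One blank line separates consecutive questions.
    return "\n".join(lines) + "\n\n"


def save_bank(records, path):
    """Write (title, choices, answer) records to `path` and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as file:
        for title, choices, answer in records:
            file.write(format_question(title, choices, answer))
    return path
```

With this split, the scraping code only needs to build the list of records, and the output location can be passed in, for example save_bank(records, Path.home() / "题库.txt").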