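Introduction
Selenium is a browser automation and testing framework, and PhantomJS is a headless WebKit browser that executes JavaScript. Used together, they render JavaScript-driven pages and simulate user actions on them.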
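Installing Selenium
The Python bindings are installed with pip. A JDK is only needed to run the standalone Selenium Server, not for the Python client used here.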
$ pip install selenium
Installing PhantomJS
sudo yum install gcc gcc-c++ make git openssl-devel freetype-devel fontconfig-devel
git clone git://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh
After the build finishes, the phantomjs executable is in phantomjs/bin/. Add that directory to PATH:
vim ~/.bash_profile
export PATH="/work/build/phantomjs/bin:${PATH}"
:x
source ~/.bash_profile
phantomjs --version
1.9.8
Version 2.0 is not yet stable, which is why the 1.9 branch is checked out above.
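The same version check can be scripted from Python before a driver is started. The following is a minimal sketch, not part of the original setup: the helper names `parse_version` and `phantomjs_version` are hypothetical, and it assumes `phantomjs --version` prints a plain dotted version such as 1.9.8.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '1.9.8' into a tuple of ints."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(group) for group in match.groups())


def phantomjs_version(executable="phantomjs"):
    """Return the installed version as a tuple, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    output = subprocess.check_output([path, "--version"])
    return parse_version(output.decode("ascii", "replace"))


if __name__ == "__main__":
    version = phantomjs_version()
    if version is None:
        print("phantomjs was not found on PATH")
    else:
        print("found phantomjs", ".".join(str(n) for n in version))
```

Returning a tuple makes comparisons such as `version < (2, 0, 0)` straightforward, which is convenient when a script should only run against the 1.9 series.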
PhantomJS settings
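# webdriver.PhantomJS is available in Selenium 3.x and earlier; it was removed in Selenium 4.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4 import BeautifulSoup

url = "http://example.com/"  # placeholder; replace with the page to render

# Start from the default PhantomJS capabilities and override the user agent.
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87"
)

# Pass both options to one driver; creating a second driver would discard the first one's settings.
driver = webdriver.PhantomJS(
desired_capabilities=dcap,  # custom user agent
service_args=["--load-images=no"],  # do not load images
)
driver.implicitly_wait(10)  # wait up to 10 seconds when locating elements
driver.get(url)
source = driver.page_source  # page source after rendering
driver.quit()  # shut down the driver and the PhantomJS process
soup = BeautifulSoup(source, "lxml")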
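Once the rendered source is in hand, the remaining work is ordinary BeautifulSoup parsing. The sketch below is illustrative only: the HTML string is a made-up stand-in for driver.page_source, the helper `extract_links` is hypothetical, and it uses Python's built-in "html.parser" so that it runs without lxml (passing "lxml" instead should work the same way when that package is installed).

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; hypothetical markup for illustration only.
source = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="/first">First</a>
    <a href="/second">Second</a>
    <a>No href here</a>
  </body>
</html>
"""


def extract_links(html):
    """Return (title, [(text, href), ...]) for every anchor that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else None
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # skip anchors without href
    ]
    return title, links


title, links = extract_links(source)
print(title)
for text, href in links:
    print(text, href)
```

Keeping the parsing in a function that takes a plain string means it can be exercised without launching PhantomJS at all.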