Simple Python: Set Up Selenium and PhantomJS on CentOS

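Introduction

Selenium is a browser automation framework that is widely used for testing. PhantomJS is a headless WebKit browser that executes JavaScript. Used together, they can render web pages and simulate user actions.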

Installing Selenium

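A JDK is needed only if you also run the standalone Selenium Server, which is a Java program. The Python bindings installed below drive PhantomJS directly.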

$ pip install selenium
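
Which Selenium release pip installs matters here: to the best of my knowledge, webdriver.PhantomJS was removed in Selenium 4, so the code later in this post needs a 3.x release. The following is a minimal sketch for checking what is installed; the helper names selenium_major_version and supports_phantomjs are hypothetical, not part of Selenium.

```python
from importlib import metadata


def selenium_major_version():
    """Return the installed Selenium major version, or None if it is absent."""
    try:
        version = metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


def supports_phantomjs(major):
    """Assumption: webdriver.PhantomJS is only present before Selenium 4."""
    return major is not None and major < 4


if __name__ == "__main__":
    major = selenium_major_version()
    print("selenium major version:", major)
    print("PhantomJS driver expected:", supports_phantomjs(major))
```

If the check reports that the PhantomJS driver is not expected, pinning the install with pip install "selenium<4" is one way to get a compatible release.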

Installing PhantomJS

sudo yum install gcc gcc-c++ make git openssl-devel freetype-devel fontconfig-devel
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh

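After the build, the phantomjs executable is in phantomjs/bin/. Add that directory to PATH by opening ~/.bash_profile, appending the export line, and saving with :x (the path shown assumes the repository was cloned under /work/build):
vim ~/.bash_profile
export PATH="/work/build/phantomjs/bin:${PATH}"
:x
Then reload the profile and check the version, which should print 1.9.8:
source ~/.bash_profile
phantomjs --version
1.9.8
At the time of writing, version 2.0 is not yet stable, which is why the 1.9 branch is checked out.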
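
The same PATH and version check can be done from Python before any driver is started. This is a minimal sketch using only the standard library; parse_version and phantomjs_version are hypothetical helper names, and the only thing assumed about PhantomJS is that --version prints a dotted version number as shown above.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Turn a string such as '1.9.8' into a tuple of ints, e.g. (1, 9, 8)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.groups())


def phantomjs_version(binary="phantomjs"):
    """Return the version tuple of the binary found on PATH, or None."""
    path = shutil.which(binary)
    if path is None:
        return None
    output = subprocess.check_output([path, "--version"])
    return parse_version(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    version = phantomjs_version()
    if version is None:
        print("phantomjs not found on PATH")
    else:
        print("found phantomjs", ".".join(str(n) for n in version))
```

A result of None usually means the export line has not been picked up by the current shell.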

PhantomJS settings

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4 import BeautifulSoup

url = "http://example.com/"  # placeholder: replace with the page to fetch

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (   # set the user agent
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)
# Pass both settings to a single driver instance.
driver = webdriver.PhantomJS(
    desired_capabilities=dcap,
    service_args=['--load-images=no'],          # do not load images
)
driver.implicitly_wait(10)    # wait up to 10 seconds when locating elements
driver.get(url)
source = driver.page_source   # page source after JavaScript has run
driver.quit()                 # stop the driver and the phantomjs process
soup = BeautifulSoup(source, "lxml")
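
The snippet above releases the driver only if every earlier line succeeds. If driver.get raises, the phantomjs process is left running. One way to guard against that is a small context manager, sketched below. The name managed_driver is hypothetical, and FakeDriver is a stand-in that lets the example run without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in used only to demonstrate the helper without a browser."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        raise RuntimeError("simulated navigation failure")

    def quit(self):
        self.closed = True


if __name__ == "__main__":
    fake = FakeDriver()
    try:
        with managed_driver(lambda: fake) as driver:
            driver.get("http://example.com/")
    except RuntimeError:
        pass
    print("quit called:", fake.closed)  # prints "quit called: True"
```

With Selenium, the factory would be something like lambda: webdriver.PhantomJS(service_args=['--load-images=no']), and the calls to driver.get and driver.page_source go inside the with block.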