Fixing a failed install of the third-party Python library BeautifulSoup from CMD
阿新 • Published: 2018-12-17
How the problem arose
While writing a web scraper, I needed to install the third-party library BeautifulSoup.
Investigation
I first tried pip install BeautifulSoup
Problem 1:
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(int "Unit tests have failed!")?
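Method 1:
Download the package from the official site: https://files.pythonhosted.org/packages/1e/ee/295988deca1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
After extracting it, run the following from the extracted directory:
python setup.py install
pip install **.whl
The installation still failed.
Reading the source file setup.py shows why: it uses print as a statement, not as a function.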
from distutils.core import setup
import unittest
import warnings
warnings.filterwarnings("ignore", "Unknown distribution option")
import sys
# patch distutils if it can't cope with the "classifiers" keyword
if sys.version < '2.2.3':
    from distutils.dist import DistributionMetadata
    DistributionMetadata.classifiers = None
    DistributionMetadata.download_url = None
from BeautifulSoup import __version__
#Make sure all the tests complete.
import BeautifulSoupTests
loader = unittest.TestLoader()
result = unittest.TestResult()
suite = loader.loadTestsFromModule(BeautifulSoupTests)
suite.run(result)
if not result.wasSuccessful():
    print "Unit tests have failed!"
    for l in result.errors, result.failures:
        for case, error in l:
            print "-" * 80
            desc = case.shortDescription()
            if desc:
                print desc
            print error
    print '''If you see an error like: "'ascii' codec can't encode character...", see\nthe Beautiful Soup documentation:\n http://www.crummy.com/software/BeautifulSoup/documentation.html#Why%20can't%20Beautiful%20Soup%20print%20out%20the%20non-ASCII%20characters%20I%20gave%20it?'''
    print "This might or might not be a problem depending on what you plan to do with\nBeautiful Soup."
    if sys.argv[1] == 'sdist':
        print
        print "I'm not going to make a source distribution since the tests don't pass."
        sys.exit(1)
setup(name="BeautifulSoup",
      version=__version__,
      py_modules=['BeautifulSoup', 'BeautifulSoupTests'],
      description="HTML/XML parser for quick-turnaround applications like screen-scraping.",
      author="Leonard Richardson",
      author_email = " [email protected]",
      long_description="""Beautiful Soup parses arbitrarily invalid SGML and provides a variety of methods and Pythonic idioms for iterating and searching the parse tree.""",
      classifiers=["Development Status :: 5 - Production/Stable",
                   "Intended Audience :: Developers",
                   "License :: OSI Approved :: Python Software Foundation License",
                   "Programming Language :: Python",
                   "Topic :: Text Processing :: Markup :: HTML",
                   "Topic :: Text Processing :: Markup :: XML",
                   "Topic :: Text Processing :: Markup :: SGML",
                   "Topic :: Software Development :: Libraries :: Python Modules",
                   ],
      url="http://www.crummy.com/software/BeautifulSoup/",
      license="BSD",
      download_url="http://www.crummy.com/software/BeautifulSoup/download/"
      )
# Send announce to:
# [email protected]
# [email protected]
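Solution
The root cause is that Python changed substantially between the 2.x and 3.x series.
In particular, print changed from a statement into the print() function, so the BeautifulSoup 3.2.1 setup.py, which is written for Python 2, cannot even be parsed by Python 3.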
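The difference can be checked directly with the built-in compile(), which parses source code without running it. This is only an illustrative sketch: the helper name compiles_on_this_python is made up for this example, and the behaviour described assumes it is run under Python 3.

```python
def compiles_on_this_python(source):
    """Return True if `source` parses under the running interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Python 2 style statement, as found in the BeautifulSoup 3.2.1 setup.py
py2_line = 'print "Unit tests have failed!"'
# Python 3 style function call
py3_line = 'print("Unit tests have failed!")'

print(compiles_on_this_python(py2_line))
print(compiles_on_this_python(py3_line))
```

Under Python 3 the first call reports a parse failure and the second succeeds, which is the same SyntaxError that pip runs into when it executes the old setup.py.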
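Tested: Beautiful Soup 4 (the bs4 library) works on Python 3.7, the current version.
Also
Note that in IDLE the library cannot be imported under the name BeautifulSoup4. I did not track this down at the time; the likely reason is that beautifulsoup4 is only the name of the package on PyPI, while the module it installs is named bs4.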
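One way to see which name is actually importable is to ask the import system with the standard library's importlib.util.find_spec. A minimal sketch, assuming only top-level module names are checked; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# "bs4" is found only if Beautiful Soup 4 is installed;
# "BeautifulSoup4" is a package name on PyPI, not a module name.
for name in ("bs4", "BeautifulSoup4"):
    print(name, is_importable(name))
```

The check does not import anything, so it is safe to run in IDLE before deciding which import line to use.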
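Resolution
Beautiful Soup 3 is no longer being developed, and Beautiful Soup 4 is recommended for current projects. Beautiful Soup 4 has moved to the bs4 package, so the import is import bs4. The version referred to here is Beautiful Soup 4.3.2 (BS4 for short), used with Python 2.7.7. It is sometimes said that BS4 supports Python 3 poorly and that Python 3 users should download BS3 instead, but the test above points the other way: BS3 fails to install on Python 3 because its setup.py is Python 2 code, while BS4 works on Python 3.7.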
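For reference, a minimal sketch of using Beautiful Soup 4 once it is installed (assumed here to be installed with pip install beautifulsoup4). The sample HTML and the helper name first_link_text are made up for illustration, and "html.parser" is Python's built-in parser, so no extra parser package is assumed.

```python
try:
    from bs4 import BeautifulSoup  # the module name is bs4, not BeautifulSoup4
except ImportError:
    BeautifulSoup = None  # Beautiful Soup 4 is not installed


def first_link_text(html):
    """Return the text of the first <a> tag, or None if there is none
    or if Beautiful Soup 4 is not available."""
    if BeautifulSoup is None:
        return None
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    return link.get_text() if link is not None else None


sample = '<p>See <a href="https://example.com">Example</a></p>'
print(first_link_text(sample))
```

The try/except around the import keeps the script from crashing on machines where the library is missing, which makes it easier to tell an installation problem from a parsing problem.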