
Fixing a failed CMD installation of the third-party Python library BeautifulSoup


How the problem arose

While writing a web scraper, I needed to install the third-party library BeautifulSoup.

Investigation

First attempt: pip install BeautifulSoup
Problem 1
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unit tests have failed!")?
Method 1
Download the source archive from the official package index (PyPI): https://files.pythonhosted.org/packages/1e/ee/295988deca1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz


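After extracting the archive, run the installer from its directory: python setup.py install
PS: to install from a wheel file instead, use: pip install **.whl
The installation still failed.
Reading the source of setup.py shows why: it uses print as a statement rather than calling it as a function.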

from distutils.core import setup
import unittest
import warnings
warnings.filterwarnings("ignore", "Unknown distribution option")

import sys
# patch distutils if it can't cope with the "classifiers" keyword
if sys.version < '2.2.3':
    from distutils.dist import DistributionMetadata
    DistributionMetadata.classifiers = None
    DistributionMetadata.download_url = None

from BeautifulSoup import __version__

#Make sure all the tests complete.
import BeautifulSoupTests
loader = unittest.TestLoader()
result = unittest.TestResult()
suite = loader.loadTestsFromModule(BeautifulSoupTests)
suite.run(result)
if not result.wasSuccessful():
    print "Unit tests have failed!"
    for l in result.errors, result.failures:
        for case, error in l:
            print "-" * 80
            desc = case.shortDescription()
            if desc:
                print desc
            print error        
    print '''If you see an error like: "'ascii' codec can't encode character...", see\nthe Beautiful Soup documentation:\n http://www.crummy.com/software/BeautifulSoup/documentation.html#Why%20can't%20Beautiful%20Soup%20print%20out%20the%20non-ASCII%20characters%20I%20gave%20it?'''
    print "This might or might not be a problem depending on what you plan to do with\nBeautiful Soup."
    if sys.argv[1] == 'sdist':
        print
        print "I'm not going to make a source distribution since the tests don't pass."
        sys.exit(1)

setup(name="BeautifulSoup",
      version=__version__,
      py_modules=['BeautifulSoup', 'BeautifulSoupTests'],
      description="HTML/XML parser for quick-turnaround applications like screen-scraping.",
      author="Leonard Richardson",
      author_email = "[email protected]",
      long_description="""Beautiful Soup parses arbitrarily invalid SGML and provides a variety of methods and Pythonic idioms for iterating and searching the parse tree.""",
      classifiers=["Development Status :: 5 - Production/Stable",
                   "Intended Audience :: Developers",
                   "License :: OSI Approved :: Python Software Foundation License",
                   "Programming Language :: Python",
                   "Topic :: Text Processing :: Markup :: HTML",
                   "Topic :: Text Processing :: Markup :: XML",
                   "Topic :: Text Processing :: Markup :: SGML",
                   "Topic :: Software Development :: Libraries :: Python Modules",
                   ],
      url="http://www.crummy.com/software/BeautifulSoup/",
      license="BSD",
      download_url="http://www.crummy.com/software/BeautifulSoup/download/"
      )

# Send announce to:
#   [email protected]
#   [email protected]

Solution

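The root cause is that Python 3 changed parts of the language relative to Python 2.
In particular, print is now the function print(), so the Python 2 print statements in this setup.py are syntax errors under Python 3.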
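The failure can be reproduced in isolation, without downloading the package. The following is a minimal sketch, not part of the original setup.py: the helper name `compiles_on_this_interpreter` and the two snippet constants are made up for illustration. It asks the running interpreter to compile one line in each style and reports whether the syntax is accepted.

```python
# Python 2 style print statement, as used in the BeautifulSoup 3.2.1 setup.py.
PY2_SNIPPET = 'print "Unit tests have failed!"'
# The same line written as a Python 3 function call.
PY3_SNIPPET = 'print("Unit tests have failed!")'


def compiles_on_this_interpreter(source):
    """Return True if `source` is valid syntax for the running Python.

    Hypothetical helper for illustration; it only compiles the code,
    it never executes it.
    """
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print("statement form accepted:", compiles_on_this_interpreter(PY2_SNIPPET))
print("function form accepted: ", compiles_on_this_interpreter(PY3_SNIPPET))
```

On a Python 3 interpreter the statement form is rejected and the function form is accepted, which matches the SyntaxError reported by pip.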
Tested: the BS4 library (published on PyPI as beautifulsoup4) works on Python 3.7.
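As a quick way to confirm that BS4 is usable after installation, a sketch along the following lines may help. It assumes the package was installed with pip install beautifulsoup4; the function name `extract_title` and the sample HTML are illustrative only, and the choice of the standard-library "html.parser" backend is an assumption made so that no extra parser is needed.

```python
try:
    # BS4 is imported from the `bs4` package.
    from bs4 import BeautifulSoup
except ImportError:
    # Not installed for this interpreter; the sketch degrades gracefully.
    BeautifulSoup = None

# Illustrative sample document, not taken from the article.
SAMPLE_HTML = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"


def extract_title(html):
    """Return the text of the <title> element, or None if there is none.

    Hypothetical helper; raises RuntimeError when bs4 is unavailable.
    """
    if BeautifulSoup is None:
        raise RuntimeError("bs4 is not installed; try: pip install beautifulsoup4")
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


if BeautifulSoup is None:
    print("bs4 is not installed for this interpreter")
else:
    print(extract_title(SAMPLE_HTML))
```

If the import fails here even though pip reported success, pip may have installed the package for a different Python installation than the one running the script.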

Also

Note that in IDLE the BS4 library cannot be imported under the name BeautifulSoup4. At first I did not know where the problem lay.
Resolution

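Beautiful Soup 3 is no longer developed, and Beautiful Soup 4 is recommended for current projects. Beautiful Soup 4 lives in the bs4 package, so it is imported with import bs4 (or from bs4 import BeautifulSoup), not under the name BeautifulSoup4. The passage this explanation comes from used Beautiful Soup 4.3.2 (BS4 for short) on Python 2.7.7, and it suggested that BS4 supports Python 3 poorly and that Python 3 users should consider BS3. That advice does not hold: BS3 is written for Python 2, which is exactly why its setup.py fails under Python 3, whereas BS4 works on Python 3, as the 3.7 test shows.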
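The difference between the name given to pip and the name given to import can be checked directly. The sketch below uses the standard-library importlib.util.find_spec to ask whether a top-level module name can be found; the helper name `importable` and the list of names tried are illustrative assumptions.

```python
import importlib.util


def importable(module_name):
    """Return True if a top-level module with this name can be found.

    Hypothetical helper; it locates the module without importing it.
    """
    return importlib.util.find_spec(module_name) is not None


# `bs4` is the import name of Beautiful Soup 4, `BeautifulSoup4` is only
# the spelling of the project name, and `BeautifulSoup` is the old BS3 module.
for name in ("bs4", "BeautifulSoup4", "BeautifulSoup"):
    status = "importable" if importable(name) else "not importable"
    print(name, "->", status)
```

With only Beautiful Soup 4 installed, the expectation is that bs4 alone is reported as importable, which would explain why import BeautifulSoup4 fails in IDLE.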