[Python] A Script to Convert HTML to TXT

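A friend sent me some articles in HTML format, but my A1200 phone is really only suited to reading books in txt format. So I wrote a script that converts every .htm file in a given directory to txt and puts the results in a txt directory.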

# Python 2 only: the htmllib module no longer exists in Python 3.
from formatter import AbstractFormatter, NullWriter
from htmllib import HTMLParser

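def _(s, in_encoder="gbk", out_encoder="utf-8"):
    # Re-encode a byte string (used for the page title, which becomes the
    # output file name). These default encodings are assumed; set them to
    # match your pages and your file system.
    return unicode(s, in_encoder).encode(out_encoder)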


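class myWriter(NullWriter):
    """A formatter writer that keeps only the text and drops all layout."""

    def __init__(self):
        NullWriter.__init__(self)
        self._bodyText = []

    def send_flowing_data(self, data):
        # AbstractFormatter sends ordinary running text here.
        self._bodyText.append(data)

    def send_literal_data(self, data):
        # Text inside <pre> arrives here instead.
        self._bodyText.append(data)

    def send_line_break(self):
        self._bodyText.append('\n')

    def send_paragraph(self, blankline):
        # Keep blank lines between paragraphs so the .txt is not one long line.
        self._bodyText.append('\n' * blankline)

    def _get_bodyText(self):
        return ''.join(self._bodyText)

    bodyText = property(_get_bodyText, None, None, 'plain text collected from the page body')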

class myHTMLParser(HTMLParser):
    def do_meta(self, attrs):
        # Remember the <meta> attributes (not used further below).
        self.metas = attrs

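def convertFile(filename):
    """Parse one HTML file and return (title, body text)."""
    mywriter = myWriter()
    absformatter = AbstractFormatter(mywriter)
    parser = myHTMLParser(absformatter)
    parser.feed(open(filename).read())
    parser.close()
    # htmllib leaves parser.title as None when the page has no <title>;
    # fall back to the file name so the output file still gets a name.
    if parser.title:
        title = _(parser.title)
    else:
        title = os.path.splitext(os.path.basename(filename))[0]
    return (title, parser.formatter.writer.bodyText)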

import os
import os.path

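OUTPUTDIR = "txt"  # the txt directory that receives the results
INPUTDIR = "."     # assumed: the directory holding the .htm files

if __name__ == "__main__":
    if not os.path.exists(OUTPUTDIR):
        os.mkdir(OUTPUTDIR)

    for name in os.listdir(INPUTDIR):
        if name.lower().endswith('.htm'):
            print "Converting", name,
            # Join with INPUTDIR; a bare name only works when INPUTDIR
            # is the current directory.
            outfilename, text = convertFile(os.path.join(INPUTDIR, name))
            outfilename = outfilename + '.txt'
            outfullname = os.path.join(OUTPUTDIR, outfilename)
            with open(outfullname, "w") as f:
                f.write(text)
            print "done"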


BTW: the code above was converted to HTML with Vim's :TOhtml command.
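The script only runs under Python 2: `htmllib` was removed in Python 3.0 and `formatter` in Python 3.10. Here is a minimal Python 3 sketch of the same idea, assuming the standard-library `html.parser` module in place of those two. The names `TextExtractor`, `html_to_text`, `safe_filename` and `convert_dir`, the set of tags treated as line breaks, and the GBK input / UTF-8 output encodings are hypothetical choices for illustration, not part of the original script.

```python
import os
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> text and the visible body text of one page."""

    # Tags treated as line breaks (an assumed, non-exhaustive list).
    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are never shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into text
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.body_parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.body_parts.append("\n")

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.body_parts.append(data)


def html_to_text(html):
    """Return (title, body) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    title = " ".join("".join(parser.title_parts).split())
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.body_parts).splitlines())
    return title, "\n".join(line for line in lines if line)


def safe_filename(title, fallback):
    """Make a page title usable as a file name; use fallback if nothing is left."""
    name = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title).strip(" .")
    return name or fallback


def convert_dir(indir, outdir, in_encoding="gbk", out_encoding="utf-8"):
    """Convert each .htm/.html file in indir to a .txt file in outdir.

    The default encodings are assumptions; adjust them to your pages and device.
    Returns the list of written paths.
    """
    os.makedirs(outdir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(indir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in (".htm", ".html"):
            continue
        path = os.path.join(indir, name)  # works for any indir, not just "."
        with open(path, encoding=in_encoding, errors="replace") as f:
            title, body = html_to_text(f.read())
        outpath = os.path.join(outdir, safe_filename(title, stem) + ".txt")
        with open(outpath, "w", encoding=out_encoding) as f:
            f.write(body)
        written.append(outpath)
    return written
```

Calling `convert_dir(".", "txt")` roughly mirrors the original setup of converting the current directory into `txt`. Unlike the Python 2 version, it replaces characters such as `/` or `:` that a title may contain but a file name cannot. Two pages with the same title still overwrite each other, as in the original.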