
Converting an HTML table to a Markdown table (Python script)

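If the table contains many special characters, the script may not handle them all correctly, and you will need to adjust it yourself.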

in.txt (contents obtained with the browser's "Copy element" command)

<table class="data-table"><tbody>
    <tr>
    <th>Name</th>
    <th>Description</th>
    <th>Type</th>
    <th>Default</th>
    <th>Valid Values</th>
    <th>Importance</th>
</tr>
<tr> <td>blacklist</td><td>Fields to exclude. This takes precedence over the whitelist.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
<tr> <td>renames</td><td>Field rename mappings.</td><td>list</td><td>""</td><td>list of colon-delimited pairs, e.g. <code>foo:bar,abc:xyz</code></td><td>medium</td></tr>
<tr> <td>whitelist</td><td>Fields to include. If specified, only these fields will be used.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
</tbody></table>

Python script

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

# Read the HTML that was copied from the browser.
with open('in.txt', encoding='utf-8') as f:
    contents = f.read()

# The 'html5lib' parser requires the html5lib package to be installed.
soup = BeautifulSoup(contents, 'html5lib')

with open('out.md', 'w', encoding='utf-8') as f_out:
    for idx, tr in enumerate(soup.find_all('tr')):
        if idx != 0:
            # Body row: one Markdown cell per <td>.
            row_str = "|"
            for td in tr.find_all('td'):
                td_content_list = []
                for content in td.contents:
                    # Force each child node to a string.
                    str2 = str(content)
                    # Replace <code> and </code> with ```.
                    str3 = str2.replace("<code>", "```").replace("</code>", "```")
                    td_content_list.append(str3)
                # Join the list into a single string.
                td_content_str = ''.join(td_content_list)
                row_str = row_str + " " + td_content_str + " |"
            f_out.write(row_str + "\n")
        else:
            # Header row: write the header cells, then the left-aligned separator row.
            row_str = "|"
            row_str2 = "|"
            for th in tr.find_all('th'):
                row_str = row_str + " " + th.text + " |"
                row_str2 = row_str2 + " :- |"
            f_out.write(row_str + "\n")
            f_out.write(row_str2 + "\n")
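
The row-building part of the script can also be written as two small functions, which makes the column alignment easy to change. This is only a sketch: the names `markdown_row`, `separator_row`, and `ALIGN_MARKERS` are hypothetical and do not appear in the original script.

```python
# Alignment markers for the separator row of a Markdown table.
ALIGN_MARKERS = {"left": ":-", "center": ":-:", "right": "-:"}


def markdown_row(cells):
    """Join a list of cell strings into one Markdown table row."""
    return "| " + " | ".join(cells) + " |"


def separator_row(column_count, align="left"):
    """Build the separator row that follows the header row."""
    return markdown_row([ALIGN_MARKERS[align]] * column_count)


print(markdown_row(["Name", "Type"]))
print(separator_row(2))
```

With `align="left"` this produces the same `:-` markers that the script writes.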

After conversion, the result is written to the out.md file:

| Name | Description | Type | Default | Valid Values | Importance |
| :- | :- | :- | :- | :- | :- |
| blacklist | Fields to exclude. This takes precedence over the whitelist. | list | "" |  | medium |
| renames | Field rename mappings. | list | "" | list of colon-delimited pairs, e.g. ```foo:bar,abc:xyz``` | medium |
| whitelist | Fields to include. If specified, only these fields will be used. | list | "" |  | medium |
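
The script writes cell text as-is, so a `|` or a line break inside a cell would break the table layout. That is one kind of special character the note at the top warns about. Below is a minimal sketch of a helper that could be applied to each cell before it is written; the name `escape_cell` is hypothetical, and other special characters may need their own handling.

```python
import re


def escape_cell(text):
    """Make a string safe to place inside a single Markdown table cell."""
    # Collapse newlines and runs of whitespace into one space,
    # because a raw newline would end the table row.
    text = re.sub(r"\s+", " ", text).strip()
    # A literal pipe would be read as a column separator, so escape it.
    return text.replace("|", "\\|")
```

In the script, this could wrap `th.text` and `td_content_str` before they are appended to `row_str`.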