Error when converting a string to JSON with json.loads()

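Today, while scraping street-snap galleries from Toutiao (今日頭條), I needed to convert a string embedded in the page into JSON. Passing it straight to json.loads() raised: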

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

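On inspection, the string taken from the page contains backslashes. As the string below shows, every double quote and every forward slash is preceded by a \.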

{\"count\":9,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911861360624e6e374\"}],\"uri\":\"origin\\/pgc-image\\/15308911861360624e6e374\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187640293d64a75b\"}],\"uri\":\"origin\\/pgc-image\\/1530891187640293d64a75b\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911869350d7e224617\"}],\"uri\":\"origin\\/pgc-image\\/15308911869350d7e224617\",\"height\":6000},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187266752e4a248a\"}],\"uri\":\"origin\\/pgc-image\\/1530891187266752e4a248a\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-imag
e\\/1530891187573e72c879774\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891187573e72c879774\"}],\"uri\":\"origin\\/pgc-image\\/1530891187573e72c879774\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/153089118689443f3c70490\"}],\"uri\":\"origin\\/pgc-image\\/153089118689443f3c70490\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/1530891186908d2f0efbf63\"}],\"uri\":\"origin\\/pgc-image\\/1530891186908d2f0efbf63\",\"height\":6000},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308911853816554ab3238\"}],\"uri\":\"origin\\/pgc-image\\/15308911853816554ab3238\",\"height\":6000},{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\",\"width\":4000,\"url_list\":[{\"url\":\"http:\\/\\/p99.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7fad\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/pgc-image\\/15308912219659b671a7
fad\"}],\"uri\":\"origin\\/pgc-image\\/15308912219659b671a7fad\",\"height\":6000}],\"max_img_width\":4000,\"labels\":[\"\\u4e09\\u91cc\\u5c6f\",\"\\u6444\\u5f71\"],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \"],\"sub_titles\":[\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\",\"\\u8857\\u62cd\\u5317\\u4eac\\uff0c\\u771f\\u5b9e\\u7684\\u4e09\\u91cc\\u5c6f\\u8857\\u62cd\\uff0c\\u6709\\u4f60\\u559c\\u6b22\\u7684\\u5417\\uff1f\"]}
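The first error can be reproduced with a much shorter string. The sketch below uses a hypothetical stand-in named raw (not the full scraped text) and shows that the parser stops at character 1, the backslash right after the opening brace, because it expects a double quote there:

```python
import json

# Hypothetical, shortened stand-in for the scraped text.
# The raw string literal keeps every backslash exactly as it appears in the page.
raw = r'{\"count\":9,\"labels\":[\"a\"]}'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    # exc.msg is the error text, exc.pos the index of the offending character.
    message = exc.msg
    position = exc.pos
    print(message, position)
```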

The fix is to use replace() to strip the backslashes, that is, to replace every \ with ''.
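A minimal sketch of the replace() fix, assuming the intent is to strip the backslashes, again on a hypothetical shortened stand-in named raw. Stripping every backslash also removes the one in front of each \u escape, so escaped Chinese text comes out as plain "u6444..." characters. If the scraped text is the body of a JSON string literal (an assumption about how the page embeds it), decoding it twice is a possible alternative that keeps those escapes intact:

```python
import json

# Hypothetical, shortened stand-in for the scraped text (raw string keeps backslashes).
raw = r'{\"count\":9,\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\",\"labels\":[\"\\u6444\\u5f71\"]}'

# Approach from the text: drop every backslash, then parse.
stripped = json.loads(raw.replace('\\', ''))
print(stripped["url"])     # the URL is recovered
print(stripped["labels"])  # but the \u escapes have lost their backslash

# Alternative (assumption: raw is the body of a JSON string literal):
# wrap it in double quotes, decode once to undo the escaping, then decode again.
unescaped = json.loads('"' + raw + '"')
decoded = json.loads(unescaped)
print(decoded["labels"])   # the escapes are decoded to real characters
```

Which variant fits depends on how the page actually embeds the data, so it is worth checking the decoded values rather than only checking that parsing succeeds.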

I thought it would run smoothly after that, but then another error appeared:

json.decoder.JSONDecodeError: Extra data: 

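This error usually points to a problem with the JSON structure: json.loads() expects exactly one top-level value, so the text may contain two or more records. Multiple records have to be placed in a list, as in the JSON file below, where the two name records sit in a list under the key foo: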

{
    "foo" : [
       {"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
       {"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
    ]
}
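To make the "Extra data" case concrete, here is a small sketch with two made-up records. Parsing them side by side fails, while the same records wrapped in a list under the key foo parse fine:

```python
import json

# Two records placed side by side: valid individually, but not one JSON document.
two_records = '{"name": "XYZ"} {"name": "OLMS"}'

try:
    json.loads(two_records)
except json.JSONDecodeError as exc:
    # The first record parses; whatever follows it is reported as extra data.
    message = exc.msg
    print(message)

# The same records inside a list under the key "foo" form a single document.
wrapped = json.loads('{"foo": [{"name": "XYZ"}, {"name": "OLMS"}]}')
names = [record["name"] for record in wrapped["foo"]]
print(names)
```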

There are also many websites that validate JSON online and can help locate the problem.

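However, the online validator reported that my JSON was fine. The real cause was that my pattern for extracting the string had also matched the double quotes immediately before and after it, and these clashed with the double quotes inside the string, which led to the error above. As shown below, there is an extra double quote on each side of the outermost braces.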

"{
    "foo" : [
       {"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
       {"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
    ]
}"
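The effect of the stray outer quotes can be sketched as follows, using a hypothetical shortened value named matched. The parser reads the leading "{" as a complete string and reports everything after it as extra data. Removing the first and last character, only when both are double quotes, leaves the object itself:

```python
import json

# Hypothetical, shortened stand-in for the text captured together with its outer quotes.
matched = '"{"foo": [{"name": "XYZ", "type": null}]}"'

try:
    json.loads(matched)
except json.JSONDecodeError as exc:
    # The leading "{" is parsed as a complete string, so the rest is extra data.
    message = exc.msg
    print(message)

# Strip one double quote from each end, but only if both are present.
if matched.startswith('"') and matched.endswith('"'):
    inner = matched[1:-1]
else:
    inner = matched

data = json.loads(inner)
print(data["foo"][0]["name"])
```

Adjusting the extraction pattern so that it does not capture the surrounding quotes in the first place would avoid the extra step.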

This is the site I used to check the JSON: https://www.bejson.com/