Motivation
Trying to use the Japanese Goverment statistics through an api I found it is useless due to the format and datasets which are provided. Instead I decided to read a csv and xls file directly from a website and load them on a pnadas dataframe. Below are the codes.
Example
csv file
The sample url for csv is below
"http://www.e-stat.go.jp/SG1/estat/Csvdl.do?sinfid=000012460662"
which is the url of a csv which contains a population in each prefecture of Japan.
You can get a related csv files from 統計表一覧 政府統計の総合窓口 GL08020103 .(There seems no English websites unfortunately.)
Using the read_csv function below you will be abel to read a csv file on a pandas dataframe.
import pandas as pd url = "http://www.e-stat.go.jp/SG1/estat/Csvdl.do?sinfid=000012460662" #Read csv function def read_csv(url): print(url) res = urllib.request.urlopen(url) res=res.read().decode('shift-jis') df = pd.read_csv(StringIO( res) ) return df #Run read_csv(url)
Result
Excel file(xls )
Next, from
統計表一覧 政府統計の総合窓口 GL08020103
I will get Population and Household data.
Similarly to the read_csv function we can make read_xls function as below.
url = "http://www.e-stat.go.jp/SG1/estat/GL08020103.do?_xlsDownload_&fileId=000003562952&releaseCount=1" #Read xls function def read_xls(url): print(url) res = urllib.request.urlopen(url) f = pd.ExcelFile(res) df = f.parse() return df #Run read_xls(url)
Result