2019年9月18日 星期三

使用Python抓取台股證交所每日股價資料進行分析

使用網址「http://www.twse.com.tw/exchangeReport/STOCK_DAY?date=20180817&stockNo=2330」,由證交所所提供的網址,可以經由stockNo指定股票編號,date指定股票日期,以json格式回傳一個月的股價與交易量,以下為回傳結果。
{"stat":"OK","date":"20180817","title":"107年08月 2330 台積電           各日成交資訊","fields":["日期","成交股數","成交金額","開盤價","最高價","最低價","收盤價","漲跌價差","成交筆數"],"data":[["107/08/01","29,777,161","7,375,488,342","247.00","248.00","246.50","248.00","+2.00","11,667"],["107/08/02","22,775,110","5,611,725,541","249.00","249.50","243.50","244.50","-3.50","10,343"],["107/08/03","25,165,097","6,205,758,662","246.00","248.00","245.00","247.00","+2.50","9,585"],["107/08/06","22,364,568","5,487,396,854","245.00","247.00","244.00","245.50","-1.50","9,732"],"notes":["符號說明:+/-/X表示漲/跌/不比價","當日統計資訊含一般、零股、盤後定價、鉅額交易,不含拍賣、標購。","ETF證券代號第六碼為K、M、S、C者,表示該ETF以外幣交易。"]}

使用request.get擷取指定日期與股票編號的網頁資料,使用request的函式json進行json格式的解碼成Python的資料結構,取出data所對應的值就是當月該股票的交易資料,使用函式transform進行格式轉換

import numpy as np
import requests
import pandas as pd
import datetime

#   http://www.twse.com.tw/exchangeReport/STOCK_DAY?date=20180817&stockNo=2330  取一個月的股價與成交量
def get_stock_history(date, stock_no):
    quotes = []
    url = 'http://www.twse.com.tw/exchangeReport/STOCK_DAY?date=%s&stockNo=%s' % ( date, stock_no)
    r = requests.get(url)
    data = r.json()
    return transform(data['data'])  #進行資料格式轉換

def transform_date(date):
        y, m, d = date.split('/')
        return str(int(y)+1911) + '/' + m  + '/' + d  #民國轉西元
    
def transform_data(data):
    data[0] = datetime.datetime.strptime(transform_date(data[0]), '%Y/%m/%d')
    data[1] = int(data[1].replace(',', ''))  #把千進位的逗點去除
    data[2] = int(data[2].replace(',', ''))
    data[3] = float(data[3].replace(',', ''))
    data[4] = float(data[4].replace(',', ''))
    data[5] = float(data[5].replace(',', ''))
    data[6] = float(data[6].replace(',', ''))
    data[7] = float(0.0 if data[7].replace(',', '') == 'X0.00' else data[7].replace(',', ''))  # +/-/X表示漲/跌/不比價
    data[8] = int(data[8].replace(',', ''))
    return data

def transform(data):
    return [transform_data(d) for d in data]

def create_df(date,stock_no):
    s = pd.DataFrame(get_stock_history(date, stock_no))
    s.columns = ['date', 'shares', 'amount', 'open', 'high', 'low', 'close', 'change', 'turnover']
                #"日期","成交股數","成交金額","開盤價","最高價","最低價","收盤價","漲跌價差","成交筆數" 
    stock = []
    for i in range(len(s)):
        stock.append(stock_no)
    s['stockno'] = pd.Series(stock ,index=s.index)  #新增股票代碼欄,之後所有股票進入資料表才能知道是哪一張股票
    datelist = []
    for i in range(len(s)):
        datelist.append(s['date'][i])
    s.index = datelist  #索引值改成日期
    s2 = s.drop(['date'],axis = 1)  #刪除日期欄位
    mlist = []
    for item in s2.index:
        mlist.append(item.month)
    s2['month'] = mlist  #新增月份欄位
    return s2
        
listDji = ['2330']
for i in range(len(listDji)):
    result = create_df('20180701', listDji[i])
    print(result)
    
print(result.groupby('month').close.count())  #每個月幾個營業日
print(result.groupby('month').shares.sum())  #每個月累計成交股數

沒有留言: