Support Board
Date/Time: Sat, 23 Nov 2024 14:54:09 +0000
[User Discussion] - Python for Sierra Chart
View Count: 38859
[2013-06-13 03:00:17] |
Kiwi - Posts: 375 |
For a while I've been looking at moving my development to Python rather than C++. Enabling this required a few extra bits and pieces that I'll post to the board for anyone interested. The original post is here. The attached file runs with Python 3 (and might run with 2 as well). It: 1. Reads an SCID file. 2. Converts it to a pandas DataFrame (for time series manipulations). 3. Writes it back to an SCID file, leaving the T, BV and AV fields free so that they can be used for commands to C++ code, to indicate conditions on the SC chart, or to act on Sierra (place orders etc.). An entire read/convert/write loop takes under 1 ms on a 3.2 GHz machine, so there is no appreciable lag between reading a new tick and writing it to the output file. |
SCID_to_DF_RT.py - Attached On 2013-06-13 02:59:38 UTC - Size: 10.03 KB - 2198 views |
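For anyone skimming without downloading the attachment, the read step can be sketched in a few lines. This is a hedged reconstruction based on the record layout visible in the scripts later in this thread (56-byte header, then 40-byte records packed as 'd4f4I', with dates stored as days since the Excel epoch 1899-12-30); the column names for the four integer fields are illustrative guesses, not Sierra Chart's official names.

```python
import struct

import pandas as pd

# Layout assumed from the scripts in this thread (not official documentation):
# 56-byte header, then 40-byte records of one double (date in days since
# 1899-12-30), four floats (O, H, L, C) and four unsigned ints.
HEADER_SIZE = 56
RECORD_FMT = 'd4f4I'
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 40 bytes


def scid_to_dataframe(path):
    """Read every record of an SCID-style file into a pandas DataFrame."""
    rows, stamps = [], []
    with open(path, 'rb') as f:
        f.read(HEADER_SIZE)  # discard the header
        while True:
            raw = f.read(RECORD_SIZE)
            if len(raw) < RECORD_SIZE:
                break
            rec = struct.unpack(RECORD_FMT, raw)
            stamps.append(pd.Timestamp('1899-12-30') +
                          pd.to_timedelta(rec[0], unit='D'))
            rows.append(rec[1:])
    # 'NT', 'TV', 'BV', 'AV' are made-up labels for the four int fields
    return pd.DataFrame(rows, index=stamps,
                        columns=['O', 'H', 'L', 'C', 'NT', 'TV', 'BV', 'AV'])
```

The write direction is the same idea in reverse with struct.pack, as the attached script does.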
[2013-06-13 19:24:24] |
ganz - Posts: 1048 |
Hello sir. Thank you for the file. Very interesting. gd lck |
[2013-06-20 23:10:17] |
Kiwi - Posts: 375 |
Updated version, with new methods for converting the dataframe to a longer timeframe and for mapping longer-timeframe development onto the lower timeframe.

```python
#!/usr/bin/python3
from __future__ import print_function
import numpy as np
import pandas as pd
import struct
import sys
from time import sleep, time

o = O = 'O'
h = H = 'H'
l = L = 'L'
c = C = 'C'
v = V = 'V'
x = 'x'
y = 'y'
z = 'z'

time_list = []
overrun_list = []
overruns = 0
lt = 15
mt = 5
st = 1
ohlc = {o: 'first', h: 'max', l: 'min', c: 'last',
        v: 'sum', x: 'sum', y: 'sum', z: 'sum'}
cols = [O, H, L, C, V, x, y, z]
time_list = []


class SierraFile(object):
    """ """
    def __init__(self, filename):
        self.filename = str(filename)
        # self.tzAdjust = timedelta(hours=+10).seconds/d2s
        self.tzAdjust = np.timedelta64(10, 'h') / np.timedelta64(1, 'D')
        self.excelDate = np.datetime64('1899-12-30')
        self.sizeHeader = 0x38
        self.sizeRecord = 0x28
        self.pos = 0
        self.last = 0

    def read_existing_records(self):
        with open(self.filename, 'rb') as fscid:
            fscid.read(self.sizeHeader)  # discard header
            rows = []
            ts = []
            for i in range(1000000):
                data = fscid.read(self.sizeRecord)
                if data not in ('', b''):
                    d = struct.unpack('d4f4I', data)
                    dt = d[0] + self.tzAdjust
                    ts.append(self.excelDate + np.timedelta64(int(dt)) +
                              (np.timedelta64(
                                  int(round((dt - int(dt)) * 86400)), 's')))
                    datarow = [d[1], d[2], d[3], d[4], d[5], 0, 0, 0]
                    rows.append(datarow)
                else:
                    break
            self.pos = self.last = fscid.tell()
        return (ts, rows)

    def read_record(self):
        global overruns, overrun_list
        with open(self.filename, 'rb') as fscid:
            fscid.seek(0, 2)  # Go to the end of the file
            self.last = fscid.tell()
            if self.last == self.pos:  # no new data >> nothing to do
                return (-999, 0, 0)
            else:  # data to collect
                if self.pos < self.last - self.sizeRecord:  # > 1 record
                    print('Overrun', self.last - self.pos,
                          (self.last - self.pos) / self.sizeRecord)
                    overruns += 1
                    overrun_list.append(np.datetime64('now'))
                    late_flag = True
                else:
                    late_flag = False
                fscid.seek(self.pos, 0)
                self.pos += self.sizeRecord
                data = fscid.read(self.sizeRecord)
                d = struct.unpack('d4f4I', data)
                dt = d[0] + self.tzAdjust
                new_time = (self.excelDate + np.timedelta64(int(dt)) +
                            (np.timedelta64(
                                int(round((dt - int(dt)) * 86400)), 's')))
                datarow = [d[1], d[2], d[3], d[4], d[5], 0, 0, 0]
                return (new_time, datarow, late_flag)

    def write_existing_records(self, dataframe):
        with open(self.filename, 'wb') as fscid:
            header = b'SCID8\x00\x00\x00(\x00\x00\x00\x01\x00'
            fscid.write(header)
            for i in range(21):
                fscid.write(b'\x00\x00')
            for i in range(dataframe.end):
                da = ((dataframe.df.index.values[i] - self.excelDate) /
                      np.timedelta64(1, 'D') - self.tzAdjust)
                db, dc, dd, de, df, dg, dh, di = dataframe.df.iloc[i]
                di = 0x11100111
                df = int(df)
                dg = int(dg)
                dh = int(dh)
                di = int(di)
                wt = struct.pack('d4f4I', da, db, dc, dd, de, df, dg, dh, di)
                fscid.write(wt)

    def write_record(self, dataframe):
        with open(self.filename, 'ab') as fscid:
            i = dataframe.end - 1
            da = ((dataframe.df.index.values[i] - self.excelDate) /
                  np.timedelta64(1, 'D') - self.tzAdjust)
            db, dc, dd, de, df, dg, dh, di = dataframe.df.iloc[i]
            di = 0x88300388
            df = int(df)
            dg = int(dg)
            dh = int(dh)
            di = int(di)
            record = struct.pack('d4f4I', da, db, dc, dd, de, df, dg, dh, di)
            fscid.write(record)


class SierraFrame(object):
    """
    DataFrame is the basic object for analysis:
    init reads the .scid file into the initial object, 5 sec assumed
    extend_frame adds 5000 rows to the df because appending rows is slow
    add appends new data in the extended frame for real time operation
    build_tf creates a new dataframe that is a multiplier of the input df
    build_htf_array creates an array showing higher timeframe bars as they
        develop for the lower timeframe array
    countfloats is a test method
    """
    def __init__(self, time_index, data):
        self.df = pd.DataFrame(data, index=time_index,
                               columns=[O, H, L, C, V, x, y, z])
        self.end = len(self.df)
        self.pos = 0

    def extend_frame(self):
        '''
        Create a 5000 row array from last time in self.df and append it
        to self.df
        Remove lunch break from array
        '''
        print('Extending DataFrame Now')
        s5 = np.timedelta64(5, 's')
        h1 = np.timedelta64(1, 'h')
        sl = np.datetime64('today') + np.timedelta64(14, 'h')
        el = np.datetime64('today') + np.timedelta64(15, 'h')
        start_time = self.df.index.values[self.end - 1]
        dtgen = ((start_time + i * s5) for i in range(1, 5000))
        dtstrip = ((i + h1 if sl <= i < el else i) for i in dtgen)
        dg = pd.DataFrame(index=dtstrip, columns=self.df.columns)
        # dg.iloc[:] = 0.0
        # dg[[v, x, y, z]] = dg[[v, x, y, z]].astype('int')
        self.df = self.df.append(dg)
        self.df = self.df.astype(np.float64)

    def add(self, new_time, datarow):
        '''
        Add a row to an existing extended df but:
            extend if its within 5 of the end
            fill with last bar if its not the next bar
            convert the four integer columns to float for df speed of access
        '''
        if self.end > len(self.df) - 5:
            self.extend_frame()  # not needed if first fill > day length
        np_time = np.datetime64(new_time)
        if np_time < self.df.index.values[self.end]:
            return  # new data is earlier than current
        while np_time > self.df.index.values[self.end]:
            self.df.iloc[self.end] = self.df.iloc[self.end - 1]
            self.end += 1  # fill with prior row if new is later
        for i in [4, 5, 6, 7]:
            datarow[i] = float(datarow[i])
        self.df.iloc[self.end] = datarow  # fill when times match
        # self.df.iloc[self.end] = self.df.iloc[self.end].astype(np.float64)
        self.end += 1

    def build_tf(self, ht):
        '''
        Create higher timeframe df that is a multiplier of the input, di
        with ht being the high timeframe bar length in minutes
        '''
        return self.df.resample(str(ht)+'min', how=ohlc)[cols]

    def build_htf_array(self, st, ht):
        '''
        Map higher timeframe development on to input df
        with ht being the high timeframe bar length in minutes
        '''
        di = self.df.resample(str(st)+'min', how=ohlc)[cols]
        dih = di.iloc[:, 0:5]
        for i in range(len(dih)):
            if i == 0 or i//ht > (i-1)//ht:
                bO = dih.iloc[i, 0]
                bH = dih.iloc[i, 1]
                bL = dih.iloc[i, 2]
                bC = dih.iloc[i, 3]
            else:
                dih.iloc[i, 0] = bO
                dih.iloc[i, 1] = bH = max(bH, dih.iloc[i, 1])
                dih.iloc[i, 2] = bL = min(bL, dih.iloc[i, 2])
                bC = dih.iloc[i, 3]
        return dih

    def countfloats(self):
        length = len(self.df)
        width = len(self.df.iloc[0])
        floats = 0
        nonfloats = 0
        for i in range(length):
            for j in range(width):
                if isinstance(self.df.iloc[i, j], float):
                    floats += 1
                else:
                    nonfloats += 1
        return (floats, nonfloats)


def build_htf_array(di, ht):
    '''
    Map higher timeframe development on to input df
    with ht being the high timeframe bar length in minutes
    '''
    dih = di.iloc[:, 0:5].copy()
    for i in range(len(dih)):
        if i == 0 or i//ht > (i-1)//ht:
            bO = dih.iloc[i, 0]
            bH = dih.iloc[i, 1]
            bL = dih.iloc[i, 2]
            bC = dih.iloc[i, 3]
        else:
            dih.iloc[i, 0] = bO
            dih.iloc[i, 1] = bH = max(bH, dih.iloc[i, 1])
            dih.iloc[i, 2] = bL = min(bL, dih.iloc[i, 2])
            bC = dih.iloc[i, 3]
    return dih


def build_tf(di, ht):
    '''
    Create higher timeframe df that is a multiplier of the input, di
    with ht being the high timeframe bar length in minutes
    '''
    return di.resample(str(ht)+'min', how=ohlc)[cols]


def SierraRun():
    global time_list
    time0 = time()
    # filename = '/home/john/zRamdisk/SierraChart/Data/HSI-201306-HKFE-TD.scid'
    filename = '/home/john/zRamdisk/SierraChart/Data/HSIM13-FUT-HKFE-TD.scid'
    hsi = SierraFile(filename)
    time_index, data = hsi.read_existing_records()
    da = SierraFrame(time_index, data)
    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    da.extend_frame()
    wtst = SierraFile('/home/john/zRamdisk/SierraChart/Data/HSI-INPUT.scid')
    wtst.write_existing_records(da)
    print('df ready', da.end - 1, time() - time0)
    print(da.df[da.end - 1:da.end + 1])
    print()
    df = da.df
    print('\n', np.datetime64('now'), da.end)
    print(df[da.end - 5:da.end + 5])
    import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    # time_list = []
    # for i in range(4000):
    #     intime = df.index.values[da.end]
    #     time0 = time()
    #     da.add(intime, [1.0, 2.0, 3.0, 4.0, 5, 6, 7, 8])
    #     time_list.append(time() - time0)
    # if time_list:
    #     print('TimeStats', max(time_list),
    #           sum(time_list) / len(time_list))
    # print('\nEnd of NaN version')
    # print('next', hsi.pos, hsi.last)
    # jtst = SierraFile('/home/john/zRamdisk/SierraChart/Data/HSI-INPUT.scid')
    # time_index, data = jtst.read_existing_records()
    # ja = SierraFrame(time_index, data)
    # jf = ja.df
    # print('\n', ja.end)
    # print(df[ja.end-5:ja.end+5])
    # print('next', jtst.pos, jtst.last)
    # return
    # ###################
    counter = 0
    # sys.stdout = os.fdopen(sys.stdout.fileno(), "w", newline=None)
    counter_flag = False
    timer_no_data = time()
    timer_no_data_flag = False
    overruns = 0
    overrun_list = []
    while True:
        time0 = time()
        new_time, data, late_flag = hsi.read_record()
        if new_time != -999:
            # time1 = time()
            da.add(new_time, data)
            # print("{:.6f}".format(time() - time1), end=' ')
            sys.stdout.flush()
            wtst.write_record(da)
            if counter > 3:
                time_list.append(time() - time0)
            timer_no_data = time()
            # print(da.df[da.end-1:da.end], da.end)
            print('.', end=' ')
            sys.stdout.flush()
            if timer_no_data_flag:
                print('Data Restored')
                timer_no_data = time()
                timer_no_data_flag = False
            counter += 1
            counter_flag = True
        if time() - timer_no_data >= 120 and not timer_no_data_flag:
            timer_no_data_flag = True
            print('Data lost for two minutes')
        if not late_flag:
            sleep_time = 0.1 - (time() - time0)
            if sleep_time > 0:
                sleep(sleep_time)
        if counter % 12 == 0 and counter_flag:
            counter_flag = False
            print(' Overruns:', overruns, overrun_list, end=' ')
            print('TimeStats', "{:.6f} {:.6f}".format(
                max(time_list), sum(time_list) / len(time_list)),
                '\n', end=' ')
            # print(df[da.end-1:da.end])
            sys.stdout.flush()
            # break
        if counter % 60 == 0 and counter != 0:
            import ipdb; ipdb.set_trace()  # XXX BREAKPOINT


def main():
    SierraRun()


if __name__ == '__main__':
    """
    Takes a SierraChart scid file (input argument 1) and converts it to a
    Pandas DataFrame
    Timezone conversion can follow the users local timezone, or a specified
    integer (input l or an integer but if the default filename is being
    used, '' must be specified for the filename)
    """
    print('start')
    sys.stdout.flush()
    main()
    print('fin')
    if time_list != []:
        print('TimeStats', "{:.6f} {:.6f}".format(
            max(time_list), sum(time_list) / len(time_list)),
            '\n', end=' ')
```

Date Time Of Last Edit: 2013-06-21 02:52:49
|
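A side note on the resampling calls in the script above: `resample(str(ht)+'min', how=ohlc)` worked in the pandas of 2013, but the `how=` argument was later removed. The equivalent spelling in current pandas is `.resample().agg()`. A self-contained sketch with made-up five-second bars:

```python
import numpy as np
import pandas as pd

# Twelve 5-second bars covering exactly one minute (synthetic data).
idx = pd.date_range('2013-06-20 09:30:00', periods=12, freq='5s')
bars = pd.DataFrame({'O': np.arange(12.0),
                     'H': np.arange(12.0) + 1,
                     'L': np.arange(12.0) - 1,
                     'C': np.arange(12.0) + 0.5,
                     'V': 10}, index=idx)

# Same aggregation mapping idea as the ohlc dict in the script above.
htf = bars.resample('1min').agg({'O': 'first', 'H': 'max',
                                 'L': 'min', 'C': 'last', 'V': 'sum'})
print(htf)  # one 1-minute bar: O=0.0, H=12.0, L=-1.0, C=11.5, V=120
```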
[2014-01-08 14:40:26] |
vectorTrader - Posts: 86 |
Kiwi, you may be just the guy I am looking for. Can you tell me how I can create a separate .bat file or program (possibly using Python) to print my chartbook? I want to create an automatic way of printing some charts at the end of the day. Thanks for the help |
[2014-01-08 23:39:03] |
Kiwi - Posts: 375 |
If you're using Linux then yes, I probably can, but I abandoned Windows a while back.
|
[2014-01-09 04:05:37] |
vectorTrader - Posts: 86 |
I am still interested. While I use Win7 to trade, I would really like to be on Linux anyway if I can. How are you charting/trading on Linux? I would love to see some screenshots of whatever you are using. |
[2014-01-09 05:39:24] |
onnb - Posts: 662 |
This is off the original topic of Python, jbutta, but for what it's worth, the following study will create an image for you on bar close. I used it to save images to a web server and it works quite well. You might need to adapt it to your needs, like saving the file on session close or at a specific time of day. You would then apply this study to all charts you want saved. SC saves them for you in the images directory, the same as if you were saving an image manually.

```cpp
if (sc.SetDefaults)
{
    sc.GraphName = "Save Chart Image to File";
    sc.StudyDescription = "";
    sc.AutoLoop = 1;  // true
    sc.GraphRegion = 2;
    sc.HideStudy = 1;
    sc.DrawZeros = 0;
    sc.FreeDLL = 1;
    return;
}

if (sc.GetBarHasClosedStatus() == BHCS_BAR_HAS_NOT_CLOSED)
{
    return;
}

sc.SaveChartImageToFile = 1;
```
|
[2014-01-09 13:44:53] |
vectorTrader - Posts: 86 |
Awesome, thanks. I think this is what I needed to get where I want to go. I appreciate it. |
[2014-01-09 13:55:57] |
Hendrixon - Posts: 130 |
Do you mean to develop studies in Python? What does it give that C++ doesn't? |
[2014-01-09 17:17:04] |
vectorTrader - Posts: 86 |
Nothing, I just want to be on Linux eventually for my trading platform. As for this program, I have only programmed a little for NT7 and nothing substantial. I hope to be able to write something for it today. Thanks |
[2014-01-09 21:28:05] |
vectorTrader - Posts: 86 |
ONNB, thanks. I am new to coding, and to SC coding for that matter. I want to create a function that saves the chart at the end of the day, at 16:15 each day. I tried to work with what I saw in other code, but it just keeps saving continuous PNGs. I just want it to save once at the end of the day. Can you tell me what I did wrong here? Remember, I'm a newbie. Thanks in advance!

```cpp
SCDLLName("Autosave")

SCSFExport scsf_autosave(SCStudyGraphRef sc)
{
    SCInputRef Time1 = sc.Input[0];

    if (sc.SetDefaults)
    {
        sc.GraphName = "Save Chart Image to File";
        sc.StudyDescription = "";
        sc.AutoLoop = 1;  // true
        sc.GraphRegion = 2;
        sc.HideStudy = 1;
        sc.DrawZeros = 0;
        sc.FreeDLL = 1;
        Time1.Name = "Time to Print";
        Time1.SetTime(HMS_TIME(16, 15, 0));
        return;
    }

    SCDateTime Print_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), Time1.GetTime());  // Set the print time for each day.
    SCDateTime Current_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), sc.BaseDateTimeIn[sc.Index].GetTime());  // Get the current time

    sc.Subgraph[0][sc.Index] = 0;  // Flag signal
    if (Current_Time >= Print_Time)
    {
        sc.Subgraph[0][sc.Index] = 1;  // attempt to flag print time so it only prints once per day
    }
    else
        return;

    // I want it to print/save at the change of state
    if (sc.Subgraph[0][sc.Index - 1] == 0 && sc.Subgraph[0][sc.Index] == 1)
    {
        sc.SaveChartImageToFile = 1;
        sc.AddMessageToLog("Printed file", true);
    }
    else
        return;
}
```
|
[2014-01-09 23:45:45] |
onnb - Posts: 662 |
The best way to do this depends on your specific circumstances, but sticking as much as possible to your code... What is happening is that the study function is like OnBarUpdate from NT: it gets called repeatedly for any given bar. So this code gets called many times within the first bar encountered that starts at 16:15 or later:

```cpp
if (sc.Subgraph[0][sc.Index - 1] == 0 && sc.Subgraph[0][sc.Index] == 1)
{
    sc.SaveChartImageToFile = 1;
    sc.AddMessageToLog("Printed file", true);
}
```

The simplest way I can think of to leave the rest of your approach intact and still get this to print just once would be to add a condition that you are on bar close for the study to process. That way the study only processes once. You would add this like so:

```cpp
if (sc.GetBarHasClosedStatus() == BHCS_BAR_HAS_NOT_CLOSED)
{
    return;
}

SCDateTime Print_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), Time1.GetTime());  // Set the print time for each day.
SCDateTime Current_Time(sc.BaseDateTimeIn[sc.Index].GetDate(), sc.BaseDateTimeIn[sc.Index].GetTime());  // Get the current time

sc.Subgraph[0][sc.Index] = 0;  // Flag signal
if (Current_Time >= Print_Time)
{
    sc.Subgraph[0][sc.Index] = 1;  // flag print time so it only prints once per day
}
else
    return;

// Print/save at the change of state
if (sc.Subgraph[0][sc.Index - 1] == 0 && sc.Subgraph[0][sc.Index] == 1)
{
    sc.SaveChartImageToFile = 1;
    sc.AddMessageToLog("Printed file", true);
}
else
    return;
```

Hope this helps |
[2014-01-10 05:04:57] |
Kiwi - Posts: 375 |
You wouldn't develop studies in Python. Python, R or Julia give you an advantage when you want to analyse data statistically or look for generic tendencies, etc. If you just want to create studies or executable systems (or test them), then it's easiest to do it in C (there isn't much ++ in what we do for Sierra Chart, fortunately). In my case, if I wanted to analyse that data in a sophisticated way, I'd export the results of the first phase in text form and analyse it with the higher-level language and its libraries. |
[2014-01-10 06:20:40] |
onnb - Posts: 662 |
So you are analyzing the feed in real time and then writing back the trading actions which are then executed by a SC study? Did I get that right?
|
[2014-01-10 11:37:13] |
norvik - Posts: 22 |
Kiwi, I have a question: do you know of a Python-based library like Esper? Esper is a Complex Event Processing framework written in Java and C#. Thanks. |
[2014-01-10 13:15:29] |
Hendrixon - Posts: 130 |
Got you, Kiwi, thanks. Can you give an example of a statistical analysis or generic tendency that needs to be done on the raw data? |
[2014-01-10 21:35:06] |
vectorTrader - Posts: 86 |
Thanks to everyone here. I was able to figure it out even though I'm not a coder. Kiwi, actually I was going to teach myself R so that I could use it in my business (full time) to do just what you are talking about. I am also interested in using R to do some statistical modeling based on my trading methodology. Have you had any success with modeling, and has it helped your trading? |
[2014-01-12 23:48:18] |
Kiwi - Posts: 375 |
onnb, yes, I'd agree with that view. norvik, if real-time event processing were a key driver I'd be using C/C++ or Java (or one of the functional languages), not Python. Hendrixon, for generic tendencies just look at Lawrence's or Ernie Chan's stuff. An example of statistical analysis would be to take your test outputs and Monte Carlo them, or just use statistical packages to examine their predictive power. jbutta, R has very powerful libraries, although it's an ugly language and rather slow. Modelling is an interest rather than a core of my trading. So far I find simple traditional approaches still provide the best results for me. |
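To make the "Monte Carlo your test outputs" idea concrete, here is a minimal sketch using only the Python standard library: bootstrap-resample a list of per-trade results and look at the spread of final outcomes. The trade numbers are invented for illustration.

```python
import random
import statistics

# Hypothetical per-trade results from a backtest (made-up numbers).
trades = [0.8, -0.5, 1.2, -0.3, 0.4, -1.0, 2.1, -0.2, 0.6, -0.7]

random.seed(1)
finals = []
for _ in range(5000):
    # One simulated "alternate history": the same trades, sampled
    # with replacement, summed to a final outcome.
    sample = random.choices(trades, k=len(trades))
    finals.append(sum(sample))

finals.sort()
print('median outcome :', statistics.median(finals))
print('5th percentile :', finals[int(0.05 * len(finals))])
print('95th percentile:', finals[int(0.95 * len(finals))])
```

If the 5th percentile of the bootstrapped outcomes is still acceptable, the backtest result is less likely to be a fluke of trade ordering.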
[2014-01-29 20:36:24] |
vladaman - Posts: 1 |
Here is simple Sierra Chart SCID File Reader written in Java https://gist.github.com/vladaman/8696352
|
[2014-03-02 23:11:31] |
User15451 - Posts: 27 |
Kiwi, I have a couple of offline questions. Can you provide your e-mail? Sincerely, TS |
[2014-07-06 04:01:40] |
ganz - Posts: 1048 |
for someone who's interested in http://www.youtube.com/watch?v=0unf-C-pBYE Date Time Of Last Edit: 2014-07-06 05:08:44
|
[2014-07-06 12:32:55] |
ganz - Posts: 1048 |
Hi All. I'm not a programmer, so this is my simple solution to get data from *.scid and store it in *.hdf5 in order to pandas it later.

```python
#!/usr/bin/python3
import struct
import datetime as dt
import sys

import pandas as pd
import numpy as np

inputfile = open(sys.argv[1], 'rb')
with inputfile as f:
    f.read(56)  # skip the SCID header
    df_src = []
    ts_src = []
    while True:
        tick = f.read(40)
        if not tick:
            break
        # '=d4f4L' pins standard sizes (4-byte ints); plain 'd4f4L' can
        # expect 8-byte longs on 64-bit Linux and fail on a 40-byte record.
        src = struct.unpack('=d4f4L', tick)
        ts_tmp = dt.datetime(1899, 12, 30) + dt.timedelta(src[0])
        ts_src.append(ts_tmp)
        df_tmp = [src[4], src[7], src[8]]
        df_src.append(df_tmp)

tubus = pd.HDFStore('tubus.h5')
df = pd.DataFrame(df_src, index=ts_src, columns=['Price', 'bidVol', 'askVol'])
df.to_hdf('tubus.h5', 'df')
print(df.index)
print(df.head())
print(tubus)
tubus.close()
```

1. How to run it: ~/> python3 this_script.py chart.scid
2. The script parses the *.scid file and creates the df DataFrame (TimeSeries): Price, bidVol, askVol
3. The script creates the HDF5 file tubus.h5 and stores df in it

For a 500 MB .scid it takes 48 s on i5/hugeRAM/HDD
Date Time Of Last Edit: 2014-07-06 12:34:11
|
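A portability footnote on the unpack format in the script above: in native mode, struct's 'L' means C unsigned long, which is 8 bytes on most 64-bit Linux builds, so 'd4f4L' can expect 56 bytes and raise struct.error on a 40-byte SCID record. Pinning the sizes with '=' (or using 'I', which is always 4 bytes) avoids this. A quick check:

```python
import struct

# Standard-size formats are platform-independent; native 'L' is not.
print('native    d4f4L:', struct.calcsize('d4f4L'))    # 56 on LP64 platforms
print('standard =d4f4L:', struct.calcsize('=d4f4L'))   # always 40
print('native    d4f4I:', struct.calcsize('d4f4I'))    # 40: 'I' is 4 bytes
```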
[2014-07-08 00:32:15] |
Kiwi - Posts: 375 |
Hi Ganz, Very nice. One thing has me confused, though: why use HDF5? I did some research after your Python post over the weekend, and it seemed to me that HDF5 was about very, very big data sets and random access into them, plus the ability to group different data types. Sierra data is essentially sequential in nature, with two data types (double + int) and fixed data in each column. The type of operation I do on it is also sequential - sometimes with some work to coerce the data into a higher time period before operating. In that case, would it be better to store them as CSV and/or just .scid files? Possibly the CSV files could be stored with some form of compression. I'm leaping completely out of my experience here and suggesting bz2 sequential compression... I probably need to try it. https://docs.python.org/2/library/bz2.html#sequential-de-compression Or does HDF5 do really nice compressed serialization? My other question relates to Python vs C and how the two should fit into the Sierra Chart world, so I'll address it in your other thread. I need to think about it a bit more first, though. Date Time Of Last Edit: 2014-07-08 00:40:04
|
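For what it's worth, the CSV-with-compression idea needs no extra machinery: pandas can write and read bz2-compressed CSV directly. A minimal sketch with made-up ticks (the file name is illustrative):

```python
import pandas as pd

# A few synthetic ticks in the Price/bidVol/askVol shape used above.
ticks = pd.DataFrame({'Price': [100.0, 101.5, 99.75],
                      'bidVol': [3, 0, 7],
                      'askVol': [0, 5, 2]},
                     index=pd.date_range('2014-07-07 09:15:00',
                                         periods=3, freq='s'))

# Compression is chosen explicitly here; pandas can also infer it
# from the .bz2 extension.
ticks.to_csv('ticks.csv.bz2', compression='bz2')
back = pd.read_csv('ticks.csv.bz2', index_col=0, parse_dates=True,
                   compression='bz2')
print(back['Price'].tolist())  # [100.0, 101.5, 99.75]
```

Reads stay sequential and one-liner simple; the trade-off against HDF5 is mainly decompression speed and the lack of random access.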
[2014-07-09 14:57:04] |
ganz - Posts: 1048 |
Kiwi, hi.

> Why use HDF5?
> In that case would it be better to store them as CSV and/or just .scid files?
> My other question relates to Python vs C

The reason was explained in the long-term request "make SC python/pandas compatible". The idea is to get an Integrated Trading Environment for stocks, options, futures, bitcoin, etf/cfd, forex... To achieve that, the solution should be cross-platform, flexible, and store data using a well-known data format/scripting language at the production level, imho. |
[2014-07-10 03:45:46] |
Kiwi - Posts: 375 |
OK. I had read that, but didn't see why HDF5 was chosen. Python is extremely happy reading CSVs (or even SCIDs with a little conversion). |