Support Board
Date/Time: Sat, 23 Nov 2024 15:15:35 +0000
[User Discussion] - Python for Sierra Chart
View Count: 38861
[2022-12-10 20:58:35] |
User719512 - Posts: 262 |
I found 1 bug and 1 nit in the excellent code posted by Kiwi in post #67 (Python for Sierra Chart | Post: 269962). Edge case with a time very close to midnight where the rounding code rounds the value up to 86400 which then causes helper() to use a factor of 1 that causes the hour to come back as 24 (valid range is 0-23). Since the code discards milliseconds, an easy fix is to detect this and coerce the time to 86399 (23:59:59). In deserialize(), change code to: def deserialize(excelDateAndTime): """ Returns excel date time as Python datetime object """ date_tok = int(excelDateAndTime) time_tok = round(86400*(excelDateAndTime - date_tok)) # account for rounding error for excelDateAndTime like 44899.99999549769 if time_tok == 86400: time_tok = 86399 d = date(1899, 12, 30) + timedelta(date_tok) t = time(*helper(time_tok, (24, 60, 60, 1000000))) return datetime.combine(d, t) On Windows, the csv will have extra newlines between each line. A fix to that is to change the open() call to pass a `newline` argument: ftsv = open(fileoutput, 'w', newline='') Thanks again Kiwi for the sample code. Not only did that help with my issue to parse scid files, I learned a bit more about Python as well in an area I had not gained much experience yet. Date Time Of Last Edit: 2022-12-10 23:13:30
|
[2022-12-10 21:44:27] |
User726340 - Posts: 30 |
Here is an efficient way to get a pandas dataframe; it uses a numpy memmap.

import sys
from pathlib import Path
import numpy as np
import pandas as pd

# bar data columns
BCOLS = [
    'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Trades',
    'BidVolume', 'AskVolume'
]

def get_scid_df(filename, limitsize=sys.maxsize):
    f = Path(filename)
    assert f.exists(), "file not found"
    stat = f.stat()
    offset = 56 if stat.st_size < limitsize else stat.st_size - (
        (limitsize // 40) * 40)
    # note: in the scid record layout, Trades (BCOLS[6]) precedes Volume (BCOLS[5])
    rectype = np.dtype([
        (BCOLS[0], '<u8'), (BCOLS[1], '<f4'), (BCOLS[2], '<f4'),
        (BCOLS[3], '<f4'), (BCOLS[4], '<f4'), (BCOLS[6], '<i4'),
        (BCOLS[5], '<i4'), (BCOLS[7], '<i4'), (BCOLS[8], '<i4')
    ])
    df = pd.DataFrame(data=np.memmap(f, dtype=rectype, offset=offset,
                                     mode="r"), copy=False)
    df.dropna(inplace=True)
    # 25569 days between the SC epoch (1899-12-30) and the Unix epoch, in microseconds
    df["Time"] = df["Time"] - 2209161600000000
    df.drop(df[(df.Time < 1) | (df.Time > 1705466561000000)].index,
            inplace=True)
    df.set_index("Time", inplace=True)
    df.index = pd.to_datetime(df.index, unit='us')
    df.index = df.index.tz_localize(tz="utc")
    return df

df = get_scid_df(r'c:\SierraChart\Data\AAPL.scid')
print(df)

You can also read a partial SCID file by passing a size (eg: read the last 1mb of data):

get_scid_df(r'c:\SierraChart\Data\AAPL.scid', 1024 * 1024)

You can then use pandas resample to get any timeframe. For example, for 1min:

df_1min = (
    df.resample("1Min")
    .agg(
        {
            "Open": "first",
            "High": "max",
            "Low": "min",
            "Close": "last",
            "Volume": "sum",
            "Trades": "sum",
            "BidVolume": "sum",
            "AskVolume": "sum",
        }
    )
    .ffill()
)
print(df_1min)
Date Time Of Last Edit: 2022-12-10 22:01:46
|
[2022-12-10 22:31:43] |
User719512 - Posts: 262 |
I usually read Reminiscences of a Stock Operator over the xmas break each year. I think I should switch it up this year and devote time to (finally) diving into and learning NUMPY! |
[2022-12-11 03:10:12] |
User183724 - Posts: 183 |
maybe so ... or you could read this Advanced Custom Study Interface and Language (ACSIL) then write up a cool ACSIL study to store sc.basedata to some random bit of memory ... like the clipboard ... where your python program could read this random bit of memory ... and skip the scid thing with all its disc reading altogether ... just a thought.
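In that spirit, the Python side of such a scheme could look roughly like the sketch below. It assumes a hypothetical ACSIL study that copies the latest bar into a Windows named shared-memory block as six packed doubles; the mapping name "SCBaseData" and the record layout are invented for illustration:

import mmap
import struct

# assumed layout: DateTime, Open, High, Low, Close, Volume as little-endian doubles
RECORD = struct.Struct("<6d")

# on Windows, tagname attaches to a named file mapping that a custom
# ACSIL study could create with CreateFileMapping()
shm = mmap.mmap(-1, RECORD.size, tagname="SCBaseData")
dt, op, hi, lo, cl, vol = RECORD.unpack(shm.read(RECORD.size))
print(op, hi, lo, cl, vol)
shm.close()
|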
[2023-01-12 14:49:10] |
machinagurujp - Posts: 12 |
@User726340 For some reason the close values are never correct when resampling the data. They only appear correct after the next resampling timeframe passes, for the historical data timeframes. So if I'm resampling by 1 min, the previous 1-min bar and the rest of the data are correct once the candle closes, but the close price of the current candle is incorrect. Have you seen this? |
[2023-01-12 15:52:15] |
BillsSBC - Posts: 38 |
For some reason the close values are never correct when resampling the data. They only appear correct after the next resampling timeframe passes, for the historical data timeframes. So if I'm resampling by 1 min, the previous 1-min bar and the rest of the data are correct once the candle closes, but the close price of the current candle is incorrect. Have you seen this?
In this case, my guess is that while Pandas is using the DateTime index to resample your values, SC uses microseconds as a counter, which could screw up the incomplete/current bar. For example, 09:30:00.3 is actually earlier in time than 09:30:00.20 within SC. |
[2023-01-12 16:22:35] |
machinagurujp - Posts: 12 |
Do you have a recommendation on what I can change then?
|
[2023-01-12 16:51:14] |
BillsSBC - Posts: 38 |
Do you have a recommendation on what I can change then?
Maybe User726340 can provide a solution. The logic would be to index based on arrival order before applying any datetime resampling, apply the conversions, then re-index/sort based on the original index.
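One reading of that suggestion, as a rough pandas sketch (df here is the raw frame with a 'Time' column, before set_index):

df = df.reset_index(drop=True)
df["order"] = df.index  # arrival order in the scid file
df["Time"] = pd.to_datetime(df["Time"], unit="us")  # apply conversions
df = df.sort_values("order").drop(columns="order").set_index("Time")
|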
[2023-01-12 17:24:50] |
User726340 - Posts: 30 |
It works on my side... can you change "Data/Trade Service Settings" => "Intraday Data Storage.." to 1 second and retry? This will force SC to store at 1-sec sampling in the .scid file, which also helps save storage space. This should only impact how data is saved in the .scid, not SC live charting. |
[2023-01-12 18:31:29] |
machinagurujp - Posts: 12 |
I originally had it at 1 sec and switched to 1 min. I'll try this again and get back to you.
|
[2023-01-12 18:45:07] |
machinagurujp - Posts: 12 |
I'm still getting the same issue with 1 sec. When you look at the data historically it works fine. But when live and printing the values for the current bar based on the resampling size, it's wrong. After the next bar it corrects the values from the previous bars that were incorrect. I provided the code below. It matches yours, and technically the data is indexed before it's resampled. I didn't sort it, but I can.

def __init__(self) -> None:
    self.bcols = ['Time', 'Open', 'High', 'Low', 'Close', 'Volume',
                  'Trades', 'BidVolume', 'AskVolume']

def get_scid_df(self, filename, limitsize=sys.maxsize):
    f = Path(filename)
    assert f.exists(), "file not found"
    stat = f.stat()
    offset = 56 if stat.st_size < limitsize else stat.st_size - (
        (limitsize // 40) * 40)
    rectype = np.dtype([
        (self.bcols[0], '<u8'), (self.bcols[1], '<f4'),
        (self.bcols[2], '<f4'), (self.bcols[3], '<f4'),
        (self.bcols[4], '<f4'), (self.bcols[6], '<i4'),
        (self.bcols[5], '<i4'), (self.bcols[7], '<i4'),
        (self.bcols[8], '<i4')
    ])
    df = pd.DataFrame(data=np.memmap(f, dtype=rectype, offset=offset,
                                     mode="r"), copy=False)
    df.dropna(inplace=True)
    df["Time"] = df["Time"] - 2209161600000000
    df.drop(df[(df.Time < 1) | (df.Time > 1705466561000000)].index,
            inplace=True)
    df.set_index("Time", inplace=True)
    df.index = pd.to_datetime(df.index, unit='us')
    # df.index = df.index.tz_localize(tz="est")
    return df

def resampling(self, df):
    df_5min = (
        df.resample("5Min").agg(
            {
                "Open": "first",
                "High": "max",
                "Low": "min",
                "Close": "last",
                "Volume": "sum",
                "Trades": "sum",
                "BidVolume": "sum",
                "AskVolume": "sum",
            }
        ).ffill()
    )
    df = df_5min[["Open", "High", "Low", "Close"]]
    df.index.rename('Open time', inplace=True)
    eastern = pytz.timezone('US/Eastern')
    df.index = df.index.tz_localize(pytz.utc).tz_convert(eastern)
    df.index = df.index.tz_localize(None)
    df = df / 100  # Needed for CL prices
    return df

Date Time Of Last Edit: 2023-01-12 18:56:47
|
[2023-01-12 19:33:42] |
User726340 - Posts: 30 |
I ran the test.py below a few times, and I see the same close value before and after resampling.

# test.py
df = get_scid_df(scid_file_path, 1024 * 1024)
print(df[["Close"]][-5:])

df_1min = (
    df.resample("1Min")
    .agg(
        {
            "Open": "first",
            "High": "max",
            "Low": "min",
            "Close": "last",
            "Volume": "sum",
            "Trades": "sum",
            "BidVolume": "sum",
            "AskVolume": "sum",
        }
    )
    .ffill()
)
print(df_1min[["Close"]][-5:])
|
[2023-01-12 19:53:37] |
machinagurujp - Posts: 12 |
Yes, it's correct when you look at the data historically. But when extracting close[-1], say at the close of every 1-min candle, I'm getting a wrong value most of the time. It also sometimes applies to Open, High, and Low. So if at the next minute I run [-2:], then -2 is correct because it updated the value, and now -1 is incorrect. It must be something else on my end. Thank you!
Date Time Of Last Edit: 2023-01-12 20:02:12
|
[2023-01-12 20:13:56] |
User726340 - Posts: 30 |
No, I ran the test.py repeatedly (eg: 5 times back-to-back in 5 seconds). Every time, the last value in the resampled data is the same as the last value from before resampling. So I do not see any incorrect close values in the resampled data (even for the last bar; I am not referring to historical bars).
|
[2023-01-12 20:56:58] |
machinagurujp - Posts: 12 |
My apologies, it was another issue. It seems there is sometimes a problem when my scheduler runs: it only shows up when there is an update at 4:59.59 seconds each minute. That's an issue with my timer; it's off by 1 second.
Date Time Of Last Edit: 2023-01-12 23:56:58
|
[2023-01-12 23:04:27] |
BillsSBC - Posts: 38 |
You can also read partial SCID file, by passing size (eg: read last 1mb of data):
I've found this code to be really helpful, thank you for sharing. The issue below seems to only be present when writing bar data out to csv, and is not an issue when reading directly from the scid with your code: In this case, my guess is that while Pandas is using the DateTime index to resample your values, SC uses microseconds as a counter, which could screw up the incomplete/current bar. For example, 09:30:00.3 is actually earlier in time than 09:30:00.20 within SC.
User726340, have you made any effort to read tick data (each trade) in real time? My biggest issue is appending and analyzing each trade as the scid is updated with each Pandas update. Do you have any suggestions on how to handle this? |
[2023-01-13 02:20:47] |
User726340 - Posts: 30 |
Yes, tick data is possible. You may want to use an in-memory filesystem (eg: ramdisk or tmpfs) for the "Data" folder so the latency of reading from disk is removed. You can then just run a tight loop with a reasonably small limitsize; you may need to rely on pandas to merge/overwrite existing tick rows.
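A minimal polling sketch of that idea, reusing the get_scid_df() from earlier in this thread; the file path, tail size, and one-second interval are placeholders:

import time
import pandas as pd

ticks = pd.DataFrame()
while True:
    # re-read only the tail of the file; 64 KB is an arbitrary choice
    tail = get_scid_df(r'c:\SierraChart\Data\ESH4.scid', 64 * 1024)
    # merge, letting freshly read rows overwrite duplicates by timestamp
    ticks = pd.concat([ticks, tail])
    ticks = ticks[~ticks.index.duplicated(keep='last')].sort_index()
    print(ticks.tail(1))
    time.sleep(1)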
|
[2023-02-24 19:45:00] |
sgne - Posts: 104 |
Here's what tick data SCIDs yield with Kiwi's python code. The Open comes back as zero, and the HLC values are different numbers: either H=C or L=C. I guess H=Ask, L=Bid, and C=Trade Price. The zero Open value prints as either 0.0 or a tiny 10^-38 value. To get eastern time, I set tzvar to -5, but the date was off by one, judging from data spanning the end of year, so I changed this line, under def deserialize, to 29 instead of 30:

d = date(1899, 12, 29) + timedelta(date_tok)

There's a problem, however, with DST, which starts in early March: tzvar then needs to be -4 for that period, switching back in early November. It needs to vary. If anyone knows anything more about this, or if I'm doing something wrong, feel free to chime in.
Date Time Of Last Edit: 2023-02-25 15:16:44
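Since the stored timestamps are UTC, one way to avoid a hand-maintained tzvar is a tz-aware conversion, which handles the DST switch automatically. A small sketch, assuming you have the naive UTC datetime from deserialize() before any manual offset is applied:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; on Windows may need 'pip install tzdata'

def to_eastern(dt_utc):
    # naive UTC datetime -> US/Eastern, DST handled by the tz database
    return dt_utc.replace(tzinfo=timezone.utc).astimezone(ZoneInfo('America/New_York'))

print(to_eastern(datetime(2023, 2, 24, 19, 45)))  # EST, UTC-5
print(to_eastern(datetime(2023, 7, 24, 19, 45)))  # EDT, UTC-4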
|
[2023-02-25 18:05:51] |
sgne - Posts: 104 |
Okay, so Kiwi's code needs to be adjusted. I don't think the long value that's extracted from the SCID, e.g. 3879765719337000, is microseconds. I'm thinking that the number can be broken down into three parts: 3879765719 | 337 | 000. The first number is seconds since 12-30-1899. The second is milliseconds. The third indicates the trade number at that millisecond (000 is the first trade, 001 the second, etc.).

The way I'm dealing with this, for now, is to make the long a string, discard the last three digits, save the milliseconds as a string (337), make 3879765719 an integer, and subtract 25569 * 86400 from it (the seconds between 1899 and 1970, which Python uses). Then I run it through a Python time function to get the time values and the time zone adjustment I want, then tack the milliseconds back on. So 3879765719337 = 20221209 114159.337 (eastern time). For now, I'm using the function localtime(), and the input is, in this case, 3879765719 - 25569 * 86400. Since we're dealing with integers, there's no rounding error.

For NQ, milliseconds begin on Sunday 20200823. It looks like before that, the last 6 digits aren't part of the time; they mark the trades at that second.
Date Time Of Last Edit: 2023-02-25 20:00:05
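A rough sketch of the string-slicing approach described above, taking the post's interpretation of the fields at face value:

import time

EPOCH_DELTA = 25569 * 86400  # seconds between 1899-12-30 and 1970-01-01

def split_scid_stamp(raw):
    # split e.g. 3879765719337000 into (timestamp string, trade counter)
    s = str(raw)
    seconds, millis, counter = int(s[:-6]), s[-6:-3], int(s[-3:])
    local = time.localtime(seconds - EPOCH_DELTA)  # local-time adjustment
    stamp = time.strftime("%Y%m%d %H%M%S", local) + "." + millis
    return stamp, counter

# output assumes the machine's local time zone is US/Eastern
print(split_scid_stamp(3879765719337000))  # ('20221209 114159.337', 0)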
|
[2023-09-06 11:55:47] |
User130862 - Posts: 20 |
Greetings to all snakes =) I need to get my hands on the Numbers Bars bid and ask data arrays for use with Python. Is this possible? If so, how? |
[2024-02-13 04:03:11] |
rudy5 - Posts: 5 |
Thanks so much for that function. I was having issues with the time adjustment, so I did this instead, in case it's helpful for anyone. The time adjustment will need to be updated manually for DST, I'd imagine, but that should be simple enough.

def get_scid_df(symbol, limitsize=sys.maxsize):
    f = Path(rf'c:\SierraChart\Data\{symbol}')
    assert f.exists(), "file not found"
    stat = f.stat()
    offset = 56 if stat.st_size < limitsize else stat.st_size - ((limitsize // 40) * 40)
    BCOLS = ['datetime', 'open', 'high', 'low', 'close', 'volume', 'trades',
             'bidVolume', 'askVolume']
    rectype = np.dtype([
        (BCOLS[0], '<u8'), (BCOLS[1], '<f4'), (BCOLS[2], '<f4'),
        (BCOLS[3], '<f4'), (BCOLS[4], '<f4'), (BCOLS[6], '<i4'),
        (BCOLS[5], '<i4'), (BCOLS[7], '<i4'), (BCOLS[8], '<i4')
    ])
    df = pd.DataFrame(data=np.memmap(f, dtype=rectype, offset=offset, mode="r"),
                      copy=False)
    df = df.dropna()
    years_microseconds = 70 * 365.25 * 24 * 60 * 60 * 10**6  # accounts for leap years supposedly
    days_microseconds = 2 * 24 * 60 * 60 * 10**6
    total_microseconds = years_microseconds + days_microseconds  # 70 years, 2 days
    df.datetime = df.datetime - total_microseconds
    # Add 6 hours to get to local time
    df.datetime = pd.to_datetime(df.datetime, unit='us') + pd.Timedelta(hours=6)
    df.set_index('datetime', inplace=True)
    # Prices need to be adjusted
    div = 1  # default when no symbol-specific divisor applies
    if 'ES' in symbol:
        div = 100
    if any(item in symbol for item in ['6E', '6B']):
        div = 10000
    for c in ['high', 'low', 'close']:
        df[c] = df[c] / div
    df.open = df.close
    return df

I have yet to get custom `limitsize` values working reliably though.
Date Time Of Last Edit: 2024-02-13 04:21:36
|
[2024-02-13 07:50:44] |
User726340 - Posts: 30 |
scid is saved in UTC time format, so I am not sure why you need DST or any other adjustments. The original function should be working okay. Once you get a pandas dataframe, you can convert from UTC to DST or any time zone outside of that function.
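For example, with the tz-aware index the original get_scid_df() returns, a conversion after the fact is one line:

df = get_scid_df(r'c:\SierraChart\Data\AAPL.scid')
df.index = df.index.tz_convert('US/Eastern')  # DST handled automatically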
|
[2024-02-13 16:44:01] |
User150671 - Posts: 69 |
If you don't need the convenience of a dataframe, you can save the extra overhead by just reading into and writing from a np.array.

Example: read a scid file into a np.array:

def get_scid_np(scidFile, limitsize=sys.maxsize):
    # check path exists
    f = Path(scidFile)
    assert f.exists(), "SCID file not found"
    # set the offset from limitsize
    stat = f.stat()
    offset = 56 if stat.st_size < limitsize else stat.st_size - (
        (limitsize // 40) * 40)
    # set up the SCID dtype
    sciddtype = np.dtype([
        ('Time', '<u8'), ('Open', '<f4'), ('High', '<f4'), ('Low', '<f4'),
        ('Close', '<f4'), ('Trades', '<i4'), ('Volume', '<i4'),
        ('BidVolume', '<i4'), ('AskVolume', '<i4')
    ])
    scid = np.fromfile(scidFile, dtype=sciddtype, offset=offset)
    return scid

Write a np.array to a scid file:

def write_scid(scidarray, outfile, mode='w'):
    if mode == 'w':
        # write mode: emit the 56-byte header first
        with open(outfile, 'wb') as fscid:
            header = b'SCID8\x00\x00\x00(\x00\x00\x00\x01\x00'
            fscid.write(header)
            for i in range(21):
                fscid.write(b'\x00\x00')
            fscid.write(scidarray.tobytes())
    else:
        # otherwise append the scid array
        with open(outfile, 'ab') as fscid:
            fscid.write(scidarray.tobytes())
    return

Also, to whoever wrote this piece of code originally, which I see is still being used above:

BCOLS = ['datetime', 'open', 'high', 'low', 'close', 'volume', 'trades', 'bidVolume', 'askVolume']
rectype = np.dtype([
    (BCOLS[0], '<u8'), (BCOLS[1], '<f4'), (BCOLS[2], '<f4'),
    (BCOLS[3], '<f4'), (BCOLS[4], '<f4'), (BCOLS[6], '<i4'),
    (BCOLS[5], '<i4'), (BCOLS[7], '<i4'), (BCOLS[8], '<i4')
])

I didn't notice what you had done and just took your bcols and rectype and made them one variable... because why define something only to use it to define another variable? Problem is, the bcols order DOES NOT MATCH THE ORDER THEY ARE DEFINED IN THE DTYPE (volume and trades are switched around). It took me like 40 minutes to figure out how I broke my scid file generation. Well played, sir.

You can define the np.dtype in one variable as follows:

sciddtype = np.dtype([
    ('Time', '<u8'), ('Open', '<f4'), ('High', '<f4'), ('Low', '<f4'),
    ('Close', '<f4'), ('Trades', '<i4'), ('Volume', '<i4'),
    ('BidVolume', '<i4'), ('AskVolume', '<i4')
])

Date Time Of Last Edit: 2024-02-13 16:47:25
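Usage could look like this (the paths are placeholders):

# read the last ~1 MB of records and write them out again
arr = get_scid_np(r'c:\SierraChart\Data\AAPL.scid', 1024 * 1024)
write_scid(arr, r'c:\SierraChart\Data\AAPL_copy.scid', mode='w')  # new file with header
write_scid(arr, r'c:\SierraChart\Data\AAPL_copy.scid', mode='a')  # append records only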
|