What's your approach for optimizing large tables (1M+ rows) on SQL Server?
At 1 million records, I wouldn't consider this a particularly large table needing unusual optimization techniques such as splitting the table up, denormalizing, etc. But those decisions will come when you've tried all the normal means that don't affect your ability to use standard query techniques.
Now, my second approach for optimization was to make a clustered index. Actually the primary key is automatically clustered, and I made it a compound index with Stock and Date fields. This is unique; I can't have two quotes for the same stock on the same day.
The clustered index makes sure that quotes from the same stock stay together, and are probably ordered by date. Is this second part true?
It's logically true - the clustered index defines the logical ordering of the records on the disk, which is all you should be concerned about. SQL Server may forego the overhead of sorting within a physical block, but it will still behave as if it did, so it's not significant. Querying for one stock will probably be 1 or 2 page reads in any case, and the optimizer doesn't benefit much from unordered data within a page read.
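For concreteness, here is a minimal sketch of the kind of table being discussed; the name dbo.Quotes, the column names and the types are assumptions for illustration, not the poster's actual schema. The compound primary key on (Stock, Date) is clustered, so rows for the same stock are stored together, ordered by date.

    -- Hypothetical table; names and types are assumptions.
    CREATE TABLE dbo.Quotes
    (
        Stock    varchar(10)    NOT NULL,
        [Date]   date           NOT NULL,
        [Open]   decimal(18, 4) NOT NULL,
        [High]   decimal(18, 4) NOT NULL,
        [Low]    decimal(18, 4) NOT NULL,
        [Close]  decimal(18, 4) NOT NULL,
        -- Compound, unique, clustered: one row per stock per day,
        -- logically ordered by Stock, then Date.
        CONSTRAINT PK_Quotes PRIMARY KEY CLUSTERED (Stock, [Date])
    );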
Right now, with half a million records, it's taking around 200 ms to select 700 quotes from a specific asset. I believe this number will get higher as the table grows.
Not necessarily significantly. There isn't a linear relationship between table size and query speed. There are usually a lot more considerations that are more important. I wouldn't worry about it in the range you describe. Is that the reason you're concerned? 200 ms would seem to me to be great, enough to get you to the point where your tables are loaded and you can start doing realistic testing, and get a much better idea of real-life performance.
Now for a third approach, I'm thinking of maybe splitting the table into three tables, each for a specific market (stocks, options and forwards). This will probably cut each table down to about a third of the current size. Now, will this approach help, or does it not matter too much? Right now the table is 50 MB in size, so it can fit entirely in RAM without much trouble.
No! This kind of optimization is so premature it's probably stillborn.
Another approach would be using the partitioning feature of SQL Server.
Same comment. You will be able to stick for a long time to a strictly logical, fully normalized schema design.
What would be other good approaches to make this as fast as possible?
The best first step is clustering on stock. Insertion speed is of no consequence at all until you are looking at multiple records inserted per second - I don't see anything anywhere near that activity here. This should get you close to maximum efficiency because it will efficiently read every record associated with a stock, and that seems to be your most common access pattern. Any further optimization needs to be accomplished based on testing.
A million records really isn't that big. It does sound like it's taking too long to search, though - is the column you're searching against indexed?
As ever, the first port of call should be the SQL profiler and query plan evaluator. Ask SQL Server what it's going to do with the queries you're interested in. I believe you can even ask it to suggest changes such as extra indexes.
I wouldn't start getting into partitioning etc. just yet - as you say, it should all comfortably sit in memory at the moment, so I suspect your problem is more likely to be a missing index.
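As a rough illustration of asking SQL Server to suggest extra indexes, one way (a sketch only, not necessarily what the answerer had in mind) is to query the missing-index DMVs available in SQL Server 2005 and later:

    -- Sketch: list the index suggestions SQL Server has accumulated for this database.
    SELECT TOP (10)
           d.statement AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           s.user_seeks
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
        ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s
        ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks DESC;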
Check your execution plan on that query first. Make sure your indexes are being used. I've found that a million records is not a lot. To give some perspective, we had an inventory table with 30 million rows in it, and our entire query, which joined tons of tables and did lots of calculations, could run in under 200 ms. We found that on a quad-processor 64-bit server we could have significantly more records, so we never bothered partitioning.
You can use SQL Profiler to see the execution plan, or just run the query from SQL Server Management Studio or Query Analyzer.
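For example, a minimal way to see what a query costs when run from Management Studio (the table and column names are the same assumptions as above; enabling "Include Actual Execution Plan" in SSMS, e.g. with Ctrl+M, shows which indexes are used):

    SET STATISTICS IO ON;    -- logical/physical page reads per table
    SET STATISTICS TIME ON;  -- parse/compile and execution times

    SELECT [Date], [Open], [High], [Low], [Close]
    FROM dbo.Quotes
    WHERE Stock = 'MSFT'     -- hypothetical stock symbol
    ORDER BY [Date];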
Re-evaluate the indexes... that's the most important part. The size of the data doesn't really matter - well, it does, but not entirely for speed purposes.
My recommendation is to rebuild the indexes for that table and make a composite one for the columns you'll need the most. Now that you have only a few records, play with different indexes; otherwise it'll get quite annoying to try new things once you have all the historical data in the table.
After you do that, review your query, make the query plan evaluator your friend, and check whether the engine is using the right index.
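A hedged sketch of that advice, reusing the hypothetical dbo.Quotes table from the earlier example; the index name and column choice are assumptions about which columns are queried most:

    -- Hypothetical composite index leading with the column filtered on most often.
    CREATE NONCLUSTERED INDEX IX_Quotes_Stock_Date
        ON dbo.Quotes (Stock, [Date])
        INCLUDE ([Close]);   -- covering a commonly returned column avoids extra lookups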
I just read your last post, and there's one thing I don't get: you are querying the table while you insert data? At the same time? What for? By inserting, do you mean one record or hundreds of thousands? How are you inserting? One by one?
But again, the key here is the indexes; don't mess with partitioning and stuff yet, especially with a million records - that's nothing. I have tables with 150 million records, and returning 40k specific records takes the engine about 1500 ms...
I work for a school district and we have to track attendance for each student. It's how we make our money. My table that holds the daily attendance mark for each student is currently 38.9 million records large. I can pull up a single student's attendance very quickly from this. We keep 4 indexes (including the primary key) on this table. Our clustered index is student/date, which keeps all the student's records ordered by that. We've taken a hit on inserts to this table in the event that an old record for a student is inserted, but it is a worthwhile risk for our purposes.
With regards to select speed, I would certainly take advantage of caching in your circumstance.
You've mentioned that your primary key is a compound on (Stock, Date), and clustered. This means the table is organised by Stock and then by Date. Whenever you insert a new row, it has to be inserted into the middle of the table, and this can cause the other rows to be pushed out to other pages (page splits).
I would recommend trying to reverse the primary key to (Date, Stock), and adding a non-clustered index on Stock to facilitate quick lookups for a specific Stock. This will allow inserts to always happen at the end of the table (assuming you're inserting in order of date), won't affect the rest of the table, and reduces the chance of page splits.
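A hedged sketch of that suggestion against the hypothetical dbo.Quotes table from the earlier example (names are assumptions; with real data the swap would need to be done in a maintenance window):

    -- Re-cluster on (Date, Stock) so date-ordered inserts append at the end of the table.
    ALTER TABLE dbo.Quotes DROP CONSTRAINT PK_Quotes;
    ALTER TABLE dbo.Quotes
        ADD CONSTRAINT PK_Quotes PRIMARY KEY CLUSTERED ([Date], Stock);

    -- Non-clustered index so lookups for a single stock remain quick.
    CREATE NONCLUSTERED INDEX IX_Quotes_Stock
        ON dbo.Quotes (Stock, [Date]);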
The execution plan shows it's using the clustered index quite fine, but I forgot an extremely important fact: I'm still inserting data! The insert is probably locking the table too often. Is there a way we can see this bottleneck?
The execution plan doesn't seem to show anything about lock issues.
Right now this data is only historical; when the importing process is finished the inserts will stop and become much less frequent. But I will soon have a larger table for real-time data that will suffer from this constant-insert problem and will be bigger than this table. So any approach to optimizing this kind of situation is very welcome.
Another solution would be to create a historical table for each year, put all these tables in a historical database, fill them in, and then create the appropriate indexes for them. Once you are done with this, you won't have to touch them ever again. Why would you have to keep on inserting data? To query all those tables you just "union all" them :p
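A rough sketch of that per-year idea; the yearly table names and the view stitching them together are assumptions for illustration:

    -- Hypothetical view over one-per-year historical tables that are loaded once and left alone.
    CREATE VIEW dbo.QuotesHistory
    AS
        SELECT Stock, [Date], [Close] FROM dbo.Quotes2007
        UNION ALL
        SELECT Stock, [Date], [Close] FROM dbo.Quotes2008
        UNION ALL
        SELECT Stock, [Date], [Close] FROM dbo.Quotes2009;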
The current year's table should be very different from these historical tables. From what I understood, you are planning to insert records on the go? I'd plan something different, like doing a bulk insert or something similar every now and then throughout the day. Of course, all this depends on what you want to do.
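For the "bulk insert every now and then" idea, a minimal sketch (the file path, file format and staging table are all assumptions):

    -- Load the day's accumulated quotes from a flat file into a staging table...
    BULK INSERT dbo.QuotesStaging
    FROM 'C:\feeds\quotes_today.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- ...then move them into the main table in one set-based statement.
    INSERT INTO dbo.Quotes (Stock, [Date], [Open], [High], [Low], [Close])
    SELECT Stock, [Date], [Open], [High], [Low], [Close]
    FROM dbo.QuotesStaging;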
The problem here seems to be in the design. I'd go for a new design. The one you have now, from what I understand, is not suitable.
Actually the primary key is automatically clustered and I made it a compound index with Stock and Date fields. This is unique; I can't have two quotes for the same stock on the same day.
The clustered index makes sure that quotes from the same stock stay together, and are probably ordered by date. Is this second part true?
Indexes in SQL Server are always sorted by column order in the index. So an index on [Stock, Date] will first sort on Stock, then within Stock on Date. An index on [Date, Stock] will first sort on Date, then within Date on Stock.
When doing a query, you should always include the first column(s) of an index in the WHERE clause, otherwise the index cannot be used efficiently.
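To illustrate with the hypothetical dbo.Quotes table, assuming an index whose leading column is Stock:

    -- The predicate includes the leading index column, so the index can be seeked.
    SELECT * FROM dbo.Quotes
    WHERE Stock = 'MSFT' AND [Date] >= '20090101';

    -- Filtering only on the second column typically means the index is scanned instead.
    SELECT * FROM dbo.Quotes
    WHERE [Date] >= '20090101';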
For your specific problem: if date-range queries for stocks are the most common usage, then put the primary key on [Date, Stock], so the data will be stored sequentially by date on disk and you should get the fastest access. Build up other indexes as needed. Do an index rebuild / statistics update after inserting lots of new data.
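A short sketch of that rebuild/statistics step after a big load, again against the hypothetical dbo.Quotes table:

    -- Rebuild all indexes on the table and refresh statistics so the optimizer sees the new data.
    ALTER INDEX ALL ON dbo.Quotes REBUILD;
    UPDATE STATISTICS dbo.Quotes WITH FULLSCAN;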