Friday, March 30, 2012

proportional fill using multiple files

As part of my ongoing testing of the performance impact of multiple
files vs. a single file (here is the link, comments would be
appreciated: http://linktrim.com/yq), I have come across some
inconsistencies in how "proportional fill" works. According to Books
Online:
"Filegroups use a proportional fill strategy across all the files
within each filegroup. As data is written to the filegroup, Microsoft®
SQL Server™ 2000 writes an amount proportional to the free space in
the file to each file within the filegroup, rather than writing all the
data to the first file until full, and then writing to the next file.
For example, if file f1 has 100 megabytes (MB) free and file f2 has 200
MB free, one extent is allocated from file f1, two extents from file
f2, and so on. This way both files become full at about the same time,
and simple striping is achieved."
Based on some of my testing this is not what is occurring. I created 3
files in one filegroup for my DB. The first number is the file size,
the second number is the used size after the data load.
1. Test one
File1 - 3000 - 520
File2 - 2000 - 520
File3 - 2000 - 520
2. Test two
File1 - 3000 - 293
File2 - 2000 - 293
File3 - 3000 - 293
3. Test three
File1 - 4000 - 416
File2 - 2000 - 232
File3 - 2000 - 232
As you can see, in tests one and two the used size is the same across
files even though the 3 data files are not the same size. Not until I
made one file twice as large as the others did I see a difference in
used space.
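(For reference, figures like these can be pulled per file with a query along the following lines on SQL Server 2000; this is just a sketch of one way to read size and used space in MB, not necessarily how the numbers above were gathered:)
SELECT name,
       size / 128.0 AS FileSizeMB,                         -- size is stored in 8 KB pages
       FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS UsedMB   -- pages allocated to objects in the file
FROM sysfiles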
Can anyone explain this behavior?
Bert|||Were all the files completely empty? What size is that in? Is it MBs,
GBs, KBs, etc.?
--
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107271506.112735.198080@.f14g2000cwb.googlegroups.com...
As part of my ongoing testing of the performance impact of multiple
files vs. a single file. (Here is the link, comments would be
appreciated http://linktrim.com/yq) I have come across some
inconsistencies in how "proportional fill" works. According to books
online
"Filegroups use a proportional fill strategy across all the files
within each filegroup. As data is written to the filegroup, Microsoft®
SQL Server™ 2000 writes an amount proportional to the free space in
the file to each file within the filegroup, rather than writing all the
data to the first file until full, and then writing to the next file.
For example, if file f1 has 100 megabytes (MB) free and file f2 has 200
MB free, one extent is allocated from file f1, two extents from file
f2, and so on. This way both files become full at about the same time,
and simple striping is achieved."
Based on some of my testing this is not what is occurring. I created 3
files in one filegroup for my DB. The first number is the file size,
the second number is the used size after the data load.
1. Test one
File1 - 3000 - 520
File2 - 2000 - 520
File3 - 2000 - 520
2. Test two
File1 - 3000 - 293
File2 - 2000 - 293
File3 - 3000 - 293
3. Test three
File1 - 4000 - 416
File2 - 2000 - 232
File3 - 2000 - 232
As you can see in test one and two the used size is the same across
files even thought the 3 data files are not the same size. Not until I
make one file twice as large as the others did I sea difference in
used space.
Can anyone explain this behavior?
Bert|||Hey Andrew, thanks for taking the time to respond. Yes, the files were
completely empty, and the size is in MB. Here is the DDL for the
database:
CREATE DATABASE [testdb]
ON
(NAME = N'testdb_Data',
 FILENAME = N'e:\sqldata\mssql\data\testdb_Data.MDF',
 SIZE = 3000, FILEGROWTH = 10%),
(NAME = N'testdb_Data1',
 FILENAME = N'e:\sqldata\mssql\data\testdb_Data1.NDF',
 SIZE = 2000, FILEGROWTH = 10%),
(NAME = N'testdb_Data2',
 FILENAME = N'e:\sqldata\mssql\data\testdb_Data2.NDF',
 SIZE = 3000, FILEGROWTH = 10%)
LOG ON
(NAME = N'testdb_Log',
 FILENAME = N'g:\sqldata\mssql\data\testdb_Log.LDF',
 SIZE = 500, FILEGROWTH = 10%)
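(Side note: the loads here were done with bulk insert; purely as a hypothetical illustration of the batch-size option discussed next, a minimal bulk load with an explicit batch size would look something like this, with the table and data file names made up:)
BULK INSERT dbo.TestTable
FROM 'e:\loads\testdata.dat'
WITH (BATCHSIZE = 10000, TABLOCK)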
I reran the tests a second time, as I was wondering if setting the
batch size for my bulk insert would have an effect. I ran the test
twice: once without setting the batch size and a second time setting
the batch size to 10000. Both tests resulted in the data files being
filled equally, even though the second file is smaller than the others.|||OK, this is a guess, but it makes a lot of sense when you think about it. The
results you had are listed below. Since the files are filled at the extent
level, it can't split an extent and put half in one file and half in another.
So in your tests, until you get differences in free space that are exact
multiples of each other, it will round down. So with a file size of 2000
(file2) and 3000 (file3), it would allocate one extent to file 2 and one to
file 3. When you get double the free space, as in File1 and File2, it can
then allocate in a 2-to-1 fashion. I believe the algorithm has to be very
simple to keep it efficient. That, and the fact that it allocates in extents,
lends to the behavior you see.
1. Test one
File1 - 3000 - 520
File2 - 2000 - 520
File3 - 2000 - 520
2. Test two
File1 - 3000 - 293
File2 - 2000 - 293
File3 - 3000 - 293
3. Test three
File1 - 4000 - 416
File2 - 2000 - 232
File3 - 2000 - 232
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107287428.961190.60820@.z14g2000cwz.googlegroups.com...
> Hey Andrew thanks for taking the time to respond. Yes the files were
> completely empty, and the size is in MB. Here is the DDL for the
> table.
> CREATE DATABASE [testdb]
> ON (NAME = N'testdb_Data',
> FILENAME = N'e:\sqldata\mssql\data\testdb_Data.MDF' ,
> SIZE = 3000, FILEGROWTH = 10%), (NAME = N'testdb_Data1',
> FILENAME = N'e:\sqldata\mssql\data\testdb_Data1.NDF' ,
> SIZE = 2000, FILEGROWTH = 10%), (NAME = N'testdb_Data2',
> FILENAME = N'e:\sqldata\mssql\data\testdb_Data2.NDF' ,
> SIZE = 3000, FILEGROWTH = 10%) LOG ON (NAME = N'testdb_Log',
> FILENAME = N'g:\sqldata\mssql\data\testdb_Log.LDF' ,
> SIZE = 500, FILEGROWTH = 10%)
> I reran the tests a second time as I was wondering if checking the
> batch size for my bulk insert would have an affect. I ran the test
> twice once without setting the batch size and a second time of setting
> the batch size of 10000. Both tests resulted in the data files being
> filled equally, even though the second file is smaller than the others.
>|||This is correct. In test 1, if you keep adding data, when the remaining free
space in file 1 versus file 2 becomes a ratio of 2:1, then we allocate 2
pages in file 1 for every page in file 2.
Please also note that when a file becomes very large, or when there are a lot
of concurrent transactions, we use some performance improvement techniques
with a side effect that the free space calculation is not 100% accurate.
--
Wei Xiao [MSFT]
SQL Server Storage Engine Development
http://weblogs.asp.net/weix
This posting is provided "AS IS" with no warranties, and confers no rights.
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:eHQruWNCFHA.2072@.TK2MSFTNGP10.phx.gbl...
> OK this is a guess but makes a lot of sense when you think about it. The
> results you had are listed below. Since the files are filled at the
> extent level it can't split an extent and put half in one file and half in
> another. So in your tests until you get differences in free space that are
> exact multiples of each other it will round down. So with a file size of
> 2000 (file2) and 3000 (file3) it would allocate one extent to file 2 and
> one to file 3. When you get double the free space as in File1 and file 2
> it can then allocate in a 2 to 1 fashion. I believe the algorithm has to
> be very simple to keep it efficient. That and the fact it allocates in
> extents lend to the behavior you see.
> 1. Test one
> File1 - 3000 - 520
> File2 - 2000 - 520
> File3 - 2000 - 520
> 2. Test two
> File1 - 3000 - 293
> File2 - 2000 - 293
> File3 - 3000 - 293
> 3. Test three
> File1 - 4000 - 416
> File2 - 2000 - 232
> File3 - 2000 - 232
>
> --
> Andrew J. Kelly SQL MVP
>
> "bert" <bertcord@.gmail.com> wrote in message
> news:1107287428.961190.60820@.z14g2000cwz.googlegroups.com...
>> Hey Andrew thanks for taking the time to respond. Yes the files were
>> completely empty, and the size is in MB. Here is the DDL for the
>> table.
>> CREATE DATABASE [testdb]
>> ON (NAME = N'testdb_Data',
>> FILENAME = N'e:\sqldata\mssql\data\testdb_Data.MDF' ,
>> SIZE = 3000, FILEGROWTH = 10%), (NAME = N'testdb_Data1',
>> FILENAME = N'e:\sqldata\mssql\data\testdb_Data1.NDF' ,
>> SIZE = 2000, FILEGROWTH = 10%), (NAME = N'testdb_Data2',
>> FILENAME = N'e:\sqldata\mssql\data\testdb_Data2.NDF' ,
>> SIZE = 3000, FILEGROWTH = 10%) LOG ON (NAME = N'testdb_Log',
>> FILENAME = N'g:\sqldata\mssql\data\testdb_Log.LDF' ,
>> SIZE = 500, FILEGROWTH = 10%)
>> I reran the tests a second time as I was wondering if checking the
>> batch size for my bulk insert would have an affect. I ran the test
>> twice once without setting the batch size and a second time of setting
>> the batch size of 10000. Both tests resulted in the data files being
>> filled equally, even though the second file is smaller than the others.
>|||OK, I see; so basically it will write equal pages to the files until the
free space is at a 2:1 ratio. I wasn't thinking about it on an
extent-by-extent basis. I was thinking that the end result could be
calculated as:
(FileFreeSpace / TotalFilegroupFreeSpace) * TotalDataToLoad
So in this example
File1 - 3000 - 293
File2 - 2000 - 293
File3 - 3000 - 293
Results should be
File1 - 3000 - 329.625
File2 - 2000 - 219.75
File3 - 3000 - 329.625
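(Spelling out that arithmetic as a quick sketch; 879 is just the total MB loaded in that test, 3 x 293:)
DECLARE @Load float, @Total float
SELECT @Load = 879.0, @Total = 3000 + 2000 + 3000    -- total free space across the three files, in MB
SELECT 3000 * @Load / @Total AS File1MB,    -- 329.625
       2000 * @Load / @Total AS File2MB,    -- 219.75
       3000 * @Load / @Total AS File3MB     -- 329.625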
Wei, maybe you can answer this... as data is allocated, at what level is
it determined which file should be written to? Is it at the extent
level? When using multiple files, are the extents grouped into larger
chunks based on file?
bert|||Bert,
There are no cases that I know of where you would break apart an extent.
Each extent always has 8 pages allocated to it (whether or not they all
include data). An extent may have pages from multiple objects (Mixed
Extent) but this is rare and only used when the object has less than 8
pages. You can read and write at the page level but the page always lives
on an extent. So if you are writing to a file and the current extent is full
it will create a new extent in the proper file and begin writing to it. It
will not write half an extent here and half there so to speak.
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107362088.012050.183690@.l41g2000cwc.googlegroups.com...
> Ok I see so basically it will write equal pages to the files until the
> free space is at a 2:1 ration. I wasn't thinking about it on an extent
> by extent basis. I was thinking that the end result could be
> calculated as such.
> (FileSize/TotalFileGroupSizeFreeSpace) * TotalDataToLoad
> So in this example
> File1 - 3000 - 293
> File2 - 2000 - 293
> File3 - 3000 - 293
> Results should be
> File1 - 3000 - 329.625
> File2 - 2000 - 219.75
> File3 - 3000 - 329.625
> Wei maybe you can answer this... as data is allocated at what level is
> it determined what file should be written to? IS it at the extent
> level? When using multiple files are teh extents grouped into larger
> chunks based on file?
> bert
>|||I was wondering if it does the opposite... not smaller than an extent,
but whether the proportional fill ever allocates in groups of extents. For
example, under load will it write 4 extents to file1 and then 4 extents
to file2, or does it always go extent by extent, switching between files?
I am trying to figure out the best configuration for my new server. I
have a database that is about 3TB in size. We currently load about
300-500 million records a month into several tables and want to make
sure I set up the files and file groups correctly.
My new server has 2 PERC4 cards with 2 channels each. Each channel has
a RAID10 array that is 1TB in size; this gives me four drives with 1TB
each, F and G on one and H and I on the other. I have other drives for
tempdb and the tlog that are mirrors.
We are using partitioned tables for the data, with new tables created each
month.
The strategy I am leaning towards is as follows.
When the new tables are created, create 2 new filegroups. For this month
they would be FebruaryData and FebruaryIndex, with one file in each
filegroup:
F:\ FebruaryData
H:\ FebruaryIndex
Next month I would create
G:\MarchData
I:\MarchIndex
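(As a rough sketch, adding one month's filegroups and files along those lines might look something like this; the database name, physical paths, and sizes are placeholders, and the new monthly tables would then be created ON [FebruaryData] with their indexes built ON [FebruaryIndex]:)
ALTER DATABASE [mydb] ADD FILEGROUP [FebruaryData]
ALTER DATABASE [mydb] ADD FILEGROUP [FebruaryIndex]
ALTER DATABASE [mydb] ADD FILE (NAME = N'FebruaryData1', FILENAME = N'F:\sqldata\FebruaryData1.NDF', SIZE = 100000) TO FILEGROUP [FebruaryData]
ALTER DATABASE [mydb] ADD FILE (NAME = N'FebruaryIndex1', FILENAME = N'H:\sqldata\FebruaryIndex1.NDF', SIZE = 50000) TO FILEGROUP [FebruaryIndex]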
What I am trying to determine is whether there is any advantage to creating
multiple files within each filegroup, for example:
F:\ FebruaryData-File1
H:\ FebruaryIndex-File1
G:\ FebruaryData-File2
I:\ FebruaryIndex-File2
F:\ FebruaryData1
H:\ FebruaryIndex1
G:\ FebruaryData2
I:\ FebruaryIndex2
The current database has multiple files located on the same drives. I
have already determined that this does not give better performance.
Any input would be appreciated
Thanks
Bert|||I suspect that it is possible they can allocate more than one extent at a
time per file when it makes sense. Wei Xiao said something to the effect
that they do make adjustments in certain circumstances. Is this going to
be a DW or a large OLTP system? Will you be doing more seeks or range
scans? I wouldn't worry as much about getting the data in as working with
it afterwards. While you may find that one technique is slightly faster
than others for loading the data it is usually a small percentage of the
overall time spent in a system. So one thing you need to decide is whether
the load time or the read time is more important. Then figure out how the
data will be read. If you do a lot of queries that scan ranges of indexes
it will most likely benefit from separating the data from the indexes onto
different arrays. If you do a lot of scans (partial or total) it may be
better to create 4 files for the filegroup and place one on each of the RAID
10s. Of course there are no hard and fast rules that you can count on in
this situation. It really depends a lot on how you use the data and most of
the time you can only find the optimal solution by trial and error. That
usually takes more time and resources than people want to give but that is
the way it goes. How long do you plan on keeping the data? Is it a yearly
process in which you drop one month out and bring in the next one? Having a
filegroup for each month in that case may not be too bad of an idea. It
allows you to backup at the filegroup level and that may help if the older
months don't have update activity. Again without knowing exactly how you
will use the data it's a guess, but why only create 1 file per filegroup?
If you have 4 drives, why not 2 files per FG and place 1 file on each drive?
Two for the indexes and two for the data.
--
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107403443.038720.258700@.c13g2000cwb.googlegroups.com...
>I was wondering if it does the opposite...not smaller than an extent
> but if the proportional fill ever allocates in groups of extents. For
> example under load will it write 4 extents to file1 and then 4 extents
> to file2 or does it always go extent by extent switching between files.
> I am trying to figure out the best configuration for my new server. I
> have a database that is about 3TB in size. We currently load about
> 300-500 million records a month into several tables and want to make
> sure I set up the files and file groups correctly.
> My new server has 2 PERC4 cards with 2 channels each. Each channel has
> a RAID10 array that are each 1TB in size, this gives me four drives
> with 1TB each F,G on one channel and H,I on another. I have other
> drives for tempdb and tlog that are mirrors.
> We are using partitioned tables for data with new tables created each
> month.
> The strategy I am leaning towards is as follows.
> When new tables are created create 2 new file groups. For this month
> they would be FebuaryData and FeburaryIndex, create one file in each
> filegroup
> F:\ FebuaryData
> H:\ FeburaryIndex
> Next month I would create
> G:\MarchData
> I:\MarchIndex
> What I am trying to determine if there is any advantage to creating
> multiple files within each file group, for example.
> F:\ FebuaryData-File1
> H:\ FeburaryIndex-File1
> G:\ FebuaryData-File2
> I:\ FebuaryData-File2
> F:\ FebuaryData1
> H:\ FeburaryIndex1
> G:\ FebuaryData2
> I:\ FeburaryIndex2
> The current database has multiple files located on the same drives. I
> have already determined that this does not give better performance.
> Any input would be appreciated
> Thanks
> Bert
>|||>>I suspect that it is possible they can allocate more than one extent at a time per file when it makes sense.
I guess the main thing I wanted to know is when they would fill at
different rates. It makes it important to plan ahead, as I don't want to
run into a situation where I have the files spread out but they are
not being written to at the same rate. That would make it hard to plan for
space.
>>Is this going to be a DW or a large OLTP system?
OLTP. The system will do real-time processing of firewall and IDS event
data. A set of queries will always be running, processing the data and
looking for attackers.
>>How long do you plan on keeping the data? Is it a yearly
>>process in which you drop one month out and bring in the next one?
Yep, that is what we are doing. This is actually one of 4 servers that
will be loading and processing the same data. We are going to set up
the other servers to act as our "backup" and they will use different
indexing strategies. This is why I am spending so much time trying to
make sure I make the right file layout choice.
>>why only create 1 file per file group
Some of our queries will span several months. With 4 drives I was
thinking I could have month 1 on a drive, index 1 on another, and then do
the same for month 2. After thinking about this some more and looking
at some of the query patterns, I have determined that most queries
target data in the last month, so this layout would not be ideal.
Ok, last question for you ;) ... I have decided on using 2 FileGroups
for now, LargeTables and LargeTablesIndex. Do I ...
A. Each FileGroup gets one file on each array, for a total of eight
files
Pro - Extra files distribute the IO across all disks
Con - Index IO could impact table IO and vice versa
F:\ LargeTables - File1, LargeTablesIndex - File1
H:\ LargeTables - File2, LargeTablesIndex - File2
G:\ LargeTables - File3, LargeTablesIndex - File3
I:\ LargeTables - File4, LargeTablesIndex - File4
B. Each FileGroup has 2 dedicated arrays
Pro - Each filegroup has dedicated drives
Con - Not fully utilizing resources
F:\ LargeTables - File1
H:\ LargeTables - File2
G:\ LargeTablesIndex - File1
I:\ LargeTablesIndex - File2
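(For concreteness, a rough sketch of what option A could look like as DDL; the database name, physical paths, and sizes are placeholders, and keeping the files the same size is deliberate so proportional fill stripes extents evenly across the four arrays:)
ALTER DATABASE [mydb] ADD FILEGROUP [LargeTables]
ALTER DATABASE [mydb] ADD FILEGROUP [LargeTablesIndex]
-- one data file per array; the same four ADD FILE statements would be repeated for [LargeTablesIndex]
ALTER DATABASE [mydb] ADD FILE (NAME = N'LargeTables_F', FILENAME = N'F:\sqldata\LargeTables_F.NDF', SIZE = 200000) TO FILEGROUP [LargeTables]
ALTER DATABASE [mydb] ADD FILE (NAME = N'LargeTables_G', FILENAME = N'G:\sqldata\LargeTables_G.NDF', SIZE = 200000) TO FILEGROUP [LargeTables]
ALTER DATABASE [mydb] ADD FILE (NAME = N'LargeTables_H', FILENAME = N'H:\sqldata\LargeTables_H.NDF', SIZE = 200000) TO FILEGROUP [LargeTables]
ALTER DATABASE [mydb] ADD FILE (NAME = N'LargeTables_I', FILENAME = N'I:\sqldata\LargeTables_I.NDF', SIZE = 200000) TO FILEGROUP [LargeTables]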
I think I am leaning towards A... I am also going to perform some more
tests to back up this theory. I am almost done writing a multithreading
utility that can be used to load test the DB; once this is
done, it will make my testing much easier.|||> Ok last question for you ;) ... I have decided on using 2 FileGroups
> for now. LargeTables and LargeTablesIndex do I ...
> A. Each FileGroup gets one file on each array for a total of eight
> files
> Pro - Extra Files distribute teh IO across all disk
> Con - Index IO could impact Table IO and vs versa
> F:\ LargeTables - File1, LargeTablesIndex -File1
> H:\ LargeTables - File2, LargeTablesIndex -File1
> G:\ LargeTables - File3, LargeTablesIndex -File1
> I:\ LargeTables - File4, LargeTablesIndex -File1
> B. Each Filegroup has 2 dedicated arrays
> Pro - EAch filegroup has dedicated drives
> Con - Not fully utilizing resources
> F:\ LargeTables - File1
> H:\ LargeTables - File2
> G:\ LargeTablesIndex -File1
> I:\ LargeTablesIndex -File1
>
Since you said this is going to be an OLTP system, you should be doing more seeks
than scans. That means a relatively small number of pages being read from
each file. So I would spread the files across all the arrays as in choice
A. These are RAID 10s, and the activity should be such that I am not sure
you would get as much benefit from dedicating two drives to indexes alone.
Your key is to make sure you have enough RAM to mitigate the I/O anyway.
You are spending a lot of money on drives, so make sure you give the memory
its due as well. Have you considered using 64-bit at all? I am not saying
that it is the way to go, only that you should weigh its pros and cons as
well. If you are doing lots of computations and need support for large
memory, it can be a winner.
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107492155.518141.278920@.o13g2000cwo.googlegroups.com...|||>>Since you said this is going to be an OLTP you should be doing more seeks than scans.
Still lots of scans... I guess I would call this a hybrid database.
Every 24 hours about 30-50 million rows are loaded. Most queries are
scanning this data.
>>You are spending a lot of money on drives so make sure you give the memory its due as well.
We are starting with 8GB...hopefully this is enough.
>>Have you considered using 64-bit at all?
I would love to ;)... heck, we are still running Windows 2000... Another
reason I am putting so much effort into measuring performance is so I
have a baseline to compare to. Once I get this server set up I think I
will finally have some time to start my Win2003 and Yukon testing. I
finished my .NET app today... I call it the DBJammer. Very simple for
now, but it creates two thread pools... one pool runs bulk inserts 2
threads at a time and the other pool fires off queries against the data
about 50 threads at a time. Hopefully I can use this to help get some
good metrics. I have heard that an upgrade to Windows 2003 gives a
measurable performance gain as well.|||Absolutely, you should use Win2003. It makes no sense to build a new app and
use Win2000. Win2003 has many benefits to aid in performance across the
board.
--
Andrew J. Kelly SQL MVP
"bert" <bertcord@.gmail.com> wrote in message
news:1107580908.417955.239800@.g14g2000cwa.googlegroups.com...
>>Since you said this is going to be an OLTP you should be doing more
> seeks
>>than scans.
> Still lots of scnas... I guess I woudl call this a hybrid database.
> Ever 24 hours about 30-50 millions rows are loaded. Most queries are
> scanning this data.
>>You are spending a lot of money on drives so make sure you give the
> memory
>>it's due as well.
> We are starting with 8GB...hopefully this is enough.
>>Have you considered using 64 bit at all?
> I would love to ;)...heck we are still running windows 2000... Other
> reasons I am putting so much effort into measuring performance is so I
> have a baseline to compare to. Once I get this server set up I think I
> will finally have some time to start my WIn2003 and Yukon testing. I
> finished my .Net app today...I call it the DBJammer. Very simple for
> now but it creates two thread pools...one pool runs bulk inserts 2
> threads at A time and the other pool fires of queries against the data
> about 50 threads at a time. Hopefully I can use this to help get some
> good metrics. I have heard that an upgrade to windows 2003 gives a
> measurable performance gain as well.
>|||I agree... unfortunately, I am only the DBA, and it's not my decision.
I can't just say "hey, let's use Windows 2003 because it's better."
However, I am going to run a set of tests on identical hardware, and
then I can say "hey, let's use Windows 2003 because in these tests it
performed this much faster." I think it is kind of crazy to spend all
of that money on hardware and use Windows 2000.
Bert
