[ACCEPTED] How to know position (line number) of a StreamReader in a text file?
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice I just grab the position before each ReadLine(), and if the line is of interest, I keep the start position so I can set it later to get back to the line, like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
using System.IO;
using System.Reflection;
using System.Text;

public static class StreamReaderExtensions
{
    // These are private field names of the .NET Framework StreamReader. Newer runtimes
    // name them differently (e.g. "_charPos"), in which case GetField returns null.
    static readonly FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    static readonly FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    static readonly FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);

    public static long GetPosition(this StreamReader reader)
    {
        // shift position back from BaseStream.Position by the number of bytes read
        // into internal buffer.
        int byteLen = (int)byteLenField.GetValue(reader);
        var position = reader.BaseStream.Position - byteLen;
        // if we have consumed chars from the buffer we need to calculate how many
        // bytes they represent in the current encoding and add that to the position.
        int charPos = (int)charPosField.GetValue(reader);
        if (charPos > 0)
        {
            var charBuffer = (char[])charBufferField.GetValue(reader);
            var encoding = reader.CurrentEncoding;
            var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
            position += bytesConsumed;
        }
        return position;
    }

    public static void SetPosition(this StreamReader reader, long position)
    {
        // drop whatever is buffered so the next read starts at the new stream position
        reader.DiscardBufferedData();
        reader.BaseStream.Seek(position, SeekOrigin.Begin);
    }
}
This works quite well for me, and depending on your tolerance for using reflection, I think it is a fairly simple solution.
Caveats:
- While I have done some simple testing using various System.Text.Encoding options, pretty much all of the data I consume with this is simple text files (ASCII).
- I only ever use the StreamReader.ReadLine() method, and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read the data it skipped, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
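As a rough illustration of "keeping track of it yourself", here is a minimal sketch that counts lines while reading sequentially. The class and method names (LineNumberDemo, FindLine) are made up for this example, and it assumes you only ever move forward with ReadLine().

```csharp
using System;
using System.IO;

public static class LineNumberDemo
{
    // Returns the 1-based number of the first line containing `needle`,
    // or -1 if no line matches. The counter is only valid because the
    // reader is consumed strictly front to back.
    public static int FindLine(TextReader reader, string needle)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++; // count only lines that were actually read
            if (line.Contains(needle))
                return lineNumber;
        }
        return -1;
    }
}
```

For example, LineNumberDemo.FindLine(new StringReader("a\nb\nfoo\n"), "foo") walks three lines before it finds a match.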
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
    private TextReader _inner;
    public PositioningReader(TextReader inner) {
        _inner = inner;
    }
    public override void Close() {
        _inner.Close();
    }
    public override int Peek() {
        return _inner.Peek();
    }
    public override int Read() {
        var c = _inner.Read();
        if (c >= 0)
            AdvancePosition((Char)c);
        return c;
    }
    private int _linePos = 0;
    public int LinePos { get { return _linePos; } }
    private int _charPos = 0;
    public int CharPos { get { return _charPos; } }
    private int _matched = 0;
    private void AdvancePosition(Char c) {
        if (Environment.NewLine[_matched] == c) {
            _matched++;
            if (_matched == Environment.NewLine.Length) {
                _linePos++;
                _charPos = 0;
                _matched = 0;
            }
        }
        else {
            _matched = 0;
            _charPos++;
        }
    }
}
Drawbacks (for the sake of brevity):
- Does not check constructor argument for null
- Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
- Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. The TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by:
  - overriding those methods and routing the calls to _inner instead of base;
  - passing the characters read to AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
    var readCount = _inner.ReadBlock(buffer, index, count);
    for (int i = 0; i < readCount; i++)
        AdvancePosition(buffer[index + i]);
    return readCount;
}
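Regarding the drawback about alternate line terminators: one possible way to stay consistent with ReadLine() is to treat '\r', '\n' and "\r\n" each as a single line break. The sketch below is only an illustration; LineTracker is a hypothetical helper, not part of the wrapper above, and its Advance method could stand in for AdvancePosition.

```csharp
using System;

public sealed class LineTracker
{
    private bool _lastWasCR;                 // previous char was '\r'
    public int Line { get; private set; }    // 0-based line index
    public int Column { get; private set; }  // 0-based column within the line

    public void Advance(char c)
    {
        if (c == '\n')
        {
            // In "\r\n" the '\r' already ended the line, so the '\n' is swallowed.
            if (!_lastWasCR) { Line++; Column = 0; }
            _lastWasCR = false;
        }
        else if (c == '\r')
        {
            Line++;
            Column = 0;
            _lastWasCR = true;
        }
        else
        {
            Column++;
            _lastWasCR = false;
        }
    }
}
```

Feeding it the characters of "a\r\nb\rc\nd" one at a time ends on the fourth line (Line index 3) with one character consumed on that line.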
No.
Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line). Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on? Should it just keep a number of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy that implemented a StreamReader with a ReadLine() method that registers the file position.
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[@for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
    _lineLength = 0;
    //if (stream == null)
    //    __Error.ReaderClosed();
    if (charPos == charLen)
    {
        if (ReadBuffer() == 0) return null;
    }
    StringBuilder sb = null;
    do
    {
        int i = charPos;
        do
        {
            char ch = charBuffer[i];
            int EolChars = 0;
            if (ch == '\r' || ch == '\n')
            {
                EolChars = 1;
                String s;
                if (sb != null)
                {
                    sb.Append(charBuffer, charPos, i - charPos);
                    s = sb.ToString();
                }
                else
                {
                    s = new String(charBuffer, charPos, i - charPos);
                }
                charPos = i + 1;
                if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
                {
                    if (charBuffer[charPos] == '\n')
                    {
                        charPos++;
                        EolChars = 2;
                    }
                }
                _lineLength = s.Length + EolChars;
                _bytesRead = _bytesRead + _lineLength;
                return s;
            }
            i++;
        } while (i < charLen);
        i = charLen - charPos;
        if (sb == null) sb = new StringBuilder(i + 80);
        sb.Append(charBuffer, charPos, i);
    } while (ReadBuffer() > 0);
    string ss = sb.ToString();
    _lineLength = ss.Length;
    _bytesRead = _bytesRead + _lineLength;
    return ss;
}
I think there is a minor bug in the code: the length of the string is used to calculate the file position instead of the actual number of bytes read, so it lacks support for UTF8 and UTF16 encoded files.
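To make the character-versus-byte difference concrete, here is a small sketch that measures a line in bytes using Encoding.GetByteCount. ByteOffsets and BytesForLine are hypothetical names for this example; the sketch assumes the caller knows which terminator ended the line, and it ignores any byte order mark at the start of the file.

```csharp
using System;
using System.Text;

public static class ByteOffsets
{
    // Number of bytes that a line plus its terminator occupy in the given
    // encoding. This is what a file position should advance by, rather
    // than line.Length + terminator.Length (which counts chars).
    public static long BytesForLine(string line, string terminator, Encoding encoding)
    {
        return encoding.GetByteCount(line) + encoding.GetByteCount(terminator);
    }
}
```

In UTF-8 a character such as "\u00e9" takes two bytes, and in UTF-16 every char takes at least two, so a count based on string length drifts as soon as the file contains anything beyond single-byte characters.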
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader:
class CountingReader : StreamReader {
    private int _lineNumber = 0;
    public int LineNumber { get { return _lineNumber; } }
    public CountingReader(Stream stream) : base(stream) { }
    public override string ReadLine() {
        string line = base.ReadLine();
        // only count lines that were actually returned, not the null at end of stream
        if (line != null)
            _lineNumber++;
        return line;
    }
}
and then you make it the normal way, say from a FileInfo object named file
CountingReader reader = new CountingReader(file.OpenRead());
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track position in the text regardless of whether you read one char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) if it is.
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
    // NewLine characters (magic value -1: "not used").
    int newLine1, newLine2;
    // The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
    bool insideNewLine;
    // StringBuilder used for ReadLine implementation.
    StringBuilder lineBuilder = new StringBuilder();
    public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
    {
        init(newLine);
    }
    public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
    {
        init(newLine);
    }
    public string NewLine
    {
        get { return "" + (char)newLine1 + (char)newLine2; }
        private set
        {
            Guard.NotNull(value, "value");
            Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
            newLine1 = value[0];
            newLine2 = (value.Length == 2 ? value[1] : -1);
        }
    }
    public int LineNumber { get; private set; }
    public int LinePosition { get; private set; }
    public override int Read()
    {
        int next = base.Read();
        trackTextPosition(next);
        return next;
    }
    public override int Read(char[] buffer, int index, int count)
    {
        int n = base.Read(buffer, index, count);
        for (int i = 0; i