[ACCEPTED] SerializationException when serializing lots of objects in .NET

Accepted answer
Score: 10

I tried reproducing the problem, but the code just takes forever to run even when each of the 13+ million objects is only 2 bytes. So I suspect you could not only fix the problem, but also significantly improve performance if you pack your data a little better in your custom ISerializable implementations. Don't let the serializer see so deep into your structure, but cut it off at the point where your object graph blows up into hundreds of thousands of array elements or more (because presumably if you have that many objects, they're pretty small or you wouldn't be able to hold them in memory anyway). Take this example, which allows the serializer to see classes B and C, but manually manages the collection of class A:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        C c = new C(8, 2000000);
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        bf.Serialize(ms, c);
        ms.Seek(0, System.IO.SeekOrigin.Begin);
        for (int i = 0; i < 3; i++)
            for (int j = i; j < i + 3; j++)
                Console.WriteLine("{0}, {1}", c.all[i][j].b1, c.all[i][j].b2);
        Console.WriteLine("=====");
        c = null;
        c = (C)(bf.Deserialize(ms));
        for (int i = 0; i < 3; i++)
            for (int j = i; j < i + 3; j++)
                Console.WriteLine("{0}, {1}", c.all[i][j].b1, c.all[i][j].b2);
        Console.WriteLine("=====");
    }
}

class A
{
    byte dataByte1;
    byte dataByte2;
    public A(byte b1, byte b2)
    {
        dataByte1 = b1;
        dataByte2 = b2;
    }

    public UInt16 GetAllData()
    {
        return (UInt16)((dataByte1 << 8) | dataByte2);
    }

    public A(UInt16 allData)
    {
        dataByte1 = (byte)(allData >> 8);
        dataByte2 = (byte)(allData & 0xff);
    }

    public byte b1
    {
        get
        {
            return dataByte1;
        }
    }

    public byte b2
    {
        get
        {
            return dataByte2;
        }
    }
}

[Serializable()]
class B : System.Runtime.Serialization.ISerializable
{
    string name;
    List<A> myList;

    public B(int size)
    {
        myList = new List<A>(size);

        for (int i = 0; i < size; i++)
        {
            myList.Add(new A((byte)(i % 255), (byte)((i + 1) % 255)));
        }
        name = "List of " + size.ToString();
    }

    public A this[int index]
    {
        get
        {
            return myList[index];
        }
    }

    #region ISerializable Members

    public void GetObjectData(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context)
    {
        UInt16[] packed = new UInt16[myList.Count];
        info.AddValue("name", name);
        for (int i = 0; i < myList.Count; i++)
        {
            packed[i] = myList[i].GetAllData();
        }
        info.AddValue("packedData", packed);
    }

    protected B(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context)
    {
        name = info.GetString("name");
        UInt16[] packed = (UInt16[])(info.GetValue("packedData", typeof(UInt16[])));
        myList = new List<A>(packed.Length);
        for (int i = 0; i < packed.Length; i++)
            myList.Add(new A(packed[i]));
    }

    #endregion
}

[Serializable()]
class C
{
    public List<B> all;
    public C(int count, int size)
    {
        all = new List<B>(count);
        for (int i = 0; i < count; i++)
        {
            all.Add(new B(size));
        }
    }
}
Score: 3

The issue has been fixed with .NET Core 2.1. I have requested to backport the solution to .NET Framework 4.8:

https://github.com/Microsoft/dotnet-framework-early-access/issues/46.

If you feel the issue should be fixed you can leave a comment that this is also important to you. The fix in .NET Core was to reuse the prime number generator present in Dictionary also for BinaryFormatter.
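The idea behind that fix can be sketched roughly like the growth policy Dictionary uses: instead of expanding the internal object-ID table by an arbitrary step, grow it to the next prime at least twice the old size. The PrimeSizer name and the trial-division check below are illustrative stand-ins, not the actual framework internals:

```csharp
using System;

static class PrimeSizer
{
    // Simple trial-division primality check; the real framework uses a
    // precomputed prime table plus a fallback loop like this one.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // Grow to the smallest prime >= (2 * current + 1), as a Dictionary-style
    // resize would, instead of an arbitrary fixed increment.
    public static int NextCapacity(int current)
    {
        int candidate = current * 2 + 1;
        while (!IsPrime(candidate)) candidate += 2;
        return candidate;
    }
}
```

Prime-sized tables keep the hash buckets evenly spread, which is what avoids the pathological probe chains that made large-graph deserialization take 40 minutes.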

If you have that many serialized objects and you do not want to wait 40 minutes to read them back, make sure that you add this to your App.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Use this switch to make BinaryFormatter fast with large object graphs starting with .NET 4.7.2 -->
      <AppContextSwitchOverrides value="Switch.System.Runtime.Serialization.UseNewMaxArraySize=true" />
  </runtime>
</configuration>

to enable the BinaryFormatter deserialization fix which did finally arrive with .NET 4.7.2. More information about both issues can be found here:

https://aloiskraus.wordpress.com/2017/04/23/the-definitive-serialization-performance-guide/

Score: 1

Have you thought about the fact that Int32.MaxValue is 2,147,483,647, i.e. over 2 billion?

You'd need 16GB of memory just to store the pointers (assuming a 64-bit machine), let alone the objects themselves. Half that on a 32-bit machine, though squeezing 8GB of pointer data into the maximum of 3GB or so of usable space would be a good trick.

I strongly suspect that your problem is not the number of objects, but that the serialization framework is going into some kind of infinite loop because you have referential loops in your data structure.

Consider this simple class:

public class Node
{
    public string Name {get; set;}
    public IList<Node> Children {get;}
    public Node Parent {get; set;}
    ...
}

This simple class can't be serialised, because the presence of the Parent property means that serialisation will go into an infinite loop.

Since you're already implementing ISerializable, you are 75% of the way to solving this - you just need to ensure you remove any cycles from the object graph you are storing, to store an object tree instead.

One technique that is often used is to store the name (or id) of a referenced object instead of the actual reference, resolving the name back to the object on load.
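A minimal sketch of that technique, assuming a Node shaped like the class above: persist the parent's name rather than the Parent reference, so the serialized form is a tree, and resolve names back to references in a second pass after loading. The parentName field and the post-load pass are assumptions of this sketch, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[Serializable]
class Node : ISerializable
{
    public string Name { get; set; }
    public List<Node> Children { get; } = new List<Node>();
    public Node Parent;            // runtime-only back reference, never serialized
    private string parentName;     // serialized stand-in for Parent

    public Node() { }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("name", Name);
        info.AddValue("children", Children);
        // Store the parent's name, not the reference: this breaks the cycle.
        info.AddValue("parentName", Parent == null ? null : Parent.Name);
    }

    protected Node(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("name");
        Children = (List<Node>)info.GetValue("children", typeof(List<Node>));
        parentName = info.GetString("parentName");
        // A post-load pass (e.g. a walk from the root setting child.Parent)
        // resolves parentName back to the actual Parent reference.
    }
}
```

After deserializing, a single traversal from the root can reattach every child's Parent, so the in-memory graph regains its cycles without ever serializing them.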

Score: 1

Depending on the structure of the data, maybe you can serialize/deserialize subgraphs of your large object graph? If the data could be somehow partitioned, you could get away with it, creating only a small duplication of serialized data.

Score: 0

I'm guessing... serialize fewer objects at a time?

2 main questions:

  • what objects are they?
    • POCO?
    • DataTable?
  • what type of serialization is it?
    • xml?
      • XmlSerializer?
      • DataContractSerializer?
    • binary?
      • BinaryFormatter?
      • SoapFormatter?
    • other?
      • json?
      • bespoke?

Serialization needs to have some consideration of what the data volume is; for example, some serialization frameworks support streaming of both the objects and the serialized data, rather than relying on a complete object graph or temporary storage.

Another option is to serialize homogeneous sets of data rather than full graphs - i.e. serialize all the "customers" separately from the "orders"; this would usually reduce volumes, at the expense of having more complexity.
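As a rough sketch of that idea: the Customer and Order types below are hypothetical, and the CustomerId field is the assumption that replaces the object reference, so each list stays flat and can be serialized on its own:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical flat types: an Id link replaces the object reference that
// would otherwise pull the whole graph into one serialization call.
[Serializable] class Customer { public int Id; public string Name; }
[Serializable] class Order { public int CustomerId; public decimal Total; }

static class GraphSplitter
{
    // Serialize the two homogeneous lists into separate streams
    // instead of one deep, cross-referencing object graph.
    public static void Save(List<Customer> customers, List<Order> orders,
                            Stream customerStream, Stream orderStream)
    {
        var bf = new BinaryFormatter();
        bf.Serialize(customerStream, customers);
        bf.Serialize(orderStream, orders);
    }
}
```

On load, rejoining orders to customers by CustomerId is the extra complexity the answer mentions.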

So: what is the scenario here?

Score: 0

Sounds like you ran up against an internal limitation in the framework. You could write your own serialization using BinaryReader/Writer or DataContractSerializer or whatever, but it's not ideal, I know.

Score: 0

Dude, you have reached the end of .NET!

I haven't hit this limit, but here are a few pointers:

  1. use [XmlIgnore] to skip some of the objects - maybe you don't need to serialize everything

  2. you could use the serializer manually (i.e. not with attributes, but by implementing Serialize()) and partition the models into more files.

Score: 0

Do you need to fetch all the data at the same time? Thirteen million objects is a lot of information to handle at once.

You could implement a paging mechanism and fetch the data in smaller chunks. It might also increase the responsiveness of the application, since you wouldn't have to wait for all those objects to finish serializing.
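A paging sketch along those lines, assuming the objects are BinaryFormatter-serializable; SaveInPages and the openPage callback are illustrative names, not an existing API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class Pager
{
    // Serialize the data in fixed-size pages so no single call has to
    // materialize all 13+ million objects at once. openPage maps a page
    // number to its output stream (e.g. one file per page).
    public static void SaveInPages<T>(IEnumerable<T> items, int pageSize,
                                      Func<int, Stream> openPage)
    {
        var bf = new BinaryFormatter();
        int page = 0;
        var buffer = new List<T>(pageSize);
        foreach (var item in items)
        {
            buffer.Add(item);
            if (buffer.Count == pageSize)
            {
                using (var s = openPage(page++)) bf.Serialize(s, buffer);
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)                       // final partial page
            using (var s = openPage(page)) bf.Serialize(s, buffer);
    }
}
```

Reading pages back one at a time keeps memory bounded and lets the UI show results as each chunk arrives.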

Score: 0

Binary serialization of very large objects

If you run into the BinaryFormatter limitation "the internal array cannot expand to greater than Int32.MaxValue elements", help yourself with this code snippet.

Step 1

Install the NuGet package: Install-Package Newtonsoft.Json.Bson -Version 1.0.2

using System.IO;             // MemoryStream
using Newtonsoft.Json;       // JsonSerializer
using Newtonsoft.Json.Bson;  // BsonDataWriter / BsonDataReader

//Code snippet for serialization/deserialization

public byte[] Serialize<T>(T obj)
{
    using (var memoryStream = new MemoryStream())
    {
        using (var writer = new BsonDataWriter(memoryStream))
        {
            var serializer = new JsonSerializer();
            serializer.Serialize(writer, obj);
        }
        return memoryStream.ToArray();
    }
}

public T Deserialize<T>(byte[] data)
{
    using (var memoryStream = new MemoryStream(data))
    {
        using (var reader = new BsonDataReader(memoryStream))
        {
            var serializer = new JsonSerializer();
            return serializer.Deserialize<T>(reader);
        }
    }
}
