My customer is doing data conversion in preparation for go-live: 80 million item/warehouse records. They want to complete the upload over a weekend, but M3 is the bottleneck; it cannot absorb the data as fast as they feed it.
What would you do? I am not an expert at item/warehouse, nor at M3 performance, and from what I have seen in the past, each project handles it in its own way.
My customer has about 80,000 items to add to MMS001, and each item needs to be connected to about 1,000 warehouses in MMS002.
The result is about 80 million item/warehouse records to create.
M3 can handle gazillions of records. So this should be a piece of cake, right?
Failed attempt 1
Because the M3 API MMS200MI does not cover all the fields, my customer initially used M3 Web Services (MWS) of type M3 Display Program (MDP) to simulate a user entering data in all the desired fields.
It takes about 10 ms per record, or 100 records per second. Loop 80 million times, and that would take about 800,000 seconds, which is about 222 hours, which is about 10 days to complete.
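The back-of-the-envelope arithmetic looks like this, using the ~10 ms per record measured above:

```python
# Duration estimate for the MWS MDP upload at the measured rate.
records = 80_000_000
seconds_per_record = 0.010          # ~10 ms per MDP call
total_seconds = records * seconds_per_record
hours = total_seconds / 3600
days = hours / 24
print(f"{total_seconds:,.0f} s = {hours:.0f} h = {days:.1f} days")
# about 800,000 s, about 222 h, about 9.3 days
```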
Yikes!!! 10 days to upload???
We need faster. We need one or two orders of magnitude faster.
Failed attempts 2, 3, 4
I did A/B testing of various alternatives:
- Use MWS MDP, versus use M3 API (I thought this would be the silver bullet, nope)
- Create the item from scratch, versus create the item from an item template
- Call the Add transaction, versus call the Copy transaction
- Let M3 auto-create (we create the item ourselves but let M3 auto-create the item/warehouse), versus manually create (we create both ourselves)
- Use Remote Desktop directly on the M3 BE server or database server, to rule out any potential network issues
Either way, we only got a marginal, insignificant benefit in some cases; the order of magnitude remained equally bad.
Then, I decided to create the records concurrently (in parallel) rather than sequentially (one after the other).
With the help of some command line kung fu I was able to automatically split my big CSV file into 80 smaller chunks, create 80 M3 Data Import (MDI) descriptions, and run 80 parallel jobs that each upload one million records.
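The split-and-parallelize idea can be sketched as follows. Note that `upload_chunk` here is a hypothetical placeholder for the real MDI upload of one chunk, not an actual M3 call:

```python
# Sketch: split the rows into N chunks and upload them in parallel.
# upload_chunk() is a stand-in for triggering one MDI description.
import concurrent.futures

def split_rows(lines, n_chunks):
    """Split a list of CSV lines into at most n_chunks roughly equal chunks."""
    size = -(-len(lines) // n_chunks)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def upload_chunk(chunk):
    # Placeholder: in reality this would run one MDI import job.
    return len(chunk)

def parallel_upload(lines, n_jobs):
    chunks = split_rows(lines, n_jobs)
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return sum(pool.map(upload_chunk, chunks))

rows = [f"ITEM{i};WH{i % 1000}" for i in range(10_000)]
total = parallel_upload(rows, n_jobs=80)
print(total)  # 10000: every row accounted for
```

In the real run, each chunk is a physical CSV file and each job is an MDI description pointing at it; the Python above only illustrates the fan-out.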
Each job takes about 1% CPU. So that should max out the server to about 80% CPU utilization.
However, throughput plateaued well before the 80 jobs. I suspect that is Amdahl's law: the maximum expected speedup in parallel computing is limited by the serial fraction of the work.
The result was about 1,100 records per second, which would complete in about 20h, less than a day. That is one order of magnitude improvement!!!
With some more benchmarking we could plot the speedup curve predicted by Amdahl's law, find the point of diminishing returns for the number of jobs, and get a fairly accurate estimate of the expected duration.
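As a rough illustration, assuming the workload really follows Amdahl's model, the two measured points above (~100 records/s for one job, ~1,100 records/s for 80 jobs) are enough to estimate the parallel fraction and the throughput ceiling:

```python
# Fit Amdahl's law S(n) = 1 / ((1 - p) + p / n) to the measured speedup:
# ~100 records/s with 1 job, ~1,100 records/s with 80 jobs (~11x speedup).
n = 80
speedup = 1100 / 100
# Solve S(n) = speedup for the parallel fraction p:
p = (1 - 1 / speedup) * n / (n - 1)
ceiling = 1 / (1 - p)   # asymptotic maximum speedup as n grows
print(f"parallel fraction p ~ {p:.3f}")
print(f"speedup ceiling ~ {ceiling:.1f}x, i.e. ~{ceiling * 100:.0f} records/s")
```

Under this model the curve never has a true optimum (it only flattens), so in practice you would pick the job count where adding more jobs stops paying for itself.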
I also automatically generated a cleanup script that deletes everything, and I noticed the delete takes four times longer than the add.
By the time I had come up with my results, the customer had completely changed strategy and decided to upload the records directly with SQL INSERT. [Cough] [Cringe] <your reaction here>. They consulted with some folks at Infor, and apparently it is OK as long as the data has been pre-validated by API or by MWS MDP, and as long as M3 is shut down to avoid collisions with the M3 SmartCache. It is important to get clearance from Infor as SQL INSERT will void your Infor Support warranty.
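For what it is worth, the performance win of the SQL route comes largely from batching many rows into one transaction instead of committing row by row. Here is a minimal, hedged sketch of that idea, using sqlite3 purely as a stand-in database; the table and column names are hypothetical placeholders, not the real M3 schema, and none of this should be run against M3 without Infor's clearance:

```python
# Sketch of batched INSERTs of pre-validated rows, with sqlite3 standing
# in for the real database. Table/column names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_warehouse (item TEXT, warehouse TEXT)")

# Pre-validated rows: 10 items x 100 warehouses = 1,000 records.
rows = [(f"ITEM{i:05d}", f"WH{w:04d}") for i in range(10) for w in range(100)]

# executemany inside a single transaction batches the inserts,
# which is far faster than one commit per row.
with conn:
    conn.executemany("INSERT INTO item_warehouse VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM item_warehouse").fetchone()[0]
print(count)  # 1000
```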
I have also heard of the Item Data Interface (IDI) and the Warehouse Interface (WHI). I do not know if those could help. To be explored.
What would you do? How have you uploaded millions of records at once? Leave me a comment below.
Do not take these results as a reference. I was just providing some help to my customer and to the colleagues that are handling this.
I am not a reference for item/warehouse in M3, and I am not a reference for M3 performance; there are dedicated teams for each.
Also, these results are specific to this customer at this time. Results will vary depending on many factors like CPU, memory, disk, tweaks, etc.