Poll: How to add 80 million item/warehouse records in M3?

My customer is doing data conversion in preparation for go live, of which 80 million item/warehouse records, they want to complete the upload over a week-end, but M3 is being a bottleneck, i.e. it cannot absorb the data as fast as they are feeding it to.

What would you do? I am not an expert at item/warehouse, nor at M3 performance, and from what I have seen in the past, each project handles it at their own sauce.

Synopsis

My customer has about 80,000 items to add to MMS001, and each item needs to be connected to about 1,000 warehouses in MMS002.

The result is about 80 million item/warehouse records to create.

M3 can handle gazillions of records. So this should be a piece of cake, right?

Failed tentative 1

Because the M3 API MMS200MI does not cover all the fields, my customer initially used M3 Web Services (MWS) of type M3 Display Program (MDP) to simulate a user entering data in all the desired fields.

It takes about 10ms per record, or 100 records per second. Loop 80 million times, and that would take about 800ks, which is about 222h, which is about 10 days to complete.

Yikes!!! 10 days to upload???

We need faster. We need one or two orders of magnitude faster.

Failed tentatives 2, 3, 4

I did A/B testing of various alternatives:

  • Use MWS MDP, versus use M3 API (I thought this would be the silver bullet, nope)
  • Create the item from scratch, versus create the item from an item template
  • Call the Add transaction, versus call the Copy transaction
  • Let M3 auto-create (we create the item oursleves but let M3 auto-create the item/warehouse), versus manually create (we create both ourselves).
  • Remote Desktop Connect inside the M3 BE server or database server to avoid any potential network issues
  • etc.

Either way, we only get a marginal, insignificant benefit in some cases, but the order of magnitude is equally bad.

Concurrency

Then, I decided to create the records concurrently (in parallel) rather than sequentially (one after the other).

With the help of some command line Kung fu I was able to automatically split my big CSV file into 80 smaller chunks, create 80 M3 Data Import (MDI) descriptions, and run 80 parallel threads that each upload one million records:
poll2

It turns out M3 can comfortably handle the throughput. It created many jobs in the API subsystem to process the records in parallel:
poll

Each job takes about 1% CPU. So that should max out the server to about 80% CPU utilization.

However, M3 seems to have capped well before the 80 jobs. I guess that is Amdahl’s law of maximum expected performance in parallel computing.

The result was about 1,100 records per second, which would complete in about 20h, less than a day. That is one order of magnitude improvement!!!

With some more benchmarking we could plot the curve of Amdahl’s performance and find the optimum number of jobs and have a pretty accurate estimate of the expected duration.

I also automatically generated a cleanup script that deletes everything, and I noticed the delete takes four times longer than the add.

Reality check

By the time I had come up with my results, the customer had completely changed strategy and decided to upload the records directly with SQL INSERT. [Cough] [Cringe] <your reaction here>. They consulted with some folks at Infor, and apparently it is OK as long as the data has been pre-validated by API or by MWS MDP, and as long as M3 is shut down to avoid collisions with the M3 SmartCache. It is important to get clearance from Infor as SQL INSERT will void your Infor Support warranty.

Future work

I have also heard of the Item Data Interface (IDI) and the Warehouse Interface (WHI). I do not know if those could help. To be explored.

Poll

What would you do? How have you uploaded millions of records at once? Leave me a comment below.

Disclaimer

Do not take these results as a reference. I was just providing some help to my customer and to the colleagues that are handling this.

I am not a reference for item/warehouse in M3, and I am not a reference for M3 performance, there are special teams dedicated to each.

Also, these results are specific to this customer at this time. Results will vary depending on many factors like CPU, memory, disk, tweaks, etc.

Published by

thibaudatwork

M3 Technical Consultant

24 thoughts on “Poll: How to add 80 million item/warehouse records in M3?”

  1. Hello,

    I think that SQL is your only choice.
    But have you test the RPS999 and CRS999 ?
    With 80 000 000 Items in MMS002 , they wil probablyl take many days to run …

    Best Regards

    Like

  2. Hi Thibauld, SQL is not a problem for initial data load. I agree that you can’t use it in production. Use an ETL tool and built your validation rules into the “T” job. When you are an experienced developer who knows M3, it is no problem to get 100% data quality. Of course this can not be used for interfaces. I did this job many times for Movex RPG and M3. Each of this jobs produced 100% customer satisfaction.
    In addition, you could consider a delta load – starting earlier and then only process additions and changes.
    /Heiko

    Like

    1. Thank you Heiko. To my surprise, I am hearing more and more reports of projects doing just that, SQL INSERT for the initial load – and everyone agrees that it must not be used after that in production. But then, why is it not an openly approved practice? I mean Infor has all these warning signs against SQL INSERT. Why don’t they have a footnote that says “except for the initial load which is OK”?

      Like

  3. For massive loads of master data, SQL is very common. As long as you have pre-validated the relationships and have sign off on the process it’s fairly safe as long as m3 is shut down. For some areas, the official loading method is still SQL (e.g. role based security even though Infor has web service templates to load this).

    For item warehouse records, I’d probably run a full MRP rebuild after loading these to deal with the ancillary records that m3 creates around MRP.

    Like

    1. Thank you AL. How do we get a sign off from Infor? Usually it’s a couple of guys down the corridor that say “yeah it’s fine go ahead”, but they won’t put their signature on a document. I wish there was less dichotomy in their message.

      And yes, the whole role-based setup by SQL INSERT is part of the confusion, because that’s often done after go live when SQL INSERT isn’t supposed to happen anymore.

      Noted about the MRP rebuild.

      Like

      1. The sign-off is the hard one. During an implementation project it’s easier as you can get it minuted during project management meetings that you’re going to do SQL inserts and the absence of dissent is effective agreement. Outside that situation it’s much harder as the consultants have been, historically at least, told they are not allowed to recommend this.

        Like

  4. Hey Thibaud,

    As everyone else is saying, I would have probably used an SQL at that point, since MDI wouldn’t be able to handle that much data in a short time. I still liked your idea of running parrallel MDI. I understand SQL isn’t the safest and cleanest way to do this kind of thing, but when efficiency and speed is key, it is still the best.

    As for the Role Based security, since this is something I deal with regularly, MNS410MI is working quite fine for thousands of records, I haven’t tried million of them. Beware when using SQL to add anything is CMNPUS/CMNPSU because you will most likely need to run a SES990 after that to get SES401 right, and that will put SES900 to work for a looooong time.

    Like

    1. Thank you Colin. I don’t think MDI is the bottleneck, because the MDI client is mostly waiting for M3 to respond. Actually I don’t know where the bottleneck is. 80 million records with about 40 fields each, of about 10 characters each, converted to double byte for Unicode along the way, that’s 64 Gbytes to upload. The servers have about 16 Gb of RAM. The network is Gbps. I think the disks are solid state drives. None of that should be a 20h bottleneck. I don’t think the bottleneck is hardware. I think one of the pieces of software somewhere is the bottleneck. It could be any of JVM, M3 Foundation, MMS002.java, SQL Server, JDBC, Windows, etc.

      Like

  5. Hi Thibaud,
    another option to pursue would be to setup Template Item(s) in MMS001 for each Item Type and then make sure they exist in all the Warehouses needed in MMS002.
    Then you can use API MMS200MI / AddItmViaItmTyp to add Item 1 time, and it will automatically add it to all the 80 Warehouses.
    After this you use MMS200MI / UpdItmBasic to update any fields that you want to be different from the Item Template defaults as per Item Type.
    Not sure how much faster this would be, but since M3 is just adding all the MMS002 / MMS003 records for you, should be a lot faster. 🙂
    Regards
    Rene

    Like

    1. Thank you René. Actually I think that was the first strategy they used, to let M3 auto-create the item/warehouse records via the Item Type. There was not a noticeable difference, at least not a tenfold difference.

      Like

  6. We utilize SQL insert statements on SQL Server. We received a tool from iStone to assist moving data into our DEV, TST, STG environments. When we converted from 10.1 to 13.2 we also used this tool to create our new production environment. There are a few things to consider initially. Identify all of your non-clustered indexes. In M3 there are a lot. Turn them off on the target system. I would also be a good idea to disable CDC on all tables AND on the database. CDC is Change Data Capture. Anything that is a DML logging (insert/updates/deletes) will create havoc in the transaction logs. For instance, take OCUSMA… SQL server writes to the tran log, SQL server “eventually” flushes to disk, CDC reader reads the transaction logs, and then CDC writes the record to a special table. This all assumes that you have a data warehouse somewhere. If you do have CDC enable it is pretty much 1 insert to the table, 1 insert to the CDC tables (net result is 2x the inserts). Nobody has time for that! You could also flip DB to SIMPLE recover mode or BULK logged as this will further reduce the IO. Once the load is done in your new environment, shrink the TLOG, change the DB back to FULL recovery mode, enable CDC on the database and then enable the tables for CDC.

    Like

  7. UPDATE: My customer said with SQL INSERT they were able to add 20 million records in 15mn, that is yet another order of magnitude improvement; and they also said the servers on the Infor Cloud are faster than their own servers on premise.

    Like

    1. Oh that’s the command. Thank you. I had tried a similar command with SQLite and was very impressed with the speed, I was intrigued even, by how instantaneous it seemed relative to the previous attempts.

      If anyone knows what causes the 20h bottleneck, please let me know.

      Like

  8. UPDATE: My colleague Miquel added the following:

    “To upload data in IBM iSeries, we disconnected the database journals. In Microsoft SQL Server, there are database log files which is the same, and one has to go carefully and shrink the database to recover empty disk space.

    We could explore the option that with data upload with SQL, M3 BE has to be turned off with the CDC disabled, and then enable it again to be able to start M3.”

    Like

  9. UPDATE: I found this about the Item Data Interface:

    Infor M3 Documentation Infocenter > M3 Key Process Documentation – Supply Chain Execution > Item Management Interface > Manage Items in the Item Data Interface
    https://docs.infor.com/help_m3kpd_15.1.2/index.jsp?topic=%2Fcom.lawson.help.scexechs-uwa%2Fc100306.html
    “This document explains how you manage items in the item data interface. The item data interface enables you to do the following: Receive large amount of structured or unstructured item-related data […]”

    Like

    1. I’ve used this on and off over the years. It’s the best way for creating fashion items from memory, but it’s not necessarily any more efficient that the other APIs. About 10 years ago I used it to create 200k fashion SKUs during a data conversion. My recollection is it took about a day to run.

      Liked by 1 person

  10. Hi Thibaud
    in case of massive post go lives updates, I also use multi threaded API calls.
    As already stated SQL is the best option in terms of performance, but I would personally not risk it in a PRD environment.

    Like

    1. Alain, I agree about avoiding SQL INSERT in PRD. Do you do multithreaded API using the MvxApiPool or custom code? And do you have performance metrics of single threaded, versus multithreaded, versus SQL INSERT? Thank you.

      Like

  11. Hi Thibau,

    I apologize for adding an unrelated question on this thread, but I wasn’t sure of the best way to contact you.

    We are not live yet, but are trying to figure out the best way to create a large number of facilities and warehouses for our initial data import. After go-live, we will still have a need to create new facilities and warehouses more often than most companies. The reason for this is that we have ~100 temporary storage yards that change frequently, and our setup requires that each yard be represented by a facility/warehouse.

    Like

    1. Hi Chris. I don’t know the specifics of facilities and warehouses; for me it’s all bytes moving from A to B. All I know is what has been discussed in this thread and throughout the comments, mostly all the comments. SQL INSERT is the fastest but the most risky, and in your case post go live will likely not be possible. Did you run tests and do you have any metrics? /Thibaud

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s