There is a paradigm in technical/scientific computing in mining. It is so fundamental that it is rarely questioned even though the industry spends large sums of money trying to deal with the consequences of that paradigm.
This post asks: What if technology is now sufficient to turn this paradigm on its head? What are the consequences? How would such a change impact the design of enterprise solutions involving technical/scientific algorithms?
First, I had better name the paradigm.
“Technical/scientific software works best when the application and its data is on a PC and is slow or unusable if you run the same application over a network.”
This “truth” has led most mining software providers to write applications for the PC. This approach has led to a proliferation of proprietary data formats, as each vendor optimises the performance of their application on the PC.
The big problems arise when you want to collaborate within teams and across disciplines, and/or provide governance over the technical data. This has created a fundamental tension in the design of enterprise systems for technical applications: the technical applications want the data on the PC and push it to the edge of the network, while the collaboration/governance demands pull the data towards the centre. A number of companies have developed solutions to manage this tension, for example Datamine’s Summit or MineRP’s SpacialDB.
At GlassTerra we have been investigating technologies that will allow companies to deal with big geospatial data on the cloud. Our stated stretch target was to handle Petabyte-scale data through a low-bandwidth browser interface. Recently our experiments have led us in a direction that challenges the fundamental “truth” of mining’s technical computing paradigm stated above. We now believe the technology exists to make the following statement potentially true.
“Technical/scientific software works best when the application and its data are centralised. PC software is typically at least 1000 times slower than this alternative.”
When we started GlassTerra early last year, the first technical challenge we took on was the delivery of geospatial data (meshes, voxels, point clouds etc.) to a browser over a low-bandwidth connection. Last year we achieved this milestone, using standard graphics packages that had their origins in PC software. We got it working and it was usable, but the user experience was not as good as if the same software and data were installed on a PC. Nonetheless, it was good enough to explore a wide array of applications around publishing and collaboration on geospatial data, applications that had gone unexplored because data was tethered to the PC.
The technical challenge then in front of us was that the data sets we could handle were similar in size to those a PC could handle. We were being challenged with ever larger datasets, and we wanted to make performance as good as a PC’s. We set ourselves the challenge of working with Petabyte-scale geospatial data with a user experience similar to that of PC-based applications working with Gigabyte-scale data sets.
We explored a number of different methods to achieve this goal, but finally settled on parallel computing that exploits the scalable computing infrastructure inherent in AWS, Azure or Softlayer. (You may sometimes see this approach called High Performance Computing, or HPC.)
In February 2016, we got our first lab experiments working on mining data and the results rocked me back on my heels. We took a gold model and ran grade shells on it. The parallel computing techniques were more than 1000 times faster than standard PC software.
I was shocked. This result challenged my more than 30 years of experience in mining software; the paradigm that the network is too slow for technical computing is deeply entrenched. We had to test this at bigger scale. Fortunately, a large mining company heard about our results and wanted to see if we could get it working at scale, so we engaged with them and took on the challenge.
The result of our experiment is that we can now work with Terabyte-scale data sets containing mixed geospatial data (data sets of a size a PC can’t handle) and run simple queries on the data in fractions of a second. We think that most algorithms used by mining software could be rewritten to operate using these techniques and obtain similar speed improvements. If so, almost any of the current applications being sold to mining companies could work not only with datasets orders of magnitude larger than they can currently handle, but also with faster response times.
Here are my thoughts on some of the implications.
- There will be a strong incentive to move to a standard data format. Geospatial data with all the attributes needed for mining applications can be hosted on cloud platforms, and most scientific functions used by the various mining packages can be made to work faster, potentially over 1000 times faster. Data structures optimised for a specific use case or task would no longer be a valid excuse.
- Technical software will no longer be limited by data set size. It would be possible for geological modelling, mine design, mineral processing etc. to work at regional-scale datasets and incorporate big sensor datasets such as lidar and hyperspectral imagery.
- Because the database is structured for scalable parallel processing, it is also structured for distributed storage. The data can be stored in multiple locations but queried globally. Mines with limited bandwidth can therefore store large datasets collected at the mine on local servers, while the corporation can centralise, and even manage, that data where that adds value. Technical professionals could operate on the combined datasets regardless of where they are located.
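The “stored in multiple locations but queried globally” idea in the last point is a scatter-gather pattern: a global query fans out to each site’s local store and the partial results are merged centrally. The sketch below illustrates that shape with in-memory toy data; the site names, record layout and `query_site` function are all hypothetical stand-ins for what would really be network calls to per-site servers.

```python
# Illustrative scatter-gather "global query" across distributed site stores.
from concurrent.futures import ThreadPoolExecutor

# Toy per-site stores: site name -> list of (block_id, grade) records.
SITES = {
    "mine_a": [("blk-001", 2.4), ("blk-002", 0.3)],
    "mine_b": [("blk-101", 1.7), ("blk-102", 5.1)],
    "hq_archive": [("blk-201", 0.9)],
}

def query_site(site, cutoff):
    # In practice this would be a network call answered by the site's
    # local server, so only matching records cross the (limited) link.
    return [(site, block, g) for block, g in SITES[site] if g >= cutoff]

def global_query(cutoff):
    # Fan out to every site concurrently, then merge the partial results.
    with ThreadPoolExecutor() as ex:
        parts = ex.map(lambda s: query_site(s, cutoff), SITES)
    return [row for part in parts for row in part]

print(global_query(1.0))
```

A useful property of this pattern for low-bandwidth mine sites is that the filtering happens where the data lives: only the (usually small) result set travels over the network, not the raw datasets.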
A demo of the technology described above will be available on our website soon, so please check back if you are interested. In the meantime, I would like to hear your thoughts in this area.
Till next time,