Modern Data Platform Architecture - Part 1
Part 1 of our data platform architecture series.
How do organizations make sense of the explosion of internally and externally generated data? For over 20 years, the data warehouse has been the mainstay. However, given the evolution of cloud and integration architectures (e.g. microservices), as well as the proliferation of streaming data sources and devices, a traditional modelled data warehouse is no longer sufficient.
Digital workplaces in the age of the Coronavirus
The coronavirus has forced almost everyone, almost everywhere, to work from home. Project-focused workforces are well versed in remote and digital collaboration, but those in BAU roles are less adept in this area. With a multitude of tools and software available, companies will need help transitioning work practices, building online ways of working, strengthening networks and cloud applications, and optimising software licence spend.
2020 Wakeup Call, challenges for local government
Severe bushfires and the global impacts of the coronavirus have derailed what was hoped to be a year of recovery. This has given us cause to reflect on the biggest issues facing our generation, and on the roles local governments need to play if we are to survive the major disruptive trends and environmental challenges of the next few decades.
What’s next for non-bank lending
With banks reeling from a raft of post-royal-commission compliance and remediation issues, non-bank lending has been booming, growing roughly 15% p.a. for the past three years. Non-bank-issued RMBS has doubled compared with the prior two-year period. But what are the biggest challenges non-bank lenders face, and how will they consolidate the small beachhead they have grabbed from the big banks in the residential mortgage market?
Transaction Categorisation, what about Product Categorisation?
Australian financial services has been obsessed with spend categorisation, but is the industry ready for the next wave of data heading its way? Product-level data is the holy grail of behavioural analytics, but taming that data set is harder than you think. This is definitely a high-value banking AI use case.
Unifying data management
Walking into any organisation, the one thing we've found that all companies have in common is that various aspects of the data management competency are claimed by different people, with very little connectivity between them. In other words: fragmented ownership of a poorly defined area of responsibility.
Cognitivo has developed a unified approach to data management that incorporates privacy, information security, records management and data quality into a single Risk Management Framework aligned to ISO 31000, the risk management standard.
Innovation in local government
There’s been a flurry of investment in digital and data-driven transformation in Australian local government, spurred on by Malcolm Turnbull’s 2016 Smart Cities Plan. But how much real innovation have you seen across Australian local councils? The opportunities and the technology both exist. The mindset doesn’t.
You must deal with re-identification risk before sharing data but your Privacy Impact Assessments are inadequate
The digital economy is characterised by data-driven ecosystems. Organisations have recently started to experiment with data commercialisation and data sharing initiatives in order to innovate with data. However, few have been able to assess the risk of re-identification, and some organisations have been caught out. Don’t sleep too soundly: your current PIAs will not save you!
Data-driven, but first we must tackle the Enterprise Data Quality challenge
Competing effectively in the digital age means being data-driven in making the right long-term and short-term decisions. However, the quality of your decisions will be proportional to the quality of your facts. Data quality is the critical, stable foundation for your organisation’s transition to a data-driven, AI-enabled organisation.
2019, Year in Review and Looking into 2020
Financial Disintermediation - Reducing asymmetries between consumers and lenders
The World Economic Forum discussed the topic of financial disintermediation in 2015 and again in 2016; in 2018, the forum declared that “blockchain can no longer be ignored”. The best starting point for understanding why financial disintermediation is such a talked-about topic, and what it has to do with blockchain, is a clear definition of what it is.
What is financial disintermediation?
As the word implies, disintermediation occurs when parties with excess funds (let’s call them ‘surplus entities’) directly transact with parties which are in need of funds (let’s call them ‘deficit entities’), rather than using an intermediary (i.e. a financial institution) to facilitate this process. Traditionally, financial institutions were important because they were able to diversify the risk of providing funds to deficit entities. In a world of disintermediated (or direct) transactions, it was hard for surplus entities to diversify as every time they had surplus funds and wanted to earn interest by providing funding to a deficit entity, they would have high search and research costs. Of course, surplus entities are also limited by the volume of funds they have. One deficit entity’s project may require all of the available funds of a surplus entity, or the surplus entity, on their own, may not have enough funds to get the deficit entity’s project off the ground. Thus, it is not hard to understand why financial intermediaries developed. These intermediaries could pool surplus entities’ funds (offering them a fixed interest rate on their deposit) and become experts in evaluating deficit entities’ funding needs. They could therefore diversify the risk of the entire portfolio of many surplus entities’ investments in all the deficit entities’ projects.
Many of you are probably thinking, well, wait a minute, when companies do IPOs, they directly acquire funds from surplus entities. This has been occurring for ages, so this doesn’t explain why disintermediation is suddenly a hot topic. You are correct. The relative share of funding provided by the capital markets is generally considered to be a measure of how disintermediated the financial system in a geographic region is. Nonetheless, the systems which were developed to maintain records of who owns which small chunks of what assets used to be complex enough that this was only worth doing with big, valuable assets. Blockchains and distributed ledger technologies are now making this so simple that individual borrowers looking to buy personal real estate can ‘crowdfund’ their mortgage via a platform with a pool of capital providers, who are all willing to fund a small chunk of the mortgage. The platform then keeps track of interest payments and apportioning the loan repayment to the respective lenders.
What does this really have to do with blockchain?
In fact, these kinds of peer-driven lending platforms which have sprung up as part of the disintermediation trend in recent years don’t even need to be built on blockchain technology. This can be done using other, modern technologies. There are also regulatory and structural drivers which have led companies to launch these sorts of platforms. The reason the regulators of major financial markets are particularly interested in blockchain is its potential to create a decentralised guarantee of trustworthiness.
It used to be the case that the reputation of a bank was a guarantee of trustworthiness. Nowadays, it is common knowledge that even the largest banks are at risk of failing in a major financial crisis. The complex interdependencies in risk exposures between banks have led to various measures to evaluate the stability of banks, the most important of which is the Basel Committee’s methodology for identifying ‘global systemically important banks’ (G-SIBs) and requiring those identified as such to meet additional loss-absorbency requirements. Nonetheless, this evaluation process still does not address the core issue: banks are systemically interconnected in a complex and opaque web of exposures. In reality, it is still impossible to say where the next financial crisis will emanate from and what consequences it will have.
To be fair, although the recent headlines about Australian banks have done nothing to inspire confidence in the reputation of our banks, none of our institutions are anywhere near being G-SIBs and Australia is highly unlikely to be the epicentre of the next global financial crisis. Still, there are reasons why having a decentralised, non-corruptible record of trustworthiness and creditworthiness might be useful. For example, the Australian government has announced the Australian Business Securitisation Fund to provide additional funding to smaller banks and non-bank lenders with the ultimate goal of encouraging lending to small businesses. This is because small businesses are an important part of the Australian economy – but they are often subject to neglect when it comes to lending as they are not necessarily the most attractive debtors for major banks to have on their books.
If a small business could reliably communicate its creditworthiness to a pool of potentially interested private lenders – without having to actually expose all of its financial information to every Tom, Dick and Harry who would be an interested lender – this could help the small business to get access to credit on a more flexible basis. It would also open up a whole new world of investment opportunities for private investors, as some of the peer-to-peer lending platforms have started to do, particularly in the mortgage space. This is all possible with a clever blockchain implementation. It’s use cases like these that threaten to make banks obsolete and which will change the game for regulators of the financial system. It is both daunting and exciting in terms of the potential it holds for a whole new world of finance.
“Blockchain threatens to make banks obsolete and to change the game for regulators of the financial system. It is both daunting and exciting in terms of the potential it holds for a whole new world of finance.”
Our view (and hope) is that the difference between the emerging business models and how we have conducted finance for the past few hundred years will be that the information asymmetry that has existed in favour of financial institutions will be mitigated. It is currently still the case that banks can ask anything they want to know about you - but you don’t know what they know about you or what they think of you. Financial disintermediation and blockchain technology hold the power to give consumers the reins.
Blog by:
Dr. Christina Kleinau & Alan Hsiao
CDAO Conference Melbourne 2019 – Data Commercialization Panel
The recent Cognitivo-sponsored CDAO Melbourne 2019 conference has come to an end. We had a great time meeting the speakers and vendors and engaging in deep conversations about data, privacy and emerging AI technologies.
We also had the privilege of hosting a distinguished panel with our partners and clients leading a discussion around data commercialization and privacy. Alan Hsiao, our Managing Director, was on hand to lead the conversation.
The Panelists
Jade Clark – Director of Data Partnerships at Westpac and a passionate explorer of ways data can be used to solve societal problems.
Paul Weingarth – Co-founder of Slyp, a data-driven e-Receipting fintech which just closed a funding round with three of the four major Australian banks.
Toby Johnston – a veteran Chief Data Officer who has worked at organisations including Commonwealth Bank and ASB Bank, and currently at Optus.
Paul Tyler – a long-time research engineer and expert in all things privacy who currently serves as Data Privacy Lead at CSIRO’s Data61.
Benjamin Szymkow – CTO at Cognitivo and Country Lead for OpenMined, an expert in current and emerging privacy-preserving techniques.
The Conversation
1) Is there a level playing field between digital innovators, such as Google, and more traditional companies which provide vital services, such as banks and telcos, in the way they can exploit data to enhance operations and generate new revenue streams?
It was discussed that there are industry specific restrictions that the likes of Google, Amazon etc. are not subject to, for example the Telecommunications Act. These laws do create restrictions in addition to other legislative obligations, such as privacy, which all companies are subject to. It was noted that the challenge for firms providing essential services is to understand the specific data use cases that could be done within these constraints, among the very large spectrum of possible cases, and how that use case might be valuable to the customer and the business.
An example was the potential to use data for population movement trends, which is not about individuals and their data. Using population data to determine where to put more cell towers, thereby providing better customer experience and coverage, can also help to optimise company investment decisions. This use case is an example of finding an opportunity that hits the sweet spot for benefiting both customers and companies.
The panel noted that changing customer expectations, and the potential for legislation to reflect various types of customer data rights, are likely to see companies needing express permission from customers to use data in certain ways, which will level the playing field somewhat. It will no longer be enough for companies to simply disclose the general nature of how they use data; they will be restricted to the specific approvals they have received.
2) Given that we can never 100% guarantee that any data we hold is safe, what level of information security can we expect from data-driven organisations?
“The application of tools for assessing and quantifying privacy risk in data is vital in the modern era. We not only need to worry about security in terms of databases being breached or leaked in some way, but aggregated and de-identified data and even AI models themselves could leak data. Understanding and managing this risk is something CSIRO’s Data61 has been researching and the organisation is continuing to drive research in this field.”
- Paul Tyler
3) What are the latest technological developments which are helping organisations in their quest to exploit data commercialisation opportunities, without unleashing significant new sources of risk?
“Working with OpenMined, we focus on three of the most promising privacy-preserving techniques - Differential privacy, federated learning and secure multi-party compute.
Differential privacy is a technique that provides a mathematical guarantee of individual privacy. It means that we can perform analytical activities over a private data set and learn aggregate behaviours of individuals whilst maintaining their privacy.
Federated learning is a technique which flips the need to move all data into a single location before we perform any compute-heavy training. Instead, we bring our model to the data, training it on mobile phones, websites and edge devices without individuals’ private data ever leaving.
Secure multi-party computation is a technique that moves the cryptographic burden from compute to the network. It is a protocol that enables multiple organizations to collaborate with their own private data sets, supporting their own data governance models. It means organizations can keep their inputs to joint computation efforts private.
These techniques can be applied in combination to develop enhanced analytical insights and models, without increasing the level of data risk.”
- Benjamin Szymkow
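As a concrete illustration of the first of these techniques, the classic Laplace mechanism adds calibrated noise to a query answer. The sketch below is a minimal, illustrative Python version; the data set and epsilon value are invented, and a production system would also track a privacy budget across queries:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise: the difference of two
    independent exponential variables is Laplace-distributed."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon=0.5):
    """Noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is sufficient."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Invented data set: per-customer monthly spend in dollars.
spend = [320, 1450, 980, 210, 4100, 760, 1890, 55, 2300, 670]

# The analyst sees only the noisy aggregate, never individual rows.
print(private_count(spend, lambda s: s > 1000, epsilon=0.5))
```

Smaller epsilon values add more noise and give a stronger privacy guarantee; the noisy answer hovers around the true count (here, 4 customers) without revealing any individual's spend.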
The debate continued to build on the idea that value creation and risk management go hand in hand. Provoked by the argument that “A silo-ed organisation is a non-competitive organisation – data is a driver of both value and risk”, Paul Weingarth and Jade Clark reminded the audience that an organisation is nothing without customer trust. Slyp and Westpac both see the opportunities of data commercialisation with a focus on value for the customer. This means activities need to be based on a foundation of consumer consent and transparent frameworks focused on allowable and non-allowable behaviours.
Around the conference, the topic of GDPR and how this applies to Australian companies was frequently discussed. The issue of whether AI models can be developed in a way which overcomes human biases – or if the biases inherent to the datasets resulting from human decision making condemn the algorithms to only be as good as the people who train them – was also a topic of debate.
Keynote speakers emphasised the cost saving and value creation potential of cutting-edge data strategy in the public service (e.g. at the Australian Taxation Office), in sports (e.g. the data-driven transformation of the Formula1 brand), for non-profit organisations (e.g. how the Heart Foundation is improving heart attack survivors’ information and motivation to act on it) and in business (e.g. how carsales.com is increasing its level of service, whilst reducing the cost).
We look forward to participating in the conference again in future and of course, to continuing to help our clients tackle these problems in their day-to-day business.
The Ethics of AI - Technologies change, ethics stay the same
The concept of ethics has existed for about as long as humans have been humans. Although nowadays, many people are disinclined to become entangled in discussions of ethics, whenever a new discipline or technology emerges, the question as to what the ethics of that technology are will inevitably be asked. Such is the case with the ethics of AI [1].
The current consensus around the ethics of AI is that we can build on the four basic principles of bioethics: beneficence, non-maleficence, autonomy, and justice. There is also a general tenor that we should augment that list with a fifth principle of ‘explicability’ [2]. Realistically, this was probably always an implicit component of the application of bioethics. Nevertheless, just as a new CEO feels obliged to change something about the company, so too, when a new technology comes along, we feel compelled to try to re-define the millennia-old discipline of ethics which has guided us thus far.
To be fair, the application of ethics in any new discipline is a topic which should be discussed and re-discussed. Humans have never agreed on a single, clear definition of what ‘ethics’ is - but we have still been generally pretty good at agreeing on what we definitely do not want to happen. Examples of what not to do in terms of ethics of AI include:
Not disclosing sensitive information
Not creating opaque applications, whereby neither the users nor the creators fully understand what they do
Not using AI to enhance activities we generally consider to be unethical, such as stealing
Nonetheless, the core problem of ethics has always been bridging the gap between ‘knowing’ and ‘doing’. We can agree on the above examples. The general tenet about what is wrong and right is something that humans have an innate sense for – even if many different words and concepts can be used to describe that.
The question is - How do we ensure that the innate ethical ideals we have been following for centuries are implemented in practice for AI?
The first thing is to ensure that discussions on ethics are held, even if it is difficult to agree on specific terminology.
The second thing is to use those discussions to ask pragmatic questions about the concrete applications of AI, rather than trying to put labels on lofty ideals.
Aristotle would want you to ask yourself – Is this action consistent with what I consider to be virtuous behaviour? Use whatever labels which come to mind when you ask yourself that question (honest, integral, just, whatever). Other versions of this question include – Would you still do this action if you had to explain it to your mother or your daughter tomorrow? Would you do it if it were on the front page of tomorrow’s newspaper?
Immanuel Kant, the philosopher who defined deontological (read: rules-based) ethics, would want you to ask – Is this action universalizable? That is, if everyone in the world decided to do this action tomorrow, would that be logically possible? Taking an example from the finance world, momentum trading would actually fail this test. Momentum trading lives off the assumption that other market participants have identified fundamental information and are trading on that basis. If everyone only conducted momentum trading, so that no one was actually conducting fundamental research, this assumption would not hold. Thus, momentum trading is unethical. Obviously, no one is physically harmed by momentum trading, but financial markets are more prone to bubbles (boom and bust cycles) because of it. There is a general consensus that bubbles are bad because they misallocate the productive resources of the real economy. This question about the universalizability of an action is likely to be helpful in many AI applications. Often, people will not be directly harmed by AI - but if you have a general feeling of unease about a certain use case, this may be your problem.
Utilitarian ethicists would ask - Does the sum of the benefit of the action outweigh the sum of the negative consequences of the action? This utilitarian idea underpins modern economic theory (yes, economic theory is a practical derivative of ethical theory) and it works quite well in an economic context. Monetary gains and losses can be neatly summed and negated. It becomes more challenging when the benefits and drawbacks leave the economic domain and enter e.g. the social and environmental domain. Still, the main thing is to discuss the list of pros and cons and take a decision you feel comfortable with, on balance.
John Rawls subsequently developed what is known as contractualism, a school of ethics which focuses on the idea of the social contract and asks – Does conducting this action generate the greatest possible benefit for the person in society who is the worst off? This idea recognises that justice can’t mean that everyone is entitled to the exact same life circumstances. Instead, we have to somehow ensure that the people worst off in society would still consent to the social contract of that society. In terms of the classic question as to whether an autonomous car should run over the elderly person or the baby, we could say the person who dies is the worst off. Arguably, a person would prefer to be run over as an elderly person than as a baby. It’s a tough call to make – but ethics has always been about making decisions in tough situations.
Further reading & references:
[1] This website collates existing attempts to define ethics of AI: https://algorithmwatch.org/en/project/ai-ethics-guidelines-global-inventory/
[2] This publication summarizes the five principles of the ethics of AI and makes recommendations: https://link.springer.com/article/10.1007/s11023-018-9482-5
Strategy meets design thinking
Connecting traditional funding models to modern ways of working
Top down strategy setting and budgeting (Command and Control)
Traditional strategy and organisational management approaches focus on creating a big-picture strategy to maximise the allocation and impact of a finite set of resources (capital, people, time etc.): top-down, centralised planning, with well-communicated plans executed through rigid hierarchical structures.
Centralised strategy and planning approaches are great for breaking down and coordinating work efforts across large organisations. They suit outcomes that require little customer input, but are weak where there is limited information or high variability in user preferences. (Employees don’t really have a choice about which systems they use; customers can walk away.)
Key issues
Big gap between planners and users / customers.
Budgets go to the most influential executives.
Good ideas generated by front-line staff rarely make it up the chain to be prioritised / funded.
Periodic strategic planning means a long lead time between budget allocation and execution (lacks nimbleness).
Divides up budgets and resources based on organisational structures rather than customer outcomes.
Progress often judged by spend rate rather than earned value (since there is little to show until right at the end of the project).
Good at continuous / incremental change but can only do things where the outcome is well known.
Competing functional priorities and silos can produce disjointed customer experiences.
The real problem
Imagine the following scenario: “I have a great idea that our customers will love…Great, what’s the cost, timeframe and return?”
In a traditional, structured business environment, significant effort would be spent justifying the initiative, obtaining organisational buy-in and estimating its potential costs and benefits…all before there is any real understanding of whether the idea would be desirable to customers.
Bridging the human-computer gap
Enterprises are embracing digital today, are you being left behind?
Accessibility of enterprise data exposes the challenges for value generation
Accessibility of new technologies means significant upsides in both customer experience and productivity; however, data is a major stumbling block.
New technologies within the fields of advanced data analytics, cognitive computing / artificial intelligence and visual recognition will create innovative commercial opportunities for firms that can bind those capabilities with their existing strengths. With the increasing availability of cloud applications, these new capabilities are no longer exclusively available to the largest or most technically advanced companies.
Against the backdrop of “being digital” businesses are increasingly seeking commercial applications of these technologies in order to enhance their customers’ experience as well as to reduce costs through intelligent automation.
How enterprises are embracing digital
Improving customer experience
Help find relevant products / services within companies or in the market.
Help manage finances, budget, household bills and find the cheapest utility providers based on predicted usage.
Help identify fraud by reconciling receipts to credit card statements.
Cost Saving Propositions through automation
Data Entry & form processing
Matching invoices, financial reconciliation
Legal contract review and compliance reporting
Where to from here?
We are now in a world where the crucial barrier to leveraging these new technologies isn’t necessarily the upfront cost or the complexity of managing the software and hardware, but rather the ability to gain mastery of the underlying data required to drive these applications - something you cannot buy off the shelf.
Based on where we are today, to achieve aspirations of automating key customer facing and operational tasks, we need a bridge between the digital and dynamic human world.
Every day, new documents and spreadsheets are being created: businesses have different invoice formats, and every department has different spreadsheets and databases. In addition, within most organisations today, significant amounts of data still exist in physical form (printers and filing cabinets are still critical pieces of office furniture). Even when documents are in digital form (e.g. Word documents), they are not always machine readable (a document scanned to PDF, for instance).
So far, most companies have only been able to digitise forms and computerise the processing of very repeatable tasks. It takes time to create new application screens and database fields, and this process is unable to keep up with the pace of business change.
As data / information management and digitization capabilities evolve the degree of automation available in dynamic commercial environments will also increase.
Getting this right will bring:
More agile software applications due to more flexible data structures.
The ability to develop deeper customer or organizational insights by tapping into unstructured data and linking it to structured data.
More real-time / on demand insights (dynamically tapping into new data sources such as IOT sources).
Ultimately lower barriers and incremental costs of future automation.
Consider the example of automating invoice matching for a finance department: instead of defining within an OCR tool which fields must be captured, we simply feed it multiple versions of an invoice from a certain supplier and it can figure out how to separate the data fields from the data itself. Applications can then further work out which general ledger account the value of that invoice should be entered into, based on what has happened to similar invoices in the past.
In the near term, manual intervention will be required to curate and correct automated actions, however the burden of design and preparation for automation is dramatically lowered through machine learning techniques.
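As a toy illustration of the “similar invoices in the past” idea, past invoices already coded to general ledger accounts can drive a nearest-neighbour suggestion for a new one. Everything here (suppliers, account codes, the similarity measure) is invented for illustration; a real system would work on OCR output and use a properly trained classifier:

```python
def tokens(text: str) -> set:
    """Lower-case word tokens of an invoice's free text."""
    return set(text.lower().split())

# Historical invoices already coded to GL accounts (invented examples).
history = [
    ("Acme Stationery - printer paper and toner", "6100-Office Supplies"),
    ("CloudHost monthly subscription fee",        "6400-IT Services"),
    ("Acme Stationery - pens, staplers",          "6100-Office Supplies"),
    ("CloudHost support retainer",                "6400-IT Services"),
]

def suggest_gl_account(invoice_text: str) -> str:
    """Suggest the GL account of the most similar past invoice,
    using Jaccard similarity over word tokens."""
    query = tokens(invoice_text)

    def score(item):
        past = tokens(item[0])
        return len(query & past) / len(query | past)

    return max(history, key=score)[1]

print(suggest_gl_account("Acme Stationery - toner cartridges"))
# → 6100-Office Supplies
```

The point of the sketch is that nobody configured a rule for toner cartridges; the suggestion falls out of past coding decisions, which is exactly the burden-lowering effect described above.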
To push the extent of human tasks we can automate, we will need to master a number of new techniques and change a few old ones in the way we manage data.
As data within organisations grows in volume, variety and velocity, using and managing that data will become more like using and managing data on the internet. Don’t seek to control all of it; think about harnessing what's relevant for you.
This means:
Access to organisational data needs to be opened up; data silos inhibit the processing of tasks which span multiple business units.
We will need more flexible database structures (and physical places to store data) that business users are able to design, use and control.
More collaborative ways of managing data (definitions and descriptions) that are the responsibility of business users (as opposed to centralised classification schemes maintained by technical experts).
We will need the ability for machines to reliably infer meaning from natural language text (i.e. summarising data - think of automatically adding hashtags).
More accurate OCR (optical character recognition) capabilities that include the ability to translate handwriting into machine-readable data.
We need to employ AI / machine learning techniques that can automatically classify data and cross reference data between documents and structured data sources.
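The “automatically adding hashtags” idea above can be sketched with nothing more than term-frequency counting against a stopword list. Real systems would use trained language models, but the shape of the problem is the same; the stopword list and sample text are invented for illustration:

```python
from collections import Counter

# A tiny, illustrative stopword list; real systems use curated ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
             "on", "with", "that", "this", "are", "be", "as", "it"}

def suggest_hashtags(text: str, n: int = 3) -> list:
    """Return the n most frequent non-stopword terms as hashtags."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return ["#" + word for word, _ in counts.most_common(n)]

doc = ("Invoice matching automates finance workflows. Matching invoices "
       "to purchase orders reduces manual finance effort.")
print(suggest_hashtags(doc))
```

Even this crude version surfaces the dominant topics of a document, which is enough to start cross-referencing unstructured text against structured data sources.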
Hey I'm not reading anything here that I don't already know!
Some of you are thinking that there's nothing new here - and that is exactly my point. Most of this software is available today.
We used to have to go out and buy expensive scanning solutions or document management systems; now it's available in the cloud. If there's functionality you need, there's likely an API you can call to obtain it - or there will be one available shortly.
The data, on the other hand, is created by people - often inconsistently, with differing purposes and meanings over time.
Over the last 30-40 years, we've been trying to force people to conform to processes that machines understand. We are now at a time where we can start bringing machines closer to understanding people.
This post represents my personal views. Let me know your thoughts in the comments.
Blog by:
Alan Hsiao
Digital for Australian businesses
Is your organisation ready to take advantage of the digital economy?
Why do companies need help in adapting to the new digital economy?
The new digital economy brings unprecedented challenges and opportunities. Customer preferences and behaviours are changing, and data is growing. To succeed, companies will need to master insights and connectivity with their customers in order to bring relevant offerings to market with speed and scale.
Problems facing mid-sized firms wedged between large enterprise and small businesses
They are not as nimble as small businesses, which can easily re-invent, pivot and hire/fire people; mid-sized firms are subject to more onerous labour laws.
They require general management practices and suffer from all the traditional organisational challenges (silos, red tape).
They do not have pockets as deep as large corporates, so they find it harder to attract and sustain diverse pools of specialist talent (you can’t hire a bunch of people and then fire them when you pivot your offering).
They can’t afford the high-end advisory services used by large corporates.
How we can help
Bring management and cultural mindsets and organisational practices adopted by the most innovative companies in the world.
Tailor a set of lean start-up principles, processes and approach that can be gradually scaled out across the organisation (e.g. islands of freedom).
Adopt human centred design principles and tools when developing customer facing offerings.
Leverage aspects of traditional strategy development and management principles so that these new practices can co-exist and mature within the organisation. (i.e. keeping the CFO happy).
Our value proposition
We will upskill your leadership team and get them into the required mindset for digital change by taking them through an interactive process that facilitates them to build a digital vision and roadmap that they own. This ensures any digital initiative is supported by the broader organisation.
Through our workshops, we develop actionable plans as well as organisational roles and responsibilities that lead into your first digital / innovation project.
We utilise lean start-up and human-centred design methods to help bring customers’ products and services to market; however, we develop tailored processes for our clients based on their current organisational structure and level of maturity.
We have a deep understanding of modern capabilities such as UX/design, Cloud, Big Data, advanced analytics, machine learning and enterprise collaboration tools.
We are experienced in large Enterprise strategy planning and project portfolio management.
Our Approach