Overreliance on data to drive design decisions can be just as harmful as ignoring it. Data only tells one kind of story. But your project goals are often more complex than that. Goals can’t always be objectively measured.
Data-driven design is about using information gleaned from both quantitative and qualitative sources to inform how you make decisions for a set of users. Some common tools used to collect data include user surveys, A/B testing, site usage and analytics, consumer research, support logs, and discovery calls.
In the past, designers justified their value through their innate talent for creative ideas and artistic execution. Those whose instincts reliably produced success became rock stars.
In today’s data-driven world, that instinct is less necessary and holds less power. But make no mistake, there’s still a place for it.
Data is good at measuring things that are easy to measure. Some goals are less tangible, but that doesn’t make them less important.
Data has become an authoritarian who has fired the other advisors who might have tempered his ill will. A designer’s instinct would ask, “Do people actually enjoy using this?” or “How do these tactics reflect on our reputation and brand?”
Deciding between two or three options? This is where data shines. Nothing is more decisive than an A/B test to compare potential solutions and see which one actually performs better. Make sure you’re measuring long-term value metrics and not just views and clicks.
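As a rough illustration of what "which one actually performs better" can mean in practice, here is a minimal sketch of a two-proportion z-test on a long-term metric. The function name, the 30-day retention metric and all counts are assumptions made up for this example; a real test would also need a sample size planned in advance.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: users still active after 30 days, out of users exposed.
z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone. The sketch does not guard against a pooled rate of exactly 0 or 1, where the standard error is zero.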
Sweating product quality and aesthetics? Turn to your instinct. The overall feeling of quality is a collection of hundreds of micro-decisions, maintained consistency, and execution with accuracy. Each one of those decisions isn’t worth validating on its own. Your users aren’t design experts, so their feedback will be too subjective and variable. Trust your design senses when finessing the details.
Unsure about user behavior? Use data rather than asking for opinions. When asked what they’ll do, customers will say what they think you want to hear. Instead, trust what they actually do when they think nobody’s looking.
Building brand and reputation? Data can’t easily measure this. But we all know trustworthiness is as important as clicks (and sometimes they’re opposing goals). When building long-term reputation, trust your instinct to guide you to what’s appealing, even if it sometimes contradicts short-term data trends. You have to play the long game here.
Large data is inherently noisy. In general, the more “democratic” the production channel, the dirtier the data, which means that more effort has to be spent on its cleaning. For example, data from social media will require a longer cleaning pipeline. Among other things, you will need to deal with extravagances of self-expression like smileys and irregular punctuation, which are normally absent in more formal settings such as scientific papers or legal contracts.
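To make the cleaning step concrete, here is a minimal sketch of one stage of such a pipeline. The emoticon pattern and the function name `clean_post` are assumptions for illustration only; the pattern covers a handful of ASCII smileys and ignores Unicode emoji, hashtags and mentions, which a production pipeline would also need to handle.

```python
import re

# Assumed, deliberately small emoticon pattern: eyes, optional nose, mouth.
# It only matches smileys that stand alone between whitespace, so "http://" is safe.
EMOTICON = re.compile(r"(?<!\S)[:;=][-o*']?[)(\]\[DPp/\\|](?!\S)")
# Runs of repeated exclamation or question marks such as "!!!" or "??".
REPEATED_PUNCT = re.compile(r"([!?])\1+")
WHITESPACE = re.compile(r"\s+")

def clean_post(text):
    """Strip standalone ASCII smileys and collapse irregular punctuation and spacing."""
    text = EMOTICON.sub(" ", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return WHITESPACE.sub(" ", text).strip()

print(clean_post("Loved it!!! :) so good ;-)"))
```

Whether smileys should be removed or kept as sentiment signals depends on the downstream task; the sketch simply drops them.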
The other major challenge is the labeled data bottleneck
crowd-sourcing and Training Data as a Service (TDaaS). On the other hand, a range of automatic workarounds for the creation of annotated datasets have also been suggested in the machine learning community.
Algorithms: a chain of disruptions in Deep Learning
Neural Networks are the workhorse of Deep Learning (cf. Goldberg and Hirst (2017) for an introduction to the basic architectures in the NLP context). Convolutional Neural Networks have seen an increase in popularity in the past years, whereas the popularity of the traditional Recurrent Neural Network (RNN) is dropping. This is due, on the one hand, to the availability of more efficient RNN-based architectures such as LSTM and GRU. On the other hand, a new and pretty disruptive mechanism for sequential processing, attention, has been introduced into the sequence-to-sequence (seq2seq) model of Sutskever et al. (2014).
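The core idea of attention can be sketched in a few lines. The version below uses plain dot-product scoring over toy vectors, which is a simplification chosen for this example rather than the exact formulation of any cited paper; the names `attend` and `encoder_states` and all values are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Return (context, weights) for one decoder step over all encoder states."""
    # Score each encoder state by its dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return context, weights

# Toy vectors (hypothetical values, one per source token).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

Instead of compressing the whole input into one fixed vector, the decoder looks back at every encoder state at each step and weights the states by their relevance to the current query.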
Consolidating various NLP tasks
the three “global” NLP development curves – syntax, semantics and context awareness
the third curve – the awareness of a larger context – has already become one of the main drivers behind new Deep Learning algorithms.
A note on multilingual research
Think of different languages as different lenses through which we view the same world – they share many properties, a fact that is fully accommodated by modern learning algorithms with their increasing power for abstraction and generalization.
Spurred by the global AI hype, the NLP field is exploding with new approaches and disruptive improvements. There is a shift towards modeling meaning and context dependence, probably the most universal and challenging fact of human language. The generalisation power of modern algorithms allows for efficient scaling across different tasks, languages and datasets, thus significantly speeding up the ROI cycle of NLP developments and allowing for a flexible and efficient integration of NLP into individual business scenarios.
This position paper is meant to i) support the dialog among European and national policy makers, industry, research, public sector and civic society in the definition of a common roadmap for the development and adoption of a pan-European Data Sharing Space, and ii) guide public and private investments in this area in the next Multiannual Financial Framework.
At the Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC) this week, the UAE’s minister of state for Artificial Intelligence, Omar bin Sultan Al Olama, went so far as to declare that “Data is the new oil.”
According to Daniel Yergin, Pulitzer Prize-winning author, economic historian and one of the world’s leading experts on the oil & gas sector, there is now a “symbiosis” between energy producers and the new knowledge economy. The production of oil & gas and the generation of data are now, Yergin argues, “wholly inter-dependent”.
What does Oil & Gas 4.0 look like in practice?
The greater use of automation and collection of data has allowed an upsurge in the “de-manning” of oil & gas facilities.
Thanks to a significant increase in the number of sensors being deployed across operations, companies can monitor what is happening in real time, which markedly improves safety levels.
in the competitive environment of the Fourth Industrial Revolution, no business can afford to be left behind by not investing in new technologies – so strategic discussions are important.
James Dixon, the CTO of Pentaho, is credited with naming the concept of a data lake. He uses the following analogy:
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
A data lake holds data in an unstructured way, and there is no hierarchy or organization among the individual pieces of data. It holds data in its rawest form; it’s not processed or analyzed. Additionally, a data lake accepts and retains all data from all data sources and supports all data types, and schemas (the way the data is stored in a database) are applied only when the data is ready to be used.
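Applying a schema only when the data is used is often called schema-on-read. The following is a minimal sketch of that idea, assuming a tiny in-memory "lake" of raw strings; the records, the `PURCHASE_SCHEMA` mapping and the `read_with_schema` helper are all hypothetical and stand in for whatever storage and query engine a real lake would use.

```python
import json

# The "lake": raw records kept exactly as they arrived, with no schema enforced.
raw_lake = [
    '{"user": "ana", "amount": "19.90", "channel": "web"}',
    '{"user": "ben", "amount": 5}',
    'free-text support note, not JSON at all',
]

# Hypothetical schema chosen by one consumer at read time: field -> converter.
PURCHASE_SCHEMA = {"user": str, "amount": float}

def read_with_schema(raw_records, schema):
    """Yield only the records that can be parsed and cast to the given schema."""
    for raw in raw_records:
        try:
            record = json.loads(raw)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            # Records that do not fit this schema stay in the lake untouched.
            continue

purchases = list(read_with_schema(raw_lake, PURCHASE_SCHEMA))
print(purchases)
```

Nothing is rejected when data arrives; a different consumer could read the same raw records with a different schema, which is what distinguishes this from a store that enforces structure on the way in.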
What is a data warehouse?
A data warehouse stores data in an organized manner with everything archived and ordered in a defined way. When a data warehouse is developed, a significant amount of effort occurs during the initial stages to analyze data sources and understand business processes.
Data lakes retain all data: structured, semi-structured and unstructured/raw data. It’s possible that some of the data in a data lake will never be used. A data warehouse, by contrast, only includes data that is processed (structured) and only the data that is needed for reporting or to answer specific business questions.
Since a data lake lacks structure, it’s relatively easy to make changes to models and queries.
Data scientists are typically the ones who access the data in data lakes because they have the skill-set to do deep analysis.
Since data warehouses are more mature than data lakes, the security for data warehouses is also more mature.
Combine the superfast calculational capacities of Big Compute with the oceans of specific personal information comprising Big Data, and fertile ground for computational propaganda emerges. That’s how the small AI programs called bots can be unleashed into cyberspace to target and deliver misinformation to exactly the people who will be most vulnerable to it. These messages can be refined over and over again based on how well they perform (again in terms of clicks, likes and so on). Worst of all, all this can be done semiautonomously, allowing the targeted propaganda (like fake news stories or faked images) to spread like viruses through the communities most vulnerable to it.
According to Bolsover and Howard, viewing computational propaganda only from a technical perspective would be a grave mistake. As they explain, seeing it just in terms of variables and algorithms “plays into the hands of those who create it, the platforms that serve it, and the firms that profit from it.”
Computational propaganda is a new thing. People just invented it. And they did so by realizing possibilities emerging from the intersection of new technologies (Big Compute, Big Data) and new behaviors those technologies allowed (social media). But the emphasis on behavior can’t be lost.
People are not machines. We do things for a whole lot of reasons including emotions of loss, anger, fear and longing. To combat computational propaganda’s potentially dangerous effects on democracy in a digital age, we will need to focus on both its how and its why.