Large data is inherently noisy. In general, the more “democratic” the production channel, the dirtier the data, which means that more effort has to be spent on cleaning it. For example, data from social media will require a longer cleaning pipeline. Among other things, you will need to deal with extravagances of self-expression such as smileys and irregular punctuation, which are normally absent in more formal settings such as scientific papers or legal contracts.
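To make the cleaning step concrete, here is a minimal sketch of what one stage of such a pipeline might look like for social media text. The patterns and the function name `clean_post` are illustrative assumptions, not taken from any particular library, and a real pipeline would need much broader coverage (emoji, hashtags, mentions, and so on).

```python
import re

# Hypothetical, deliberately small patterns; real pipelines need broader coverage.
# An emoticon is only matched when it stands alone between whitespace, so that
# sequences such as the ":/" inside a URL are left untouched.
EMOTICON = re.compile(r"(?<!\S)[:;=8][\-o*']?[)\](\[dDpP/\\|](?!\S)")
REPEATED_PUNCT = re.compile(r"([!?.,])\1+")
WHITESPACE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip ASCII emoticons and collapse irregular punctuation and spacing."""
    text = EMOTICON.sub(" ", text)          # drop stand-alone smileys
    text = REPEATED_PUNCT.sub(r"\1", text)  # "!!!" becomes "!", "..." becomes "."
    text = WHITESPACE.sub(" ", text)        # normalize runs of whitespace
    return text.strip()


print(clean_post("Great talk!!! :-) see you soon..."))
```

Formal sources such as scientific papers would pass through a function like this almost unchanged, which is one way to see why their cleaning pipelines can be shorter.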
The other major challenge is the labeled data bottleneck.
crowd-sourcing and Training Data as a Service (TDaaS). On the other hand, a range of automatic workarounds for the creation of annotated datasets have also been suggested in the machine learning community.
Algorithms: a chain of disruptions in Deep Learning
Neural Networks are the workhorse of Deep Learning (cf. Goldberg and Hirst (2017) for an introduction to the basic architectures in the NLP context). Convolutional Neural Networks have seen increasing use in the past years, whereas the popularity of the traditional Recurrent Neural Network (RNN) is dropping. This is due, on the one hand, to the availability of more efficient RNN-based architectures such as LSTM and GRU. On the other hand, a new and pretty disruptive mechanism for sequential processing, attention, has been introduced into the sequence-to-sequence (seq2seq) model of Sutskever et al. (2014).
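The core idea of attention can be sketched in a few lines. The text does not specify a scoring function, so the example below assumes a scaled dot product, one common choice among several; the function names `softmax` and `attend` are illustrative. Each position in the input sequence is scored against a query, the scores are normalized into weights, and the output is the weighted average of the values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (context, weights) for one query over a sequence.

    Each key is scored against the query with a scaled dot product
    (an assumed scoring function); the context vector is the weighted
    average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Toy sequence of two positions; the query resembles the first key more closely.
context, weights = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In this toy call the first position receives the larger weight, so the context vector leans towards the first value. Unlike a plain RNN, nothing here forces the model to compress the whole sequence into a single fixed state before using it.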
Consolidating various NLP tasks
the three “global” NLP development curves – syntax, semantics and context awareness
the third curve – the awareness of a larger context – has already become one of the main drivers behind new Deep Learning algorithms.
A note on multilingual research
Think of different languages as different lenses through which we view the same world – they share many properties, a fact that is fully accommodated by modern learning algorithms with their increasing power for abstraction and generalization.
Spurred by the global AI hype, the NLP field is exploding with new approaches and disruptive improvements. There is a shift towards modeling meaning and context dependence, probably the most universal and challenging fact of human language. The generalization power of modern algorithms allows for efficient scaling across different tasks, languages and datasets, thus significantly speeding up the ROI cycle of NLP developments and allowing for a flexible and efficient integration of NLP into individual business scenarios.
This position paper is meant to i) support the dialog among European and national policy makers, industry, research, public sector and civic society in the definition of a common roadmap for the development and adoption of a pan-European Data Sharing Space, and ii) guide public and private investments in this area in the next Multiannual Financial Framework.
At the Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC) this week, the UAE’s minister of state for Artificial Intelligence, Omar bin Sultan Al Olama, went so far as to declare that “Data is the new oil.”
According to Daniel Yergin, Pulitzer Prize-winning author, economic historian and one of the world’s leading experts on the oil & gas sector, there is now a “symbiosis” between energy producers and the new knowledge economy. The production of oil & gas and the generation of data are now, Yergin argues, “wholly inter-dependent”.
What does Oil & Gas 4.0 look like in practice?
the greater use of automation and collection of data has allowed an upsurge in the “de-manning” of oil & gas facilities
Thanks to a significant increase in the number of sensors being deployed across operations, companies can monitor what is happening in real time, which markedly improves safety levels.
In the competitive environment of the Fourth Industrial Revolution, no business can afford to be left behind by not investing in new technologies, so strategic discussions are important.
James Dixon, the CTO of Pentaho, is credited with naming the concept of a data lake. He uses the following analogy:
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
A data lake holds data in an unstructured way, with no hierarchy or organization among the individual pieces of data. It holds data in its rawest form: it is not processed or analyzed. Additionally, a data lake accepts and retains all data from all data sources and supports all data types; schemas (the way the data is stored in a database) are applied only when the data is ready to be used.
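Applying a schema only at the moment of use is often called schema-on-read. The following is a minimal sketch of the idea, assuming a toy "lake" held in a Python list; the records, field names and the helper `read_with_schema` are all hypothetical.

```python
import json

# Hypothetical raw records, stored exactly as they arrived from three sources.
raw_lake = [
    '{"user": "ana", "amount": "19.90", "ts": "2020-01-05"}',
    '{"user": "ben", "clicks": 3}',
    'free-text log line, not JSON at all',
]


def read_with_schema(lake, schema):
    """Apply a schema at read time; yield only the records that fit it."""
    for blob in lake:
        try:
            record = json.loads(blob)
        except json.JSONDecodeError:
            continue  # stays in the lake untouched, just not used by this reader
        if not isinstance(record, dict):
            continue
        try:
            # Pick the requested fields and cast them to the requested types.
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue


# The schema is chosen by the consumer, at the moment of use.
purchases = list(read_with_schema(raw_lake, {"user": str, "amount": float}))
activity = list(read_with_schema(raw_lake, {"user": str, "clicks": int}))
```

Nothing is rejected when data enters the lake. Two consumers can read the same raw records through different schemas, and records that fit neither simply remain stored for possible later use.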
What is a data warehouse?
A data warehouse stores data in an organized manner, with everything archived and ordered in a defined way. When a data warehouse is developed, a significant amount of effort is spent during the initial stages on analyzing data sources and understanding business processes.
Data lakes retain all data: structured, semi-structured and unstructured/raw. It’s possible that some of the data in a data lake will never be used. A data warehouse only includes data that is processed (structured), and only the data that is needed for reporting or to answer specific business questions.
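The warehouse side of this contrast, where structure is enforced before data is stored, can be sketched with an in-memory SQLite database from the Python standard library. The table, columns and sample rows are hypothetical and only illustrate the principle; a production warehouse would use a dedicated system and a far richer model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema is fixed up front, before any data is loaded.
conn.execute(
    "CREATE TABLE sales ("
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL CHECK (amount >= 0)"
    ")"
)


def load(rows):
    """Insert rows that satisfy the schema; return the ones that were rejected."""
    rejected = []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO sales (customer, amount) VALUES (?, ?)", row
                )
        except sqlite3.IntegrityError:
            rejected.append(row)
    return rejected


rejected = load([("ana", 19.9), ("ben", None), ("cy", -5.0)])
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Rows that violate the schema never reach the table, so a reporting query such as the `SUM` above can rely on every stored value being present and valid.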
Since a data lake lacks structure, it’s relatively easy to make changes to models and queries.
Data scientists are typically the ones who access the data in data lakes because they have the skill-set to do deep analysis.
Since data warehouses are more mature than data lakes, the security for data warehouses is also more mature.
Combine the superfast calculational capacities of Big Compute with the oceans of specific personal information comprising Big Data — and the fertile ground for computational propaganda emerges. That’s how the small AI programs called bots can be unleashed into cyberspace to target and deliver misinformation exactly to the people who will be most vulnerable to it. These messages can be refined over and over again based on how well they perform (again in terms of clicks, likes and so on). Worst of all, all this can be done semiautonomously, allowing the targeted propaganda (like fake news stories or faked images) to spread like viruses through communities most vulnerable to their misinformation.
According to Bolsover and Howard, viewing computational propaganda only from a technical perspective would be a grave mistake. As they explain, seeing it just in terms of variables and algorithms “plays into the hands of those who create it, the platforms that serve it, and the firms that profit from it.”
Computational propaganda is a new thing. People just invented it. And they did so by realizing possibilities emerging from the intersection of new technologies (Big Compute, Big Data) and new behaviors those technologies allowed (social media). But the emphasis on behavior can’t be lost.
People are not machines. We do things for a whole lot of reasons including emotions of loss, anger, fear and longing. To combat computational propaganda’s potentially dangerous effects on democracy in a digital age, we will need to focus on both its how and its why.
Studying Connections between Student Well-Being, Performance, and Active Learning
Amy Godert, Cornell University; Teresa Pettit, Cornell University
Treasure in the Sierra Madre? Digital Badges and Educational
Chris Clark, University of Notre Dame; G. Alex Ambrose, University of Notre Dame; Gwynn Mettetal, Indiana University South Bend; David Pedersen, Embry-Riddle Aeronautical University; Roberta (Robin) Sullivan, University at Buffalo, State University of New York
Learning and Teaching Centers: The Missing Link in Data
Denise Drane, Northwestern University; Susanna Calkins,
Identifying and Supporting the Needs of International Faculty
Deborah DeZure, Michigan State University; Cindi Leverich, Michigan
Online Discussions for Engaged and Meaningful Student
Danilo M. Baylen, University of West Georgia; Cheryl Fulghum, Haywood Community College
Why Consider Online Asynchronous Educational Development?
Christopher Price, SUNY Center for Professional Development
Online, On-Demand Faculty Professional Development for Your
Roberta (Robin) Sullivan, University at Buffalo, State University of New York; Cherie van Putten, Binghamton University, State University of New York; Chris Price, State University of New York
The Tools of Engagement Project (http://suny.edu/toep) is an online faculty development model that encourages instructors to explore and reflect on innovative and creative uses of freely-available online educational technologies to increase student engagement and learning. TOEP is not traditional professional development but instead provides access to resources for instructors to explore at their own pace through a set of hands-on discovery activities. TOEP facilitates a learning community where participants learn from each other and share ideas. This poster will demonstrate how you can implement TOEP at your campus by either adopting your own version or joining the existing project.
Video Captioning 101: Establishing High Standards With
Stacy Grooters, Boston College; Christina Mirshekari, Boston College; Kimberly Humphrey, Boston College
Recent legal challenges have alerted institutions to the importance of ensuring that video content for instruction is properly captioned. However, merely meeting minimum legal standards can still fall significantly short of the best practices defined by disability rights organizations and the principles of Universal Design for Learning. Drawing from data gathered through a year-long pilot to investigate the costs and labor required to establish “in-house” captioning support at Boston College, this hands-on session seeks to give participants the tools and information they need to set a high bar for captioning initiatives at their own institutions.
Sessions on mindfulness
52 Cognitive Neuroscience Applications for Teaching and Learning (BoF)
53 Contemplative Practices (BoF) Facilitators: Penelope Wong, Berea College; Carl S. Moore, University of the District of Columbia
79 The Art of Mindfulness: Transforming Faculty Development by Being Present Ursula Sorensen, Utah Valley University
93 Impacting Learning through Understanding of Work Life Balance Deanna Arbuckle, Walden University
113 Classroom Mindfulness Practices to Increase Attention, Creativity, and Deep Engagement Michael Sweet, Northeastern University
132 Measuring the Impacts of Mindfulness Practices in the Classroom Kelsey Bitting, Northeastern University; Michael Sweet, Northeastern University
• Who collects and controls the data?
• Is it accessible to all stakeholders?
• How are the data being used, and is there a possibility for abuse?
• How do we assess data quality?
• Who determines which data to trust and use?
• What happens when the data analysis yields flawed results?
• How do we ensure due process when data-driven errors are uncovered?
• What policies are in place to address errors?
• Is there a plan for handling data breaches?