Searching for "algorithm"
predictive algorithms to better target students’ individual learning needs.
Personalized learning is a lofty aim, however you define it. To truly meet each student where they are, we would have to know their most intimate details, or discover them through their interactions with our digital tools. We would need to track their moods and preferences, their fears and beliefs…perhaps even their memories.
There’s something unsettling about capturing users’ most intimate details. Any prediction model based on historical records risks typecasting the very people it is intended to serve. Even if models can overcome the threat of discrimination, there is still an ethical question to confront – just how much are we entitled to know about students?
We can accept that tutoring algorithms, for all their processing power, are inherently limited in what they can account for. This means steering clear of mythical representations of what such algorithms can achieve. It may even mean giving up on personalization altogether. The alternative is to pack our algorithms to suffocation at the expense of users’ privacy. This approach does not end well.
There is only one way to resolve this trade-off: loop in the educators.
Algorithms and data must exist to serve educators
more on algorithms in this IMS blog
How do algorithms impact our browsing behavior and browsing history?
What is the connection between social media algorithms and fake news?
Are there topic-detection algorithms as there are community-detection ones?
How can I change the content of a [Google] search return? Can I?
Larson, S. (2016, July 8). What is an Algorithm and How Does it Affect You? The Daily Dot. Retrieved from https://www.dailydot.com/debug/what-is-an-algorithm/
Johnson, C. (2017, March 10). How algorithms affect our way of life. Deseret News. Retrieved from https://www.deseretnews.com/article/865675141/How-algorithms-affect-our-way-of-life.html
Understanding algorithms and their impact on human life goes far beyond basic digital literacy, some experts said.
An example could be the recent outcry over Facebook’s news algorithm, which enhances the so-called “filter bubble”
Massanari, A. (2017). #Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329-346. doi:10.1177/1461444815608807
community detection algorithms:
Bedi, P., & Sharma, C. (2016). Community detection in social networks. Wires: Data Mining & Knowledge Discovery, 6(3), 115-135.
Cruz, J. D., Bothorel, C., & Poulet, F. (2014). Community Detection and Visualization in Social Networks: Integrating Structural and Semantic Information. ACM Transactions On Intelligent Systems & Technology, 5(1), 1-26. doi:10.1145/2542182.2542193
Bai, X., Yang, P., & Shi, X. (2017). An overlapping community detection algorithm based on density peaks. Neurocomputing, 226, 7-15. doi:10.1016/j.neucom.2016.11.019
Zeng, J., & Zhang, S. (2009). Incorporating topic transition in topic detection and tracking algorithms. Expert Systems With Applications, 36(1), 227-232. doi:10.1016/j.eswa.2007.09.013
topic detection and tracking (TDT) algorithms based on topic models, such as LDA, pLSI (https://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis), etc.
Zhou, E., Zhong, N., & Li, Y. (2014). Extracting news blog hot topics based on the W2T Methodology. World Wide Web, 17(3), 377-404. doi:10.1007/s11280-013-0207-7
The W2T (Wisdom Web of Things) methodology considers the information organization and management from the perspective of Web services, which contributes to a deep understanding of online phenomena such as users’ behaviors and comments in e-commerce platforms and online social networks. (https://link.springer.com/chapter/10.1007/978-3-319-44198-6_10)
ethics of algorithm
Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2), 2053951716679679. https://doi.org/10.1177/2053951716679679
Malyarov, N. (2016, October 18). Journalism in the age of algorithms, platforms and newsfeeds | News | FIPP.com. Retrieved September 19, 2017, from http://www.fipp.com/news/features/journalism-in-the-age-of-algorithms-platforms-newsfeeds
NLP – natural language processing; ACL – Association for Computational Linguistics (ACL 2019)
Major trends in NLP: a review of 20 years of ACL research
Janna Lipenkova, July 23, 2019
The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
Data: working around the bottlenecks
Large data is inherently noisy. In general, the more “democratic” the production channel, the dirtier the data – which means that more effort has to be spent on its cleaning. For example, data from social media will require a longer cleaning pipeline. Among others, you will need to deal with extravagancies of self-expression like smileys and irregular punctuation, which are normally absent in more formal settings such as scientific papers or legal contracts.
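To make the idea of a social media cleaning pipeline concrete, here is a minimal sketch in Python. It is not the pipeline from the ACL review; the emoticon pattern, the function name clean_post, and the choice of steps are assumptions made for illustration, and a real pipeline would need a much larger emoticon lexicon plus handling for URLs, hashtags, and emoji.

```python
import re

# Hypothetical, deliberately small emoticon pattern (e.g. ":-)", ";)", "=D", ":(").
# Real pipelines use far larger lexicons.
EMOTICON = re.compile(r"[:;=][\-']?[\)\(\]\[DPp]")
# Runs of the same punctuation mark, such as "!!!" or "...".
REPEATED_PUNCT = re.compile(r"([!?.,])\1+")
WHITESPACE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip simple emoticons, collapse repeated punctuation, normalize spaces."""
    text = EMOTICON.sub(" ", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return WHITESPACE.sub(" ", text).strip()


print(clean_post("Great talk!!! :-)  so good..."))
```

The order of the steps matters in this sketch: emoticons are removed first so that their punctuation characters are not mistaken for sentence punctuation.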
The other major challenge is the labeled data bottleneck
crowd-sourcing and Training Data as a Service (TDaaS). On the other hand, a range of automatic workarounds for the creation of annotated datasets have also been suggested in the machine learning community.
Algorithms: a chain of disruptions in Deep Learning
Neural Networks are the workhorse of Deep Learning (cf. Goldberg and Hirst (2017) for an introduction of the basic architectures in the NLP context). Convolutional Neural Networks have seen an increase in the past years, whereas the popularity of the traditional Recurrent Neural Network (RNN) is dropping. This is due, on the one hand, to the availability of more efficient RNN-based architectures such as LSTM and GRU. On the other hand, a new and pretty disruptive mechanism for sequential processing – attention – has been introduced in the sequence-to-sequence (seq2seq) model by Sutskever et al. (2014).
Consolidating various NLP tasks
the three “global” NLP development curves – syntax, semantics and context awareness
the third curve – the awareness of a larger context – has already become one of the main drivers behind new Deep Learning algorithms.
A note on multilingual research
Think of different languages as different lenses through which we view the same world – they share many properties, a fact that is fully accommodated by modern learning algorithms with their increasing power for abstraction and generalization.
Spurred by the global AI hype, the NLP field is exploding with new approaches and disruptive improvements. There is a shift towards modeling meaning and context dependence, probably the most universal and challenging fact of human language. The generalisation power of modern algorithms allows for efficient scaling across different tasks, languages and datasets, thus significantly speeding up the ROI cycle of NLP developments and allowing for a flexible and efficient integration of NLP into individual business scenarios.
Facebook’s board works more like an advisory committee than an overseer, because Mark controls around 60 percent of voting shares. Mark alone can decide how to configure Facebook’s algorithms to determine what people see in their News Feeds, what privacy settings they can use and even which messages get delivered. He sets the rules for how to distinguish violent and incendiary speech from the merely offensive, and he can choose to shut down a competitor by acquiring, blocking or copying it.
We are a nation with a tradition of reining in monopolies, no matter how well intentioned the leaders of these companies may be. Mark’s power is unprecedented and un-American.
It is time to break up Facebook.
America was built on the idea that power should not be concentrated in any one person, because we are all fallible. That’s why the founders created a system of checks and balances.
More legislation followed in the 20th century, creating legal and regulatory structures to promote competition and hold the biggest companies accountable.
Starting in the 1970s, a small but dedicated group of economists, lawyers and policymakers sowed the seeds of our cynicism. Over the next 40 years, they financed a network of think tanks, journals, social clubs, academic centers and media outlets to teach an emerging generation that private interests should take precedence over public ones. Their gospel was simple: “Free” markets are dynamic and productive, while government is bureaucratic and ineffective.
American industries, from airlines to pharmaceuticals, have experienced increased concentration, and the average size of public companies has tripled. The results are a decline in entrepreneurship, stalled productivity growth, and higher prices and fewer choices for consumers.
From our earliest days, Mark used the word “domination” to describe our ambitions, with no hint of irony or humility.
Facebook’s monopoly is also visible in its usage statistics. About 70 percent of American adults use social media, and a vast majority are on Facebook products. Over two-thirds use the core site, a third use Instagram, and a fifth use WhatsApp. By contrast, fewer than a third report using Pinterest, LinkedIn or Snapchat. What started out as lighthearted entertainment has become the primary way that people of all ages communicate online.
The F.T.C.’s biggest mistake was to allow Facebook to acquire Instagram and WhatsApp. In 2012, the newer platforms were nipping at Facebook’s heels because they had been built for the smartphone, where Facebook was still struggling to gain traction. Mark responded by buying them, and the F.T.C. approved.
The News Feed algorithm reportedly prioritized videos created through Facebook over videos from competitors, like YouTube and Vimeo. In 2012, Twitter introduced a video network called Vine that featured six-second videos. That same day, Facebook blocked Vine from hosting a tool that let its users search for their Facebook friends while on the new network. The decision hobbled Vine, which shut down four years later.
Unlike Vine, Snapchat wasn’t interfacing with the Facebook ecosystem; there was no obvious way to handicap the company or shut it out. So Facebook simply copied it. (Copyright law does not extend to the abstract concept itself.)
As markets become more concentrated, the number of new start-up businesses declines. This holds true in other high-tech areas dominated by single companies, like search (controlled by Google) and e-commerce (taken over by Amazon). Meanwhile, there has been plenty of innovation in areas where there is no monopolistic domination, such as in workplace productivity (Slack, Trello, Asana), urban transportation (Lyft, Uber, Lime, Bird) and cryptocurrency exchanges (Ripple, Coinbase, Circle).
The choice is mine, but it doesn’t feel like a choice. Facebook seeps into every corner of our lives to capture as much of our attention and data as possible and, without any alternative, we make the trade.
Just last month, Facebook seemingly tried to bury news that it had stored tens of millions of user passwords in plain text format, which thousands of Facebook employees could see. Competition alone wouldn’t necessarily spur privacy protection — regulation is required to ensure accountability — but Facebook’s lock on the market guarantees that users can’t protest by moving to alternative platforms.
Mark used to insist that Facebook was just a “social utility,” a neutral platform for people to communicate what they wished. Now he recognizes that Facebook is both a platform and a publisher and that it is inevitably making decisions about values. The company’s own lawyers have argued in court that Facebook is a publisher and thus entitled to First Amendment protection.
As if Facebook’s opaque algorithms weren’t enough, last year we learned that Facebook executives had permanently deleted their own messages from the platform, erasing them from the inboxes of recipients; the justification was corporate security concerns.
Mark may never have a boss, but he needs to have some check on his power. The American government needs to do two things: break up Facebook’s monopoly and regulate the company to make it more accountable to the American people.
We Don’t Need Social Media
The push to regulate or break up Facebook ignores the fact that its services do more harm than good
Colin Horgan, May 13, 2019
Hughes joins a growing chorus of former Silicon Valley unicorn riders who’ve recently had second thoughts about the utility or benefit of the surveillance-attention economy their products and platforms have helped create. He is also not the first to suggest that government might need to step in to clean up the mess they made
Nick Srnicek, author of the book Platform Capitalism and a lecturer in digital economy at King’s College London, wrote last month, “[I]t’s competition – not size – that demands more data, more attention, more engagement and more profits at all costs.”
more on Facebook in this IMS blog
Technology is a branch of moral philosophy, not of science
The process of making technology is design
Design is a branch of moral philosophy, not of a science
System design reflects the designer’s values and the cultural content
Byzantine history professor Bulgarian – all that is 200 years old is politics, not history
Access, privacy, equity, values for the prof organization ARLD.
This is how bad design makes it out into the world: not due to malicious intent, but with no intent at all.
Our expertise, our service ethic, and our values remain our greatest strengths. But for us to have the impact we seek in the lives of our users, we must encode our services and our values into the software.
Design interprets the world to create useful objects. Ethical design closes the loop, imagining how those objects will affect the world.
A good science fiction story should be able to predict not the automobile, but the traffic jam. (Frederik Pohl)
Victor Papanek: The designer’s social and moral judgement must be brought into play long before she begins to design.
We need to fear the consequences of our work more than we love the cleverness of our ideas. (Mike Monteiro)
Qual and quan data – librarians love data, usage, ILL, course reserves, data – QQLM.
IDEO – the goal of design research isn’t to collect data, it is to synthesize information and provide insight and guidance that leads to action.
Google Analytics: the trade-off. Besides privacy concerns, sometimes data and analytics are the only thing we can see.
Frank Chimero – remove a person’s humanity and she is just a curiosity, a pinpoint on a map, a line in a list, an entry in a database. A person turns into a granular bit of information.
Gale Analytics on Demand – similar to the keynote speaker at Macalester LibTech 2019. https://www.facebook.com/InforMediaServices/posts/1995793570531130?comment_id=1995795043864316&comment_tracking=%7B%22tn%22%3A%22R%22%7D
By designing for yourself or your team, you are potentially building discrimination right into your product. (Erika Hall)
what is relevance. the relevance of the ranking algorithm. for whom (what patron). crummy searches.
Reckless associations – made by humans or computers – can do very real harm, especially when they appear in supposedly neutral environments.
Donna Lanclos and Andrew Asher: Ethnography should be core to the business of the library.
technology as information ecology. co-evolve. prepare to start asking questions to see the effect of our design choices.
ethnography of the library: touch point tours – a student gives a tour to the librarians or draws a map of the library, to give a sense of what spaces they use and what is important. ethnographish
Q from the audience: if instructors warn against Google and Wikipedia and steer students to library and dbases, how do you now warn about the perils of the dbases bias? A: put fires down, and systematically, try to build into existing initiatives: bi-annual magazine, as many places as can
APRIL 21, 2019 Zeynep Tufekci
Think You’re Discreet Online? Think Again
Because of technological advances and the sheer amount of data now available about billions of other people, discretion no longer suffices to protect your privacy. Computer algorithms and network analyses can now infer, with a sufficiently high degree of accuracy, a wide range of things about you that you may have never disclosed, including your moods, your political beliefs, your sexual orientation and your health.
There is no longer such a thing as individually “opting out” of our privacy-compromised world.
In 2017, the newspaper The Australian published an article, based on a leaked document from Facebook, revealing that the company had told advertisers that it could predict when younger users, including teenagers, were feeling “insecure,” “worthless” or otherwise in need of a “confidence boost.” Facebook was apparently able to draw these inferences by monitoring photos, posts and other social media data.
In 2017, academic researchers, armed with data from more than 40,000 Instagram photos, used machine-learning tools to accurately identify signs of depression in a group of 166 Instagram users. Their computer models turned out to be better predictors of depression than humans who were asked to rate whether photos were happy or sad and so forth.
Computational inference can also be a tool of social control. The Chinese government, having gathered biometric data on its citizens, is trying to use big data and artificial intelligence to single out “threats” to Communist rule, including the country’s Uighurs, a mostly Muslim ethnic group.
Zeynep Tufekci and Seth Stephens-Davidowitz: Privacy is over
Zeynep Tufekci writes about security and data privacy for NY Times, disinformation’s threat to democracy for WIRED
more on privacy in this IMS blog
keynote: equitable access to information
the type of data: Wikipedia. the dangers of learning from Wikipedia. how individuals can organize to mitigate some of these dangers. Wikidata, algorithms.
IBM Watson is using Wikipedia: algorithms making sense of it, AI system
youtube videos debunked of conspiracy theories by using wikipedia.
semantic relatedness, Word2Vec
how do the algorithms work: a large body of unstructured text. picks specific words
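A rough illustration of semantic relatedness as used with Word2Vec-style embeddings: each word is mapped to a vector learned from a large text corpus, and relatedness is commonly scored as the cosine of the angle between two vectors. The sketch below uses tiny hand-made 3-dimensional vectors, which are invented for illustration and are not real Word2Vec output; the names toy_vectors and relatedness are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only. Real embeddings have
# hundreds of dimensions and are learned from text such as Wikipedia.
toy_vectors = {
    "library": [0.9, 0.1, 0.2],
    "archive": [0.8, 0.2, 0.3],
    "banana": [0.1, 0.9, 0.1],
}


def relatedness(word_a, word_b):
    """Score how related two words are under the toy vectors above."""
    return cosine_similarity(toy_vectors[word_a], toy_vectors[word_b])


print(relatedness("library", "archive") > relatedness("library", "banana"))
```

Because the vectors are learned from the source text, any bias in that text (the concern raised in these notes about Wikipedia) carries over into the relatedness scores.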
lots of AI learns about the world from Wikipedia. the neutral point of view policy. Wikipedia asks editors to present viewpoints as proportionally as possible. Wikipedia biases: 1. gender bias (only 20-30 % are women).
conceptnet. debias along different demographic dimensions.
citations analysis gives also an idea about biases. localness of sources cited in spatial articles. structural biases.
geolocation on Twitter by County. predicting the people living in urban areas. FB wants to push more local news.
danger (biases) #3. Wikipedia search results vs Wikipedia knowledge panel.
collective action against tech: Reddit, boycott for FB and Instagram.
Mechanical Turk https://www.mturk.com/ algorithmic / human intersection
data labor: what primary resources these companies have. posts, images, reviews etc.
boycott, data strike (data not being available for algorithms in the future). GDPR in EU – all historical data is like the CA Consumer Privacy Act. One can do data strike without data boycott. general vs homogeneous (group with shared identity) boycott.
the wikipedia SPAM policy is obstructing new editors and that hit communities such as women.
Twitter and Other Social Media: Supporting New Types of Research Materials
how to access at different levels. methods and methodological concerns. ethical concerns, legal concerns,
tweetdeck for advanced Twitter searches. quoting, likes is relevant, but not enough, sometimes screenshot
social listening platforms: Crimson Hexagon, Parsely, Sysomos – not yet academic platforms; tools to set up queries and visualization, but it is difficult to see the algorithm, the data samples etc. open source tools (Urbana, Social Media microscope: SMILE (social media intelligence and learning environment)) to collect data from Twitter and Reddit; within the platform they can query Twitter. create trend analysis, sentiment analysis. Voxgov (subscription service: analyzing political social media)
graduate level and faculty research: accessing SM large scale data web scraping & APIs Twitter APIs. Jason script, Python etc. Gnip Firehose API ($) ; Web Scraper Chrome plugin (easy tool, Python and R created); Twint (Twitter scraper)
Facepager (open source) if not Python or R coder. structure and download the data sets.
TAGS archiving Google Sheets, uses the Twitter API. anything older than 7 days is not available, so harvest every week.
social feed manager (GWUniversity) – Justin Litman with Stanford. Install on server but allows much more.
legal concerns: copyright (public info, but not beyond copyrighted). fair use argument is strong, but cannot publish the data. can analyze under fair use. contracts supersede copyright (terms of service/use) licensed data through library.
methods: sampling concerns tufekci, 2014 questions for sm. SM data is a good set for SM, but other fields? not according to her. hashtag studies: self selection bias. twitter as a model organism: over-represented data in academic studies.
methodological concerns: scope of access – lack of historical data. mechanics of platform and context: retweets are not necessarily endorsements.
ethical concerns. public info – IRB no informed consent. the right to be forgotten. anonymized data is often still traceable.
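One way to see why anonymized data is often still traceable is to count how many records share the same combination of leftover attributes (so-called quasi-identifiers). If a combination is unique, that record can potentially be re-identified by linking it with another dataset. The sketch below is a minimal illustration of that check, in the spirit of k-anonymity; the field names and the sample records are hypothetical and not taken from the session.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier
    values (the k in k-anonymity). Returns 0 for an empty dataset."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical "anonymized" sample: no names or handles, but the city and
# the year the account was created remain in the data.
sample = [
    {"city": "St. Cloud", "joined": 2012, "hashtag": "#libtech"},
    {"city": "St. Cloud", "joined": 2012, "hashtag": "#edtech"},
    {"city": "Madison", "joined": 2015, "hashtag": "#libtech"},
]

k = smallest_group_size(sample, ["city", "joined"])
print(k)  # a value of 1 means at least one record is unique on these fields
```

A result of 1 signals that at least one person stands alone on those attributes, which is the situation the note warns about.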
table discussion: digital humanities, journalism interested, but too narrow. tools are still difficult to find and operate. context of the visuals. how to spread around variety of majors and classes. controversial events more likely to be deleted.
takedowns, lies and corrosion: what is a librarian to do: trolls, takedown,
the pilot process. 2017. 3D printing, approaching and assessing success or failure. https://collegepilot.wiscweb.wisc.edu/
development kit circulation. familiarity with the Oculus Rift resulted in lesser reservation. Downturn also.
An experience station. clean up free apps.
question: spherical video, video 360.
safety issues: policies? instructional perspective: curating, WI people: user testing. touch controllers more intuitive than Xbox controller. Retail Oculus Rift
app Sketchfab. 3modelviewer. obj or sdl file. Medium, Tilt Brush.
College of Liberal Arts at the U has their VR, 3D print set up.
Penn State (Paul, librarian, kinesiology, anatomy programs), Information Science and Technology. immersive experiences lab for video 360.
CALIPHA part of it is xrlibraries. libraries equal education. content provider LifeLiqe STEM library of AR and VR objects. https://www.lifeliqe.com/
Access for All:
bloat code (e.g. cleaning up MS Word code)
ILLiad Doctype and Language declaration helps people with disabilities.
A Seat at the Table: Embedding the Library in Curriculum Development
embed library resources.
librarians, IT staff, IDs. help faculty with course design, primarily online, master courses. Concordia is GROWING, mostly because of online students.
solve issues (putting down fires, such as “gradebook” on BB). Librarians : research and resources experts. Librarians helping with LMS. Broadening definition of Library as support hub.
How Machine Learning and the Cloud Can Rescue IT From the Plumbing Business
FROM AMAZON WEB SERVICES (AWS)
Many educational institutions maintain their own data centers. “We need to minimize the amount of work we do to keep systems up and running, and spend more energy innovating on things that matter to people.”
what’s the difference between machine learning (ML) and artificial intelligence (AI)?
Jeff Olson: That’s actually the setup for a joke going around the data science community. The punchline? If it’s written in Python or R, it’s machine learning. If it’s written in PowerPoint, it’s AI.
machine learning is in practical use in a lot of places, whereas AI conjures up all these fantastic thoughts in people.
What is serverless architecture, and why are you excited about it?
Instead of having a machine running all the time, you just run the code necessary to do what you want—there is no persisting server or container. There is only this fleeting moment when the code is being executed. It’s called Function as a Service, and AWS pioneered it with a service called AWS Lambda. It allows an organization to scale up without planning ahead.
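As a rough sketch of what Function as a Service looks like in practice: on AWS Lambda, a Python function is written as a handler that receives an event and a context object, runs, and returns. Nothing in the code starts or manages a server. The handler(event, context) shape follows the AWS Lambda convention for Python; the event field "name" and the response layout below are assumptions chosen for illustration, not part of the interview.

```python
import json


def lambda_handler(event, context):
    """Run once per invocation; no server or container persists in between.

    The "name" field in the event is a hypothetical input for this sketch.
    """
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test: the platform would normally supply event and context.
response = lambda_handler({"name": "library"}, None)
print(response["body"])
```

Scaling happens outside this code: the platform runs as many copies of the handler as incoming requests require, which is why no capacity planning appears in the function itself.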
How do you think machine learning and Function as a Service will impact higher education in general?
The radical nature of this innovation will make a lot of systems that were built five or 10 years ago obsolete. Once an organization comes to grips with Function as a Service (FaaS) as a concept, it’s a pretty simple step for that institution to stop doing its own plumbing. FaaS will help accelerate innovation in education because of the API economy.
If the campus IT department will no longer be taking care of the plumbing, what will its role be?
I think IT will be curating the inter-operation of services, some developed locally but most purchased from the API economy.
As a result, you write far less code and have fewer security risks, so you can innovate faster. A succinct machine-learning algorithm with fewer than 500 lines of code can now replace an application that might have required millions of lines of code. Second, it scales. If you happen to have a gigantic spike in traffic, it deals with it effortlessly. If you have very little traffic, you incur a negligible cost.
more on machine learning in this IMS blog
4 Ways AI Education and Ethics Will Disrupt Society in 2019
In 2018 we witnessed a clash of titans as government and tech companies collided on privacy issues around collecting, culling and using personal data. From GDPR to Facebook scandals, many tech CEOs were defending big data, its use, and how they’re safeguarding the public.
Meanwhile, the public was amazed at technological advances like Boston Dynamics’ Atlas robot doing parkour, while simultaneously being outraged at the thought of our data no longer being ours and Alexa listening in on all our conversations.
1. Companies will face increased pressure about the data AI-embedded services use.
2. Public concern will lead to AI regulations. But we must understand this tech too.
In 2018, the National Science Foundation invested $100 million in AI research, with special support in 2019 for developing principles for safe, robust and trustworthy AI; addressing issues of bias, fairness and transparency of algorithmic intelligence; developing deeper understanding of human-AI interaction and user education; and developing insights about the influences of AI on people and society.
This investment was dwarfed by DARPA, an agency of the Department of Defense, and its multi-year investment of more than $2 billion in new and existing programs under the “AI Next” campaign. A key area of the campaign includes pioneering the next generation of AI algorithms and applications, such as “explainability” and common sense reasoning.
Federally funded initiatives, as well as corporate efforts (such as Google’s “What If” tool), will lead to the rise of explainable AI and interpretable AI, whereby the AI actually explains the logic behind its decision making to humans. But the next step from there would be for the AI regulators and policymakers themselves to learn about how these technologies actually work. This is an overlooked step right now that Richard Danzig, former Secretary of the U.S. Navy, advises us to consider, as we create “humans-in-the-loop” systems, which require people to sign off on important AI decisions.
3. More companies will make AI a strategic initiative in corporate social responsibility.
Google invested $25 million in AI for Good and Microsoft added an AI for Humanitarian Action to its prior commitment. While these are positive steps, the tech industry continues to have a diversity problem.
4. Funding for AI literacy and public education will skyrocket.
Ryan Calo from the University of Washington explains that it matters how we talk about technologies that we don’t fully understand.
Tackling Data in Libraries: Opportunities and Challenges in Serving User Communities
Submit proposals at http://www.iolug.org
Deadline is Friday, March 1, 2019
Submissions are invited for the IOLUG Spring 2019 Conference, to be held May 10th in Indianapolis, IN. Submissions are welcomed from all types of libraries and on topics related to the theme of data in libraries.
Libraries and librarians work with data every day, with a variety of applications – circulation, gate counts, reference questions, and so on. The mass collection of user data has made headlines many times in the past few years. Analytics and privacy have, understandably, become important issues both globally and locally. In addition to being aware of the data ecosystem in which we work, libraries can play a pivotal role in educating user communities about data and all of its implications, both favorable and unfavorable.
The Conference Planning Committee is seeking proposals on topics related to data in libraries, including but not limited to:
- Using tools/resources to find and leverage data to solve problems and expand knowledge,
- Data policies and procedures,
- Harvesting, organizing, and presenting data,
- Data-driven decision making,
- Learning analytics,
- Metadata/linked data,
- Data in collection development,
- Using data to measure outcomes, not just uses,
- Using data to better reach and serve your communities,
- Libraries as data collectors,
- Big data in libraries,
- Social justice/Community Engagement,
- Storytelling, (https://web.stcloudstate.edu/pmiltenoff/lib490/)
- Libraries as positive stewards of user data.