Is it machine learning? Ars delves into the field of artificial intelligence

In the first part of a new series, we look at fitting the problem to the tool. Every day, some little bit of logic built by AI technologies makes decisions that affect how you experience the world. It could be the ads served to you on social media or shopping sites, the facial recognition that unlocks your phone, or the directions you follow to get wherever you're going. These quiet and mostly invisible decisions are made by algorithms created through machine learning (ML), the branch of artificial intelligence in which systems are trained to identify the relationship between sets of data and their outcomes. We've heard for years in movies and on TV that computers control the world, but we're finally gettin

During my time at Ars, I've written quite a bit about artificial intelligence and machine learning. I've spoken with data scientists who have been building predictive analytics systems that work on terabytes of data from complex systems, and with developers trying to build systems that can defend networks against attacks, at least in specific situations. I've also dabbled at the edges of this technology myself, using code and hardware to connect various things to AI programming interfaces (sometimes with frightening results, as Bearlexa demonstrated).

Many of the problems ML can be applied to are tasks whose answers are obvious to humans. That's because we're trained to notice these things by eye: which cat is fluffier, or what time of day traffic is heaviest. Other ML-appropriate problems could also be solved by humans given enough raw data, if humans had perfect memory, perfect eyesight, and an innate grasp of statistical modeling.

But machines can perform these tasks much faster because they don't have human limitations. And ML allows them to do these things without humans having to program the specific math involved. Instead, an ML system can learn (or at least "learn") from the data it is given and create a problem-solving model itself.

This bootstrapping strength can also be a weakness. Understanding how an ML system arrived at its decision-making process is usually impossible once the ML algorithm has been built (although there is ongoing work on creating explainable ML). The quality of the results depends heavily on the quality and quantity of the data. Machine learning can only answer questions that are discernible from the data itself. Bad data or insufficient data leads to inaccurate models and bad machine learning.

Despite my previous adventures, I've never built a machine learning system. I'm a jack of all tech trades, and although I'm good at basic data analytics and running all kinds of database queries, I don't consider myself a data scientist or an ML programmer. My previous Python adventures have been more about hacking interfaces than anything else. And most of my coding and analytics skills have lately been turned toward using ML tools for very specific purposes in information security research.

My only true superpower is not being afraid to try and fail. And with that, readers, I'm here to flex that superpower.

Current Task

This is a task that some Ars writers are exceptionally good at: writing a good headline. (Beth Mole, please report to collect your prize.)

Writing a headline is tough! It's a task with several constraints. Length is the biggest (Ars headlines are limited to 70 characters), but it is far from the only one. Cramming enough information into a small space to accurately and adequately tease a story is a challenge, given everything a single headline needs to contain (the traditional "who, what, where, when, why, and how many" facts). Some of those elements are variable: a "who" or a "what" with a particularly long name eating into the character count can really mess things up.

In addition, we know from experience that Ars readers do not like clickbait and will fill the comments section with sarcasm when they think they're seeing it. We also know that there are some things people click on without fail. And we know that no matter what the topic is, some headlines lead to more people clicking on them than others. (Is this clickbait? There is a philosophical argument to be had, but the main thing that separates "a headline everyone wants to click on" from "clickbait" is the honesty of the headline: does the story beneath it deliver exactly what the headline promises?)

Regardless, we know that some headlines are more effective than others because we do A/B testing of headlines. Each Ars article is given two possible headlines, and the site briefly presents both options on the homepage to see which one draws the most views.

Several studies have been done by data scientists with far more experience in data modeling and machine learning, looking at what distinguishes "clickbait" headlines (headlines designed strictly to get a large number of people to click on them) from "good" headlines (headlines that actually summarize the articles behind them effectively and don't make you write long complaints about the headline on Twitter or in the comments). But these studies focus on understanding the content of the headlines rather than on how many clicks they actually get.


To get an idea of what readers like about a headline, and to try to understand how to write better headlines for the Ars audience, I took a set of 500 of the fastest-clicked Ars headlines from the past five years and ran some natural language processing on them. After stripping out the "stopwords" (common English words that usually have nothing to do with the topic of the headline), I generated a word cloud to see which topics get the most attention.
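The stopword step described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the article: the stopword list is a tiny hand-picked assumption, the sample headlines are made up, and `tokenize` and `word_counts` are hypothetical helper names.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption); a real analysis
# would use a much fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for",
             "and", "is", "it", "with", "that"}

def tokenize(headline):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z']+", headline.lower())

def word_counts(headlines, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across all headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(w for w in tokenize(headline) if w not in stopwords)
    return counts

# Made-up headlines, purely for illustration.
sample = [
    "The rocket that launched a million memes",
    "A million reasons to patch the router",
]
counts = word_counts(sample)
top_words = counts.most_common(3)  # the words a word cloud would draw largest
```

A word cloud is essentially a picture of these counts, with the most frequent words drawn largest.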

Here's what Ars headlines look like.

The most popular words to have appeared in Ars headlines over the past five years.

There's a lot of Trump in there. Tech news has involved a lot of government over the past few years, so that's probably inevitable. But these are just the words from some of the winning headlines. I wanted to understand the difference between winning and losing headlines. So I took the corpus of all the Ars headline pairs again and split it between winners and losers. These are the winners:

These words come from headlines that won the A/B test...

And here are the losers:

...and these words come from the losing headlines.

Keep in mind that these headlines were written for exactly the same stories as the winning headlines. And for the most part, they use the same words, with some noticeable differences. Trump shows up much less in the losing headlines. "Million" is very popular in winning headlines but somewhat less so in losing ones. And the word "possible", a fairly noncommittal choice, is found in losing headlines more than in winning ones.
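One rough way to put numbers on differences like these is to compare each word's share of the winning headlines with its share of the losing ones. The sketch below is an assumption about how such a comparison could be done, not the method used for the word clouds; the headlines are invented and `relative_freq` and `frequency_gap` are hypothetical names.

```python
from collections import Counter

def relative_freq(headlines):
    """Return each word's share of all word occurrences in the headlines."""
    counts = Counter(w for h in headlines for w in h.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def frequency_gap(winners, losers):
    """Map each word to (share in winners) minus (share in losers)."""
    win, lose = relative_freq(winners), relative_freq(losers)
    return {w: win.get(w, 0.0) - lose.get(w, 0.0)
            for w in set(win) | set(lose)}

# Invented headline pairs, purely for illustration.
winners = ["million users hit by breach", "million dollar rocket explodes"]
losers = ["possible breach hits users", "rocket explodes in possible failure"]

gap = frequency_gap(winners, losers)
# Positive values lean toward winners, negative values toward losers.
most_winning_word = max(gap, key=gap.get)
most_losing_word = min(gap, key=gap.get)
```

Words with a gap near zero appear about equally often on both sides, which is what you would expect for most of the vocabulary when both headlines describe the same story.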

This information is interesting, but by itself it does not help predict whether a headline for any given story will be successful. Is it possible to use ML to predict whether a headline will get more or fewer clicks? Can we use Ars' accumulated wisdom to create a black box that can predict the most successful headlines?

Hell if I know, but we're going to try.

All of this brings us to where we are now: Ars has given me data on more than 5,500 headline tests from the past four years, which means 11,000 headlines, each with a click-through rate (CTR). My job is to build a machine learning model that can work out what makes a good Ars headline. And by "good," I mean one that appeals to you, dear Ars reader. To make this happen, I've been given a small budget for Amazon Web Services compute resources and a month of nights and weekends (after all, I have a day job). No problem, right?
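Before any model can be trained, data shaped like this has to be turned into labeled examples. The sketch below shows one plausible way to do that, assuming each test record holds two headlines and their CTRs. The `HeadlineTest` structure, its field names, and the sample values are all hypothetical, not the actual Ars data format.

```python
from dataclasses import dataclass

@dataclass
class HeadlineTest:
    """One A/B test: two candidate headlines and their click-through rates."""
    headline_a: str
    ctr_a: float
    headline_b: str
    ctr_b: float

def label_examples(tests):
    """Turn each test into two (headline, label) rows, where 1 marks the winner.

    Exact ties are skipped because neither headline clearly won.
    """
    rows = []
    for t in tests:
        if t.ctr_a == t.ctr_b:
            continue
        a_wins = t.ctr_a > t.ctr_b
        rows.append((t.headline_a, int(a_wins)))
        rows.append((t.headline_b, int(not a_wins)))
    return rows

# Invented records, purely for illustration.
tests = [
    HeadlineTest("Million users hit by breach", 0.042,
                 "Possible breach hits users", 0.031),
    HeadlineTest("Rocket test delayed", 0.020,
                 "Rocket test slips again", 0.020),
]
examples = label_examples(tests)
```

Each test yields two rows, which is how 5,500 tests become 11,000 labeled headlines. Framing the problem as winner versus loser is only one choice; predicting the CTR directly would be another.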

Before I started digging through Stack Exchange and various Git sites for magical solutions, however, I wanted to ground myself in what's possible with ML and look at what people more talented than I am have already done with it. This research is as much a source of inspiration as it is a roadmap for potential solutions.
