Inspecting dev.to tags and popular articles¶

More info on the repository README: https://github.com/derlin/dev.to-is-for-web-devs-and-beginners

⮕⮕⮕ READ THE ARTICLE ON DEV.TO: dev.to is for webdevs and beginners - I have data to prove it¶

Foreword¶

I have been on dev.to for a while now, and looking at the featured articles, I keep wondering: is this platform only for web developers and beginners? I mean, most of the articles that trigger comments and reactions are about javascript, react, css or beginners, newbie, etc.

I have a few articles that I consider not that bad (not that great either), but they don't seem to take off. Might it be because I am not writing about stuff that the community cares about?

Instead of staying in the dark, I tried to better understand what works and doesn't work on dev.to. Here are the results of my peregrinations.

Getting the data¶

dev.to is based on forem, which has an API available: https://developers.forem.com/api/v0. Note though that some of the information is incorrect, or at least doesn't work as expected when tried against https://dev.to/api.

The data that I wanted were about tags and articles. More precisely:

  • metadata about the most liked articles of all time,
  • metadata about the most active tags:
    • the total number of articles for each,
    • the top articles for each.

Top tags¶

To get the top tags, I can use the endpoint https://dev.to/api/tags (described in the forem docs as returning tags ordered by popularity), along with a per_page parameter:

https://dev.to/api/tags?per_page=100&page=0
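As a minimal sketch of the request above, the URL can be built for any page, and the tag names pulled out of the decoded response. The helper names `tags_url` and `tag_names` are hypothetical, and the sketch assumes the endpoint returns a JSON list of objects that each carry a `name` key; the actual download is left out on purpose.

```python
from urllib.parse import urlencode

API_TAGS_URL = "https://dev.to/api/tags"


def tags_url(page=0, per_page=100):
    """Build the tags endpoint URL for one page of results."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{API_TAGS_URL}?{query}"


def tag_names(payload):
    """Extract tag names from the decoded JSON payload.

    Assumption: the payload is a list of dicts with a 'name' key.
    """
    return [tag["name"] for tag in payload]


print(tags_url())  # https://dev.to/api/tags?per_page=100&page=0
```

Keeping the URL building separate from the download makes it easy to check the query string before hitting the API.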

I can also fetch the https://dev.to/tags page, and extract the tag-card elements.

The problem is, they do not give the same results... Using the API to fetch the first 100 tags, I got 4 tags that are now 404 Not Found: macosapps, southafricanews, sportnews, latestnigerianewslat. This led me to use the https://dev.to/tags page instead.

Number of articles per tag¶

For the number of articles published with a specific tag, there are two numbers:

  1. the one shown on the https://dev.to/tags page, and
  2. the one shown on the individual tag page, i.e. https://dev.to/t/<TAG>

The problem is, they don't match at all! (1) is often way higher than (2). For example, at the time of writing, the archlinux tag shows "34635 posts published" on the tags page, but "151 Posts Published" on the https://dev.to/t/archlinux page... To settle this, I scrolled until no new articles were fetched, and got 181 articles.

This led me to decide to use the individual tag pages (2) for results.

See the difference notebook for more info.

Top articles¶

To get the top articles, I had to look into the developer console to figure out how dev.to did it. Here is the query url (cleaned a bit to remove unused query parameters):

https://dev.to/search/feed_content?class_name=Article&sort_by=public_reactions_count&sort_direction=desc

This query returns the top articles of all time. To get the top articles by tag, I can add the query parameter tag_names[]=tag1, tag2, ... (note the space after the comma). The actual dev.to site also passes a tag parameter, but it doesn't seem to be used.

The search endpoint has a limit on the number of results returned: max 100. To get more, I need to use the page and per_page query parameters to loop through the paged results.
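The paging loop described above could look roughly like the sketch below. It is not the code from fetch/devto.py: `search_url`, `fetch_all` and the `fetch_page` callable are hypothetical names, `fetch_page` is assumed to take a URL and return a list of article dicts, and page numbering is assumed to start at 0.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://dev.to/search/feed_content"


def search_url(page, per_page=100, tags=None):
    """Build the search URL for one page of top articles."""
    params = [
        ("class_name", "Article"),
        ("sort_by", "public_reactions_count"),
        ("sort_direction", "desc"),
        ("page", page),
        ("per_page", per_page),
    ]
    if tags:
        # Tags are joined with ", " (comma followed by a space)
        params.append(("tag_names[]", ", ".join(tags)))
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_all(fetch_page, max_articles, per_page=100, tags=None):
    """Collect up to max_articles by looping through the paged results.

    fetch_page is a hypothetical callable: URL in, list of articles out.
    """
    articles = []
    page = 0  # assumption: the first page is page 0
    while len(articles) < max_articles:
        batch = fetch_page(search_url(page, per_page, tags))
        if not batch:  # an empty page means there is nothing left
            break
        articles.extend(batch)
        page += 1
    return articles[:max_articles]
```

Passing the download function as a parameter keeps the loop independent of any HTTP library, so it can be tried with a fake `fetch_page` before making real requests.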

Code¶

That is the theory. In practice, I put together a small Python script to do my bidding, see fetch/devto.py.

Imports¶

In [1]:
import json

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

import itertools

pio.templates.default = 'plotly_white'
pio.renderers.default = 'notebook'

Tags¶

First, let's load our data into a dataframe. The top_articles_by_tag.json has the following structure:

[
 {
  "tag": {
    "name": "...",
    "num_articles": N  // (1) count from the https://dev.to/tags page
  },
  "top_articles": [
    {
      "id": 468082,
      "title": "...",
      "comments_count": 112,
      "public_reactions_count": 10893,
      "reading_time": 5,
      "tag_list": [...],
      // ... and much more
    }
  ],
  "total": N          // (2) count from the https://dev.to/t/<TAG> page
 }
]
In [2]:
with open('../top_articles_by_tag.json') as f:
    tags_data = json.load(f)

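def sum_article_prop(entry, prop):
    # Sum a numeric property over the top articles of one tag entry
    return sum(article[prop] for article in entry['top_articles'])

# One row per tag: name, total number of articles (2), and the reactions
# and comments summed over its top articles
tags = pd.DataFrame([
        [
            entry['tag']['name'],
            entry['total'],
            sum_article_prop(entry, 'public_reactions_count'),
            sum_article_prop(entry, 'comments_count')
        ] for entry in tags_data
    ],
    columns=['tag', 'count', 'reactions', 'comments'])

# Tags with the most articles first
tags = tags.sort_values('count', ascending=False).reset_index()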

Let's display the total number of articles per tag (count), and the number of comments and reactions on the first 100 articles for each.

IMPORTANT: remember that one article can have up to 4 tags, so a very popular article may boost the scores of multiple tags!
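To make this caveat concrete, here is a small sketch with made-up articles (hypothetical ids and numbers, not dev.to data). It sums reactions per tag the same way the per-tag totals are built, and shows that the sum over tags exceeds the real number of reactions as soon as an article has several tags.

```python
from collections import Counter

# Hypothetical toy data, only here to illustrate the double counting
articles = [
    {"id": 1, "public_reactions_count": 100,
     "tag_list": ["javascript", "react", "beginners"]},
    {"id": 2, "public_reactions_count": 10, "tag_list": ["rust"]},
]

# Credit each article's reactions to every tag it carries
reactions_per_tag = Counter()
for article in articles:
    for tag in article["tag_list"]:
        reactions_per_tag[tag] += article["public_reactions_count"]

summed_over_tags = sum(reactions_per_tag.values())
actual_total = sum(a["public_reactions_count"] for a in articles)

print(summed_over_tags)  # 310: article 1 is counted three times
print(actual_total)      # 110
```

The per-tag totals are therefore best read as relative scores between tags, not as numbers that add up to a site-wide total.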

In [3]:
fig = go.Figure()

for column in ['count', 'reactions', 'comments']:
    print(f'Total {column}: {tags[column].sum():,}')
    fig.add_trace(go.Scatter(x=tags.tag, y=tags[column], mode='lines+markers', name=column))

fig.update_layout(
    title='Top tags', 
    xaxis=dict(title='tag', tickmode='linear'),
    margin=dict(l=0, r=0, t=30, b=0),
    legend=dict(orientation='h', yanchor='auto', y=1.0, xanchor='auto', x=.5)
)
fig.show()
fig.write_html("plot_tags.html")
Total count: 621,345
Total reactions: 4,211,041
Total comments: 211,510