More info on the repository README: https://github.com/derlin/dev.to-is-for-web-devs-and-beginners
I have been on dev.to for a while now, and looking at the featured articles, I keep wondering: is this platform only for web developers and beginners? I mean, most of the articles that trigger comments and reactions are about javascript, react, css or beginners, newbie, etc.
I have written a few articles that I consider not that bad (not that great either), but they don't seem to take off. Might it be because I am not writing about things the community cares about?
Instead of staying in the dark, I tried to better understand what works and doesn't work on dev.to. Here are the results of my peregrinations.
dev.to is based on forem, which has an API available: https://developers.forem.com/api/v0. Note though that some of the information is incorrect, or at least doesn't work as expected when tried against https://dev.to/api.
The data that I wanted were about tags and articles. More precisely:
To get the top tags, I can use the endpoint https://dev.to/api/tags (described in the forem docs as returning tags ordered by popularity), along with a per_page parameter:
https://dev.to/api/tags?per_page=100&page=0
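As a minimal sketch of that call, the snippet below builds the paged URL and fetches one page with the standard library only. The helper names `tags_url` and `fetch_tags` are made up for illustration, and the sketch assumes the endpoint answers with a JSON list; the server may also expect extra headers (a User-Agent, for instance), which is not handled here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_TAGS_URL = "https://dev.to/api/tags"


def tags_url(page=0, per_page=100):
    """Build the paged URL for the tags endpoint."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{API_TAGS_URL}?{query}"


def fetch_tags(page=0, per_page=100):
    """Fetch one page of tags (network call; assumes a JSON list comes back)."""
    with urlopen(tags_url(page, per_page)) as response:
        return json.load(response)


# Only the URL is built here, so no request is sent.
print(tags_url())
```

Calling `fetch_tags()` performs the actual request.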
I can also fetch the https://dev.to/tags page and extract the tag-card elements.
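Extracting the cards could look roughly like the sketch below, which uses the standard `html.parser` module. Only the `tag-card` class name comes from the page; the assumption that each card contains a link of the form `/t/<name>` is mine, and the sample HTML is invented for the demonstration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagCardParser(HTMLParser):
    """Collect tag names from /t/<name> links found inside tag-card elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside the current card (0 = outside)
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
        elif "tag-card" in (attrs.get("class") or "").split():
            self.depth = 1
        href = attrs.get("href") or ""
        if self.depth and tag == "a" and href.startswith("/t/"):
            name = href[len("/t/"):].strip("/")
            if name and name not in self.names:
                self.names.append(name)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth:
            self.depth -= 1


def extract_tag_names(html):
    parser = TagCardParser()
    parser.feed(html)
    return parser.names


# Invented sample markup, only to exercise the parser.
sample = (
    '<div class="tag-card crayons-card"><h3><a href="/t/python">#python</a></h3>'
    '<img src="x.png"><p>desc</p></div>'
    '<a href="/t/outside">not in a card</a>'
    '<div class="tag-card"><a href="/t/archlinux">#archlinux</a></div>'
)
print(extract_tag_names(sample))
```

The depth counter is what keeps links outside a card from being picked up.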
The problem is that the two sources don't give the same results. Using the API to fetch the first 100 tags, I got 4 tags that now return 404 Not Found: macosapps, southafricanews, sportnews, latestnigerianewslat. This led me to use the https://dev.to/tags page instead.
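For the number of articles published with a specific tag, there are two numbers: (1) the count shown on the https://dev.to/tags page, and (2) the count shown on the tag's own page, https://dev.to/t/<TAG>.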
The problem is that they don't match at all, (1) often being far higher than (2). For example, at the time of writing, the archlinux tag shows "34635 posts published" on the tags page, but "151 Posts Published" on the https://dev.to/t/archlinux page. To settle this, I scrolled the tag page until no more articles were fetched, and counted 181 articles.
This led me to decide to use the individual tag pages (2) for results.
See the difference notebook for more info.
To get the top articles, I had to look into the developer console to figure out how dev.to did it. Here is the query url (cleaned a bit to remove unused query parameters):
https://dev.to/search/feed_content?class_name=Article&sort_by=public_reactions_count&sort_direction=desc
This query returns the top articles of all time. To get the top articles by tag, I can add the query parameter tag_names[]=tag1, tag2, ... (note the space after the comma). The actual dev.to site also passes a tag parameter, but it doesn't seem to be used.
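To make the query format concrete, here is a small sketch that assembles the search URL with the standard library. The helper name `search_url` is hypothetical. Note that `urlencode` percent-encodes the brackets, the comma and the space, which should be equivalent once the server decodes the query string, though I have not verified this against dev.to.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://dev.to/search/feed_content"


def search_url(tags=()):
    """Build the top-articles query, optionally restricted to some tags."""
    params = [
        ("class_name", "Article"),
        ("sort_by", "public_reactions_count"),
        ("sort_direction", "desc"),
    ]
    if tags:
        # Tags are joined with a comma followed by a space, as noted above.
        params.append(("tag_names[]", ", ".join(tags)))
    return f"{SEARCH_URL}?{urlencode(params)}"


print(search_url())
print(search_url(["python", "linux"]))
```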
The search endpoint has a limit on the number of results returned: max 100. To get more, I need to use the page and per_page query parameters to loop through the paged results.
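The paging loop can be sketched as below. This is not the code from fetch/devto.py: `fetch_all` and `fake_fetch` are names invented here, pages are assumed to start at 0, and the real request is abstracted behind a callable so the loop can be tried without touching the network.

```python
def fetch_all(fetch_page, per_page=100, max_pages=50):
    """Collect results page by page until a page comes back short or empty.

    `fetch_page(page=..., per_page=...)` must return a list of results.
    `max_pages` is a safety cap so the loop always terminates.
    """
    results = []
    for page in range(max_pages):
        batch = fetch_page(page=page, per_page=per_page)
        results.extend(batch)
        if len(batch) < per_page:
            break  # last page reached
    return results


# Stand-in for the real endpoint: 250 fake results served in slices.
FAKE_RESULTS = list(range(250))


def fake_fetch(page, per_page):
    start = page * per_page
    return FAKE_RESULTS[start:start + per_page]


print(len(fetch_all(fake_fetch)))
```

Stopping on a short page saves one request in most cases; when the total is an exact multiple of `per_page`, the loop ends on the next, empty page.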
That is the theory. In practice, I put together a small Python script to do my bidding, see fetch/devto.py.
import json
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import itertools
pio.templates.default = 'plotly_white'
pio.renderers.default = 'notebook'
First, let's load our data into a dataframe. The top_articles_by_tag.json file has the following structure:
[
  {
    "tag": {
      "name": "...",
      "num_articles": N // (1) count from the https://dev.to/tags page
    },
    "top_articles": [
      {
        "id": 468082,
        "title": "...",
        "comments_count": 112,
        "public_reactions_count": 10893,
        "reading_time": 5,
        "tag_list": [...],
        // ... and much more
      },
      // ... more articles
    ],
    "total": N // (2) count from the https://dev.to/t/<TAG> page
  }
]
with open('../top_articles_by_tag.json') as f:
    tags_data = json.load(f)

def sum_article_prop(entry, prop):
    # Sum one numeric property over the top articles of a tag entry
    return sum(article[prop] for article in entry['top_articles'])

# One row per tag: name, article count (2), summed reactions and comments
tags = pd.DataFrame([
    [
        entry['tag']['name'],
        entry['total'],
        sum_article_prop(entry, 'public_reactions_count'),
        sum_article_prop(entry, 'comments_count')
    ] for entry in tags_data
],
columns=['tag', 'count', 'reactions', 'comments'])
tags = tags.sort_values('count', ascending=False).reset_index()
Let's display the total number of articles per tag (count), and the number of comments and reactions on the first 100 articles for each.
IMPORTANT: remember that one article can have up to 4 tags, so a very liked article may boost the scores of multiple tags!
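To illustrate this double counting, the sketch below compares a naive per-tag sum with a sum that counts each article id only once. The two helper names and the toy data are made up; the entries only mimic the JSON structure described earlier.

```python
def naive_total(tags_data, prop):
    """Sum `prop` over every tag's top articles (shared articles count twice)."""
    return sum(
        article[prop]
        for entry in tags_data
        for article in entry["top_articles"]
    )


def unique_total(tags_data, prop):
    """Sum `prop` counting each article id only once."""
    by_id = {}
    for entry in tags_data:
        for article in entry["top_articles"]:
            by_id[article["id"]] = article[prop]
    return sum(by_id.values())


# Toy data: article 1 carries two tags, so it shows up under both.
shared = {"id": 1, "public_reactions_count": 10, "tag_list": ["python", "linux"]}
single = {"id": 2, "public_reactions_count": 5, "tag_list": ["python"]}
toy_data = [
    {"tag": {"name": "python"}, "top_articles": [shared, single]},
    {"tag": {"name": "linux"}, "top_articles": [shared]},
]

print(naive_total(toy_data, "public_reactions_count"))   # 25
print(unique_total(toy_data, "public_reactions_count"))  # 15
```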
fig = go.Figure()
for column in ['count', 'reactions', 'comments']:
    # Print the grand total, then plot one line per metric
    print(f'Total {column}: {tags[column].sum():,}')
    fig.add_trace(go.Scatter(x=tags.tag, y=tags[column], mode='lines+markers', name=column))
fig.update_layout(
    title='Top tags',
    xaxis=dict(title='tag', tickmode='linear'),
    margin=dict(l=0, r=0, t=30, b=0),
    legend=dict(orientation='h', yanchor='auto', y=1.0, xanchor='auto', x=.5)
)
fig.show()
fig.write_html("plot_tags.html")
Total count: 621,345
Total reactions: 4,211,041
Total comments: 211,510