
Hi, I'm Martin (he/him). Welcome to my personal website!

  • 🎓 Associate Researcher/PhD Student in Econometrics @ UDE
  • ❤️ Statistical Programming & Data Visualization
  • 👨🏽‍💻 Data Science, ML & AI

Mostly using this site as a personal repository for ideas and code snippets. I sometimes blog about topics in AI, Machine Learning, Computational Statistics, and Econometrics.
All views are my own!
Feel free to reach out!

Bluesky Intention Link Generator

Intention Links

I thought it would be nice to have a link generator that helps craft Bluesky intention links for pre-populating posts. An intention link is essentially a URL containing predefined content that pre-fills a web app’s form or action. When clicked, it opens the app with the specified content ready to go, similar to mailto: links for email. I generate a URL using Bluesky’s compose intent format https://bsky.app/intent/compose, followed by the encoded text and a URL. The code consists of two simple components: an HTML form for input and a bit of JavaScript to handle the link generation. ...

December 6, 2024 · 2 min · 276 words · Martin C. Arnold
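The link construction described in the excerpt above can be sketched in a few lines. The post itself uses an HTML form and JavaScript; this is a Python stand-in, not the author's code. The function name `bluesky_intent_link` is hypothetical, and the `text` query parameter is my assumption about how the compose intent receives its content.

```python
from urllib.parse import quote

# Compose intent endpoint named in the post.
BASE = "https://bsky.app/intent/compose"


def bluesky_intent_link(text: str, url: str = "") -> str:
    """Build a link that pre-fills a Bluesky post (hypothetical helper).

    Assumption: the post content is passed via a `text` query parameter.
    """
    # Append the optional URL to the post text, separated by a space.
    content = f"{text} {url}".strip()
    # Percent-encode everything, including ':' and '/', so the content
    # survives as a single query value.
    return f"{BASE}?text={quote(content, safe='')}"


link = bluesky_intent_link("Hello Bluesky!", "https://example.com")
print(link)
```

Encoding with `safe=''` also escapes the slashes and colon of the appended URL, which keeps it from being read as part of the link's own path.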

Quarto to Hugo: Preserving Figure Captions

The Problem

When rendering Quarto notebooks (.qmd files) to Hugo-compatible markdown (.md files), figure captions are lost in the process. While Quarto generates proper figure captions in HTML output, for example:

    <img src="fig-distribution-output-1.png" width="464" height="470">
    <figcaption>
    Figure 1: Distribution of Sample Data
    </figcaption>

the markdown output for Hugo only includes the image with an alt text, like:

    <img src="fig-distribution-output-1.png" id="fig-dist" alt="Figure 1: Distribution of Sample Data" />

I tried playing with YAML options, but there does not seem to be an easy fix. So here’s a JavaScript approach that constructs and includes captions for Quarto-generated figures automatically. ...

November 14, 2024 · 2 min · 301 words · Martin C. Arnold

GitHub Action: Scrape and Embed Google Scholar Publications

Intro

Maintaining an updated list of academic publications on a website can be tedious, especially if you’re using Google Scholar as your primary source. To streamline this process, we can build an automated workflow using GitHub Actions. In this post, I’ll show you how to create a system that fetches publication details from Google Scholar, formats them, and integrates them into a Hugo-based website. ...

November 3, 2024 · 9 min · 1774 words · Martin Christopher Arnold

Machine Learning and SHAP values: Random Forest Example

Credit Risk Assessment with Machine Learning

To demonstrate how SHAP values aid in interpreting machine learning predictions, we examine credit risk assessment using a synthetic dataset. Our simulated credit applications include standard features like income, age, employment status, and SCHUFA scores. We train a random forest model to predict loan repayment success and use SHAP values to explain its predictions.

Data Generation

We first import some standard libraries. ...

September 2, 2024 · 8 min · 1608 words · Martin C. Arnold

Shapley values in Machine Learning

Shapley Values: From Game Theory to Model Interpretation

In today’s world of machine learning, model interpretability is becoming increasingly important. One particularly elegant tool for this is Shapley Values, a concept that originated in game theory and is now revolutionizing how we understand machine learning models.

The Game Theory Roots

Imagine a group of students working on a term paper:

  • Jens gathers research data and literature
  • Karo performs data analysis and creates charts
  • Thilo writes and edits the paper

The paper’s success depends on everyone: Jens’ research lays the foundation, Karo’s analysis adds depth, and Thilo’s writing ties it all together. But how do you fairly divide credit for the final grade? Each student’s contribution is relevant, but their impact depends on the collaboration. ...

August 28, 2024 · 4 min · 677 words · Martin C. Arnold
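To make the term-paper question above concrete, here is a minimal sketch of an exact Shapley computation: each student's share is their marginal contribution averaged over every order in which the team could be assembled. The coalition scores in `value` are made up purely for illustration and do not come from the post; the function name `shapley_values` is likewise hypothetical.

```python
from itertools import permutations

players = ("Jens", "Karo", "Thilo")

# Hypothetical grade points each coalition would reach on its own.
# These numbers are invented for illustration only.
value = {
    frozenset(): 0,
    frozenset({"Jens"}): 20,
    frozenset({"Karo"}): 10,
    frozenset({"Thilo"}): 30,
    frozenset({"Jens", "Karo"}): 50,
    frozenset({"Jens", "Thilo"}): 60,
    frozenset({"Karo", "Thilo"}): 50,
    frozenset(players): 100,
}


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    orderings = list(permutations(players))
    totals = {p: 0.0 for p in players}
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value[with_p] - value[coalition]
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


phi = shapley_values(players, value)
print(phi)
```

With these invented scores the credit splits as 35 points for Jens, 25 for Karo, and 40 for Thilo, which sums to the full-team score of 100. Enumerating all orderings grows factorially with the number of players, so this only works for small examples.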

Note: Docker Compose setup for TYPO3 Development

A Quick Reference

Docker Compose makes it simple to orchestrate multi-container development environments. Below is a streamlined setup I used for a TYPO3 project. This post serves as a personal reference for future projects.

The Setup

Put this in a .yaml file:

    services:
      web:
        image: webdevops/php-apache:8.1
        container_name: typo3-web
        ports:
          - "8080:80"
        volumes:
          - .:/app
        working_dir: /app
        environment:
          - PHP_DISPLAY_ERRORS=1
          - PHP_MEMORY_LIMIT=512M
      db:
        image: mariadb:10.5
        container_name: typo3-db
        environment:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: typo3_db
        ports:
          - "3306:3306"
        volumes:
          - db_data:/var/lib/mysql
    volumes:
      db_data:

Web Service: PHP + Apache

  • Image: webdevops/php-apache:8.1, a PHP 8.1 image with Apache preconfigured (update to a release compatible with your TYPO3 installation).
  • Ports: Maps container port 80 to host port 8080 for access at http://localhost:8080.
  • Volumes: Mounts the current directory to /app for real-time file changes.
  • Environment: PHP_DISPLAY_ERRORS=1 for debugging. PHP_MEMORY_LIMIT=512M for performance.

DB Service: MariaDB Database

  • Image: mariadb:10.5, a stable MariaDB version at the time (update to the current release).
  • Environment: Sets the root password and initializes the typo3_db database.
  • Ports: Maps MariaDB port 3306 to the host for database access.
  • Volumes: Persists data using the db_data volume.

Persistent Volume

  • db_data Volume: Stores database files persistently across container restarts.

Running the Setup

Run the following command to start the environment: ...

July 13, 2024 · 2 min · 236 words · Martin C. Arnold

LLM Student Advisor: RAG-powered Chatbot for MSc Econometrics

(3.12.7)

Introduction

In this post, I’ll document how I built Metrica, an AI-powered student advisor chatbot for the MSc Econometrics program. The chatbot uses Retrieval-Augmented Generation (RAG) to provide accurate information by combining the power of a large language model (LLM) with reliable source material such as FAQs and terms of study. Metrica is designed to assist students with common queries about the program and provide guidance based on official documentation. ...

July 3, 2024 · 11 min · 2314 words · Martin Christopher Arnold

Note: Jupyter pyenv Setup for Quarto in VSCode

Ensure you have pyenv installed on your system. On a Unix-based system you can install it using the following command:

    curl https://pyenv.run | bash

Now use pyenv to install the specific Python version needed for the Quarto project:

    pyenv install <the-python-version>

Navigate to your Quarto project directory and set a local Python version for the project:

    cd the/path/to/my/quarto-project
    pyenv local <the-python-version>

The last line creates a .python-version file in your project directory, specifying the Python version for this project. ...

June 25, 2024 · 2 min · 306 words · Martin C. Arnold

Interactive digit recognition in tensorflow.js

(pyenv setup with Python 3.9.20)

The story so far

In my previous post about Convolutional Neural Networks (CNNs), I demonstrated how to train a model for handwritten digit recognition. Now, we’ll take this model further by deploying it in a web application using TensorFlow.js. This post guides you through the process of exporting the trained model and implementing it in an Observable notebook where users can draw digits on a canvas for real-time classification. ...

June 18, 2024 · 4 min · 848 words · Martin Christopher Arnold

CNN for digit recognition

(pyenv setup with Python 3.9.20)

Convolutional Neural Networks (CNNs) are powerful tools in image recognition tasks. A classic problem solved by CNNs is predicting handwritten digits from the MNIST dataset. Here’s how to train a CNN for handwritten digit classification using TensorFlow’s Keras API in Python.

Dataset and Preprocessing

The MNIST dataset (available in tensorflow.keras.datasets) consists of 70,000 grayscale images, each representing a digit (0–9). We first load the TensorFlow library as tf, preprocess the data by reshaping the images into a 28x28 pixel format with 1 channel (grayscale), and normalize pixel values to the range $[0, 1]$. The labels are one-hot encoded for multi-class classification. ...

June 17, 2024 · 5 min · 920 words · Martin Christopher Arnold