Why I Started a Niche Job Board
https://janbussieck.github.io/why-i-started-a-niche-job-board/

Two years into running a newsletter on deep learning I started to receive more and more questions from readers about how to break into the industry.

Job posts on big sites commonly consisted of a long list of unrealistic requirements: on top of a PhD in the field, candidates were supposed to have years of experience with tools that had barely existed that long. This ludicrous practice among tech companies is notoriously frustrating for job seekers, especially for people wanting to enter a new field.

What is more, the spammy and impersonal nature of large job boards turns a job search into veritable Sisyphean work: applications disappear into the ether, and hiring companies are faced with hundreds of applications submitted in a similarly low-effort, spammy way. When you finally make your way to an interview, you are faced with a seemingly interminable array of hoops to jump through. It is not uncommon that for a position requiring a PhD in computer science you will still be required to demonstrate basic programming skills. Besides being offensive to quality candidates, this way of gathering applications leads to a large number of lemons that employers have to weed out.

Anyone who has spent any amount of time on LinkedIn can attest that something seems fundamentally broken about the hiring process, especially in tech.

I decided to take a stab at this disintermediation problem by starting a niche job board for deep learning jobs. The first step consisted of interviewing companies and asking them the questions that my readers asked me:

"How can I demonstrate my ability when I do not have a graduate degree"

"What are some types of projects you would like to see on a resume"

"What does a day in the life of a developer at your company look like?"

The following step would be to survey candidates and let them describe not only their qualifications, but also what they value in a job besides an income: do they want to work remotely, do they have kids and prefer a family-friendly company with childcare-compatible work hours over a startup that expects a lot of overtime?

In addition to that I will provide blog content around deep learning career questions, such as salary expectations, interview preparation and project ideas for specific roles.

This builds trust with an audience of job seekers, creates real relationships with companies and offers them an opportunity to go beyond the laundry list of qualification requirements and generic marketing speak in presenting their company and culture.

I expect that many more sites will be created in the coming years that own a niche, build trust with an audience of job seekers and serve that audience better than the big job aggregators.

This unbundling of the hiring space creates ample opportunity for indie hackers to start job boards in niches that are too small for the big players to touch.

I am trying to go that route with deep learning jobs and will document the journey and its success or failure as I go along.

String<Min, Max> - Defining Types with Complex Properties
https://janbussieck.github.io/typescript-types-with-complex-properties/

An interesting question came up during the TypeScript introduction in one of my recent workshops: is it possible to define types that enforce constraints on their underlying primitives? What if, for instance, I want to define a string type that constrains strings to be within a certain length range, or match a certain format, or have a valid ISBN check sum?

It turns out we can make clever use of phantom types and type guards to create a generator function that returns types which enforce arbitrary constraints.

But let's back up a little. To define a type for strings that have a minimum and maximum length, for example, we first need to ensure that we can't assign arbitrary strings to it directly. TypeScript's never type does just that: it indicates a value that never occurs, for example the return value of a function that contains an event loop and never returns for the duration of the program:

function eventLoop(): never {
    while(true) {
        // process events
    }
}
let loop = eventLoop();
loop = undefined // Error: Type 'undefined' is not assignable to type 'never'.

By contrast, returning void here instead of never would allow undefined (or null, with strictNullChecks disabled) to be assigned to loop.

We can use this special type to define a type StringOfLength<Min, Max> as an intersection of string and an object type whose __value__ property is of type never. This allows us to treat a value of this type as a string while preventing direct assignment to it:

type StringOfLength<Min, Max> = string & {
	__value__: never
}
const hello: StringOfLength<0, 8> = 'hello' // Type '"hello"' is not assignable to type { __value__: never }

It does not actually matter what name we give to __value__ since it is not directly assignable anyway. What good is a type if I can't use it, you might wonder. It turns out that while we can't directly assign a value to a never type, we can still cast a value as never, and the most common way to do so in TypeScript is as the return value of a type guard function. A type guard is an expression that performs a runtime check to ensure that a value is of a certain type. A type guard function takes some value at runtime, checks whether it meets a certain condition and returns a type predicate, which takes the form parameterName is Type, parameterName being the name of a parameter from the current function signature.

A type guard function for StringOfLength<Min, Max> would consequently check whether a string is within a certain range and return the type predicate str is StringOfLength<Min, Max>:

function isStringOfLength<Min extends number, Max extends number>(
  str: string,
  min: Min,
  max: Max
): str is StringOfLength<Min, Max> {
  return str.length >= min && str.length <= max;
}

Any time isStringOfLength is called with some string and returns true, TypeScript will narrow that variable to StringOfLength<Min, Max> in the corresponding branch.
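
For example (a minimal usage sketch; greet is just an illustrative function, not part of the original code):

function greet(input: string) {
  if (isStringOfLength(input, 1, 8)) {
    // in this branch input has been narrowed to StringOfLength<1, 8>
    const short: StringOfLength<1, 8> = input; // OK
    console.log(`hello, ${short}`);
  } else {
    // out here input is still just a plain string
    console.log('name is empty or too long');
  }
}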

With this type guard function in hand, we can go ahead and define a simple constructor function for our type:

export function stringOfLength<Min extends number, Max extends number>(
  input: unknown,
  min: Min,
  max: Max
): StringOfLength<Min, Max> {
  if (typeof input !== "string") {
    throw new Error("invalid input");
  }

  if (!isStringOfLength(input, min, max)) {
    throw new Error(`input string is not between specified min ${min} and max ${max}`);
  }

  return input;
}

const hello = stringOfLength('hello', 1, 8) // hello now has type StringOfLength<1,8>
stringOfLength('buongiorno', 1, 8) // Error: input string is not between specified min 1 and max 8

Using a combination of phantom types, type guards and constructor functions is an elegant and powerful way to construct types with complex properties and encode domain logic in our type system.
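
For instance, downstream code can require the constraint in its signature and skip runtime validation entirely (a small sketch; Username and register are illustrative names, not from the article):

type Username = StringOfLength<1, 8>;

function register(username: Username) {
  // no length check needed here; the type already guarantees it
  console.log(`registering ${username}`);
}

register(stringOfLength('hello', 1, 8)); // OK
// register('a string that is far too long'); // compile-time error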

Try it yourself on Stackblitz

Addendum

André Kovac alerted me to the fact that using a plain string, a primitive type, in the above example might not be the best way to illustrate the necessity of using never to prevent assignment. Since there is also no string value that would be assignable to a type such as string & { __value__: any }, it might not be apparent why in general we have to use never instead of any.

So let's look at another example: an object with a value property denoting an ISBN and a version property indicating whether we are dealing with ISBN-10 or ISBN-13:

type ISBN = {
  value: string;
  version: 'ISBN-13' | 'ISBN-10';
} & {
  __value__: never;
};

We want to ensure that the type cannot be assigned without validating that the value is a valid ISBN. Had we intersected the type with { __value__: any }, we could simply assign any object that matches the type's structure, such as

{
  value: "certainly not an ISBN",
  version: 'ISBN-13',
  __value__: 'bananas'
}

Hence we need to intersect with { __value__: never} to ensure that there exists no value matching our ISBN type structurally.

Writing the type guard and constructor function for this type is left as an exercise to the reader!

You can read about how to compute an ISBN check sum here.
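
If you want to compare notes afterwards, here is one possible sketch of the guard and constructor (the weightings below are the standard ISBN-10 and ISBN-13 check-digit rules; treat this as an illustration rather than a reference implementation):

function hasValidChecksum(digits: string, version: 'ISBN-13' | 'ISBN-10'): boolean {
  if (version === 'ISBN-13') {
    if (!/^\d{13}$/.test(digits)) return false;
    // weights alternate 1, 3; the weighted sum must be divisible by 10
    const sum = [...digits].reduce((acc, d, i) => acc + Number(d) * (i % 2 === 0 ? 1 : 3), 0);
    return sum % 10 === 0;
  }
  if (!/^\d{9}[\dX]$/.test(digits)) return false;
  // weights run from 10 down to 1, 'X' stands for 10; the weighted sum must be divisible by 11
  const sum = [...digits].reduce((acc, d, i) => acc + (d === 'X' ? 10 : Number(d)) * (10 - i), 0);
  return sum % 11 === 0;
}

function isISBN(candidate: { value: string; version: 'ISBN-13' | 'ISBN-10' }): candidate is ISBN {
  return hasValidChecksum(candidate.value.replace(/[-\s]/g, ''), candidate.version);
}

export function isbn(value: string, version: 'ISBN-13' | 'ISBN-10'): ISBN {
  const candidate = { value, version };
  if (!isISBN(candidate)) {
    throw new Error(`${value} is not a valid ${version}`);
  }
  return candidate;
}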

useEffect under the Hood
https://janbussieck.github.io/useeffect-under-the-hood/

The best way I have found to build an accurate mental model of the programming abstractions I use, whether compilers, promises or frameworks like React, is to crack open the black box and understand the essential implementation details.
While there are a number of excellent posts on how hooks work under the hood, the inner workings of useEffect and how it relates to the lifecycle of a component continue to be a source of puzzlement for many.
As I'll attempt to show, when you peek behind the curtain, the useEffect hook's implementation is quite straightforward and fits elegantly into React's reconciliation algorithm.
By the end I hope we’ll be able to confidently answer questions such as:

  • Why do we have to call useEffect hooks in the same order?
  • How are hooks represented by a fiber?
  • When and how exactly are values in the dependency array compared?
  • When and how are effects cleaned up?
  • Why can't we take advantage of React fibers in our useEffect callbacks?

First, let's briefly recap how React fibers and the reconciliation algorithm work: during reconciliation React builds up a work-in-progress fiber tree and computes a set of changes by walking the component tree and recursively calling render. Each React element is thus turned into a fiber node of the corresponding type that keeps a record of the work to be done. Think of a fiber as representing a unit of work that can be independently scheduled, paused and aborted.
When an update is scheduled, React adds it to the component's update queue; for instance, when setState is called during render, React calls the updater function that was passed into setState. After the updater is finished, the fiber is tagged to indicate that a change needs to be made to the DOM.
The list of changes is then propagated up to the parent fiber and merged into its list of changes. This list of changes is also called the effect list. When React reaches the root node, the work-in-progress tree is marked as a pending commit.

Those changes, however, are not immediately committed to a rendering target such as the DOM. That happens in the commit phase; this phase is atomic and cannot be interrupted, otherwise there might be UI inconsistencies.
During the commit phase React iterates over the effect list and applies its changes to the rendering target (e.g. the DOM).

Let’s look at some code:
useEffect is defined in ReactHooks.js and its type signature clues us in to how it works: it accepts as its first argument a function creating the effect, which optionally returns a function (cleaning up the effect), and as its second argument an optional array of inputs (the dependency array) of variable type.
We see that the function first resolves a dispatcher and then delegates to it.

//react/blob/master/packages/react/src/ReactHooks.js#L104
export function useEffect(
  create: () => (() => void) | void,
  inputs: Array<mixed> | void | null,
) {
  const dispatcher = resolveDispatcher();
  return dispatcher.useEffect(create, inputs);
}

The hook dispatcher is resolved depending on the current context: if it is the initial render and the component just mounted, HooksDispatcherOnMount is returned, otherwise HooksDispatcherOnUpdate; correspondingly, the dispatcher's useEffect is either mountEffect or updateEffect.

//react/blob/master/packages/react-reconciler/src/ReactFiberHooks.old.js#L570
const HooksDispatcherOnMount: Dispatcher = {
	...
  useEffect: mountEffect,
  ...
};

const HooksDispatcherOnUpdate: Dispatcher = {
  ...
  useEffect: updateEffect,
  ...
}

Without looking at the implementation, from our experience working with useEffect we know that these cases differ in at least one respect: the create function of useEffect is always invoked on mount, regardless of its second argument.

Let us first look at the more common update case: updateEffect delegates to updateEffectImpl, passing in the current fiber and hook effect tags. I don't want to go too much into effect tags here; suffice it to mention that each fiber's effects are encoded in an effectTag, which defines the work that needs to be done for instances after updates have been processed. Similarly, there are hook effect tags carrying information about the hook effect's context, e.g. whether the component is unmounting or whether the effect should be invoked at all (the NoHookEffect tag).
updateEffectImpl first calls updateWorkInProgressHook to get a new hook instance, which is basically just a clone of the current hook or, if we are in a work-in-progress tree, of the current work-in-progress hook:

const newHook: Hook = {
  memoizedState: currentHook.memoizedState,

  baseState: currentHook.baseState,
  queue: currentHook.queue,
  baseUpdate: currentHook.baseUpdate,

  next: null,
};

When a hook is called in our component, it is appended to a queue in which hooks are represented as a linked list in their call order, each hook's next field pointing to the next hook. Since these are copied over on each render, we see why we cannot call hooks conditionally or change their call order from render to render.
The baseState and baseUpdate fields are relevant to useState and useReducer hooks; useEffect most importantly uses memoizedState to hold a reference to the previous effect. Let's look at why.

//react/packages/react-reconciler/src/ReactFiberHooks.old.js#L1218
function updateEffectImpl(fiberEffectTag, hookEffectTag, create, deps): void {
  const hook = updateWorkInProgressHook();
  const nextDeps = deps === undefined ? null : deps;
  let destroy = undefined;

  if (currentHook !== null) {
    const prevEffect = currentHook.memoizedState;
    destroy = prevEffect.destroy;
    if (nextDeps !== null) {
      const prevDeps = prevEffect.deps;
      if (areHookInputsEqual(nextDeps, prevDeps)) {
        pushEffect(NoHookEffect, create, destroy, nextDeps);
        return;
      }
    }
  }

  sideEffectTag |= fiberEffectTag;
  hook.memoizedState = pushEffect(hookEffectTag, create, destroy, nextDeps);
}

The most interesting thing happening here is that if there is a currentHook, we fetch the previous effect from the hook's memoizedState field to get the previous dependencies and compare them to the next dependencies. If they are equal, we push an effect onto the queue with the NoHookEffect tag and return, which means that the effect will still be processed during commit, but it won't fire (its create function won't be invoked). Finally, if the dependencies are not equal, we push the effect onto the queue with an effect tag that ensures the effect will fire.
As a side note, areHookInputsEqual delegates to Object.is instead of a plain === comparison to handle javascript quirks such as NaN === NaN evaluating to false.
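
For reference, the cases where Object.is and === disagree:

NaN === NaN;           // false
Object.is(NaN, NaN);   // true
0 === -0;              // true
Object.is(0, -0);      // false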

We skip over the source of mountEffectImpl here, since it only differs from updateEffectImpl in that it does not check the dependency array and simply pushes the hook onto the effect queue to be executed.

That is basically all that happens during reconciliation: values from previous useEffect hooks are cloned, the new dependencies are compared to the previous ones saved on the memoizedState field to determine whether the effect should fire, and that information is pushed onto the effect queue.
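
This also answers the first question from the list above: hooks are matched up purely by their position in that list, so a hook that only sometimes runs shifts every hook after it onto the wrong saved state. A contrived sketch (illustration only, not React source):

import { useEffect, useState } from 'react';

// Don't do this: the conditional hook changes the call order between renders.
function Profile({ showBanner }) {
  const [name] = useState('Ada'); // always hook #1

  if (showBanner) {
    // Sometimes hook #2, sometimes absent. When showBanner flips, React walks the
    // stored hook list by position, so the hooks after this one are matched against
    // the wrong memoizedState (React warns or throws when it detects this).
    useEffect(() => {
      document.title = `Welcome, ${name}`;
    }, [name]);
  }

  return null;
}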

The next time we see our effect is after React has finished reconciliation, every render has been called and the list of updates to be committed to the rendering target aggregated. We are in the commit phase now and commitWork calls commitHookEffectList:

function commitWork(current: Fiber | null, finishedWork: Fiber): void {
	...
  commitHookEffectList(UnmountMutation, MountMutation, finishedWork);
	...
}

commitHookEffectList in turn iterates over the effect list, checks the tag to determine in which phase the effect was added to the list and fires create or destroy respectively.
We see that if the effect's tag matches the unmountTag, the destroy clean-up function is called. If it matches the mountTag, create is called, firing the effect, and the destroy function returned from create is simply saved on the effect for future reference in the unmount phase. If the effect has been tagged with NoHookEffect it is simply skipped.

// react/packages/react-reconciler/src/ReactFiberCommitWork.old.js
function commitHookEffectList(
  unmountTag: number,
  mountTag: number,
  finishedWork: Fiber,
) {
  const updateQueue: FunctionComponentUpdateQueue | null = (finishedWork.updateQueue: any);
  let lastEffect = updateQueue !== null ? updateQueue.lastEffect : null;
  if (lastEffect !== null) {
    const firstEffect = lastEffect.next;
    let effect = firstEffect;
    do {
      if ((effect.tag & unmountTag) !== NoHookEffect) {
        // Unmount
        const destroy = effect.destroy;
        effect.destroy = undefined;
        if (destroy !== undefined) {
          destroy();
        }
      }
      if ((effect.tag & mountTag) !== NoHookEffect) {
        // Mount
        const create = effect.create;
        effect.destroy = create();

        if (__DEV__) {...}
      }
      effect = effect.next;
    } while (effect !== firstEffect);
  }
}

Now we also see why the code we run in useEffect cannot take advantage of fibers, which are able to pause in order to let other higher-priority work finish before rendering is resumed. This is because the effect is executed inside of commitWork, which makes atomic changes to the rendering target to avoid UI inconsistencies. This is important to bear in mind lest one be tempted to perform computationally intensive, synchronous work inside a useEffect hook.

I hope this basic understanding of how useEffect works under the hood helps you become more confident working with useEffect and avoid common pitfalls. It may also have encouraged you to pull away the curtain once in a while and take a look at the React source to deepen your understanding. The most difficult to understand parts of the code are often related to performance and other house-keeping, but you shouldn't let that shroud your understanding of the central pieces that are concerned with React’s core functionality.
Happy source reading!

Decouple from Redux using Hooks
https://janbussieck.github.io/decouple-from-redux-using-hooks/

Received wisdom in the react community holds that you should subdivide your components into 'smart' containers and 'dumb', presentational components.

The rationale is to separate concerns. Logic and behavior such as data fetching, any interaction with the outside world, dispatching actions and other side effects go into our smart container and what our UI should look like as a function of the resulting data into our dumb component.

This idea leads to a pervasive pattern of creating container components solely for the purpose of connecting a part of the component tree to the redux store. So we end up with two components: one in a containers folder fetching data from the store and passing down actions, and the actual component in the components folder.

To me, this quickly felt cumbersome and rigid: if I simply wanted a component to have access to a slice of the store, I found myself having to create an intermediary container and change a number of imports in other files that use the component.

I also stopped putting every bit of state into the redux store and instead took advantage of react's new and improved context api to co-locate state that is confined to a specific, well-delineated part of the component tree. This raised questions such as whether consuming context should also only happen inside containers.

Besides, what are we really achieving with this kind of separation? Changes to data access still have us touching a number of files in the component tree, and the hierarchy of our UI seems to dictate which components should be containers (by default the top-level one).

While well-intentioned, the benefit of decoupling UI from state and behavior does not seem to warrant the overhead and complexity introduced by organizing files this way.

Luckily, we have a perfect tool to decouple data and behavior from our presentational components...

Hooks!

And wouldn't you know react-redux lets us consume its API only using hooks.

Let's look at a small (and admittedly contrived) example. Say we want to implement a toggle button and keep the toggle state in the redux store; maybe it needs to be available globally, toggling an app-wide setting.

This is what such a component could look like using redux classico:

import React from "react";
import {connect} from "react-redux";
import {toggleAction} from "./store/toggleActions";

const Toggle = ({on, toggle}) => {
  return (
    <button onClick={toggle}>{on ? 'on' : 'off'}</button>
  );
};

const mapStateToProps = state => ({
  on: state.toggle.on
});

const mapDispatchToProps = {toggle: toggleAction};

export default connect(mapStateToProps, mapDispatchToProps)(Toggle);

Yes, we probably want this to be a container component wrapping a presentational component (e.g. a button), simply passing on and toggle down via props, but for the sake of simplicity we're keeping everything in one component.

Now let's refactor this to use the new redux hooks api:

import React from "react";
import {useDispatch, useSelector} from "react-redux";
import {toggleAction} from "./store/toggleActions";

const Toggle = () => {
  const on = useSelector(state => state.toggle.on);
  const dispatch = useDispatch();
  return (
    <button onClick={() => dispatch(toggleAction())}>{on ? 'on' : 'off'}</button>
  );
}

Not much of an improvement, we reduced some boilerplate, but there is still a lot of redux code sitting in our component.

The beauty of hooks is how composable they are: we can just create a custom useToggle hook:

import React from "react";
import {useDispatch, useSelector} from "react-redux";
import {toggleAction} from "./store/toggleActions";

const useToggle = () => {
  const on = useSelector(state => state.toggle.on);
  const dispatch = useDispatch();
  const toggle = () =>  dispatch(toggleAction());
  return [on, toggle];
};

const Toggle = () => {
  const [on, toggle] = useToggle();
  return (
    <button onClick={toggle}>{on ? 'on' : 'off'}</button>
  );
};

Now our component knows nothing about redux; we did not need to create a Toggle container or some abstract HOC wrapping our button, we simply use a hook that encapsulates the data layer.

This way our components are also closed to modification, should we decide to employ a different state management solution. Moving redux state into react context simply involves rewriting the hook (at least for consumers of the context):

import {useContext} from 'react';
import ToggleContext from './ToggleContext';
const useToggle = () => {
  const {on, toggle} = useContext(ToggleContext);
  return [on, toggle];
};

As I already alluded to, another disadvantage of the container pattern is that often the top-level component ends up being the container that fetches a slice of state from the store and passes it down to its children as props.

Take as an example a BookList container component that simply iterates over an array of books from the store and renders a BookItem in a list:

import React from "react";
import {connect} from "react-redux";

const BookItem = ({title, author}) => {
  return (
    <div>
      <h1>{title}</h1>
      <h2>{`by ${author}`}</h2>
    </div>
  );
};

const BookList = ({books}) => {
  return (
    <ul>
      {books.map((book) => {
        return (
          <li key={book.id}>
            <BookItem {...book} />
          </li>
        );
      })}
    </ul>
  )
};

const mapStateToProps = state => ({
  books: state.books.index
});

export default connect(mapStateToProps)(BookList);

A problem we might run into is that if one book in the list is updated, the entire list re-renders, which can quickly turn into an annoying performance issue. That is why it's a good practice to provide data as close to where it is needed as possible.

Instead of having to go in and add a BookItem container, we can just create a custom hook.

First, BookList only receives an array of book ids, which presumably change less frequently than any particular book:

import React from "react";
import {connect} from "react-redux";

const BookList = ({bookIds}) => {
  return (
    <ul>
      {bookIds.map((bookId) => {
        return (
          <li key={bookId}>
            <BookItem id={bookId} />
          </li>
        );
      })}
    </ul>
  );
};

const mapStateToProps = state => ({
  bookIds: state.books.ids
});

export default connect(mapStateToProps)(BookList);

The BookItem then uses the book id to fetch its data from the store:

import React from "react";
import {useSelector} from "react-redux";

const BookItem = ({id}) => {
  // we would normally use a named selector here (see getBook below)
  const book = useSelector(state => state.booksById[id]);
  return (
    <div>
      <h1>{book.title}</h1>
      <h2>{`by ${book.author}`}</h2>
    </div>
  );
};

We can neatly bundle that and even add the action creator for updating a book in a custom useBook hook:

// src/store/hooks.js
import {useDispatch, useSelector} from "react-redux";
// getBook and updateAction are the app's own selector and action creator

const useBook = (id) => {
  const book = useSelector(getBook(id));
  const dispatch = useDispatch();
  const update = (...args) => dispatch(updateAction(id, ...args));
  return [book, update];
};

Depending on how you structure your react redux projects you can include this hook as part of your redux-duck or export it alongside actions and selectors inside your redux or store folder.
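
To make the payoff concrete, BookItem can then consume the hook without knowing anything about redux (a sketch; the import path and the shape of the update payload depend on your own store code):

import React from "react";
import {useBook} from "../store/hooks"; // illustrative path

const BookItem = ({id}) => {
  const [book, updateBook] = useBook(id);
  return (
    <div>
      <h1>{book.title}</h1>
      <h2>{`by ${book.author}`}</h2>
      {/* assumes updateAction accepts an attributes object */}
      <button onClick={() => updateBook({read: true})}>Mark as read</button>
    </div>
  );
};

export default BookItem;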

It is now easy to import a hook to consume data from our redux store right where it is needed, benefiting from the performance gains mentioned above.

What is more, we effectively removed any trace of redux from our components; granted, we still need to wrap everything in a Provider, but the overall footprint is vastly reduced. Now, wherever the tempestuous winds of the javascript ecosystem may carry you, you have a clean way of interacting with whatever state management solution you choose in the future, provided it exposes hooks that you can compose.

Ideally hooks allow all our components to be dumb.

'On Writing Software Well' II: Callbacks vs. Listeners
https://janbussieck.github.io/on-writing-software-well-part-ii-callback/

DHH remarks that he is a big fan of callbacks, since they allow you to move a lot of incidental complexity off to the side while the rest of the code can pretend to be on a simple, default path, shielding the programmer from much of the cognitive load, which lives in the callbacks instead.

To see what that means in practice, we are going to trace the mentions feature in Basecamp all the way down and pay attention to how callbacks are used to that end.

The entry point is the create action of the messages controller, which simply records a new message on a bucket (a bucket is an abstraction used to group certain entities, which will be explained in future episodes). The new_message method in turn simply instantiates a message; note that logic pertaining to the creation of mentions or actual recordings is missing from the controller.

class MessagesController < ApplicationController
...
  def create
    @recording = @bucket.record new_message,
      parent: @parent_recording,
      status: status_param,
      subscribers: find_subscribers,
      category: find_category

    ...
  end
  ...
  def new_message
    Message.new params.require(:message).permit(:subject, :content)
  end
...
end

A mention is a model joining a mentioner and mentionee to a specific recording:

class Mention < ActiveRecord::Base
  ...
  belongs_to :recording

  belongs_to :mentionee, class_name: 'Person', inverse_of: :mentions
  belongs_to :mentioner, class_name: 'Person'
  ...
  after_commit :deliver, unless: -> { mentioner == mentionee }, on: [:create, :update]
end

Recording::Mentions is a simple concern that orchestrates when mention scanning is to be scheduled.

module Recording::Mentions
  extend ActiveSupport::Concern

  included do
    has_many :mentions
    after_save :remember_to_eavesdrop
    after_commit :eavesdrop_for_mentions, on: %i[ create update ], if: :eavesdropping?
  end
  ...
  private
  
  def remember_to_eavesdrop
    @eavesdropping = active_or_archived_recordable_changed? || draft_became_active?
  end

  def eavesdropping?
    @eavesdropping && !Mention::Eavesdropper.suppressed? && has_mentions? 
  end

  def eavesdrop_for_mentions
    Mention::EavesdroppingJob.perform_later self, mentioner: Current.person
  end
end

DHH points out a trick to track dirty attributes, circumventing a problem that many Rails developers have run into: inside an after_commit callback you can no longer access which attributes changed, via either changed_attributes or the _changed? methods, since those changes only persist within the database transaction.

We simply check which attributes changed before the transaction is committed, in an after_save callback, and make a note of it in an instance variable so that we can access the information later (e.g. in the after_commit callback).

Here, remember_to_eavesdrop records whether the content of the recordable record actually changed or whether a recordable which might contain mentions became active before we scan for mentions.

The eavesdropping? query method simply checks whether the instance variable is set, that mentions exist and that the eavesdropping callback has not been disabled via suppress. To the last point, DHH explains that while callbacks are supposed to contain code that should run by default, it might sometimes be necessary to disable them.

Finally, after checking whether we should perform any work and scan for mentions, the actual work is delegated to a job via eavesdrop_for_mentions; the job simply instantiates a Mention::Eavesdropper, which creates the actual mentions. Also note how Current, a class that allows global, per-request storage of attributes, is used to pass the current user as mentioner to the job.

class Mention::EavesdroppingJob < ApplicationJob
  queue_as :background

  def perform(recording, mentioner:)
    Current.set(account: recording.account) do
      Mention::Eavesdropper.new(recording, mentioner).create_mentions
    end
  end
end

The Eavesdropper in turn invokes a scanner that finds mentionees and creates mentions.

class Mention::Eavesdropper
  extend Suppressible
  ...
  def create_mentions
    recording.with_lock do
      mentionees.each do |mentionee, callsign|
        create_mention mentionee, callsign
      end
    end
  end
end

That is it: we moved the ancillary concern of creating mentions off to the side by handling it in callbacks, as a response to certain life cycle events of our model, as opposed to on the 'main path' of our code inside the controller action. A developer interested in the main path, i.e. creating messages, is not confronted with the complexity of creating mentions right away. While it is true that this reduces some cognitive load in that specific case, it comes at a non-negligible cost.
Note how we had to trace the feature of creating mentions in response to a change to a recordable record all the way from the controller, through the model's life cycle methods, to a job and finally a service creating the mentions. Along the way we are given hints that this level of coupling is fraught with some amount of complication.

Tracking dirty attributes

First off, we need intricate knowledge about Rails life cycle methods in order to be able to track changes to a recording and know whether we should even check for mentions. I need to be cognizant of database transactions and how they relate to callbacks to even become aware of how to track model changes in after_commit. Talk about incidental complexity.

Checking for suppression in callbacks

Secondly, there are apparently use cases where the client (whoever is initiating those model updates) might not want to listen for mentions: maybe I am seeding data or going through an admin API and don't want to trigger emails. Quite plausible. In those cases, I need to check whether creating mention eavesdroppers has explicitly been suppressed. The problems introduced by this sort of coupling have been addressed in this post. But it again strikes me as very counterintuitive and error-prone to reach into a completely different class, whose internal state has been modified elsewhere, in order to decide whether to run a callback or not.

Using Current to store request-wide state

Finally, a problem that results from handling these types of interactions deep down in active record models is that I still need information from the controller. In this case, a global object is used to register that information making it globally accessible in the entire application. That should be the clearest indication that I might be performing work in a class that has to know too much in order to perform it and hence might be the wrong place to do it.

The controller as mediator

That's enough for criticisms. I think the highlighted problems all indicate that we shouldn't know what we are trying to know inside the callback, because we are too far removed from where those decisions occur: the controller.

I have always thought of the controller, more specifically a controller action, as a mediator encapsulating knowledge about a particular use case and deciding which models need to talk to which and what they need to know to accomplish their particular tasks. The controller orchestrates, passes on information and creates side effects, much in the vein of Gary Bernhardt's functional core / imperative shell.

At speqtor.com, we have a similar feature to Basecamp's mentions where certain updates to models create notifications for different users subscribed to that model.

A typical controller action looks like this:

def update
  load_criterion
  build_criterion
  authorize_criterion

  subscribe_listeners(@criterion)

  save_criterion

  decorate_criterion
  render_criterion
end

We like sticking to the same structure in every controller which makes them easy to understand and to spot where interesting things are happening (See the excellent Growing Rails Applications in Practice). Here, we are updating a criterion that indicates how complex a project is going to be. In this specific use case, a user directly interacts with our web app, as opposed to an importer job or the rails console. In this context we want a number of side effects to happen as a result of certain model events. This is achieved by registering event listeners, which in turn decide what is supposed to happen as a result of those changes.

In our example, we want to listen to successful updates in order to notify other users.

This happens inside the SubscribesListeners concern:

def subscribe_notification_listener(options = {})
  with_load_error_guard do
    listener_class = options[:notification_listener_class] || infer_listener
    listener = listener_class.new

    listener.current_user = current_user
    listener.changes = subscription_target.changes.transform_values do |val|
      val.map(&:to_s)
    end

    subscription_target.subscribe listener, async: true
  end
end

Here, we instantiate the listener class, pass in the information it needs, i.e. the model changes and the current user, and subscribe it to the target model (in this case the criterion). Often we add other information available in the controller, such as the scope of the current project or user permissions. The listener, in turn, simply creates the notification.

Simple enough.

How are the above problems solved here?

Regarding dirty attribute tracking: since we haven't persisted the model yet, we can still access model changes through the attributes API; when the model is saved and the database transaction completes, the listener is merely notified of its success or failure.

As we are still inside the controller context we can also pass any information such as the current user to the listener without having to awkwardly store it in a global Current.

Lastly, the listener is maximally decoupled, we have to explicitly opt into creating notifications depending on the current use case, as opposed to anticipating every use case by checking related models for suppression.

An additional benefit is that we can now easily background the listener without having to worry about implicit state in the form of model suppression or Current registries.

So what should callbacks be used for?

I am not a big fan of hard and fast rules in software design, but sometimes it's prudent to have certain guidelines to stick to unless there is a very good reason for violating them.

One of them is that callbacks should only deal with immediate model concerns, which are in declining order of popularity:

  1. Maintaining data integrity and mutating the model into a valid state; examples are normalizing or splitting attributes.
  2. Mutating a closely associated model, for instance maintaining counter caches in a one-to-many association.
  3. Performing small side effects for related or derived data, such as busting caches.
'On Writing Software Well' I: Comments and Extracting Rails Features
https://janbussieck.github.io/on-writing-software-well-part-1/

DHH has drawn back the curtain on how Basecamp writes software in a video series tentatively titled 'On Writing Software (well?)'. I find it highly instructive and valuable to talk about software design using real-world software, with all its trade-offs, necessary messiness and complexity so neatly omitted from your standard textbook toy examples.

While I do lay out the contents of each episode, this is not a plain transcript of the series, but rather a way for me to engage with the challenges raised in DHH's examples, add my own thoughts and, at times, contrast his approach with the one we took for speqtor.com, sharing examples from our code base.

Episode 1

While code comments are sometimes necessary to explain certain decisions or trade-offs that aren't obvious from the code, more often than not comments are a kind of code smell.

You should ask yourself why am I writing this comment? How could the code itself be clearer and not need this comment?

Every developer is familiar with arcane, outdated comments in the midst of seemingly unrelated code, because the related code had been deleted. Another advantage of self-explanatory code apart from just being clearer (by definition) is that it preempts the problem of code and its explanation getting out of sync.

def remove_inaccessible_records
  # 30s of forgiveness in case of accidental removal
  unless person.destroyed? || bucket.destroyed?
    Person::RemoveInaccessibleRecordJob.set(wait: 30.seconds).perform_later(person, bucket)
  end
end

The Basecamp codebase includes a method to remove all inaccessible records after a user has been deleted; because restoring a user's objects in the bucket is cumbersome, a 30-second grace period was added in case a user is accidentally removed.

A comment explains not the control flow, but the configuration of the job.

We could simply add an explanatory variable elucidating the magic value of 30 seconds and hinting at its purpose.

def remove_inaccessible_records
  grace_period_removing_inaccessible_records = 30.seconds

  unless person.destroyed? || bucket.destroyed?
    Person::RemoveInaccessibleRecordJob.set(wait: grace_period_removing_inaccessible_records).perform_later(person, bucket)
  end
end

However, the value does not vary, so why store it in a variable? It should be a constant. But instead of defining it at the top of the file, as we idiomatically would for public constants in Ruby, we should prefer colocating related code and making the constant private.

private

GRACE_PERIOD_REMOVING_INACCESSIBLE_RECORDS = 30.seconds

def remove_inaccessible_records
  unless person.destroyed? || bucket.destroyed?
    Person::RemoveInaccessibleRecordJob.set(
      wait: GRACE_PERIOD_REMOVING_INACCESSIBLE_RECORDS
    ).perform_later(person, bucket)
  end
end

I would go a step further and separate configuration from my app code, especially since you often want different values for different environments; in the test environment, for instance, you might want the job to execute immediately and not wait 30 seconds.

More importantly, I have a central place to go looking for configuration options in my applications and they're not scattered across my source code. In Speqtor, for example, we only send out a notification if no new notifications for a user were scheduled within a certain cool down period, so as not to clog up their inbox.

The config options are defined in config/notifications.yml

production:
  cool_off_period_in_minutes: 20
development:
  cool_off_period_in_minutes: 0.2
test:
  cool_off_period_in_minutes: 0

and included in application.rb under the Rails namespace config.x for custom configuration.

config.x.notification = config_for(:notifications)

Back to DHH, who shows us an example of how some of his refactorings lead to new features in Rails. In Basecamp there is a join model for granting users administrative access to certain resources, and a helper method grant that accepts a person argument and creates an entry for the person in the join model; if an entry already exists, it simply returns the person record.

What might jump out at you about this method is that it commits the sin of using an exception for control flow. The dual offense of using framework exceptions in your code is that it also mixes two different levels of abstraction, in this case the top-level ActiveRecord API and constants from the bowels of ActiveRecord.

module Account::Administered
  extend ActiveSupport::Concern

  included do
    has_many :administratorships do
      def grant(person)
        create! person: person
      rescue ActiveRecord::RecordNotUnique
        # don't worry about dupes. Treat them the same as successful creation
        where(person: person).take
      end
    end
  end
end

The reason we are avoiding ActiveRecord's find_or_create_by here is that we might end up with stale reads, as find_or_create_by first checks whether a record with the attributes exists using a where query and returns it if it does, or else creates one with those attributes.

In applications with high load this could lead to the result returned by the where clause being outdated, in which case the create might fail because the record has already been created in the interim. Hence, we first attempt to create the record and, if that fails because it already exists, we simply return it.

So what we actually want is create_or_find_by(person: person), which encapsulates this behavior and simplifies this code to a mere delegation:

def grant(person)
  create_or_find_by(person: person)
end

And indeed, this method has made it into Rails 6 and it's arguably what find_or_create_by should have been from the beginning.

Just a note on the topic of exceptions as flow control: in this instance, I think it is perfectly fine to do so (and in fact the Rails method does just that), because we are relying on the database's mechanism for ensuring data consistency and simply pass the exception through to the caller. We could not have performed this check without dealing with the database exception, since this is the only interface offered to our application code.

Deep Learning and the Innovator's Dilemma
https://janbussieck.github.io/deep-learning-and-the-innovators-dilemma/

This article was originally published on deeplearningjobs.com

Almost everyone seems to agree that AI, carried by a wave of deep learning breakthroughs, will 'disrupt' industries left and right. Of course, the term disruption is often somewhat carelessly and imprecisely bandied about to refer to technological or business model innovations that threaten industry incumbents. But what exactly enables an innovation to be disruptive, given that incumbents generally have both the know-how (in fact they are often the source of the innovation) and the resources to get a head start on any entrant, is often glossed over.

Before we turn to the case of disruption through deep learning innovations, let us briefly explore an answer to this question presented by Clay Christensen in his seminal work on the Innovator's Dilemma. The central thesis is that incumbent firms operate in a certain context of customer needs, suppliers, target markets and competitors that forms a value network, which sets the standard of value for any strategy or business decision, including where to allocate resources and which innovations to pursue. In practice this means that firms will often pursue sustaining innovations, that is, innovations which improve and sustain the firm's position within the established value network...
