A significant amount of the parallelism can be explained by the bi-interpretability of $PA$ with $ZF^{-\infty}$ ("finite set theory"), which is the theory obtained from $ZF$ by replacing the axiom of infinity by its negation and adding the sentence asserting that every set has a transitive closure. For more detail and references, see my joint paper with Schmerl and Visser entitled "$\omega$-Models of Finite Set Theory", available here.
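To make the arithmetic half of this bi-interpretation concrete, here is a minimal Python sketch of the classical Ackermann coding, in which a natural number $n$ counts as a "member" of $m$ exactly when the $n$-th binary digit of $m$ is 1. This is the standard textbook interpretation of finite set theory in arithmetic, not code taken from the paper above. The helper names `ack_members`, `ack_in`, `encode`, `decode`, and `von_neumann` are hypothetical. The sketch only manipulates actual natural numbers (the standard model), so it illustrates the coding itself rather than the interpretation inside an arbitrary model of $PA$.

```python
def ack_members(m):
    """Ackermann members of m: every n such that bit n of m (counting from 0) is 1."""
    return [n for n in range(m.bit_length()) if (m >> n) & 1]


def ack_in(n, m):
    """Ackermann membership: n 'is an element of' m iff bit n of m is 1."""
    return ((m >> n) & 1) == 1


def encode(s):
    """Map a hereditarily finite set (nested frozensets) to its Ackermann code."""
    # Distinct elements receive distinct codes, so this is a sum of distinct powers of 2.
    return sum(1 << encode(x) for x in s)


def decode(m):
    """Inverse of encode: rebuild the hereditarily finite set coded by m."""
    # Every member n of m satisfies n < m, so the recursion terminates.
    return frozenset(decode(n) for n in ack_members(m))


def von_neumann(k):
    """The von Neumann ordinal k = {0, 1, ..., k-1}, built as nested frozensets."""
    ordinal = frozenset()
    for _ in range(k):
        ordinal = ordinal | {ordinal}  # successor: x U {x}
    return ordinal


if __name__ == "__main__":
    print([encode(von_neumann(k)) for k in range(4)])  # [0, 1, 3, 11]
    print(all(encode(decode(m)) == m for m in range(1000)))  # True
```

Under this coding the von Neumann ordinals $0, 1, 2, 3$ correspond to the numbers $0, 1, 3, 11$, and `encode` and `decode` are mutually inverse on hereditarily finite sets. The other direction of the bi-interpretation runs, roughly, through the finite von Neumann ordinals of $ZF^{-\infty}$.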
But that is only part of the story, since the parallelism between arithmetic and set theory is deep and mysterious. I can even say that my career as a logician has been shaped in large part by comparing and contrasting the metamathematics of arithmetic and set theory.
My 1998 paper "Analogues of MacDowell-Specker Theorem for Set Theory", available here, gives a synopsis of the similarities and differences between $PA$ (equivalently, $ZF^{-\infty}$) and $ZF$ through the lens of model theory.
Added in the third edit: Here is a relevant quote from the above paper:
The axiom of infinity is of course only the first step in the progression of ever bolder large cardinal axioms. As we shall see below, the negation of the axiom of infinity endows $ZF^{-\infty}$ with a model theoretic behavior that $ZF$ can only imitate with the help of additional axioms asserting the existence of large cardinals. This is partially explainable by noting that the negation of the axiom of infinity in finite set theory itself can be viewed as a large cardinal axiom, not positing the existence of a large set - indeed denying it - but attributing a large cardinal character to the universe itself. Of course the axioms of power set and replacement give a "strong inaccessibility" character to the class of ordinals which allows models of $ZF$ to share some of the model theoretic properties of $ZF^{-\infty}$.
Added in the second edit: In this 2009 presentation I describe a scheme $\Lambda$ (named in honor of Azriel Levy), consisting of the set-theoretic sentences of the form "there is an $n$-reflective, $n$-Mahlo cardinal", one for each $n \in \omega$, which has the following remarkable property:
$ZFC + \Lambda$ is the weakest extension of $ZFC$ whose model-theoretic behavior matches that of $PA$, in several surprising respects. So $PA$ serves here as a guide for finding an improvement of $ZFC$.
Most, but not all, of the results in the presentation have appeared in print.