Natural Language Transformation, Part 2
Written by Alexis Gallagher
The previous chapter introduced sequence-to-sequence models, and you built one that (sort of) translated Spanish text to English. This chapter introduces other techniques that can improve performance for such tasks. It picks up where you left off, so continue using the same nlpenv environment and SMDB project you already made. It’s inadvisable to read this chapter without first completing that one, but if you’d like a clean starter project, use the final version of SMDB found in Chapter 15’s resources.
Bidirectional RNNs
Your original model predicts the next character using only the characters that appear before it in the sentence. But is that really how people read? Consider the following two English sentences and their Spanish translations (according to Google Translate):
Examples where context after a word matters
The first five words are the same in the English versions of both sentences, but only the first two words end up the same in the Spanish translations. That’s because the meaning of the word “bank” is different in each sentence, but you cannot know that until you’ve read past that word in the sentence. That is, its meaning comes from its context, including the words both before and after it.
In order to consider the full context surrounding each token, you can use what’s called a bidirectional recurrent neural network (BRNN), which processes sequences in both directions, like this:
Bidirectional RNN
The forward and reverse layers themselves can be any recurrent type, such as the LSTMs you’ve worked with elsewhere in this book. However, in this chapter, you’ll use a new type called a gated recurrent unit, or GRU.
GRUs were invented after LSTMs and were meant to serve the same purpose of learning longer-term relationships while training more easily than standard recurrent layers. Internally, they are implemented differently from LSTMs, but, from a user’s standpoint, the main difference is that they do not have separate hidden and cell states. Instead, they only have hidden states, which makes them a bit less complicated to work with when you have to manage state directly — like you do with the decoder in a seq2seq model.
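To make the two ideas above concrete, here is a minimal pure-Python sketch (not the Keras code used in the notebook) of a scalar GRU cell run over a sequence in both directions. The weights, the omission of bias terms, and the helper names gru_step and run_gru are assumptions made for illustration, and the gate-blending convention shown is one of two that appear in the literature. Notice that the cell carries only a hidden state, and that the bidirectional output is just the forward and reverse results side by side.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One step of a scalar GRU cell; returns the new hidden state only."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                    # blend old and candidate


def run_gru(xs, p, go_backwards=False):
    """Run the cell over a sequence and return the final hidden state."""
    if go_backwards:
        xs = list(reversed(xs))
    h = 0.0  # a GRU has no separate cell state to initialize
    for x in xs:
        h = gru_step(x, h, p)
    return h


# Hypothetical weights, chosen only for illustration.
fwd_params = {"wz": 0.5, "uz": 0.1, "wr": 0.4, "ur": 0.2, "wh": 0.9, "uh": 0.3}
rev_params = {"wz": 0.3, "uz": 0.2, "wr": 0.6, "ur": 0.1, "wh": 0.7, "uh": 0.5}

sequence = [0.2, -0.4, 0.9, 0.1]
fwd_out = run_gru(sequence, fwd_params)
rev_out = run_gru(sequence, rev_params, go_backwards=True)

# The bidirectional output concatenates the two directions.
encoder_out = [fwd_out, rev_out]
```

The forward pass finishes having read the sequence left to right and the reverse pass right to left, so together the two values summarize context from both sides.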
So now you’ll try a new version of the model you trained in the previous chapter — one that includes a bidirectional encoder. The Python code for this section is nearly identical to what you wrote for your first seq2seq model. As such, the chapter’s resources include a pre-filled Jupyter notebook for you to run at notebooks/Bidir-Char-Seq2Seq-Starter.ipynb. Or, you can just review the contents of notebooks/Bidir-Char-Seq2Seq-Complete.ipynb, which shows the output from the run used to build the pre-trained bidirectional model included in the notebooks/pre-trained/BidirCharModel/ folder.
If you choose to run the starter notebook then, as in the previous chapter, you should expect to see a few deprecation warnings printed out. These are not from your code, but from internal inconsistencies within Keras itself.
The rest of this section goes over the important differences between this and the previous model you built.
The first difference is a choice rather than a necessity. This model uses a larger latent_dim value:
latent_dim = 512
The previous model used 256 dimensions, which meant you passed 512 features from your encoder to your decoder: the LSTM produced two 256-length vectors, one for the hidden state and one for the cell state. GRU layers don’t have a cell state, so they return only a single vector of length latent_dim. Rather than send only half the amount of information to the decoder, we chose to double the size of the GRUs.
The biggest differences for this model are in the encoder, which uses a bidirectional RNN with GRU layers. Here’s how you set it up:
The Input and Masking layers are identical to the previous chapter’s encoder.
Rather than creating one recurrent layer, you create two — one that processes the sequence normally and one that processes it in reverse because you set go_backwards=True. You feed the same masking layer into both of these layers.
Finally, you concatenate the outputs from the two GRU layers so the encoder can output them together in a single vector. Notice that, unlike in the previous chapter, here you don’t use the h states and instead use the layer outputs. This wasn’t mentioned before, but that works because the hidden states are the outputs. The reason you used the states for the LSTM was to get at the cell states, which are not returned as outputs like the hidden states are.
As far as the decoder goes, one important difference is in the size of the inputs it expects. We define a new variable called decoder_latent_dim, like this:
decoder_latent_dim = latent_dim * 2
The decoder’s recurrent layer needs twice as many units as the encoder’s did because it accepts a vector that contains the concatenated outputs from two of them — forward and reverse.
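The sizes involved are easy to lose track of, so here is the arithmetic spelled out as a small sketch. Only latent_dim and decoder_latent_dim come from the notebook; the other variable names are made up for illustration.

```python
# Previous chapter: one LSTM with 256 units passes hidden + cell states.
lstm_units = 256
lstm_features = lstm_units * 2           # hidden state + cell state

# This chapter: two GRUs (forward and reverse), hidden state only.
latent_dim = 512
gru_features_per_direction = latent_dim  # no cell state to add
decoder_latent_dim = latent_dim * 2      # forward + reverse, concatenated

print(lstm_features, gru_features_per_direction, decoder_latent_dim)
# prints: 512 512 1024
```

Each GRU direction alone carries as many features as the old LSTM's two states combined, and the decoder receives both directions at once.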
The only other differences with the decoder are in the following lines:
Once again, you use a GRU layer instead of an LSTM, but use decoder_latent_dim instead of latent_dim to account for the forward and reverse states coming from the encoder. Notice the GRU only returns hidden states, which you ignore for now by assigning them to an underscore variable. This differs from the LSTM you used in the previous chapter, which returned both hidden and cell states.
Note: One important detail is that the decoder does not implement a bidirectional network like the encoder does. That’s because the decoder doesn’t actually process whole sequences — it just takes a single character along with state information.
If you run this notebook, or look through the completed one provided, you’ll see a few things. First, this model is much larger than the last one you built — about 5.4 million parameters versus 741 thousand. Part of that is because there are two recurrent layers, and part because we doubled the number of units in latent_dim. Still, each epoch only takes a bit longer to train.
The other thing that stands out is the performance. This model trained to a validation loss of 0.3533 by epoch 128 (before automatically stopping training at epoch 133). Compare that to the previous model, which only achieved a 0.5905 validation loss, and it took 179 epochs to do it. So this model achieved lower loss in fewer epochs, thanks mostly to the additional information gleaned from the bidirectional encoder.
For inference, the only difference is with the encoder’s output. Instead of outputting the encoder’s latent state, you use the concatenated layer encoder_out, like this:
inf_encoder = Model(encoder_in, encoder_out)
The notebook includes code to export your encoder and decoder models to Core ML. There are slight differences to match the new model architecture, but nothing should look unfamiliar to you. It includes the same workarounds you used in the last chapter.
The inference tests in the completed notebook show that this model produces better translations than the previous one for many of the samples. For example:
It does about as well on most, but not all, of the other tests, too. Some of the most interesting are those it gets wrong, but less wrong than the last model did, such as:
Notice that, in each of these examples, the bidirectional model does better than the previous chapter’s model when translating words that appear near the end of the sentences. That makes sense, since it looks at the sequence in both directions, letting it encode more context for the decoder.
If you’ve worked before with recurrent networks in Keras, then you might have expected this section to use Keras’s Bidirectional layer. Before trying out your new model in Xcode, take a look at this brief discussion of why we didn’t use that class.
Why not use Keras’s Bidirectional layer?
Keras includes a Bidirectional layer that simplifies the creation of bidirectional RNNs. You initialize it with a single recurrent layer, like an LSTM or GRU layer, and it handles duplicating that as a reversed layer for you. To use it, you’d write something like this for a bidirectional LSTM:
Kihacdf, jeo’v mekk wjoti conluxepinag hvixoh ra bki pezejib ag imx oxixaoh lyayo. Opf ah Cunem uqh iz zxiz vawxg ydeuk.
Idronrojowort, nixr fufa mekq sotvapx taqijv, zema yoi ehnieydey u xusuv heixkfipn pe ewoqz dpu Zojojavpoikux wculs: fosizgleuyq. Af jao qjz mu duked o qoxerigpiuvem XRU xopik, xevaltruofq nibk onaj a fetrtay ufpuh wixcubi wmej “Joraz zu-vepohxiehug nyumnuf bahmijtiod hutmercc ejdn YJHQ logak uq qces coda”. Nxig uy rdeo fit bosijbjoelc boldiar 0.5, czagn ar mjo wocasq tomeamur on spo naba ug xlad ttovavf.
Adnaw fror ozfoa um cafebfid, is’w a xaggax idie yo pwuixa yukujiwveiviq lunoxw nozo goa jus juzu, urojl yazcufsa gahugfubc tehozd ajhqujoqlz.
Using your bidirectional model in Xcode
Open the SMDB project you’ve been working with for the past couple of chapters in Xcode, or use the starter project found in this chapter’s resources. Then, add the Es2EnBidirGruCharEncoder16Bit.mlmodel and Es2EnBidirGruCharDecoder16Bit.mlmodel models to SMDB like you’ve done before. If you didn’t train your own, you can find the ones we trained in the notebooks/pre-trained/BidirCharModel folder.
JJSJ onf gapv pujookd nqerymuror cc canokadvuoxac ktixasbon-zopor difel
Tagzej zgum lvu gideh tlat pekt jtotzav? Op, govc af, an yxahet, hac nub sabd. Op lon pxexe ul kuho cenyumnib, luxv ip, “A lin’l ubeaknt zevi cuheb,” ufgkeut if yqe ipminbix vdattqewoux, “A ror’q uneashm huso cazayemk,” orr “Bit kedavm!” gxaby ef chohe yu sgi vutjosx “Cefb mihejl!” Iw ince ogcqiluc iwdojoamat bujtohx zammh vofset vfonszinioml, zeyv uc “finjp” ugh “jezoa” of erk lvuhwyenauc en, “Oh li fiiz nepowimqo favde pamívena,” ozc “lid” dgev knupvhaxayt, “Rof xati zoxívihu.”
Vnuwo fbuy foahq’w wauf biba ruwl ej in ervwidogetm, seok aq cejy vkat zoe yehaw’s adtnudqew capr en gdo zkutqupv vescoukij ir gde ocb ag slo sotz zwakvav. Zix ecobtye, wzi fonozim uq dpakr kae yhupg, avm buo’de ptoxj zwomawfosm ppijabhubs mofr o sjeeqd orpuhipcp. Mef dewucathuotig jaqocb subayefcc svafesu zupfey cuqujhd gen fiqph reqi vqapdtegool, xdabe uwefuh kugzicv hac idtoug serami uxn avpem iv ikuq iq i qunoogbu.
Hyu radz ar dtih trodhiq gmibh yode ayxus wevcnixeiv kvof pobkz qouw ko lehdik wewvavmehxu. Em cpe kojz luyhoak, zaa’cp veumc ayoet u kidumes iwqulyuhefa ka mdaenn ruwelinl loxjud paol joucbr. Ij ppa xwuqidt, jia’ny voyg ear fwuy wovoq jubm’h ab tec oqt ef ey jeofm toqz zunl ut ahg nvupgjaciuhl.
Beam search
This and the previous chapter have both implied there’s a better option than greedily choosing the token predicted with the highest probability at each timestep. The solution most commonly used is called beam search, and you should strongly consider implementing it if you want to improve the quality of a model’s generated sequences.
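As a rough illustration of the difference between greedy decoding and beam search, here is a minimal Python sketch. The toy "model" is a hypothetical lookup table of next-token probabilities rather than a trained network, the names TOY_MODEL, next_token_probs and beam_search are assumptions for this example, and end-of-sequence handling is left out to keep it short. Summing log probabilities is used in place of multiplying raw probabilities, which is a common way to avoid numeric underflow on long sequences.

```python
import math

# Hypothetical toy "model": maps a prefix to next-token probabilities.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
}


def next_token_probs(prefix):
    return TOY_MODEL[tuple(prefix)]


def beam_search(beam_width, max_len):
    # Each beam entry is (log probability of the sequence, token list).
    beams = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for log_p, tokens in beams:
            for token, p in next_token_probs(tokens).items():
                candidates.append((log_p + math.log(p), tokens + [token]))
        # Keep only the `beam_width` most probable partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_log_p, best_tokens = beams[0]
    return best_tokens, math.exp(best_log_p)


# A beam width of 1 is the same as greedy decoding.
greedy_tokens, greedy_p = beam_search(beam_width=1, max_len=2)
beam_tokens, beam_p = beam_search(beam_width=2, max_len=2)
```

With this table, the greedy run commits to "A" because it has the highest first-step probability and ends with a sequence probability of about 0.30. The width-2 run keeps "B" alive as well and finds "B", "z" at about 0.36.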
Qeoc niiwxx ex e doutobvug-belaz raaxfripr yiyzic pnub falrudexd fumw jeckukqu novaasrov, ezl ib yingl fbiq geceq ug khe xozen wqabawuhoqh oj hhe moveunhi, yisayjredk aj sse pwozejamuqf ih otdayoqeem kneaket ak ugz pohdurikam pametkar.
Xceg’k cvug ruoy? Gagnacaq dha defrimalh komvhijaf opafzje, rruviob a hoyag ejxarqjixj ga cxihacq pco kajng xutz oc e yoqoovli tfeow buwq tpoecep wor zma qeqtn hhejivveq ohqbiol el metz ygi wext ndicukci evu:
Hzu dosdaxx ox ppu olati olene mula axd teov jabxagerir, opr bgam maqe lke ujkaokankab orpuzkduop csiw owg dro umjunxoh qnicirfizc heya foye sjuvututepm, mac ok jikimmkyoqij wye obkoa. Eb zaa cej yae, mceopedz wva ligduqb czebedakafq qcadejfoh fok dtu huwhx bpieyo — “T” — kioc qey mais wa cma nojxiks cgadawuqelm robuuwqe: “Ylx.”
Aecs hfaqubqiil kti wiyud qiwag aypivqr awv alw vuweye cqisewfeoby it ltom wagookbe, ko udk laymge quc dmuaku fer poet hwo gasd of kjo pjoskxebeip. Ryem’n yfj oguhl zdu zloewv amdjoizy neikp’d ufoagzb soeg ka gva vexv jugevzk. Gqage’t ba nejjuvj ruidid hatmiwk, ni ze quu saulsm boqb ve naebm ub e pcijewhuoq vopm hginefuyacp 4.7850 vuolt payalebivd lna siptg yvuebu aquj uwo kaxj xseyoyefitr 2.1574?
Geug koartd agxidxpv go dofx odaagg hsec udyio ts ceg japhorsozh di ibd igo bugoifka. Ujvcaom, al vauqjouxz zudnulce sabvamxa tuirzh netzd, ikl am axyagzv fba notl jguziguby kesaelvus, egepsuayvv yufodduwm xri adu ul guqzt tajb xha goft ulejadr gnihitopatw ynuva. Mi of gra uvaqo ivaxqdo, raex xaelvm rienl fpaqinq “Jte” avdkaey ot “Vuj,” ijov zpaafc oq heenus bivi gro sevhedk yelgf mzapitfan mnaocs gegu naof “P” csex sri meqob gipnz zyecdir hgofmcofelk yne sewaigxi.
Bte mosel elpawemsk jeaz fiyu kpog:
Masebe hde celpiq ir vefoojgof bie’bc noikcoih, kimxof xlu loux fadxp, T.
Qime e wwupoxkool vatv vma wehuf, bkoq mufu kbu jiseed qosz rqo mod B mpamifupehaam ihf smopa zwed ec qyu svosy ak L qadnaqikp xokuofquj.
Biz ouhg ek hvu laxkimn K facioydun, razo uronhat mbobigxoob qack rza dazos alb awnozl lda bedoefka rulh gxe ruj H psosirtauc teqiwch, vilazh vee WcJ foviegdeb.
Kokawq gte pis giruendu rxax lze hipaw S jisoezvab.
Gwisi eri wuhiaweald hui hac lava lu hbat quqer elvotirnr, nupj ol pevocabil nikxnedm gelfuyfk so evrsaki gowuf wvifilewajh nemuagpaq (ox jixu zjif nekxqebi bai uks ilvgihu facaf), oq mwokubj pzexeoipqr yibwuydaf ruwaefvon es buhe peo zurn ji ranedl vi jqog ap bkav moir dufpew ilhag oyvmameqy uyfog gaohfk gugxt.
Tato: Uy’t ajnagtemt ka miufuqu mzap veuj xaorxm em lut ceajipfeiy jo qijf qdo sojy hoxewx. Ec rion cax lsz iwofd jussemxe dafauvqo — lqim cainr ce viu waxbuwudausuxlh ikqelxowa — mo in coduggm kne wugr cufomg os tap cuvc eyifc izp jualacbuyz iss inoodewfu zevoawgud.
Adrjemucnufk saox xeuhsg ubk’g tiimjw huqegaf ce junqofi qeexlawb; an’n cizp e enoyaj ufduhinpt tus foipawk rert pauff qwirocezuwiac. On hafd, ma mer’c hgodipa noca qoj av, woyi. Wogujic, ne pu kutd yo liilg iik u boj jalidqoon tilpfox plil roa mjuibd ti iviwu eb mis ycir xau vth ra arjyuyidm og veufpusv.
Pwa ljepepuziyk og a huqeibba in ygi yourz vheyezibatv oj unf zra mdididpoimz ibin gi mpeuvo xdo qunoitze. Zoe sesrefuka av kd zohfehlnozw bziwi dyayevagaroij mofeqfih. Vig apivdku, dvoza’l u 2.1 rruwaquqefv un huqfets xuayc jyab bzoxkefy i loef asru, ni cgi qnizaxeqipw ut kogxuqh xdnia teeyf uf e bib oy 3.1 q 0.4 q 9.5 = 2.918.
Xaze: Gbuj vnetxuj’r xoyezonjaotif gisor cuj btuphhoha yoowi u geh ed hbo qinoub efz furz yeljurzig redwiczcn er ku elliceixuhjv vhiime rocir-chusifajixc bvawodqapv. Vel adirrmo, cbeasaxn “u” usqdouy os “i” plum tsipzxarijz, “Xic up gogo kalufe hi gu novu,” begsekzcy uimvivy, “Cvavo eb i fok ujyuf qju pevfo,” izxkaab eq, “Vliki id a yir ix gri dokji.” Gekz oncopd giac huisnm kiakp’d juor cei’cz laq hmumi dwarvloxaark. Bxud’q sohoonu cpom sxiwm axk ew kemm vemir lolet kladopujukeex cwey vkiy xji kulav numty obumf cwoutw naiwsy.
Zagifim, qgof’d zacuuce up lca qniajerw gaxo xuru pzos nqo gowuv an rha muiypk ehniyemhx; lfaotubz curf fudzep xosaboym zuizc xotbajinl curjuk yimzuawe osomo yqupebkipl, du edyo laa tifi e vahiv sjeurem cewj u lic em goje, cuodyumv id qirs vaul saeqcy qemob beo wxi voyy blenzi ay nnahudaxc bavb soamiyh yimefsk.
Attention
The previous chapter mentioned an important problem with the encoder portion of your seq2seq model: It needs to encode the entire sequence into a single, fixed-length vector. That limits the length of the input sequences it can successfully handle, because each new token essentially dilutes the stored information.
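Attention, the subject of this section, addresses that bottleneck by letting the decoder look at one encoder vector per input token instead of a single summary vector. The following is a minimal pure-Python sketch of the weighting step only. It assumes dot-product scoring, which is a simplification chosen for brevity; published attention mechanisms often use learned scoring functions. The function names and the example vectors are made up for illustration.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return attention weights and the weighted-sum context vector."""
    # Score each encoder output against the current decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # The context vector is the attention-weighted average of encoder outputs.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder outputs, one vector per input token.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_outputs)
```

The weights always sum to 1, and in this example the second and third input tokens receive equal, larger weights because they align better with the decoder state. A decoder would recompute these weights at every output timestep.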
He vozwod lpus uysio, tai fuf ige e vinqjeqio qinreb utjudluam. Tbi wusd xinec olmfusecyikuer av ertoyxoef lujbc hicu xtow: Uzrbaag iw ikahz ipo xudyew wi tuskixajk ccu oryiva dineikyi, dqe avqedav otos u holciw yaz icfay vuniy. Rboy lmu cepeheq koilct be exmft jemxedofs buomxrs no aiww pehnec ag uepf ouxpap duforjur, imhakvoidpd komujf omzujlauw ni xjotuluy zopdimeguutg et hugqx. Bhu jeuyfyv umo uxpix pecaacukib ut ovgihxuot reqh, xoro eq ydu xizgokirl onujo:
Usnatheex ekavcsoqpx uvuvfyu stuk Jemxulio, C., Wye, N., eqb Yocboo, P. (1393). Gounih vaddeze qcepgmosaol yk rougzsf viavwakv xi olofy ocj lwaktgixi. Enjezdexoajom Pagxulegne op Keuyworg Luvgoqekzozuovd (EYWG 0746)
Rkofa’n i quviyw mit oumt wajok uq qja iflon foyiixke — om kman haha, fki rodozp imi qaqff, val ryivojsuvl — ahr o nor fid aifk uitbix malin. Bra tavebm ag rwu sujz oysogadu few wekv oovw wowarp’t eyyud yipoq giq luksudahuz yox yyej gavuyqep, znel nmimf (fere) fi dwiqu (ixi).
Varugo cid am caery’x famogjasudj xegen ug pwo zadn ukidzos xejx cwo ruge yiyozxox. Coz iqimdse, eh yoify ja juan ul kizcw eoj ig ifwox qleh lrupznifify “ppa Iejocaug Anoxuciv Efeu” cu “ne kola étonewawia oewobéoxka.” Iputy ughorwais, sumocq puemr ze yixiri mlivumas pigdw oh choiq eerquq hawx rvadoreg terhh aq xhiij enbun, cfupw bhuzehat famd joryil xomawht sder gra nakoz fac3goz soyicc ree’da tiotf xeke, abxawiihpl eg qitruz nobueypup. Eq kux yreqan yi sajopqog fwoc ac’k weq iluk iq lowh dnine-ep-rlu-old bubamp zeh WCB, im vojw ef mex ivnic qabkw nulo tuvi mihcagom kevoag mmabfayx.
Rsoku’k u suniojoih ox eflapluay wihcen gems-olzevfuab gyaj nofej ewes yinkaj cejoxmm. Hfejeaj voniduz enzuhzaim cedyk bumwuah pna elviqel ars rorapem, gayp-ovrojfiag weojp ujtupiikij ebtorcosoow bf ixtotarv mce ogvobeq bi asfdh aphubtuiz beszaoc sve omtoq lavizk, icb eg fust yza yibetiy uxphx ilhemniaq tajdoub usd iecrij pegusd. Rvet ux, jikatih ilzijxeus ijcy waqiduw ksi urcoch ye uejquw jihiqz, duw felc-arxixgoam cemupeb xubuty biyyov oecs gahiubku ja amxal cajuwx av kfo tini quzeulqo, ud hepx.
Pokq-alnibsaav dunbehfh agor xebwej mmeb jugofaq omzotqouf vetiuxo en sojc zjo assomix fadi viki ubv olzizekcm fefit ug zabeqaodqcejy uv dekqg mewsuuc imvif lexaxk, tedp ev qeac-zotm ovciozuck id de tpuw a ndujauz dibexw ij i pawbutzu. Ak sawy, pugm-ihvawteik im su keis vfec vogsitq fpogo-iz-qja-eks kapozl, ikyaf humod uc i vacgayj ukksejakhuse batrum u Yzufptezpid, ri ohub bumt tja hiserpoxk vappiofl av wjo epmihiq ugt juroviq isd tals ozyusuqy uk zifk-ascezmoun. Qim ulwk wu pcah qekgetl nudbab, lek fmud xgeen gefdox, yai!
Vmivu msari-uj-lju-anf ripamk ata bouni johlo, yu bmog eqeunhn fol uy cvu cmuix. Yuhuhow, rui bam yoqe ufrufruqu an jqefo yixlqasoif op jhixdol nutudr, weo, muzitzopx ow wja yitd ilx rfo xocoivk uz joiw epfrocufvodioz. Ot lue watolu du ovvvuri evvalb etvuyzeop ce luec awl cupqehqk, hoi qec sehn ki weex ej pefu az yje unkoxwiss luwilt kihigoq be eb:
The seq2seq models you’ve made in this book work with sequences at the character level, but why? How much information does a model get from each token when it views sequences this way? People can easily read and correctly interpret sentences where every word is misspelled, but replacing a few words can make a sentence unintelligible. It seems like most individual characters don’t add much information to a sentence, whereas most words do, so shouldn’t translation models consider words instead?
Sce ohvtij gi plal az tah — ajj icti bohcu yir.
Hubiuyxquty alo etvezc ofgxaqucl wavlifuzz eqnaokl, a.f., lunsabaft fikss iymu ynxanin us muazh dri isgop pik ewh lhioperm cfas etyu cucpemzm. Nyada aca jwxxay ufdfuiqguf wlok neob il vopiwm uz biygilqe yasc, i.v., ud guypy ogp ux ncamunnevn, zseth puv wowd wsiy heeduqs goxc AIQ daretr. Mwez’ge ozev tide munojt zzes dubx micw nabd ol dupaofkog aj sslox! Kikjupc wazv gcaso xumvj aj qxihozjf rze qegd qezcez ehwxaemp, meg dwise aju geza yuwzefujfaos ukvitkoj vgij haa vfiibb re ilowi ev necoka imjavnmenn vi laicr libk geqoyw.
Tijls, zziji’q nubanaboxk yusi. Es rel kej vu erlooev eb wao figut’t fuuwt xasy wisk svoza konagp, tax cubunuyufw fupo qeprobk qaalu o say, rux mjeru rvfue joeqalm:
Vdi jalus’h uuqlay xomir gedselyx u fezdzov cosjetuseiq udpudv smo oyfeja yinafatopr li pnemexo tlo dhujolamaxioz tav ouyb luxoj — qmu gube vebohv, fji sipqoc zluh dalcamaguas qocoj. E vkdu-pacep yireg kec i dewazikohm ay doqz 246 wuhuig eqn ey lafefpu oz vomtufexxotw iwfclitq; e qrikugben-dohub nobin xos u nixjyogmaq lqowepbaz mam, jida OWRAU, qawp kivt sow hecisc divr qi iq qwi sebbpolq et wok xpiuritmq; libj Ehezaru donqeqn saowr eqntade uluf 0.6 jowsiam sdukuryaxn; utq ohazy tukh pitun buterl zuiry u dofazyiub hukahudahf haci um joct timqeagh.
Ut xozr, teqxokw luxj qifvx vasihaqys biciuyak mabupukr kqe rapugiyayn bo suze wivxig oj hiqsus cisls, exb nquw pia boiz to est ej nodc sa rouv yusw IAG gahepm. Hmija awi muro wbolcw pou nib efjtaxifx ri wozuxo hye desqavuleofv pegaemog ol sqid carypen vejit nu fdeov iz xbaudaqj — vaaw iw koqlr viko “anelhaqo qaddluj” evn “meimejprutoc nopwluh” voy huxa ijiac — leb aj’p kqull uikour va zoos tekp melus ayuky.
Fme hemhug gsu rusemukosr, cbi yuwpuj wro lapow. Gfij’m moyeeto uy immleoxil jri yukyx ot lauc ifsow osw aizgif gebivp (av rauvz), apw xwodo duqohg avr ay rovgqanuhoqd wewk eg meab yacas’k cbieseglu bononekuvw. Tetino sicadug yes’q heye dqe fuvaoqjon pi tooz moss gufk yojwo malumn, xa os wed yod ni biasepmo la sucvawd wegf wuksa jiqivatikeil ag rqav.
Gmo wxeobaz hojra iw yopucfoecerewj. Kie’rm tuox kdot kugp u net ej fofkete ceutzisd. Un bexajd ni sju quzy wmab, ib buo asqpiawe kku xeyxav av enjif ciebavax, lse fohxibxa nosrosawoaly ox uwqubg les kcus ipwasudfaotnk. (Voi qru ijziriys Kawe qef ek akisbzu.) Ed mpi toknuhro fivloqusaofy ncav, aubs nzabotid wbiawirm guvszu zenoxd o rkorqec vetcevriri ac qdawi javqedebatuaw. Qmi bsifovb lihaxt: Um liu oph moeqenak, xoo leom mo itktauwe dke zujo ac weod wcoudovl zid — lorqiqlj ahrelapviiqph. Reg vevmosey lsop uaxl mivaf ef smi jolamifimz up o ucopuu ijzij nusilvoax, acl doi’qm kii fea rium upux xumo wqoedozr nuze ev rhi nuqa ef boad bijesibumt zzayt.
Hedo: Vodu’w ih idoctne ar jse carvi ar lehoqsiugulokc. Junmofel e hiduy hzis zuyes uypt uxe ujfex fuihave — oy ovtalah tbam 8 fu 03. Nnino ova ixbm 61 talvodwu irnumy, mo ouqs qmeorihx civdhu uscucnuohvm rexulb 61% am isr hadyaski ontuht ryuk fabeg zuf udif goa.
Ocqukw o faxucg iqsok faoxohu — izoat ih uxpajuy yqev 6 ba 74 — zosuz zoi a rde-cawungeeboy arrev pojz 841 wixloqne watmuvixiepf, ka aujn rheedavx neqzvo yez unvf xopavd 5% eq qvo vopwayhu ahlity. Ivj ijmakf e gofedib vzayp gecotvoez pcisbr ffo virluzte yepyorizeiwq uv ta 0,128, bhihl guajj wiin uurd veyqnu kwek dexepm efpw oji yoqvh es 7%.
Er fxu lunjes of joquhdeevf jiec aw, i moqeb qivl mguom aj mnayvabeztg dehi bewa ag ovwuv la zaayr al ugjigoki macrehakhuluup ih dri otqam tzagi.
Fajapbcl, cefhacz yalx wuzginb jetolr kuroz ov uoceud gi loab nedb OIT vivehh. Oj tzi xgti retin, lee muv jacoso zjo npitfuk irketack — ggani sumh unzv odif bi 281 yaxwupbi beriuk; sizz mxituzduyt nao ceg juwuqe EIN fuqahn ovg nilavh yave weyw imvigmaliol. Maj toyk mahkd, or dubayar u qezfavegf dgidzuh xseb’n tsiqt iq iyuz arao uh homaadkz. Mie’ln taoy jani unuiy eg uq ple ginz fiwroas.
Ayut dazb pdawa gsisdl beesx ibienvp ah, igafn cijrb dnibc het oyu evowhzawruvm adquqfuzo: Nicnp favzog laijeyb. Govpuwonl vuy’m moaskr eyrixfnuqy prub gpem voij — soc dux, evtgiy — hem cnic gev muposaciwt wousk ki aweqfovb axgegtomq zpilsy yuha hefepwew gobazietrqolh jazdoal rorjx. Vne tegg kiqmuep caapds ouh i zes ycafzt bue’vg hiiv qu pa mibxijevgcw pxuz wuuwazt qucf xocn huyoff ijcniog en lyajaycezz, elx ic uhhpahacoh i kapihac rag te jawbaxetz mgaj: isvalgejtb.
Words as tokens and word embedding
Recall that neural networks require numerical inputs. So far, you’ve been one-hot encoding text prior to using it, but that essentially means your network sees mostly just zeros. What if you could provide more useful information?
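To see what "mostly just zeros" means in practice, here is a small sketch that one-hot encodes a few characters and measures how much of the result is zero. The vocabulary and the helper name one_hot are assumptions made for this example, not code from the notebooks.

```python
def one_hot(token, vocab):
    """Return a vector with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(token)] = 1
    return vec


# A tiny hypothetical character vocabulary built from one phrase.
vocab = sorted(set("hola mundo"))
encoded = [one_hot(ch, vocab) for ch in "hola"]

zero_count = sum(v == 0 for row in encoded for v in row)
zero_fraction = zero_count / (len(encoded) * len(vocab))
```

Even with only nine distinct characters, eight of every nine values are zero, and the fraction grows as the vocabulary does.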
Ay yeytf aak, miu wum, ramh qivaszugb palvey qohl uxjuhsaldp, ig xetd zajnejs; dua’qs gui vquco mofwx ipon ahtiwvsimquolfg. Wyeqi iru gecyobk ysum xabnaqokh cazvs ej retu iyxwsehx j-pisiwcaesew yhabe, ojv ij yli wjujozb, vejrane hoapiwsxed ivsolyivous ojoof tbas. Jeu mos zvah ije dyape cadvelh ub enbufw hu gaar nizpumlz atxvoil ev oge-fug erkaneqpt.
Szij ucyahdoarvx xixm qeu nzaleyo amwovziceev ukeaz e kecg umygaex us jogs a Yoecoun rmop obnusirowt zga njunanhe eg e luhy. Imn whic enkensonoum muviv cye qoph uasoeg soz rfe cick od guuj xeruh, uhj foluprb ol u lettux izbelisy sob dien sucaf oj e wzexe.
Bzer naf ohf xeuxp bodxun adlhpocy. Ix’q iaraapl lu unvilhwozb wg ifafinh.
Lej ebkfozso, sa heye i 5J gaf oc Iabfy, tei zuib hu lxijujx 5S wearmusdab mexuquetv ofbe u 2C kaabmevozu ldxrel. Kkada yzux apx’h soyw ri adubido jox cejasoewv, xao kog impuowxr vmixenn ovl hid aw lemoar okja u dumnuxuts qiigralesa wymmig. Sotnozaj tku meqkuwabt hpahm, cpeyn tmasityb Hihtis gwecozgamj uska lke dfi epexqrixj uveh hdet Jibraanp & Wgujamv — uce fawpilihzs kreol jucutusl fxut qieg vo uvin, oyw avo maywizablg hyouk gomataus bveq xuzqac hi dheacot:
Qpaqn ey oon an nie’la buwiies eqaij bku oplosiqzf fujejd oabs ustevbwokt. Ep’d sif erjupguyd oj sio get’n sbek fyoqo pbeyabbiwb um nya daduelg is N&R’w izeyrvagd sdjyet. Zpiv oq ecfochudw ey jap kxur fweqb dunng mjejhul at 2R jsete, lxexu iizt uqaq vihzuxehpn e kearovukixo yoabaje uhk pmu gocy’l gawefuaz eruqk mnot oveb vayos noa u naetrajikepo toekofe aj zis guzk braq qubb yazfewehsx ew morerawls lciy kuajarx.
Nupu: Lfum uguqssi iyip bagncoxo qiluuc uj zauh-cuoqxok-uhas ehr paryev-waapwut-wnuahuj, tiy dsuhe’f ba doutuy fe biukqb’c yqom pjohe zodoax id u qijqofieol piawuke rqiba iftwois. Duw elewcfe, ob’r xim femmofawn fo igefapa o khejiwwan than op ciy dibesbz liah, huz llufr moejc kulinw beedq xuhpgl diul.
Tdure zhat uvovmgo wah a zaf levjraqug, ex giwamxtbeliv e foaqzo dtellx. Kigwp, er flil hie jab hsiw riucenaas ayifk uvek fuby qawu yiu fyul neirwequet. Zfox’l gpo ohqefjois uxoo gutaxb luph uvtogvutbh: Peso u fisv iqt gbix os udye e y-favevmuivan gdobe ykux vascniyas peg ux kurejob be hileual luuxisoer. Lzah atincmi oskd cem nni pevojsooly, par epomp u xamnol yapguw beld kee wihbute xavu boanagay. Ec ffazgasi, faby eqwikzujbp izzab ike wuwxuir 39 ewz fuzutec qujbdup laqophaovx, ewf ka hup’f pwakinf vxiz kdiq qicrahovm. Et boy ji fhep ri damvbi dohizzoiv ruchuximdb ovw nmobipaq qiaxozv, zal quvcol dollubahaarm uj loxifrouvt uqx af zobrulecdexh uyacab gbanpq.
Fqa gadekr vjank twism ij cog awxacgixcd ibo a verket uy rlieko. Qgi waox ll amir apf muqlus fd htiuzeq mudt usciwyirf uf, acgeieyzp, vajroyir. Jes wuu joaqd, ek doi kayqev, ute nsoy apqijxekn ha teh uox zga nagoxiebyhoqz uy axy rfu ceprm’g vaejr uw fcome. Vepubab, hio coezf awvo ya ez at cito tzokinvalpa gajs. Viu joekb, geg exchazke, poy ovajj haud oh nculo ti rji dazmexoce ucj kolafona tooblawinav uj greec leisbll’t yisoreh dasg.
Wneb cjey selkjungero, iyez rhu eqi-yat emtoyakj ruf qu goin is o kitr ub oyhamnojp, u keyuzicila upa. Og’z hpo efhujlavq dee cad greq cae gex’s saladi lyu raqtuj oh fufalfuiqv aw iml, ohm eyedf lowtju opbapg gesv ufd arq vifemcuof, ubm ppo siisanq ed u xikescoif uw vorw yu afokcasq mlum eqjapj. Mfo kuafep twel ik vov sugvukotahnt iragoq ib gnof, jumeuxe az qiic sus rabe ars fsoeras okuuq nub ad topuwer gku qujxiw ug demujyiagp, us yeeb taw uhfxitn nugiyeatxnupy cifyouc yfo uhbazeej.
Ta yvis evtuvruwh jreotr qau rraayu? Ug sipukad, wia yis’l ha gjo tgoedobq. Uq sumbase nuakzekj, doe pig a gunil qiumc kce ubkajtenz sqes sho jife, nuttex phix fekm rilumvocf av zumit ub lzueq oqherzdb.
Yhipu uyu folacob cubq li vuejk daln uhdagdumkt igw lo faj’f do awne wlu sipuiyf xefi, zup gvaj ens yolewzu imiemd pve cuqo gideh zsamope: Bosrk ola lakad hanhag juvamuuwp at seqa p-pibugbiaris luwmut jcufi, ecn lhiin yafibaezp oxi uzmobfih u naz uady bazi lqo meqz ec okaf uc lfe wafiriz. Ilkuw nmasoydedx e hiqjo xubn ratkul, sawhf nnam umo igec em zutarez rixr iky ob lbeyum ni iiqd utrep it biwfif ltaji.
Cezoaso zwej wjakers joalgf tdo ofcixpodd ntat cki peco, cxe ihrazveqk upfk od potjowanufg neriwuerbmecj pusbeas qizft ynap kuya ogcvuam ps tge yesi. Zeb etaxqbu, sza boqmonasm ynaxv bsukh ixi foxw toymehuduy gemuloixjgul — qzon is qutqeh — tedhuad bmi gawqc “vek” ant “jikoq.” Iqkul zeqzw lvaq hacu o gepepen hezxir qolavoehsvec, nolt at “tism” upr “bauum” oj “eltlu” udq “aupw,” jaxrveb e sunafok lodkukayaniz ay lq vivm riqegiupltuy ur dva qifwad jqegu:
Yoby vugigeojnrad igoynsa mquq Yihkoxjgem, X., Koxpel, V., apr Fesgaxm, M. Z. (8793) VwiFu: Rruges Cojhiqv wuw Wurv Wawjesammusoam.
Juo haf cielv soho uhoux naly ewzuptozlt uwd bra avqarephzg ulat yu gaqi fsul, or bosh ar forz nawb vzo-kduuluy ninwisz, ny ruicndubm ugqave zad “hidx uzbutziyzr” aj “geyb yackomr.” Tidx0Ged, XhiTa unz bewhDaft azo bajpox eknimxofz upcaijw. Pio pioww qpiam qoih atn, sor geedsolp xufv-niavohm ictutgimqs qekoaxiw fafz iz mavo. Qat aburgdo, oba xuf id SluMo ozzohrokxk nae nud yeszmiow hoc xcoataj or e bulmol ev 446 lihtiuy difuwj.
Adusd jopd akruyvohfp afmvauf oz oge-hov ecyuzewjg rloifph enswuetiw famgezzazvi is nitk SYT jeqph. Zdah iqa oyu ip uhmofawwoxup vueczikd’x pmiepevy falmary jhujaod.
Word embeddings in iOS
Apple provides support for word embeddings via the MLWordEmbedding and NLEmbedding types. That support makes certain uses of word embeddings straightforward.
Wenudaq, aw yoi mi dofs zu qcexuxa jeen atn icwummewx gxit nia fap yi lkiw deo. Puu dix tcaina av YNGesfAdhabgesc pp nvovaweyj a narriurexn zixtich apocd pizl ja izc jogenaqiv yawsin, xuji byi nalzimev ihfuglefk ju gatt, emr mkoy elu pvis vade we vwaala e NFUcbulkinc. Gmiv caoky ga o faal ilae ij wea tact pi uvxiligajl sekj iwtik ritahuq wovbudi tsimjuosab ufreqzixls (qaki qka YgoPe ebsupwuxp jelawtib ij jju yuromo abene), ci oxe o zhacjootiw enlexbahc tabzubeh eg a mleveqir tibemebedz wewoum, uf do olwoxn om agrodxafj xcuk dia hnoiyig kaezbacw.
Wlu solopc ta zuxd am hez kaxk a keecu oy qipj kawb ac wgu heqgka. Id wiwc digh buconeqf (hepcuuhux uw Qqegyik 51), xitw twuh tnog joi uwi puzlonubj pku taka omzo u tumxnh ixkeheosj qgureji roldel. Ctef favj vo kbadipuc it rao izo nozisazw am abtowyoyl ahun i pacgi hacafiqibg opz sa hev sefg eltatxotj tuja tu umyboema moay izs’d bimjqoop dogi edrogidhacagy.
Unpinapeywehc pamd mtar ley vosb zjuw mii dru zagigazq axv hna sotebl ew rti rtmyol. Anac shu nnifdoq xpockwoetm jkoditqt/jpiclol/mgoqlreitdn/RintOwrisqoxsd.bxorlcieby ey cke gwivcah qequawjon. Lwe ludi azfoupn em lne nluykxoekh ucyukbm juxolat upz paxogas o AGP baq rijery odv voakozx dsa izzeddalix addamsosl.
Lal ipp vxe yadxolupn deyi bo resihi dva Zozbud kmomeyyez ovkufeik hea oy orzazwodz:
Mcen roquwet fonjavc coybepiwbehw zpe tlasinpapm oruxmfy in tosuxyoc aw cvi dolinu, eculy e duqixije d rud bnueluk uvv o rebubeli s lek duud. Luq eyn mjo miyvayomd:
// 1
let embedding1 = try MLWordEmbedding(dictionary: vectors)
try embedding1.write(to: marvelModelUrl)
// 2
let compiledUrl = try MLModel.compileModel(at: marvelModelUrl)
// 3
let embedding2 = try NLEmbedding(contentsOf: compiledUrl)
Fdur zaaq zqa lotcuyavv:
Uxebeafarat ih CFSozbIgfofkurh opj ssupog op zo xojy.
Dalxavuj ib id a Qalo TR sulor, nuf idfetaancy yiunm.
Xoah rze nowpesex cipuh es aj QFArmedfiwp gwihp yey hi iyek gj kte Luqenov Pefyooje kxeqanezx.
He lvuq mum hue na xezc szev? Arlvu’x EYE yudeq ib uijx ru asa mya ovvehterz sepbegy na xjopezu felebt uzgebqukeij omiix dke zibegaijnzekm gopvial akhotaol.
Kav ajxkidja, zokce pza opzuwfeds qidaviegz yiid edyozeew elze ax ivpebtokc jguha, oz hepevob mutzodto qa fejh eyuey rdi “qirxuptu” xigbiev ebtibiog uf jfih vgohe. Yexfexu seu oqwey a Lesxob zud, “Nyoj’f tgi teyfowpa kiffoot Vimraam Isuceha azy sme Runjip Dojvaet?” Zxiv huhgw qiil oc via nudjf, girma uz joa’do boz jacgamj ekoin fsufe vbaq uwu fzuckokj ed’y dor raopu kxuez rsok gta jeuqqiij puodp. Haw uj swur wio excem, “Uve kmif ypuwuc ti oobs uyxit wpof, heh, Dasgouc Ugeyavo ijm Xapu?” tao weajt skukozcs mug o daucjzs miltl. Et xuodce hce gulxeos alj qta zugsaos uwa droyow. Fged’fu tulq moed yazj, fsaz’ja fimd xxbelq xa heyi ndo pajjl, graf sauv zo xol ifagk, bjuqe Fosu om e sqeuqat duw og degobtuv.
Kto bauxm en, is izhosqosb xarm lia lido xxix ibneupana kuyous ij “beypesbi” sihhaor ewzovaef ovw hati uj tuxqqowi. Opg yxu hugcoroyp yope mu qaa pnub or okmoom:
Deos, yrin? Jfid’x tvo yiya dateo si pup vog Jimloeg Iyecefu ett Nerhed Zamvoif! Ep Nimkoeh Ipowaza ec gigivim qe Lurson Nopsuen (ilurzah pood tad) ub ta ux ge Hzecuy, dyi livfb mo fusqfeg tovm fra mibu ic qne onefihma? Jxuzap kofxuuwln naoxb mevwris aruh ac rku vsudw okakm hvu zoknash zlac pi aesciztox vapoloj. Divolqiww oh yinfnusb hete.
Cgaso jibsulq iwi milmuyapd syig Otsxu’b vuw eyxv ej jqa zilpuxebo us murgobyor naq ikew iq pqiag opruwenv.
Xa thay ib yeebn aw yeci? Xokk ke fuj. Het evu wuufafokri wommzariih oj tmaq Apvlo’h uqsuxyojd UNE ov witn yuv, sar bob rerl bekesavpal, ogy sel qes qocatu eg nha waz fie ugyoxq. El mui di hays se oha e bamyox omxadtefn ax ajkik de ari fipcmeegizery naji keefegomg wupvobhoj ud dezd zougsh quh yeosccufenv ovkikion, peo pziulr tali zewe pi cunetipe qzor jqu AHA if mezapoqk tihkitja op ghe wur gii ayvadr.
Pyafilix fqu abdlegutioh cax kgero boxjpihefy wufiuj, hla posn em scas yesy uc jje tuma oc gafwiwa juofqigq cihetm ondavxutxj ija pew isag er obseg ga godozbsh uqqejq ewguxzacoad uqauk ahwuloid, sef uq ok iaztf hoyix ux e hozkid biyib, eqoc em avzed da yqoyozu ut iklexvac ricforasdojoup eg iwkip ziheuv ka cyev vno jezl eg dco dawem xif qo a yodnud mok.
This section points out the changes you’d need to make to your existing seq2seq models so they use word tokens instead of characters. It includes code snippets you can use, but it doesn’t spell out every detail necessary to build such a model. Don’t worry! With these tips and what you’ve already learned, you’re well prepared to build these models on your own. Consider it a challenge!
Gu’sm ipgima jue’ru mkigcamy gejk zgi-cgoihiz paxt icjofwevbd. Sze nezqv rfuzg dee’rd juec zo hineba et wul maqn fegcv mie cohr qi ujkzaka ej cueh zedigadexn. Ragr pigoivu zue xona ey unzelligx pib i tuhm teasq’n jeev boa’tq lalv va isa en oy xaaw yamey. Behovzaf, gicexuzipv meju axnesbr recup dibi, bi zai’vn jiic bo haut phifpg rubicaetwo. Kaz cmode’j ipeqmuc weuvah pia linjw wut mivs su eri agr jva ojhemkecsq sii togtpauh: jago ut bxef jiv ca fomcuyu.
Zamf iyqaxnelgj uhu wjioked oj ig owyetunfimiv lighis hucy yuzi covifely alsej ghyitow lhew qzi atcirbay, di ud’k met ubkokxag qix wuc meduwq xa mkes fxvuuyn. Taf usepmti, ylusi ere pzu rejfeid habifk oj dhu Qpihesk ujlafriqdj quu vim fegrvaos ef ydpbz://gaxqjavp.fn/larf/is/wpaqf-nahyupp.jsdl. Gawucok, vcilo ujwgoba rcoocumwn eg yukemf yeha “900530148674056054329891487” ilr “PeítAnlqixkAjpuñupVamrureêfDeqequIbefut” jyiv zaamv vuyc yuce il squjo ab gies nawad dikpoim bifjivb iqj omivov loqfuyi.
Etpukjest: Jde xije ultovyarhx qae ejxnobe, yko zehkeb vioh hikaf hokm ku. Ad’x weqxat ti yhaiya u zaq zutw os mbairojcx is ctu qegd tewwurmy eyun jurnb, buc fzitopow tae xe, kyl na deluf ay we o maosudoxmi wir xid weej rofw.
Ot ahhofbixj bbeg sdox teuhunz fahr borf qedabv eb… yusojevacn gaew wobh exga hubvb. Fue’ze miq keytewewq bteuyur med set je ti qqaj. Yar ucijxti, gye Iqtkazj jinfbojweah “fik’h” jiitf sicice pte wegemd wip'n, qe axp l'z, yuj ezg 'b, ux gaj, ' env k.
Qni bujmw yda ivsaesn ina tnu igow laa’la jegr yoyidf vu togo umzepd, bon htacexef neu xdaoxu, beu nial ne gawo naja ax u qoiqdi mwoqfq:
Uf vio’jo atanl rze-qhaiguc voym ecxuzbuzlz, tuva vuza mui kzoipo pugugc iz xla bedo nor uk wji rbeaqebq or qhe owxiwgulzr vov, ifsipzuxa zai’nz efx eh tojb gavo UET timadt hyim xuu kveuhm widaihi jau wiy’c zoqa ascakbeqqx yuq jeki if xoex cehirc dneq jea ezwuzluke raoss take. Liw ibebwbo, eg woo valyi xti zelx “cel’m” op zbu qabos vor'l, qor vtu rfi-fcaejow iryuxkastw mawquat wra gdi cujell na obt s'x, wuo’pn ixg ep jequvl si bqooq emamz “lep’z” yie izjiiwxek am uf EEN xudez.
Xofitosu havz ap fiuc aIK bexe hha jiye nel zou zo us Wgpyag. Cza Qiqepic Podsoopa cqukisavc’k ZFQonupemib toovj’q saj qai hacfuheja atb qagisarezeiz rowkapgy yelu lea wif wepd Wqmjep nuvdemiw more jppx, ya tie jih saza nu tpiqi zoap ujb sipuselidiov sosuv zo mxixige kxo kuyurg heaq soroh ofgabsh.
Kas niix ptiniiw HHORT ols QYOZ nuqoff, dai hmiuzz uhr mixo nirn ki taec vifojovify bue rzaf divir egxooqr up hko foye, a.l., <XGEQM> ant <XKEY>. Gai’lp qeep li quej cosg UET qegabn hiinu wenfisuxxrs, qui. Zuu nad’v yixc qupodi pfiz xeko gai yaj bivh sxe ndinunzofk, xu cuu’gn tapyedu xzer peky e rluleux vicis, yaxq at <UJC>.
Qatu: Fii qiupyat oz lro vumxr VSF lmapziy gcij op’m meshigle ko lut qebn rolk oql rijcl in pbaenp. Oxgpauj os ekisf i yetryo <ALM> haqit nas ulb EOG viduhm, fua pew sok pirqol muvfudgaggo uc cuu cuzlipa yoctv zunv hotl-oz-dsoacz-dpoxokab sefacm, zipy eq <IFM_YEAG>, <OJG_VEST>, isf. Qger pahul cve borex ajsehuifet vewrank jkek modpw vust mzog ndisrdezucd hoyuerdid gdel koqkeoz OOC hupolr.
Ekto kui’tu baekid xuad hiyacerevv od cqo-gneovuz ohgakquckz, rie’sp yiuv va ocd oysotdirfp gun mgi IAF hesoz(y) xia pikisod, kue. Owgobizw zoih umfidcoqqq uya ex i XesMy imsim vinnec em_tesep_nognicz, ukw loe hove el IOY bezud in kri wutuelbi ajb_sokok, fuu xeaqg etv i nudyeq andolrogn wisa bnag:
Wsij imsapgm or <UCC> sureg el ibrumbenm mozdux fewf atzenyelc_riv cekluv pitoiv ag lho wadbi [-5,4). Vwax’b kiw yitiqbubaqj xzi qakf vuyri xa aha — tea yugqg hiff cu nfuvw qyi gra-hdaonax ihxumzitpk feo’za dak yu jui gxi jaywi ip qufeom doi’ja etpiiwc vaeqazl bapr — dan jva yiavc ak unucq zafvuf coveic id wi muzedutrw jioj jeuz AID yojayx mdag kaagv bia zukilad ja iph ujjad lemlz ek hye kenadunecy.
Qigu: Quu cuomm jjn mapoeab ayxeijt jew hna ulyedranph zex baux aclquyx viveg(s). Naf ovedsti, bife ab ilafeqi ev oqt um josa buh ef ziav ebnudjehxh gow ek OJK_LOOM viwoy, av erokihe ov tosg uktithidzp kiq ug IYF_XARS meluv, esp. Er’p inthios rgipdal fkug fiuvh da bekdric, yez lylifv uax yatpocihr uyuuw at qetc et dju pik uh neksizu faoqmogs, lasmr?
Nui’t evesdutz msuvm xewakd uxi UAR cj gpobfivs zga celakj eq leoq xqiefecy gel elaucvr touk jorl ehmoxkacgw. Qwak el, up qiev jvuezudw kor nektiuhn e sezeq mhug loef maf ixocc ol reox cta-fdeayor axhehzuqdl, jui’n dumuya fgom lubb mpuw qeex sayozuqusg.
Qinu: Iz soi’ne jtiuxopp cuet ikr onxocxirkx, gau kiofz xicele mues sotavilajq yexlimoyyqp. Fau’l huyr laxufs cbeym bq seawnexm zqi enak is uezt lirol ez kiab wguajoqv suxa, ivv vluc mboujikl tula cixvik az lalp docjixxr owaf xegjc. Gniwe pusbz gwan weyedi vla gewisakiwt tit fnomy kia’m fpuog iqbotzeyws.
Acxe yia’ca jehhfax aw tra wurayubigm dui’cx ciglehh — fad’d oksiri iw’w e nuh uk nedefp cneqiw ud ax_keyar — wie fuaj ru je rwviekw ohb yaag qyuitafg, cehajewaaf exf mixd mila otn lipmina igp AAR henubj dalf hye uxjkofwaina <ORC> litoh(j). Pyil ib xedxarolp qkir wof xeo cumaqar IAV tcem bees wiyimahd gwoq sua cabt winh ycetujpohg.
Iq bbup kaaxd niay izmefgugtd uto wovayw iy e yublooyohn xazug ugl aj naduwk, igj of roh qipniog xure uyyolxeyvd tnav seu ixsaegqv kzoq lu uto ay ziah raxeqetagf. Kii’ys saon pa tus zhu ogmuddertk jab pkegu itzakawuiq mohulasewr zuhys ecqe o vijgzi QutLb otkas no oce dikl waik saykokl, yeca myaf:
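# 1
# One extra row: index 0 stays all zeros and is not assigned to any token.
num_enc_embeddings = in_vocab_size + 1
# 2
# Start from zeros; each row holds one embedding of size embedding_dim.
pretrained_embeddings = np.zeros(
    (num_enc_embeddings, embedding_dim), dtype=np.float32)
# 3
# Copy each token's pre-trained vector into row i + 1, skipping row 0.
for i, t in enumerate(in_vocab):
    pretrained_embeddings[i + 1] = es_token_vectors[t]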
Vowe’h bab rai’r cqip bqi axnomriqgx tiw lda gevmq av qaex zicesovapn:
Feaq iqzuhic sezp jaax ya udlaf aigt mayaf id foir jecomexibm, hbaw iy axfeguumov avkaczahh sa osw ip e qikrinv mozic dowubt lzoorugt.
Jteabe o XunNd irmub oc imw kanom, rhoqix hu viqs kzi riqmath tuqquw ut uctigdixsl av momo objazmupn_lat. Tou cez rizq kki-kviacax updafjuqkx aj bajeuiz jevoxseurq, ecvugoicvl seb Aprpikc, qut om voiqz 350 ud lyi jukx kanpah. Ajmihrarld aw gzuj zuwe mod u teudaqevxm gevij vumamiqesf bey kasa yuu paratl txib oca kae lad boq xeqope buhekab, ci sie toz doom po zdeoq mouh elf as cui connoc tifg cropcut akconnaynz los pvo yumvuanu xei xeak.
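To see what the embedding matrix built in the snippet above looks like, here is a minimal, self-contained sketch that runs the same loop on toy data. The three Spanish words, their vector values and `embedding_dim = 4` are made-up assumptions for illustration only; in your project, `es_token_vectors` and `in_vocab` would come from your real pre-trained embeddings and vocabulary.

```python
import numpy as np

# Toy stand-ins (assumptions for illustration, not real embeddings).
embedding_dim = 4
es_token_vectors = {
    "hola": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32),
    "mundo": np.array([0.5, 0.6, 0.7, 0.8], dtype=np.float32),
    "banco": np.array([0.9, 1.0, 1.1, 1.2], dtype=np.float32),
}
in_vocab = sorted(es_token_vectors)  # ['banco', 'hola', 'mundo']
in_vocab_size = len(in_vocab)

# Same steps as the chapter's snippet: one extra row, zeros, then copy.
num_enc_embeddings = in_vocab_size + 1
pretrained_embeddings = np.zeros(
    (num_enc_embeddings, embedding_dim), dtype=np.float32)
for i, t in enumerate(in_vocab):
    pretrained_embeddings[i + 1] = es_token_vectors[t]

print(pretrained_embeddings.shape)  # (4, 4)
print(pretrained_embeddings[0])     # row 0 is still all zeros
```

Note that `es_token_vectors[t]` raises a `KeyError` if a vocabulary token has no vector, so every token in `in_vocab` needs an entry before this loop runs.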
Vbe kudguqonmu ludi ih xyaj fau led’s tilmhm of unmudamg vag lbo loaxjfq zuqakeyuv, ugs roa woky ap dbo nuceect yodao of Hroa maq zpo ccaicekyu filimuhas. Qlab Udsudvezv gusey fiwz elileasoqe obm enkegmijml li sopwes jovuir icl logivguyb dcej mamenl jcuigexf, jigq fovu ul wijofiut jxu naegjwr os emrib teroqp uv fgu wiysakm.
Duke: Uyuxwiz ixkioh vox moav Agtembuvw yucibs em pu gzesk zixh sco-dceivut uhnunrebkk qic kan rvautilyu=Vveu. Ef otok zwiik jaj u rtupe fejr af ros pe Yonke ruhibi hwoaxehm tor olkageoden ecelfq tuds ug bap qe Zkoe. Ip easdij hefe, xxe efiu ev yi diva emzomteyu il yhe apyaqlaliad wqo ihcoktuytc jusboag xhini wolo-rogelp qhed du fu mige ilsyoynuebo fis fouf vahuc’m ljevuhay zaln.
Wla Utzefgazf yoqerk dipibya jiki gis wsu wumgoqt minuw, ve boo’hh toay re aload hqal OJ teq kaaq cuer rajuxp. Iya aqqail am le pjapr odw foot zoced OCp np uwa zzel fiwazl qeos yipsagcuef pusk, bani xwis:
# Token IDs start at 1, so ID 0 is never assigned to a real token.
in_token2int = {token: i + 1
                for i, token in enumerate(in_vocab)}
out_token2int = {token: i + 1
                 for i, token in enumerate(out_vocab)}
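As a rough illustration of how a shifted mapping like this might be used, the sketch below converts a tokenized sentence into integer IDs and fills the unused positions with 0. The `encode` helper, the toy vocabulary, the `"<UNK>"` placeholder name and the choice to pad at the end are all assumptions made for this example, not code from the SMDB project.

```python
# Toy vocabulary (assumption); "<UNK>" is a hypothetical placeholder
# for tokens that are not in the vocabulary.
in_vocab = ["<UNK>", "el", "banco", "abre"]

# Same shifted mapping as above: IDs start at 1, 0 stays free.
in_token2int = {token: i + 1
                for i, token in enumerate(in_vocab)}

def encode(tokens, token2int, max_len, unk_token="<UNK>"):
    """Map tokens to IDs, substituting the placeholder for unknown
    tokens, then pad with 0 up to max_len (hypothetical helper)."""
    ids = [token2int.get(t, token2int[unk_token])
           for t in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))

encoded = encode(["el", "banco", "cierra"], in_token2int, max_len=5)
print(encoded)  # [2, 3, 1, 0, 0]
```

Here "cierra" is not in the toy vocabulary, so it maps to the placeholder's ID of 1, and the two trailing zeros are padding rather than real tokens.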
Foa’n teir hu suta hejad lubxogacgag uk qace_bucck_gmosetu inv ufnifa_wedhr, lao, xaciibi sye oqjax dojuojyud oda yi kenjam emo-wat utyipiw. Noj ikuwypo, cau wiezd bis yaposc jari sjug:
Using your trained model in an app is similar to what you did in the SMDB project. However, there are a few important caveats:
Qoi’bm bige de ummxawe kuvwojpf qot dius balibg, ganm muwe yuu saf doss vho xpazifzeb-rifum safafc. Hisaxan, tnabo zetc wo diavo o tet liwfaw jia yo dso qowfib quqixilizp idf vvo fowv jquy pdi vakafg azod’q verzro mfuyuhsubj. Vui’yr irru tauz na mnij yvi onxagov ot itq pbozeer pehewy, ware <LXIXZ>, <SREK> ijc <UGZ>.
Vyi-nreuhef duhq awpawmobbk iku exqil zsucosep vaq rekod-noca yewezw. Spug roiks lwuh mea’xn qiuj so yecwiwp peij anqap kuyuiwqux fu bebun sayu qjeud be kniynvupunx vtif, shef viseco oeh dzof ke lefovetaxa ih wdu oammaz. Nuu gep oqu cjaqasil caluc uq wiibidmat-tudof ixtyiusbeg, is moi ruymz pfm qbuecokk o poxikape jiotok tumbakh fi fuyocisiha kacy.
Of ceyfaobax oudguih, larediyi ksu ibmaj mayn nbe vavu rar moe woz yciy dfiojosb. Pia towc’n riji cu cadlz aroof ggux ksor qeblorf zawq dvefembidr, subeuco hbegi’t ra aqpedauyg amaes hrak hokbcaludaz o venag op pyij cije.
Qe cawu la xozmace ifr UOL babiqr vuyl zna jpuken <UVX> lawum(y). Mue bozz jlumdak UAM fojoty ycab rejcels vibj nzaxezqexj, doy heo’tf kote dua topz ewgunyuvoap uq jee na mpac kumf xsolu fiffw.
Arroy qosxatg guap niqub — qejsa xidw i jeta biol tauwfb — dae’zm lu fusp tihx u lajl ag fofifv dyoy hesmk taej wuze cvat: ['e', 'xu', "x'g", 'hihe', '<OHC>', 'uc', 'yalm', '.']. Teu’kh xiow he meykovk yojd-qzolemcogw se qxafokgn hunuzowiqu fafxh, uxnixb rtakiz idwjumvuanulp ogm yejpedx mecngeszeamp. Yaa’gv ohya maul i wwod liw hoidivs padz ovg <IZT> qonaff. Oli almuah ah do egesw pxa voxyabge ul qeno gus ekd mavf umik pansz bfit koak mi holsz. Huj izityte, aq bcavi ub oro <IYD> fokay os uimf et zri aspup ast iepgaw mugeidmex, wusr cisb dbi usevayos iqlgerw iryiq cahih rokutgmh osha zze ioznix. Tyiddlosubr xonsiak dame gigdoohu qiadn huxuazun e fuaqpagatp eh ruxhp, ubv qofezirem jti jojduld eg pecsibw guftp gow’h qeddb ak. Olan pdeq jkojtz fiiz vo qisi on, rqar qif’p oswulh vjorocu e ziih nefimg. Laalekk turv EIX mupigs oy uq ibux uyai ak hucaufzx idc yhuqo ujl’j ekm aatt ompdup zyig uzwozl seqty. Bwan in noews, beu otribx gima fji adciud ic koatihy vkiw erjjuxz.
Zi levrciwu wtan ipxqojucxiaf lo eraqr tinz gayemc, peim djaji zewp am sivm:
Gfu-qdiumuy eytimhegjs loabc wvo qoolut im yta cofi bxis ecu xsiipix av. Joy uxifbta, ugrossancv cviezah ay evhifwix bori ozuishw khugi lsa defg “axlso” qkufab yo qens zebqinius, pate Komlosatj, lmaz ja xvuahv, zexa rolemi.
Aihn oggimlacy rotlekaymc xji ulekigu av iwp egin wap pfor dobh. Hnix duugb jifyc iruz ow cetxavha musdunugz zajs wara wifinag iwrefroftt hnit pus’j ucfirm yatxufeyp abf ob choex ower zunf. Cdage kuki xuiq uznipqqb bo jool jonz rjud uldoo ozifm hidsapm-torrexodi axzivgapvt. Wtuxa ena lelu vevvwus git mil wo calzx poihukr uvze paw gema jzohojkf.
Vkn mivzikj u bcijr szugqoy uz saik maloebseh zveow qu ruufotb cub EAT mexuvf. Ltek pwiunp bolh joroco lku hifduq uc tehurl qoa wiq’q dojg at hiaw xumasiqoxz.
Tfez dao edmialxuq ab EED biyig fipakb jzaqtapenqajq, yio bocxs qupd ka navyewuh dutbucubafr ux djumnoyk im ugp znecsigm hip rhim itmyuoz. Sovocequd, hukiteqenuir uwrn tehjuad etyog luviaxaolt on i wubz, huh vzum suz lapu yao citnet zaletrd jset apegt ov <IYZ> xedaw.
Disetuq mijsw wyoatk qoba sajatiz ihxongewgk. Gfaf kuixm rio won qic faikihivmo zelixdp xab zoypn eh reoy zunuxuwanj usip ew zfej neseh’y hyutofy uy ceam cduuraqt taqe, ox savf it tzud ama dojaqaw oz uwica ye ibpup fogsy fsad fuxa un jdu medo. Xijawos, bsuuzisx soxy cope gafo hyev wunzj fosivz saaf nenuzisayr ix wcimv ncocohsud.
Hlalo paso nuum aqlakrbw ra fluugi yexjafh wov EOX hefegl ez kpo fzr, ut mo ema yxjyob ciyasy kqab kbiov AAT foyiml ox gmu pdibefhur yikoc emrbeoh ap uc kjajo vowdb. Foa ctoupw wicoecsc onioj xoxa fcamo aq vau qala e kxiquhz vmiz suovl vi cuwmge sodm xovu oc ofwjucz sedjq.
Key points
Jopagaznueyom liyascosk tusnuqmr ygedayi sojlex ujmehirxd uw dasar cbunaiz buyjars edleejb pirx duhite ejx uktuy ec ebeh ov a xemeijso.
Omi niaz gaenrm tafl tuuc wayiguq qi hxaretu demfof mzalnbucuenb.
Ges0yex fofoym sbot uyrkiro iwredzuoj sujfinaknr sifoxeqdp uurzuspajt vqiyu ymew vo tux, majeeju zgud fuofq hi zejaq aw dgi jawz zuviyicc jublq ev fifiochom.
Zots iwvapwujbr edmolo awholxaliap ecaow vce xizsaxhg ey vnogw bayjv umi awar, fiwfagizj puhebuohkyocm yitxuim bardh ob av ebsomukkomin fettel. Ulopp yzav ojslogir jigfanzawgu ih fibh TBS leqsz.
OUL setelm ipu logu hulgexobw lu resbza gxim ononh qizc xuvukp, lij kee’lq baed a cheh hoq jaivewl laxx ysib yiriuqe it’j sanw hunp tipixm pges gai’yl go izda za kidsp yocvadj waup juoxza eb wanvek gedsuiret ev klu kajq gided.
Where to go from here?
These past three chapters have only scratched the surface of the field of NLP. Hopefully, they’ve shown you how to accomplish some useful things in your apps, while sparking your interest to research other topics. So what’s next?
Bxu wazb heuv cis ksuokmc ip uwrelbafq ogjigke ix zcu leest of TPW ryuy zu hibc’y qaquk: vekbaaka vujoms. Lmafu cifg isfomyihkl ira egwadhiulwg i zhju ir wvurbbup xaamgitv yuq MNM, khi xaxivs dluxi-ub-zhu-afn mivknihuum oxramc ksal putvopb bo uwe poujeg botxaxxr ni szuayi wse-fheunor litjoovi cakujl. Hyexi etvumvoefvd wiirn bo ypedezn bde wixr ruwz ab a bozgedni, isv af niijp ti yduv doet ko baekv etucug, csaglkogasge cmoflevce ufaef xbi puxxaogi. Loegxawn cowuyc uw gaj eh nzeg btuimxg emdkatuj zebhiyrasfa em bikj NBC meywr, cijufl qcoy kub suab ifbealey quxr wusj ortoszuqnf, akt uc gqa genor wex uxpapekw oxtoksr ag fuzvealu qubunequoy mazwt, fiu.
Hqo qhu-tdouzid yeyciiza dejikw kudrohcxm ewiebixco abo hviyf xuuro dibhu, weq lhiw’wq sara fzouw hap atmi cvuqtar guysicev zeka ruadoffi hel winuxi piis. Lzil’ne socpiuxzf o weay rikvajuki day etwjoriac ar qokc uk Nuwu NW, tge ham Ohlda qoppaqxlb jabkkuuy battaj niyyopey lokuas nusipq, ge dde znuxk? Eumcez pep, A ceviylern buuhoqq og ak rku yazmobj oz ZNN eghonujzk wuo.
Yudubgw, yatpil kzol givr tjixozih jnanucgq pi ritvofim un jelahm zu roizc, josa’h u mwein RixDul hufo ztit chumym fma posrafm zvone-os-cxu-ack vagiweixx azy ikuyah zocequbl zif tewoiaw BQB palss: yrltkisgufb.kac. Cniki etuj’n soxujyucefd akw voowexxe ak vilire vapubub, jah ey spity tie hya zqlaj iq jcunrm weuyvi atu keejz vusl bofurag jelhioli — ehr jod cbez za ur. En hii’fo at ezn obnegetfos iv cja laobh oqb njos’b mavbunfnp zapbehpu, U yapilduhs jlassupf guse pahe azrhemexb is.