RIPE 82


Routing Working Group session
.
RIPE 82, 20 May 2021
.
14:30 (UTC + 2)
.


IGNAS BAGDONAS: Hello everyone. And welcome to the Routing Working Group.
.
Another virtual RIPE meeting and another virtual Routing Working Group in that meeting. Let's see what we have here for the administrative part.
.
Housekeeping is exactly the same as it is usual for the virtual meetings. Please mind the microphone etiquette, the chat in particular is used only for talks between the participants and not for asking the questions.
.
Please use the 'question' tool for that and we will read that out to the presenter.
.
The minutes for the past meeting have been posted. There have been no comments on them. If you are curious and did not participate last time, take a look there and if you have something to comment on, please do before the end of the meeting.
.
Afterwards, they will be finalised and archived.
.
So, for the agenda part, we have four presentations this time and quite a packed agenda, therefore we will move straight into that part. Job.

JOB SNIJDERS: I would like to invite Emile to connect to the system and launch their presentation to give us an update on RIPE's routing information service.

EMILE ABEN: Hi. I hope I'm not muted, and give me a couple of seconds to share the pre‑loaded slides.
.
And we have slides!
.
So, I work at the RIPE NCC, and I am going to do an update on our routing information system, which we typically do in the Routing Working Group.
.
For those of you who don't know RIS, it's the Routing Information Service. Our goal is to provide insights into the BGP routing system, and we have been doing that since 1999. We collect data on the control plane, and there are some numbers there: 1,300 BGP sessions. It's an important data source for a lot of net ops tools, our own RIPEstat, but also external tools like bgp.he.net and BGPalerter, which is currently maintained by Massimo at NTT. These are all things that use this data to shed light on what's actually going on in the BGP routing system outside of your own networks.
.
What we do is we create open data. We produce MRT files that have updates and periodic dumps of the state of the routing tables that we collect. And we also have a livestream, which we call RIS Live. BGPalerter uses that.
.
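As an illustration of the livestream just mentioned, here is a minimal sketch of a RIS Live consumer. The websocket endpoint URL and the "ris_subscribe" message format below are assumptions of this sketch rather than details given in the talk, as is the third-party Python websockets package.

    # Minimal sketch: subscribe to RIS Live updates for one prefix.
    import asyncio
    import json

    import websockets  # third-party: pip install websockets


    async def watch_prefix(prefix: str) -> None:
        # Assumed endpoint; the client parameter just identifies this script.
        uri = "wss://ris-live.ripe.net/v1/ws/?client=example-script"
        async with websockets.connect(uri) as ws:
            # Ask RIS Live to send only BGP messages concerning this prefix.
            await ws.send(json.dumps({
                "type": "ris_subscribe",
                "data": {"prefix": prefix},
            }))
            async for raw in ws:
                msg = json.loads(raw)
                data = msg.get("data", {})
                print(data.get("peer"), data.get("path"), data.get("announcements"))


    if __name__ == "__main__":
        asyncio.run(watch_prefix("193.0.0.0/21"))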
So one thing I want to bring to your attention is our role in the Internet ecosystem, and I think it's a pretty important role; of course, I like the project.
.
I think public route collection is important for the health of the Internet, so you can actually see outside of your network what's going on, and there are of course other big route collector projects out there. That's useful for having diversity in the governance of these projects, and diversity in technology, so if one goes down, the others are left. It's robust against failures, outages, funding or security issues, you name them. To that extent, we have started collaborating with other projects, specifically RouteViews: we do tech transfer, we talk about issues we're dealing with, and another thing that I think is pretty important is that we are trying to synchronise on the types of metadata that we are collecting. We are collecting a vast lake of data, and you want insight from that vast lake, and sometimes that's a bit hard. So it helps to have some pointers into what data we have, what peers we have, what data we are collecting from them, what's special about certain peers and so on. We're trying to document that; there is a living document, and I'd love to get some more feedback on it, because I think it will make the project better.
.
So, as for the operational status, it's more of the usual.
.
We have more peers, more data.
.
Actually, the number of peers didn't grow that much since 2020, but the amount of data we have collected is actually quite significantly growing.
.
And we have only one extra RRC. The RRCs are the route collectors we deploy; mostly we put them at exchanges, but we also have a couple that do BGP multihop, so it doesn't matter where the peers are for those.
.
We had one multihop collector in Amsterdam and now we have two. The first one was considered full, to the extent that there was such a chunk of data to produce all at the same time that it's better to have an extra one. We are doing things slightly differently with the new one. It's a virtual machine, and we are asking peers to provide some of the metadata that we are after, such as the geographical location, because for most of the multihop peers we don't know where the physical router is, so we don't know where we are collecting geographically, and what feed type we're getting, full or not.
.
And we have one more route collector planned, in Dubai, because if you look at the picture here, you can see that we don't collect data from the Middle East. We have an agreement with the host in Dubai to get peers from various countries across the region. That would add to our diversity, so that's a pretty good development, I think.
.
Of course, a bit of a downside of these expansions is that we have seen additional delays, and we have received some complaints about data delivery having too much delay. So we started measuring that and, for instance, over the last month, 30% of the data had a delay of an hour or more.
.
We can explain that by more noise, bigger tables, more peers. But we would like to be more timely. Of course we are going to improve the back‑ends, but another way of providing more timely data is to not chunk up peers into RRC‑based collections. We'll still keep that going, but we also plan a prototype of per‑peer MRT files. So you'll have, in addition to the normal files, a lot more files that can be produced much quicker, especially if one or two peers are really causing delays in the data we produce.
.
And we have become a little more restrictive in accepting peers.
.
So that's one thing that we are also doing differently. We have a new role in the team, the peering coordinator role. What we used to do was take passive input: we had a form and we would just configure whatever we got in. We are now discouraging multiple sessions. We have seen, not a lot, but some very redundant tables being sent to us, and that just adds to delays and increases the time to insight.
.
We encourage diversity in the feeds that we're getting, so if we have a feed from somewhere that we haven't collected from before, that is of course preferred.
.
And sometimes we're not asking for a full feed. We still have the contact, so if at a later point a full feed would be useful, we can still change that, of course.
.
One thing that is definitely different is that the peering coordinator now also actively goes out to seek high‑value, diverse peers, and as a first step we are trying to get all the tier 1s feeding data into RIS.
.
And for the future, we are trying to find methods to systematically find new high‑value peers. That's a topic I am particularly interested in. But what we would like most of all is input from people who care about this project on what we should do. If you want to have a chat with not just me, but also the rest of the team, the peering coordinators, we have a room in SpatialChat that will open up after this session, because we all want to hear the other presentations too.
.
We're interested in feedback on use cases, and there are some topics we are particularly interested in. With that, I hope I stayed within my ten minutes, and I'll open up for questions, or hope to see you in the RIS jam room after this.

JOB SNIJDERS: You are perfectly on time. So there is time for some questions. I have one question myself.
.
What is a tier 1 provider?
.
(Laughter)

EMILE ABEN: That's a good question. We basically use the list because ‑‑ so if you change that list, we also go after you. Thanks, and hope to see everybody.

JOB SNIJDERS: Thank you for your time and for your updates. Next up we have Nathalie Trenaman about RPKI, which is either weirdly problematic keying infrastructure or some other abbreviation. Nathalie, the floor is yours.

NATHALIE TRENAMAN: Hello. I am visible and I am sharing.
.
That's better. Hi all. My name is Nathalie Trenaman and I am the routing security programme manager at the RIPE NCC. In this presentation, I will give you an update on what we have been working on to improve RPKI and what is on our roadmap.
.
I have ten minutes and a lot of things to share. And as you might know, RPKI is booming and the RIPE NCC operates one of the five trust anchors for RPKI.
.
This means that we really have to step up our game to ensure that the certificate authority that we run is stable, resilient and, of course, secure. In the last year, we have made a lot of progress in different areas, and we call the overall project the RPKI Resiliency Project.
.
It consists of the following areas. You can see that on the slide, and in my presentation I will cover each area in a little bit of detail.
.
So, starting off, as the RIPE NCC, we receive more and more requests for compliance. This has to do with larger, global companies starting to use RPKI, and they want to have answers to certain questions.
.
That's why we started last year to develop an RPKI audit framework, because we looked at all the different audit frameworks out there and none of them was really applicable or a right fit for RPKI. So we teamed up with a company called the British Standards Institution, who develop standards, as they do, and worked together to find the best fit for an RPKI audit framework.
.
So we have chosen to go for a SOC 2 Type II audit framework because that allows tailoring to a certain extent, and we tailored it to match the key elements of RPKI. It also allows us, and this was very important to us, to publish a SOC 3 report with the findings of such an audit.
.
By developing this audit framework, we can also share it with other parties that want to have an extensive audit done of their RPKI infrastructure and procedures, etc.
.
Now we're in the phase where we have finished the audit framework and BSI has performed a gap analysis. So basically they identified 179 controls of things we should do, and that's a long list, and at the moment they identify that we're not complying with 49 of those controls. That sounds kind of dramatic, but when we went over the list internally we saw that for around 50% we have the controls in place; we just have to provide them. Think of log files, stuff like that.
.
So, around 25, I would say, are actually missing, and that means that we have to work on those. That's what we're going to do for the rest of the year.
.
Now, the thing is, with SOC 2 Type II, which is interesting, first you have to have all the controls in place, so all 179. Then you have to wait for six months before you can have the actual audit carried out. I had never heard of that, but apparently that's normal. So that also gives me the time to find a third party, because, of course, we are not going to have BSI perform the actual audit; that has to be another party.
.
Moving on. Another assessment that we needed was to know how far we comply with the RPKI RFCs. Over the years, a lot of RPKI RFCs were published. Some relate to the RPKI core, some to the publication servers. So did we, over the last decade, interpret all the RFCs correctly? Good question.
.
So, we asked Radically Open Security to perform that assessment. That was done last August and September, and they gave us the report in October. Then we started working. There were 35 findings where we either did not implement the RFC to the letter, or we worked from a draft before it became an RFC. Anyway, 35 findings resulted in 35 actions, and we decided to be transparent about this. So we published two documents on a new part of the website for transparency: the original ROS report and the report that we wrote as a response to that report from Radically Open Security.
.
So if you are interested in that, you can find it. Let me know what you think.
.
Then, an entirely different beast is the Certification Practice Statement. When you run a certificate authority, it's mandatory to have such a certification practice statement. It must be visible on your website and it must be accurate.
.
Now, last year, we started to completely rewrite the CPS because it hadn't been updated in a decade, so it was high time that we did a complete rewrite of it. It's a long document, very detailed and very procedural. So it took a lot of time to completely rewrite it following an RFC, because there is an RFC for that, obviously.
.
So, we had it reviewed, what we wrote was reviewed by APNIC, and then we published it in January as RIPE 751. Quite happy that was done. But of course we're still working very actively in updating RPKI. There is still RFCs being published, so we plan to review the CPS every two years. Next time, I will ask the Routing Working Group to get involved in the review, because I think it is a good practice to do this in a more transparent way and ask different eyes to look at it.
.
So be prepared, in two years' time I will come to the mailing list and say we have updated ‑‑ we plan to update the CPS, these are our proposed changes, would you care to review this? And I hope the Working Group will help us with that.
.
Then, a fun fact:
.
Exactly two years ago today, we enabled RPKI route origin validation on the RIPE meeting network, and as you may know, last month we, as AS3333, started dropping RPKI invalid BGP routes. The main reason we wanted to do this is that we want to eat our own dog food, or haute cuisine. We saw the praise on the routing mailing list and we were very happy with the support you expressed there. So, first and foremost, a big thank you for your support.
.
Next, we checked who would be impacted by dropping these routes. And we reached out to 215 organisations associated with these 597 routes. Quite a big undertaking, and a big thank you for the Registration Services team who did all that work.
.
I was very curious to hear what would come back out of that, if we would get a lot of concerned members or angry e‑mails. Nothing of all that. We had a handful of questions from members, mostly asking us how to create ROAs, so, yeah, we did a lot of answering questions there and helping people out. I still have some calls scheduled after the RIPE meeting to talk people through this process. It's quite simple when you see it but some people would appreciate a little bit of help.
.
Another thing we decided was not to build a back door to my.ripe.net. If people had a typo, for example, in a ROA with the wrong AS number or something, then we would drop their route; effectively that means that they can't access my.ripe.net any more, or www.ripe.net.
.
The thing is, we decided not to build a back door because that introduces all kinds of other mess; we would not know who we would allow entry through that back door. But we had to do something, of course.
.
So, we added a warning for LIRs and end users, so PI holders, that are about to lock themselves out of the LIR portal, so, for example, if they are about to make a typo, and we match with what we see in BGP and we see, oh, this might cause problems, they get a warning, and it's quite a clear warning, we believe, people actually have to type something there to confirm that they understand what happened.
.
Credits where credits are due. I want to thank the ops team here, because the registrations team e‑mailed everybody, but in the end, the ops team adjusted the configuration, and Menno and Marco in particular, who did the work on the technical side.
.
Then another bit of news.
.
Different kind of news:
.
So, in RPKI, there are two protocols that you can use to fetch the information from the repository: RRDP and rsync. In April, we received a report from a user that there was a data atomicity issue that resulted in a non‑repeatable read. My colleague sent an extensive e‑mail to the Routing Working Group, and another one around 45 minutes ago, explaining the issue and what we are going to do to tackle it.
.
So we worked really hard the last few weeks to fix this issue. We made it our number one priority, obviously. But fixing this wasn't trivial. We had to improve the process in the RPKI core, so we had to update the whole tree in one go, and as a second solution we have also implemented krill‑sync, which is from NLnet Labs and which generates rsync files from RRDP. We have tested this internally, we are positive these approaches solve the problem, and we have now provided a public testbed, and we would really like your help to test things before we go to production.
.
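To illustrate the RRDP side of the fetch step being discussed, the sketch below retrieves a notification file and lists the snapshot and delta URIs it advertises, following the layout defined in RFC 8182. The notification URL is an assumption for illustration; only the XML namespace comes from the RFC.

    # Rough sketch of an RRDP fetch: a relying party first retrieves
    # notification.xml, which points at a full snapshot plus recent deltas.
    import urllib.request
    import xml.etree.ElementTree as ET

    NOTIFICATION_URL = "https://rrdp.ripe.net/notification.xml"  # assumed endpoint
    NS = "{http://www.ripe.net/rpki/rrdp}"  # RRDP XML namespace from RFC 8182

    with urllib.request.urlopen(NOTIFICATION_URL) as resp:
        root = ET.fromstring(resp.read())

    print("session:", root.get("session_id"), "serial:", root.get("serial"))
    snapshot = root.find(f"{NS}snapshot")
    print("snapshot:", snapshot.get("uri"))
    for delta in root.findall(f"{NS}delta"):
        print("delta", delta.get("serial"), delta.get("uri"))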
So, the team sent the e‑mail to the Working Group list. Please, please reach out to the team if you want to help us with testing. That would really help us.
.
Another project we have been busy with, and you heard quite a lot about it this week already, is the redundancy of the publication service. At the moment, the rsync repository is hosted in‑house and RRDP is in the AWS Cloud. This has been the case already for the last four years at least, maybe five.
.
The goal of this whole exercise is to have a very, very high uptime, so availability is key here, because this is simply critical infrastructure. The repositories are what people consume, so they have to be available.
.
The first step that we needed to take to do anything was to make sure that we built something that could take a publication in multiple places, whether that be in the Cloud or in‑house or whatever; we want to be able to have multiple publication servers that are accessible and have the same data at the same time. That's not easy.
.
So we worked quite hard on that in the last few months as well.
.
This is now done. We still have to decide on the fallback scenario, if that would be in‑house or in a second Cloud provider. Felipe wrote a Labs article, also in the Services Working Group there were some presentations. We really need your feedback here. So, please reach out, tell us what you think. I am very curious to hear.
.
Now, in the last RIPE meeting, I announced that we are going to deprecate the RIPE NCC RPKI validator, and since then I have tried to reach out to all known users where I could possibly find an e‑mail address, asked them to migrate away and informed them about our plans.
.
Now, we see a trend line going down, but it's not going that fast, not as fast as I hoped. So, we still see around 3,200 active clients. Please, please help us spread the word because, as somebody said, friends don't let friends run sunsetted, deprecated software. So please, please spread the word. If you know these people, tell them.
.
On our website, we have alternatives. You don't have to stick with the RIPE NCC RPKI validator.
.
So, we are stopping on the 1st July. That's a hard date. Dramatic pause!
.
Almost done.
.
Coming up soon:
.
We will have, by popular demand, the RPKI roadmap on the website. So, what are our priorities? What are we working on? What can you expect?
.
Also by popular demand, a status page. Now, this will not be a status page just for RPKI, it will also include other services, so bear with us but it is a work in progress.
.
And, of course, also for compliance but also for reporting back, not only to this Working Group but to all users, I want to publish quarterly RPKI health and progress reports. I think it's important to tell you where we are, and not just every six months here in the Routing Working Group.
.
Plus, we're working on a more streamlined way to incorporate community feedback, because now I get a lot of e‑mails to myself, or on IRC or wherever: Nathalie, can you do X, Y, Z? And it's not very transparent. It's not very clear which questions we get and which feedback we should look at first. So we're looking at a way to have that streamlined and transparent.
.
All right. One more minute.
.
The RPKI roadmap. So this is how the roadmap looks for me and for the team, and our plans, basically.
.
So in the first line you see the procedural and operational compliance; this is the BSI stuff that I was talking about. Closing the gaps, selecting a third party that can do the audit, and then, six months after we close the gaps, the audit execution.
.
And then you will see that this is an interval thing: we just did the RFC security compliance, we implemented the findings from the ROS report, in the second week of June we start a Pen Test, then we will implement those findings. Then, hopefully before the end of the year, a Red Team Test, then again implement those findings, and then in June next year, another Pen Test. So it's a continuous cycle of security assessments, because I think we need to build the habit of that and get used to it.
.
Then, RIPE RPKI route origin validation, it's done now. It works. Happy news. Done!
.
On the technical infrastructure side, the rsync fixes: as you can see, they are almost done. I'm quite happy with that. Then, for the Cloud deployment, multiple publication points are in test, that's where we are now, hopefully moving to production soon, because then we can continue with the rest: looking at multiple availability zones, multiple regions, fallback in‑house or a second Cloud provider, everything there. Hopefully before the end of this year.
.
Something else that comes back every few years, of course, is the new HSM. So we have got hardware that we need to replace, and this is scheduled basically for most of Q2, so starting soon, and then we already did quite some pre‑work for that, but we still have to continue that. It got halted a little bit because we had the full team on the rsync problem solving.
.
Right. That's it. I am looking forward to any questions.

JOB SNIJDERS: There are questions.
.
Thank you for this update. This was highly informative. And I would like to join your call to decommission and turn off RIPE validators. RIPE validators can be crashed remotely and be used in Botnet orchestration, so, people, turn off these pieces of software; they are dangerous.
.
Paul, could you guide us through the four questions in the queue? Four questions also means that the queue has now been cut to allow time for the other speakers.

PAUL HOOGSTEDER: First question from a certain Mr. Erik Bais:

"Hi Nathalie. Did the RIPE NCC RPKI team set up an RPKI client testbed to watch the rsync testbed and what was the result of that? And if not, since the problems are with the RPKI client, why isn't it tested? I would assume that an RPKI validator client testbed with the various software would be helpful?"

NATHALIE TRENAMAN: And I completely agree, and of course that's an obvious yes to your first question, did the RIPE NCC RPKI team set up an RPKI client testbed towards the rsync testbed. The answer is yes, very recently. So we don't have much data yet to share, because it's quite recent, but, yeah, we have rpki‑client running there as a test. I hope that answers your question.

PAUL HOOGSTEDER: Next question from Benno Overeinder at NLnet Labs:

"Thanks for sharing the roadmap plans. Are there also plans for a publication server for RIPE members to publish ROAs, a service like APNIC is offereing?"

NATHALIE TRENAMAN: Yeah, I know what you mean. So that's publication as a service, if you will, because that's what we see with the delegated repositories: operators find it quite hard because this is critical infrastructure, so they basically ask other parties: since you are making this so resilient, can I piggy‑back on that? Yes, it was on our roadmap, then it dropped off the roadmap because we had to change the priorities, because we first have to be completely resilient ourselves. So it got postponed, and hopefully we can look at that again next year.

PAUL HOOGSTEDER: Third question. AJ: "Hi Nathalie. Now that RPKI is critical infrastructure not only for LIRs, how can non‑members report and ask questions regarding problems or inconsistencies with the trust anchor or distribution?"

NATHALIE TRENAMAN: Good question, thanks, AJ. Well rpki [at] ripe [dot] net is not just for members; anybody can e‑mail ‑‑ use that e‑mail address, it goes to the team and to me. So, rpki [at] ripe [dot] net is where you can go with that. I hope that helps.

PAUL HOOGSTEDER: And the final question from Rudiger Volk:
.
"Is there a publication realtime of the routes rejected by AS3333 or is it planned?"

NATHALIE TRENAMAN: That's a nice one. Thanks, Rudiger. No, at the moment we don't publish that, and I see your point for realtime because that changes, of course. Yeah, I'll take that back to the ops team and see if we can do that, if that's even possible and how and where. Thanks.
.
Anything else?

PAUL HOOGSTEDER: No, no more questions.

JOB SNIJDERS: Thank you so much, Nathalie. Next up we have Massimo Candela from NTT Limited, who will give an update on NTT's RPKI deployment and what the current status is. Massimo, the floor is yours!

MASSIMO CANDELA: Thank you very much. My name is Massimo Candela. I work at NTT. This is going to be a quick update about our RPKI deployment but mostly it is to introduce to you what we did in terms of automation and monitoring to ease our RPKI operation, and I hope this can also be helpful for you.
.
So, you may be aware NTT adopted RPKI, but we are also doing monitoring of BGP and RPKI. For this monitoring we are using a tool called BGPalerter, which is open source and was also cited in an earlier presentation. Why are we doing this monitoring? Well, essentially because we want to be informed quickly if we make any error. And, in my opinion, the main root cause of a lot of operational problems while dealing with RPKI is that RPKI and BGP are supposed to work in sync, but they are two completely different planes, so you have to put some effort into keeping them in sync.
.
Just quick examples. You may create a ROA and afterwards start announcing a prefix, but some RPKI repositories have a publication time you have to consider, and in the meanwhile you may be announcing something that is not valid. Or another thing could be: you create the ROA, announce the correct prefix, and six months later one of your colleagues starts announcing a more specific, and the prefix length conflicts with the max prefix length that you specified in the ROA, and again, you are announcing something invalid.
.
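A small sketch of the origin-validation rules behind the scenario just described (illustrative only, not NTT's code): a route is invalid when some VRP covers the prefix but no covering VRP matches both the origin AS and the maximum length, which is exactly how a later, more specific announcement can turn invalid.

    # Origin validation of one route against a set of VRPs (ROA payloads).
    import ipaddress
    from typing import Iterable, NamedTuple


    class VRP(NamedTuple):
        prefix: ipaddress.IPv4Network
        max_length: int
        asn: int


    def origin_validate(route: ipaddress.IPv4Network, origin_asn: int,
                        vrps: Iterable[VRP]) -> str:
        covering = [v for v in vrps if route.subnet_of(v.prefix)]
        if not covering:
            return "not-found"
        for v in covering:
            if v.asn == origin_asn and route.prefixlen <= v.max_length:
                return "valid"
        return "invalid"


    vrps = [VRP(ipaddress.ip_network("203.0.113.0/24"), 24, 64500)]
    # The covering /24 is valid, but a later more-specific /25 from the same AS
    # exceeds maxLength 24 and becomes invalid: the scenario described above.
    print(origin_validate(ipaddress.ip_network("203.0.113.0/24"), 64500, vrps))  # valid
    print(origin_validate(ipaddress.ip_network("203.0.113.0/25"), 64500, vrps))  # invalid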
So basically you have to put some effort to keep everything under control between these two planes, and I see that two factors influence this:
.
One is a lack of automation, which definitely increases the risk of committing errors or having incidents, because you have to put in human effort to keep everything under control while you could automate that.
.
And a lack of monitoring can increase the impact of the incidents: you don't get notified in a timely manner, and maybe your customer will be the one to let you know that there is a problem.
.
So, a lot of effort was put into our goal to do something about it. We created a centralised system where we do a lot of our operations on our prefixes, which you can see here, the main user interface. Basically, we have a list of our prefixes and there is a quick overview of their health status: whether they are visible, whether they are RPKI valid, whether they are monitored. We can click on one of these and edit the various parameters, but the only thing we are interested in for this presentation is the RPKI part.
.
This panel allows you to basically manage the ROAs. What you see at the bottom are the ROAs that affect the selected prefix. You see that they have a status, which in this case is staged and stable; we will see what that means, but for now focus only on the fact that there is an operation behind the scenes. On the left there is the current status, and on the right, the future status. The current status is essentially a normal RPKI validation, done with the public RPKI repositories; it tells you what the prefix is currently.
.
The future status, instead, takes into account all the ROAs in the list below; these ROAs don't exist yet in the public repositories, so basically it gives you an idea of what the status will be once you push these ROAs out.
.
So essentially, this is the core part of what we did. We envisioned this approach based on four stages, or statuses, for a ROA.
.
When you create one, it is marked immediately as staged, but, although I say "create", it's just an entry in our database for now, just something that only we can see.
.
What we do is merge this with the public ROAs, so we merge the public and staged ROAs and we calculate the future status based on that merge: is what is currently announced, or what is supposed to be announced, because we can also set that, RPKI valid? Only in that case are we able to take all the ROAs that you saw in the list and commit them all at once, practically as a single operation.
.
Only at that moment do we reach the next status: the ROAs are committed, which means that they can be sent to the public repositories or whatever you want. The ROAs stay in this committed state as long as we are not able to see them in the public repositories; only once we see them do we mark them as public, and we know that BGP and RPKI are in sync, at least for now. So we start monitoring both BGP and RPKI. We will keep this monitoring forever because, of course, we want to monitor not only that what we did in RPKI is correct, but also that nobody in the future changes anything at the BGP level that conflicts with what we did on the RPKI side.
.
But anyway, we have a fourth status, stable, which is reached after 24 hours without incidents. This one is mostly there to wrap up the operation and consider it done for now and, for example, close tickets automatically.
.
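As a reading aid, the four statuses just described can be summarised as a simple state machine; the sketch below is purely illustrative and is not NTT's implementation.

    # Illustrative sketch of the four ROA statuses and the order they are reached in.
    from enum import Enum


    class RoaStatus(Enum):
        STAGED = "staged"        # entry exists only in the local database
        COMMITTED = "committed"  # pushed out, not yet seen in public repositories
        PUBLIC = "public"        # visible in public repositories; monitoring starts
        STABLE = "stable"        # 24 hours in "public" without incidents


    NEXT = {
        RoaStatus.STAGED: RoaStatus.COMMITTED,
        RoaStatus.COMMITTED: RoaStatus.PUBLIC,
        RoaStatus.PUBLIC: RoaStatus.STABLE,
    }


    def advance(status: RoaStatus) -> RoaStatus:
        # STABLE is terminal; every other status moves one step forward.
        return NEXT.get(status, status)


    print(advance(RoaStatus.STAGED))  # RoaStatus.COMMITTED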
So we implemented this, and this is the part I think is more interesting for the wider audience, mostly using open source software: BGPalerter, which you can find on GitHub, and rpki‑client, which you can also find on GitHub.
.
Most of the logic is implemented already in BGPalerter. It is software that does monitoring of both BGP and RPKI. When we released it, our main goal was to make it easy to use, so that more people, more companies, would start to monitor their operations. It has auto configuration, there is no installation required, it's basically a binary that you run if you want, and there is no data collection required. This is thanks to amazing services like RIPE RIS, which you saw in Emile's presentation, and which is a clear example of a community effort that few organisations other than the RIPE NCC would have been able to put together.
.
So it does monitor BGP and you get visibility, but here we care only about the RPKI part. You will get notified if you are announcing an RPKI invalid prefix, if a prefix is not covered by a ROA, or if there is a malfunction and your ROAs disappear; all or any of the ROAs covering your prefixes may be affected.

Additionally, there is an option to stage and test ROAs before you publish them, so you can create test ROAs; you can find how to do that in the documentation online.
.
You can embed it in whatever you like; this example here is with chats. The first two messages are a diff that you can get if any of your prefixes change, and the last one is an RPKI invalid alert. Just as an example.
.
My presentation is almost over; just a quick update on what is coming. We are also working on other things, for example expiring ROAs; thanks also to Job for the support provided, first of all for rpki‑client. There is new monitoring for downstream and upstream peers, so basically you will get an alert if a new upstream or downstream peer appears that was not supposed to be there. There is more work on improving ROA staging, and on creating a pull REST API, so that instead of having alerts pushed you can pull them, and much more. It is going to be a big release.
.
So my presentation is over. And time for the questions, and also I leave here my e‑mail and my Twitter account in case you want to contact me later.

JOB SNIJDERS: Thank you so much for this update, Massimo. I am going to hand over the microphone to Paul for the questions.

PAUL HOOGSTEDER: We have got a question from Mr. Randy Bush:
.
"Massimo, very cool. Are the graphs of the rates of errors or discrepancies?"

MASSIMO CANDELA: Okay, I get what you mean. I don't know if I can release numbers, because at the moment we are using this system for internal operations and I don't have numbers to give you right now, like how many incidents happen, and that's something that, for sure, I will have to ask authorisation to release. But, no, there are no public graphs for now. I hope I answered the question.

PAUL HOOGSTEDER: I think so. Next question from Mike Booth, Liberty Global:
.
"Where does BGPalerter gets its RPKI data? I understand is it RIS or any other sources?"

MASSIMO CANDELA: It gets it from wherever you want, essentially, because there is a concept of VRP providers, and there are some common ones: there is the Cloudflare one, the rpki‑client one of Job, there are various, and they are public, so you can select any of them, but that means that you are basically delegating to somebody else to provide you the file. What you can also do is run your own validation in‑house and feed in the VRP file that you like, for the RPKI data.

PAUL HOOGSTEDER: Thanks. No more questions.

MASSIMO CANDELA: Thank you very much.

JOB SNIJDERS: Thank you so much, Massimo. That brings us to our next presentation, Alexander Zubkov will share about routing loops.

ALEXANDER ZUBKOV: Hello everybody. Today, I want to talk about routing loops. As you may know, this is a situation where a packet routes infinitely in a loop among some routers. We can have such a situation during protocol convergence, and for BGP in the Internet that can sometimes take minutes. Another option is that it can be caused by stuck routes in the routing protocols.
.
It can also be caused by configuration errors; such loops are persistent until the configuration error is fixed, and the easiest way, and I think the most popular one, to get into such a situation is with unused IP space or NAT pools.
.
For example, if some provider assigns some space to its client and the client uses only part of that space, the remaining addresses are routed back by the default route to the provider, so we get a loop here. The situation is easy to fix on the client side: if you route the whole block of addresses received from the provider, any unused addresses are dropped and not looped.
.
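A sketch of the client-side fix being described: compute the unused remainder of the assigned block so that those covering routes can be pointed at a discard or null next hop instead of following the default route back to the provider. The addresses below are documentation prefixes chosen only for illustration.

    # Compute the parts of an assigned block that are not in use and should be
    # null-routed locally so packets for them cannot bounce on the default route.
    import ipaddress

    assigned = ipaddress.ip_network("198.51.100.0/24")   # block assigned by the provider
    in_use = [ipaddress.ip_network("198.51.100.0/26")]   # what the client actually uses

    unused = [assigned]
    for used in in_use:
        unused = [n for block in unused for n in
                  (block.address_exclude(used) if used.subnet_of(block) else [block])]

    for net in sorted(unused):
        print("discard route for", net)
    # -> 198.51.100.64/26 and 198.51.100.128/25 should get local discard routes.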
Also, a provider can implement an anti‑spoofing policy on the client's interface; that can also break the routing loop caused by a misconfigured client.
.
So, what are the problems with loops? My former Internet provider, for example, called it a cosmetic issue. The biggest problem is the possibility of link over‑utilisation: for example, if we have a packet with a high TTL and a loop of 2 hops, we get 100 times amplification. It's easy to understand.
.
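The amplification figure follows from the TTL budget divided by the loop length; a worked version of that arithmetic, assuming an initial TTL of around 200 for the quoted factor of 100:

    # A packet entering a loop keeps circulating until its TTL is exhausted,
    # so a 2-hop loop lets one ingress packet cross the looped link about TTL/2 times.
    def loop_amplification(initial_ttl: int, loop_hops: int) -> int:
        return initial_ttl // loop_hops


    print(loop_amplification(200, 2))  # ~100x, the factor quoted in the talk
    print(loop_amplification(255, 2))  # ~127x with a maximal initial TTL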
So such a link can be the target of a DoS attack, or, if it's the link to your provider, you can get an amplified DoS attack.
.
Also, for example, some very badly placed loops can be used as a DoS means against other networks, and some researchers have described using routing loops for inferring the viability of spoofing from a network.
.
Here, I picked some articles describing such problems. For example, I put a quotation here: researchers found some persistent loops in the Internet, and some of them lasted for six days.
.
I also made a simple setup of three servers and switches to test the problems caused by loops on the switches, and here are the results.
.
I have two yellow lines with the same flood rate: one where the flood is directed at the server and one where it is directed at the loop. As you can see, in the first situation the server wasn't affected at all, and when I flooded the loop with slightly more than 400 megabits, I saturated the 40‑gig channel and got a five times slowdown.
.
And the interesting result is the green one: when I tried to saturate the loop with small packets, I got no slowdown, and I think that is because of how the switches are organised, maybe some fair scheduling among ports or something like that. But anyway, we have problems with large packets, and if you have, for example, other bottlenecks in routers and switches, they may be affected too, and they may be affected more by large packets.
.
At Qrator Labs, we have a project called Qrator Radar. You can register there. We collect different problems with BGP and Internet routing there, and you can register to see what's happening in your autonomous system. They provided me with historical data over the years of loops observed in our system, and I see a downward trend here; I hope this is not just the migration to IPv6. Here it's around 22 million loops. But I also did my own research for this presentation: I scanned the Internet and got 28 million unique replies, which is around 1% of all active IPv4 space.
.
But there are more loops, because not all routers reply and there are many lost requests. And 1% is not so much, but those IP addresses are located in 25,000 autonomous systems, which is 35% of all active autonomous systems. So, every third AS contains a loop destination.
.
And loops can also be found in the autonomous systems of operators who I think should take the greatest care about connectivity and stability; not only small, but big names are there too.
.
And the fun fact is that I sent 4 probes for each IP, but I got more than 4 replies on average in return; there are some amplifiers there, and in some cases I got more than 100,000 replies.
.
I also tried to count unique loops. That turned out to be a hard problem, but I estimated that there are at least hundreds of thousands of them. And there are more than half a million router IPs involved in those loops, allocated in 20,000 autonomous systems.
.
I found loops ranging from 1 to 34 hops; 2 hops is the most popular and accounts for at least half of such loops. There are loops that span 7 autonomous systems and 8 countries, for example, and the longest loop in terms of time that I found is up to 18 seconds.
.
Here are statistics of the destinations by country. The United States is the absolute leader here, with more than 6 million loop destinations. And here are the statistics for Europe only; Germany and Russia are ‑‑
.
Here, you can see statistics by autonomous system. The two leaders here are a national Internet provider in India and, second, Lumen, but Lumen has several autonomous systems, and if you sum them up they have more than 600,000 loop destinations.
.
And I also want to show you some of the interesting loops I found when looking at my data.
.
For example, you can see here there is a loop of 2 hops, but one router is too diligent and replies every time. Or, for example, you can see only one router IP answering, but if you look at the round‑trip time, you can see a clear 2‑hop pattern there. There are also loops where you have a loop of 70 hops but only 2 hops replying. And these are all repeating patterns, of course.
.
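The repeating-pattern check being described can be sketched as follows: the function looks for a short hop sequence that repeats at the tail of a traceroute, which is how a loop shows up once the probe's TTL budget runs out. This is an illustrative sketch, not the tooling used for the study; the addresses are documentation prefixes.

    # Detect a repeating hop pattern at the tail of a traceroute.
    from typing import List, Optional


    def find_loop(hops: List[str], min_repeats: int = 3) -> Optional[List[str]]:
        """Return the repeating hop pattern at the tail of a traceroute, if any."""
        for period in range(1, len(hops) // min_repeats + 1):
            pattern = hops[-period:]
            tail = hops[-period * min_repeats:]
            if tail == pattern * min_repeats:
                return pattern
        return None


    hops = ["10.0.0.1", "192.0.2.1", "198.51.100.7", "198.51.100.8",
            "198.51.100.7", "198.51.100.8", "198.51.100.7", "198.51.100.8"]
    print(find_loop(hops))  # ['198.51.100.7', '198.51.100.8'] -> a 2-hop loop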
For example, here you can see there are loops of 34 hops, and I have an example of a 34‑hop loop which is located in one ‑‑
.
And there are also some strange loops. I called this one a flat loop, because it looks like the packet is going back and forth. And you will note I found such a loop where I completely don't know what's happening, and I think someone enslaved a network administrator and he hid a message like this to ask for help.
.
So, to ease the pain from the previous slides, I wanted to show you some fun traceroutes with artificially created names. But they are all gone now, with one exception: there was a traceroute where you could see a Christmas tree with some rhymes, but you would not see the Christmas tree now; there is a loop there now instead.
.
So thank you very much for your time and I hope you don't do loops in your network.

JOB SNIJDERS: Thank you so much for this very informative presentation. The number of loops was much higher than I anticipated. So thank you for doing this investigation. I am going to switch the microphone to Paul, as there are some questions.

PAUL HOOGSTEDER: Yes. Mr. Randy Bush:

"Nice study, so it is bad, but how to monitor and how to ameliorate?"

ALEXANDER ZUBKOV: For example, at Qrator we monitor those loops. I mean, if you mean in your own autonomous system, you can register and at least look at what we can see for your autonomous system there. And what's "ameliorate"? I don't know that word. I am sorry.

JOB SNIJDERS: Ameliorate means how to prevent or how can we solve the loops. Randy Bush likes to use big words sometimes.

ALEXANDER ZUBKOV: Loops caused by convergence are hard to solve, but they are temporary, so we can live with them somehow. For other loops, routing for unused IP space is very helpful, and BCP 38, of course, also.

JOB SNIJDERS: I see a question in the audio queue. Rudiger.

RUDIGER VOLK: Thank you. I wonder whether anybody is hearing me?

JOB SNIJDERS: We can hear you loud and clear.

RUDIGER VOLK: It seems not to be the case. So how do I get this done?

JOB SNIJDERS: We can hear you, Rudiger. Okay. We will try again.

RUDIGER VOLK: Okay. Let's see whether it works now. Well, okay, first for Randy's question, how to get knowledge: I certainly can recommend using Qrator; rather, the scanning results they are collecting are only available to people who are in some way authenticated as being in charge of the ASes. So, the bad guys are not getting easy access to the information about where to attack.
.
Looking at some of these stats that Alexander has provided, I suspect very much that, at least for one of the German ASes for which I have only been in charge ‑‑ no, I never was in charge of that ‑‑ a lot of the loops are due to the use of default routes in the inter‑domain case. Quite obviously, when you use default routes, you are making the assumption that everything will be well defined, and quite obviously in many cases there are gaps which invite the loops. Do we have any indication of what the typical causes for the loops are, aside from the default route case that I mentioned?

ALEXANDER ZUBKOV: Unfortunately, I did not collect such statistics. I think maybe one of the papers I cited in the references did some work like that, but I don't know. And at least I do not see other ways to do it.

JOB SNIJDERS: It is challenging to figure this out. I see a few more short questions. Paul, do you want to go over them?

PAUL HOOGSTEDER: Yes, before we close the queue after this.
.
First one from Jean‑Daniel at FranceIX:

"Have you got stats about internal 1 AIS versus transferred multiple AS involved loops?"

ALEXANDER ZUBKOV: No, I didn't collect that. And also, I think it would not be very good statistics, because on peering links you can see addresses from one AS but actually it would be a loop between different ASes. But I don't have such statistics.

PAUL HOOGSTEDER: Okay. Next question, short one from Dmitry:

"Can you share the Christmas tree AS number?" He would like to test it next year.

ALEXANDER ZUBKOV: You can see the DNS name of the destination on my last slide. And you can verify the loop.

PAUL HOOGSTEDER: Last question from Lars Prehn:

"How do we know if it's really routing loops rather than just load balancings at complex infrastructure? I mean, traceroutes are rather well known to sometimes report nonexistent paths to relying on various packets."

ALEXANDER ZUBKOV: I'm not sure I clearly understand. If you see that the packet traverses the same route several times, it's clearly a loop. Maybe there are some weird situations; I saw a weird situation where one IP repeats several times in the loop, for example. In this weird loop you can see I highlighted some IPs; one IP appears several times per cycle. I think maybe it's some ICMP issue with the MAC address or the incoming route, so several ICMP replies get multiplied. I also saw cases where some IP repeats several times in a row in the loop, and then another IP follows. But for all the loops that I found, at least the ones which I checked and verified with the results, I clearly saw that the pattern is repeating. I don't know, maybe there are some other problems there, but I received TTL exceeded for a big TTL, so the packet did not reach the destination, so I don't know what else it can be.

PAUL HOOGSTEDER: Thank you.

ALEXANDER ZUBKOV: Thank you very much.

JOB SNIJDERS: Alexander can be reached at green [at] qrator [dot] net. The e‑mail address is in the lower left of the slide deck, I am mentioning this because of the chat, I see that there is a lot of enthusiasm ‑‑

ALEXANDER ZUBKOV: It's on the first slide too, and we can talk in SpatialChat also.

JOB SNIJDERS: Fantastic. Alexander, thank you so much for your time. This brings us to the end of our Routing Working Group session. I would like to thank all the participants and especially the presenters for their excellent work, and see you all in six months, hopefully in the physical rather than the virtual; we'll see what happens. Thank you for your time. Have a good day.
.
(Coffee break)